Scenario: Design a Scalable URL Shortener

Question: 

Imagine you are tasked with designing a system similar to Bitly that converts long URLs into short ones. The system should handle billions of URLs and millions of requests per second. Please explain how you would design this system.

Requirements:

Functional requirements: URL shortening, redirection, and URL expiry.

Non-functional requirements: high availability and fault tolerance (an AP system in CAP terms).

Step 1: High-Level Design

At a high level, the system will have the following components:

  1. Frontend Service: Handles user requests for shortening, redirection, and URL expiry.

  2. Backend Service: Processes requests, generates short URLs, manages expiration policies, and stores mappings.

  3. Database: Stores the short-to-long URL mappings.

  4. Cache: Speeds up redirection for frequently accessed URLs.

  5. Load Balancer: Distributes incoming traffic evenly across backend servers to handle high availability and fault tolerance.

Step 2: Capacity Planning

Now, let's expand on capacity planning for this system:

  1. Storage:

    • Assume 1 billion URLs in the system.

    • Average size for a record (short URL + long URL + metadata like expiry date) = 150 bytes.

    • Total storage required: 1 billion x 150 bytes = ~150 GB.

    • With 3x replication for fault tolerance, total storage: ~450 GB.

  2. Traffic:

    • Peak traffic: assume 10,000 redirection requests/sec for this worked example; the same arithmetic scales linearly for higher targets.

    • Each server can handle 1,000 requests/sec, so you'll need 10 servers at peak load.

    • Cache hit ratio: Assume 80% of requests hit the cache (Redis).

    • Only 20% of requests (2,000/sec) hit the database.

  3. Cache Size:

    • Frequently accessed URLs (~10% of all URLs): 100 million URLs.

    • Average size of a cached record: 150 bytes.

    • Total cache size: ~15 GB (comfortably within a single Redis node's memory).

  4. Bandwidth:

    • Each redirection involves ~500 bytes of data transfer (request + response).

    • For 10,000 requests/sec: 500 bytes x 10,000 = ~5 MB/sec bandwidth requirement.
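
If you want to sanity-check these back-of-the-envelope numbers, the arithmetic is easy to reproduce in code. Below is a minimal C# sketch (top-level statements, .NET 6 or later) that recomputes the figures above; every constant is an assumption taken from this section, not a measurement.

// Back-of-the-envelope capacity check; all constants are the assumptions above.
const long totalUrls = 1_000_000_000;      // 1 billion URLs
const int bytesPerRecord = 150;            // short URL + long URL + metadata
const int replicationFactor = 3;

double storageGb = (double)totalUrls * bytesPerRecord / 1_000_000_000;   // ~150 GB
double replicatedGb = storageGb * replicationFactor;                     // ~450 GB

const int peakRps = 10_000;                // peak redirection requests/sec
const int rpsPerServer = 1_000;
int serversNeeded = peakRps / rpsPerServer;                              // 10 servers

const double cacheHitRatio = 0.80;
double dbRps = peakRps * (1 - cacheHitRatio);                            // 2,000 req/sec

long cachedUrls = totalUrls / 10;          // the "hot" 10% of URLs
double cacheGb = (double)cachedUrls * bytesPerRecord / 1_000_000_000;    // ~15 GB

const int bytesPerRedirect = 500;          // request + response
double bandwidthMbps = (double)peakRps * bytesPerRedirect / 1_000_000;   // ~5 MB/sec

Console.WriteLine($"Storage: {storageGb:N0} GB (x{replicationFactor} replicas = {replicatedGb:N0} GB)");
Console.WriteLine($"Servers at peak: {serversNeeded}; database load: {dbRps:N0} req/sec");
Console.WriteLine($"Cache: {cacheGb:N0} GB; bandwidth: {bandwidthMbps:N0} MB/sec");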

Step 3: Detailed Design

  1. Frontend:

    • Simple UI/API for creating short URLs and redirecting to original URLs.

    • API design:

      • POST /shorten: Accepts a long URL and returns a short URL.

      • GET /redirect/<short-url>: Redirects to the original URL.

  2. Backend:

    • URL Shortening:

      • Generate unique short URLs using Base62 encoding or a random hash (one approach is sketched after this list).

      • Ensure collision resistance by checking the database for duplicates.

    • URL Redirection:

      • Look up the long URL in the cache first; if it is not found, fetch it from the database (see the sketch after this list).

    • Expiry Management:

      • Use a background job to periodically clean expired URLs from the database.

  3. Database:

    • Use a NoSQL database like Cassandra or DynamoDB for scalability.

    • Key-Value schema:

      • Key: Short URL.

      • Value: Original URL + metadata (creation time, expiry time).

    • Partitioning: Shard data based on the hash of the short URL.

  4. Cache:

    • Use Redis for caching frequently accessed URLs.

    • Implement TTL (Time-to-Live) to automatically remove expired cache entries.

  5. Load Balancer:

    • Use a load balancer (e.g., Nginx or AWS ELB) to distribute traffic across backend servers.
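
To make the frontend API and the backend shortening/redirect flow concrete, here is a minimal C# sketch using ASP.NET Core minimal APIs (.NET 6 or later, Microsoft.NET.Sdk.Web project). The in-memory dictionaries are stand-ins for Redis and the database, and a sequential counter plus Base62 encoding is one simple way to get collision-free codes; treat this as an illustration of the flow, not a production implementation.

using System.Collections.Concurrent;
using System.Text;
using System.Threading;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

// In-memory stand-ins for the real cache (Redis) and database.
var cache = new ConcurrentDictionary<string, string>();
var database = new ConcurrentDictionary<string, string>();
long nextId = 0;

// Base62-encode a unique numeric ID; unique IDs avoid collisions by construction.
const string alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
string Encode(long id)
{
    var sb = new StringBuilder();
    do { sb.Insert(0, alphabet[(int)(id % 62)]); id /= 62; } while (id > 0);
    return sb.ToString();
}

// POST /shorten: accepts a long URL and returns its short form.
app.MapPost("/shorten", (UrlRequest req) =>
{
    string code = Encode(Interlocked.Increment(ref nextId));
    database[code] = req.LongUrl;                 // persist the mapping
    return Results.Ok(new { shortUrl = $"/redirect/{code}" });
});

// GET /redirect/<short-url>: cache-first lookup, database on a miss, 404 otherwise.
app.MapGet("/redirect/{code}", (string code) =>
{
    if (cache.TryGetValue(code, out var cached)) return Results.Redirect(cached);
    if (!database.TryGetValue(code, out var longUrl)) return Results.NotFound();
    cache[code] = longUrl;                        // populate the cache for the next hit
    return Results.Redirect(longUrl);
});

app.Run();

record UrlRequest(string LongUrl);

Because each code here comes from a unique counter value, the database duplicate check is only needed if you switch to random hashes instead.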

Step 4: Handling Edge Cases

  • Hash Collisions:

    • Handle collisions by appending random characters to the short URL.

  • Expired URLs:

    • Redirect users to an error page if the URL has expired.

  • Invalid URLs:

    • Validate input URLs before storing them (see the sketch after this list).

  • High Traffic Spikes:

    • Scale horizontally by adding more backend and cache servers.
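
For the input-validation point above, .NET's built-in Uri.TryCreate covers the basic checks. A minimal sketch; restricting schemes to http/https is an assumption you may want to adjust:

using System;

public static class UrlValidator
{
    // Accept only absolute http/https URLs; reject anything else before storing.
    public static bool IsValid(string input) =>
        Uri.TryCreate(input, UriKind.Absolute, out var uri)
        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}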

Step 5: CAP Theorem and Non-Functional Requirements

  • Strong consistency (C) is relaxed because we prioritize availability (A) and partition tolerance (P).

  • During a network partition, a newly created short URL may not replicate to every region immediately, but existing redirections continue to work.

Final Diagram

Here’s a simple architecture for your system:

Client -> Load Balancer -> Backend Service -> Cache (Redis) -> Database
                                 |
                                 +-> URL Expiry Job (cleans expired URLs in the database)

This design ensures scalability, fault tolerance, and high availability. Feel free to dive deeper into any component or ask about the trade-offs in the design!


Trade-Off Example: NoSQL vs. Relational Database

Context:

  • In the design, we opted for a NoSQL database (e.g., Cassandra or DynamoDB) instead of a relational database like PostgreSQL or MySQL.

Why We Chose NoSQL:

  • Scalability: NoSQL databases scale horizontally; they can store billions of records and absorb massive traffic by distributing data across many servers.

  • Write Performance: although redirects make the workload read-heavy overall, the system must still sustain a high rate of inserts for new short-to-long URL mappings, and NoSQL databases are optimized for high-throughput writes.

Trade-Off:

  1. Consistency vs. Scalability (CAP Theorem):

    • By using a NoSQL database, we prioritize availability (A) and partition tolerance (P) but sacrifice strong consistency (C). This means:

      • Short URL redirections may not immediately reflect updates if replicas are still syncing, but the system stays highly available.

      • For example, in the rare case of a database partition, a newly shortened URL might temporarily fail to resolve for a subset of users.

  2. Flexible Queries:

    • NoSQL databases are optimized for key-value lookups (e.g., finding the long URL from a short one).

    • However, if the system later needs advanced queries (e.g., analytics: "show all URLs created in the last 7 days"), a relational database might be better suited.

    • This trade-off means we prioritize simplicity and performance for the current use case while limiting flexibility for future feature expansion.

Trade-Off Example: Cache vs. Database

Context:

  • In the design, we opted for a Redis cache to store frequently accessed URLs and reduce latency.

Why We Chose Redis Cache:

  • Speed: Redis operates in-memory, enabling near-instantaneous lookups compared to database queries.

  • Load Reduction: Redirect requests are served from the cache, offloading pressure from the database.

  • TTL: Redis supports Time-to-Live (TTL), allowing expired URLs to drop out of the cache automatically without database intervention (see the sketch at the end of this section).

Trade-Off:

  1. Cache Hit vs. Miss:

    • Hit: When the short URL is found in the cache, the lookup is fast.

    • Miss: If the URL is not in the cache, the system falls back to querying the database, which is slower.

    • Example: If the cache hit ratio drops to 50% due to infrequently accessed URLs, latency increases, and the database may face higher load.

  2. Memory Usage vs. Scalability:

    • Redis stores all data in memory, which is expensive compared to disk storage.

    • Example: If we want to cache 1 billion URLs (about 150 GB), the cost of high-memory servers for Redis becomes a concern.

    • Trade-off: We limit caching to the most frequently accessed URLs (~10% of all URLs).

  3. Consistency vs. Performance:

    • If updates are made directly to the database (e.g., URL expiry or analytics tracking), the cache may hold stale data temporarily until refreshed.

    • Trade-off: Sacrifice real-time consistency to prioritize performance for redirection requests.
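
To ground the TTL and staleness points above, here is a small sketch using the StackExchange.Redis client (assumes the StackExchange.Redis NuGet package and a Redis instance at localhost:6379; the key and URL are made up for illustration):

using System;
using StackExchange.Redis;

class RedisTtlDemo
{
    static void Main()
    {
        // Assumption: a local Redis instance; replace with your real endpoint.
        ConnectionMultiplexer redis = ConnectionMultiplexer.Connect("localhost:6379");
        IDatabase db = redis.GetDatabase();

        // Cache the mapping with a TTL so the entry expires automatically.
        db.StringSet("short:abc123", "https://example.com/some/very/long/path",
                     expiry: TimeSpan.FromHours(24));

        // Redirect path: a cache hit never touches the database.
        RedisValue longUrl = db.StringGet("short:abc123");
        Console.WriteLine(longUrl.HasValue ? (string)longUrl : "cache miss");

        // If the record changes in the database (e.g., early expiry), delete the
        // cache entry so readers are not stuck with stale data until the TTL fires.
        db.KeyDelete("short:abc123");
    }
}

This is the same trade-off as item 3: deleting on write keeps readers closer to fresh data, at the cost of an extra cache miss.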

Trade-Off Example: Failure Recovery Mechanisms

Context:

To ensure high availability and fault tolerance, the system should recover gracefully when components fail (e.g., a server crash or cache failure). We incorporated replication and fallback strategies in the design.

Mechanisms for Recovery:

  1. Database Replication:

    • Multiple copies (replicas) of the database ensure availability even if one server fails.

    • Trade-Off:

      • Benefit: High availability and low risk of data loss.

      • Cost: Increased storage needs and replication overhead. If data needs to replicate across multiple nodes, write latency may increase.

      • Example: Updating a short URL mapping might take milliseconds longer due to replica sync delays.

  2. Cache Fallback to Database:

    • If the Redis cache goes down, the system queries the database directly.

    • Trade-Off:

      • Benefit: Ensures continuity of service for redirection requests.

      • Cost: Database will experience increased load during cache outages, resulting in higher latency and potential bottlenecks under peak traffic.

      • Example: During a cache failure, redirection latency might increase from 1ms to 10ms.

  3. Load Balancers with Failover:

    • Load balancers redirect traffic from failed servers to healthy servers.

    • Trade-Off:

      • Benefit: Users don’t notice server outages as requests are rerouted.

      • Cost: Adding failover capabilities increases infrastructure complexity and cost.

      • Example: Keeping additional standby servers might increase operational costs by roughly 20%.

  4. Backups for Disaster Recovery:

    • Regular backups of database and metadata ensure recovery in case of catastrophic failures (e.g., data corruption).

    • Trade-Off:

      • Benefit: Prevents permanent data loss and ensures the system is recoverable.

      • Cost: Backup systems require extra storage and may not include real-time data due to backup frequency.

      • Example: If backups occur daily, URLs created just before failure might be lost.

  5. Retry Logic and Circuit Breakers:

    • Implement retries for transient failures and circuit breakers to avoid overwhelming downstream services (a minimal sketch follows this list).

    • Trade-Off:

      • Benefit: Improves reliability for users during intermittent failures.

      • Cost: Retries add latency and may temporarily strain the system.

      • Example: If the database is slow, retry logic might delay redirections by a few milliseconds.
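
A hand-rolled sketch of both ideas in C#; the retry count, backoff schedule, and open-circuit duration are illustrative assumptions, and a production system would more likely reach for a library such as Polly:

using System;
using System.Threading;

public sealed class CircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _openDuration;
    private int _failures;
    private DateTime _openedAt;

    public CircuitBreaker(int failureThreshold, TimeSpan openDuration)
    {
        _failureThreshold = failureThreshold;
        _openDuration = openDuration;
    }

    public T Execute<T>(Func<T> action, int maxRetries = 3)
    {
        // While the circuit is open, fail fast instead of hammering a sick dependency.
        if (_failures >= _failureThreshold && DateTime.UtcNow - _openedAt < _openDuration)
            throw new InvalidOperationException("Circuit open: failing fast.");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                T result = action();
                _failures = 0;            // a success closes the circuit again
                return result;
            }
            catch when (attempt <= maxRetries)
            {
                // Exponential backoff before retrying: 100 ms, 200 ms, 400 ms, ...
                Thread.Sleep(TimeSpan.FromMilliseconds(100 * Math.Pow(2, attempt - 1)));
            }
            catch
            {
                // Retries exhausted: count the failure and possibly open the circuit.
                if (++_failures >= _failureThreshold) _openedAt = DateTime.UtcNow;
                throw;
            }
        }
    }
}

Usage might look like: var breaker = new CircuitBreaker(5, TimeSpan.FromSeconds(30)); then wrap database calls as breaker.Execute(() => LookupUrl(code)), where LookupUrl stands in for whatever data-access method you have (hypothetical here).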
