23 April, 2025

Build a Redis-like Distributed In-Memory Cache

  This tests:

  • System design depth

  • Understanding of distributed systems

  • Trade-off navigation (CAP, consistency, latency)

  • Real-world edge case handling

Let’s go step by step and design a Redis-like cache from first principles, without relying on cloud-managed services.


🚀 Goal: Build a Redis-like Distributed In-Memory Cache


🧾 1. Requirements Gathering (Clarify with interviewer)

🔹 Functional

  • Support GET, SET, DEL, TTL

  • Handle concurrent reads/writes

  • Cache keys across multiple nodes

  • Optional: Support pub/sub, data structures (hash, list)

🔹 Non-Functional

  • Low latency (<1ms typical)

  • High availability & fault tolerance

  • Scalable horizontally

  • Eventual or strong consistency

  • Memory-optimized with TTL eviction

Back-of-the-envelope estimation is crucial in system design interviews: it demonstrates pragmatism, the ability to roughly size a system, and sound trade-off judgment.

Let’s break it down for your Redis-like Distributed In-Memory Cache System:


🧠 Scenario:

Let’s say you're designing this for an AI/ML pipeline system, like Google's CMCS ML. It caches:

  • Intermediate model data

  • Feature store results

  • Token metadata

  • Configuration data


📌 Estimation Goals:

We’ll estimate:

  • Number of keys: e.g., 100 million

  • Size per key: e.g., 1 KB average

  • Total memory footprint: GB/TB scale

  • QPS (queries per second): read/write traffic

  • Node count and distribution

  • Network bandwidth

  • TTL / eviction rates

⚙️ Step-by-Step Estimation

🔹 1. Number of Keys

Let’s say each ML workflow (pipeline) generates:

  • 10k intermediate cacheable entries

  • 1M workflows per day (across all users)


10k keys/workflow × 1M workflows/day = 10B keys/day

But not all of these stay in memory. Assume we retain the hottest 10%:

  • 10B × 10% = 1B keys cached at peak


🔹 2. Average Key Size

Let’s assume:

  • Key name: ~100 bytes

  • Value: ~900 bytes

  • TTL/metadata: ~20 bytes overhead

Total ≈ 1 KB per key


📦 3. Total Memory Requirement

1B keys × 1KB = 1,000,000,000 KB = ~1 TB
So you’d need ~1 TB of RAM across your cluster

Let’s budget for 30% overhead (replication, GC, fragmentation):

➡️ Effective: ~1.3 TB RAM


🧵 4. QPS (Queries Per Second)

Assume:

  • Each key gets ~10 reads per day → 10B reads/day

  • 1% of keys get hit 90% of the time (Zipfian)

10B reads/day ≈ 115,740 reads/sec
Writes: 1B/day ≈ 11,500 writes/sec
Target QPS:
  • Read QPS: 100K–150K

  • Write QPS: 10K–20K


🧑‍🤝‍🧑 5. Number of Nodes

If 1 machine supports:

  • 64 GB usable memory

  • 10K QPS (to be safe)

  • 10 Gbps NIC

Then:

  • RAM: 1.3 TB / 64 GB ≈ 20 nodes

  • QPS: 150K / 10K = 15 nodes

  • Plan for ~25–30 nodes (for headroom and HA)


๐Ÿ” 6. Replication Overhead

Assuming:

  • 1 replica per shard for failover

  • 2× memory and network cost

➡️ RAM required: ~2.6 TB
➡️ Bandwidth: double the write traffic (~20K writes/sec × 1 KB ≈ 20 MB/s replication stream)


📶 7. Network Bandwidth

Let’s estimate:

  • 150K reads/sec × 1KB = 150 MB/s

  • 20K writes/sec × 1KB = 20 MB/s

  • Replication = 20 MB/s

๐Ÿ“Œ Each node should handle:

  • Read bandwidth: ~6 MB/s

  • Write + replication: ~2 MB/s

  • Easily handled by 10 Gbps NIC


⏳ 8. Eviction Rate

Assuming TTL = 1 hour, and 1B keys:

  • Evictions per second = 1B / (60×60) ≈ 277K keys/sec

Eviction algorithm must be efficient:

  • An LRU clock algorithm or an asynchronous TTL scanner is needed


✅ Final Summary

Metric                    Estimation
Total keys                1 billion
Avg size per key          1 KB
Total RAM (w/ overhead)   ~2.6 TB (with replication)
Nodes                     25–30 (for HA, QPS, memory headroom)
Read QPS                  ~150K/sec
Write QPS                 ~15–20K/sec
Eviction rate             ~250–300K keys/sec
Network per node          ~10 MB/s total (well within a 10 Gbps NIC)
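
These figures are easy to re-derive live with a short script. A quick sanity check of the arithmetic above, using the same assumed constants (nothing here is measured):

package main

import "fmt"

func main() {
    const (
        keys         = 1e9    // hot keys resident in memory (assumption above)
        keySize      = 1024.0 // bytes per entry: key + value + metadata
        overhead     = 1.3    // replication/GC/fragmentation budget
        readsPerDay  = 10e9
        writesPerDay = 1e9
        nodeRAM      = 64e9 // usable bytes of RAM per node
        nodeQPS      = 10e3 // conservative per-node QPS
    )
    fmt.Printf("RAM with overhead: %.2f TB\n", keys*keySize*overhead/1e12) // ~1.33 TB
    fmt.Printf("Read QPS: %.0f\n", readsPerDay/86400)                      // ~115,740
    fmt.Printf("Write QPS: %.0f\n", writesPerDay/86400)                    // ~11,574
    fmt.Printf("Nodes by RAM: %.1f\n", keys*keySize*overhead/nodeRAM)      // ~20.8
    fmt.Printf("Nodes by QPS: %.1f\n", (readsPerDay/86400)/nodeQPS)        // ~11.6
}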

🎯 Bonus: What Google Might Ask

What would change if you needed to support multi-tenant isolation?
→ Talk about namespacing keys, quota control, per-tenant memory buckets.

What if a single user uploads a 1GB object?
→ Chunk large values, or offload them to blob storage and cache a pointer.

How would you reduce memory cost?
→ TTL tuning, compression (LZ4), lazy expiration.



🧱 2. High-Level Architecture

                 +------------------------+
                 |  Client Applications   |
                 +------------------------+
                             |
                             v
                    +------------------+
                    |  Coordinator /   |
                    |  Cache Router    | (Optional)
                    +------------------+
                             |
          +------------------+------------------+
          |                                     |
   +------------+    Gossip/Heartbeat     +------------+
   |  Cache     | <---------------------> |  Cache     |
   |  Node A    |        Protocol         |  Node B    |
   +------------+                         +------------+
          |                                     |
   +------------+                         +------------+
   |  Memory DB |                         |  Memory DB |
   +------------+                         +------------+

🧠 3. Core Components

🔸 a. Data Storage (In-Memory)

  • Use in-memory hash maps for the key-value store

  • TTLs stored with each key (for expiry eviction)

  • Optionally support data types like list, hash, etc.

store = {
  "foo": { value: "bar", expiry: 1681450500 },
  ...
}

🔸 b. Shard & Partition

  • Use consistent hashing to assign keys to nodes

  • Each key K → hash(K) % N, where N = number of virtual nodes

This avoids rehashing all keys when nodes are added/removed
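
As a sketch, here is what consistent hashing with virtual nodes might look like in Go (node names and the vnode count are illustrative, not a production implementation):

package main

import (
    "fmt"
    "hash/crc32"
    "sort"
    "strconv"
)

type Ring struct {
    hashes []uint32          // sorted virtual-node positions on the ring
    owner  map[uint32]string // virtual-node position -> physical node
}

func NewRing(nodes []string, vnodes int) *Ring {
    r := &Ring{owner: make(map[uint32]string)}
    for _, n := range nodes {
        for i := 0; i < vnodes; i++ {
            h := crc32.ChecksumIEEE([]byte(n + "#" + strconv.Itoa(i)))
            r.hashes = append(r.hashes, h)
            r.owner[h] = n
        }
    }
    sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
    return r
}

// Locate returns the node owning key: the first virtual node
// clockwise from hash(key), wrapping around the ring.
func (r *Ring) Locate(key string) string {
    h := crc32.ChecksumIEEE([]byte(key))
    i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
    if i == len(r.hashes) {
        i = 0 // wrap past the end of the ring
    }
    return r.owner[r.hashes[i]]
}

func main() {
    ring := NewRing([]string{"nodeA", "nodeB", "nodeC"}, 100)
    fmt.Println(ring.Locate("user:token:123"))
}

Removing a node only orphans its own virtual-node positions; keys hash to the next vnode clockwise, which is why only ~1/N of keys move.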

🔸 c. Cache Router / Coordinator

  • Client can compute hash OR use a proxy router to route to correct cache node

  • Think Twemproxy or Envoy as L7 proxy

🔸 d. Replication

  • Master-Replica model

  • Writes go to master → replicate to replica (async or sync)

  • Replicas take over on master failure

Node A (Master)
  └── Replica A1
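
A minimal sketch of that async write path, assuming a hypothetical Replica interface (all names here are invented for illustration):

package main

import "fmt"

type Write struct{ Key, Value string }

type Replica interface{ Apply(Write) }

type MemReplica struct{ store map[string]string }

func (r *MemReplica) Apply(w Write) { r.store[w.Key] = w.Value }

type Master struct {
    store map[string]string
    repl  chan Write // buffered: writes are acked before replication
}

func NewMaster(replicas []Replica) *Master {
    m := &Master{store: make(map[string]string), repl: make(chan Write, 1024)}
    go func() {
        for w := range m.repl { // background fan-out to replicas
            for _, r := range replicas {
                r.Apply(w)
            }
        }
    }()
    return m
}

func (m *Master) Set(k, v string) {
    m.store[k] = v        // acknowledge locally first (async replication)
    m.repl <- Write{k, v} // queue depth = replica lag
}

func main() {
    rep := &MemReplica{store: make(map[string]string)}
    m := NewMaster([]Replica{rep})
    m.Set("foo", "bar")
    fmt.Println(m.store["foo"])
}

The buffered channel is the replication lag made visible: if the master crashes before it drains, those writes are lost on failover, which is exactly the consistency trade-off async replication makes.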

🔸 e. Eviction Strategy

  • Support TTL for automatic key expiry

  • Support LRU / LFU / random eviction when memory full

  • Track access counters for eviction ranking
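
A minimal LRU sketch on top of Go's container/list (capacity, types, and string-only values are simplifications):

package main

import (
    "container/list"
    "fmt"
)

type entry struct{ key, value string }

type LRU struct {
    cap   int
    order *list.List // front = most recently used
    items map[string]*list.Element
}

func NewLRU(cap int) *LRU {
    return &LRU{cap: cap, order: list.New(), items: make(map[string]*list.Element)}
}

func (c *LRU) Get(k string) (string, bool) {
    if el, ok := c.items[k]; ok {
        c.order.MoveToFront(el) // touch on access
        return el.Value.(*entry).value, true
    }
    return "", false
}

func (c *LRU) Set(k, v string) {
    if el, ok := c.items[k]; ok {
        el.Value.(*entry).value = v
        c.order.MoveToFront(el)
        return
    }
    if c.order.Len() >= c.cap { // full: evict least recently used
        last := c.order.Back()
        c.order.Remove(last)
        delete(c.items, last.Value.(*entry).key)
    }
    c.items[k] = c.order.PushFront(&entry{k, v})
}

func main() {
    c := NewLRU(2)
    c.Set("a", "1")
    c.Set("b", "2")
    c.Get("a")
    c.Set("c", "3") // evicts "b", the coldest entry
    _, ok := c.Get("b")
    fmt.Println(ok) // false
}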

🔸 f. Concurrency

  • Use read-write locks or CAS for thread-safe operations

  • Each node can be single-threaded with an event loop (like Redis), which makes per-key operations naturally atomic; multi-threaded designs need the locks above
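
A sketch of the lock-based variant: a thread-safe store with lazy TTL expiry behind a sync.RWMutex (GET/SET/DEL only; a production node would add the async TTL scanner discussed earlier):

package main

import (
    "fmt"
    "sync"
    "time"
)

type item struct {
    value  string
    expiry time.Time // zero value means no TTL
}

type Store struct {
    mu   sync.RWMutex
    data map[string]item
}

func NewStore() *Store { return &Store{data: make(map[string]item)} }

func (s *Store) Set(k, v string, ttl time.Duration) {
    s.mu.Lock()
    defer s.mu.Unlock()
    it := item{value: v}
    if ttl > 0 {
        it.expiry = time.Now().Add(ttl)
    }
    s.data[k] = it
}

func (s *Store) Get(k string) (string, bool) {
    s.mu.RLock()
    it, ok := s.data[k]
    s.mu.RUnlock()
    if !ok {
        return "", false
    }
    if !it.expiry.IsZero() && time.Now().After(it.expiry) {
        s.Del(k) // lazy expiration on read
        return "", false
    }
    return it.value, true
}

func (s *Store) Del(k string) {
    s.mu.Lock()
    defer s.mu.Unlock()
    delete(s.data, k)
}

func main() {
    s := NewStore()
    s.Set("foo", "bar", 50*time.Millisecond)
    v, _ := s.Get("foo")
    fmt.Println(v) // bar
    time.Sleep(60 * time.Millisecond)
    _, ok := s.Get("foo")
    fmt.Println(ok) // false: expired lazily
}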


๐Ÿ” 4. Replication and Fault Tolerance

Gossip/Heartbeat

  • Nodes send periodic heartbeats to detect failures

  • Use Raft or Gossip protocols for membership

Failover

  • If a master dies → elected replica takes over

  • Redis uses Redis Sentinel to coordinate failover
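
A toy heartbeat-timeout failure detector along these lines (the timeout is arbitrary; real systems layer gossip dissemination and quorum agreement on top to avoid false positives):

package main

import (
    "fmt"
    "sync"
    "time"
)

type Detector struct {
    mu       sync.Mutex
    lastSeen map[string]time.Time
    timeout  time.Duration
}

func NewDetector(timeout time.Duration) *Detector {
    return &Detector{lastSeen: make(map[string]time.Time), timeout: timeout}
}

// Heartbeat records that a peer's ping just arrived.
func (d *Detector) Heartbeat(node string) {
    d.mu.Lock()
    defer d.mu.Unlock()
    d.lastSeen[node] = time.Now()
}

// Dead lists peers whose heartbeats are overdue.
func (d *Detector) Dead() []string {
    d.mu.Lock()
    defer d.mu.Unlock()
    var dead []string
    for n, t := range d.lastSeen {
        if time.Since(t) > d.timeout {
            dead = append(dead, n)
        }
    }
    return dead
}

func main() {
    d := NewDetector(100 * time.Millisecond)
    d.Heartbeat("nodeB")
    time.Sleep(150 * time.Millisecond)
    fmt.Println(d.Dead()) // [nodeB] -> trigger replica promotion
}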


🧪 5. Optional Features

Feature          Description
Pub/Sub          Add pub/sub channels per key prefix
Persistence      Periodic snapshots to disk (RDB) or append-only logs (AOF)
Backpressure     Queue or reject new writes when memory is full
Client Library   SDKs that abstract the hash ring + failover

๐Ÿ” 6. CAP Trade-off

CAP Property          Redis Default
Consistency           Eventual (async replication)
Availability          High (failover supported)
Partition Tolerance   Yes

📊 Metrics to Expose

  • Hit/Miss rate

  • Memory usage

  • Eviction count

  • Node replication lag

  • Key distribution balance


🧠 Questions Google May Ask You During This Design

  • How would you handle hot keys (a single key receiving a disproportionate share of traffic)?

  • How do you rebalance keys after node failure?

  • Would you support atomic operations (INCR, DECR)?

  • How do you prevent cache stampede (multiple clients recomputing a missing key)?

  • How do you handle large objects that exceed single node memory?


🛠️ Tech Stack (if you had to build it)

Component        Tech
Cache server     C++ / Rust / Go (like Redis)
Hashing & comm   Consistent hashing + TCP sockets / gRPC
Replication      Raft or custom pub/sub protocol
Persistence      RocksDB / flat files / append-only logs
Observability    Prometheus + Grafana



Let’s walk through each of those Staff-level system design questions, with reasoning, trade-offs, and implementation suggestions you can use to explain your decisions clearly in the interview.


🔥 1. How would you handle hot keys (a single key receiving a disproportionate share of traffic)?

Problem:

  • One key (like "user:token:123") gets disproportionate access → overloads one node.

✅ Solutions:

  1. Key Replication (Read Replicas):

    • Replicate hot key across multiple nodes.

    • Client-side read load balancing (randomly pick from replicas).

    • Write goes to master → async to replicas.

  2. Key Sharding (Virtual Keys):

    • Split the key logically, e.g., "video:views" → video:views:1, video:views:2, … (see the sketch after this list)

    • Clients randomly select a shard for read/write → reduce contention.

    • Aggregate during reads (costly but effective).

  3. Request Deduplication & Caching at Edge:

    • Use edge cache (like CDN or client-side cache) for super-hot keys.

  4. Rate Limiting / Backpressure:

    • Throttle requests to that key, or queue them on overload.
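
A sketch of option 2, virtual-key sharding for a hot counter (the in-process map stands in for a real cache client, and the shard count is illustrative):

package main

import (
    "fmt"
    "math/rand"
)

const shards = 8 // fan-out factor for the hot key

// cache stands in for the distributed cache client (hypothetical).
var cache = map[string]int{}

// Incr spreads writes for a hot counter across shard suffixes,
// so no single node absorbs all the traffic.
func Incr(key string) {
    shard := fmt.Sprintf("%s:%d", key, rand.Intn(shards))
    cache[shard]++
}

// Total aggregates across shards on read: costlier, but reads of
// metrics-style keys are rare relative to writes.
func Total(key string) int {
    sum := 0
    for i := 0; i < shards; i++ {
        sum += cache[fmt.Sprintf("%s:%d", key, i)]
    }
    return sum
}

func main() {
    for i := 0; i < 1000; i++ {
        Incr("video:views")
    }
    fmt.Println(Total("video:views")) // 1000
}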

Interview Tip:

Emphasize dynamic detection of hot keys (via metrics), and adaptive replication or redirection.


💡 2. How do you rebalance keys after node failure?

Problem:

  • Node failure → key space imbalance.

  • Some nodes overloaded, others underused.

✅ Solutions:

  1. Consistent Hashing + Virtual Nodes:

    • Redistribute virtual nodes (vNodes) from failed node to others.

    • Only keys for those vNodes get rebalanced — minimal movement.

  2. Auto-Failover & Reassignment:

    • Use heartbeat to detect failure.

    • Other nodes take over lost slots or ranges.

  3. Key Migration Tools:

    • Background rebalance workers move keys to even out load.

    • Ensure write consistency during move via locking/versioning.

  4. Client-Side Awareness:

    • Clients get updated ring view and re-route requests accordingly.

Interview Tip:

Talk about graceful degradation during rebalancing and minimizing downtime.


⚙️ 3. Would you support atomic operations (INCR, DECR)?

Yes — atomic operations are essential in a caching layer (e.g., counters, rate limits, tokens).

Implementation:

  1. Single-Threaded Execution Model:

    • Like Redis: handle each command sequentially on single-threaded event loop → natural atomicity.

  2. Compare-And-Swap (CAS):

    • For multi-threaded or multi-process setups.

    • Use version numbers or timestamps to detect stale updates (see the CAS sketch after this list).

  3. Locks (Optimistic/Pessimistic):

    • Apply locks on keys for write-modify-write operations.

    • Use with caution to avoid performance degradation.

  4. Use CRDTs (Advanced Option):

    • Conflict-free data types (e.g., GCounter, PNCounter) for distributed atomicity.
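
A minimal CAS retry loop with sync/atomic, the same pattern a multi-threaded node could apply per key (shown on a bare int64 counter for brevity):

package main

import (
    "fmt"
    "sync"
    "sync/atomic"
)

// IncrBy retries until its compare-and-swap wins, so concurrent
// increments never lose updates.
func IncrBy(counter *int64, delta int64) int64 {
    for {
        old := atomic.LoadInt64(counter)
        if atomic.CompareAndSwapInt64(counter, old, old+delta) {
            return old + delta
        }
        // CAS failed: another writer got in between; retry.
    }
}

func main() {
    var counter int64
    var wg sync.WaitGroup
    for i := 0; i < 100; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            IncrBy(&counter, 1)
        }()
    }
    wg.Wait()
    fmt.Println(counter) // always 100, no lost updates
}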

Interview Tip:

Highlight that simplicity, speed, and correctness are the priority. Lean toward single-threaded per-key operation for atomicity.


🧊 4. How do you prevent cache stampede (multiple clients recomputing a missing key)?

Problem:

  • TTL expires → 1000 clients query same missing key → backend DDoS.

✅ Solutions:

  1. Lock/SingleFlight:

    • First client computes and sets value.

    • Others wait for value to be written (or reused from intermediate store).

    • Go has golang.org/x/sync/singleflight (see the sketch after this list); Redis can simulate it with Lua-based locks.

  2. Stale-While-Revalidate (SWR):

    • Serve expired value temporarily.

    • In background, refresh the cache asynchronously.

  3. Request Coalescing at API Gateway:

    • Gateway buffers duplicate requests until cache is ready.

  4. Early Refresh Strategy:

    • Monitor popular keys.

    • Proactively refresh before TTL expiry.
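
A sketch of option 1 using Go's golang.org/x/sync/singleflight package (the sleep stands in for the real backend recompute):

package main

import (
    "fmt"
    "sync"
    "time"

    "golang.org/x/sync/singleflight"
)

var group singleflight.Group

// GetOrCompute collapses concurrent misses for the same key into
// one backend call; the other callers block and share the result.
func GetOrCompute(key string) (string, error) {
    v, err, _ := group.Do(key, func() (interface{}, error) {
        time.Sleep(100 * time.Millisecond) // simulate expensive recompute
        return "value-for-" + key, nil
    })
    if err != nil {
        return "", err
    }
    return v.(string), nil
}

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 5; i++ { // 5 concurrent clients, only 1 recompute runs
        wg.Add(1)
        go func() {
            defer wg.Done()
            v, _ := GetOrCompute("user:token:123")
            fmt.Println(v)
        }()
    }
    wg.Wait()
}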

Interview Tip:

Describe this as a read-heavy resilience pattern. Emphasize proactive + reactive strategies.


📦 5. How do you handle large objects that exceed single node memory?

Problem:

  • A single large key (e.g., serialized ML model, 1GB) doesn't fit in one node.

✅ Solutions:

  1. Key Chunking (Manual Sharding):

    • Split large value into multiple keys (file:1, file:2, file:3).

    • Store each chunk on different nodes.

    • Reassemble during read (see the sketch after this list).

  2. Redirect to Object Store:

    • If object > X MB → store in Blob/File system (Azure Blob / GCS).

    • Cache a pointer/reference in cache instead.

  3. Use a Tiered Cache:

    • Store large objects in a slower (but scalable) cache (like disk-based).

    • Fast cache for hot small keys; slow cache for bulkier data.

  4. Compression:

    • Use lightweight compression (LZ4, Snappy) before storing.
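
A sketch of option 1, chunking over a generic key-value client (the in-process map and the 1 MB chunk size are illustrative):

package main

import (
    "bytes"
    "fmt"
)

const chunkSize = 1 << 20 // 1 MB per chunk (illustrative threshold)

// cache stands in for the distributed cache client (hypothetical).
var cache = map[string][]byte{}

// SetLarge splits a big value into chunk keys, each of which can
// land on a different node via the hash ring.
func SetLarge(key string, value []byte) {
    n := 0
    for off := 0; off < len(value); off += chunkSize {
        end := off + chunkSize
        if end > len(value) {
            end = len(value)
        }
        cache[fmt.Sprintf("%s:%d", key, n)] = value[off:end]
        n++
    }
    cache[key+":chunks"] = []byte(fmt.Sprint(n)) // manifest: chunk count
}

// GetLarge reads the manifest, then reassembles the chunks in order.
func GetLarge(key string) []byte {
    var n int
    fmt.Sscan(string(cache[key+":chunks"]), &n)
    var buf bytes.Buffer
    for i := 0; i < n; i++ {
        buf.Write(cache[fmt.Sprintf("%s:%d", key, i)])
    }
    return buf.Bytes()
}

func main() {
    big := bytes.Repeat([]byte("x"), 3*chunkSize+5)
    SetLarge("file", big)
    fmt.Println(len(GetLarge("file"))) // 3145733: round-trips intact
}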

Interview Tip:

Discuss threshold-based offloading and trade-off between latency vs. capacity.

