System Design Interview Questions and Answers — Step-by-Step Guide for 4+ Years Experience (2026)

February 26, 2026 · Updated by Surya Singh · System Design • Architecture • Scalability • Interview • Backend

Key Takeaways

  • 10 detailed system design questions with requirements, high-level design, deep dive, and back-of-envelope calculations
  • Covers URL shortener, rate limiter, chat system, news feed, notification service, video streaming, search autocomplete, distributed cache, payment processing, and file storage
  • Rapid-fire rounds: Data-Intensive Systems, Real-Time Systems, and Infrastructure
  • Targets 4+ years experience — includes scaling strategies, failure handling, and what separates good from great candidates

This guide covers the most frequently asked system design interview questions for senior backend and full-stack roles. Each question follows a structured format: Requirements, High-Level Design, Deep Dive, Back-of-Envelope Calculations, and What Separates Good from Great. Written for developers with 4+ years of experience who need to demonstrate production-grade architectural thinking.

For deeper foundations, explore Grokking the System Design Interview, System Design Primer, and HiredInTech's system design course.

1) Design a URL Shortener (like bit.ly)

What interviewer evaluates: Read-heavy system design, ID generation, storage choice, and caching strategy.

Requirements:

  • Shorten a long URL to a unique short alias; redirect short → long with low latency
  • Scale: ~100M new URLs/day, ~10B redirects/day (read-heavy, roughly 100:1)
  • High availability; links live indefinitely by default (or with a configurable TTL)

High-Level Design:

API servers (stateless) → Load balancer → Write path: generate ID → DB write. Read path: cache lookup → DB on miss → 302 redirect. Components: API service, ID generator (base62 encoding or UUID), database (key-value: shortUrl → longUrl), cache (Redis), analytics (async via message queue).
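The base62 encoding named in the design above maps an auto-increment ID to a short slug. A minimal sketch (the alphabet ordering is an assumption — any fixed 62-character ordering works):

```python
# Map a numeric ID to a short base62 slug and back.
# 7 characters cover 62**7 ≈ 3.5 trillion IDs.
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)          # peel off the least-significant digit
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode_base62(s: str) -> int:
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Encoding a counter keeps slugs short and collision-free, at the cost of being guessable; hashing the long URL avoids a central counter but requires collision handling.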

Deep Dive:

ID generation: base62-encode an auto-increment counter (short, but guessable), hash the long URL (needs collision handling), or pre-allocate key ranges per server to avoid a single counter bottleneck. Redirect code: 301 is cached by browsers (fewer server hits, but clicks become invisible to analytics), so this design uses 302 to keep every redirect observable. Reads dominate ~100:1, so cache aggressively and scale with read replicas.

Back-of-envelope: 10B redirects/day = ~120K QPS read. 100M writes/day = ~1.2K QPS write. Storage: 100M URLs × 500 bytes ≈ 50 GB/year. Bandwidth: 120K QPS × 500 bytes ≈ 60 MB/s. Redis cluster for cache; ~10 DB replicas for read scaling.

What separates good from great: Discuss collision handling, rate limiting on shorten API to prevent abuse, and how you'd add analytics (async Kafka/SQS → batch writes).

2) Design a Rate Limiter

What interviewer evaluates: Distributed state, consistency trade-offs, and algorithm choice.

Requirements:

  • Limit requests per client (API key, IP, or user ID) within a time window
  • Return 429 with informative headers when the limit is exceeded
  • Add minimal latency per request; work correctly across a distributed fleet of API servers

High-Level Design:

Request → API gateway / middleware → Rate limiter service (checks counter in Redis) → Allow/Deny. Components: Redis (or in-memory for single-node) for counters, sliding window or token bucket algorithm, client identifier extraction (API key, IP, user ID).
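The token bucket mentioned above is a few lines of refill math. In a distributed deployment the token count and timestamp would live in Redis (typically updated atomically via a Lua script); this single-node sketch shows the core logic, with illustrative names:

```python
import time

class TokenBucket:
    """Single-node token bucket; in production the state lives in Redis."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity          # max burst size
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A bucket of capacity 100 refilled at 10 tokens/sec allows short bursts of 100 while enforcing a sustained 10 req/sec.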

Deep Dive:

Algorithm trade-offs: fixed window is simple but allows up to 2× bursts at window boundaries; sliding window log is exact but memory-heavy; sliding window counter approximates well; token bucket permits controlled bursts. Counter updates must be atomic — in Redis, use a Lua script (or INCR with EXPIRE) to avoid check-then-set races. Decide fail-open vs fail-closed behavior for when the limiter itself is unavailable.

Back-of-envelope: 100K QPS with 1 Redis call per request = 100K Redis ops/sec. Redis handles ~100K ops/sec per node; single node sufficient for moderate scale. At 1M QPS, shard by user_id hash across Redis cluster (e.g., 10 nodes). Memory: 1M users × 100 bytes/key × 2 (sliding window) ≈ 200 MB.

What separates good from great: Discuss distributed Redis with consistent hashing, different limits per tier (free vs premium), and rate limit bypass for internal services.

3) Design a Real-Time Chat System (like WhatsApp/Slack)

What interviewer evaluates: Real-time delivery, message ordering, offline support, and presence.

Requirements:

  • 1:1 and group chat with near-real-time delivery and read receipts
  • Message persistence, per-conversation ordering, and offline delivery
  • Presence (online / last seen); scale: ~50M DAU, ~10M concurrent connections

High-Level Design:

Clients connect via WebSocket (or long polling fallback) to WebSocket servers. Message flow: Client A → API → Message queue (Kafka/SQS) → WebSocket servers → Client B. Message service persists to DB. Presence: heartbeat to Redis. Offline: push notification service.

Deep Dive:

Connection routing: a session registry (user → WebSocket server) lets the message service deliver to the right node. Delivery semantics: at-least-once with client acks and retries, plus message IDs for deduplication. Group-chat fan-out happens server-side; very large groups may need the same push/pull split used for feeds.

Back-of-envelope: 50M DAU, 20 msgs/user/day = 1B msgs/day ≈ 12K msg/sec. 10M concurrent connections → ~100K connections per WebSocket server (100 servers). Storage: 1B msgs/day × 500 bytes ≈ 500 GB/day. Retention 1 year ≈ 180 TB.

What separates good from great: Discuss message ordering per chat (sequence numbers), conflict resolution for multi-device, end-to-end encryption design, and push notification routing (APNs, FCM).
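The per-chat sequence numbers mentioned above can be sketched as two small pieces: a sequencer that assigns monotonically increasing numbers per chat (a Redis INCR in production), and a receiver that buffers out-of-order arrivals until the gap closes. Names are illustrative:

```python
from itertools import count

class ChatSequencer:
    """Assigns per-chat sequence numbers (a Redis INCR in production)."""

    def __init__(self):
        self._counters = {}

    def next_seq(self, chat_id: str) -> int:
        counter = self._counters.setdefault(chat_id, count(1))
        return next(counter)

class InOrderReceiver:
    """Delivers messages in sequence order despite out-of-order arrival."""

    def __init__(self):
        self.expected = 1
        self.buffer = {}       # seq -> message, held until the gap closes
        self.delivered = []

    def receive(self, seq: int, msg: str):
        self.buffer[seq] = msg
        while self.expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.expected))
            self.expected += 1
```

The same buffering pattern also gives clients a cheap gap detector: a missing sequence number signals a lost message to re-fetch.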

4) Design a Social Media News Feed (like Twitter/Instagram)

What interviewer evaluates: Fan-out strategies, feed ranking, and consistency vs performance trade-offs.

Requirements:

  • Users follow others and see a chronological or ranked feed of their posts
  • Feed reads must be fast; eventual consistency is acceptable for feed freshness
  • Scale: ~500M users, avg 200 follows, celebrity accounts with millions of followers

High-Level Design:

Write path: Post → API → DB (posts table) → Fan-out (push to followers' feeds or defer). Read path: Get feed from pre-computed feed table (push) or aggregate on read (pull). Hybrid: push for regular users, pull for celebrities. Components: Post service, follow graph (DB or graph DB), feed service, cache.

Deep Dive:

Back-of-envelope: 500M users, avg 200 follows, 1 post/day each = 100B feed writes/day for push. For 10K celebs with 1M followers: 10B fan-out writes/day. Storage: 100B posts × 1KB ≈ 100 TB. Cache: 500M users × 20 posts × 1KB ≈ 10 TB if fully cached.

What separates good from great: Discuss ranking (ML model, engagement signals), cold-start for new users, and handling unfollow (remove from feed without full rebuild).

5) Design a Notification Service (push, email, SMS)

What interviewer evaluates: Multi-channel delivery, reliability, and user preferences.

Requirements:

High-Level Design:

API receives notification request → Validation & enrichment → Message queue (per channel) → Workers → External providers (APNs, FCM, SendGrid, Twilio). Preference service checks before send. DLQ for failures. Components: API, queue (Kafka/SQS), workers, provider adapters, preference store.

Deep Dive:

Back-of-envelope: 10M notifs/day ≈ 120 QPS average. Burst: 1K QPS. Push: 5M; email: 3M; SMS: 2M. Workers: ~10 per channel at 100 msg/sec each. Storage: 10M × 500 bytes log ≈ 5 GB/day.

What separates good from great: Discuss batching (email batching for rate limits), A/B testing for templates, and notification coalescing (combine multiple into one digest).

Loading...

6) Design a Video Streaming Platform (like YouTube)

What interviewer evaluates: Content delivery, transcoding, storage, and playback optimization.

Requirements:

High-Level Design:

Upload → Object storage (S3) → Transcoding pipeline (workers) → Encoded segments to CDN. Playback: Client requests manifest → CDN serves segments. Metadata (title, user, etc.) in DB. Components: Upload API, object storage, transcoding cluster, CDN, metadata DB, recommendation service.

Deep Dive:

Back-of-envelope: 1M concurrent streams × 5 Mbps avg = 5 Tbps. CDN handles this. Storage: 1M videos × 500 MB avg = 500 TB. Transcoding: 10K uploads/day × 10 min each = 100K min = ~70 worker-hours; 100 workers.

What separates good from great: Discuss DRM, live streaming (different pipeline), and recommendation system integration.

7) Design a Search Autocomplete System

What interviewer evaluates: Trie/prefix matching, ranking, and low-latency reads.

Requirements:

High-Level Design:

Data pipeline: Query logs / product catalog → Aggregate popularity → Build trie or sorted structure. Serving: API receives prefix → Lookup trie (or prefix search in DB) → Return top K. Cache hot prefixes. Components: Trie service (or Elasticsearch), aggregation pipeline, cache.

Deep Dive:

Back-of-envelope: 10M unique queries, avg 20 chars = 200 MB raw. Trie with top 10 per node: ~500 MB in memory. 100K QPS → 10 serving nodes. Cache hit 90% → 10K QPS to trie.

What separates good from great: Discuss personalized suggestions (per user history), typo tolerance (edit distance), and multi-language support.

8) Design a Distributed Cache (like Redis cluster)

What interviewer evaluates: Consistency, eviction, replication, and partitioning.

Requirements:

High-Level Design:

Clients → Proxy or direct connection → Sharded cache nodes. Each shard: in-memory hash table + eviction policy (LRU/LFU). Replication: master-replica per shard. Partitioning: consistent hashing. Components: Cache nodes, proxy (optional), cluster manager.

Deep Dive:

Back-of-envelope: 1M QPS, 1ms latency → need ~1K concurrent connections per node. 100 GB / 16 GB per node = ~7 nodes. With replication: 14 nodes. Network: 1M × 1 KB avg = 1 GB/s.

What separates good from great: Discuss cache stampede prevention, cache-aside vs write-through vs write-behind, and multi-tenancy (isolate noisy neighbors).

Loading...

9) Design a Payment Processing System

What interviewer evaluates: ACID, idempotency, auditability, and compliance.

Requirements:

High-Level Design:

API → Idempotency check → Payment orchestrator → PSP (Stripe, etc.) or bank gateway. DB: transactions, idempotency keys. Async: webhooks for final status. Components: API, idempotency layer, payment service, PSP adapter, ledger DB.

Deep Dive:

Back-of-envelope: 10K transactions/day ≈ 0.15 QPS. Burst: 10 QPS during flash sale. Single DB sufficient. Storage: 10K × 1 KB × 365 days ≈ 3.6 GB/year.

What separates good from great: Discuss idempotency across refund + charge reversal, idempotency key scope (per operation type), and PCI scope reduction (use Stripe Elements so card never touches your server).

10) Design a File Storage Service (like Google Drive/Dropbox)

What interviewer evaluates: Block storage, sync, deduplication, and metadata management.

Requirements:

High-Level Design:

Upload: API → Chunk file (e.g., 4 MB blocks) → Dedup (content hash) → Object storage. Metadata: DB (file path, chunk refs, version). Download: metadata lookup → fetch chunks from object storage → assemble. Sync: delta sync (only changed blocks). Components: API, metadata DB, object storage, block store, sync service.

Deep Dive:

Back-of-envelope: 1B files, 10 MB avg = 10 PB. With 50% dedup = 5 PB. Metadata: 1B × 500 bytes = 500 GB. 100M users, 10 files/day upload = 1B writes/day ≈ 12K writes/sec. Chunking: 1B × 3 chunks avg = 3B chunks; 4 MB each = 12 EB raw → dedup reduces significantly.

What separates good from great: Discuss block-level dedup, incremental sync algorithm, and handling large file uploads (resumable, multipart).

Rapid-fire practice — design questions

60–90 second answers. Practice sketching components and data flow out loud.

Round 1: Data-Intensive Systems (3–4 questions)

Q: Design an analytics pipeline that ingests 1B events/day.

Kafka/SQS for ingestion → Stream processors (Flink, Kafka Streams) or batch (Spark) → Data warehouse (Snowflake, BigQuery). Partition by date/hour. Use Parquet for storage. Backpressure handling, dead-letter for bad events.

Q: When would you use event sourcing?

When you need full history, audit trail, or ability to replay. Examples: financial ledgers, order state, collaboration history. Append-only event store (Kafka, EventStore). Projections for read models. Trade-off: complexity, storage growth.

Q: Design a time-series database for metrics (1M metrics/sec).

InfluxDB, TimescaleDB, or Prometheus. Partition by metric name + time. Downsampling for retention (raw 7 days, 1min 30 days, 1hr 1 year). Write-optimized; compression for older data. Aggregation at query time or pre-aggregated materialized views.

Q: How would you design a data lake for ML training?

Raw zone (S3) → Bronze (cleaned) → Silver (transformed) → Gold (feature store). Orchestration: Airflow or dbt. Versioning: Delta Lake or Iceberg. Feature store (Feast, Tecton) for low-latency feature serving. Lineage tracking.

Round 2: Real-Time Systems (3–4 questions)

Q: Design a real-time leaderboard (gamers, 10M users).

Redis sorted sets (ZADD, ZRANGE). Key per game/season. Update on score change. Pagination via ZREVRANGE. Shard by game_id. Cache top 100 for hot games. Sliding window leaderboard: multiple sorted sets per time range.

Q: Design a live dashboard (real-time metrics, 10K viewers).

Metrics pipeline → Redis Pub/Sub or Kafka → WebSocket servers. Each dashboard subscribes to relevant metric streams. Aggregation at source to reduce fan-out. Throttle updates to client (max 1/sec per metric).

Q: Design a collaborative document editor (like Google Docs).

OT (Operational Transform) or CRDT for conflict-free sync. Server receives ops, transforms, broadcasts. Persistence: checkpoint + delta. Or use CRDT (Yjs, Automerge) — no server transform. WebSocket for real-time. Cursor positions via presence.

Q: Design a real-time inventory system (prevent oversell).

Strong consistency: DB with row-level lock or optimistic locking. Reserve-on-add-to-cart with TTL; release on abandon. Or event-sourced: reserve event, release event; validate invariant (stock >= 0) in projection. Saga for cross-service.

Round 3: Infrastructure (3–4 questions)

Q: How does a CDN work? When to use it?

Edge servers cache static content (images, JS, video) near users. DNS or anycast routes to nearest edge. Cache hit = low latency; miss = fetch from origin. Use for static assets, video, large downloads. Invalidate via purge API.

Q: Design a load balancer. What algorithms?

Round-robin, weighted round-robin, least connections, IP hash. Health checks: HTTP/TCP. Session affinity (sticky) for stateful. L4 (TCP) vs L7 (HTTP). Scale: stateless LBs; consistent hashing for persistence.

Q: What is a service mesh? When do you need it?

Sidecar proxy (Envoy) per pod handles mTLS, retries, circuit breaker, observability. Istio, Linkerd. Need when: many services, need consistent policies, zero-trust. Overhead: latency, resource. Overkill for < 10 services.

Q: Design a circuit breaker. When does it open?

States: Closed (normal) → Open (fail fast) after N failures in window → Half-open (test) after timeout. Opens on failure rate threshold (e.g., 50% in 10s) or consecutive failures. Prevents cascade. Libraries: Resilience4j, Hystrix.

Loading...

From real experience

"I've conducted 50+ system design interviews at the senior level. The single biggest differentiator: candidates who ask clarifying questions before drawing boxes. 'Is this read-heavy or write-heavy?' 'What's our scale?' 'Do we need strong or eventual consistency?' — these questions show you think before you architect. The candidates who jump straight to 'we'll use Kafka and Cassandra' often over-engineer or miss the actual requirements."

"The second differentiator: back-of-envelope numbers. When I ask 'how many servers do you need?', the best candidates say something like 'At 10K QPS with 50ms per request, one server handles ~20 QPS per core, so 8 cores = 160 QPS — we'd need ~60 API servers, plus DB and cache." They show they can reason about scale. The worst answers are 'we'll add more servers as needed' — that's not system design, that's hoping it works."
— Surya Singh, Senior Software Engineer & Technical Interviewer

Common interview mistakes to avoid

Frequently asked questions

How much time should I spend on requirements gathering vs design in a system design interview?

Spend 2–3 minutes on requirements. Clarify scale (QPS, users, data volume), functional requirements (what the system must do), and non-functional requirements (latency, availability, consistency). Senior candidates who skip this often over-engineer or under-engineer. Write down the numbers — they anchor your back-of-envelope calculations later.

What back-of-envelope numbers should I know for system design interviews?

Common estimates: 1 billion requests/month ≈ 400 QPS; 1 million users × 10 req/day ≈ 120 QPS; 1 KB average payload × 1000 QPS = 1 MB/s bandwidth; 1 million rows in DB ≈ 100 MB (order of magnitude). Know latency targets: cache 1–5ms, DB 5–50ms, cross-region 100–200ms. These help you sanity-check your design.

How do I approach trade-offs when the interviewer asks "what if we need X instead?"

Acknowledge the trade-off explicitly. For example: "If we need strong consistency instead of eventual consistency, we'd add a consensus protocol like Raft or use a strongly consistent database. That increases latency and reduces availability — we'd need to decide if the use case justifies it." Show you understand both sides and can justify your choice.

What database should I choose for different system design scenarios?

Relational (PostgreSQL, MySQL): ACID transactions, complex queries, joins — e.g., payment systems, order management. NoSQL key-value (Redis, DynamoDB): high throughput, simple access patterns — e.g., session store, cache. NoSQL document (MongoDB): flexible schema, nested docs — e.g., user profiles, catalogs. Wide-column (Cassandra, HBase): write-heavy, time-series — e.g., logs, metrics. Choose based on access patterns and consistency requirements.

How do I demonstrate scalability in a system design interview?

Show horizontal scaling: stateless services behind a load balancer, database read replicas, caching layer, message queues for async processing. Discuss sharding or partitioning when a single DB hits limits. Mention auto-scaling policies (CPU, queue depth). Quantify: "At 10K QPS, we'd run 20 API servers; at 100K QPS, we'd add Redis cluster and 100 API servers with DB sharding."

What failure scenarios should I always cover?

Server crashes (replication, health checks, restart), database failure (replicas, failover), network partitions (circuit breakers, retries with backoff), cache miss storms (cache-aside with stampede prevention), single points of failure (eliminate them, add redundancy). Mention idempotency for critical operations and graceful degradation (serve stale cache when DB is down).

Is it okay to use managed services in system design interviews?

Yes, especially for 4+ years roles. Saying "we'd use S3 for object storage" or "Kafka for event streaming" shows production experience. However, explain what the service provides (durability, scaling, APIs) so the interviewer knows you understand the underlying concepts. Avoid saying "we'd use AWS" without naming specific services or their roles.

How do I structure my answer when I'm unsure about a detail?

State your assumption: "I'll assume we need 99.9% availability — we can relax that if the use case allows." Or ask: "Should we optimize for read-heavy or write-heavy workloads?" Making assumptions explicit and asking clarifying questions shows maturity. Never fake knowledge — say "I'd look up the exact limits, but typically..." and give a reasonable range.

Loading...

Surya Singh

Surya Singh

Azure Solutions Architect & AI Engineer

Microsoft-certified Azure Solutions Architect with 8+ years in enterprise software, cloud architecture, and AI/ML deployment. I build production AI systems and write about what actually works—based on shipping code, not theory.

  • Microsoft Certified: Azure Solutions Architect Expert
  • Built 20+ production AI/ML pipelines on Azure
  • 8+ years in .NET, C#, and cloud-native architecture