System Design Interview Questions and Answers — Step-by-Step Guide for 4+ Years Experience (2026)
Updated February 26, 2026 • By Surya Singh • System Design • Architecture • Scalability • Interview • Backend
Key Takeaways
- 10 detailed system design questions with requirements, high-level design, deep dive, and back-of-envelope calculations
- Covers URL shortener, rate limiter, chat system, news feed, notification service, video streaming, search autocomplete, distributed cache, payment processing, and file storage
- Rapid-fire rounds: Data-Intensive Systems, Real-Time Systems, and Infrastructure
- Targets 4+ years experience — includes scaling strategies, failure handling, and what separates good from great candidates
This guide covers the most frequently asked system design interview questions for senior backend and full-stack roles. Each question follows a structured format: Requirements, High-Level Design, Deep Dive, Back-of-Envelope Calculations, and What Separates Good from Great. Written for developers with 4+ years of experience who need to demonstrate production-grade architectural thinking.
For deeper foundations, explore Grokking the System Design Interview, System Design Primer, and HiredInTech's system design course.
Table of Contents
- 1. Design a URL Shortener (like bit.ly)
- 2. Design a Rate Limiter
- 3. Design a Real-Time Chat System (like WhatsApp/Slack)
- 4. Design a Social Media News Feed (like Twitter/Instagram)
- 5. Design a Notification Service
- 6. Design a Video Streaming Platform (like YouTube)
- 7. Design a Search Autocomplete System
- 8. Design a Distributed Cache (like Redis cluster)
- 9. Design a Payment Processing System
- 10. Design a File Storage Service (like Google Drive)
- Rapid-Fire Practice (3 Rounds)
- From Real Experience
- Common Mistakes to Avoid
- FAQ (8 Questions)
- Related Interview Guides
1) Design a URL Shortener (like bit.ly)
What interviewer evaluates: Read-heavy system design, ID generation, storage choice, and caching strategy.
Requirements:
- Functional: Shorten URL to 6–8 character alias; redirect short URL to original; support custom aliases (optional); track click analytics.
- Non-functional: High availability; low latency redirect (P99 < 100ms); 100M URLs/day write, 10B redirects/day read; 99.99% uptime.
High-Level Design:
API servers (stateless) → Load balancer → Write path: generate ID → DB write. Read path: cache lookup → DB on miss → 302 redirect. Components: API service, ID generator (base62 encoding or UUID), database (key-value: shortUrl → longUrl), cache (Redis), analytics (async via message queue).
Deep Dive:
- Database: DynamoDB or Cassandra — key-value access only. Partition key = short URL. Or PostgreSQL with an indexed `short_url` column. Avoid relational joins; this is a pure lookup.
- ID generation: Base62 (a–z, A–Z, 0–9) gives 62^7 ≈ 3.5 trillion unique 7-char IDs. Options: Zookeeper/Redis INCR for a distributed counter; UUID truncated to 8 chars with collision check; pre-generate batches in DB.
- API design: `POST /shorten` — body: `{long_url, custom_alias?}`; `GET /{short_url}` — 302 redirect. Separate analytics endpoint to avoid blocking the redirect.
- Scaling: Read-heavy (~100:1 read:write at these numbers) — aggressive caching. Redis cache-aside, TTL 24h. CDN for redirect if traffic is geographically distributed. DB read replicas for cache miss load.
- Caching: Redis LRU eviction. Hot URLs (top 1%) served from cache; 99%+ cache hit rate achievable for popular links. Cache stampede prevention: singleflight or probabilistic early expiration.
- Failure handling: DB down → serve stale cache if available; return 503 if no cached data. Idempotent writes via unique constraint on short_url.
Back-of-envelope: 10B redirects/day = ~120K QPS read. 100M writes/day = ~1.2K QPS write. Storage: 100M URLs × 500 bytes ≈ 50 GB/year. Bandwidth: 120K QPS × 500 bytes ≈ 60 MB/s. Redis cluster for cache; ~10 DB replicas for read scaling.
What separates good from great: Discuss collision handling, rate limiting on shorten API to prevent abuse, and how you'd add analytics (async Kafka/SQS → batch writes).
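The base62 scheme above can be sketched in a few lines. This is a minimal illustration of encoding a distributed counter value into a short alias; the function name and alphabet ordering are illustrative, not a specific library's API.

```python
import string

# 62-character alphabet: 0-9, a-z, A-Z
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase

def base62_encode(n: int) -> str:
    """Encode a non-negative counter value (e.g. from Redis INCR) as base62."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n > 0:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))  # most-significant digit first
```

A counter-based scheme like this avoids collision checks entirely: each ID is unique by construction, and 7 characters cover ~3.5 trillion values.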
2) Design a Rate Limiter
What interviewer evaluates: Distributed state, consistency trade-offs, and algorithm choice.
Requirements:
- Functional: Limit requests per user/IP per time window; support multiple limit types (e.g., 100 req/min, 10 req/sec); return 429 when exceeded; optional per-API limits.
- Non-functional: Low latency (< 5ms overhead); work across distributed API servers; handle bursts; 99.9% accuracy.
High-Level Design:
Request → API gateway / middleware → Rate limiter service (checks counter in Redis) → Allow/Deny. Components: Redis (or in-memory for single-node) for counters, sliding window or token bucket algorithm, client identifier extraction (API key, IP, user ID).
Deep Dive:
- Database/storage: Redis — atomic INCR, EXPIRE. Key: `ratelimit:{user_id}:{window_id}`. Each window is a separate key. No persistent DB; Redis is the source of truth for counters.
- API design: Rate limiter is middleware — no separate API. Returns `X-RateLimit-Limit`, `X-RateLimit-Remaining`, and `Retry-After` headers. Configuration stored in a config service or DB.
- Algorithms: Fixed window (simple but allows 2× burst at boundary); sliding window log (accurate, more storage); sliding window counter (approximation, 1 counter); token bucket (smooth rate, allows bursts). Token bucket is often preferred for API throttling.
- Scaling: Redis Cluster for horizontal scaling. Each API server talks to Redis; no coordination between servers needed if Redis is central. Consider local cache for "definitely under limit" to reduce Redis calls.
- Failure handling: Redis down → fail open (allow requests) or fail closed (reject all). Fail open is common for availability; log for audit. Use Redis Sentinel/Cluster for HA.
Back-of-envelope: 100K QPS with 1 Redis call per request = 100K Redis ops/sec. Redis handles ~100K ops/sec per node; single node sufficient for moderate scale. At 1M QPS, shard by user_id hash across Redis cluster (e.g., 10 nodes). Memory: 1M users × 100 bytes/key × 2 (sliding window) ≈ 200 MB.
What separates good from great: Discuss distributed Redis with consistent hashing, different limits per tier (free vs premium), and rate limit bypass for internal services.
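The token bucket algorithm mentioned above fits in a short class. This is a single-node sketch; in the distributed design the state would live in Redis, with the refill-and-take step made atomic (e.g. via a Lua script), which is an implementation choice not shown here.

```python
import time

class TokenBucket:
    """Single-node token bucket: tokens refill continuously up to capacity;
    a request is allowed only if enough tokens remain."""

    def __init__(self, capacity: float, refill_rate: float):
        self.capacity = capacity        # max burst size
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller returns 429 with Retry-After
```

Capacity sets the burst allowance; refill rate sets the sustained limit — the two knobs an interviewer will expect you to explain.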
3) Design a Real-Time Chat System (like WhatsApp/Slack)
What interviewer evaluates: Real-time delivery, message ordering, offline support, and presence.
Requirements:
- Functional: 1:1 and group chats; real-time message delivery; message history; typing indicators; read receipts; push notifications when offline.
- Non-functional: < 200ms message delivery; 99.9% availability; 50M DAU; support 10M concurrent connections.
High-Level Design:
Clients connect via WebSocket (or long polling fallback) to WebSocket servers. Message flow: Client A → API → Message queue (Kafka/SQS) → WebSocket servers → Client B. Message service persists to DB. Presence: heartbeat to Redis. Offline: push notification service.
Deep Dive:
- Database: MongoDB or Cassandra for messages — append-only, partition by chat_id or (chat_id, time_bucket). CQRS: write path optimized for append; read path may use a separate materialized view for "recent messages per chat."
- API design: REST for history (`GET /chats/{id}/messages?before=&limit=50`); WebSocket for real-time. Events: `message`, `typing`, `read_receipt`. Protobuf or MessagePack for binary efficiency.
- Scaling: WebSocket servers are stateful — use sticky sessions (load balancer) or pub/sub (Redis Pub/Sub) so a message published by server A reaches a client on server B. Kafka partitions by chat_id for ordering.
- Caching: Redis for presence (user_id → online/offline, last_seen). Recent messages cached per chat (e.g., last 50) to speed up app open. Cache invalidation on new message.
- Failure handling: At-least-once delivery via message queue; idempotent message IDs to handle duplicates. Clients acknowledge messages; unacknowledged messages retried. Offline queue on client, sync on reconnect.
Back-of-envelope: 50M DAU, 20 msgs/user/day = 1B msgs/day ≈ 12K msg/sec. 10M concurrent connections → ~100K connections per WebSocket server (100 servers). Storage: 1B msgs/day × 500 bytes ≈ 500 GB/day. Retention 1 year ≈ 180 TB.
What separates good from great: Discuss message ordering per chat (sequence numbers), conflict resolution for multi-device, end-to-end encryption design, and push notification routing (APNs, FCM).
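The at-least-once delivery and per-chat sequence numbers above combine into a simple client-side receive loop. This is a hypothetical sketch of the dedup-and-gap logic, not any real chat SDK: duplicates are dropped, and a gap in sequence numbers triggers a backfill from the REST history endpoint.

```python
class ChatSession:
    """Client-side handling for one chat: messages arrive as (seq, body);
    the server assigns seq per chat to give a total order."""

    def __init__(self):
        self.last_seq = 0
        self.delivered = []

    def on_message(self, seq: int, body: str) -> str:
        if seq <= self.last_seq:
            return "duplicate"   # at-least-once redelivery: ack again, drop
        if seq > self.last_seq + 1:
            return "gap"         # missed messages: backfill via REST history
        self.last_seq = seq
        self.delivered.append(body)
        return "delivered"
```

The same idea applies server-side: idempotent message IDs make queue retries safe, and clients only advance their acknowledged sequence number.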
4) Design a Social Media News Feed (like Twitter/Instagram)
What interviewer evaluates: Fan-out strategies, feed ranking, and consistency vs performance trade-offs.
Requirements:
- Functional: Post creation; follow/unfollow; feed = chronological or ranked posts from followed users; like, comment, share; real-time updates.
- Non-functional: Feed load < 300ms P99; 500M users; 10K posts/sec write; handle users following 2K+ people.
High-Level Design:
Write path: Post → API → DB (posts table) → Fan-out (push to followers' feeds or defer). Read path: Get feed from pre-computed feed table (push) or aggregate on read (pull). Hybrid: push for regular users, pull for celebrities. Components: Post service, follow graph (DB or graph DB), feed service, cache.
Deep Dive:
- Database: PostgreSQL for posts, users, follows. Feed storage: Redis sorted sets (post_id, timestamp) per user for push model, or Cassandra for massive scale. Follow graph: adjacency list in DB or Neo4j for graph queries.
- Fan-out: Push (write-time): on post, write to N followers' feeds — O(N) write, O(1) read. Pull (read-time): on feed request, fetch from N followed users — O(1) write, O(N) read. Hybrid: push for followers < 500K, pull for celebrities.
- API design:
POST /posts,GET /feed?cursor=&limit=20,POST /follow,POST /unfollow. Pagination via cursor (timestamp or post_id). - Scaling: Feed read from Redis/cache; DB for cold start. Async fan-out via message queue (Kafka) — don't block post creation. Shard posts by user_id; shard feeds by user_id.
- Caching: Feed cached per user; TTL 60s or invalidate on new post from followed user. Ranking: precompute relevance scores in a separate pipeline; merge ranked lists.
- Failure handling: Fan-out queue backlog — eventually consistent. Cache miss → rebuild from DB. Degraded mode: serve chronological only if ranking service is down.
Back-of-envelope: 500M users, avg 200 followers, 1 post/day each = 100B feed writes/day for push. For 10K celebs with 1M followers each: 10B extra fan-out writes/day — hence pull for celebrities. Storage: 100B feed entries × 1 KB ≈ 100 TB. Cache: 500M users × 20 posts × 1 KB ≈ 10 TB if fully cached.
What separates good from great: Discuss ranking (ML model, engagement signals), cold-start for new users, and handling unfollow (remove from feed without full rebuild).
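The hybrid read path — merge the precomputed (pushed) feed with celebrity timelines pulled at read time — is essentially a k-way merge of sorted lists. A minimal sketch, assuming each timeline is a newest-first list of (timestamp, post_id) pairs; the function name and shapes are illustrative.

```python
import heapq

def read_feed(pushed, celeb_timelines, limit=20):
    """Merge the user's pushed feed with pulled celebrity timelines,
    newest first. All inputs must already be sorted newest-first."""
    merged = heapq.merge(pushed, *celeb_timelines,
                         key=lambda entry: entry[0], reverse=True)
    return [post_id for _, post_id in list(merged)[:limit]]
```

In production the pushed feed comes from a Redis sorted set (`ZREVRANGE`) and celebrity posts from the posts store, but the merge step is the same.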
5) Design a Notification Service (push, email, SMS)
What interviewer evaluates: Multi-channel delivery, reliability, and user preferences.
Requirements:
- Functional: Send push (mobile), email, SMS; user preference management (opt-in/out per channel, quiet hours); templates; delivery status tracking.
- Non-functional: 99.99% delivery for critical notifications; < 5 min delivery SLA; 10M notifications/day; idempotent (no duplicates).
High-Level Design:
API receives notification request → Validation & enrichment → Message queue (per channel) → Workers → External providers (APNs, FCM, SendGrid, Twilio). Preference service checks before send. DLQ for failures. Components: API, queue (Kafka/SQS), workers, provider adapters, preference store.
Deep Dive:
- Database: PostgreSQL for user preferences, notification templates, delivery logs. Preference schema: user_id, channel, enabled, quiet_hours_start, quiet_hours_end. Delivery log for idempotency (idempotency_key) and analytics.
- API design: `POST /notify` — body: `{user_id, channel, template_id, data, idempotency_key}`. Async — returns 202 Accepted with a job_id for status. Batch endpoint for bulk.
- Scaling: Queues per channel (push, email, SMS) for independent scaling. Workers scale based on queue depth. Rate limits per provider (Twilio, SendGrid) — respect provider caps.
- Caching: Redis for user preferences (hot users); invalidate on preference update. Template cache to avoid DB hit per send.
- Failure handling: Retry with exponential backoff (3–5 retries). Dead-letter queue after max retries; alert and manual review. Idempotency key prevents duplicate sends on retry. Circuit breaker for provider outages — fail fast, don't exhaust threads.
Back-of-envelope: 10M notifs/day ≈ 120 QPS average. Burst: 1K QPS. Push: 5M; email: 3M; SMS: 2M. Workers: ~10 per channel at 100 msg/sec each. Storage: 10M × 500 bytes log ≈ 5 GB/day.
What separates good from great: Discuss batching (email batching for rate limits), A/B testing for templates, and notification coalescing (combine multiple into one digest).
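The worker-side failure handling above — idempotency check, exponential backoff, then dead-letter — can be sketched as follows. This is a hypothetical outline: `send` stands in for a provider adapter (APNs/FCM/SendGrid client), and the `seen` set stands in for a unique-keyed delivery-log table.

```python
import time

def send_with_retry(send, payload, seen, idempotency_key,
                    max_retries=3, base_delay=0.5):
    """Skip duplicates via idempotency key; retry the provider call with
    exponential backoff; re-raise after max retries so the caller can
    route the message to the dead-letter queue."""
    if idempotency_key in seen:      # in production: unique DB constraint
        return "duplicate"
    for attempt in range(max_retries + 1):
        try:
            send(payload)
            seen.add(idempotency_key)
            return "sent"
        except Exception:
            if attempt == max_retries:
                raise                # caller moves the message to the DLQ
            time.sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
```

A circuit breaker would wrap `send` itself so that a provider outage fails fast instead of burning all retries.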
6) Design a Video Streaming Platform (like YouTube)
What interviewer evaluates: Content delivery, transcoding, storage, and playback optimization.
Requirements:
- Functional: Upload video; transcode to multiple qualities; stream via adaptive bitrate (HLS/DASH); search and recommendations; thumbnails, comments.
- Non-functional: Support 4K, low buffering; 100M users; 1M concurrent streams; global delivery.
High-Level Design:
Upload → Object storage (S3) → Transcoding pipeline (workers) → Encoded segments to CDN. Playback: Client requests manifest → CDN serves segments. Metadata (title, user, etc.) in DB. Components: Upload API, object storage, transcoding cluster, CDN, metadata DB, recommendation service.
Deep Dive:
- Storage: Object storage (S3, GCS) for raw uploads and transcoded outputs. Metadata in PostgreSQL or DynamoDB. Segments (HLS): typically 2–10 sec chunks; stored in object storage, served via CDN.
- Transcoding: Async job queue (Kafka/SQS). Workers pull jobs, run FFmpeg or cloud transcoder (AWS MediaConvert). Output: multiple resolutions (360p, 720p, 1080p, 4K). Store in object storage with CDN invalidation.
- API design: `POST /upload` (multipart or resumable); `GET /videos/{id}/manifest.m3u8`; `GET /search?q=` for discovery. Signed URLs for private content.
- Scaling: CDN handles 99% of read traffic. Origin (object storage) only on cache miss. Transcoding: scale workers with queue depth. Regional CDN edge locations.
- Caching: CDN caches segments; long TTL (video rarely changes). Manifest cached briefly (5 min). Hot videos served from edge.
- Failure handling: Transcoding failure → retry, then DLQ. Partial upload resume. CDN fallback to origin. Multiple CDN providers for redundancy.
Back-of-envelope: 1M concurrent streams × 5 Mbps avg = 5 Tbps egress — the CDN absorbs this. Storage: 1M videos × 500 MB avg = 500 TB. Transcoding: 10K uploads/day × 10 min each = 100K minutes ≈ 1,700 worker-hours/day at 1× real-time, so ~100 workers with headroom.
What separates good from great: Discuss DRM, live streaming (different pipeline), and recommendation system integration.
7) Design a Search Autocomplete System
What interviewer evaluates: Trie/prefix matching, ranking, and low-latency reads.
Requirements:
- Functional: As user types, suggest top K completions; support prefix matching; results ranked by popularity/recency.
- Non-functional: < 50ms P99 latency; 100K QPS; handle 10M unique queries; update index within minutes of new data.
High-Level Design:
Data pipeline: Query logs / product catalog → Aggregate popularity → Build trie or sorted structure. Serving: API receives prefix → Lookup trie (or prefix search in DB) → Return top K. Cache hot prefixes. Components: Trie service (or Elasticsearch), aggregation pipeline, cache.
Deep Dive:
- Data structure: Trie (prefix tree) — O(prefix_length) lookup. Each node stores top K suggestions. Or: n-gram table + sorted set (Redis ZRANGEBYLEX). Elasticsearch completion suggester uses FST (finite state transducer).
- Database: In-memory trie for speed — rebuilt periodically from DB. Or Redis with sorted sets per prefix. Source data in PostgreSQL or data warehouse for aggregation.
- API design: `GET /suggest?q=goog&limit=10`. Response: `[{"query": "google", "score": 1000}, ...]`.
- Scaling: Trie in memory per server; replicate. Read replicas. Cache: Redis for hot prefixes (the top 10% of prefixes handle ~90% of traffic).
- Ranking: Score = frequency × recency decay. Offline job computes scores from query logs; update trie hourly/daily. Real-time: blend with recent trends from stream processing.
- Failure handling: Trie rebuild failure → keep serving stale. Fallback to DB prefix search if trie service down (slower).
Back-of-envelope: 10M unique queries, avg 20 chars = 200 MB raw. Trie with top 10 per node: ~500 MB in memory. 100K QPS → 10 serving nodes. Cache hit 90% → 10K QPS to trie.
What separates good from great: Discuss personalized suggestions (per user history), typo tolerance (edit distance), and multi-language support.
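The key trick above — caching the top-K suggestions at every trie node — is worth being able to sketch. A minimal illustration (class and method names are this sketch's own; real systems rebuild the trie offline from aggregated query logs):

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # top-K (score, query) pairs cached at this node

class Autocomplete:
    """Lookup is O(len(prefix)): walk the trie, read the cached top-K.
    No subtree traversal is needed at query time."""

    def __init__(self, k=10):
        self.root = TrieNode()
        self.k = k

    def add(self, query: str, score: int):
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            # Keep only the K best suggestions at each node.
            node.top = sorted(node.top + [(score, query)], reverse=True)[: self.k]

    def suggest(self, prefix: str):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [q for _, q in node.top]
```

The trade-off: writes touch every node along the query's path, which is why updates run as a periodic offline rebuild rather than per keystroke.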
8) Design a Distributed Cache (like Redis cluster)
What interviewer evaluates: Consistency, eviction, replication, and partitioning.
Requirements:
- Functional: get/set/delete; TTL support; high throughput; sub-millisecond latency.
- Non-functional: 1M QPS; 100 GB capacity; 99.99% availability; horizontal scaling.
High-Level Design:
Clients → Proxy or direct connection → Sharded cache nodes. Each shard: in-memory hash table + eviction policy (LRU/LFU). Replication: master-replica per shard. Partitioning: consistent hashing. Components: Cache nodes, proxy (optional), cluster manager.
Deep Dive:
- Storage: In-memory only — hash table. Optional: persistence (RDB, AOF) for durability. No traditional DB; cache is ephemeral by design.
- Partitioning: Consistent hashing — keys mapped to ring; add/remove nodes causes minimal key movement (only K/N keys remap). Virtual nodes (vnodes) for balanced distribution.
- Replication: Master-replica: sync or async replication. Read from replica for scale. Failover: sentinel or Raft-based auto-promotion.
- Eviction: LRU (Least Recently Used) or LFU (Least Frequently Used). Configurable maxmemory policy. Approximate LRU (Redis style) for O(1) implementation.
- Scaling: Add nodes → rehash (consistent hashing minimizes impact). Replicas for read scaling. Cluster mode: 16384 hash slots across nodes.
- Failure handling: Node failure → replica promoted. Data loss on master+replica failure (in-memory). Cache-aside pattern in app: DB fallback on cache miss. Single point of failure: avoid by using cluster.
Back-of-envelope: 1M QPS at ~1ms latency ≈ 1K requests in flight cluster-wide (Little's law). 100 GB / 16 GB per node ≈ 7 nodes; with replication: 14 nodes. Network: 1M × 1 KB avg = 1 GB/s.
What separates good from great: Discuss cache stampede prevention, cache-aside vs write-through vs write-behind, and multi-tenancy (isolate noisy neighbors).
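Consistent hashing with virtual nodes, as described above, is short enough to sketch from scratch. This is an illustrative single-process implementation (class name and vnode count are this sketch's choices); real clusters like Redis use fixed hash slots instead, but the ring model is what interviewers usually ask for.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Keys and nodes hash onto a ring; a key maps to the first node
    clockwise. Adding/removing a node remaps only ~K/N keys."""

    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) points
        for n in nodes:
            self.add_node(n)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node: str):
        # Virtual nodes spread each physical node around the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self.ring = [(h, n) for h, n in self.ring if n != node]

    def get_node(self, key: str) -> str:
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h, "")) % len(self.ring)
        return self.ring[idx][1]
```

Without virtual nodes, a ring of a few physical nodes ends up badly unbalanced; vnodes smooth the distribution and let heterogeneous nodes take proportional shares.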
9) Design a Payment Processing System
What interviewer evaluates: ACID, idempotency, auditability, and compliance.
Requirements:
- Functional: Charge card; refund; partial refund; support multiple payment methods; webhook for async status.
- Non-functional: Strong consistency; no double charge; PCI compliance (tokenization); 99.99% availability; full audit trail.
High-Level Design:
API → Idempotency check → Payment orchestrator → PSP (Stripe, etc.) or bank gateway. DB: transactions, idempotency keys. Async: webhooks for final status. Components: API, idempotency layer, payment service, PSP adapter, ledger DB.
Deep Dive:
- Database: PostgreSQL with ACID. Schema: `transactions(id, idempotency_key, amount, status, created_at)`, `idempotency_keys(key, result, ttl)`. Unique constraint on idempotency_key. Ledger table for immutable audit log.
- Idempotency: Client sends an `Idempotency-Key` header. First request: process and store result. Duplicate: return stored result. TTL 24h. Key = client-provided UUID.
- API design: `POST /charges` and `POST /refunds`, both idempotent via the idempotency key. Webhook: `POST /webhooks/stripe` — verify signature, process async.
- Scaling: Stateless API; DB connection pooling. Read replicas for reporting only (never for charge decisions). Queue for webhook processing.
- Security: Never store raw card numbers. Tokenize via PSP. Webhook signature verification. Rate limiting. Audit log every mutation.
- Failure handling: PSP timeout → store as pending; background job retries. Idempotency prevents duplicate on retry. Reconciliation job daily to match our DB with PSP state.
Back-of-envelope: 10K transactions/day ≈ 0.15 QPS. Burst: 10 QPS during flash sale. Single DB sufficient. Storage: 10K × 1 KB × 365 days ≈ 3.6 GB/year.
What separates good from great: Discuss idempotency across refund + charge reversal, idempotency key scope (per operation type), and PCI scope reduction (use Stripe Elements so card never touches your server).
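The idempotency layer above reduces to a store-or-replay check. A minimal sketch, with a dict standing in for the idempotency_keys table (in production the store needs a unique constraint plus an in-progress state so two concurrent requests with the same key can't both reach the PSP):

```python
def charge(db: dict, idempotency_key: str, amount: int, psp_charge) -> dict:
    """First request with a key processes and stores the result;
    any retry with the same key replays the stored result unchanged."""
    if idempotency_key in db:
        return db[idempotency_key]   # duplicate: no second PSP call
    result = psp_charge(amount)      # call out to the PSP exactly once
    db[idempotency_key] = result     # store before returning to the client
    return result
```

This is why the client must generate the key (a UUID per payment attempt): only the client knows whether "retry" means "same payment" or "new payment".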
10) Design a File Storage Service (like Google Drive/Dropbox)
What interviewer evaluates: Block storage, sync, deduplication, and metadata management.
Requirements:
- Functional: Upload, download, delete files; folder hierarchy; share files; sync across devices; versioning.
- Non-functional: Durability 99.999999%; 100M users; 1B files; support files up to 5 GB; sync conflict resolution.
High-Level Design:
Upload: API → Chunk file (e.g., 4 MB blocks) → Dedup (content hash) → Object storage. Metadata: DB (file path, chunk refs, version). Download: metadata lookup → fetch chunks from object storage → assemble. Sync: delta sync (only changed blocks). Components: API, metadata DB, object storage, block store, sync service.
Deep Dive:
- Storage: Object storage (S3) for blocks; key = content hash (SHA-256). Metadata in PostgreSQL: `files(path, owner, chunk_ids)`, `chunks(hash, object_storage_key)`. Same content = same hash → deduplication.
- Chunking: Fixed 4 MB or variable (content-defined chunking for dedup). Content-defined: rolling hash to find boundaries; better dedup for modified files.
- API design: `PUT /files` (multipart, resumable); `GET /files/{path}`; `GET /delta` (sync token, returns changed files). REST + optional sync protocol (like Dropbox's /delta).
- Scaling: Metadata DB sharded by user_id or path. Object storage scales inherently. CDN for download. Sync: incremental, only changed blocks.
- Conflict resolution: Last-write-wins with timestamp; or store conflict copy (file_v2_conflicted). User merge for simultaneous edits. Operational transform or CRDT for real-time collab (advanced).
- Failure handling: Upload: resumable (chunk-level). Corruption: checksum verify on read. Replication: object storage provides durability (multi-AZ).
Back-of-envelope: 1B files, 10 MB avg = 10 PB. With 50% dedup = 5 PB. Metadata: 1B × 500 bytes = 500 GB. 100M users × 10 uploads/day = 1B writes/day ≈ 12K writes/sec. Chunking: 1B files × ~3 chunks avg = 3B chunks; at 4 MB each that's ~12 PB raw before dedup reduces it.
What separates good from great: Discuss block-level dedup, incremental sync algorithm, and handling large file uploads (resumable, multipart).
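The content-addressed block store above can be shown with fixed-size chunking and a dict as the stand-in object store. An illustrative sketch (function name and parameters are this sketch's own; real systems would use content-defined boundaries for better dedup on edited files):

```python
import hashlib

def chunk_and_dedup(data: bytes, store: dict, chunk_size: int = 4 * 1024 * 1024):
    """Split a file into fixed-size blocks, key each block by its SHA-256,
    and store each unique block only once. Returns the ordered chunk-id
    list that becomes the file's metadata record."""
    chunk_ids = []
    for off in range(0, len(data), chunk_size):
        block = data[off : off + chunk_size]
        h = hashlib.sha256(block).hexdigest()
        if h not in store:       # dedup: upload only blocks we haven't seen
            store[h] = block
        chunk_ids.append(h)
    return chunk_ids
```

Delta sync falls out of the same structure: a modified file re-chunks, and only the chunk IDs that changed need uploading.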
Rapid-fire practice — design questions
60–90 second answers. Practice sketching components and data flow out loud.
Round 1: Data-Intensive Systems (3–4 questions)
Q: Design an analytics pipeline that ingests 1B events/day.
Kafka/SQS for ingestion → Stream processors (Flink, Kafka Streams) or batch (Spark) → Data warehouse (Snowflake, BigQuery). Partition by date/hour. Use Parquet for storage. Backpressure handling, dead-letter for bad events.
Q: When would you use event sourcing?
When you need full history, audit trail, or ability to replay. Examples: financial ledgers, order state, collaboration history. Append-only event store (Kafka, EventStore). Projections for read models. Trade-off: complexity, storage growth.
Q: Design a time-series database for metrics (1M metrics/sec).
InfluxDB, TimescaleDB, or Prometheus. Partition by metric name + time. Downsampling for retention (raw 7 days, 1min 30 days, 1hr 1 year). Write-optimized; compression for older data. Aggregation at query time or pre-aggregated materialized views.
Q: How would you design a data lake for ML training?
Raw zone (S3) → Bronze (cleaned) → Silver (transformed) → Gold (feature store). Orchestration: Airflow or dbt. Versioning: Delta Lake or Iceberg. Feature store (Feast, Tecton) for low-latency feature serving. Lineage tracking.
Round 2: Real-Time Systems (3–4 questions)
Q: Design a real-time leaderboard (gamers, 10M users).
Redis sorted sets (ZADD, ZRANGE). Key per game/season. Update on score change. Pagination via ZREVRANGE. Shard by game_id. Cache top 100 for hot games. Sliding window leaderboard: multiple sorted sets per time range.
Q: Design a live dashboard (real-time metrics, 10K viewers).
Metrics pipeline → Redis Pub/Sub or Kafka → WebSocket servers. Each dashboard subscribes to relevant metric streams. Aggregation at source to reduce fan-out. Throttle updates to client (max 1/sec per metric).
Q: Design a collaborative document editor (like Google Docs).
OT (Operational Transform) or CRDT for conflict-free sync. Server receives ops, transforms, broadcasts. Persistence: checkpoint + delta. Or use CRDT (Yjs, Automerge) — no server transform. WebSocket for real-time. Cursor positions via presence.
Q: Design a real-time inventory system (prevent oversell).
Strong consistency: DB with row-level lock or optimistic locking. Reserve-on-add-to-cart with TTL; release on abandon. Or event-sourced: reserve event, release event; validate invariant (stock >= 0) in projection. Saga for cross-service.
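The optimistic-locking answer above is easy to demonstrate. A hypothetical in-memory sketch where `items` maps sku to (stock, version); in SQL this whole function is one conditional `UPDATE ... WHERE sku = :sku AND version = :v AND stock >= :qty`:

```python
def reserve_stock(items: dict, sku: str, qty: int, expected_version: int) -> bool:
    """Compare-and-set reservation: apply the decrement only if stock
    suffices and the version matches what the caller read earlier."""
    stock, version = items[sku]
    if version != expected_version:
        return False   # another writer won the race: re-read and retry
    if stock < qty:
        return False   # insufficient stock: oversell prevented
    items[sku] = (stock - qty, version + 1)
    return True
```

A caller that loses the race re-reads the row and retries, so two shoppers can never both take the last unit.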
Round 3: Infrastructure (3–4 questions)
Q: How does a CDN work? When to use it?
Edge servers cache static content (images, JS, video) near users. DNS or anycast routes to nearest edge. Cache hit = low latency; miss = fetch from origin. Use for static assets, video, large downloads. Invalidate via purge API.
Q: Design a load balancer. What algorithms?
Round-robin, weighted round-robin, least connections, IP hash. Health checks: HTTP/TCP. Session affinity (sticky) for stateful. L4 (TCP) vs L7 (HTTP). Scale: stateless LBs; consistent hashing for persistence.
Q: What is a service mesh? When do you need it?
Sidecar proxy (Envoy) per pod handles mTLS, retries, circuit breaker, observability. Istio, Linkerd. Need when: many services, need consistent policies, zero-trust. Overhead: latency, resource. Overkill for < 10 services.
Q: Design a circuit breaker. When does it open?
States: Closed (normal) → Open (fail fast) after N failures in window → Half-open (test) after timeout. Opens on failure rate threshold (e.g., 50% in 10s) or consecutive failures. Prevents cascade. Libraries: Resilience4j, Hystrix.
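The three-state machine above fits in a small class. A minimal sketch using consecutive-failure counting (real libraries like Resilience4j also support failure-rate windows, which this sketch omits):

```python
import time

class CircuitBreaker:
    """Closed -> Open after `threshold` consecutive failures;
    after `reset_timeout` seconds, one trial call probes recovery
    (half-open). A success closes the circuit again."""

    def __init__(self, threshold=5, reset_timeout=10.0):
        self.threshold = threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # half-open: allow one probe call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                # success resets and closes
        return result
```

The point to stress in an interview: while open, the breaker returns errors immediately, so a struggling dependency isn't hammered and caller threads aren't exhausted.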
From real experience
"I've conducted 50+ system design interviews at the senior level. The single biggest differentiator: candidates who ask clarifying questions before drawing boxes. 'Is this read-heavy or write-heavy?' 'What's our scale?' 'Do we need strong or eventual consistency?' — these questions show you think before you architect. The candidates who jump straight to 'we'll use Kafka and Cassandra' often over-engineer or miss the actual requirements."
"The second differentiator: back-of-envelope numbers. When I ask 'how many servers do you need?', the best candidates say something like 'At 10K QPS with 50ms per request, one server handles ~20 QPS per core, so 8 cores = 160 QPS — we'd need ~60 API servers, plus DB and cache." They show they can reason about scale. The worst answers are 'we'll add more servers as needed' — that's not system design, that's hoping it works."
— Surya Singh, Senior Software Engineer & Technical Interviewer
Common interview mistakes to avoid
- Jumping into design without clarifying requirements — you'll over-engineer or solve the wrong problem.
- Using buzzwords (Kafka, Cassandra, microservices) without explaining why they fit the use case.
- Skipping back-of-envelope calculations — interviewers expect QPS, storage, and server count estimates.
- Ignoring failure modes — always discuss single points of failure, replication, and graceful degradation.
- Designing for infinite scale when the problem is 1K QPS — match complexity to requirements.
- Not discussing consistency vs availability — know when to choose strong consistency vs eventual consistency.
- Forgetting the user/client — how does the frontend get data? Polling, WebSocket, server-sent events?
- Running out of time on one component — keep high-level design to 5 minutes, then go deep on 1–2 areas.
Frequently asked questions
How much time should I spend on requirements gathering vs design in a system design interview?
Spend 2–3 minutes on requirements. Clarify scale (QPS, users, data volume), functional requirements (what the system must do), and non-functional requirements (latency, availability, consistency). Senior candidates who skip this often over-engineer or under-engineer. Write down the numbers — they anchor your back-of-envelope calculations later.
What back-of-envelope numbers should I know for system design interviews?
Common estimates: 1 billion requests/month ≈ 400 QPS; 1 million users × 10 req/day ≈ 120 QPS; 1 KB average payload × 1000 QPS = 1 MB/s bandwidth; 1 million rows in DB ≈ 100 MB (order of magnitude). Know latency targets: cache 1–5ms, DB 5–50ms, cross-region 100–200ms. These help you sanity-check your design.
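The conversions in this answer are worth internalizing as two divisions. A trivial sketch (function names are illustrative) that reproduces the rules of thumb above:

```python
SECONDS_PER_DAY = 86_400  # 24 * 60 * 60

def qps_from_daily(req_per_day: float) -> float:
    """Average QPS from a daily request count."""
    return req_per_day / SECONDS_PER_DAY

def qps_from_monthly(req_per_month: float) -> float:
    """Average QPS from a monthly count, assuming a 30-day month."""
    return req_per_month / (30 * SECONDS_PER_DAY)
```

So 1B requests/month works out to ~386 QPS (the guide rounds to 400), and 10B redirects/day to ~116K QPS; remember to multiply by 2–5× for peak traffic.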
How do I approach trade-offs when the interviewer asks "what if we need X instead?"
Acknowledge the trade-off explicitly. For example: "If we need strong consistency instead of eventual consistency, we'd add a consensus protocol like Raft or use a strongly consistent database. That increases latency and reduces availability — we'd need to decide if the use case justifies it." Show you understand both sides and can justify your choice.
What database should I choose for different system design scenarios?
Relational (PostgreSQL, MySQL): ACID transactions, complex queries, joins — e.g., payment systems, order management. NoSQL key-value (Redis, DynamoDB): high throughput, simple access patterns — e.g., session store, cache. NoSQL document (MongoDB): flexible schema, nested docs — e.g., user profiles, catalogs. Wide-column (Cassandra, HBase): write-heavy, time-series — e.g., logs, metrics. Choose based on access patterns and consistency requirements.
How do I demonstrate scalability in a system design interview?
Show horizontal scaling: stateless services behind a load balancer, database read replicas, caching layer, message queues for async processing. Discuss sharding or partitioning when a single DB hits limits. Mention auto-scaling policies (CPU, queue depth). Quantify: "At 10K QPS, we'd run 20 API servers; at 100K QPS, we'd add Redis cluster and 100 API servers with DB sharding."
What failure scenarios should I always cover?
Server crashes (replication, health checks, restart), database failure (replicas, failover), network partitions (circuit breakers, retries with backoff), cache miss storms (cache-aside with stampede prevention), single points of failure (eliminate them, add redundancy). Mention idempotency for critical operations and graceful degradation (serve stale cache when DB is down).
Is it okay to use managed services in system design interviews?
Yes, especially for 4+ years roles. Saying "we'd use S3 for object storage" or "Kafka for event streaming" shows production experience. However, explain what the service provides (durability, scaling, APIs) so the interviewer knows you understand the underlying concepts. Avoid saying "we'd use AWS" without naming specific services or their roles.
How do I structure my answer when I'm unsure about a detail?
State your assumption: "I'll assume we need 99.9% availability — we can relax that if the use case allows." Or ask: "Should we optimize for read-heavy or write-heavy workloads?" Making assumptions explicit and asking clarifying questions shows maturity. Never fake knowledge — say "I'd look up the exact limits, but typically..." and give a reasonable range.
Related interview guides
- MERN Stack Interview Questions — STAR Format (MongoDB, Express, React, Node.js)
- MEAN Stack Interview Questions — STAR Format (MongoDB, Express, Angular, Node.js)
- SQL Full Stack Developer Interview Questions (SQL Server, .NET 10, React, Azure SQL)
- Azure Cloud Architecture Interview Guide: Design Patterns and Best Practices
- AI/ML Engineer Interview Questions — STAR Format (Azure AI, RAG, LangChain)
Surya Singh
Azure Solutions Architect & AI Engineer
Microsoft-certified Azure Solutions Architect with 8+ years in enterprise software, cloud architecture, and AI/ML deployment. I build production AI systems and write about what actually works—based on shipping code, not theory.
- Microsoft Certified: Azure Solutions Architect Expert
- Built 20+ production AI/ML pipelines on Azure
- 8+ years in .NET, C#, and cloud-native architecture