System Design Interview Questions
System design is less about memorizing diagrams and more about showing you can grow a product from ten users to ten million without pretending the hard parts disappear. For people with several years of experience, panels expect you to drive the conversation: ask clarifying questions, propose reasonable capacity estimates, and name the concrete failure modes you have actually seen—slow queries, thundering herds, poison messages, partial outages.
Good answers sound like postmortems that never happened yet. You pick a data model that matches access patterns, explain why you are not over-sharding on day one, and spell out how you would observe the system (metrics, traces, SLOs) once it ships. If you only draw boxes, you will sound junior; if you connect boxes to operational cost and team skill, you will sound hireable.
The sections below break out topics interviewers return to again and again. Treat each as a mini design exercise: fifteen minutes of requirements, twenty minutes of core architecture, ten minutes of deep dive, five minutes of "what breaks first."
Trending Sub-topics
- Rate Limiters — Fairness, burst traffic, distributed counters, and the product story behind throttling—API keys, tenants, or abusive clients.
- Load Balancing — Health checks, connection draining, sticky sessions when you regret them, and why L4 vs L7 still matters in 2026.
- Microservices — Boundaries that survive reorgs, synchronous vs event-driven coupling, and the debugging tax most teams underestimate.
- CAP Theorem — Moving past the triangle meme to real consistency models, leader election, and what your database actually promised during a partition.
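The rate-limiter theme above almost always lands on a token bucket, so it is worth being able to sketch one. A minimal single-node version in Python (class and parameter names are illustrative; a production limiter would share counters across nodes, e.g. in Redis):

```python
import time

class TokenBucket:
    """Single-node token bucket: refills `rate` tokens/sec, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
# An instant burst of 12 requests: the first 10 pass, the rest are throttled.
results = [bucket.allow() for _ in range(12)]
```

The interview payoff is naming the knobs: `capacity` is your burst allowance, `rate` is sustained fairness, and moving the counters to shared storage is what turns this into a distributed design question.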
How to run a mock that helps
Pick a partner who will interrupt you. The best practice is slightly adversarial: "What if this region fails?" "Who owns this queue?" "How do you roll back a bad schema change?" Record yourself once in a while—you will hear where you ramble or skip client-visible latency.
Numbers without fantasy
Interviewers forgive rough math; they do not forgive orders of magnitude that ignore physics. State your assumptions ("200 bytes per event," "3:1 read/write") and sanity-check storage and bandwidth before you fall in love with Kafka. If you have operated a service, borrow its real traffic shape—it makes the exercise believable.
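As a worked example of that sanity check, here is the arithmetic using the assumptions quoted above plus an assumed write rate of 10k events/sec (the rate is invented for illustration; swap in your own):

```python
# Back-of-envelope sizing: 200 bytes/event, 3:1 read/write,
# and an ASSUMED 10k writes/sec for illustration.
writes_per_sec = 10_000
event_bytes = 200
read_write_ratio = 3

write_bw_mb_s = writes_per_sec * event_bytes / 1e6             # ingest bandwidth, MB/s
reads_per_sec = writes_per_sec * read_write_ratio              # read QPS to serve
daily_storage_gb = writes_per_sec * event_bytes * 86_400 / 1e9 # raw storage/day, GB
```

Two megabytes per second of ingest and under 200 GB/day of raw data fits a single well-tuned database; saying that out loud is exactly the "why I am not over-sharding on day one" argument.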
STAR in system design
Bring one or two production stories per theme: the cache you added when Redis CPU spiked, the feature flag that saved a launch, the read replica you promoted during an incident. Tie them to user impact and cost, not heroics. That is how you convert a whiteboard session into proof you have carried pager pain and learned from it.
What this looks like on Azure (and in code)
You do not need to memorize every SKU. You do need to connect boxes to services you can name. These snippets are conversation starters—tie them to a launch, an incident, or a cost conversation you were part of.
Azure API Management: rate limit by caller (policy fragment)
<inbound>
    <base />
    <!-- counter-key can be subscription id, JWT claim, or IP -->
    <rate-limit-by-key calls="120"
                       renewal-period="60"
                       counter-key="@(context.Subscription.Id)" />
</inbound>
When the key is a JWT hash or subscription key, you get per-tenant fairness without every app re-implementing token buckets. In an interview, say what the client sees when the limit trips: friendly 429s, Retry-After headers, and metrics in Azure Monitor so support sees spikes before Twitter does.
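On the client side of that 429 story, a well-behaved consumer honors Retry-After and otherwise backs off exponentially with jitter. A hedged sketch, where the `send` callable is a stand-in for your real HTTP client:

```python
import random
import time

def call_with_backoff(send, max_attempts: int = 5):
    """Retry on 429, honoring Retry-After when the server sends it;
    otherwise exponential backoff with jitter.
    `send` returns a (status, headers, body) tuple."""
    status, headers, body = None, {}, None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        retry_after = headers.get("Retry-After")
        # Server-provided delay wins; otherwise 1s, 2s, 4s... capped, plus jitter.
        delay = float(retry_after) if retry_after else min(2 ** attempt, 30) + random.random()
        time.sleep(delay)
    return status, body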
Azure Functions + Service Bus: poison message guard (C# shape)
[Function(nameof(ProcessOrder))]
public async Task Run(
    [ServiceBusTrigger("orders", Connection = "SbConn")]
    ServiceBusReceivedMessage msg,
    ServiceBusMessageActions actions)
{
    try {
        await HandleAsync(msg.Body);
        await actions.CompleteMessageAsync(msg);
    }
    catch (TransientException) {
        await actions.AbandonMessageAsync(msg); // retry with backoff
    }
    catch (Exception ex) {
        await actions.DeadLetterMessageAsync(msg, "BadPayload", ex.Message);
    }
}
This is the load-balancing story in disguise: unhealthy consumers stop poisoning the whole topic. Relate it to microservices: back pressure, dead-letter review queues, and the on-call playbook ("who replays DLQ after we fix the schema?"). Mention Azure Service Bus sessions if ordering per tenant matters.
Questions with sample answers
These are interview-ready outlines—sound human by swapping in your own metrics, team names, and war stories. The examples are generic on purpose so you can map them to what you actually shipped.
Primary prompt
Design a URL shortener used by a marketing team: 100M new URLs/month, redirects must be low-latency globally. Where do you store mappings and how do you handle hot keys?
Base62 short code → KV store (Dynamo/Redis) with TTL if needed; redirects are read-heavy—cache at edge CDN with short TTL or push to regional caches. Hot keys (celebrity links): serve from local in-process caches plus read replicas, and rate limit admin creates.
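The Base62 mapping in this answer is only a few lines once ids come from a counter or database sequence; a minimal sketch:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(n: int) -> str:
    """Map a numeric id (e.g. from a DB sequence) to a short code."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, r = divmod(n, 62)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode_base62(s: str) -> int:
    """Inverse mapping, used on the redirect path before the KV lookup."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

Seven Base62 characters cover 62^7 ≈ 3.5 trillion codes, so 100M/month never runs out; the interesting follow-up is whether sequential ids leak creation volume, which is why some teams shuffle or salt the id first.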
Primary prompt
Sketch read and write paths for a notification system at 50k events/sec. What fails first when a downstream consumer slows down?
Write to Kafka/EventHub; workers push to FCM/APNs; queue depth grows first—consumer lag alerts; backpressure producers or scale consumers; DLQ for poison messages. Mobile token invalidation causes secondary failures.
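The "queue depth grows first" observation turns into a concrete paging rule: if producers outrun consumers, lag can only grow; otherwise, alert when the projected catch-up time exceeds a budget. A sketch with illustrative thresholds and action names:

```python
def lag_action(queue_depth: int, consume_rate: float, produce_rate: float,
               max_catchup_minutes: float = 10) -> str:
    """Pick the first knob when a downstream consumer slows down.
    Thresholds and return values are illustrative, not product numbers."""
    if produce_rate >= consume_rate:
        # Lag grows without bound: add consumers or push back on producers.
        return "scale-consumers-or-backpressure"
    catchup_minutes = queue_depth / (consume_rate - produce_rate) / 60
    if catchup_minutes > max_catchup_minutes:
        return "scale-consumers"
    return "ok"
```

The design point: alert on the derivative (time to drain), not the raw depth, or you will page on every harmless burst.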
Primary prompt
You need rate limiting for an API keyed by user and by IP. Compare two architectures and name the product trade-offs of each.
Edge + Redis: low latency, centralized counters, Redis SPOF mitigated. Embedded in app: simpler ops, inconsistent per node without sync—good only with sticky sessions and small scale. Product: fairness vs cost vs complexity.
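The "inconsistent per node" caveat is easy to demonstrate: with no shared state, N nodes each enforce the limit independently, so a round-robin client gets roughly N times the intended quota. A toy illustration (class names are invented):

```python
from collections import defaultdict

class NodeLimiter:
    """Fixed-window counter embedded in one app node, with no shared state."""

    def __init__(self, limit: int):
        self.limit = limit
        self.counts = defaultdict(int)

    def allow(self, key: str) -> bool:
        self.counts[key] += 1
        return self.counts[key] <= self.limit

# Three nodes behind a round-robin LB, each enforcing "100 per window":
nodes = [NodeLimiter(limit=100) for _ in range(3)]
allowed = sum(nodes[i % 3].allow("user-42") for i in range(300))
# Every node sees only 100 requests, so all 300 pass the "100" limit.
```

That N× drift is the product trade-off in one line: acceptable for coarse abuse protection, unacceptable for billed quotas.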
Primary prompt
Walk through how you would roll out a schema change to a high-traffic table without taking a maintenance window.
Expand/contract: add nullable column, dual-write, backfill, switch reads, enforce NOT NULL, remove old; online DDL if engine supports; feature flags; reversible steps; monitor row locks.
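The dual-write step is the one interviewers probe, because it must be flag-guarded and reversible. A hypothetical sketch (table, column, and flag names are invented for illustration; `db` is any object with an `execute(sql, params)` method):

```python
def save_user(db, user_id: int, full_name: str, dual_write: bool = True) -> None:
    """Expand/contract step: keep writing the legacy `name` column and,
    behind a flag, also write the new `full_name` column. Flipping the
    flag off is the rollback; no data is destroyed until the contract step."""
    db.execute("UPDATE users SET name = %s WHERE id = %s", (full_name, user_id))
    if dual_write:
        db.execute("UPDATE users SET full_name = %s WHERE id = %s", (full_name, user_id))
```

Reads stay on the old column until the backfill finishes and row counts match; only then do you switch reads, enforce NOT NULL, and finally drop `name`.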
Follow-ups interviewers often ask
Expect nested "why?" questions—brief answers here; expand with your production defaults.
Follow-up
What consistency does the product need for reads immediately after a write?
Read-your-writes for social posting; eventual for analytics—ask PM; drives DB choice and cache invalidation.
Follow-up
How do you observe tail latency and error budgets for this design?
SLO on p99 latency and availability; burn rate alerts; RED metrics per service; synthetic probes from multiple regions.
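Burn-rate alerting reduces to simple arithmetic: compare the observed error fraction to the fraction the SLO allows. A sketch assuming a 99.9% availability SLO (the SLO value is an illustrative default):

```python
def burn_rate(errors: int, requests: int, slo_availability: float = 0.999) -> float:
    """Error-budget burn rate over a window: 1.0 means the budget is being
    consumed exactly as fast as the SLO permits; well above 1.0 is paging
    territory. The 99.9% default is an assumption for illustration."""
    budget = 1 - slo_availability      # allowed error fraction, e.g. 0.001
    observed = errors / requests
    return observed / budget
```

Multi-window alerts then fire on pairs like "burn rate > 14 over 1h AND over 5m", which catches fast burns without paging on blips.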
Follow-up
What happens during a regional outage—do you fail open or closed, and who decides?
Runbook + incident commander; payments fail closed; read-only features may fail open with banner; DNS failover to healthy region with data replication lag acknowledged.
Follow-up
How would cost scale if traffic 10× overnight? What is the first knob you turn?
Autoscale compute, cache more aggressively, shed non-critical features, negotiate reserved capacity. The first knob is usually stopping the bleeding: autoscale limits plus a queue to absorb the spike.
Follow-up
Where is your single point of failure, and what is the pragmatic mitigation given team size?
Honest answer: managed Redis single shard → enable cluster mode; small team relies on cloud HA vs self-built multi-master—trade cost for time.