System design interview questions

Microservices System Design

Microservices are not a purity badge; they are a bet that independent deployability and team autonomy are worth the network tax. Interviewers with experience will probe whether you understand that tax: partial failures, schema evolution, distributed tracing, and the meeting time spent arguing about who owns a queue when it turns red at midnight. Your job is to show you can draw boundaries that match how the business actually ships features—not how a diagram looks on a t-shirt.

Boundaries that survive contact with reality

Anti-patterns to name confidently

Distributed monoliths (everything deploys together but calls over HTTP), chatty fine-grained services, shared databases masquerading as loose coupling—if you can describe how you detected one of these in the wild and what incremental step improved it, you will sound like someone who ships.

Cap it off with consistency literacy

Microservices amplify consistency questions. Read up on the CAP theorem next, and practice two stories: one where you chose an eventually consistent read after a write because the product could tolerate seconds of lag, and another where you paid for stronger guarantees because money was involved.

One message, two services, one angry finance team

Outbox-style pseudo-SQL (ordering + eventual delivery)

BEGIN TRANSACTION;
  INSERT INTO orders(id, total) VALUES (@id, @total);
  INSERT INTO outbox(topic, payload)
  VALUES ('orders.created', @payloadJson); -- built in app layer as JSON
COMMIT;

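-- A separate worker polls for unsent rows, oldest first (assuming outbox.id is an
-- increasing key), publishes each one to Azure Service Bus, then marks it sent:
SELECT id, topic, payload FROM outbox WHERE sent_at IS NULL ORDER BY id;
UPDATE outbox SET sent_at = SYSUTCDATETIME() WHERE id = @outbox_id;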

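Without something like this, the order and its captured payment exist in SQL, but if the service crashes between the commit and the publish, your Azure Functions consumer never gets the event, and support tickets write themselves. The worker can also crash after publishing but before setting sent_at, in which case the same event is published again on the next poll. Hence the interview narrative: "at-least-once delivery, idempotent handlers, dead-letter when JSON does not match schema."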
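
The consumer side of that narrative can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production handler: the field names in REQUIRED_FIELDS, the OrderCreatedHandler class, and the in-memory set and lists (standing in for a durable dedupe table, a dead-letter queue, and the real side effect) are all hypothetical.

```python
import json

# Hypothetical event schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"event_id": str, "order_id": str, "total": (int, float)}


class OrderCreatedHandler:
    """Handles 'orders.created' messages that may be delivered more than once."""

    def __init__(self):
        self.processed_ids = set()  # stand-in for a durable dedupe table
        self.dead_letters = []      # stand-in for a dead-letter queue
        self.ledger = []            # stand-in for the real side effect

    def handle(self, raw_message: str) -> str:
        try:
            event = json.loads(raw_message)
            self._validate(event)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            # Retrying a malformed message never helps, so park it for inspection.
            self.dead_letters.append((raw_message, str(exc)))
            return "dead-lettered"

        if event["event_id"] in self.processed_ids:
            return "duplicate"  # redelivery: acknowledge without repeating the effect

        # In a real system the side effect and the dedupe record should commit together.
        self.ledger.append((event["order_id"], event["total"]))
        self.processed_ids.add(event["event_id"])
        return "processed"

    @staticmethod
    def _validate(event):
        if not isinstance(event, dict):
            raise ValueError("payload is not a JSON object")
        for field, expected in REQUIRED_FIELDS.items():
            if not isinstance(event.get(field), expected):
                raise ValueError(f"missing or mistyped field: {field}")


handler = OrderCreatedHandler()
message = json.dumps({"event_id": "evt-1", "order_id": "ord-42", "total": 19.99})
first = handler.handle(message)             # "processed"
second = handler.handle(message)            # "duplicate"
bad = handler.handle('{"order_id": 7}')     # "dead-lettered"
```

The design choice worth calling out in an interview is that the duplicate is acknowledged rather than rejected: the broker should stop redelivering it, and the ledger still holds exactly one entry.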

Questions with sample answers

These are interview-ready outlines—sound human by swapping in your own metrics, team names, and war stories. The examples are generic on purpose so you can map them to what you actually shipped.

  1. Primary prompt

    Split a monolith order service: propose boundaries and the first three events you would publish.

    Boundaries: orders (write model), payments (integration), inventory (reservation), notifications (async). Events: OrderCreated, PaymentAuthorized, OrderShipped—consumers subscribe without synchronous chains.

  2. Primary prompt

    How do you handle schema evolution for async events without breaking older consumers?

    Additive fields only by default; version events or use a schema registry with a compatibility mode (FORWARD/BACKWARD); dual-publish during migration; consumers ignore unknown fields.

  3. Primary prompt

    A synchronous chain of five calls times out—where do you insert caching, bulkheads, or async handoff?

    Shorten critical path: return 202 + process async with idempotency; cache read-heavy reference data; bulkhead thread pools per dependency; timeout + fallback for non-critical steps; saga for compensations.

  4. Primary prompt

    What is your on-call story for tracing a 500 across five services—tools and conventions?

    OpenTelemetry trace ID propagated on all calls, structured logs with same ID, service map in APM, SLO dashboards; runbook: start at edge, follow span with error=true, check recent deploys per service.
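
For prompt 3 above, the "bulkhead thread pools per dependency" and "timeout + fallback" ideas can be sketched with the Python standard library. The dependency names, pool sizes, and the call_with_bulkhead helper are assumptions made for illustration; a real service would more likely lean on a resilience library or a service mesh.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# One small pool per dependency: a slow dependency can exhaust only its own threads.
POOLS = {
    "pricing": ThreadPoolExecutor(max_workers=4, thread_name_prefix="pricing"),
    "recommendations": ThreadPoolExecutor(max_workers=2, thread_name_prefix="recs"),
}


def call_with_bulkhead(dependency, fn, timeout_s, fallback=None):
    """Run fn on the dependency's own pool; on timeout use the fallback if one is given."""
    future = POOLS[dependency].submit(fn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        # cancel() only stops a task that has not started; a running call keeps its thread.
        future.cancel()
        if fallback is None:
            raise  # critical step: surface the timeout to the caller
        return fallback()


def fast_pricing():
    return {"total": 19.99}


def slow_recommendations():
    time.sleep(0.3)  # simulates a degraded, non-critical dependency
    return ["personalised"]


price = call_with_bulkhead("pricing", fast_pricing, timeout_s=1.0)
recs = call_with_bulkhead(
    "recommendations", slow_recommendations, timeout_s=0.05, fallback=lambda: []
)
print(price, recs)  # {'total': 19.99} []
```

Pricing has no fallback because the request cannot succeed without it, while recommendations degrade to an empty list. That split between critical and non-critical steps is usually what the interviewer is probing for.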

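For prompt 2 above, "additive fields only" and "consumers ignore unknown fields" amount to a tolerant reader. The sketch below assumes JSON payloads and a hypothetical OrderCreated shape; the currency field and its default are invented purely to show an additive change.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OrderCreated:
    order_id: str
    total: float
    currency: str = "USD"  # hypothetical field added later; the default covers old producers


def parse_order_created(raw: str) -> OrderCreated:
    data = json.loads(raw)
    known = {f.name for f in fields(OrderCreated)}
    # Drop fields this consumer does not know about instead of failing on them.
    return OrderCreated(**{k: v for k, v in data.items() if k in known})


# An old producer omits the new field; a newer producer sends one we do not know yet.
from_old_producer = parse_order_created('{"order_id": "ord-1", "total": 10.0}')
from_new_producer = parse_order_created(
    '{"order_id": "ord-2", "total": 5.0, "currency": "EUR", "coupon": "SPRING"}'
)
print(from_old_producer.currency, from_new_producer.currency)  # USD EUR
```

Removing or renaming a required field would still break this reader, which is why those changes call for a versioned event or a dual-publish window.
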
Follow-ups interviewers often ask

Expect nested "why?" questions—brief answers here; expand with your production defaults.

  1. Follow-up

    When would you merge two services back together—signals that the split failed?

    Chatty sync calls, shared DB anyway, same deploy cadence, team too small for ops overhead—if bounded context wasn't real, merge to reduce tax.

  2. Follow-up

    How do you prevent distributed transactions from becoming the default hammer?

    Prefer sagas, outbox pattern, idempotent workers; 2PC only when legal/audit demands—document cost; design for eventual consistency with clear UX.

  3. Follow-up

    What is your policy on retries and idempotency keys at service boundaries?

    Retry only safe methods or requests that carry an idempotency key; use exponential backoff; keep keys in a dedupe store for 24h; propagate the idempotency header end-to-end for payments.

  4. Follow-up

    How do you test contracts between teams—consumer-driven, schema registries, or both?

    Pact/consumer-driven contracts for HTTP events; protobuf/Avro registry for Kafka; CI breaks on incompatible change; an integrated staging environment for smoke tests.

  5. Follow-up

    What organizational constraint usually breaks microservice purity in real companies?

    Team size, on-call rotation, shared database legacy, compliance centralization—honesty beats textbook purity.
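
For follow-up 3 above, the retry and idempotency-key policy can be sketched as a client-side retry loop plus a server-side dedupe store. Everything here is a hypothetical stand-in: PaymentService, TransientError, and the in-memory dictionary (a real dedupe store would be durable and shared). The 24-hour TTL mirrors the answer above.

```python
import random
import time
import uuid


class TransientError(Exception):
    """Raised for failures that are worth retrying (timeouts, lost responses)."""


class PaymentService:
    """Hypothetical server side: remembers the result per idempotency key for a TTL."""

    def __init__(self, ttl_s=24 * 3600, clock=time.monotonic):
        self._results = {}   # key -> (expires_at, result)
        self._ttl_s = ttl_s
        self._clock = clock
        self.charges = []
        self._fail_next = 1  # simulate: first attempt commits, but the response is lost

    def charge(self, idempotency_key, amount):
        now = self._clock()
        cached = self._results.get(idempotency_key)
        if cached and cached[0] > now:
            return cached[1]  # replay the stored result instead of charging again
        self.charges.append(amount)
        result = {"charge_id": f"ch-{len(self.charges)}", "amount": amount}
        self._results[idempotency_key] = (now + self._ttl_s, result)
        if self._fail_next > 0:
            self._fail_next -= 1
            raise TransientError("response lost after commit")
        return result


def call_with_retries(fn, attempts=4, base_delay_s=0.01, max_delay_s=0.2):
    """Retry fn on TransientError with capped exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            time.sleep(random.uniform(0, delay))


service = PaymentService()
key = str(uuid.uuid4())  # created once per logical operation, reused on every retry
result = call_with_retries(lambda: service.charge(key, 19.99))
print(result["charge_id"], len(service.charges))  # ch-1 1
```

The key is generated outside the retry loop. Generating a fresh key per attempt would defeat the dedupe store and charge the customer twice.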