System design interview questions

Load Balancing System Design

Load balancers are the traffic cops of the internet. In interviews they are a lens for discussing failure detection, elasticity, and the difference between a connection that looks healthy and a backend that is actually ready to serve work. Good candidates explain not only round-robin vs least connections but also what happens during deploys: how you drain in-flight requests, how long you wait, and what you show the user while nodes rotate.

Decisions worth articulating

Global and multi-AZ angles

DNS geolocation, Anycast, latency-based routing—pick one level of depth and own it. If you have never run global traffic, say that, then describe how you would learn: shadow traffic, synthetic probes, RUM dashboards. Honesty plus a plan beats pretending you built CloudFront for fun.

Bridge to the next topic

Once traffic lands on a node, services still talk to each other. Pair this page with microservices to practice end-to-end stories: a user click flowing through gateway, service mesh, and database replicas—with load-aware retries that do not create retry storms.
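The retry-storm caveat can be made concrete with a retry budget plus jittered backoff. A minimal sketch, assuming simple token-style accounting; the class and function names are illustrative, not from any specific library:

```python
import random

class RetryBudget:
    """Token-style retry budget: retries are permitted only while the
    retry/request ratio stays under a cap, which prevents retry storms."""
    def __init__(self, ratio_cap=0.1):
        self.ratio_cap = ratio_cap
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        # Allow a retry only while retries stay under ratio_cap of requests.
        allowed = self.retries < self.ratio_cap * max(self.requests, 1)
        if allowed:
            self.retries += 1
        return allowed

def backoff_with_jitter(attempt, base=0.1, cap=2.0):
    """Full-jitter exponential backoff: a random sleep in
    [0, min(cap, base * 2**attempt)] so clients do not retry in lockstep."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

The budget is the piece that stops cascades: when a downstream service browns out, retries are capped at a fraction of live traffic instead of multiplying it.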

Health probes that match reality (Azure Load Balancer / App Gateway mindset)

Conceptual probe — the idea you defend in the room

# Conceptual probe (App Gateway / Azure LB style YAML)
protocol: Http
port: 8080
path: /health/ready   # hits DB + dependency checks, not a static 200
intervalSeconds: 5
unhealthyThreshold: 2
# drain in-flight connections before removing node from pool

The classic outage: probe hits /, returns 200, but the pod cannot reach Azure SQL—traffic keeps coming. Seniors argue for readiness vs liveness, and for draining connections before yanking a node. Say that and draw a stick figure timeline.
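The readiness idea behind that probe can be sketched as a handler that runs real dependency checks and returns 503 on any failure, so the LB pulls the node while liveness keeps the process alive. A minimal sketch; `readiness_status` and the check names are illustrative:

```python
def readiness_status(dependency_checks):
    """Run each dependency check (e.g. a cheap `SELECT 1` against the DB);
    return 200 only if every one passes, 503 otherwise so the load
    balancer removes the node from rotation."""
    failures = []
    for name, check in dependency_checks.items():
        try:
            if not check():
                failures.append(name)
        except Exception:
            failures.append(name)
    if failures:
        return 503, {"status": "unready", "failing": failures}
    return 200, {"status": "ready"}
```

Wire this behind /health/ready and keep / (or /health/live) as the shallow liveness check, and the "200 but no database" outage can no longer hide.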

Questions with sample answers

These are interview-ready outlines—sound human by swapping in your own metrics, team names, and war stories. The examples are generic on purpose so you can map them to what you actually shipped.

  1. Primary prompt

    You deploy a new version; connections are long-lived. Outline drain, cutover, and rollback.

    Mark the instance as draining: stop accepting new connections, wait for the idle timeout or max connection age, then deploy. Use connection draining on ALB/App Gateway; blue/green with a gradual traffic shift; roll back by flipping the target group back.

    Example: for WebSockets, notify clients to reconnect; dual-write during the migration window.
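The drain sequence above can be sketched as a tiny state machine: stop accepting, wait for in-flight work to reach zero or a deadline, then deploy or terminate. A sketch under simplified assumptions (single-threaded counter; names are illustrative):

```python
import time

class Drainer:
    """Connection drain: stop accepting new work, then wait until
    in-flight requests hit zero or a deadline expires."""
    def __init__(self):
        self.accepting = True
        self.in_flight = 0

    def start(self):
        if not self.accepting:
            raise RuntimeError("draining: no new connections")
        self.in_flight += 1

    def finish(self):
        self.in_flight -= 1

    def drain(self, timeout_s):
        self.accepting = False                 # 1. mark instance draining
        deadline = time.monotonic() + timeout_s
        while self.in_flight > 0 and time.monotonic() < deadline:
            pass                               # 2. wait for in-flight work
        return self.in_flight == 0             # 3. safe to deploy/terminate?
```

The return value is the decision point: True means a clean cutover, False means you either extend the window or accept dropped long-lived connections.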

  2. Primary prompt

    Why would least-connections beat round-robin for your workload—and when would it backfire?

    Least connections wins when request cost varies (long-polls, heavy queries) because it avoids piling work onto an already busy node. It backfires when health checks lie or connection counts go stale, and for very cheap uniform requests the overhead of tracking connections buys you nothing over round-robin.
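The two policies are small enough to sketch side by side, which also makes the stale-counts failure mode visible. A sketch with illustrative names:

```python
import itertools

def round_robin(backends):
    """Cycle through backends regardless of how busy each one is."""
    return itertools.cycle(backends)

def least_connections(backends, active):
    """Pick the backend with the fewest tracked active connections.
    `active` maps backend -> current connection count; if those counts
    are stale, this degrades to a roughly arbitrary choice."""
    return min(backends, key=lambda b: active[b])
```

Note that least-connections is only as good as the `active` map: the bookkeeping is exactly the overhead that makes it a poor fit for very cheap uniform requests.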

  3. Primary prompt

    How do you place health checks so you catch "returns 200 but cannot reach the database" failures?

    Use a deep readiness probe that pings the DB or runs a lightweight query, and separate liveness (process is up) from readiness (can actually serve). Failing readiness removes the node from the pool while keeping the process alive for debugging.

  4. Primary prompt

    Describe Anycast vs DNS-based global routing at a level you could whiteboard in ten minutes.

    Anycast: same IP announced from multiple POPs—BGP routes to nearest. DNS: geo or latency-based records to regional VIPs; TTL and resolver caching matter; good for active/active with health checks.
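The DNS half of that answer reduces to a selection function: answer queries with the VIP of the lowest-latency healthy region. A sketch with hypothetical inputs (`pick_region` and the region names are illustrative):

```python
def pick_region(latency_ms, healthy):
    """Latency-based DNS routing: choose the lowest-latency *healthy*
    region; health-gated so a dead region is never answered.
    latency_ms maps region -> measured latency; healthy maps region -> bool."""
    candidates = {r: ms for r, ms in latency_ms.items() if healthy.get(r)}
    if not candidates:
        raise RuntimeError("no healthy region to answer with")
    return min(candidates, key=candidates.get)
```

The part the sketch cannot capture is exactly what you flag in the room: TTL and resolver caching mean clients keep using the old answer for a while after `healthy` flips.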

Follow-ups interviewers often ask

Expect nested "why?" questions—brief answers here; expand with your production defaults.

  1. Follow-up

    What happens to in-flight requests when a backend is marked unhealthy mid-request?

    The LB usually lets the current request finish unless the process dies; the client may still see a timeout. Idempotent retries handle partial failures; document the exact behavior per product.

  2. Follow-up

    How do sticky sessions interact with autoscaling events?

    New instances receive no sticky traffic until fresh sessions arrive, and scale-in can strand users mid-session. Prefer an external session store over LB stickiness when possible, and drain before terminating.
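The session-store alternative is worth being able to sketch: once session state lives outside the backends, any instance can serve any user and scale-in stops stranding sessions. A sketch assuming an in-memory dict stands in for something like Redis (names are illustrative):

```python
class SessionStore:
    """Externalized session state (a dict here; a shared cache such as
    Redis in production) so no backend owns any user."""
    def __init__(self):
        self._data = {}

    def put(self, session_id, state):
        self._data[session_id] = state

    def get(self, session_id):
        return self._data.get(session_id)

def handle(backend_name, session_id, store):
    # Any backend resolves the same session; no LB stickiness required.
    return backend_name, store.get(session_id)
```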

  3. Follow-up

    What TLS termination strategy reduces CPU load without sacrificing security posture?

    Terminate at LB with modern cipher suites and session resumption; re-encrypt to backend if required (zero trust internal); hardware offload / TLS 1.3; keep cert rotation automated.

  4. Follow-up

    How do you detect and mitigate a slowloris-style attack at the load balancer?

    Timeouts on headers/body, max header size, connection limits per IP, WAF rules, rate limit new connections, anomaly detection on connection duration.
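One of those mitigations, the per-IP connection cap, fits in a few lines and pairs with header/body timeouts. A sketch with illustrative names:

```python
from collections import defaultdict

class ConnectionLimiter:
    """Cap concurrent connections per client IP, one of the cheap
    slowloris defenses: a single source cannot hoard the accept pool."""
    def __init__(self, max_per_ip=10):
        self.max_per_ip = max_per_ip
        self.open = defaultdict(int)

    def try_accept(self, ip):
        if self.open[ip] >= self.max_per_ip:
            return False  # reject: this IP already holds its quota
        self.open[ip] += 1
        return True

    def close(self, ip):
        self.open[ip] = max(0, self.open[ip] - 1)
```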

  5. Follow-up

    What is your observability story for per-backend latency skew?

    Per-target metrics in LB exporter, distributed traces with backend tag, compare p99 across AZ; alert when one instance is cold or misconfigured.
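The skew comparison itself is a one-liner once you have per-backend samples: compute p99 per target and flag anything far from the best. A sketch using nearest-rank percentiles (function names are illustrative):

```python
import math

def p99(samples):
    """Nearest-rank p99 over a list of latency samples (ms)."""
    s = sorted(samples)
    rank = math.ceil(0.99 * len(s))
    return s[rank - 1]

def skewed_backends(latency_by_backend, factor=2.0):
    """Flag backends whose p99 exceeds `factor` x the best backend's p99;
    a cold cache or misconfigured instance shows up here."""
    p99s = {b: p99(v) for b, v in latency_by_backend.items()}
    best = min(p99s.values())
    return [b for b, v in p99s.items() if v > factor * best]
```

In practice the samples come from the LB's per-target metrics or trace spans tagged with the backend, as above; the alert fires on the flagged list, not on a global average that hides the skew.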