Backend interview questions

SQL vs NoSQL Interview Questions

This topic is a magnet for shallow takes. Interviewers have heard "NoSQL scales better" too many times. What they want is a decision tree grounded in reads, writes, consistency, reporting, and team skill. Can the product tolerate eventual visibility? Do analysts need ad-hoc joins across entities? Will you regret embedding documents when invoices suddenly need line-item history audited for seven years? Those questions separate people who operate data from people who read blog headlines.

When SQL still wins hearts

When document or wide-column stores earn their keep

Flexible schema during discovery, hierarchical aggregates that map cleanly to a single key, write-heavy telemetry with predictable access paths—be specific. Mention how you handle schema evolution (version fields, dual writes, backfills) because that is where NoSQL projects live or die.

Migration honesty

If you have moved workloads between families, describe the risky weekend: validation queries, reconciliation jobs, feature flags. Then deepen storage mechanics with indexing and API design with REST vs GraphQL.

Cosmos partition key vs SQL row—same feature, two headaches

Cosmos item (JSON) for chat messages — hot partition risk

{
  "id": "msg_042",
  "threadId": "support-tenant-99",
  "tenantId": "tenant-99",
  "text": "Invoice still wrong",
  "_ts": 1710000000
}
// partition key = /tenantId → all megacorp traffic lands on one physical partition

Interview story: "We keyed by tenant because it felt natural—then our biggest customer DDOSed themselves with legitimate traffic." Contrast with Azure SQL where you worry about locking and index width instead. Show you can discuss access pattern before picking a logo.

Questions with sample answers

These are interview-ready outlines—sound human by swapping in your own metrics, team names, and war stories. The examples are generic on purpose so you can map them to what you actually shipped.

  1. Primary prompt

    Model a social feed with heavy read skew—compare Postgres vs a document store for v1 and v2.

    v1: Postgres with timeline table, indexes on (user_id, created_at), read replicas. v2 scale: fan-out on write to per-user feed shards in Cassandra/Dynamo or Redis lists—trade write amplification for read speed.

  2. Primary prompt

    When would you require ACID for a workflow vs accept eventual consistency?

    Money/inventory/ledger—ACID; analytics counters, likes display—eventual OK with UX tolerance; explain business consequence of each.

  3. Primary prompt

    How do you migrate from document storage to relational when reporting needs explode?

    CDC/stream to warehouse, dual-write period, ETL to normalized schema, backfill, switch BI to SQL, keep document for OLTP if needed or strangle to CQRS.

  4. Primary prompt

    Describe a case where ORM defaults pushed you toward the wrong storage choice.

    N+1 hidden lazy loads made document "simpler" until reporting needed joins—fixed with eager loading + indexes or moved hot read model to SQL.

Follow-ups interviewers often ask

Expect nested "why?" questions—brief answers here; expand with your production defaults.

  1. Follow-up

    How do backups and point-in-time recovery differ between your options?

    Managed SQL PITR continuous; Mongo oplog; S3 versioning for blobs—match RPO to product; test restores quarterly.

  2. Follow-up

    What is your multi-tenant isolation story in each datastore?

    Row-level tenant_id + RLS, schema per tenant for regulated, partition key per tenant in Cosmos—cost vs blast radius.

  3. Follow-up

    How do you handle full-text search requirements alongside OLTP?

    Elasticsearch/OpenSearch sync via CDC, or Postgres tsvector for moderate scale—don't heavy LIKE on OLTP.

  4. Follow-up

    What operational headcount does each choice imply?

    Self-hosted Mongo/ES needs more SRE; managed RDS/Dynamo shifts to vendor—honest staffing answer wins interviews.

  5. Follow-up

    How do you benchmark honestly instead of demo-ware?

    Production-like data skew, warm caches disabled where relevant, sustained load not burst only, measure tail latencies.