SQL Interview Questions and Answers — Real-World Guide for 4+ Years Experience (2026)

February 26, 2026 · Updated by Surya Singh · SQL • Database • PostgreSQL • MySQL • Interview

SQL interview preparation for senior developers — query optimization, indexing, window functions, transactions, scaling

Key Takeaways

  • 10 in-depth STAR questions covering query optimization, indexing, joins, window functions, transactions, scaling, and performance tuning
  • Real production scenarios with specific SQL examples and measurable outcomes
  • Written for developers with 4+ years of SQL/database experience
  • Rapid-fire practice rounds (Query Patterns, Data Integrity, Advanced), E-E-A-T block, and 8 FAQs

This guide covers the pillars tested in senior SQL interviews: query optimization and execution plans, indexing strategies, complex joins and subqueries, window functions, transaction isolation, normalization vs denormalization, stored procedures and triggers, database scaling, CTEs and recursive queries, and performance tuning. Every answer uses the STAR method with real scenarios from production systems.

Answers reference the PostgreSQL documentation and the MySQL documentation.

1) How do you use EXPLAIN ANALYZE, index selection, and query rewriting to optimize slow queries?

What interviewer evaluates: ability to diagnose execution plans, identify bottlenecks, and apply fixes with measurable impact.

Situation: A dashboard query for "top customers by revenue in the last 30 days" took 4.2 seconds. The orders table had 12M rows. P95 latency spiked during peak hours, causing timeouts.

Task: Reduce query time below 100ms without changing the business logic.

Action:
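A sketch of how this fix might look in PostgreSQL, based on the scenario and result described. The table and column names (orders, customer_id, total, created_at) are assumptions for illustration:

```sql
-- 1) Inspect the plan: a sequential scan over 12M orders rows was the bottleneck.
EXPLAIN ANALYZE
SELECT customer_id, SUM(total) AS revenue
FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY customer_id
ORDER BY revenue DESC
LIMIT 20;

-- 2) Add a covering index so the query can run as an index-only scan.
--    CONCURRENTLY avoids blocking writes while the index builds.
CREATE INDEX CONCURRENTLY idx_orders_created_customer
    ON orders (created_at, customer_id) INCLUDE (total);

-- 3) Re-run EXPLAIN ANALYZE and confirm "Index Only Scan" in the plan,
--    comparing estimated cost against actual time.
```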

Result: Query time dropped from 4.2s to 12ms (350x improvement). P95 latency for the dashboard dropped from 6.2s to 45ms. Zero timeout incidents in the next quarter.

What separates good from great: Show you understand cost vs actual time, when to use composite vs covering indexes, and how to verify with EXPLAIN ANALYZE. Include specific metrics.

2) How do you choose between B-tree vs hash vs GIN vs GiST, partial indexes, covering indexes, and composite indexes?

What interviewer evaluates: understanding of index types and when each fits.

Situation: An audit-log table had 80M rows. Queries filtered by tenant_id and created_at, but 60% of rows had deleted_at IS NOT NULL. A B-tree on (tenant_id, created_at) helped, but the index was huge and maintenance was slow.

Task: Reduce index size and improve query performance.

Action:
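One plausible shape for the indexes described, in PostgreSQL. Table and column names are assumptions; the jsonb payload column in particular is illustrative:

```sql
-- Partial index: only the ~40% of live rows get indexed, shrinking index
-- size and speeding up maintenance during VACUUM.
CREATE INDEX CONCURRENTLY idx_audit_tenant_created_live
    ON audit_log (tenant_id, created_at)
    WHERE deleted_at IS NULL;

-- GIN index for containment / full-text-style lookups on a jsonb column.
CREATE INDEX CONCURRENTLY idx_audit_payload
    ON audit_log USING GIN (payload);

-- Example query the partial index serves directly:
-- SELECT * FROM audit_log
-- WHERE tenant_id = $1 AND created_at >= $2 AND deleted_at IS NULL;
```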

Result: Audit-log query time: 800ms → 3ms. Index maintenance time during VACUUM dropped 50%. GIN index enabled full-text and array searches at sub-100ms.

What separates good from great: Explain B-tree vs hash vs GIN vs GiST with real use cases. Show partial and covering indexes with before/after metrics.


3) How do you handle correlated subqueries, LATERAL joins, anti-joins, and performance in complex queries?

What interviewer evaluates: ability to rewrite inefficient patterns and use advanced join semantics.

Situation: A "customers who haven't ordered in 90 days" report used a correlated subquery: WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.created_at > NOW() - INTERVAL '90 days'). With 2M customers and 15M orders, the query ran 45 minutes.

Task: Rewrite for production use within 2 minutes.

Action:
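Two rewrites consistent with the scenario, sketched in PostgreSQL (table and column names assumed from the Situation):

```sql
-- Anti-join rewrite of the correlated NOT EXISTS: one set-based pass
-- instead of a per-customer probe.
SELECT c.id, c.email
FROM customers c
LEFT JOIN orders o
       ON o.customer_id = c.id
      AND o.created_at > NOW() - INTERVAL '90 days'
WHERE o.customer_id IS NULL;

-- LATERAL for per-row dependent subqueries ("top N per group"):
SELECT c.id, recent.id AS order_id, recent.total
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.id, o.total
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3
) AS recent;
```

Note that modern PostgreSQL can already plan NOT EXISTS as a hash anti join; the rewrite mainly helps when the planner misestimates or when you need the LEFT JOIN columns anyway.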

Result: Report runtime: 45 min → 78 seconds. LATERAL pattern adopted for "top N per group" queries. Team documented the anti-join pattern for "not exists" scenarios.

What separates good from great: Demonstrate rewriting correlated subqueries to set-based operations. Show LATERAL for per-row dependent subqueries and anti-joins for NOT EXISTS.

4) How do you use ROW_NUMBER, RANK, LAG/LEAD, running totals, and moving averages in production?

What interviewer evaluates: applied use of window functions for analytics and reporting.

Situation: A finance report needed: (1) running balance per account, (2) 7-day moving average of daily revenue, (3) rank customers by revenue within each region. The original solution used application-level loops and multiple round-trips — 30 seconds for 50K accounts.

Task: Implement in a single SQL query with measurable performance.

Action:
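The three requirements map directly onto window functions. A sketch, with illustrative table and column names:

```sql
-- (1) Running balance per account.
SELECT account_id, txn_date, amount,
       SUM(amount) OVER (PARTITION BY account_id
                         ORDER BY txn_date
                         ROWS UNBOUNDED PRECEDING) AS running_balance
FROM transactions;

-- (2) 7-day moving average of daily revenue.
SELECT day, revenue,
       AVG(revenue) OVER (ORDER BY day
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7d
FROM daily_revenue;

-- (3) Rank customers by revenue within each region.
SELECT region, customer_id, revenue,
       RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS region_rank
FROM customer_revenue;
```

The ROWS frame (physical rows) rather than the default RANGE frame is what makes the moving average count exactly seven days of rows.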

Result: Single query replaced 4 round-trips and application loops. Report time: 30s → 1.2s. Running balance and moving average logic moved entirely to SQL — easier to maintain.

What separates good from great: Show ROWS vs RANGE, RANK vs ROW_NUMBER, and when LAG/LEAD eliminate self-joins. Include before/after metrics.

5) How do transaction isolation levels (read committed vs repeatable read vs serializable) and deadlock debugging affect production?

What interviewer evaluates: understanding of isolation semantics and practical deadlock resolution.

Situation: A billing job updated account balances in batches. Under load, we saw 3–5 deadlocks per hour. Logs showed deadlock detected with two transactions holding conflicting locks. Financial reports occasionally showed inconsistent snapshots when run during the batch.

Task: Reduce deadlocks and ensure report consistency.

Action:
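A sketch of the fixes implied by the Result: consistent lock ordering for the batch job, REPEATABLE READ for reports, and a blocking-query check for debugging. Table names are assumptions:

```sql
-- (1) Deterministic lock order: locking accounts in ascending id order
--     means two concurrent batches can never form a lock cycle.
BEGIN;
SELECT id FROM accounts
WHERE id = ANY($1)
ORDER BY id
FOR UPDATE;
-- ... apply balance updates ...
COMMIT;

-- (2) Reports run at REPEATABLE READ for one consistent snapshot,
--     unaffected by the concurrently running batch.
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... report queries ...
COMMIT;

-- (3) Debugging: who is blocked, and by whom.
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```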

Result: Deadlocks dropped from 3–5/hour to zero. Report consistency verified with concurrent batch runs. Retry logic handled 2 edge cases in 6 months.

What separates good from great: Explain isolation levels with real scenarios. Show deadlock debugging (pg_locks, pg_stat_activity) and consistent lock ordering as the primary fix.


6) How do you balance database normalization vs denormalization — 3NF trade-offs, materialized views, and CQRS?

What interviewer evaluates: design judgment for read vs write workloads.

Situation: A product catalog was fully normalized (products, categories, prices, inventory in separate tables). The homepage needed to show 100 products with name, price, category, and stock. A 6-table join ran 400ms; cache invalidation was complex because updates touched multiple tables.

Task: Improve read latency while preserving write consistency.

Action:
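The materialized-view approach mentioned in the Result might be sketched like this; the join structure and names are assumptions standing in for the real 6-table join:

```sql
-- Denormalized read model: the expensive join runs once per refresh,
-- not once per page load. Normalized write-path tables stay untouched.
CREATE MATERIALIZED VIEW homepage_products AS
SELECT p.id, p.name, pr.price, c.name AS category, i.stock
FROM products p
JOIN prices     pr ON pr.product_id = p.id
JOIN categories c  ON c.id = p.category_id
JOIN inventory  i  ON i.product_id = p.id;

-- A unique index is required for CONCURRENTLY refreshes.
CREATE UNIQUE INDEX ON homepage_products (id);

-- Refresh without blocking readers.
REFRESH MATERIALIZED VIEW CONCURRENTLY homepage_products;
```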

Result: Homepage P95: 400ms → 12ms. Cache invalidation simplified — one materialized view to refresh. Write path unchanged; no duplicate data in normalized tables.

What separates good from great: Explain 3NF trade-offs, when materialized views fit, and CQRS for read/write separation. Show you didn't denormalize blindly.

7) When do you use stored procedures, functions, and triggers — and how do you test and migrate them?

What interviewer evaluates: pragmatic use of server-side logic vs application code.

Situation: Order status changes triggered 4 actions: update inventory, emit event, log audit, recalculate loyalty points. Originally in application code — 4 round-trips, partial failures left inconsistent state. Team debated moving to DB triggers.

Task: Decide where logic lives and implement safely.

Action:
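A hedged sketch of the procedure that could replace the four round-trips. Everything here is illustrative (table names, the outbox pattern for event emission, the omitted loyalty step); PostgreSQL 11+ procedures run the whole body in one transaction, so there are no partial failures:

```sql
CREATE OR REPLACE PROCEDURE change_order_status(p_order_id BIGINT, p_status TEXT)
LANGUAGE plpgsql AS $$
BEGIN
    -- All four actions commit or roll back together.
    UPDATE orders SET status = p_status WHERE id = p_order_id;

    UPDATE inventory i SET reserved = i.reserved - oi.qty
    FROM order_items oi
    WHERE oi.order_id = p_order_id AND i.product_id = oi.product_id;

    INSERT INTO audit_log (order_id, event, logged_at)
    VALUES (p_order_id, 'status:' || p_status, NOW());

    -- Event emitted via a transactional outbox table, drained by a worker.
    INSERT INTO outbox (topic, payload)
    VALUES ('order.status',
            jsonb_build_object('id', p_order_id, 'status', p_status));
    -- Loyalty-point recalculation omitted for brevity.
END;
$$;

CALL change_order_status(42, 'shipped');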

Result: Zero partial failures after migration. Procedure completed in 45ms vs 180ms for 4 round-trips. Audit trigger caught 2 bugs where application missed logging.

What separates good from great: Explain when procedures vs triggers vs application code. Mention testing (pgTAP, integration tests) and migration practices.

8) How do you implement read replicas, sharding strategies, connection pooling (PgBouncer), and partitioning?

What interviewer evaluates: scaling architecture and operational knowledge.

Situation: A SaaS had 8 app servers, each opening 50 DB connections. PostgreSQL max_connections was 200. Under load, we hit "too many connections" and requests failed. Read/write ratio was 9:1.

Task: Scale reads and fix connection exhaustion.

Action:
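Connection exhaustion is typically fixed at the pooler (PgBouncer in transaction mode multiplexing the 400 client connections onto a small server-side pool), while the event-query win comes from range partitioning. A partitioning sketch with assumed names:

```sql
-- Range partitioning by month; date-bounded queries prune to one partition.
CREATE TABLE events (
    id         BIGINT GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ NOT NULL,
    payload    JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_02 PARTITION OF events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Verify pruning: the plan should touch only events_2026_02.
EXPLAIN SELECT count(*) FROM events
WHERE created_at >= '2026-02-01' AND created_at < '2026-03-01';
```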

Result: Connection exhaustion resolved. Read traffic offloaded to replicas — primary CPU dropped 40%. Partitioning reduced event query time from 12s to 1.2s for date-bounded queries.

What separates good from great: Show PgBouncer config, read replica routing with lag handling, and partitioning with concrete examples. Mention sharding as next step.


9) How do CTEs and recursive queries help with hierarchical data, graph traversal, org charts, and bill of materials?

What interviewer evaluates: applied recursive query patterns.

Situation: An org chart had 5 levels of managers. A report needed "all reports under VP X" — including indirect reports. The application used recursive loops and N+1 queries; 2000 employees took 45 minutes.

Task: Implement in SQL with sub-second response.

Action:
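The standard recursive CTE shape for this report, with anchor, recursive term, depth tracking, and a cycle guard. Table and column names are assumptions:

```sql
WITH RECURSIVE reports AS (
    -- Anchor: the VP herself.
    SELECT id, name, manager_id,
           1         AS depth,
           ARRAY[id] AS path
    FROM employees
    WHERE id = $1

    UNION ALL

    -- Recursive term: everyone managed by someone already in the set.
    SELECT e.id, e.name, e.manager_id,
           r.depth + 1,
           r.path || e.id
    FROM employees e
    JOIN reports r ON e.manager_id = r.id
    WHERE e.id <> ALL(r.path)          -- cycle guard
)
SELECT id, name, depth FROM reports;
```

The path array doubles as the cycle guard and as a breadcrumb trail for display; without a termination condition like this, bad data with a managerial loop would recurse forever.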

Result: Org report: 45 min → 0.8 seconds. Recursive CTE became the standard for hierarchical queries. BOM costing query also migrated — 30 min → 2s.

What separates good from great: Show anchor + recursive term structure, path/depth tracking, and cycle prevention. Include metrics.

10) How do you use slow query logging, pg_stat_statements, vacuuming, and connection pool monitoring for performance tuning?

What interviewer evaluates: observability and operational tuning practices.

Situation: Production DB had intermittent latency spikes. No visibility into which queries caused them. Bloat was suspected — some tables had 30% dead tuples but auto-vacuum wasn't keeping up.

Task: Implement monitoring and fix the root causes.

Action:
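A sketch of the monitoring and tuning commands this scenario points at. Column names follow PostgreSQL 13+ (total_exec_time); the table name in the autovacuum tweak is an assumption:

```sql
-- Top queries by cumulative time (requires the pg_stat_statements extension).
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Log every statement slower than 500 ms.
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();

-- More aggressive autovacuum for a high-churn table so dead tuples
-- are reclaimed before bloat builds up.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold    = 1000
);
```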

Result: Revenue report optimized (rewrite + index) — 12s → 0.4s. Bloat under control. Slow query log surfaced 3 more optimizations. P95 latency dropped 40%.

What separates good from great: Show pg_stat_statements for top queries, slow query logging, vacuum tuning for high-churn tables, and connection pool monitoring.

Rapid-fire interview practice — STAR answers

60-second verbal answers. Practice out loud.

Round 1: Query Patterns (MERGE/UPSERT, pivoting, JSON columns, full-text search)

Q: How do you implement MERGE/UPSERT in PostgreSQL?

Situation: Syncing user data from an external API — insert new, update existing. Originally did SELECT then INSERT or UPDATE in two round-trips.
Task: Single-statement upsert.
Action: INSERT INTO users (id, name, updated_at) VALUES ($1, $2, NOW()) ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, updated_at = NOW(). Requires a unique constraint/index on the conflict target. EXCLUDED references the row that would have been inserted. For bulk loads, pair INSERT ... ON CONFLICT with unnest() or a multi-row VALUES list. PostgreSQL 15+ also supports the standard MERGE statement.
Result: Sync time cut 60%; one round-trip. No race between SELECT and INSERT.

Q: How do you pivot rows to columns?

Situation: Sales by month needed as columns (Jan, Feb, ...) for a report.
Task: Pivot without application logic.
Action: SELECT product_id, MAX(CASE WHEN month = 1 THEN amount END) AS jan, MAX(CASE WHEN month = 2 THEN amount END) AS feb, ... FROM sales GROUP BY product_id. Or crosstab() from tablefunc extension for dynamic pivoting. Filter + aggregate in CASE.
Result: Single query, report-ready. Used in BI dashboard.

Q: When do you use JSON/JSONB columns?

Situation: Product attributes varied by category — 20+ optional fields. Normalizing to 20 columns was impractical.
Task: Store flexible schema without sacrificing queryability.
Action: JSONB column attributes. GIN index for containment: CREATE INDEX idx_attrs ON products USING GIN (attributes). Queries: WHERE attributes @> '{"color": "red"}'. Extracted frequently queried keys to generated columns + B-tree. Avoided for primary relational data.
Result: Schema flexibility; GIN enabled fast attribute filters. 3x easier to add new attributes.

Q: How do you implement full-text search?

Situation: Product search by keyword. LIKE %term% was slow (no index use) and no ranking.
Task: Fast, ranked full-text search.
Action: to_tsvector('english', name || ' ' || description) stored in generated column. GIN index: CREATE INDEX idx_fts ON products USING GIN (search_vector). Query: WHERE search_vector @@ plainto_tsquery('english', $1). Rank with ts_rank(search_vector, query). Consider trigram (pg_trgm) for fuzzy/partial match.
Result: Search latency: 800ms → 15ms. Ranking improved relevance.

Round 2: Data Integrity (constraints, cascades, migration safety, zero-downtime schema changes)

Q: How do you choose FK constraints and cascades?

Situation: Orders referenced customers. Deleting a customer could orphan orders or fail.
Task: Enforce referential integrity with correct cascade.
Action: REFERENCES customers(id) ON DELETE RESTRICT — prevent customer delete if orders exist. For soft delete, no cascade; application logic. ON DELETE CASCADE for dependent children (e.g., order_items when order deleted). ON DELETE SET NULL when FK is optional. Document choice per relationship.
Result: No orphaned rows. Cascade behavior predictable.

Q: How do you do zero-downtime schema changes?

Situation: Adding a NOT NULL column to 50M-row table. ALTER TABLE ADD COLUMN x INT NOT NULL would lock and rewrite.
Task: Add column without blocking reads/writes.
Action: (1) ALTER TABLE ADD COLUMN x INT — nullable, fast. (2) Backfill in batches: UPDATE t SET x = computed WHERE id IN (...) AND x IS NULL. (3) ALTER TABLE ADD CONSTRAINT nn_x CHECK (x IS NOT NULL) NOT VALID, then VALIDATE CONSTRAINT (validation takes only a weak lock). (4) Add a default if needed. PostgreSQL 11+ can add a column with a constant default without rewriting the table.
Result: No lock held longer than 1 second. Applied to 4 large tables.
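The staged migration can be sketched end to end like this (table and column names are illustrative):

```sql
-- Step 1: metadata-only add, nullable — effectively instant.
ALTER TABLE big_table ADD COLUMN x INT;

-- Step 2: backfill in small id-range batches to keep row locks short;
--         loop over ranges in application or migration-tool code.
UPDATE big_table SET x = 0
WHERE id BETWEEN 1 AND 10000 AND x IS NULL;

-- Step 3: enforce NOT NULL without a long table-scan lock.
ALTER TABLE big_table
    ADD CONSTRAINT x_not_null CHECK (x IS NOT NULL) NOT VALID;
ALTER TABLE big_table VALIDATE CONSTRAINT x_not_null;

-- Step 4 (optional): set a default for future inserts.
ALTER TABLE big_table ALTER COLUMN x SET DEFAULT 0;
```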

Q: How do you ensure migration safety?

Situation: A migration dropped a column that an old app version still queried — outage.
Task: Safe migration process.
Action: Backward-compatible migrations: add before remove. Deprecation window: add new column, backfill, deploy app to use new column, remove old. Rollback plan: every migration has a down script. Test on staging with production-like data. Use transactions where possible: BEGIN; migration; COMMIT or wrap in migration tool.
Result: Zero migration-related outages in 18 months.

Q: What constraints do you use and when?

Situation: Invalid data (negative prices, duplicate emails) reached production.
Task: Enforce at DB layer.
Action: NOT NULL for required fields. UNIQUE for natural keys (email, slug). CHECK for domain rules: CHECK (amount >= 0), CHECK (status IN (...)). FK for referential integrity. Exclusion constraints for "no overlapping" (e.g., booking ranges). DB constraints are last line of defense — application validates first.
Result: 5 invalid inserts caught at DB before reaching production.

Round 3: Advanced (query planner internals, pg_stat_user_tables, replication lag, backup strategies)

Q: How does the query planner choose between nested loop, hash join, and merge join?

Situation: A join was slow; EXPLAIN showed nested loop with large inner side.
Task: Understand and influence planner choice.
Action: Nested loop: small outer side with an indexed inner — degrades toward O(n*m) when the inner side isn't indexed. Hash join: one side built into a hash table, the other probed — good for large unsorted sets. Merge join: both sides sorted on the join key — good for large pre-sorted inputs. The planner picks via its cost model (cpu_tuple_cost, random_page_cost). Fixes: refresh statistics with ANALYZE, add an index, or SET enable_nestloop = off as a diagnostic to test whether a hash join wins.
Result: Adding index changed plan to index nested loop; 8s → 0.2s.

Q: How do you use pg_stat_user_tables?

Situation: Need to find tables with high sequential scans or dead tuples.
Task: Identify optimization targets.
Action: SELECT relname, seq_scan, idx_scan, n_live_tup, n_dead_tup, last_vacuum FROM pg_stat_user_tables. High seq_scan + low idx_scan → consider an index. High n_dead_tup relative to n_live_tup → vacuum needed. last_vacuum / last_autovacuum show recency. Join with pg_total_relation_size() for bloat analysis.
Result: Found 3 tables with 90%+ seq scans; added indexes — 50% workload speedup.

Q: How do you detect and handle replication lag?

Situation: Read replicas sometimes returned stale data right after a write.
Task: Detect lag and route critical reads correctly.
Action: SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn() on replica; compare to primary's pg_current_wal_lsn(). pg_stat_replication shows lag. Application: after write, route next read to primary (read-your-writes). Or check lag before query — if > 5s, use primary. Alert when lag > 30s.
Result: Zero stale reads for critical paths. Alerts caught 2 replica delays.

Q: What backup strategies do you use?

Situation: Needed RPO < 1 hour, RTO < 4 hours.
Task: Implement backup and restore.
Action: Continuous WAL archiving (pg_basebackup base backups plus archive_command, or WAL-G) for point-in-time recovery. Full backup weekly, incremental via WAL. Test restore quarterly — measure actual RTO. Offsite copy (S3, separate region). Document the restore procedure. For critical tables: logical replication to a standby as an additional safeguard.
Result: One restore test caught misconfigured archive; fixed before real incident. RTO verified at 2.5 hours.


From real experience

"I've conducted 100+ database and backend interviews. The candidates who stand out don't just know SQL syntax — they can read an EXPLAIN plan and tell you exactly why a query is slow. They've added a partial index and measured the before/after. They've debugged a deadlock using pg_locks. They know when to use a recursive CTE vs application-level recursion and can explain the trade-offs."

"For 4+ years experience, interviewers expect production war stories: a query that took 45 minutes and how you rewrote it to 78 seconds, how you scaled with PgBouncer and read replicas, how you did zero-downtime schema changes on a 50M-row table. If you haven't lived these, study these answers and practice them. The STAR format with specific metrics is what gets the offer."
— Surya Singh, Senior Software Engineer & Technical Interviewer


Frequently asked questions

What SQL topics are tested at the 4+ years experience level?

Query optimization and execution plans (EXPLAIN ANALYZE, index selection), indexing strategies (B-tree vs hash vs GIN vs GiST, partial/covering/composite indexes), complex joins and subqueries (correlated subqueries, LATERAL joins, anti-joins), window functions (ROW_NUMBER, RANK, LAG/LEAD), transaction isolation levels, normalization vs denormalization, stored procedures/triggers, database scaling (read replicas, sharding, connection pooling), CTEs and recursive queries, and performance tuning (slow query logging, pg_stat_statements, vacuuming).

How important is understanding EXPLAIN ANALYZE for senior SQL roles?

Critical. Interviewers expect you to interpret execution plans — identify sequential scans that should use indexes, spot nested loops vs hash joins, understand cost estimates vs actual timings, and explain when the planner chooses one strategy over another. You should be able to take a slow query, run EXPLAIN ANALYZE, identify the bottleneck (e.g., a 2.3M-row seq scan on orders), and propose a fix (index on customer_id + created_at). Metrics matter: "After adding the composite index, execution time dropped from 4.2s to 12ms."

When do you choose B-tree vs GIN vs GiST indexes?

B-tree: default for equality, range, and ordering (WHERE id = 5, WHERE created_at > ..., ORDER BY). GIN: multi-value and full-text search (array contains, @>, to_tsvector). GiST: geometric data, range types, full-text when you need ranking. Partial indexes: when you query a subset (WHERE status = 'active'). Covering indexes (INCLUDE): when all SELECT columns are in the index — index-only scan. Bring a real example: "We added a partial index on (tenant_id) WHERE deleted_at IS NULL — queries dropped from 800ms to 3ms."

How do you approach transaction isolation level questions?

Explain the four levels: read uncommitted (rarely used), read committed (default in PostgreSQL — each statement sees data committed before it started), repeatable read (snapshot taken at transaction start, prevents dirty and non-repeatable reads), serializable (strictest — detects conflicts). For deadlocks: use pg_locks and pg_stat_activity to identify blocking queries; resolve with shorter transactions, consistent lock ordering, or retry logic. Mention real trade-offs: "We use REPEATABLE READ for financial reports to avoid phantom reads; for high-throughput writes we stay at READ COMMITTED and handle conflicts in application logic."

What scaling strategies should I discuss for SQL databases?

Read replicas: offload SELECT to replicas; handle replication lag (eventual consistency). Connection pooling: PgBouncer or Pgpool-II to limit connections and avoid exhausting max_connections. Partitioning: range (by date), list (by region), hash (for distribution) — improves maintenance and query pruning. Sharding: horizontal split by shard key; discuss consistent hashing, cross-shard queries, and rebalancing. CQRS/materialized views: denormalize for reads when writes are infrequent. Bring metrics: "PgBouncer let 8 app servers (400 client connections in total) share 50 DB connections; connection errors dropped to zero."

How do CTEs and recursive queries help in production?

CTEs (WITH): improve readability, materialize intermediate results, avoid repeating subqueries. Recursive CTEs: hierarchical data (org charts, bill of materials, graph traversal). Examples: employee hierarchies, category trees, path finding. Know the structure: anchor (base case) UNION ALL recursive term (references the CTE). Pitfall: a missing termination condition causes an infinite loop. Real use: "We used a recursive CTE to flatten a 5-level category tree; the report that took 45 minutes with application-level recursion now runs in 2 seconds."

What performance tuning tools and practices should I know?

Slow query logging: log_min_duration_statement to capture queries above a threshold. pg_stat_statements: top N queries by total time, mean time, calls. pg_stat_user_tables: seq_scan, idx_scan, n_tup_ins/upd/del — identify hot tables. VACUUM: reclaim dead rows; ANALYZE for statistics. Autovacuum tuning: adjust threshold and scale factor for high-churn tables. Connection pool monitoring: track active vs idle connections. Bring a debugging story: "pg_stat_statements showed a single query accounting for 60% of DB time; we rewrote it and cut P95 latency by 40%."

How do I demonstrate database design maturity in an interview?

Discuss 3NF trade-offs: when to normalize vs denormalize. When denormalization helps: read-heavy workloads, avoid expensive joins, materialized views for aggregations. CQRS: separate read and write models; read model can be denormalized. Migration strategy: zero-downtime schema changes (add column nullable, backfill, add constraint, add default). Discuss constraints, cascades, and when to enforce integrity in DB vs application. Show you think about indexes at design time, not just after production issues.


Surya Singh

Azure Solutions Architect & AI Engineer

Microsoft-certified Azure Solutions Architect with 8+ years in enterprise software, cloud architecture, and AI/ML deployment. I build production AI systems and write about what actually works—based on shipping code, not theory.

  • Microsoft Certified: Azure Solutions Architect Expert
  • Built 20+ production AI/ML pipelines on Azure
  • 8+ years in .NET, C#, and cloud-native architecture