SQL Interview Questions and Answers — Real-World Guide for 4+ Years Experience (2026)

February 26, 2026 · Updated by Surya Singh · SQL • Database • PostgreSQL • MySQL • Interview

SQL interview preparation for senior developers — query optimization, indexing, window functions, transactions, scaling

Key Takeaways

  • 10 in-depth STAR questions covering query optimization, indexing, joins, window functions, transactions, scaling, and performance tuning
  • Real production scenarios with specific SQL examples and measurable outcomes
  • Written for developers with 4+ years of SQL/database experience
  • Rapid-fire practice rounds (Query Patterns, Data Integrity, Advanced), E-E-A-T block, and 8 FAQs

This guide covers the pillars tested in senior SQL interviews: query optimization and execution plans, indexing strategies, complex joins and subqueries, window functions, transaction isolation, normalization vs denormalization, stored procedures and triggers, database scaling, CTEs and recursive queries, and performance tuning. Every answer uses the STAR method with real scenarios from production systems.

Answers reference the PostgreSQL documentation and the MySQL documentation.

1) How do you use EXPLAIN ANALYZE, index selection, and query rewriting to optimize slow queries?

What interviewer evaluates: ability to diagnose execution plans, identify bottlenecks, and apply fixes with measurable impact.

Situation: A dashboard query for "top customers by revenue in the last 30 days" took 4.2 seconds. The orders table had 12M rows. P95 latency spiked during peak hours, causing timeouts.

Task: Reduce query time below 100ms without changing the business logic.

Action:
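A sketch of how this fix might look in PostgreSQL, based on the scenario and result described. The table and column names (orders, customer_id, total, created_at) are assumptions for illustration:

```sql
-- 1) Inspect the plan: a sequential scan over 12M orders rows was the bottleneck.
EXPLAIN ANALYZE
SELECT customer_id, SUM(total) AS revenue
FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY customer_id
ORDER BY revenue DESC
LIMIT 20;

-- 2) Add a covering index so the query can run as an index-only scan.
--    CONCURRENTLY avoids blocking writes while the index builds.
CREATE INDEX CONCURRENTLY idx_orders_created_customer
    ON orders (created_at, customer_id) INCLUDE (total);

-- 3) Re-run EXPLAIN ANALYZE and confirm "Index Only Scan" in the plan,
--    comparing estimated cost against actual time.
```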

Result: Query time dropped from 4.2s to 12ms (350x improvement). P95 latency for the dashboard dropped from 6.2s to 45ms. Zero timeout incidents in the next quarter.

What separates good from great: Show you understand cost vs actual time, when to use composite vs covering indexes, and how to verify with EXPLAIN ANALYZE. Include specific metrics.

2) How do you choose between B-tree vs hash vs GIN vs GiST, partial indexes, covering indexes, and composite indexes?

What interviewer evaluates: understanding of index types and when each fits.

Situation: An audit-log table had 80M rows. Queries filtered by tenant_id and created_at, but 60% of rows had deleted_at IS NOT NULL. A B-tree on (tenant_id, created_at) helped, but the index was huge and maintenance was slow.

Task: Reduce index size and improve query performance.

Action:
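One plausible shape for the indexes described, in PostgreSQL. Table and column names are assumptions; the jsonb payload column in particular is illustrative:

```sql
-- Partial index: only the ~40% of live rows get indexed, shrinking index
-- size and speeding up maintenance during VACUUM.
CREATE INDEX CONCURRENTLY idx_audit_tenant_created_live
    ON audit_log (tenant_id, created_at)
    WHERE deleted_at IS NULL;

-- GIN index for containment / full-text-style lookups on a jsonb column.
CREATE INDEX CONCURRENTLY idx_audit_payload
    ON audit_log USING GIN (payload);

-- Example query the partial index serves directly:
-- SELECT * FROM audit_log
-- WHERE tenant_id = $1 AND created_at >= $2 AND deleted_at IS NULL;
```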

Result: Audit-log query time: 800ms → 3ms. Index maintenance time during VACUUM dropped 50%. GIN index enabled full-text and array searches at sub-100ms.

What separates good from great: Explain B-tree vs hash vs GIN vs GiST with real use cases. Show partial and covering indexes with before/after metrics.


3) How do you handle correlated subqueries, LATERAL joins, anti-joins, and performance in complex queries?

What interviewer evaluates: ability to rewrite inefficient patterns and use advanced join semantics.

Situation: A "customers who haven't ordered in 90 days" report used a correlated subquery: WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id AND o.created_at > NOW() - INTERVAL '90 days'). With 2M customers and 15M orders, the query ran 45 minutes.

Task: Rewrite for production use within 2 minutes.

Action:
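Two rewrites consistent with the scenario, sketched in PostgreSQL (table and column names assumed from the Situation):

```sql
-- Anti-join rewrite of the correlated NOT EXISTS: one set-based pass
-- instead of a per-customer probe.
SELECT c.id, c.email
FROM customers c
LEFT JOIN orders o
       ON o.customer_id = c.id
      AND o.created_at > NOW() - INTERVAL '90 days'
WHERE o.customer_id IS NULL;

-- LATERAL for per-row dependent subqueries ("top N per group"):
SELECT c.id, recent.id AS order_id, recent.total
FROM customers c
CROSS JOIN LATERAL (
    SELECT o.id, o.total
    FROM orders o
    WHERE o.customer_id = c.id
    ORDER BY o.created_at DESC
    LIMIT 3
) AS recent;
```

Note that modern PostgreSQL can already plan NOT EXISTS as a hash anti join; the rewrite mainly helps when the planner misestimates or when you need the LEFT JOIN columns anyway.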

Result: Report runtime: 45 min → 78 seconds. LATERAL pattern adopted for "top N per group" queries. Team documented the anti-join pattern for "not exists" scenarios.

What separates good from great: Demonstrate rewriting correlated subqueries to set-based operations. Show LATERAL for per-row dependent subqueries and anti-joins for NOT EXISTS.

4) How do you use ROW_NUMBER, RANK, LAG/LEAD, running totals, and moving averages in production?

What interviewer evaluates: applied use of window functions for analytics and reporting.

Situation: A finance report needed: (1) running balance per account, (2) 7-day moving average of daily revenue, (3) rank customers by revenue within each region. The original solution used application-level loops and multiple round-trips — 30 seconds for 50K accounts.

Task: Implement in a single SQL query with measurable performance.

Action:
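The three requirements map directly onto window functions. A sketch, with illustrative table and column names:

```sql
-- (1) Running balance per account.
SELECT account_id, txn_date, amount,
       SUM(amount) OVER (PARTITION BY account_id
                         ORDER BY txn_date
                         ROWS UNBOUNDED PRECEDING) AS running_balance
FROM transactions;

-- (2) 7-day moving average of daily revenue.
SELECT day, revenue,
       AVG(revenue) OVER (ORDER BY day
                          ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS ma_7d
FROM daily_revenue;

-- (3) Rank customers by revenue within each region.
SELECT region, customer_id, revenue,
       RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS region_rank
FROM customer_revenue;
```

The ROWS frame (physical rows) rather than the default RANGE frame is what makes the moving average count exactly seven days of rows.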

Result: Single query replaced 4 round-trips and application loops. Report time: 30s → 1.2s. Running balance and moving average logic moved entirely to SQL — easier to maintain.

What separates good from great: Show ROWS vs RANGE, RANK vs ROW_NUMBER, and when LAG/LEAD eliminate self-joins. Include before/after metrics.

5) How do transaction isolation levels (read committed vs repeatable read vs serializable) and deadlock debugging affect production?

What interviewer evaluates: understanding of isolation semantics and practical deadlock resolution.

Situation: A billing job updated account balances in batches. Under load, we saw 3–5 deadlocks per hour. Logs showed deadlock detected with two transactions holding conflicting locks. Financial reports occasionally showed inconsistent snapshots when run during the batch.

Task: Reduce deadlocks and ensure report consistency.

Action:
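A sketch of the fixes implied by the Result: consistent lock ordering for the batch job, REPEATABLE READ for reports, and a blocking-query check for debugging. Table names are assumptions:

```sql
-- (1) Deterministic lock order: locking accounts in ascending id order
--     means two concurrent batches can never form a lock cycle.
BEGIN;
SELECT id FROM accounts
WHERE id = ANY($1)
ORDER BY id
FOR UPDATE;
-- ... apply balance updates ...
COMMIT;

-- (2) Reports run at REPEATABLE READ for one consistent snapshot,
--     unaffected by the concurrently running batch.
BEGIN ISOLATION LEVEL REPEATABLE READ;
-- ... report queries ...
COMMIT;

-- (3) Debugging: who is blocked, and by whom.
SELECT pid, pg_blocking_pids(pid) AS blocked_by, state, query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```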

Result: Deadlocks dropped from 3–5/hour to zero. Report consistency verified with concurrent batch runs. Retry logic handled 2 edge cases in 6 months.

What separates good from great: Explain isolation levels with real scenarios. Show deadlock debugging (pg_locks, pg_stat_activity) and consistent lock ordering as the primary fix.


6) How do you balance database normalization vs denormalization — 3NF trade-offs, materialized views, and CQRS?

What interviewer evaluates: design judgment for read vs write workloads.

Situation: A product catalog was fully normalized (products, categories, prices, inventory in separate tables). The homepage needed to show 100 products with name, price, category, and stock. A 6-table join ran 400ms; cache invalidation was complex because updates touched multiple tables.

Task: Improve read latency while preserving write consistency.

Action:
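The materialized-view approach mentioned in the Result might be sketched like this; the join structure and names are assumptions standing in for the real 6-table join:

```sql
-- Denormalized read model: the expensive join runs once per refresh,
-- not once per page load. Normalized write-path tables stay untouched.
CREATE MATERIALIZED VIEW homepage_products AS
SELECT p.id, p.name, pr.price, c.name AS category, i.stock
FROM products p
JOIN prices     pr ON pr.product_id = p.id
JOIN categories c  ON c.id = p.category_id
JOIN inventory  i  ON i.product_id = p.id;

-- A unique index is required for CONCURRENTLY refreshes.
CREATE UNIQUE INDEX ON homepage_products (id);

-- Refresh without blocking readers.
REFRESH MATERIALIZED VIEW CONCURRENTLY homepage_products;
```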

Result: Homepage P95: 400ms → 12ms. Cache invalidation simplified — one materialized view to refresh. Write path unchanged; no duplicate data in normalized tables.

What separates good from great: Explain 3NF trade-offs, when materialized views fit, and CQRS for read/write separation. Show you didn't denormalize blindly.

7) When do you use stored procedures, functions, and triggers — and how do you test and migrate them?

What interviewer evaluates: pragmatic use of server-side logic vs application code.

Situation: Order status changes triggered 4 actions: update inventory, emit event, log audit, recalculate loyalty points. Originally in application code — 4 round-trips, partial failures left inconsistent state. Team debated moving to DB triggers.

Task: Decide where logic lives and implement safely.

Action:
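A hedged sketch of the procedure that could replace the four round-trips. Everything here is illustrative (table names, the outbox pattern for event emission, the omitted loyalty step); PostgreSQL 11+ procedures run the whole body in one transaction, so there are no partial failures:

```sql
CREATE OR REPLACE PROCEDURE change_order_status(p_order_id BIGINT, p_status TEXT)
LANGUAGE plpgsql AS $$
BEGIN
    -- All four actions commit or roll back together.
    UPDATE orders SET status = p_status WHERE id = p_order_id;

    UPDATE inventory i SET reserved = i.reserved - oi.qty
    FROM order_items oi
    WHERE oi.order_id = p_order_id AND i.product_id = oi.product_id;

    INSERT INTO audit_log (order_id, event, logged_at)
    VALUES (p_order_id, 'status:' || p_status, NOW());

    -- Event emitted via a transactional outbox table, drained by a worker.
    INSERT INTO outbox (topic, payload)
    VALUES ('order.status',
            jsonb_build_object('id', p_order_id, 'status', p_status));
    -- Loyalty-point recalculation omitted for brevity.
END;
$$;

CALL change_order_status(42, 'shipped');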

Result: Zero partial failures after migration. Procedure completed in 45ms vs 180ms for 4 round-trips. Audit trigger caught 2 bugs where application missed logging.

What separates good from great: Explain when procedures vs triggers vs application code. Mention testing (pgTAP, integration tests) and migration practices.

8) How do you implement read replicas, sharding strategies, connection pooling (PgBouncer), and partitioning?

What interviewer evaluates: scaling architecture and operational knowledge.

Situation: A SaaS had 8 app servers, each opening 50 DB connections. PostgreSQL max_connections was 200. Under load, we hit "too many connections" and requests failed. Read/write ratio was 9:1.

Task: Scale reads and fix connection exhaustion.

Action:
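Connection exhaustion is typically fixed at the pooler (PgBouncer in transaction mode multiplexing the 400 client connections onto a small server-side pool), while the event-query win comes from range partitioning. A partitioning sketch with assumed names:

```sql
-- Range partitioning by month; date-bounded queries prune to one partition.
CREATE TABLE events (
    id         BIGINT GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ NOT NULL,
    payload    JSONB
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2026_02 PARTITION OF events
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

-- Verify pruning: the plan should touch only events_2026_02.
EXPLAIN SELECT count(*) FROM events
WHERE created_at >= '2026-02-01' AND created_at < '2026-03-01';
```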

Result: Connection exhaustion resolved. Read traffic offloaded to replicas — primary CPU dropped 40%. Partitioning reduced event query time from 12s to 1.2s for date-bounded queries.

What separates good from great: Show PgBouncer config, read replica routing with lag handling, and partitioning with concrete examples. Mention sharding as next step.


9) How do CTEs and recursive queries help with hierarchical data, graph traversal, org charts, and bill of materials?

What interviewer evaluates: applied recursive query patterns.

Situation: An org chart had 5 levels of managers. A report needed "all reports under VP X" — including indirect reports. The application used recursive loops and N+1 queries; 2000 employees took 45 minutes.

Task: Implement in SQL with sub-second response.

Action:
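The standard recursive CTE shape for this report, with anchor, recursive term, depth tracking, and a cycle guard. Table and column names are assumptions:

```sql
WITH RECURSIVE reports AS (
    -- Anchor: the VP herself.
    SELECT id, name, manager_id,
           1         AS depth,
           ARRAY[id] AS path
    FROM employees
    WHERE id = $1

    UNION ALL

    -- Recursive term: everyone managed by someone already in the set.
    SELECT e.id, e.name, e.manager_id,
           r.depth + 1,
           r.path || e.id
    FROM employees e
    JOIN reports r ON e.manager_id = r.id
    WHERE e.id <> ALL(r.path)          -- cycle guard
)
SELECT id, name, depth FROM reports;
```

The path array doubles as the cycle guard and as a breadcrumb trail for display; without a termination condition like this, bad data with a managerial loop would recurse forever.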

Result: Org report: 45 min → 0.8 seconds. Recursive CTE became the standard for hierarchical queries. BOM costing query also migrated — 30 min → 2s.

What separates good from great: Show anchor + recursive term structure, path/depth tracking, and cycle prevention. Include metrics.

10) How do you use slow query logging, pg_stat_statements, vacuuming, and connection pool monitoring for performance tuning?

What interviewer evaluates: observability and operational tuning practices.

Situation: Production DB had intermittent latency spikes. No visibility into which queries caused them. Bloat was suspected — some tables had 30% dead tuples but auto-vacuum wasn't keeping up.

Task: Implement monitoring and fix the root causes.

Action:
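A sketch of the monitoring and tuning commands this scenario points at. Column names follow PostgreSQL 13+ (total_exec_time); the table name in the autovacuum tweak is an assumption:

```sql
-- Top queries by cumulative time (requires the pg_stat_statements extension).
SELECT query, calls, total_exec_time, mean_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Log every statement slower than 500 ms.
ALTER SYSTEM SET log_min_duration_statement = '500ms';
SELECT pg_reload_conf();

-- More aggressive autovacuum for a high-churn table so dead tuples
-- are reclaimed before bloat builds up.
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold    = 1000
);
```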

Result: Revenue report optimized (rewrite + index) — 12s → 0.4s. Bloat under control. Slow query log surfaced 3 more optimizations. P95 latency dropped 40%.

What separates good from great: Show pg_stat_statements for top queries, slow query logging, vacuum tuning for high-churn tables, and connection pool monitoring.

Rapid-fire interview practice — STAR answers

60-second verbal answers. Practice out loud.

Round 1: Query Patterns (MERGE/UPSERT, pivoting, JSON columns, full-text search)

Q: How do you implement MERGE/UPSERT in PostgreSQL?

Situation: Syncing user data from an external API — insert new, update existing. Originally did SELECT then INSERT or UPDATE in two round-trips.
Task: Single-statement upsert.
Action: INSERT INTO users (id, name, updated_at) VALUES ($1, $2, NOW()) ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name, updated_at = NOW(). Requires a unique constraint/index on the conflict target. EXCLUDED references the row that would have been inserted. For bulk loads, pair INSERT ... ON CONFLICT with unnest() or a multi-row VALUES list. PostgreSQL 15+ also supports the standard MERGE statement.
Result: Sync time cut 60%; one round-trip. No race between SELECT and INSERT.

Q: How do you pivot rows to columns?

Situation: Sales by month needed as columns (Jan, Feb, ...) for a report.
Task: Pivot without application logic.
Action: SELECT product_id, MAX(CASE WHEN month = 1 THEN amount END) AS jan, MAX(CASE WHEN month = 2 THEN amount END) AS feb, ... FROM sales GROUP BY product_id. Or crosstab() from tablefunc extension for dynamic pivoting. Filter + aggregate in CASE.
Result: Single query, report-ready. Used in BI dashboard.

Q: When do you use JSON/JSONB columns?

Situation: Product attributes varied by category — 20+ optional fields. Normalizing to 20 columns was impractical.
Task: Store flexible schema without sacrificing queryability.
Action: JSONB column attributes. GIN index for containment: CREATE INDEX idx_attrs ON products USING GIN (attributes). Queries: WHERE attributes @> '{"color": "red"}'. Extracted frequently queried keys to generated columns + B-tree. Avoided for primary relational data.
Result: Schema flexibility; GIN enabled fast attribute filters. 3x easier to add new attributes.

Q: How do you implement full-text search?

Situation: Product search by keyword. LIKE %term% was slow (no index use) and no ranking.
Task: Fast, ranked full-text search.
Action: to_tsvector('english', name || ' ' || description) stored in generated column. GIN index: CREATE INDEX idx_fts ON products USING GIN (search_vector). Query: WHERE search_vector @@ plainto_tsquery('english', $1). Rank with ts_rank(search_vector, query). Consider trigram (pg_trgm) for fuzzy/partial match.
Result: Search latency: 800ms → 15ms. Ranking improved relevance.

Round 2: Data Integrity (constraints, cascades, migration safety, zero-downtime schema changes)

Q: How do you choose FK constraints and cascades?

Situation: Orders referenced customers. Deleting a customer could orphan orders or fail.
Task: Enforce referential integrity with correct cascade.
Action: REFERENCES customers(id) ON DELETE RESTRICT — prevent customer delete if orders exist. For soft delete, no cascade; application logic. ON DELETE CASCADE for dependent children (e.g., order_items when order deleted). ON DELETE SET NULL when FK is optional. Document choice per relationship.
Result: No orphaned rows. Cascade behavior predictable.

Q: How do you do zero-downtime schema changes?

Situation: Adding a NOT NULL column to 50M-row table. ALTER TABLE ADD COLUMN x INT NOT NULL would lock and rewrite.
Task: Add column without blocking reads/writes.
Action: (1) ALTER TABLE ADD COLUMN x INT — nullable, fast. (2) Backfill in batches: UPDATE t SET x = computed WHERE id IN (...) AND x IS NULL. (3) ALTER TABLE ADD CONSTRAINT nn_x CHECK (x IS NOT NULL) NOT VALID, then VALIDATE CONSTRAINT (validation takes only a weak lock). (4) Add a default if needed. PostgreSQL 11+ can add a column with a constant default without rewriting the table.
Result: No lock held longer than 1 second. Applied to 4 large tables.
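The staged migration can be sketched end to end like this (table and column names are illustrative):

```sql
-- Step 1: metadata-only add, nullable — effectively instant.
ALTER TABLE big_table ADD COLUMN x INT;

-- Step 2: backfill in small id-range batches to keep row locks short;
--         loop over ranges in application or migration-tool code.
UPDATE big_table SET x = 0
WHERE id BETWEEN 1 AND 10000 AND x IS NULL;

-- Step 3: enforce NOT NULL without a long table-scan lock.
ALTER TABLE big_table
    ADD CONSTRAINT x_not_null CHECK (x IS NOT NULL) NOT VALID;
ALTER TABLE big_table VALIDATE CONSTRAINT x_not_null;

-- Step 4 (optional): set a default for future inserts.
ALTER TABLE big_table ALTER COLUMN x SET DEFAULT 0;
```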

Q: How do you ensure migration safety?

Situation: A migration dropped a column that an old app version still queried — outage.
Task: Safe migration process.
Action: Backward-compatible migrations: add before remove. Deprecation window: add new column, backfill, deploy app to use new column, remove old. Rollback plan: every migration has a down script. Test on staging with production-like data. Use transactions where possible: BEGIN; migration; COMMIT or wrap in migration tool.
Result: Zero migration-related outages in 18 months.

Q: What constraints do you use and when?

Situation: Invalid data (negative prices, duplicate emails) reached production.
Task: Enforce at DB layer.
Action: NOT NULL for required fields. UNIQUE for natural keys (email, slug). CHECK for domain rules: CHECK (amount >= 0), CHECK (status IN (...)). FK for referential integrity. Exclusion constraints for "no overlapping" (e.g., booking ranges). DB constraints are last line of defense — application validates first.
Result: 5 invalid inserts caught at DB before reaching production.

Round 3: Advanced (query planner internals, pg_stat_user_tables, replication lag, backup strategies)

Q: How does the query planner choose between nested loop, hash join, and merge join?

Situation: A join was slow; EXPLAIN showed nested loop with large inner side.
Task: Understand and influence planner choice.
Action: Nested loop: small outer side with an indexed inner — degrades toward O(n*m) when the inner side isn't indexed. Hash join: one side built into a hash table, the other probed — good for large unsorted sets. Merge join: both sides sorted on the join key — good for large pre-sorted inputs. The planner picks via its cost model (cpu_tuple_cost, random_page_cost). Fixes: refresh statistics with ANALYZE, add an index, or SET enable_nestloop = off as a diagnostic to test whether a hash join wins.
Result: Adding index changed plan to index nested loop; 8s → 0.2s.

Q: How do you use pg_stat_user_tables?

Situation: Need to find tables with high sequential scans or dead tuples.
Task: Identify optimization targets.
Action: SELECT relname, seq_scan, idx_scan, n_live_tup, n_dead_tup, last_vacuum FROM pg_stat_user_tables. High seq_scan + low idx_scan → consider an index. High n_dead_tup relative to n_live_tup → vacuum needed. last_vacuum / last_autovacuum show recency. Join with pg_total_relation_size() for bloat analysis.
Result: Found 3 tables with 90%+ seq scans; added indexes — 50% workload speedup.

Q: How do you detect and handle replication lag?

Situation: Read replicas sometimes returned stale data right after a write.
Task: Detect lag and route critical reads correctly.
Action: SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn() on replica; compare to primary's pg_current_wal_lsn(). pg_stat_replication shows lag. Application: after write, route next read to primary (read-your-writes). Or check lag before query — if > 5s, use primary. Alert when lag > 30s.
Result: Zero stale reads for critical paths. Alerts caught 2 replica delays.

Q: What backup strategies do you use?

Situation: Needed RPO < 1 hour, RTO < 4 hours.
Task: Implement backup and restore.
Action: Continuous WAL archiving (pg_basebackup base backups plus archive_command, or WAL-G) for point-in-time recovery. Full backup weekly, incremental via WAL. Test restore quarterly — measure actual RTO. Offsite copy (S3, separate region). Document the restore procedure. For critical tables: logical replication to a standby as an additional safeguard.
Result: One restore test caught misconfigured archive; fixed before real incident. RTO verified at 2.5 hours.


From real experience

"I've conducted 100+ database and backend interviews. The candidates who stand out don't just know SQL syntax — they can read an EXPLAIN plan and tell you exactly why a query is slow. They've added a partial index and measured the before/after. They've debugged a deadlock using pg_locks. They know when to use a recursive CTE vs application-level recursion and can explain the trade-offs."

"For 4+ years experience, interviewers expect production war stories: a query that took 45 minutes and how you rewrote it to 78 seconds, how you scaled with PgBouncer and read replicas, how you did zero-downtime schema changes on a 50M-row table. If you haven't lived these, study these answers and practice them. The STAR format with specific metrics is what gets the offer."
— Surya Singh, Senior Software Engineer & Technical Interviewer


Frequently asked questions

What SQL topics are tested at the 4+ years experience level?

Query optimization and execution plans (EXPLAIN ANALYZE, index selection), indexing strategies (B-tree vs hash vs GIN vs GiST, partial/covering/composite indexes), complex joins and subqueries (correlated subqueries, LATERAL joins, anti-joins), window functions (ROW_NUMBER, RANK, LAG/LEAD), transaction isolation levels, normalization vs denormalization, stored procedures/triggers, database scaling (read replicas, sharding, connection pooling), CTEs and recursive queries, and performance tuning (slow query logging, pg_stat_statements, vacuuming).

How important is understanding EXPLAIN ANALYZE for senior SQL roles?

Critical. Interviewers expect you to interpret execution plans — identify sequential scans that should use indexes, spot nested loops vs hash joins, understand cost estimates vs actual timings, and explain when the planner chooses one strategy over another. You should be able to take a slow query, run EXPLAIN ANALYZE, identify the bottleneck (e.g., a 2.3M-row seq scan on orders), and propose a fix (index on customer_id + created_at). Metrics matter: "After adding the composite index, execution time dropped from 4.2s to 12ms."

When do you choose B-tree vs GIN vs GiST indexes?

B-tree: default for equality, range, and ordering (WHERE id = 5, WHERE created_at > ..., ORDER BY). GIN: multi-value and full-text search (array contains, @>, to_tsvector). GiST: geometric data, range types, full-text when you need ranking. Partial indexes: when you query a subset (WHERE status = 'active'). Covering indexes (INCLUDE): when all SELECT columns are in the index — index-only scan. Bring a real example: "We added a partial index on (tenant_id) WHERE deleted_at IS NULL — queries dropped from 800ms to 3ms."

How do you approach transaction isolation level questions?

Explain the four levels: read uncommitted (rarely used), read committed (default in PostgreSQL — each statement sees data committed before it started), repeatable read (snapshot taken at transaction start, prevents dirty and non-repeatable reads), serializable (strictest — detects conflicts). For deadlocks: use pg_locks and pg_stat_activity to identify blocking queries; resolve with shorter transactions, consistent lock ordering, or retry logic. Mention real trade-offs: "We use REPEATABLE READ for financial reports to avoid phantom reads; for high-throughput writes we stay at READ COMMITTED and handle conflicts in application logic."

What scaling strategies should I discuss for SQL databases?

Read replicas: offload SELECT to replicas; handle replication lag (eventual consistency). Connection pooling: PgBouncer or Pgpool-II to limit connections and avoid exhausting max_connections. Partitioning: range (by date), list (by region), hash (for distribution) — improves maintenance and query pruning. Sharding: horizontal split by shard key; discuss consistent hashing, cross-shard queries, and rebalancing. CQRS/materialized views: denormalize for reads when writes are infrequent. Bring metrics: "PgBouncer let 8 app servers (400 client connections in total) share 50 DB connections; connection errors dropped to zero."

How do CTEs and recursive queries help in production?

CTEs (WITH): improve readability, materialize intermediate results, avoid repeating subqueries. Recursive CTEs: hierarchical data (org charts, bill of materials, graph traversal). Examples: employee hierarchies, category trees, path finding. Know the structure: anchor (base case) UNION ALL recursive term (references the CTE). Pitfall: a missing termination condition causes an infinite loop. Real use: "We used a recursive CTE to flatten a 5-level category tree; the report that took 45 minutes with application-level recursion now runs in 2 seconds."

What performance tuning tools and practices should I know?

Slow query logging: log_min_duration_statement to capture queries above a threshold. pg_stat_statements: top N queries by total time, mean time, calls. pg_stat_user_tables: seq_scan, idx_scan, n_tup_ins/upd/del — identify hot tables. VACUUM: reclaim dead rows; ANALYZE for statistics. Autovacuum tuning: adjust threshold and scale factor for high-churn tables. Connection pool monitoring: track active vs idle connections. Bring a debugging story: "pg_stat_statements showed a single query accounting for 60% of DB time; we rewrote it and cut P95 latency by 40%."

How do I demonstrate database design maturity in an interview?

Discuss 3NF trade-offs: when to normalize vs denormalize. When denormalization helps: read-heavy workloads, avoid expensive joins, materialized views for aggregations. CQRS: separate read and write models; read model can be denormalized. Migration strategy: zero-downtime schema changes (add column nullable, backfill, add constraint, add default). Discuss constraints, cascades, and when to enforce integrity in DB vs application. Show you think about indexes at design time, not just after production issues.


Surya Singh

Azure Solutions Architect & AI Engineer

Microsoft-certified Azure Solutions Architect with 8+ years in enterprise software, cloud architecture, and AI/ML deployment. I build production AI systems and write about what actually works—based on shipping code, not theory.

  • Microsoft Certified: Azure Solutions Architect Expert
  • Built 20+ production AI/ML pipelines on Azure
  • 8+ years in .NET, C#, and cloud-native architecture