Backend interview questions

Redis Caching Interview Questions

Caching interviews are secretly about failure. What happens when Redis restarts? When a key expires during a flash sale? When two app servers race to repopulate the same hot key? Interviewers want to hear TTL policies tied to business tolerance, explicit invalidation after writes, and instrumentation that shows hit rate—not just "we added Redis and it got faster." If you have debugged stale data that confused users, that anecdote is gold.

Patterns with trade-offs

Operational literacy

Memory limits, key cardinality, large value warnings, replication lag if you read from replicas—pick what you have actually monitored. Mention namespaced keys, graceful degradation when cache is down, and security basics (no public Redis, TLS in transit where required).
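
For "graceful degradation when cache is down," one concrete shape is a fail-open read wrapper. A minimal Python sketch, assuming a redis-py-style client; `safe_get` is an illustrative name, not a library API:

```python
import logging

def safe_get(redis_client, key):
    """Fail open: a cache outage becomes a cache miss, not a 500."""
    try:
        return redis_client.get(key)
    # With redis-py you would catch redis.exceptions.ConnectionError
    # and redis.exceptions.TimeoutError instead of the builtins.
    except (ConnectionError, TimeoutError):
        logging.warning("cache unavailable, treating %s as a miss", key)
        return None
```

The caller then falls through to the database exactly as on a normal miss; pair this with a circuit breaker if outages run long, so a dead Redis does not add a timeout to every request.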

Widen the system lens

Cache does not replace a bad schema. Revisit indexing and system design to practice sketches where Redis is one rectangle among queues, databases, and autoscaling groups—with clear ownership of each failure mode.

Azure Cache for Redis: stampede story

C# cache-aside with single-flight lock (conceptual)

async Task<byte[]> GetReportAsync(string tenant)
{
    var key = $"report:{tenant}";
    var cached = await _redis.StringGetAsync(key);
    if (cached.HasValue) return (byte[])cached;

    // _locks is a SemaphoreSlim here; production code wants a per-key lock
    // so unrelated tenants do not serialize behind one gate.
    await _locks.WaitAsync();
    try
    {
        // Double-check after acquiring: another caller may have
        // repopulated the key while we waited.
        cached = await _redis.StringGetAsync(key);
        if (cached.HasValue) return (byte[])cached;

        var fresh = await _db.BuildHeavyReport(tenant);
        await _redis.StringSetAsync(key, fresh, TimeSpan.FromMinutes(5));
        return fresh;
    }
    finally { _locks.Release(); }
}

Expiry aligns, fifty nodes miss at once, and SQL falls over: that is the thundering herd. Mention jittered TTLs, per-key mutexes, or prewarming after a deploy. Interviewers want the war story, not only the API names.
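
The jitter idea takes only a few lines; a minimal Python sketch, where `jittered_ttl` is an illustrative helper, not a Redis API:

```python
import random

def jittered_ttl(base_seconds: int, spread: float = 0.1) -> int:
    """Spread expirations over +/-spread (10% by default) so keys written
    together do not all expire together and stampede the origin."""
    delta = int(base_seconds * spread)
    return base_seconds + random.randint(-delta, delta)
```

In the C# example above, this replaces the fixed `TimeSpan.FromMinutes(5)`: a 300-second base becomes something in the 270-330s band, so a fleet-wide deploy that warms every cache at once no longer produces a fleet-wide miss five minutes later.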

Questions with sample answers

These are interview-ready outlines—sound human by swapping in your own metrics, team names, and war stories. The examples are generic on purpose so you can map them to what you actually shipped.

  1. Primary prompt

    Hot key on a viral object: how do you protect Redis and origin without lying to users?

    Layer the defenses: a local in-process LRU in front of Redis, read replicas, splitting the hot key into shards with a random suffix and merging on read, and request coalescing. If the product tolerates it, serve slightly stale data with a TTL plus background refresh.

  2. Primary prompt

    Cache-aside vs write-through for a profile page updated rarely but read constantly.

    Cache-aside: on a miss the app loads from the DB and sets the cache; simple, but data can be stale until TTL or explicit invalidation. Write-through: the write path updates cache and DB together; fresher reads at the cost of write latency. Pick based on the read/write ratio and staleness tolerance.

  3. Primary prompt

    How do you implement singleflight or request coalescing at the app layer?

    Keep an in-flight map of key→Promise: the second caller awaits the same promise, so only one origin fetch happens; clear the entry after it resolves. A per-key mutex is harder in a distributed system; hold a short-lived Redis lock (SET with NX and an expiry) instead.

  4. Primary prompt

    What eviction policy matches session data vs derived aggregates?

    volatile-lru for sessions that carry TTLs; allkeys-lru for a general-purpose cache when memory is tight; noeviction for critical queues if you prefer write errors over silent data loss. Match the policy to the workload.

Follow-ups interviewers often ask

Expect nested "why?" questions—brief answers here; expand with your production defaults.

  1. Follow-up

    How do you detect cache stampede before CPU graphs scream?

    Watch for an origin QPS spike alongside a flat hit rate and many simultaneous misses on the same key; alert on synchronized expirations, and add jitter to TTLs.

  2. Follow-up

    What is your TTL strategy when you cannot predict freshness needs?

    Short TTLs plus event-driven invalidation on writes; the stale-while-revalidate pattern; a versioned key prefix on schema changes.

  3. Follow-up

    How do you encrypt or isolate sensitive cached payloads?

    TLS in transit; Redis ACLs plus a separate logical database; avoid caching highly sensitive data unless it is encrypted at the application level; namespace keys by tenant.

  4. Follow-up

    What happens when cache and DB disagree—which source wins during incident?

    Runbook: the DB is the source of truth; flush cache keys for the affected entity; if a write was lost, replay it from the outbox, then investigate whether it was a race or a bug.

  5. Follow-up

    How do you load-test cache effectiveness vs hit rate vanity metrics?

    Measure origin latency saved, p99 end-to-end latency, and cost per request; a high hit rate on useless keys is vanity. Tie the metric to the business read path.
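
The stale-while-revalidate pattern from follow-up 2 can be sketched in-process. A hedged Python sketch using a plain dict and a background thread as stand-ins; a production version would store entries in Redis and refresh via a task queue, and `get_with_swr` is an illustrative name:

```python
import threading
import time

def get_with_swr(cache, key, loader, fresh_for=60, serve_stale_for=300):
    """Serve fresh values directly, serve stale-but-tolerable values
    immediately while refreshing in the background, and only block the
    caller on a cold or hopelessly stale key."""
    now = time.monotonic()
    entry = cache.get(key)
    if entry is not None:
        value, stored_at = entry
        age = now - stored_at
        if age < fresh_for:
            return value  # fresh: serve as-is
        if age < serve_stale_for:
            # stale but tolerable: answer now, refresh off the hot path
            threading.Thread(
                target=lambda: cache.__setitem__(
                    key, (loader(key), time.monotonic())),
                daemon=True,
            ).start()
            return value
    # cold miss or too stale: fetch inline and pay the latency once
    value = loader(key)
    cache[key] = (value, now)
    return value
```

The two thresholds map directly to the business conversation: `fresh_for` is how stale the product never notices, `serve_stale_for` is how stale it will forgive in exchange for never blocking a read on the origin.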