Rate Limiters Interview Questions

February 26, 2026 · By Surya Singh · System Design • Rate Limiting • Scalability • Interview

Rate limiter design interview questions — token bucket, sliding window, fixed window algorithms.


Key Takeaways

  • Rate limiters protect APIs from abuse and ensure fair usage by capping requests per user/IP per time window.
  • Common algorithms: fixed window, sliding window, sliding window log, token bucket, leaky bucket.
  • Sliding window and token bucket are widely used; Redis with atomic counters is a common implementation.
  • Consider distributed limits, which storage to use (in-memory vs Redis), and burst vs sustained traffic.

The questions below are commonly asked in technical interviews. Each answer is written to help you understand the concept clearly and explain it confidently. Focus on understanding the "why" behind each answer—that is what interviewers care about.

Interview Questions & Answers

What is a rate limiter and why do we need it?

A rate limiter restricts how many requests a client (user or IP) can make in a given time window. It protects your API from abuse, prevents a single user from consuming all resources, and helps you maintain fair usage. For example, you might allow 100 requests per minute per user. If the limit is exceeded, you return HTTP 429 (Too Many Requests). Rate limiters are essential for public APIs, login endpoints (to prevent brute force), and any service where one client could degrade quality for others.
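As a concrete sketch of the 429 behavior described above, here is minimal ASP.NET Core middleware. The `rateLimiter` object and its `IsAllowed` method are hypothetical stand-ins for any of the algorithms discussed below:

```csharp
// Minimal sketch: reject over-limit requests with HTTP 429.
// `rateLimiter` is a hypothetical limiter keyed by client identity;
// the Retry-After header tells well-behaved clients when to retry.
app.Use(async (context, next) =>
{
    string clientKey = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";

    if (!rateLimiter.IsAllowed(clientKey))
    {
        context.Response.StatusCode = StatusCodes.Status429TooManyRequests;
        context.Response.Headers["Retry-After"] = "60"; // seconds
        await context.Response.WriteAsync("Too Many Requests");
        return; // short-circuit the pipeline
    }

    await next();
});
```

In practice you would key on an authenticated user ID rather than the IP when possible, since many users can share one IP behind a NAT or proxy.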

What is the difference between fixed window and sliding window rate limiting?

Fixed window counts requests in discrete windows (e.g., 1:00–2:00, 2:00–3:00). A user could send 100 requests at 1:59 and another 100 at 2:01—200 in 2 minutes at the boundary. Sliding window counts requests in a rolling window (e.g., the last 60 seconds from "now"). This avoids the boundary burst. Sliding window is fairer but needs more storage (you may store timestamps of recent requests). Sliding window log stores each request timestamp; sliding window counter uses a weighted average of the previous window and current window to approximate without storing all timestamps.
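The sliding window counter approximation mentioned above can be sketched as follows. This is a single-threaded, single-key illustration (class and field names are mine, not from any library):

```csharp
// Sliding window counter (approximation): weight the previous fixed
// window's count by how much of it still overlaps the rolling window,
// instead of storing every request timestamp.
class SlidingWindowCounter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private long _currentWindowStart;
    private int _currentCount, _previousCount;

    public SlidingWindowCounter(int limit, TimeSpan window)
    {
        _limit = limit;
        _window = window;
    }

    public bool IsAllowed()
    {
        long nowTicks = DateTime.UtcNow.Ticks;
        long windowStart = nowTicks - nowTicks % _window.Ticks;

        if (windowStart != _currentWindowStart)
        {
            // Roll windows forward. If more than one full window has
            // passed, the previous window's count no longer overlaps.
            _previousCount = windowStart - _currentWindowStart == _window.Ticks
                ? _currentCount : 0;
            _currentCount = 0;
            _currentWindowStart = windowStart;
        }

        // Fraction of the rolling window still covered by the previous window.
        double previousWeight = 1.0 - (double)(nowTicks - windowStart) / _window.Ticks;
        double estimated = _previousCount * previousWeight + _currentCount;

        if (estimated >= _limit) return false;
        _currentCount++;
        return true;
    }
}
```

The estimate assumes requests were evenly spread across the previous window, which is why this is an approximation rather than an exact count.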

How does the token bucket algorithm work?

A bucket holds tokens. Each request consumes one token. Tokens are refilled at a fixed rate (e.g., 10 tokens per second) up to a maximum bucket size (e.g., 100). If the bucket has tokens, the request is allowed and a token is removed. If not, the request is rejected. This allows bursts (up to the bucket size) while maintaining an average rate over time. Leaky bucket is similar but the bucket "leaks" at a constant rate (requests leave at a fixed rate); it smooths traffic but does not allow bursts. Token bucket is common in practice (e.g., AWS, Stripe) because it permits short bursts.

// Token bucket - in-memory (single process)
class TokenBucket {
    private readonly object _lock = new();
    private readonly double _maxTokens, _refillRate;
    private double _tokens;
    private DateTime _lastRefill = DateTime.UtcNow;

    public TokenBucket(int maxTokens, double refillPerSec) {
        _maxTokens = _tokens = maxTokens; // start with a full bucket
        _refillRate = refillPerSec;
    }

    public bool TryConsume() {
        lock (_lock) { // refill + consume must be one atomic step
            Refill();
            if (_tokens >= 1) { _tokens--; return true; }
            return false;
        }
    }

    void Refill() {
        var now = DateTime.UtcNow;
        // Add tokens for the elapsed time, capped at the bucket size.
        _tokens = Math.Min(_maxTokens,
            _tokens + (now - _lastRefill).TotalSeconds * _refillRate);
        _lastRefill = now;
    }
}

How would you implement a rate limiter in a distributed system?

Use Redis with atomic operations. For a sliding window or fixed window, use INCR with a key like "ratelimit:user123:minute:1709123456". Set an EXPIRE so the key auto-deletes. Use a Lua script or Redis transactions to ensure INCR and EXPIRE are atomic. For token bucket, store (tokens, last_refill_time) and use a Lua script to refill tokens and decrement in one atomic step. Redis is fast and supports TTL; it works across multiple API servers. For very high scale, you might use a dedicated rate-limiting service or a distributed rate limiter like Kong or Envoy.

// Fixed-window rate limiter with Redis (StackExchange.Redis)
async Task<bool> IsAllowedAsync(IDatabase db, string key, int limit, TimeSpan window) {
    // Ticks / window.Ticks yields the same bucket number for the whole window.
    var fullKey = $"ratelimit:{key}:{DateTime.UtcNow.Ticks / window.Ticks}";
    var count = await db.StringIncrementAsync(fullKey);
    // First increment created the key; give it a TTL so it self-cleans.
    // Note: INCR + EXPIRE here are two round trips, not atomic — use a
    // Lua script if a key must never be left without a TTL.
    if (count == 1)
        await db.KeyExpireAsync(fullKey, window);
    return count <= limit;
}
// In-memory fixed window (single server; wrap in a lock or use
// ConcurrentDictionary if called from multiple threads)
private readonly Dictionary<string, (int Count, DateTime Window)> _counters = new();
bool IsAllowed(string key, int limit, TimeSpan window) {
    var now = DateTime.UtcNow;
    // Truncate "now" down to the start of the current fixed window.
    var windowStart = new DateTime(now.Ticks - now.Ticks % window.Ticks);
    if (!_counters.TryGetValue(key, out var v) || v.Window < windowStart)
        _counters[key] = (1, windowStart);      // new window: reset count
    else if (v.Count >= limit) return false;    // over the limit
    else _counters[key] = (v.Count + 1, v.Window);
    return true;
}
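The Lua-script approach for a distributed token bucket can be sketched like this with StackExchange.Redis. The script body and key layout are illustrative, not from any particular library; the point is that refill and consume happen in one atomic server-side step:

```csharp
// Token bucket in Redis: a Lua script reads (tokens, last refill time),
// refills based on elapsed time, and consumes — all atomically.
const string TokenBucketScript = @"
local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens') or ARGV[1])
local last   = tonumber(redis.call('HGET', KEYS[1], 'last') or ARGV[3])
local max    = tonumber(ARGV[1])
local rate   = tonumber(ARGV[2])
local now    = tonumber(ARGV[3])
tokens = math.min(max, tokens + (now - last) * rate)
local allowed = 0
if tokens >= 1 then tokens = tokens - 1; allowed = 1 end
redis.call('HSET', KEYS[1], 'tokens', tokens, 'last', now)
redis.call('EXPIRE', KEYS[1], 3600)  -- drop idle buckets after an hour
return allowed";

async Task<bool> TryConsumeAsync(IDatabase db, string key,
                                 int maxTokens, double refillPerSec) {
    // Pass time in from the caller so all app servers use one clock source;
    // alternatively use Redis TIME inside the script.
    double now = DateTimeOffset.UtcNow.ToUnixTimeMilliseconds() / 1000.0;
    var result = await db.ScriptEvaluateAsync(TokenBucketScript,
        new RedisKey[] { $"ratelimit:bucket:{key}" },
        new RedisValue[] { maxTokens, refillPerSec, now });
    return (int)result == 1;
}
```

Because the whole read-modify-write runs inside Redis, two API servers racing on the same bucket can never both spend the last token.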

What are the trade-offs of storing rate limit state in-memory vs Redis?

In-memory is fast and simple but does not work across multiple server instances—each server has its own counter, so a user could spread requests across servers and bypass the limit. Redis is shared across servers, so the limit is enforced globally. Redis adds network latency (usually 1–2 ms) and becomes an extra dependency and potential point of failure. For a single-server app, or when rate limits are deliberately per-instance, in-memory is fine. For production APIs with multiple nodes, Redis (or similar shared storage) is standard.


Surya Singh


Azure Solutions Architect & AI Engineer

Microsoft-certified Azure Solutions Architect with 8+ years in enterprise software, cloud architecture, and AI/ML deployment. I build production AI systems and write about what actually works—based on shipping code, not theory.

  • Microsoft Certified: Azure Solutions Architect Expert
  • Built 20+ production AI/ML pipelines on Azure
  • 8+ years in .NET, C#, and cloud-native architecture