Distributed Rate Limiting — Complete Multi-Server Rate Limiting Guide

DodaTech Updated 2026-06-28 2 min read

This tutorial covers distributed rate limiting: the key concepts, a Redis-based example in Python, and the mistakes to avoid when running it in production.

Distributed rate limiting ensures that rate limits are enforced consistently across all server instances. A client cannot exceed limits by sending requests to different servers.

What You'll Learn

You'll learn distributed rate limiting architectures, the trade-offs between centralized and distributed approaches, and how to handle failures.

Why It Matters

Modern APIs run on multiple servers. Without distributed rate limiting, a client can exceed limits by round-robining through instances, each unaware of the others.
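To make the failure mode concrete, here is a small simulation, offered as a sketch rather than production code. `LocalCounter` and `simulate` are hypothetical names introduced for this illustration, and time windows are ignored so that only the counting behaviour is visible.

```python
from itertools import cycle


class LocalCounter:
    """A naive counter with no window handling (illustration only)."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def allow(self):
        # Admit the request only while this counter is under its limit.
        if self.count < self.limit:
            self.count += 1
            return True
        return False


def simulate(requests, pods, limit, shared):
    """Count admitted requests when one client round-robins across pods."""
    if shared:
        # Every pod consults the same counter, as it would with a shared store.
        counter = LocalCounter(limit)
        counters = [counter] * pods
    else:
        # Each pod keeps its own counter and knows nothing about the others.
        counters = [LocalCounter(limit) for _ in range(pods)]

    admitted = 0
    for _, counter in zip(range(requests), cycle(counters)):
        if counter.allow():
            admitted += 1
    return admitted


per_pod = simulate(requests=1000, pods=10, limit=100, shared=False)
shared = simulate(requests=1000, pods=10, limit=100, shared=True)
print(per_pod, shared)  # 1000 100
```

With per-pod counters the client gets ten times its intended quota; with one shared counter it gets exactly the limit.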

Real-World Use

An API runs on 10 Kubernetes pods behind a load balancer. Using Redis-based distributed rate limiting, a client's 100 req/min limit is enforced across all 10 pods, not per-pod.

flowchart LR
    A[Client] --> B[Load Balancer]
    B --> C[Pod 1]
    B --> D[Pod 2]
    B --> E[Pod 3]
    C --> F[(Redis)]
    D --> F
    E --> F
    F --> G[Shared Counter]
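The simplest form of the shared counter in the diagram is a fixed-window counter. The sketch below assumes a client object that exposes `incr(key)` and `expire(key, seconds)`, as a redis-py `redis.Redis` instance does. The function name `allow_request`, the `drl:fixed:` key prefix, and the `now` parameter (included to make the function easy to test) are assumptions made for this example.

```python
import time


def allow_request(client, client_id, limit, window, now=None):
    """Fixed-window check against a counter shared by all pods."""
    now = time.time() if now is None else now
    # Every pod derives the same key for the same client and window,
    # so they all increment one counter.
    window_index = int(now // window)
    key = f"drl:fixed:{client_id}:{window_index}"

    count = client.incr(key)  # atomic increment; a missing key starts at 1
    if count == 1:
        # First request in this window: let the key expire once it is stale.
        client.expire(key, window * 2)
    return count <= limit
```

With redis-py this could be called as `allow_request(redis.Redis(), "user_123", 100, 60)`. A fixed window is cheap but lets a client send up to twice the limit in a short burst that straddles a window boundary, which is the weakness a sliding window avoids.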

Implementation

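import time
import uuid
from collections import defaultdict, deque

import redis


class DistributedRateLimiter:
    def __init__(self, redis_host="localhost", redis_port=6379):
        self.redis = redis.Redis(
            host=redis_host, port=redis_port, db=0,
            decode_responses=True, socket_connect_timeout=2
        )
        # Per-process request timestamps, used only when Redis is unreachable.
        self.local_hits = defaultdict(deque)

    def check(self, client_id, limit, window):
        key = f"drl:{client_id}"
        now = time.time()
        # Unique member per request; a bare timestamp would merge requests
        # that arrive at the same instant into one sorted-set entry.
        member = f"{now}:{uuid.uuid4().hex}"
        pipeline = self.redis.pipeline()  # runs as a MULTI/EXEC transaction by default
        pipeline.zremrangebyscore(key, 0, now - window)  # drop entries outside the window
        pipeline.zcard(key)                              # count before this request
        pipeline.zadd(key, {member: now})                # record this request
        pipeline.expire(key, window + 10)
        _, count, _, _ = pipeline.execute()
        # count excludes the current request, so allow only while below the limit.
        # Rejected requests are recorded too, so a client that keeps retrying stays limited.
        return count < limit

    def local_check(self, client_id, limit, window):
        # In-memory sliding window; enforces the limit per process, not globally.
        now = time.time()
        hits = self.local_hits[client_id]
        while hits and hits[0] <= now - window:
            hits.popleft()
        if len(hits) >= limit:
            return False
        hits.append(now)
        return True

    def check_with_fallback(self, client_id, limit, window, fallback_limit=None):
        # A default argument cannot refer to another parameter, so resolve it here.
        if fallback_limit is None:
            fallback_limit = limit
        try:
            return self.check(client_id, limit, window)
        except (redis.ConnectionError, redis.TimeoutError):
            return self.local_check(client_id, fallback_limit, window)


limiter = DistributedRateLimiter()
if limiter.check_with_fallback("user_123", 100, 60):
    print("Request allowed")
else:
    print("Rate limit exceeded")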
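The `check` method above timestamps each request with the calling server's local clock, so servers whose clocks disagree will disagree on where the window starts. One option is to read the time from Redis instead. The sketch below assumes a client whose `time()` method returns a `(seconds, microseconds)` tuple, as redis-py's `Redis.time()` does; `server_now` and `window_bounds` are hypothetical helper names.

```python
def server_now(client):
    """Return the Redis server's clock as float seconds since the epoch."""
    seconds, microseconds = client.time()
    return seconds + microseconds / 1_000_000


def window_bounds(client, window):
    """Return (start, end) of the sliding window, measured on the server clock."""
    end = server_now(client)
    return end - window, end
```

Every pod that calls `window_bounds` against the same Redis server sees the same clock, at the cost of one extra round trip per check. Depending on the Redis version, moving the whole check into a server-side script that reads the time itself may remove that extra round trip.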

Common Mistakes

| Mistake | Impact | Fix |
|---------|--------|-----|
| No fallback when Redis is down | API unavailable entirely | Degrade gracefully (allow with local limit) |
| High latency for every request | Adds 1-5 ms per API call | Use pipelining or Redis Cluster |
| Clock skew between servers | Inconsistent window boundaries | Use Redis time (TIME command), not local time |
| Inconsistent hashing for key distribution | Hotspots in Redis Cluster | Use hash tags for related keys |
| Not handling Redis failover | Rate limits reset after failover | Configure Redis Sentinel with automatic failover |
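To see what hash tags do, it helps to compute key slots by hand. Redis Cluster maps each key to one of 16384 slots using CRC16 of the key modulo 16384, and when the key contains a non-empty `{...}` section only that section is hashed. The following is a pure-Python sketch of that rule for experimenting with key names offline. `crc16_xmodem` and `hash_slot` are illustrative helpers, not part of redis-py, and the `drl:{user_123}:...` key layout is an assumption.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, initial value 0)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Return the cluster slot (0..16383) for a key, honouring hash tags."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384


minute_key = "drl:{user_123}:minute"
hour_key = "drl:{user_123}:hour"
# Both keys hash only "user_123", so they land in the same slot.
print(hash_slot(minute_key) == hash_slot(hour_key))  # True
```

Tagging by client ID keeps all of one client's keys in a single slot, which multi-key transactions and scripts need in cluster mode, while different clients still spread across the cluster. Tagging every key with the same constant would do the opposite and concentrate all traffic on one node.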

Practice Questions

  1. Why is distributed rate limiting necessary?
  2. What happens when the rate limiting backend fails?
  3. How do you handle clock skew in distributed systems?
  4. What is the CAP theorem trade-off for rate limiting?
  5. How does Redis Cluster distribute rate limit keys?

Challenge

Set up a 3-node Redis Cluster. Implement a distributed rate limiter with fallback to local in-memory limiting. Test with concurrent requests across multiple client processes.

What's Next

Learn about IP-based rate limiting.
