20 Rate Limiting Design
---
title: "Rate Limiting Design in REST APIs: Complete Guide"
weight: 30
date: 2026-06-28
lastmod: 2026-06-28
description: Learn rate limiting strategies for REST APIs including token bucket, sliding window, and fixed window algorithms with response headers for client feedback.
tags: [api-development, rest]
---
Rate limiting in REST APIs controls how many requests a client can make in a given time window, using algorithms like token bucket, sliding window, or fixed window to prevent abuse and ensure fair resource allocation.
```mermaid
flowchart TD
A[Client Request] --> B[Rate Limiter]
B --> C{Under Limit?}
C -->|Yes| D[Process Request]
C -->|No| E[429 Too Many Requests]
D --> F[Update Counter]
F --> G[Return RateLimit Headers]
E --> H[Include Retry-After]
style A fill:#e1f5fe
style D fill:#c8e6c9
style E fill:#ffcdd2
```
Common rate limiting algorithms include fixed window (counts requests in fixed buckets, such as one per minute), sliding window (more precise; counts the requests made in the last 60 seconds), and token bucket (tokens replenish at a steady rate, and each request consumes one). When a client exceeds its limit, for example when its bucket has no tokens left, the request is rejected with 429 Too Many Requests.
Think of the token bucket as a bucket of water that refills at a constant rate. You can take a cup whenever you want, but if you take too many too quickly, the bucket empties and you must wait for it to refill.
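The difference between the fixed window and sliding window approaches described above is easiest to see at a window boundary. The following is a minimal sketch, not a production implementation: the class names `FixedWindowLimiter` and `SlidingWindowLimiter` are hypothetical, timestamps are passed in explicitly so the result is deterministic, and the sliding window is implemented as a simple log of accepted request times.

```python
from collections import deque


class FixedWindowLimiter:
    """Counts requests per aligned window (0-59s, 60-119s, ...)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window_id = int(now // self.window)
        if window_id != self.current_window:
            # A new window has started, so the counter resets.
            self.current_window = window_id
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLimiter:
    """Keeps the timestamps of accepted requests from the last window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        # Drop timestamps that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


# Ten requests just before a window boundary and ten just after it.
burst = [59.0 + i * 0.1 for i in range(10)] + [60.0 + i * 0.1 for i in range(10)]

fixed = FixedWindowLimiter(limit=10, window_seconds=60)
sliding = SlidingWindowLimiter(limit=10, window_seconds=60)

fixed_allowed = sum(fixed.allow(t) for t in burst)
sliding_allowed = sum(sliding.allow(t) for t in burst)

print(f"Fixed window allowed:   {fixed_allowed}")    # 20
print(f"Sliding window allowed: {sliding_allowed}")  # 10
```

With a limit of 10 per 60 seconds, the fixed window lets all 20 requests through in under two seconds because the counter resets at the boundary, while the sliding window holds the client to 10.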
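Example: Rate Limit Response Headers

```python
import requests

# Fetch a resource and print the rate limit headers on the response.
# The X-RateLimit-* names are a widely used convention, not a standard,
# so check which headers your API actually sends.
response = requests.get("https://api.example.com/users", timeout=10)

print(f"X-RateLimit-Limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"X-RateLimit-Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"X-RateLimit-Reset: {response.headers.get('X-RateLimit-Reset')}")
```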
Expected output:

```
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1688054400
```
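On the server side, these headers have to be produced from the limiter's state. A minimal sketch follows, assuming that X-RateLimit-Reset carries a Unix timestamp in seconds (as the sample value above suggests) and that rejected requests also receive a Retry-After header in seconds. The function name `rate_limit_headers` and its parameters are hypothetical and not part of any framework.

```python
import math
import time


def rate_limit_headers(limit, remaining, reset_epoch, exceeded=False, now=None):
    """Build rate limit headers; add Retry-After for rejected requests."""
    now = time.time() if now is None else now
    headers = {
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": str(max(remaining, 0)),
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    if exceeded:
        # Retry-After is given in whole seconds; round up so the client
        # does not retry slightly too early.
        headers["Retry-After"] = str(max(math.ceil(reset_epoch - now), 0))
    return headers


# A normal response, 60 seconds before the window resets.
ok_headers = rate_limit_headers(100, 87, 1688054400, now=1688054340)

# A rejected (429) response, 5 seconds before the window resets.
limited_headers = rate_limit_headers(
    100, 0, 1688054400, exceeded=True, now=1688054395
)

print(ok_headers)
print(limited_headers)
```

Passing `now` explicitly is only for the demonstration; in a real handler it would default to the current time.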
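Example: Handling 429 Too Many Requests

```python
import time

import requests


def rate_limited_get(url, max_retries=3):
    """GET a URL, waiting and retrying when the server answers 429."""
    for attempt in range(max_retries):
        response = requests.get(url, timeout=10)
        if response.status_code != 429:
            # Any other response, including a success that used up the
            # last remaining request, goes straight back to the caller.
            return response

        # Prefer Retry-After (assumed to be in seconds); otherwise fall back
        # to X-RateLimit-Reset, assumed to be a Unix timestamp.
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            wait_time = int(retry_after)
        else:
            reset_time = int(response.headers.get("X-RateLimit-Reset", 0))
            wait_time = reset_time - time.time()
        wait_time = max(wait_time, 1)
        print(f"Rate limited. Waiting {wait_time:.0f}s...")
        time.sleep(wait_time)
    raise RuntimeError("Max retries exceeded")


result = rate_limited_get("https://api.example.com/users")
print(f"Success: {result.status_code}")
```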
Expected output:

```
Rate limited. Waiting 5s...
Success: 200
```
Example: Token Bucket Simulation

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.tokens = capacity      # start with a full bucket
        self.last_refill = time.monotonic()

    def consume(self, tokens=1):
        # Refill in proportion to the time elapsed since the last call.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


bucket = TokenBucket(rate=10, capacity=20)

# Send a burst with no pause between requests, so the bucket drains
# faster than it refills.
for i in range(25):
    allowed = bucket.consume()
    print(f"Request {i+1:2d}: {'Allowed' if allowed else 'Blocked'}")
    if not allowed:
        break
```
Expected output:

```
Request  1: Allowed
Request  2: Allowed
...
Request 20: Allowed
Request 21: Blocked
```
Common Mistakes
- Not implementing rate limiting at all — Without rate limiting, a single abusive client can consume all server resources and degrade service for others.
- Using per-IP rate limiting alone — IP-based limiting penalizes users behind shared NAT. Use per-user or per-token limiting for authenticated endpoints.
- Not returning rate limit headers — Clients need X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset to self-regulate their request rate.
- Resetting counters at fixed global intervals — If all counters reset at midnight, clients flood the API at 12:00 AM. Use sliding windows instead.
- Not exempting essential health checks — Monitoring and health check endpoints should have higher or unlimited rate limits to avoid false alerts.
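The advice on per-user limiting can be sketched as a limiter that keeps one token bucket per client key. This is an illustration under stated assumptions: the names `KeyedTokenBuckets` and `client_key` are hypothetical, state lives in a plain in-memory dictionary, and a fake clock replaces real time so the behavior is deterministic.

```python
import time


class KeyedTokenBuckets:
    """One token bucket per client key, created on first use."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second, per key
        self.capacity = capacity  # maximum burst size, per key
        self.clock = clock
        self.buckets = {}         # key -> (tokens, last_refill)

    def allow(self, key):
        now = self.clock()
        tokens, last = self.buckets.get(key, (self.capacity, now))
        tokens = min(self.capacity, tokens + (now - last) * self.rate)
        allowed = tokens >= 1
        if allowed:
            tokens -= 1
        self.buckets[key] = (tokens, now)
        return allowed


def client_key(user_id, ip_address):
    """Prefer the authenticated identity; fall back to the IP address."""
    return f"user:{user_id}" if user_id else f"ip:{ip_address}"


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
limiter = KeyedTokenBuckets(rate=1, capacity=2, clock=lambda: fake_now[0])

# Two users behind the same NAT address get separate buckets.
alice = client_key("alice", "203.0.113.7")
bob = client_key("bob", "203.0.113.7")

alice_results = [limiter.allow(alice) for _ in range(3)]
bob_results = [limiter.allow(bob) for _ in range(3)]

fake_now[0] = 1.0  # one second later, one token has been refilled
alice_after_wait = limiter.allow(alice)

print(alice_results)     # [True, True, False]
print(bob_results)       # [True, True, False]
print(alice_after_wait)  # True
```

Because the key is derived from the user rather than the address, one user exhausting a bucket does not block another user on the same network. A real deployment would also need to evict idle keys and share state across server instances, which this sketch leaves out.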
Practice Questions
- What are three common rate limiting algorithms?
- What HTTP status code indicates rate limiting?
- What headers should a rate-limited response include?
- Why is sliding window better than fixed window for rate limiting?
- Challenge: Implement a sliding window rate limiter in Python that tracks requests per user in a Redis-like in-memory store. Support different limits for different endpoints.
Mini Project
Build a Python rate limiting middleware that supports token bucket and sliding window algorithms. The middleware should add rate limit headers to responses, return 429 when limits are exceeded, and support per-user and per-endpoint limits.
What's Next
Now learn about API documentation design in REST API Design.