
20 Rate Limiting Design

DodaTech 4 min read

title: "Rate Limiting Design in REST APIs - Complete Guide"
weight: 30
date: 2026-06-28
lastmod: 2026-06-28
description: Learn rate limiting strategies for REST APIs including token bucket, sliding window, and fixed window algorithms with response headers for client feedback.
tags: [api-development, rest]


Rate limiting in REST APIs controls how many requests a client can make in a given time window, using algorithms like token bucket, sliding window, or fixed window to prevent abuse and ensure fair resource allocation.

```mermaid
flowchart TD
  A[Client Request] --> B[Rate Limiter]
  B --> C{Under Limit?}
  C -->|Yes| D[Process Request]
  C -->|No| E[429 Too Many Requests]
  D --> F[Update Counter]
  F --> G[Return RateLimit Headers]
  E --> H[Include Retry-After]
  style A fill:#e1f5fe
  style D fill:#c8e6c9
  style E fill:#ffcdd2
```

Common rate limiting algorithms include fixed window (count requests in fixed intervals such as 1-minute buckets), sliding window (count requests in the trailing 60 seconds, which is more precise around window boundaries), and token bucket (tokens replenish at a steady rate and each request consumes one). Whichever algorithm is used, a request that exceeds the limit is rejected with 429 Too Many Requests.
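
The difference between fixed and sliding windows is easiest to see at a window boundary. The following is a minimal sketch, not a production implementation: the class names `FixedWindowLimiter` and `SlidingWindowLimiter`, the `allow(now)` method, and the timestamps are all assumptions made for illustration, and time is passed in explicitly so the result is reproducible.

```python
from collections import deque


class FixedWindowLimiter:
    """Counts requests per aligned window, e.g. seconds 0-59, 60-119, ..."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.current_window = None
        self.count = 0

    def allow(self, now):
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # A new window started, so the counter resets to zero.
            self.current_window = window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False


class SlidingWindowLimiter:
    """Keeps timestamps of accepted requests from the last window_seconds."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def allow(self, now):
        # Drop timestamps that have fallen out of the trailing window.
        while self.timestamps and self.timestamps[0] <= now - self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.limit:
            self.timestamps.append(now)
            return True
        return False


# Five requests just before a minute boundary, five just after it.
burst = [59.0, 59.1, 59.2, 59.3, 59.4, 60.0, 60.1, 60.2, 60.3, 60.4]

fixed = FixedWindowLimiter(limit=5, window_seconds=60)
sliding = SlidingWindowLimiter(limit=5, window_seconds=60)

fixed_allowed = sum(fixed.allow(t) for t in burst)
sliding_allowed = sum(sliding.allow(t) for t in burst)

print(f"Fixed window allowed:   {fixed_allowed}")    # 10
print(f"Sliding window allowed: {sliding_allowed}")  # 5
```

With a limit of 5 per minute, the fixed window lets all 10 requests through in under two seconds because its counter resets at second 60, while the sliding window holds the client to 5.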

Think of a token bucket as a bucket of water under a tap that drips at a constant rate. You can take a cup of water whenever you want, and a full bucket lets you take several cups in quick succession, but if you take too many too fast, the bucket empties and you must wait for it to refill.

Example: Rate Limit Response Headers

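```python
import requests

# The X-RateLimit-* names are a widely used convention, not a formal standard,
# so .get() returns None when a server does not send them.
response = requests.get("https://api.example.com/users", timeout=10)
print(f"X-RateLimit-Limit: {response.headers.get('X-RateLimit-Limit')}")
print(f"X-RateLimit-Remaining: {response.headers.get('X-RateLimit-Remaining')}")
print(f"X-RateLimit-Reset: {response.headers.get('X-RateLimit-Reset')}")
```

Example output (the values depend on the server):

```text
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1688054400
```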

Example: Handling 429 Too Many Requests

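```python
import time

import requests


def rate_limited_get(url, max_retries=3):
    """GET a URL, waiting and retrying when the server answers 429."""
    for attempt in range(max_retries):
        response = requests.get(url, timeout=10)
        if response.status_code != 429:
            # A 200 with X-RateLimit-Remaining: 0 is still a success.
            return response

        # Prefer Retry-After (in seconds); fall back to the reset timestamp.
        retry_after = response.headers.get("Retry-After", "")
        if retry_after.isdigit():
            wait_time = int(retry_after)
        else:
            reset_time = int(response.headers.get("X-RateLimit-Reset", 0))
            wait_time = reset_time - time.time()
        wait_time = max(wait_time, 1)
        print(f"Rate limited. Waiting {wait_time:.0f}s...")
        time.sleep(wait_time)
    raise RuntimeError("Max retries exceeded")


result = rate_limited_get("https://api.example.com/users")
print(f"Success: {result.status_code}")
```

Example output when the first attempt returns 429 with a 5-second wait:

```text
Rate limited. Waiting 5s...
Success: 200
```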

Example: Token Bucket Simulation

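```python
import time


class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity        # start with a full bucket
        self.last_refill = time.monotonic()

    def consume(self, tokens=1):
        # Refill in proportion to the time elapsed since the last call.
        now = time.monotonic()
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now

        if self.tokens >= tokens:
            self.tokens -= tokens
            return True
        return False


# Send a burst with no pause between requests so the bucket drains
# faster than it refills (10 tokens per second).
bucket = TokenBucket(rate=10, capacity=20)
for i in range(25):
    allowed = bucket.consume()
    print(f"Request {i+1:2d}: {'Allowed' if allowed else 'Blocked'}")
    if not allowed:
        break
```

Expected output (assuming the loop finishes in well under 0.1 seconds):

```text
Request  1: Allowed
Request  2: Allowed
...
Request 20: Allowed
Request 21: Blocked
```

Pausing between requests changes the result. With `time.sleep(0.05)` in the loop, each iteration refills 0.5 tokens while consuming 1, so the bucket would not empty within 25 requests and none would be blocked.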

Common Mistakes

  1. Not implementing rate limiting at all — Without rate limiting, a single abusive client can consume all server resources and degrade service for others.
  2. Using per-IP rate limiting alone — IP-based limiting penalizes users behind shared NAT. Use per-user or per-token limiting for authenticated endpoints.
  3. Not returning rate limit headers — Clients need X-RateLimit-Limit, X-RateLimit-Remaining, and X-RateLimit-Reset to self-regulate their request rate.
  4. Resetting counters at fixed global intervals — If all counters reset at midnight, clients flood the API at 12:00 AM. Use sliding windows instead.
  5. Not exempting essential health checks — Monitoring and health check endpoints should have higher or unlimited rate limits to avoid false alerts.

Practice Questions

  1. What are three common rate limiting algorithms?
  2. What HTTP status code indicates rate limiting?
  3. What headers should a rate-limited response include?
  4. Why is sliding window better than fixed window for rate limiting?
  5. Challenge: Implement a sliding window rate limiter in Python that tracks requests per user in a Redis-like in-memory store. Support different limits for different endpoints.

FAQ

What is the difference between rate limiting and throttling?

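The terms are often used interchangeably. When a distinction is drawn, rate limiting caps the number of requests over time and rejects the excess, while throttling controls the speed at which requests are processed, typically by delaying or queuing them. Both prevent abuse.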
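
One way to picture the "controls the speed" side of that distinction is a throttle that delays requests instead of rejecting them. This is a small sketch under that assumption; the `Throttle` class and its `delay_for` method are hypothetical names, and the arrival time is passed in so the output is deterministic.

```python
class Throttle:
    """Spaces requests at least min_interval seconds apart by delaying them."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.next_free = 0.0  # earliest time the next request may start

    def delay_for(self, now):
        """Return how long a request arriving at `now` should wait."""
        start = max(now, self.next_free)
        self.next_free = start + self.min_interval
        return start - now


throttle = Throttle(min_interval=0.5)  # at most 2 requests per second

# Three requests arrive at the same instant; none is rejected.
delays = [throttle.delay_for(now=10.0) for _ in range(3)]
print(delays)  # [0.0, 0.5, 1.0]
```

A rate limiter given the same three requests and a limit of one would answer two of them with 429. The throttle serves all three, just later.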

Should I rate limit per IP, per user, or per API key?

Use per-API key for authenticated endpoints and per-IP for unauthenticated endpoints. Per-user works when you have user-based authentication.
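
A limiter needs a key to count against, and the advice above amounts to picking the most specific identity available. The sketch below assumes a hypothetical `rate_limit_key` helper and made-up parameter names; how the API key, user ID, and client IP are extracted from a request depends on your framework.

```python
def rate_limit_key(api_key=None, user_id=None, client_ip=None):
    """Pick the most specific identity available for counting requests."""
    if api_key:
        return f"key:{api_key}"
    if user_id:
        return f"user:{user_id}"
    if client_ip:
        # Fallback for unauthenticated traffic; shared NAT users share this key.
        return f"ip:{client_ip}"
    raise ValueError("no identity available to rate limit on")


print(rate_limit_key(api_key="abc123", client_ip="203.0.113.7"))  # key:abc123
print(rate_limit_key(client_ip="203.0.113.7"))                    # ip:203.0.113.7
```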

What is the Retry-After header?

Retry-After tells the client how long to wait before making another request. Its value is either a number of seconds (for example `120`) or an HTTP date (for example `Wed, 21 Oct 2015 07:28:00 GMT`).
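
Because the header has two forms, a client has to handle both. The following sketch uses only the standard library; the function name `retry_after_seconds` and its optional `now` parameter are assumptions added here so the example can be checked with a fixed clock.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None):
    """Convert a Retry-After header value into seconds to wait."""
    value = value.strip()
    if value.isdigit():
        # Form 1: a delay in seconds, e.g. "120".
        return float(value)
    # Form 2: an HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    target = parsedate_to_datetime(value)
    if target.tzinfo is None:
        # Treat a date without a usable zone as UTC.
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max((target - now).total_seconds(), 0.0)


print(retry_after_seconds("120"))  # 120.0

fixed_now = datetime(2015, 10, 21, 7, 27, 0, tzinfo=timezone.utc)
print(retry_after_seconds("Wed, 21 Oct 2015 07:28:00 GMT", now=fixed_now))  # 60.0
```

A date in the past yields 0.0, which means the client may retry immediately.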

How do I choose rate limits for different endpoints?

Expensive endpoints (search, reporting) get lower limits. Cheap endpoints (health checks, static data) get higher limits. Set limits based on actual resource consumption.
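
Per-endpoint limits can be kept in a simple lookup table. The paths and numbers below are hypothetical placeholders rather than recommendations, and the prefix matching is deliberately naive; real values should come from measuring what each endpoint costs.

```python
# Hypothetical limits in requests per minute; tune them from real measurements.
ENDPOINT_LIMITS = {
    "/search": 20,        # expensive: full-text queries
    "/reports": 5,        # expensive: heavy aggregation
    "/static-data": 600,  # cheap: cacheable lookups
}
DEFAULT_LIMIT = 100


def limit_for(path):
    """Return the per-minute limit for the longest matching path prefix."""
    matches = [prefix for prefix in ENDPOINT_LIMITS if path.startswith(prefix)]
    if not matches:
        return DEFAULT_LIMIT
    return ENDPOINT_LIMITS[max(matches, key=len)]


print(limit_for("/search?q=rate+limiting"))  # 20
print(limit_for("/reports/monthly"))         # 5
print(limit_for("/users/42"))                # 100
```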

What happens when a client hits the rate limit?

Return 429 Too Many Requests with rate limit headers and a Retry-After header. Do not silently drop requests.
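
On the server side, the rejected response might be assembled as in the sketch below. The function name `too_many_requests`, the JSON body fields, and the `(status, headers, body)` return shape are assumptions for illustration; adapt them to the response objects of your web framework.

```python
import json
import math


def too_many_requests(limit, reset_epoch, now):
    """Build (status, headers, body) for a rejected request."""
    # Round up so the client never retries a fraction of a second too early.
    retry_after = max(math.ceil(reset_epoch - now), 1)
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after),
        "X-RateLimit-Limit": str(limit),
        "X-RateLimit-Remaining": "0",
        "X-RateLimit-Reset": str(int(reset_epoch)),
    }
    body = json.dumps({
        "error": "rate_limit_exceeded",
        "message": f"Limit of {limit} requests reached. Retry in {retry_after}s.",
    })
    return 429, headers, body


status, headers, body = too_many_requests(
    limit=100, reset_epoch=1688054400, now=1688054370.5
)
print(status, headers["Retry-After"])  # 429 30
```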

Mini Project

Build a Python rate limiting middleware that supports token bucket and sliding window algorithms. The middleware should add rate limit headers to responses, return 429 when limits are exceeded, and support per-user and per-endpoint limits.

What's Next

Now learn about API documentation design in REST API Design.
