Token Bucket Algorithm — Complete Rate Limiting Implementation Guide

DodaTech Updated 2026-06-28 2 min read

In this tutorial, you will learn about Token Bucket Algorithm. We cover key concepts, practical examples, and best practices to help you master this topic.

The token bucket algorithm maintains a bucket of tokens that refill at a constant rate. Each request consumes a token. If the bucket is empty, the request is denied. This allows bursts while enforcing average limits.

What You'll Learn

You'll learn how the token bucket algorithm works, its advantages, and how to implement it.

Why It Matters

Token bucket is the most widely used Rate Limiting algorithm. It is used by AWS, Google Cloud, Kong, and most major API gateways because it allows traffic bursts while maintaining average limits.

Real-World Use

An API allows 100 requests per second with burst capacity of 200. When a client sends no requests for 2 seconds, the bucket fills to 200 tokens. The client can then send 200 requests instantly, then drops to 100/sec average.

flowchart LR
    A[Token Refill Rate: 100/sec] --> B[(Token Bucket)]
    C[Requests] --> D{Tokens Available?}
    D -->|Consume 1 Token| E[Allow Request]
    D -->|No Tokens| F[429 Deny]
    B -->|Max 200 Tokens| D

Implementation

import time
import threading

class TokenBucket:
    def __init__(self, rate, capacity):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.time()
        self.lock = threading.Lock()

    def refill(self):
        now = time.time()
        elapsed = now - self.last_refill
        new_tokens = elapsed * self.rate
        self.tokens = min(self.capacity, self.tokens + new_tokens)
        self.last_refill = now

    def consume(self, count=1):
        with self.lock:
            self.refill()
            if self.tokens >= count:
                self.tokens -= count
                return True
            return False

bucket = TokenBucket(rate=100, capacity=200)
print(f"Allowed: {bucket.consume()}")  # True
print(f"Tokens remaining: {bucket.tokens}")

Common Mistakes

| Mistake | Fix | |---------|-----| | No Thread Safety | Race conditions on token count | Use locks or atomic operations | | Continuous refill instead of lazy | Performance overhead | Compute tokens on demand (lazy refill) | | Integer token counts | Inaccurate for fractional seconds | Use float for elapsed time multiplication | | Unlimited capacity | No burst limit | Always set max capacity | | Single bucket for all clients | One client can exhaust shared bucket | Per-client buckets |

Practice Questions

How does the token bucket allow bursts?
What is the difference between rate and capacity?
Why is lazy refill better than continuous refill?
How do you make token bucket thread-safe?
What happens when tokens reach capacity?

Challenge

Implement a thread-safe token bucket for a Flask API. Set rate=50/sec, capacity=100. Test with concurrent requests using threading. Verify burst behavior.

What's Next

Learn about the leaky bucket algorithm for shaping traffic.

← Previous Why Rate Limit — Complete API Protection Guide Next → Leaky Bucket Algorithm — Complete Traffic Shaping Guide

Built by the developers of DodaTech

Doda Browser, DodaZIP & Durga Antivirus Pro

Home Browse Rate Limiting Throttling