Token Bucket Algorithm — Complete Rate Limiting Implementation Guide
In this tutorial, you will learn about Token Bucket Algorithm. We cover key concepts, practical examples, and best practices to help you master this topic.
The token bucket algorithm maintains a bucket of tokens that refill at a constant rate. Each request consumes a token. If the bucket is empty, the request is denied. This allows bursts while enforcing average limits.
What You'll Learn
You'll learn how the token bucket algorithm works, its advantages, and how to implement it.
Why It Matters
Token bucket is the most widely used Rate Limiting algorithm. It is used by AWS, Google Cloud, Kong, and most major API gateways because it allows traffic bursts while maintaining average limits.
Real-World Use
An API allows 100 requests per second with burst capacity of 200. When a client sends no requests for 2 seconds, the bucket fills to 200 tokens. The client can then send 200 requests instantly, then drops to 100/sec average.
flowchart LR
A[Token Refill Rate: 100/sec] --> B[(Token Bucket)]
C[Requests] --> D{Tokens Available?}
D -->|Consume 1 Token| E[Allow Request]
D -->|No Tokens| F[429 Deny]
B -->|Max 200 Tokens| D
Implementation
import time
import threading
class TokenBucket:
def __init__(self, rate, capacity):
self.rate = rate
self.capacity = capacity
self.tokens = capacity
self.last_refill = time.time()
self.lock = threading.Lock()
def refill(self):
now = time.time()
elapsed = now - self.last_refill
new_tokens = elapsed * self.rate
self.tokens = min(self.capacity, self.tokens + new_tokens)
self.last_refill = now
def consume(self, count=1):
with self.lock:
self.refill()
if self.tokens >= count:
self.tokens -= count
return True
return False
bucket = TokenBucket(rate=100, capacity=200)
print(f"Allowed: {bucket.consume()}") # True
print(f"Tokens remaining: {bucket.tokens}")
Common Mistakes
| Mistake | Fix | |---------|-----| | No Thread Safety | Race conditions on token count | Use locks or atomic operations | | Continuous refill instead of lazy | Performance overhead | Compute tokens on demand (lazy refill) | | Integer token counts | Inaccurate for fractional seconds | Use float for elapsed time multiplication | | Unlimited capacity | No burst limit | Always set max capacity | | Single bucket for all clients | One client can exhaust shared bucket | Per-client buckets |
Practice Questions
- How does the token bucket allow bursts?
- What is the difference between rate and capacity?
- Why is lazy refill better than continuous refill?
- How do you make token bucket thread-safe?
- What happens when tokens reach capacity?
Challenge
Implement a thread-safe token bucket for a Flask API. Set rate=50/sec, capacity=100. Test with concurrent requests using threading. Verify burst behavior.
What's Next
Learn about the leaky bucket algorithm for shaping traffic.
Built by the developers of DodaTech
Doda Browser, DodaZIP & Durga Antivirus Pro