Token-Bucket Rate Limiting at the Edge
You need a limiter that lets a client send a short burst (say, a dashboard firing ten parallel API calls on load) yet still caps the sustained rate at, for example, five requests per second. A fixed-window counter set to five per second rejects half of that burst; a token bucket absorbs it. This guide is part of Rate Limiting and Abuse Prevention at the Edge, and it implements a correct, atomic token bucket inside a Cloudflare Durable Object.
The constraint that forces a Durable Object
A token bucket holds two numbers per client: the tokens currently available and the timestamp of the last refill. The decision is a read-modify-write: read the state, refill based on elapsed time, subtract a token if one exists, write the new state. At the edge this sequence runs inside an ephemeral V8 isolate with no shared memory, so two concurrent requests can both read tokens = 1, both decide to allow, and both write tokens = 0 — the bucket dispensed two tokens it only had one of.
The fix is a single writer. A Durable Object routes every request for a given key to one globally unique instance that handles them one at a time, so the read-modify-write never interleaves. That serialization is what makes the count exact rather than approximate, a property you cannot get from an eventually consistent KV store.
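The lost update is easy to reproduce in plain TypeScript. The sketch below is a toy, single-process simulation: `ToyStore`, `takeToken`, and `SingleWriter` are hypothetical names, the store only mimics the await points of a remote key-value store, and the promise chain stands in for the per-key serialization a Durable Object provides across machines.

```typescript
// Toy async store: each call yields once, like a remote round-trip would.
// Hypothetical stand-in; this is not the Workers KV or Durable Object API.
class ToyStore {
  private data = new Map<string, number>();
  async get(key: string): Promise<number | undefined> {
    await Promise.resolve();
    return this.data.get(key);
  }
  async put(key: string, value: number): Promise<void> {
    await Promise.resolve();
    this.data.set(key, value);
  }
}

// Unsynchronized read-modify-write: nothing stops two callers from
// reading the same token count before either one writes.
async function takeToken(store: ToyStore, key: string): Promise<boolean> {
  const tokens = (await store.get(key)) ?? 0;
  if (tokens < 1) return false;
  await store.put(key, tokens - 1);
  return true;
}

// Single writer: each task starts only after the previous one settles.
class SingleWriter {
  private tail: Promise<unknown> = Promise.resolve();
  run<T>(task: () => Promise<T>): Promise<T> {
    const result = this.tail.then(task);
    this.tail = result.catch(() => undefined); // keep the queue alive if a task fails
    return result;
  }
}

// Two concurrent requests against a bucket holding exactly one token.
async function demo(): Promise<{ racy: number; serialized: number }> {
  const racyStore = new ToyStore();
  await racyStore.put("ip:203.0.113.7", 1);
  const racy = await Promise.all([
    takeToken(racyStore, "ip:203.0.113.7"),
    takeToken(racyStore, "ip:203.0.113.7"),
  ]);

  const safeStore = new ToyStore();
  await safeStore.put("ip:203.0.113.7", 1);
  const writer = new SingleWriter();
  const serialized = await Promise.all([
    writer.run(() => takeToken(safeStore, "ip:203.0.113.7")),
    writer.run(() => takeToken(safeStore, "ip:203.0.113.7")),
  ]);

  return {
    racy: racy.filter(Boolean).length, // both callers get a token
    serialized: serialized.filter(Boolean).length, // only the first one does
  };
}

demo().then((r) => console.log(r)); // { racy: 2, serialized: 1 }
```

Both unsynchronized callers read 1 before either writes 0, so one token is handed out twice; queuing the second call behind the first makes it read 0 and reject.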
The refill math
A token bucket does not run a timer. It refills lazily: when a request arrives, compute how much time has passed since updatedAt, multiply by the refill rate, and add that many tokens (capped at capacity). For a bucket of capacity = 10 refilling at refillPerSecond = 5, a client that drains the bucket and then waits 400 ms regains 0.4 * 5 = 2 tokens. When the bucket holds less than one token, the request is rejected, and the wait until the next whole token is (1 - tokens) / refillPerSecond seconds.
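Written out with b for the stored token count, C for capacity, r for refillPerSecond, and timestamps in milliseconds (symbol names chosen here for brevity), the refill step and the rejection wait are:

```latex
\begin{aligned}
\Delta t &= \frac{\max\left(0,\; t_{\text{now}} - t_{\text{updatedAt}}\right)}{1000} \\
b' &= \min\left(C,\; b + r\,\Delta t\right) \\
t_{\text{wait}} &= \frac{1 - b'}{r} \qquad \text{when } b' < 1
\end{aligned}
```

For the example above, C = 10, r = 5, b = 0 and Δt = 0.4 s give b' = min(10, 0 + 5 × 0.4) = 2. For an empty bucket with no elapsed time, t_wait = (1 - 0) / 5 = 0.2 s; because the Retry-After header carries whole seconds, the `consume` function in Step 1 rounds that up to 1 with `Math.ceil`.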
Step 1 — Write the pure decision function
Keep the math in a side-effect-free function so it is trivial to unit test. It takes the stored state and returns the verdict plus the next state.
```typescript
// limiter.ts
export interface BucketState {
  tokens: number; // tokens available at `updatedAt`
  updatedAt: number; // ms epoch of last computation
}

export interface BucketConfig {
  capacity: number; // max tokens (burst size)
  refillPerSecond: number; // sustained rate
}

export interface Verdict {
  allowed: boolean;
  state: BucketState;
  retryAfterSeconds: number;
  remaining: number;
}

export function consume(
  prev: BucketState,
  cfg: BucketConfig,
  now: number,
): Verdict {
  const elapsed = Math.max(0, now - prev.updatedAt) / 1000;
  const tokens = Math.min(cfg.capacity, prev.tokens + elapsed * cfg.refillPerSecond);
  if (tokens >= 1) {
    const next = { tokens: tokens - 1, updatedAt: now };
    return { allowed: true, state: next, retryAfterSeconds: 0, remaining: Math.floor(next.tokens) };
  }
  const retryAfterSeconds = Math.ceil((1 - tokens) / cfg.refillPerSecond);
  return { allowed: false, state: { tokens, updatedAt: now }, retryAfterSeconds, remaining: 0 };
}
```
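To see the burst and the sustained rate interact, the driver below replays a client sending one request every 100 ms (10 req/s, double the configured rate) for 3 seconds against a bucket that starts full. It inlines a copy of `consume` so the file runs standalone; `replay` is a hypothetical helper, not part of the limiter.

```typescript
interface BucketState { tokens: number; updatedAt: number }
interface BucketConfig { capacity: number; refillPerSecond: number }
interface Verdict { allowed: boolean; state: BucketState; retryAfterSeconds: number; remaining: number }

// Copy of consume() from limiter.ts, repeated so this file is self-contained.
function consume(prev: BucketState, cfg: BucketConfig, now: number): Verdict {
  const elapsed = Math.max(0, now - prev.updatedAt) / 1000;
  const tokens = Math.min(cfg.capacity, prev.tokens + elapsed * cfg.refillPerSecond);
  if (tokens >= 1) {
    const next = { tokens: tokens - 1, updatedAt: now };
    return { allowed: true, state: next, retryAfterSeconds: 0, remaining: Math.floor(next.tokens) };
  }
  const retryAfterSeconds = Math.ceil((1 - tokens) / cfg.refillPerSecond);
  return { allowed: false, state: { tokens, updatedAt: now }, retryAfterSeconds, remaining: 0 };
}

// Hypothetical helper: feed request timestamps (ms) through one bucket that
// starts full, threading the state the way the Durable Object would.
function replay(timestamps: number[], cfg: BucketConfig): boolean[] {
  let state: BucketState = { tokens: cfg.capacity, updatedAt: timestamps[0] ?? 0 };
  return timestamps.map((t) => {
    const verdict = consume(state, cfg, t);
    state = verdict.state;
    return verdict.allowed;
  });
}

const cfg: BucketConfig = { capacity: 10, refillPerSecond: 5 };
// 30 requests, one every 100 ms: t = 0, 100, ..., 2900.
const times = Array.from({ length: 30 }, (_, i) => i * 100);
const outcome = replay(times, cfg);

console.log(outcome.filter(Boolean).length); // 24
console.log(outcome.slice(18, 24).map((ok) => (ok ? "allow" : "deny")).join(" "));
// allow deny allow deny allow deny
```

The first 19 requests drain the 10-token burst while each 100 ms gap refills half a token; after that the client is admitted on every other request, which is the 5 req/s sustained rate.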
Step 2 — Wrap it in a Durable Object
The Durable Object owns one bucket per instance. It reads state from durable storage, runs consume, persists the new state, and replies. Because the object is single-threaded, the read-write pair is atomic without explicit locks.
```typescript
// rate-limiter-do.ts
import { consume, type BucketState, type BucketConfig } from "./limiter";

const CONFIG: BucketConfig = { capacity: 10, refillPerSecond: 5 };

export class TokenBucket {
  private state: DurableObjectState;

  constructor(state: DurableObjectState) {
    this.state = state;
  }

  async fetch(_request: Request): Promise<Response> {
    const now = Date.now();
    const prev =
      (await this.state.storage.get<BucketState>("bucket")) ??
      { tokens: CONFIG.capacity, updatedAt: now };
    const verdict = consume(prev, CONFIG, now);
    await this.state.storage.put("bucket", verdict.state);
    return Response.json(
      { allowed: verdict.allowed, remaining: verdict.remaining },
      {
        status: verdict.allowed ? 200 : 429,
        headers: verdict.allowed
          ? { "x-ratelimit-remaining": String(verdict.remaining) }
          : { "retry-after": String(verdict.retryAfterSeconds) },
      },
    );
  }
}
```
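This holds because of the Durable Object input gate: while a storage operation such as storage.get or storage.put is in flight, the runtime does not deliver other requests to the object, so concurrent checks queue rather than race. No blockConcurrencyWhile is required for this read-then-write. The guarantee covers storage awaits only: if you await an outbound fetch() between the read and the write, another request can run in that gap, so keep the read-modify-write free of other I/O.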
Step 3 — Route requests to the bucket from your Worker
The Worker derives a stable client key, addresses the matching Durable Object instance by name, and forwards the check. Keying the object by client means each client gets its own serialized bucket.
```typescript
// worker.ts
import { TokenBucket } from "./rate-limiter-do";
export { TokenBucket };

interface Env {
  TOKEN_BUCKET: DurableObjectNamespace;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const ip = request.headers.get("cf-connecting-ip") ?? "0.0.0.0";
    const id = env.TOKEN_BUCKET.idFromName(`ip:${ip}`);
    const stub = env.TOKEN_BUCKET.get(id);
    const verdict = await stub.fetch("https://limiter/check");

    if (verdict.status === 429) {
      return new Response(JSON.stringify({ error: "rate_limited" }), {
        status: 429,
        headers: {
          "content-type": "application/json",
          "retry-after": verdict.headers.get("retry-after") ?? "1",
        },
      });
    }

    // Allowed: continue the chain / fetch origin.
    return new Response("ok");
  },
};
```
Rejecting here is an early-return guard: the 429 short-circuits before any origin fetch, so blocked traffic never reaches your origin and costs only the Worker invocation plus the Durable Object round-trip.
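Step 3 keys only on the client IP, and its `"0.0.0.0"` fallback sends every request that lacks the header into one shared bucket. A sketch of a slightly richer key function follows. It assumes a hypothetical `x-api-key` header that your own authentication layer has already validated; it is not a Cloudflare API.

```typescript
// Hypothetical key derivation: prefer an identifier your auth layer vouches
// for, then fall back to the edge-supplied client IP.
// Only trust x-api-key after validating it: an unverified value would let a
// client mint a fresh bucket per request and bypass the limit.
function clientKey(request: Request): string {
  const apiKey = request.headers.get("x-api-key"); // assumed header name
  if (apiKey) return `key:${apiKey}`;
  const ip = request.headers.get("cf-connecting-ip");
  if (ip) return `ip:${ip}`;
  // No identity at all: one shared bucket, like the original "0.0.0.0" fallback.
  return "ip:unknown";
}

// The prefixes keep the namespaces apart, so an API key that happens to look
// like an IP address never shares a bucket with that IP.
const fromKey = clientKey(new Request("https://example.com/", { headers: { "x-api-key": "abc123" } }));
const fromIp = clientKey(new Request("https://example.com/", { headers: { "cf-connecting-ip": "203.0.113.7" } }));
console.log(fromKey, fromIp); // key:abc123 ip:203.0.113.7
```

In the Worker, `env.TOKEN_BUCKET.idFromName(clientKey(request))` would replace the inline `ip:${ip}` name; whatever you choose, the key must stay stable across one client's requests.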
Step 4 — Configure wrangler
Declare the Durable Object binding and a migration that creates the class. The bucket needs no extra resources beyond the object itself.
```jsonc
// wrangler.jsonc
{
  "name": "edge-rate-limiter",
  "main": "src/worker.ts",
  "compatibility_date": "2026-06-01",
  "durable_objects": {
    "bindings": [
      { "name": "TOKEN_BUCKET", "class_name": "TokenBucket" }
    ]
  },
  "migrations": [
    { "tag": "v1", "new_sqlite_classes": ["TokenBucket"] }
  ]
}
```
Local vs production divergence
| Behavior | wrangler dev (local) | Production |
|---|---|---|
| Date.now() resolution | Full ms precision | Coarsened for timing-attack mitigation |
| Durable Object location | In-process, zero network latency | Homed to one region; remote PoPs pay a round-trip |
| cf-connecting-ip | Often absent or 127.0.0.1 | Real client IP, set by the edge |
| Storage durability | In-memory unless persisted | Strongly durable across the object's lifetime |
| Concurrency | Simulated, lower contention | Real parallel requests serialize at the object |
Because production coarsens Date.now(), do not rely on sub-millisecond refill precision; design limits so a few milliseconds of timestamp jitter is immaterial.
Step 5 — Validate with Vitest
Test the pure consume function first — it carries all the logic and needs no runtime. Then add an integration test with the Workers pool if you want end-to-end coverage.
```typescript
// limiter.test.ts
import { describe, it, expect } from "vitest";
import { consume, type BucketState, type BucketConfig } from "./limiter";

const cfg: BucketConfig = { capacity: 10, refillPerSecond: 5 };

describe("token bucket", () => {
  it("allows a burst up to capacity then rejects", () => {
    let state: BucketState = { tokens: cfg.capacity, updatedAt: 0 };
    const results: boolean[] = [];
    for (let i = 0; i < 11; i++) {
      const v = consume(state, cfg, 0); // same instant: no refill
      results.push(v.allowed);
      state = v.state;
    }
    expect(results.filter(Boolean)).toHaveLength(10); // 10 allowed
    expect(results[10]).toBe(false); // 11th rejected
  });

  it("refills lazily over elapsed time", () => {
    const state: BucketState = { tokens: 0, updatedAt: 0 };
    // After 1 second at 5/s, 5 tokens are back; first request allowed.
    const v = consume(state, cfg, 1000);
    expect(v.allowed).toBe(true);
    expect(v.remaining).toBe(4);
  });

  it("computes Retry-After when empty", () => {
    const v = consume({ tokens: 0, updatedAt: 0 }, cfg, 0);
    expect(v.allowed).toBe(false);
    expect(v.retryAfterSeconds).toBe(1); // ceil(1 / 5) = 1s
  });

  it("never exceeds capacity after a long idle", () => {
    const v = consume({ tokens: 2, updatedAt: 0 }, cfg, 60_000);
    expect(v.state.tokens).toBeLessThanOrEqual(cfg.capacity);
  });
});
```
Run with npx vitest run. The first test pins burst size, the second proves lazy refill, the third proves the retry hint, and the fourth guards the capacity cap.
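The four cases above pin specific points. A complementary invariant check, sketched here outside Vitest so it runs anywhere, replays many pseudo-random request schedules from a full bucket and asserts that by time t no schedule has had more than capacity + floor(refillPerSecond × t) requests admitted. The generator and `findViolation` are illustrative helpers, not part of the original test file, and `consume` is a trimmed copy of the Step 1 logic.

```typescript
interface BucketState { tokens: number; updatedAt: number }
interface BucketConfig { capacity: number; refillPerSecond: number }

// Trimmed copy of consume() from limiter.ts: same math, only the fields used here.
function consume(prev: BucketState, cfg: BucketConfig, now: number): { allowed: boolean; state: BucketState } {
  const elapsed = Math.max(0, now - prev.updatedAt) / 1000;
  const tokens = Math.min(cfg.capacity, prev.tokens + elapsed * cfg.refillPerSecond);
  return tokens >= 1
    ? { allowed: true, state: { tokens: tokens - 1, updatedAt: now } }
    : { allowed: false, state: { tokens, updatedAt: now } };
}

// Small linear congruential generator so any failure is reproducible.
function lcg(seed: number): () => number {
  let s = seed >>> 0;
  return () => {
    s = (Math.imul(s, 1664525) + 1013904223) >>> 0;
    return s / 2 ** 32;
  };
}

// Returns a description of the first violation, or null if every prefix of
// every schedule respected allowed <= capacity + floor(refillPerSecond * t).
function findViolation(cfg: BucketConfig, trials: number, requestsPerTrial: number): string | null {
  const rand = lcg(42);
  for (let trial = 0; trial < trials; trial++) {
    let state: BucketState = { tokens: cfg.capacity, updatedAt: 0 };
    let now = 0;
    let allowed = 0;
    for (let i = 0; i < requestsPerTrial; i++) {
      now += Math.floor(rand() * 400); // 0 to 399 ms gap, simultaneous requests included
      const verdict = consume(state, cfg, now);
      state = verdict.state;
      if (verdict.allowed) allowed++;
      const bound = Math.floor(cfg.capacity + (cfg.refillPerSecond * now) / 1000);
      if (allowed > bound) return `trial ${trial}: ${allowed} allowed by t=${now} ms, bound ${bound}`;
    }
  }
  return null;
}

console.log(findViolation({ capacity: 10, refillPerSecond: 5 }, 200, 200)); // null
```

Because each admitted request spends a whole token and the bucket can never hold more than capacity plus what has refilled, a non-null result would point at a clamping or bookkeeping bug.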
Pitfalls
- Storing tokens as an integer. Refill produces fractional tokens; rounding down on write loses partial accrual and slows the effective rate. Persist the float and floor only when reporting `remaining`.
- Refilling above capacity. Forgetting the `Math.min(capacity, ...)` cap lets an idle client accumulate unlimited tokens and then flood. Always clamp.
- Keying the object per request. `idFromName` must receive a stable client key. Hashing a per-request value (like a timestamp) gives every request its own fresh bucket and disables the limit.
- Doing the math in the Worker, not the object. Reading state into the Worker, computing, then writing back reintroduces the race. Keep the read-modify-write entirely inside the Durable Object.
- Ignoring clock coarsening. Designing limits that depend on microsecond timing breaks in production, where `Date.now()` is intentionally coarse.
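The first pitfall is easy to underestimate. In the sketch below, `step` is a trimmed variant of `consume` with a hypothetical `floorOnWrite` flag that imitates persisting an integer token count. A client polling every 100 ms against a 5 tokens/s refill accrues half a token per gap, which the floored store discards every time.

```typescript
interface BucketState { tokens: number; updatedAt: number }

// Trimmed variant of consume() with capacity 10 and 5 tokens/s.
// floorOnWrite is a hypothetical flag that imitates persisting Math.floor(tokens).
function step(prev: BucketState, now: number, floorOnWrite: boolean): { allowed: boolean; state: BucketState } {
  const capacity = 10;
  const refillPerSecond = 5;
  const elapsed = Math.max(0, now - prev.updatedAt) / 1000;
  const tokens = Math.min(capacity, prev.tokens + elapsed * refillPerSecond);
  const allowed = tokens >= 1;
  const left = allowed ? tokens - 1 : tokens;
  return { allowed, state: { tokens: floorOnWrite ? Math.floor(left) : left, updatedAt: now } };
}

// Start from an empty bucket and poll every 100 ms for 2 seconds (20 requests).
function countAllowed(floorOnWrite: boolean): number {
  let state: BucketState = { tokens: 0, updatedAt: 0 };
  let allowed = 0;
  for (let t = 100; t <= 2000; t += 100) {
    const verdict = step(state, t, floorOnWrite);
    state = verdict.state;
    if (verdict.allowed) allowed++;
  }
  return allowed;
}

console.log(countAllowed(false), countAllowed(true)); // 10 0
```

With float storage the client gets the full 5 req/s (every second request). With floored storage it gets nothing: each denied request still advances `updatedAt`, so the half token it accrued is rounded away and never adds up to a whole one.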
Production deployment checklist
- `consume`
- Object addressed by a stable client key via
- `429` returns an accurate `Retry-After`
- `wrangler.jsonc` declares the binding and the `new_sqlite_classes`
Frequently Asked Questions
Why use a Durable Object instead of KV for a token bucket?
A token bucket requires an atomic read-modify-write: read tokens, refill, subtract, write. KV is eventually consistent and offers no atomic increment, so concurrent requests can both read the same token count and both pass, dispensing tokens the bucket did not have. A Durable Object serializes all requests for a key through one single-threaded instance, making the update exact. Use KV only when an approximate limit is acceptable.
How does lazy refill work without a timer?
Instead of a background job adding tokens on a schedule, the limiter computes refill on demand. When a request arrives it measures the time since the last update, multiplies by the refill rate, and adds that many tokens up to capacity. This needs only a token count and a timestamp, costs nothing when idle, and is exact to the resolution of the clock.
How do I set the burst size versus the sustained rate?
The bucket capacity is the maximum burst — the number of requests a client can make instantly after a quiet period. The refillPerSecond is the sustained long-run rate. Set capacity to the largest legitimate burst (for example, the parallel calls a page makes on load) and refill to the average rate you want to enforce. They are independent knobs.
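The two knobs combine into a simple ceiling, derived here from token conservation rather than taken from the original: because the bucket never holds more than C tokens and gains r per second, a client can get at most

```latex
N_{\max}(T) = C + \left\lfloor r\,T \right\rfloor
```

requests through in any window of T seconds, where C is capacity and r is refillPerSecond. With C = 10 and r = 5, that is 10 + 5 = 15 in one second and 10 + 300 = 310 in one minute; over long windows the capacity term becomes negligible and the average converges to r.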
Do I need blockConcurrencyWhile for this limiter?
No. The Durable Object runtime holds back other requests to an object while one of its storage operations is in flight (its input gate), so a read-then-write that awaits only storage cannot interleave with another invocation. blockConcurrencyWhile is for guarding state during awaited initialization, not for a single synchronous-style read-modify-write like this bucket.