Caching API Responses in Cloudflare KV
This guide is part of KV and Durable Object Caching at the Edge. It walks through caching a slow or rate-limited upstream API response in Cloudflare Workers KV so that repeat requests are served from a globally replicated edge cache instead of hitting origin every time.
The problem
Your Worker proxies an upstream JSON API, such as a pricing service, a CMS, or a third-party catalog. The upstream is slow (200–800 ms) or rate-limited (a fixed quota per minute). Every client request currently triggers a fresh upstream fetch, so you burn latency and quota on data that changes only every few minutes. You want to cache each response in Workers KV under a key derived from the request URL, serve it for a bounded TTL, and refresh it on expiry.
Root cause: why not just use the Cache API?
The Cache API (caches.default) is per-PoP. A response cached in Frankfurt is invisible in Singapore, so a globally distributed audience produces a low hit ratio — every cold colo re-fetches origin. KV is globally replicated: a value written once is readable from every PoP within roughly 60 seconds. For an upstream that is expensive or quota-limited, KV’s global reach is what protects origin. The trade-off is KV’s eventual consistency — a freshly written value is not guaranteed to be visible immediately everywhere, and KV rate-limits writes to about one per second per key. Both constraints shape the steps below.
Step 1: Create the KV namespace and bind it
Create the namespace with Wrangler, then add the binding to wrangler.jsonc:
```sh
npx wrangler kv namespace create API_CACHE
```
```jsonc
// wrangler.jsonc
{
  "name": "api-cache-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-01-01",
  "kv_namespaces": [
    // Paste the id printed by `wrangler kv namespace create`.
    { "binding": "API_CACHE", "id": "" }
  ]
}
```
The binding name (API_CACHE) is how you reference the namespace in code; the id ties it to the created namespace.
Step 2: Derive a stable cache key
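A fragmented key space destroys your hit ratio. Normalize the URL before keying: drop tracking parameters, sort the rest, and lowercase the path. Lowercasing is only safe if the upstream treats paths case-insensitively; URL paths are case-sensitive in general, so drop that step if /v1/SKU and /v1/sku can return different data.
```ts
function cacheKey(request: Request): string {
  const url = new URL(request.url);
  const keep = new URLSearchParams();
  // Sort the [name, value] pairs (compared as "name,value" strings) so that
  // ?b=2&a=1 and ?a=1&b=2 produce the same key.
  for (const [k, v] of [...url.searchParams.entries()].sort()) {
    // Drop tracking parameters that never change the upstream response.
    if (!k.startsWith("utm_") && k !== "fbclid") keep.append(k, v);
  }
  return `api:${url.pathname.toLowerCase()}?${keep.toString()}`;
}
```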
Step 3: Read through KV, fetch upstream on miss
Read the cached body and its metadata in one call, and write back with an explicit TTL. Run the write inside ctx.waitUntil so it never delays the response.
```ts
// src/index.ts (cacheKey from Step 2 lives in the same module)
interface Env {
  API_CACHE: KVNamespace;
}

const UPSTREAM = "https://api.example.com";
const TTL_SECONDS = 300;

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Only GETs without an Authorization header share a cache key; everything else goes straight to origin.
    if (request.method !== "GET" || request.headers.has("authorization")) {
      return fetch(rewriteUpstream(request));
    }

    const key = cacheKey(request);
    // For "text" reads, the only type parameter is the metadata shape.
    const hit = await env.API_CACHE.getWithMetadata<{ ct: string }>(key, "text");
    if (hit.value !== null) {
      return new Response(hit.value, {
        headers: {
          "content-type": hit.metadata?.ct ?? "application/json",
          "x-cache": "HIT",
        },
      });
    }

    const upstream = await fetch(rewriteUpstream(request));
    if (!upstream.ok) return upstream; // never cache error responses

    // Read the body once; the same string feeds both the KV write and the response.
    const body = await upstream.text();
    const ct = upstream.headers.get("content-type") ?? "application/json";
    // Defer the write so KV latency is never added to the response.
    ctx.waitUntil(
      env.API_CACHE.put(key, body, { expirationTtl: TTL_SECONDS, metadata: { ct } }),
    );
    return new Response(body, { headers: { "content-type": ct, "x-cache": "MISS" } });
  },
};

// Point the incoming request at the upstream origin, keeping its method, headers, and body.
function rewriteUpstream(request: Request): Request {
  const url = new URL(request.url);
  return new Request(UPSTREAM + url.pathname + url.search, request);
}
```
expirationTtl is in seconds and must be at least 60. Never cache a non-2xx response: caching a transient 500 pins the failure for the whole TTL.
Step 4: Honor upstream cache directives (optional)
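If the upstream sends a sensible Cache-Control header, derive the TTL from it instead of hardcoding one. This version returns null when the upstream marks the response no-store or private, so the caller can skip the KV write entirely, and it rounds shorter max-age values up to KV's 60-second minimum. In the Step 3 handler, call ttlFrom(upstream, TTL_SECONDS) and pass the result as expirationTtl only when it is not null.
```ts
// Returns a KV TTL in seconds, or null when the upstream forbids shared caching.
function ttlFrom(response: Response, fallback: number): number | null {
  const cc = (response.headers.get("cache-control") ?? "").toLowerCase();
  if (/(?:^|[,\s])(?:no-store|private)\b/.test(cc)) return null;
  const m = cc.match(/(?:^|[,\s])max-age=(\d+)/);
  const age = m ? parseInt(m[1], 10) : fallback;
  return Math.max(60, age); // KV minimum TTL is 60 s
}
```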
Local vs production divergence
| Behavior | wrangler dev --local | Production |
|---|---|---|
| KV consistency | Immediate (local Miniflare simulation) | Eventual, up to ~60 s global propagation |
| TTL enforcement | Honored | Honored; values may linger briefly past expiry |
| Write rate limit | Not enforced | ~1 write/s per key, hard cap |
| Value size cap | Loosely checked | 25 MB hard reject |
| getWithMetadata | Works | Works |
The dangerous divergence is consistency: local KV reads your writes instantly, so read-after-write bugs only surface in production. Test for them explicitly rather than trusting wrangler dev.
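Because wrangler dev will not reproduce propagation delay, one option is to inject it yourself. The class below is a hypothetical test double (LaggyKV is not a Cloudflare API) that implements only the two KV calls the Step 3 handler makes. Writes stay invisible for lagMs, so a read right after a write behaves like a miss in a distant PoP. It is a sketch under that assumption, not a faithful KV emulator: TTLs are ignored.

```typescript
// Hypothetical test double: implements only the KV calls the Step 3 handler makes.
// Writes become visible to reads only after `lagMs`, mimicking cross-PoP propagation.
type LaggyEntry = { value: string; metadata: unknown; visibleAt: number };

class LaggyKV {
  private readonly store = new Map<string, LaggyEntry>();
  private readonly lagMs: number;
  private readonly now: () => number;

  // `now` is injectable so a test can advance a fake clock instead of sleeping.
  constructor(lagMs: number, now: () => number = Date.now) {
    this.lagMs = lagMs;
    this.now = now;
  }

  // TTL options are ignored; only metadata is kept.
  async put(key: string, value: string, options?: { metadata?: unknown }): Promise<void> {
    this.store.set(key, {
      value,
      metadata: options?.metadata ?? null,
      visibleAt: this.now() + this.lagMs,
    });
  }

  // Returns the same shape as a KV miss, { value: null, metadata: null }, until the write "propagates".
  async getWithMetadata(
    key: string,
    _type: "text" = "text",
  ): Promise<{ value: string | null; metadata: unknown }> {
    const entry = this.store.get(key);
    if (entry === undefined || this.now() < entry.visibleAt) {
      return { value: null, metadata: null };
    }
    return { value: entry.value, metadata: entry.metadata };
  }
}
```

In a Vitest case you could pass { API_CACHE: new LaggyKV(60_000) as unknown as KVNamespace } as env and assert that the handler still returns correct data, just with two upstream fetches instead of one.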
Validation with Vitest
Use @cloudflare/vitest-pool-workers, which runs tests inside the Workers runtime with a locally simulated KV binding rather than your production namespace.
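The pool needs a Vitest config that points at the Worker's Wrangler file so the API_CACHE binding exists in tests. A minimal sketch, assuming the package's defineWorkersConfig entry point and a wrangler.jsonc in the project root; the configuration API has changed between releases, so check the package docs for your version.

```typescript
// vitest.config.ts
import { defineWorkersConfig } from "@cloudflare/vitest-pool-workers/config";

export default defineWorkersConfig({
  test: {
    poolOptions: {
      workers: {
        // Reuse the Worker's own config so env.API_CACHE is bound in tests.
        wrangler: { configPath: "./wrangler.jsonc" },
      },
    },
  },
});
```

For a typed env in tests, the package docs show augmenting the ProvidedEnv interface of the cloudflare:test module, here with API_CACHE: KVNamespace.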
```ts
// src/index.test.ts
import { env, createExecutionContext, waitOnExecutionContext } from "cloudflare:test";
import { afterEach, describe, it, expect, vi } from "vitest";
import worker from "./index";

describe("KV API cache", () => {
  // Undo the fetch spy so it cannot leak into other tests.
  afterEach(() => {
    vi.restoreAllMocks();
  });

  it("serves a MISS then a HIT for the same URL", async () => {
    // Stub the upstream with a fresh Response per call (a body can only be read once).
    const fetchSpy = vi.spyOn(globalThis, "fetch").mockImplementation(
      async () =>
        new Response('{"price":42}', { headers: { "content-type": "application/json" } }),
    );
    const req = new Request("https://w.example.com/v1/price?sku=abc");

    const ctx1 = createExecutionContext();
    const r1 = await worker.fetch(req.clone(), env, ctx1);
    await waitOnExecutionContext(ctx1); // let the ctx.waitUntil KV write complete
    expect(r1.headers.get("x-cache")).toBe("MISS");

    const ctx2 = createExecutionContext();
    const r2 = await worker.fetch(req.clone(), env, ctx2);
    await waitOnExecutionContext(ctx2);
    expect(r2.headers.get("x-cache")).toBe("HIT");
    expect(await r2.json()).toEqual({ price: 42 });
    expect(fetchSpy).toHaveBeenCalledTimes(1); // upstream hit exactly once
  });
});
```
The waitOnExecutionContext call is essential: without it the ctx.waitUntil KV write may not have completed before the second request reads, and the HIT assertion flakes.
Pitfalls
- Eventual consistency surprises. A value written in one PoP can take up to ~60 seconds to appear in another. Do not rely on read-after-write across regions; if you need it, route through a Durable Object — see KV vs Durable Objects for edge state.
- Caching errors. Always guard on upstream.ok before writing; a cached 429 or 500 poisons the key for the full TTL.
- 25 MB value cap. KV rejects values over 25 MB. For large payloads, store in R2 and keep a pointer in KV.
- Write stampede. Concurrent misses on a hot key can exceed the ~1 write/s limit and return 429. Coalesce refreshes through a Durable Object.
- Forgetting ctx.waitUntil. Awaiting the KV put inline adds its latency to every miss; defer it with waitUntil.
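For the value size cap, a cheap check before the put lets oversized bodies skip KV instead of failing inside waitUntil. A minimal sketch, assuming the limit is 25 MiB (25 × 1024 × 1024 bytes); fitsInKV and KV_MAX_VALUE_BYTES are hypothetical names. It measures UTF-8 bytes rather than string length, because a JavaScript string's length counts UTF-16 code units.

```typescript
// Assumed KV value limit: 25 MiB.
const KV_MAX_VALUE_BYTES = 25 * 1024 * 1024;

// Hypothetical guard: true if `body`, encoded as UTF-8, fits in one KV value.
function fitsInKV(body: string): boolean {
  // Each UTF-16 code unit encodes to 1 to 3 UTF-8 bytes, so the length alone
  // settles most cases without encoding the whole string.
  if (body.length > KV_MAX_VALUE_BYTES) return false;
  if (body.length * 3 <= KV_MAX_VALUE_BYTES) return true;
  return new TextEncoder().encode(body).byteLength <= KV_MAX_VALUE_BYTES;
}
```

In the Step 3 handler, wrapping the ctx.waitUntil(...) call in if (fitsInKV(body)) keeps oversized responses flowing to the client, just uncached.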
Production deployment checklist
- KV namespace created and bound in wrangler.jsonc with the correct
- expirationTtl
- KV writes deferred with
Frequently Asked Questions
What is the minimum TTL for a KV entry?
The expirationTtl must be at least 60 seconds. KV rejects shorter values. If you need finer-grained freshness control, layer the PoP-local Cache API in front of KV with a shorter max-age, or use a Durable Object for sub-second coordination.
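Layering the two caches could look roughly like the sketch below. twoTierGet, EdgeCache, and TextKV are hypothetical names; the minimal interfaces exist only so the code runs outside Workers (caches.default and a KVNamespace binding fit these shapes), and every cached body is assumed to be JSON. On a KV hit the value is copied into the PoP-local cache with a short max-age, so that colo serves it without another KV read until the local copy expires.

```typescript
// Minimal structural types; in a Worker, caches.default and env.API_CACHE satisfy them.
interface EdgeCache {
  match(request: Request): Promise<Response | undefined>;
  put(request: Request, response: Response): Promise<void>;
}
interface TextKV {
  get(key: string, type: "text"): Promise<string | null>;
}

// Hypothetical helper: PoP-local cache first, then KV. Returns null on a miss in both.
async function twoTierGet(
  request: Request,
  key: string,
  edge: EdgeCache,
  kv: TextKV,
  edgeTtlSeconds: number,
): Promise<Response | null> {
  const local = await edge.match(request);
  if (local !== undefined) return local;

  const body = await kv.get(key, "text");
  if (body === null) return null; // caller falls through to the origin fetch

  const response = new Response(body, {
    headers: {
      "content-type": "application/json", // assumption: all cached bodies are JSON
      "cache-control": `public, max-age=${edgeTtlSeconds}`,
    },
  });
  await edge.put(request, response.clone());
  return response;
}
```

Inside a Worker you would likely hand the edge.put call to ctx.waitUntil rather than awaiting it, for the same reason the KV write is deferred in Step 3.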
Why does my freshly written value sometimes not appear on the next request?
KV is eventually consistent. A write can take up to roughly 60 seconds to propagate to all edge locations, and read-after-write is not guaranteed even within the same isolate. If the value is missing, the read simply falls through to origin and repopulates, which is safe. For guaranteed read-your-writes, use a Durable Object.
Should I cache POST or authenticated requests?
No. Cache only idempotent GET requests for shared, non-user-specific data. Authenticated or per-user responses risk leaking one user’s data to another from a shared key. Bypass the cache when an Authorization header or session cookie is present.
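The bypass rule fits in a single predicate checked before any KV read. A minimal sketch; isCacheable is a hypothetical name, and treating any Cookie header as a possible session is a conservative assumption you may want to narrow to your actual session cookie name.

```typescript
// Hypothetical guard: only anonymous GET requests may use a shared cache key.
function isCacheable(request: Request): boolean {
  if (request.method !== "GET") return false;
  if (request.headers.has("authorization")) return false;
  // Assumption: any cookie might identify a user, so skip the cache entirely.
  if (request.headers.has("cookie")) return false;
  return true;
}
```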
How do I avoid exceeding KV's write rate limit on a hot key?
KV allows about one write per second to a single key. Under concurrent misses, route the refresh through a Durable Object so exactly one writer repopulates the key per interval, deduplicating the origin fetch and the KV write.
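The deduplication logic inside such a Durable Object might look like the sketch below. RefreshCoalescer is a hypothetical helper, not a Cloudflare API, and the Durable Object wiring (its class, its binding, and routing each key to one instance via idFromName(key)) is assumed and omitted so the logic runs anywhere. Concurrent callers share one in-flight origin fetch, and the KV write is skipped if the same key was written less than minWriteIntervalMs ago.

```typescript
// Hypothetical helper intended to run inside a Durable Object that owns a cache key.
class RefreshCoalescer {
  private readonly inFlight = new Map<string, Promise<string>>();
  private readonly lastWrite = new Map<string, number>();
  private readonly minWriteIntervalMs: number;
  private readonly now: () => number;

  constructor(minWriteIntervalMs = 1000, now: () => number = Date.now) {
    this.minWriteIntervalMs = minWriteIntervalMs;
    this.now = now;
  }

  // Concurrent callers for the same key receive the same promise, so origin is fetched once.
  refresh(
    key: string,
    fetchOrigin: () => Promise<string>,
    writeKV: (body: string) => Promise<void>,
  ): Promise<string> {
    const pending = this.inFlight.get(key);
    if (pending !== undefined) return pending;

    const run = this.fetchAndMaybeWrite(key, fetchOrigin, writeKV);
    this.inFlight.set(key, run);
    // Free the slot on success or failure so the next miss can retry.
    const clear = (): void => {
      this.inFlight.delete(key);
    };
    run.then(clear, clear);
    return run;
  }

  private async fetchAndMaybeWrite(
    key: string,
    fetchOrigin: () => Promise<string>,
    writeKV: (body: string) => Promise<void>,
  ): Promise<string> {
    const body = await fetchOrigin();
    const last = this.lastWrite.get(key);
    // Stay under KV's ~1 write/s per-key limit by throttling writes per key.
    if (last === undefined || this.now() - last >= this.minWriteIntervalMs) {
      this.lastWrite.set(key, this.now());
      await writeKV(body);
    }
    return body;
  }
}
```

Because one Durable Object instance handles all requests for its key, these in-memory maps are shared by every Worker that routes refreshes for that key through it; the same class in a plain Worker isolate would only deduplicate within that isolate.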