Caching API Responses in Cloudflare KV

This guide is part of KV and Durable Object Caching at the Edge. It walks through caching a slow or rate-limited upstream API response in Cloudflare Workers KV so that repeat requests are served from a globally replicated edge cache instead of hitting origin every time.

The problem

Your Worker proxies an upstream JSON API — a pricing service, a CMS, a third-party catalog. The upstream is slow (200–800 ms) or rate-limited (a fixed quota per minute). Every client request currently triggers a fresh upstream fetch, so you burn latency and quota on data that changes only every few minutes. You want to cache the response in the KV store keyed by request, serve it for a bounded TTL, and refresh on expiry.

Root cause: why not just use the Cache API?

The Cache API (caches.default) is per-PoP. A response cached in Frankfurt is invisible in Singapore, so a globally distributed audience produces a low hit ratio — every cold colo re-fetches origin. KV is globally replicated: a value written once is readable from every PoP within roughly 60 seconds. For an upstream that is expensive or quota-limited, KV’s global reach is what protects origin. The trade-off is KV’s eventual consistency — a freshly written value is not guaranteed to be visible immediately everywhere, and KV rate-limits writes to about one per second per key. Both constraints shape the steps below.

The Worker serves from KV on a hit and only touches the upstream API on a miss, writing the fresh body back with a TTL.

Step 1: Create the KV namespace and bind it

Create the namespace with Wrangler, then add the binding to wrangler.jsonc:

npx wrangler kv namespace create API_CACHE

// wrangler.jsonc
{
  "name": "api-cache-worker",
  "main": "src/index.ts",
  "compatibility_date": "2025-01-01",
  "kv_namespaces": [
    { "binding": "API_CACHE", "id": "" }
  ]
}

The binding name (API_CACHE) is how you reference the namespace in code; the id ties it to the created namespace.

Step 2: Derive a stable cache key

A fragmented key space destroys your hit ratio. Normalize the URL before keying: drop tracking parameters, sort the rest, and lowercase the path.

function cacheKey(request: Request): string {
  const url = new URL(request.url);
  const keep = new URLSearchParams();
  for (const [k, v] of url.searchParams.entries()) {
    // Drop tracking parameters; they never change the upstream response.
    if (!k.startsWith("utm_") && k !== "fbclid") keep.append(k, v);
  }
  // Stable sort by parameter name, so ?a=1&b=2 and ?b=2&a=1 share one key.
  keep.sort();
  return `api:${url.pathname.toLowerCase()}?${keep.toString()}`;
}
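
To see what the normalization buys you, the sketch below applies the same rules to plain URL strings and checks that two differently decorated URLs collapse to one key. The helper name cacheKeyFromUrl is hypothetical and not part of the Worker in this guide; it exists only so the logic can be exercised without constructing a Request.

```typescript
// Hypothetical helper (an illustration, not part of the guide's Worker):
// applies the same normalization rules as cacheKey to a URL string.
function cacheKeyFromUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  const keep = new URLSearchParams();
  url.searchParams.forEach((value, name) => {
    // Drop tracking parameters; they do not change the upstream response.
    if (!name.startsWith("utm_") && name !== "fbclid") keep.append(name, value);
  });
  keep.sort(); // stable sort by parameter name
  return `api:${url.pathname.toLowerCase()}?${keep.toString()}`;
}

// Same resource, different casing, parameter order, and tracking noise.
const keyA = cacheKeyFromUrl("https://w.example.com/V1/Price?utm_source=ad&sku=abc&region=eu");
const keyB = cacheKeyFromUrl("https://w.example.com/v1/price?region=eu&sku=abc&fbclid=123");
console.log(keyA === keyB, keyA);
```

Both URLs map to the same KV entry, so the second request is a hit rather than a new upstream fetch. Lowercasing the path assumes the upstream treats paths case-insensitively; if it does not, drop the toLowerCase call so distinct resources do not share a key.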

Step 3: Read through KV, fetch upstream on miss

Read the cached body and its metadata in one call, and write back with an explicit TTL. Run the write inside ctx.waitUntil so it never delays the response.

interface Env {
  API_CACHE: KVNamespace;
}

const UPSTREAM = "https://api.example.com";
const TTL_SECONDS = 300;

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    if (request.method !== "GET") return fetch(rewriteUpstream(request));

    const key = cacheKey(request);
    const hit = await env.API_CACHE.getWithMetadata<{ ct: string }>(key, "text");
    if (hit.value !== null) {
      return new Response(hit.value, {
        headers: {
          "content-type": hit.metadata?.ct ?? "application/json",
          "x-cache": "HIT",
        },
      });
    }

    const upstream = await fetch(rewriteUpstream(request));
    if (!upstream.ok) return upstream; // never cache error responses

    const body = await upstream.text(); // read once; the string is reused for KV and the client
    const ct = upstream.headers.get("content-type") ?? "application/json";
    ctx.waitUntil(
      env.API_CACHE.put(key, body, { expirationTtl: TTL_SECONDS, metadata: { ct } }),
    );

    return new Response(body, { headers: { "content-type": ct, "x-cache": "MISS" } });
  },
};

function rewriteUpstream(request: Request): Request {
  const url = new URL(request.url);
  return new Request(UPSTREAM + url.pathname + url.search, request);
}

expirationTtl is in seconds and must be at least 60. Never cache a non-2xx response — caching a transient 500 pins the failure for the whole TTL.

Step 4: Honor upstream cache directives (optional)

If the upstream sends a sensible Cache-Control, derive the TTL from it instead of hardcoding:

function ttlFrom(response: Response, fallback: number): number {
  const cc = response.headers.get("cache-control") ?? "";
  const m = cc.match(/max-age=(\d+)/);
  const age = m ? parseInt(m[1], 10) : fallback;
  return Math.max(60, age); // KV minimum TTL is 60s
}
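
The helper above only looks at max-age. As a hedged extension, the sketch below also treats no-store, private, and max-age=0 as "do not cache" by returning null. The function name kvTtlFromCacheControl and the choice to signal a bypass with null are assumptions made for illustration; it takes the raw header string so it can be exercised without a Response.

```typescript
const KV_MIN_TTL_SECONDS = 60; // KV rejects expirationTtl below 60

// Hypothetical variant of ttlFrom: returns a TTL in seconds, or null when the
// upstream's Cache-Control says the response should not be stored in a shared cache.
function kvTtlFromCacheControl(header: string | null, fallback: number): number | null {
  const directives = (header ?? "")
    .toLowerCase()
    .split(",")
    .map((d) => d.trim());

  // no-store and private both rule out a cache shared across users.
  if (directives.some((d) => d === "no-store" || d === "private" || d.startsWith("private="))) {
    return null;
  }

  const maxAge = directives.find((d) => d.startsWith("max-age="));
  const parsed = maxAge === undefined ? NaN : Number.parseInt(maxAge.slice("max-age=".length), 10);
  const age = Number.isNaN(parsed) ? fallback : parsed;

  // Design choice (assumption): max-age=0 means "always revalidate", so skip KV.
  if (age <= 0) return null;
  return Math.max(KV_MIN_TTL_SECONDS, age);
}

console.log(kvTtlFromCacheControl("public, max-age=600", 300)); // 600
console.log(kvTtlFromCacheControl("max-age=10", 300)); // 60 (clamped to the KV minimum)
console.log(kvTtlFromCacheControl("private, max-age=600", 300)); // null
```

In the Worker from Step 3, a null result would mean returning the upstream body without calling put. Clamping short max-age values up to 60 seconds serves data slightly longer than the upstream asked for; if that is unacceptable for your data, treat anything under 60 as a bypass instead.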

Local vs production divergence

Behavior	`wrangler dev --local`	Production
KV consistency	Immediate (Miniflare in-memory)	Eventual, up to ~60s global propagation
TTL enforcement	Honored	Honored; values may linger briefly past expiry
Write rate limit	Not enforced	~1 write/s per key, hard cap
Value size cap	Loosely checked	25 MB hard reject
`getWithMetadata`	Works	Works

The dangerous divergence is consistency: local KV reads your writes instantly, so read-after-write bugs only surface in production. Test for them explicitly rather than trusting wrangler dev.

Validation with Vitest

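Use @cloudflare/vitest-pool-workers, which runs tests inside the Workers runtime against a locally simulated KV namespace. Reads there see writes immediately, so this test validates the cache logic but does not exercise production propagation delay.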

// src/index.test.ts
import { env, createExecutionContext, waitOnExecutionContext } from "cloudflare:test";
import { describe, it, expect, vi } from "vitest";
import worker from "./index";

describe("KV API cache", () => {
  it("serves a MISS then a HIT for the same URL", async () => {
    const fetchSpy = vi.spyOn(globalThis, "fetch").mockResolvedValue(
      new Response('{"price":42}', { headers: { "content-type": "application/json" } }),
    );

    const req = new Request("https://w.example.com/v1/price?sku=abc");

    const ctx1 = createExecutionContext();
    const r1 = await worker.fetch(req.clone(), env, ctx1);
    await waitOnExecutionContext(ctx1); // let ctx.waitUntil KV write complete
    expect(r1.headers.get("x-cache")).toBe("MISS");

    const ctx2 = createExecutionContext();
    const r2 = await worker.fetch(req.clone(), env, ctx2);
    await waitOnExecutionContext(ctx2);
    expect(r2.headers.get("x-cache")).toBe("HIT");
    expect(await r2.json()).toEqual({ price: 42 });

    expect(fetchSpy).toHaveBeenCalledTimes(1); // upstream hit exactly once
  });
});

The waitOnExecutionContext call is essential: without it the ctx.waitUntil KV write may not have completed before the second request reads, and the HIT assertion flakes.

Pitfalls

Eventual consistency surprises. A value written in one PoP can take up to ~60 seconds to appear in another. Do not rely on read-after-write across regions; if you need it, route through a Durable Object — see KV vs Durable Objects for edge state.
Caching errors. Always guard on upstream.ok before writing; a cached 429 or 500 poisons the key for the full TTL.
25 MB value cap. KV rejects values over 25 MB. For large payloads, store in R2 and keep a pointer in KV.
Write stampede. Concurrent misses on a hot key can exceed the ~1 write/s limit and return 429. Coalesce refreshes through a Durable Object.
Forgetting ctx.waitUntil. Awaiting the KV put inline adds its latency to every miss; defer it with waitUntil.
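
The value-size pitfall above can be guarded with a byte-length check before the put. The sketch below is one way to do that; the names utf8ByteLength and fitsInKv are hypothetical, and the limit is deliberately set to a conservative 25,000,000 bytes, which is an assumption chosen to stay under the documented cap however the unit is read.

```typescript
// Conservative assumption: 25 * 10^6 bytes, at or below the documented 25 MB cap.
const KV_MAX_VALUE_BYTES = 25 * 1000 * 1000;

// KV limits apply to stored bytes, not string length, so measure the UTF-8 encoding.
function utf8ByteLength(value: string): number {
  return new TextEncoder().encode(value).byteLength;
}

// Hypothetical guard: true when the value is small enough to write to KV.
function fitsInKv(value: string, limit: number = KV_MAX_VALUE_BYTES): boolean {
  return utf8ByteLength(value) <= limit;
}

console.log(utf8ByteLength("é")); // 2: one character, two bytes
console.log(fitsInKv('{"price":42}'));
```

In the Step 3 Worker this would wrap the write, for example calling put only when fitsInKv(body) is true and otherwise serving the response uncached or storing it in R2 as described above. Encoding the body a second time costs CPU on large payloads, so for known-small APIs the check may not be worth it.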

Production deployment checklist

KV namespace created and bound in wrangler.jsonc with the correct id
Cache key normalized (tracking params stripped, params sorted, path lowercased)
expirationTtl set to at least 60 seconds
Non-2xx upstream responses never written to KV
KV writes deferred with ctx.waitUntil
Vitest covers MISS-then-HIT and asserts a single upstream call
Hot-key refreshes coalesced (Durable Object) if traffic exceeds ~1 write/s per key
Values verified to stay under the 25 MB cap

Frequently Asked Questions

What is the minimum TTL for a KV entry?

The expirationTtl must be at least 60 seconds. KV rejects shorter values. If you need finer-grained freshness control, layer the PoP-local Cache API in front of KV with a shorter max-age, or use a Durable Object for sub-second coordination.

Why does my freshly written value sometimes not appear on the next request?

KV is eventually consistent. A write can take up to roughly 60 seconds to propagate to all edge locations, and read-after-write is not guaranteed even within the same isolate. If the value is missing, the read simply falls through to origin and repopulates, which is safe. For guaranteed read-your-writes, use a Durable Object.

Should I cache POST or authenticated requests?

No. Cache only idempotent GET requests for shared, non-user-specific data. Authenticated or per-user responses risk leaking one user’s data to another from a shared key. Bypass the cache when an Authorization header or session cookie is present.
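
A minimal sketch of that bypass rule follows. The function name isCacheable, the HeaderLookup interface, and the default cookie name "session" are all assumptions for illustration; substitute whatever cookie your application actually uses for sessions.

```typescript
// Minimal structural type so the check works with a Worker's request.headers
// or with a simple stub in tests.
interface HeaderLookup {
  get(name: string): string | null;
}

// Hypothetical guard: true only for GET requests that carry no credentials.
function isCacheable(
  method: string,
  headers: HeaderLookup,
  sessionCookieName: string = "session", // assumption: adjust to your app's cookie
): boolean {
  if (method !== "GET") return false;
  if (headers.get("authorization") !== null) return false;

  // Look for the session cookie among the "name=value; name=value" pairs.
  const cookies = (headers.get("cookie") ?? "").split(";").map((c) => c.trim());
  return !cookies.some((c) => c.startsWith(sessionCookieName + "="));
}
```

In the Step 3 Worker, the method check at the top of fetch could become a call such as isCacheable(request.method, request.headers), with a false result sent straight to the upstream. Requests that fail the check should skip both the KV read and the KV write, so a personalized response is never stored under a shared key.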

How do I avoid exceeding KV's write rate limit on a hot key?

KV allows about one write per second to a single key. Under concurrent misses, route the refresh through a Durable Object so exactly one writer repopulates the key per interval, deduplicating the origin fetch and the KV write.