Normalizing Query Parameters in Edge Cache Keys
This guide is part of Cache Key Normalization and the Vary Header at the Edge. It solves one concrete failure: your edge cache hit ratio is far lower than it should be because query strings that mean the same thing produce different cache keys.
The problem
Your product page is cacheable and should hit the edge nearly every time. Instead the hit ratio sits at 30%. Inspecting the keys, you find dozens of variants of the same logical URL:
/product/123?utm_source=newsletter&utm_medium=email
/product/123?fbclid=IwAR2x9...
/product/123?ref=twitter&utm_source=twitter
/product/123?color=red&size=9
/product/123?size=9&color=red
The last two are the same request with parameters in a different order. The first three carry tracking parameters that change nothing about the rendered bytes. Every one of these computes a distinct cache key, so every one is a miss and an origin fetch. The cache is fragmented into uselessness.
Root cause: the key is byte-exact, the query string is not
A cache key lookup is a byte-exact string comparison. The query string, by contrast, is semantically loose: parameter order does not matter to your application, and tracking parameters do not matter at all. The runtime makes no assumptions for you. At the edge you are inside a V8 isolate with only Web APIs, so you normalize the query string yourself with URL/URLSearchParams before it ever reaches the cache. The fix is three deterministic passes: strip tracking parameters, allowlist what remains, and sort.
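To see the mismatch concretely, the sketch below compares two of the URLs from the problem list, first as raw strings and then as sorted parameter pairs. The `example.com` host and the `sortedPairs` helper are assumptions made only for this illustration.

```typescript
// Two URLs from the problem list: same parameters, different order.
const urlA = new URL("https://example.com/product/123?color=red&size=9");
const urlB = new URL("https://example.com/product/123?size=9&color=red");

// Byte-exact comparison, which is what a cache key lookup performs.
const sameBytes: boolean = urlA.href === urlB.href;

// Hypothetical helper: serialize the parameters after sorting by name, then value.
function sortedPairs(url: URL): string {
  const pairs: Array<[string, string]> = [];
  url.searchParams.forEach((value, key) => pairs.push([key, value]));
  pairs.sort((p, q) => {
    if (p[0] !== q[0]) return p[0] < q[0] ? -1 : 1;
    if (p[1] !== q[1]) return p[1] < q[1] ? -1 : 1;
    return 0;
  });
  return new URLSearchParams(pairs).toString();
}

const sameParams: boolean = sortedPairs(urlA) === sortedPairs(urlB);

console.log(sameBytes);  // false
console.log(sameParams); // true
```

The strings differ while the parameter sets agree, which is exactly the gap the three passes close.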
Step 1: Strip tracking parameters
Tracking parameters never affect the response body. Match them by known prefixes and exact names, so a new utm_term or utm_content is caught without maintaining an exhaustive list.
const TRACKING_PREFIXES = ["utm_"];
const TRACKING_EXACT = new Set([
  "fbclid", "gclid", "gclsrc", "dclid", "msclkid",
  "mc_cid", "mc_eid", "_hsenc", "_hsmi", "ref", "ref_src",
]);
function isTrackingParam(key: string): boolean {
  const k = key.toLowerCase();
  if (TRACKING_EXACT.has(k)) return true;
  return TRACKING_PREFIXES.some((p) => k.startsWith(p));
}
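A quick way to check the prefix rule is to run it over a mixed query string. The sketch below restates a trimmed copy of the matcher so that it runs on its own; the URL and the `stripTracking` helper are illustrative assumptions, not part of the guide's module.

```typescript
// Trimmed restatement of the matcher above so this sketch is self-contained.
const PREFIXES: string[] = ["utm_"];
const EXACT: Set<string> = new Set(["fbclid", "gclid", "ref"]);

function isTracking(key: string): boolean {
  const k = key.toLowerCase();
  return EXACT.has(k) || PREFIXES.some((p) => k.startsWith(p));
}

// Hypothetical helper: return the parameter names that survive the strip pass.
function stripTracking(rawUrl: string): string[] {
  const url = new URL(rawUrl);
  const survivingNames: string[] = [];
  url.searchParams.forEach((_value, key) => {
    if (!isTracking(key)) survivingNames.push(key);
  });
  return survivingNames;
}

// utm_term is not listed by name anywhere; the "utm_" prefix catches it,
// and the lowercase comparison catches the mixed-case UTM_Content as well.
const survivors = stripTracking(
  "https://example.com/product/123?utm_term=shoes&UTM_Content=hero&fbclid=abc&color=red",
);
console.log(survivors); // ["color"]
```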
Step 2: Allowlist the parameters that matter
Stripping tracking parameters is necessary but not sufficient — an attacker or a buggy link can append arbitrary junk (?cachebust=12345). Only an allowlist guarantees the keyspace is bounded. Declare, per route, exactly which parameters change the response:
interface QueryNormalizeConfig {
  /** Params that materially change the response for this route. */
  allowedParams: string[];
}
function filterParams(
  searchParams: URLSearchParams,
  config: QueryNormalizeConfig,
): Array<[string, string]> {
  const allowed = new Set(config.allowedParams.map((p) => p.toLowerCase()));
  const kept: Array<[string, string]> = [];
  for (const [key, value] of searchParams.entries()) {
    // Lowercase the name once so ?Color=red and ?color=red yield the same pair.
    const name = key.toLowerCase();
    if (isTrackingParam(name)) continue;
    if (!allowed.has(name)) continue;
    kept.push([name, value]);
  }
  return kept;
}
Step 3: Sort deterministically
Sort by key, then by value, so repeated keys (?tag=a&tag=b) also order stably. Never rely on insertion order.
function sortPairs(pairs: Array<[string, string]>): Array<[string, string]> {
  return [...pairs].sort((a, b) =>
    a[0] === b[0] ? a[1].localeCompare(b[1]) : a[0].localeCompare(b[0]),
  );
}
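`localeCompare` called without a `locales` argument uses the runtime's default locale, so the ordering is only as stable as that default. If you would rather not depend on it, one option is to compare by UTF-16 code unit with the plain `<` operator. The following is a sketch of that alternative, not a change to the function above; the names `compareCodeUnits` and `sortPairsByCodeUnit` are hypothetical.

```typescript
type Pair = [string, string];

// Compare two strings by UTF-16 code unit; independent of any locale setting.
function compareCodeUnits(x: string, y: string): number {
  if (x === y) return 0;
  return x < y ? -1 : 1;
}

// Hypothetical variant of sortPairs: order by key, then by value.
function sortPairsByCodeUnit(pairs: Pair[]): Pair[] {
  return [...pairs].sort(
    (p, q) => compareCodeUnits(p[0], q[0]) || compareCodeUnits(p[1], q[1]),
  );
}

const sortedTags = sortPairsByCodeUnit([
  ["tag", "b"],
  ["size", "9"],
  ["tag", "a"],
]);
console.log(sortedTags); // [["size", "9"], ["tag", "a"], ["tag", "b"]]
```

Code-unit order places uppercase ASCII letters before lowercase ones, which is acceptable for a cache key because only consistency matters, not a human-friendly ordering.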
Step 4: Assemble the canonical key
Combine the passes into one pure function. This is the function both your cache read and cache write must call.
export function normalizeQueryForKey(
  rawUrl: string,
  config: QueryNormalizeConfig,
): string {
  const url = new URL(rawUrl);
  url.hash = ""; // fragments never reach the server
  url.pathname = url.pathname.toLowerCase();
  const kept = filterParams(url.searchParams, config);
  const sorted = sortPairs(kept);
  const canonical = new URLSearchParams(sorted).toString();
  return canonical
    ? `${url.origin}${url.pathname}?${canonical}`
    : `${url.origin}${url.pathname}`;
}
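Because the read and the write have to agree on the key, it can help to route both through one wrapper. The sketch below is a simplification under stated assumptions: `canonicalKey` is a reduced stand-in for `normalizeQueryForKey`, `KeyedCache` and `MemoryCache` are hypothetical and use an in-memory `Map` in place of a real edge cache, and the origin fetch is synchronous for brevity.

```typescript
// Reduced stand-in for normalizeQueryForKey so this sketch runs on its own (assumption).
function canonicalKey(rawUrl: string, allowedParams: string[]): string {
  const url = new URL(rawUrl);
  const allowedSet = new Set(allowedParams);
  const kept: Array<[string, string]> = [];
  url.searchParams.forEach((value, key) => {
    if (allowedSet.has(key)) kept.push([key, value]);
  });
  kept.sort((p, q) =>
    p[0] === q[0] ? p[1].localeCompare(q[1]) : p[0].localeCompare(q[0]),
  );
  const query = new URLSearchParams(kept).toString();
  const base = `${url.origin}${url.pathname}`;
  return query ? `${base}?${query}` : base;
}

// Hypothetical minimal cache interface; a Map stands in for the edge cache.
interface KeyedCache {
  get(key: string): string | undefined;
  set(key: string, body: string): void;
}

class MemoryCache implements KeyedCache {
  private store = new Map<string, string>();
  get(key: string): string | undefined { return this.store.get(key); }
  set(key: string, body: string): void { this.store.set(key, body); }
}

// Read and write derive the key from the same call, so they cannot drift apart.
function readThrough(
  cache: KeyedCache,
  rawUrl: string,
  allowedParams: string[],
  fetchFromOrigin: (key: string) => string,
): { body: string; hit: boolean } {
  const key = canonicalKey(rawUrl, allowedParams);
  const cached = cache.get(key);
  if (cached !== undefined) return { body: cached, hit: true };
  const body = fetchFromOrigin(key);
  cache.set(key, body); // the write uses the same canonical key as the read
  return { body, hit: false };
}

const memoryCache = new MemoryCache();
const allowedForProduct = ["color", "size"];
const renderAtOrigin = (key: string): string => `rendered:${key}`;

const firstTry = readThrough(
  memoryCache, "https://example.com/product/1?color=red&size=9",
  allowedForProduct, renderAtOrigin,
);
const secondTry = readThrough(
  memoryCache, "https://example.com/product/1?size=9&utm_source=nl&color=red",
  allowedForProduct, renderAtOrigin,
);
console.log(firstTry.hit, secondTry.hit); // false true
```

The second request differs in parameter order and carries a tracking parameter, yet it is served from the entry the first request wrote.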
Configuration snippet
Map routes to their allowlists in one table, resolved before normalization runs. In a Next.js middleware.ts, declare the matcher and reuse the same function:
// middleware.ts
import { NextRequest, NextResponse } from "next/server";
import { normalizeQueryForKey } from "./lib/normalizeQuery";
export const config = { matcher: ["/product/:path*", "/search"] };
// First path segment -> params that change the response for that route.
const ROUTE_ALLOWLIST: Record<string, string[]> = {
  "/product": ["color", "size"],
  "/search": ["q", "page"],
};
export function middleware(req: NextRequest) {
  // Lowercase the first segment so the lookup agrees with the lowercased canonical path.
  const path = "/" + (req.nextUrl.pathname.split("/")[1] ?? "").toLowerCase();
  const allowed = ROUTE_ALLOWLIST[path] ?? [];
  const canonical = normalizeQueryForKey(req.url, { allowedParams: allowed });
  // Rewrite only when normalization changed something (the fragment is ignored).
  if (canonical !== req.url.split("#")[0]) {
    return NextResponse.rewrite(new URL(canonical));
  }
  return NextResponse.next();
}
Local vs production divergence
| Behavior | Local (next dev / wrangler dev) | Production edge |
|---|---|---|
| Cache hit/miss | No shared edge cache; every request misses | Canonical key drives real hit/miss across PoPs |
| Param order from tooling | Often preserved as typed | Arbitrary; clients reorder freely |
| Tracking params | Rare in manual testing | Constant from real referral traffic |
| localeCompare ordering | Same algorithm | Same algorithm; deterministic across PoPs |
The key risk is that local testing looks fine because there is no shared cache to fragment. Always assert the canonical key directly in a unit test rather than eyeballing hit ratios locally.
Vitest validation
import { describe, expect, it } from "vitest";
import { normalizeQueryForKey } from "./normalizeQuery";
const cfg = { allowedParams: ["color", "size"] };
describe("normalizeQueryForKey", () => {
  it("produces the same key regardless of param order", () => {
    const a = normalizeQueryForKey("https://x.com/product/1?color=red&size=9", cfg);
    const b = normalizeQueryForKey("https://x.com/product/1?size=9&color=red", cfg);
    expect(a).toBe(b);
  });
  it("strips utm_* and fbclid", () => {
    const key = normalizeQueryForKey(
      "https://x.com/product/1?utm_source=nl&fbclid=abc&color=red",
      cfg,
    );
    expect(key).toBe("https://x.com/product/1?color=red");
  });
  it("drops non-allowlisted params", () => {
    const key = normalizeQueryForKey("https://x.com/product/1?cachebust=99&size=9", cfg);
    expect(key).toBe("https://x.com/product/1?size=9");
  });
  it("lowercases the path and drops the fragment", () => {
    const key = normalizeQueryForKey("https://x.com/Product/1?size=9#reviews", cfg);
    expect(key).toBe("https://x.com/product/1?size=9");
  });
  it("yields a bare path when no allowed params survive", () => {
    const key = normalizeQueryForKey("https://x.com/product/1?utm_source=nl", cfg);
    expect(key).toBe("https://x.com/product/1");
  });
});
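Beyond the fixed cases above, two properties are worth checking in the same spirit: every ordering of the same parameters should collapse to one key, and normalizing an already canonical key should change nothing. The sketch below uses plain code rather than Vitest so that it runs standalone; `canonicalKey` is a reduced, hypothetical stand-in for `normalizeQueryForKey`, and `permutations` is a helper introduced only for this check.

```typescript
// Reduced stand-in for normalizeQueryForKey: allowlist, then sort (assumption).
function canonicalKey(rawUrl: string, allowedParams: string[]): string {
  const url = new URL(rawUrl);
  const allowedSet = new Set(allowedParams);
  const kept: Array<[string, string]> = [];
  url.searchParams.forEach((value, key) => {
    if (allowedSet.has(key)) kept.push([key, value]);
  });
  kept.sort((p, q) =>
    p[0] === q[0] ? p[1].localeCompare(q[1]) : p[0].localeCompare(q[0]),
  );
  const query = new URLSearchParams(kept).toString();
  const base = `${url.origin}${url.pathname}`;
  return query ? `${base}?${query}` : base;
}

// Hypothetical helper: every ordering of a small list (fine for a few params).
function permutations<T>(items: T[]): T[][] {
  if (items.length <= 1) return [items];
  const result: T[][] = [];
  items.forEach((item, i) => {
    const rest = items.slice(0, i).concat(items.slice(i + 1));
    permutations(rest).forEach((tail) => result.push([item].concat(tail)));
  });
  return result;
}

const rawParams = ["color=red", "size=9", "utm_source=nl"];
const allowedNames = ["color", "size"];
const distinctKeys = new Set<string>();
permutations(rawParams).forEach((ordering) => {
  const rawUrl = `https://example.com/product/1?${ordering.join("&")}`;
  distinctKeys.add(canonicalKey(rawUrl, allowedNames));
});

// Order invariance: all six orderings produce a single key.
console.log(distinctKeys.size); // 1

// Idempotence: normalizing the canonical key again returns it unchanged.
const onlyKey = Array.from(distinctKeys)[0];
const isIdempotent = canonicalKey(onlyKey, allowedNames) === onlyKey;
console.log(isIdempotent); // true
```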
Pitfalls
- Blocklisting instead of allowlisting. A blocklist of known tracking params still admits arbitrary unknown junk like ?cachebust=.... Always allowlist; the keyspace must be bounded.
- Forgetting repeated keys. ?tag=a&tag=b and ?tag=b&tag=a differ unless you sort by value too. Sort the full (key, value) pairs.
- Case-sensitive param names. ?Color=red and ?color=red fragment unless you lowercase names both when matching the allowlist and in the emitted key.
- Normalizing on read but not on write. If your cache write uses the raw URL and your read uses the canonical key, every read misses. Call the same function on both paths.
- Lowercasing values blindly. Lowercase param names for matching, but do not lowercase values unless the route is genuinely case-insensitive: ?q=iPhone and ?q=iphone may be different searches.
Production deployment checklist
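- Tracking params (utm_*, fbclid, gclid, and similar) are stripped before the cache key is computed.
- The same normalizeQueryForKey function runs on both the cache read and the cache write path.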
Frequently Asked Questions
Why allowlist instead of blocklist tracking parameters?
A blocklist only removes parameters you already know about, so any new or arbitrary parameter like cachebust still leaks into the key and fragments the cache. An allowlist keeps only the parameters that change the response, which bounds the keyspace no matter what junk arrives.
Do I need to sort parameter values, not just names?
Yes, when a key can repeat. Query strings like tag=a&tag=b and tag=b&tag=a are different byte strings unless you sort the full key-and-value pairs, so sort by key first and then by value.
Should I lowercase query parameter values?
Lowercase parameter names when matching the allowlist, but leave values intact unless the route is genuinely case-insensitive. Lowercasing a search term like q=iPhone could merge two distinct searches.
Why does my hit ratio look fine locally but drop in production?
Local dev servers have no shared edge cache, so every request simply misses and you never observe fragmentation. Real traffic carries reordered and tracking parameters across many PoPs. Assert the canonical key in a unit test instead of judging by local behavior.