Normalizing Query Parameters in Edge Cache Keys

This guide is part of Cache Key Normalization and the Vary Header at the Edge. It solves one concrete failure: your edge cache hit ratio is far lower than it should be because query strings that mean the same thing produce different cache keys.

The problem

Your product page is cacheable and should hit the edge nearly every time. Instead the hit ratio sits at 30%. Inspecting the keys, you find dozens of variants of the same logical URL:

/product/123?utm_source=newsletter&utm_medium=email
/product/123?fbclid=IwAR2x9...
/product/123?ref=twitter&utm_source=twitter
/product/123?color=red&size=9
/product/123?size=9&color=red

The last two are the same request with parameters in a different order. The first three carry tracking parameters that change nothing about the rendered bytes. Every one of these computes a distinct cache key, so every one is a miss and an origin fetch. The cache is fragmented into uselessness.

Root cause: the key is byte-exact, the query string is not

A cache key lookup is a byte-exact string comparison. The query string, by contrast, is semantically loose: parameter order does not matter to your application, and tracking parameters do not matter at all. The runtime makes no assumptions for you. At the edge you are inside a V8 isolate with only Web APIs, so you normalize the query string yourself with URL/URLSearchParams before it ever reaches the cache. The fix is three deterministic passes: strip tracking parameters, allowlist what remains, and sort.
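
The mismatch is easy to reproduce in isolation. The following is a minimal sketch, assuming a placeholder host (example.com) that is not part of the original URLs; it only shows that two requests the application treats as identical are different strings, and that URLSearchParams preserves the order it was given rather than sorting for you.

```typescript
// Two requests the application treats as the same product view.
// The host example.com is a placeholder chosen for this sketch.
const a: string = "https://example.com/product/123?color=red&size=9";
const b: string = "https://example.com/product/123?size=9&color=red";

// A byte-exact key comparison sees two different strings.
const sameBytes = a === b;

// Parsing does not help on its own: URLSearchParams serializes
// parameters in the order they arrived.
const paramsA = new URL(a).searchParams.toString();
const paramsB = new URL(b).searchParams.toString();
const sameAfterParsing = paramsA === paramsB;
```

Both sameBytes and sameAfterParsing are false here, which is why the normalization has to be explicit.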

Step 1: Strip tracking parameters

Tracking parameters never affect the response body. Match them by known prefixes and exact names, so a new utm_term or utm_content is caught without maintaining an exhaustive list.

const TRACKING_PREFIXES = ["utm_"];
const TRACKING_EXACT = new Set([
  "fbclid", "gclid", "gclsrc", "dclid", "msclkid",
  "mc_cid", "mc_eid", "_hsenc", "_hsmi", "ref", "ref_src",
]);

function isTrackingParam(key: string): boolean {
  const k = key.toLowerCase();
  if (TRACKING_EXACT.has(k)) return true;
  return TRACKING_PREFIXES.some((p) => k.startsWith(p));
}

Step 2: Allowlist the parameters that matter

Stripping tracking parameters is necessary but not sufficient: an attacker or a buggy link can append arbitrary junk (?cachebust=12345), and each new value produces a new key. Only an allowlist bounds which parameter names can reach the key. Declare, per route, exactly which parameters change the response:

interface QueryNormalizeConfig {
  /** Params that materially change the response for this route. */
  allowedParams: string[];
}

function filterParams(
  searchParams: URLSearchParams,
  config: QueryNormalizeConfig,
): Array<[string, string]> {
  const allowed = new Set(config.allowedParams.map((p) => p.toLowerCase()));
  const kept: Array<[string, string]> = [];
  for (const [key, value] of searchParams.entries()) {
    if (isTrackingParam(key)) continue;
    if (!allowed.has(key.toLowerCase())) continue;
    // Emit the lowercased name so ?Color=red and ?color=red share one key.
    kept.push([key.toLowerCase(), value]);
  }
  return kept;
}

Step 3: Sort deterministically

Sort by key, then by value, so repeated keys (?tag=a&tag=b) also order stably. Never rely on insertion order.

function sortPairs(pairs: Array<[string, string]>): Array<[string, string]> {
  return [...pairs].sort((a, b) =>
    a[0] === b[0] ? a[1].localeCompare(b[1]) : a[0].localeCompare(b[0]),
  );
}
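
One caveat with localeCompare is that, called without an explicit locale, its result follows the runtime's default locale and collation data. If that could differ between environments, a comparator based on plain string comparison (UTF-16 code units) avoids the dependency. The sketch below is one possible alternative, not the guide's own implementation; the names compareCodeUnits and sortPairsByCodeUnit are hypothetical.

```typescript
type Pair = [string, string];

// Compare by UTF-16 code units. The result does not depend on locale settings.
function compareCodeUnits(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical alternative to sortPairs: order by key first, then by value.
// slice() copies the input so the caller's array is not mutated.
function sortPairsByCodeUnit(pairs: Pair[]): Pair[] {
  return pairs.slice().sort(
    (a, b) => compareCodeUnits(a[0], b[0]) || compareCodeUnits(a[1], b[1]),
  );
}

const sortedExample = sortPairsByCodeUnit([
  ["tag", "b"],
  ["size", "9"],
  ["tag", "a"],
]);
// sortedExample is [["size","9"],["tag","a"],["tag","b"]]
```

For ASCII parameter names and values the two comparators usually agree on lowercase input; they can differ on mixed case and non-ASCII text, so pick one and use it on every path that computes a key.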

Step 4: Assemble the canonical key

Combine the passes into one pure function. This is the function both your cache read and cache write must call.

export function normalizeQueryForKey(
  rawUrl: string,
  config: QueryNormalizeConfig,
): string {
  const url = new URL(rawUrl);
  url.hash = ""; // fragments never reach the server
  url.pathname = url.pathname.toLowerCase();

  const kept = filterParams(url.searchParams, config);
  const sorted = sortPairs(kept);
  const canonical = new URLSearchParams(sorted).toString();

  return canonical
    ? `${url.origin}${url.pathname}?${canonical}`
    : `${url.origin}${url.pathname}`;
}
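
To see the effect on the URLs from the problem statement, here is a condensed, self-contained sketch of the same pipeline. It makes several assumptions that are not in the original: the host example.com, the value PLACEHOLDER standing in for the truncated fbclid, a shortened tracking list, a hard-coded allowlist, and the function name canonicalKey. It also sorts with plain string comparison instead of localeCompare to stay short.

```typescript
// Shortened tracking list and fixed allowlist, for illustration only.
const SKETCH_TRACKING = new Set(["fbclid", "gclid", "ref"]);
const SKETCH_ALLOWED = new Set(["color", "size"]);

function cmp(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Hypothetical condensed version of the strip, allowlist, sort, assemble steps.
function canonicalKey(rawUrl: string): string {
  const url = new URL(rawUrl);
  const kept: Array<[string, string]> = [];
  url.searchParams.forEach((value, key) => {
    const k = key.toLowerCase();
    if (k.startsWith("utm_") || SKETCH_TRACKING.has(k)) return; // strip tracking
    if (!SKETCH_ALLOWED.has(k)) return; // allowlist
    kept.push([k, value]); // lowercased name, untouched value
  });
  kept.sort((a, b) => cmp(a[0], b[0]) || cmp(a[1], b[1])); // key, then value
  const query = new URLSearchParams(kept).toString();
  const base = `${url.origin}${url.pathname.toLowerCase()}`;
  return query ? `${base}?${query}` : base;
}

const exampleKeys = [
  "https://example.com/product/123?utm_source=newsletter&utm_medium=email",
  "https://example.com/product/123?fbclid=PLACEHOLDER",
  "https://example.com/product/123?ref=twitter&utm_source=twitter",
  "https://example.com/product/123?color=red&size=9",
  "https://example.com/product/123?size=9&color=red",
].map(canonicalKey);

const distinctKeys = new Set(exampleKeys);
```

Under these assumptions the five raw URLs collapse to two keys: the bare /product/123 path for the three tracking-only requests, and /product/123?color=red&size=9 for the two variant requests.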

Configuration snippet

Map routes to their allowlists in one table, resolved before normalization runs. In a Next.js middleware.ts, declare the matcher and reuse the same function:

// middleware.ts
import { NextRequest, NextResponse } from "next/server";
import { normalizeQueryForKey } from "./lib/normalizeQuery";

export const config = { matcher: ["/product/:path*", "/search"] };

const ROUTE_ALLOWLIST: Record<string, string[]> = {
  "/product": ["color", "size"],
  "/search": ["q", "page"],
};

export function middleware(req: NextRequest) {
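  // Lowercase before the lookup so it agrees with normalizeQueryForKey, which lowercases the path.
  const path = "/" + (req.nextUrl.pathname.toLowerCase().split("/")[1] ?? "");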
  const allowed = ROUTE_ALLOWLIST[path] ?? [];
  const canonical = normalizeQueryForKey(req.url, { allowedParams: allowed });
  if (canonical !== req.url.split("#")[0]) {
    return NextResponse.rewrite(new URL(canonical));
  }
  return NextResponse.next();
}
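
The route lookup is easier to test when it is pulled out of the middleware into a pure function. The sketch below assumes a hypothetical helper name, allowlistFor, and lowercases the path before the lookup so that it agrees with a normalizer that lowercases the path; it has no dependency on Next.js.

```typescript
const ROUTE_ALLOWLIST_SKETCH: Record<string, string[]> = {
  "/product": ["color", "size"],
  "/search": ["q", "page"],
};

// Hypothetical helper: resolve the allowlist from the first path segment.
// Unknown routes get an empty allowlist, so every query parameter is dropped.
function allowlistFor(pathname: string): string[] {
  const firstSegment = "/" + (pathname.toLowerCase().split("/")[1] ?? "");
  return ROUTE_ALLOWLIST_SKETCH[firstSegment] ?? [];
}

const productParams = allowlistFor("/product/123");
const mixedCaseParams = allowlistFor("/Product/123");
const unknownParams = allowlistFor("/about");
```

Defaulting unknown routes to an empty list is a design choice: it fails closed, so a route added without an allowlist loses its query parameters rather than fragmenting the cache.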

Local vs production divergence

Behavior	Local (`next dev` / `wrangler dev`)	Production edge
Cache hit/miss	No shared edge cache; every request misses	Canonical key drives real hit/miss across PoPs
Param order from tooling	Often preserved as typed	Arbitrary; clients reorder freely
Tracking params	Rare in manual testing	Constant from real referral traffic
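`localeCompare` ordering	Follows the dev machine's default locale and collation data	Follows each runtime's default locale and collation data; identical across PoPs only if every PoP runs the same runtime build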

The key risk is that local testing looks fine because there is no shared cache to fragment. Always assert the canonical key directly in a unit test rather than eyeballing hit ratios locally.

Vitest validation

import { describe, expect, it } from "vitest";
import { normalizeQueryForKey } from "./normalizeQuery";

const cfg = { allowedParams: ["color", "size"] };

describe("normalizeQueryForKey", () => {
  it("produces the same key regardless of param order", () => {
    const a = normalizeQueryForKey("https://x.com/product/1?color=red&size=9", cfg);
    const b = normalizeQueryForKey("https://x.com/product/1?size=9&color=red", cfg);
    expect(a).toBe(b);
  });

  it("strips utm_* and fbclid", () => {
    const key = normalizeQueryForKey(
      "https://x.com/product/1?utm_source=nl&fbclid=abc&color=red",
      cfg,
    );
    expect(key).toBe("https://x.com/product/1?color=red");
  });

  it("drops non-allowlisted params", () => {
    const key = normalizeQueryForKey("https://x.com/product/1?cachebust=99&size=9", cfg);
    expect(key).toBe("https://x.com/product/1?size=9");
  });

  it("lowercases the path and drops the fragment", () => {
    const key = normalizeQueryForKey("https://x.com/Product/1?size=9#reviews", cfg);
    expect(key).toBe("https://x.com/product/1?size=9");
  });

  it("yields a bare path when no allowed params survive", () => {
    const key = normalizeQueryForKey("https://x.com/product/1?utm_source=nl", cfg);
    expect(key).toBe("https://x.com/product/1");
  });
});

Pitfalls

Blocklisting instead of allowlisting. A blocklist of known tracking params still admits arbitrary unknown junk like ?cachebust=.... Always allowlist; the keyspace must be bounded.
Forgetting repeated keys. ?tag=a&tag=b and ?tag=b&tag=a differ unless you sort by value too. Sort the full (key, value) pairs.
Case-sensitive param names. ?Color=red and ?color=red fragment unless you lowercase names both when matching the allowlist and in the key you emit.
Normalizing on read but not on write. If your cache write uses the raw URL and your read uses the canonical key, every read misses. Call the same function on both paths.
Lowercasing values blindly. Lowercase param names for matching, but do not lowercase values unless the route is genuinely case-insensitive — ?q=iPhone and ?q=iphone may be different searches.
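
The read/write pitfall can be sketched with an in-memory Map standing in for the edge cache. Everything here is hypothetical: the Map, the helper names, and keyFor, which is a simplified stand-in for normalizeQueryForKey that only sorts parameters by name. The point is that both paths derive the key through the same function.

```typescript
// In-memory stand-in for an edge cache, for illustration only.
const sketchCache = new Map<string, string>();

// Simplified stand-in for the real normalizer: sorts parameters by name.
function keyFor(rawUrl: string): string {
  const url = new URL(rawUrl);
  url.searchParams.sort();
  return `${url.origin}${url.pathname}?${url.searchParams.toString()}`;
}

// Write path: store under the canonical key, never the raw URL.
function writeEntry(rawUrl: string, body: string): void {
  sketchCache.set(keyFor(rawUrl), body);
}

// Read path: derive the key with the same function.
function readEntry(rawUrl: string): string | undefined {
  return sketchCache.get(keyFor(rawUrl));
}

writeEntry("https://example.com/product/1?size=9&color=red", "<html>red, 9</html>");

// Same logical request with a different parameter order: a hit.
const hit = readEntry("https://example.com/product/1?color=red&size=9");

// Looking up by the raw URL that was written bypasses normalization: a miss.
const rawLookup = sketchCache.get("https://example.com/product/1?size=9&color=red");
```

Here hit holds the stored body while rawLookup is undefined, which is the symptom you see when one path skips normalization.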

Production deployment checklist

Every cacheable route has an explicit allowlist of response-affecting params
Tracking params (utm_*, fbclid, gclid, …) are stripped before keying
Surviving params are sorted by key and value
The same normalizeQueryForKey runs on both cache read and write
Param names are matched case-insensitively; values are left intact
A unit test asserts order-independence and tracking-strip behavior

Frequently Asked Questions

Why allowlist instead of blocklist tracking parameters?

A blocklist only removes parameters you already know about, so any new or arbitrary parameter like cachebust still leaks into the key and fragments the cache. An allowlist keeps only the parameters that change the response, which bounds the keyspace no matter what junk arrives.

Do I need to sort parameter values, not just names?

Yes, when a key can repeat. Query strings like tag=a&tag=b and tag=b&tag=a are different byte strings unless you sort the full key-and-value pairs, so sort by key first and then by value.

Should I lowercase query parameter values?

Lowercase parameter names when matching the allowlist, but leave values intact unless the route is genuinely case-insensitive. Lowercasing a search term like q=iPhone could merge two distinct searches.

Why does my hit ratio look fine locally but drop in production?

Local dev servers have no shared edge cache, so every request simply misses and you never observe fragmentation. Real traffic carries reordered and tracking parameters across many PoPs. Assert the canonical key in a unit test instead of judging by local behavior.