
Cache Stampede & the Thundering Herd

Caching is supposed to protect your database. But there’s a failure mode where the cache becomes the trigger for an outage: a single popular key expires, and in the brief window before it’s repopulated, every request for it misses the cache and slams the origin at once. That synchronized surge is a cache stampede (also called a thundering herd), and it has taken down systems that were running comfortably a moment earlier.

Imagine a key served 10,000 times/second from cache, with the database never seeing that load. The key has a TTL. The instant it expires:

t=0 cache has "homepage" → 10,000 req/s served from cache, DB idle
t=TTL "homepage" EXPIRES
t=TTL+ε 10,000 concurrent requests all MISS → all 10,000 hit the DB at once
DB melts → slow responses → cache stays empty → more pile on → cascade

The database wasn’t sized for 10,000 simultaneous identical queries because it never saw them — the cache absorbed them. Expiry removes that shield for a moment, and the herd tramples through. What did the cache buy us, and what did it cost? It bought enormous load reduction, but it quietly created a correlated failure at the expiry instant.
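To put a rough number on the herd, here is a minimal simulation of naive cache-aside on a single key. The 1-second TTL, the 50 ms origin latency, and the evenly spaced arrivals are illustrative assumptions rather than figures from a real system, and `count_origin_queries` is a hypothetical helper.

```python
from collections import deque

def count_origin_queries(arrivals_us, ttl_us, load_us):
    """Count origin queries for one key under naive cache-aside.

    All times are integer microseconds. The cache starts warm (written at t=0),
    and every request that misses queries the origin on its own.
    """
    written_at = 0        # time of the most recent completed cache write
    pending = deque()     # completion times of origin queries still in flight
    origin_queries = 0
    for t in arrivals_us:
        # Apply every in-flight load that has finished by time t.
        while pending and pending[0] <= t:
            written_at = pending.popleft()
        if t < written_at + ttl_us:
            continue      # hit: the cached value is still within its TTL
        origin_queries += 1              # miss: this request queries the origin
        pending.append(t + load_us)      # its result lands load_us later
    return origin_queries

# Assumed workload: 10,000 req/s (one request every 100 µs) for 2 s,
# TTL = 1 s, and each origin query takes 50 ms.
arrivals = range(0, 2_000_000, 100)
herd = count_origin_queries(arrivals, ttl_us=1_000_000, load_us=50_000)
print(herd)  # 500
```

Under these assumptions the herd size is arrival rate × origin latency: 10,000 req/s × 0.05 s = 500 identical queries from one expiry. In a real system the number grows further, because the overloaded origin answers more slowly and the miss window stays open longer.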

Fix 1: Locking / single-flight (request coalescing)

Let only one request recompute the value; everyone else waits for that result or briefly serves stale data.

key misses → first request acquires a lock, recomputes, repopulates cache
meanwhile → other requests see the lock → wait, or return last-known value
→ exactly ONE query hits the DB instead of 10,000

This is often called single-flight (coalesce concurrent identical loads into one). It’s the most direct fix. Its cost is added coordination, plus a decision about what the waiters do (block or serve stale).
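A minimal in-process sketch of the idea, assuming a threaded Python service. `SingleFlight`, `_Call`, and `slow_query` are hypothetical names, waiters here block rather than serve stale, and coalescing across several servers would need a shared lock, which this does not attempt.

```python
import threading
import time

class _Call:
    """State shared by the leader and the waiters for one in-flight load."""
    def __init__(self):
        self.done = threading.Event()
        self.value = None
        self.error = None

class SingleFlight:
    """Coalesce concurrent loads of the same key into a single loader call."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> _Call

    def do(self, key, loader):
        with self._lock:
            call = self._inflight.get(key)
            leader = call is None
            if leader:
                call = self._inflight[key] = _Call()
        if not leader:
            call.done.wait()              # follower: block until the leader finishes
        else:
            try:
                call.value = loader()     # leader: the only caller that runs the loader
            except Exception as exc:
                call.error = exc          # followers re-raise the same error
            finally:
                with self._lock:
                    del self._inflight[key]
                call.done.set()
        if call.error is not None:
            raise call.error
        return call.value

# Demo: 20 requests miss the same key at the same moment.
db_calls = []

def slow_query():
    db_calls.append(1)     # record each time the origin is actually queried
    time.sleep(0.5)        # stand-in for an expensive origin query
    return "rendered homepage"

flight = SingleFlight()
results = []
barrier = threading.Barrier(20)

def worker():
    barrier.wait()         # release all 20 requests together
    results.append(flight.do("homepage", slow_query))

threads = [threading.Thread(target=worker) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_calls), len(results))
```

All 20 callers receive the value, while the loader runs once for the whole overlapping group. The entry is removed from `_inflight` as soon as the load finishes, so this only coalesces concurrent loads; storing the result in the cache is still the caller's job.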

Fix 2: Stale-while-revalidate (early/async recompute)

Don’t wait for hard expiry. Serve the stale value immediately while a background task refreshes it, or recompute probabilistically before the TTL so the refresh is spread out rather than synchronized.

key near expiry → serve current (slightly stale) value NOW
→ trigger ONE async refresh in the background
→ users never see a miss; the DB sees one refresh, not a herd
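The flow above might look like this in a threaded Python service. This is a sketch under assumptions: `SWRCache`, `fresh_for`, and the `loader` callback are hypothetical names, entries are never evicted, and a cold miss (nothing cached yet) is loaded inline without coalescing.

```python
import threading
import time

class SWRCache:
    """Serve cached values immediately; refresh stale ones in the background."""
    def __init__(self, loader, fresh_for, clock=time.monotonic):
        self._loader = loader          # function key -> value (hits the origin)
        self._fresh_for = fresh_for    # seconds a value counts as fresh
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}             # key -> (value, stored_at)
        self._refreshing = set()       # keys with a background refresh running

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, stored_at = entry
                stale = self._clock() - stored_at >= self._fresh_for
                if stale and key not in self._refreshing:
                    # Start at most ONE background refresh per key.
                    self._refreshing.add(key)
                    threading.Thread(
                        target=self._refresh, args=(key,), daemon=True
                    ).start()
                return value           # fresh or stale, answer right away
        # Cold miss: there is nothing to serve, so load inline.
        value = self._loader(key)
        self._store(key, value)
        return value

    def _refresh(self, key):
        try:
            self._store(key, self._loader(key))
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def _store(self, key, value):
        with self._lock:
            self._entries[key] = (value, self._clock())

# Usage with a fake clock, so the example needs no real waiting.
now = [0.0]
loads = []

def load(key):
    loads.append(key)
    return f"{key} v{len(loads)}"

cache = SWRCache(load, fresh_for=60, clock=lambda: now[0])
first = cache.get("homepage")     # cold miss: loaded inline
now[0] = 61                       # the entry is now past its freshness window
second = cache.get("homepage")    # stale value returned at once, refresh starts
```

The second call returns the old value without waiting on the origin, and later calls see the refreshed value once the background load completes. Pairing this with request coalescing would also cover the cold-miss path.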

The trade-off: you accept serving data that’s briefly stale in exchange for never exposing a cold-cache window. For most read-heavy content (feeds, product pages) that’s an easy yes.
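The other option mentioned above, recomputing probabilistically before the TTL, can be sketched as a per-request coin flip that becomes more likely to succeed as expiry approaches. The formulation below follows a scheme usually described as probabilistic early expiration (sometimes called XFetch); `should_refresh_early`, `recompute_s`, and `beta` are illustrative names, not any library's API.

```python
import math
import random

def should_refresh_early(now, expires_at, recompute_s, beta=1.0, rng=random.random):
    """Decide whether this request should recompute the value before it expires.

    recompute_s is how long a recompute usually takes (seconds).
    beta > 1 makes early refreshes more likely, beta < 1 less likely.
    """
    u = 1.0 - rng()   # rng() is in [0, 1), so u is in (0, 1] and log(u) is finite
    head_start = -recompute_s * beta * math.log(u)   # non-negative, usually small
    return now + head_start >= expires_at

# One request arriving 10 ms before expiry, for a value that takes 50 ms to recompute.
refresh = should_refresh_early(now=99.99, expires_at=100.0, recompute_s=0.05)
```

Because each request draws its own random head start, refreshes tend to begin at slightly different moments shortly before expiry instead of all at the expiry instant. Far from expiry the check practically never fires, so it adds no load during the bulk of the TTL.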

A subtler cause: if many keys are populated together (e.g. a cache warm-up or a deploy) with the same TTL, they all expire at the same instant — a herd across many keys at once. Add random jitter to each TTL so expirations spread out.

BAD: every key TTL = 3600s → all expire together → synchronized stampede
GOOD: TTL = 3600s + random(0..300s) → expirations smeared across 5 minutes

This one-line change (randomize the TTL) prevents the correlated expiry that turns thousands of independent keys into a single thundering herd. It’s the cheapest fix and worth doing by default.
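As a sketch of that change, assuming TTLs are set in whole seconds at write time (`jittered_ttl` and the constants are hypothetical names; the 3600 s base and 300 s spread are the figures used above):

```python
import random

BASE_TTL_S = 3600      # nominal TTL
MAX_JITTER_S = 300     # extra lifetime, chosen per key

def jittered_ttl(base=BASE_TTL_S, max_jitter=MAX_JITTER_S):
    """Return the base TTL plus a random 0..max_jitter seconds (inclusive)."""
    return base + random.randint(0, max_jitter)

# Keys written in the same warm-up batch now get different lifetimes.
ttls = {key: jittered_ttl() for key in ("homepage", "feed:42", "product:7")}
```

Each key then lives between 3600 and 3900 seconds, so a batch written together expires across a 5-minute window rather than at one instant. The value returned by `jittered_ttl()` would be passed wherever the cache client takes its expiry argument.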

What does this buy us, and what does it cost? A cache buys massive origin offload — but if you ignore the expiry instant, it quietly concentrates load into a synchronized spike that’s worse than no cache. The fixes (single-flight, stale-while-revalidate, jittered TTLs) cost a little staleness and coordination to buy a smooth, herd-free load curve. The lesson generalizes: any time many actors share a deadline, expect them to act in unison — and design to spread them out. (See also Caching Strategies.)

  1. Why can a database that was idle suddenly be overwhelmed the moment a single hot key expires?
  2. How does single-flight / request coalescing reduce the load from a cache miss on a popular key?
  3. What does stale-while-revalidate trade away, and what does it guarantee in return?
  4. Why do identical TTLs cause a stampede across many keys, and how does jitter fix it?
  5. What is cache penetration, and how does caching “not found” or a Bloom filter defend against it?