Backpressure & Flow Control

Every pipeline has a producer and a consumer, and they rarely run at the same speed. When the producer is faster — a firehose of events feeding a slower processor — something has to give. The naive answer is “buffer the difference.” It works, right up until it doesn’t, and when it fails it fails catastrophically: out-of-memory, GC death spirals, and a system that’s slower under recovery than it ever was under load. Backpressure is the discipline of letting a slow consumer tell a fast producer to slow down, instead of silently drowning.

The fundamental imbalance

Picture the pipeline as a pipe with a tank in the middle:

producer ──▶ [ queue / buffer ] ──▶ consumer
  100/s          fills up            40/s
                 at 60/s  ←── the gap has to go somewhere

If producer rate > consumer rate for any sustained period, the buffer between them grows without bound. There are only ever three things you can do with the overflow, and a system that doesn’t choose one explicitly has chosen the worst one by accident:

1. Buffer it: store the excess and hope the producer slows down later.
2. Drop it: discard work you can’t handle (load shedding).
3. Block it: refuse to accept more until the consumer catches up (backpressure).

Why unbounded buffers are a trap

The seductive option is (1), buffering, with an unbounded queue: “just keep everything, we’ll catch up.” This is one of the most common reliability mistakes in distributed systems. An unbounded buffer doesn’t solve the rate mismatch; it hides it while quietly converting a throughput problem into a memory problem.

queue depth
  │                            ╱ OOM / crash
  │                         ╱
  │                      ╱
  │                   ╱
  │             ____╱
  │     ______╱
  └────────────────────────────── time
   (looks fine)  (latency climbs)  (dead)

The damage compounds:

Latency explodes. A request entering a 10-million-item FIFO queue waits behind all 10 million; at the 40 items/s consumer from the diagram above, that is about 69 hours. The buffer that was supposed to protect you is now the source of your tail latency.
Memory dies. The queue grows until the process is OOM-killed — at which point you lose everything in the buffer, not just the overflow.
Recovery gets harder, not easier. Once behind, the system must drain the backlog while still serving new load, so it needs more capacity than usual exactly when it has the least to spare. It’s slower exactly when it most needs to be fast: a death spiral.
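To put rough numbers on the damage, here is a minimal sketch using the rates from the diagram above (100 items/s in, 40 items/s out). The function name, the FIFO assumption, and the 1 KiB average item size are illustrative assumptions, not measurements.

```python
def backlog_after(seconds, produce_rate=100.0, consume_rate=40.0, item_bytes=1024):
    """Estimate the state of an unbounded FIFO queue after sustained overload.

    Rates are in items per second; item_bytes is an assumed average item size.
    """
    growth = max(produce_rate - consume_rate, 0.0)  # net items/s piling up
    depth = growth * seconds                        # items waiting in the queue
    wait_s = depth / consume_rate                   # delay seen by a new arrival
    memory_bytes = depth * item_bytes               # payload memory only
    return depth, wait_s, memory_bytes


for minutes in (1, 10, 60):
    depth, wait_s, mem = backlog_after(minutes * 60)
    print(f"{minutes:>3} min: depth={depth:,.0f}  "
          f"wait={wait_s / 60:,.1f} min  memory={mem / 2**20:,.1f} MiB")
```

With these rates the wait grows by 1.5 seconds for every second of overload (60 / 40), so after an hour a new arrival waits 90 minutes behind the backlog.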

Flow control: pull instead of push

The fix is to invert who controls the rate. In a push model, the producer decides when to send, and the consumer must cope. In a pull (demand-driven) model, the consumer requests the next batch only when it has capacity. The producer structurally cannot get ahead, because it isn’t allowed to send what wasn’t requested.

PUSH (producer-driven)          PULL (consumer-driven)
producer ──▶ consumer            producer ◀── "give me 10" ── consumer
  sends whenever                 sends only what's asked for
  ⇒ consumer drowns              ⇒ rate self-limits

This is the core idea behind Reactive Streams and the credit-based flow control in message queues: the consumer grants the producer “credits” for N items, and the producer may only send up to its outstanding credit. Pull-based flow control makes backpressure the default rather than something you bolt on.
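The credit scheme can be sketched in a few lines. This is a single-threaded illustration under stated assumptions, not the Reactive Streams API: the class names `CreditedProducer` and `SlowConsumer` and the method `grant` are hypothetical, with `grant(n)` playing roughly the role that `Subscription.request(n)` plays in Reactive Streams.

```python
from collections import deque


class CreditedProducer:
    """Producer that may only send items it holds credit for."""

    def __init__(self, source):
        self._source = iter(source)
        self._credits = 0

    def grant(self, n):
        """Called by the consumer to allow n more items."""
        self._credits += n

    def send_available(self):
        """Return as many items as outstanding credit allows (possibly none)."""
        batch = []
        while self._credits > 0:
            try:
                batch.append(next(self._source))
            except StopIteration:
                break  # source exhausted; unused credit is simply left over
            self._credits -= 1
        return batch


class SlowConsumer:
    """Consumer that hands out credit only for free slots in its inbox."""

    def __init__(self, producer, capacity):
        self._producer = producer
        self._capacity = capacity
        self._inbox = deque()
        self.max_inbox = 0
        self.processed = []

    def run(self):
        self._producer.grant(self._capacity)  # initial window
        while True:
            self._inbox.extend(self._producer.send_available())
            self.max_inbox = max(self.max_inbox, len(self._inbox))
            if not self._inbox:
                break  # nothing buffered and nothing more arriving
            self.processed.append(self._inbox.popleft())
            self._producer.grant(1)  # one slot freed, one credit returned


producer = CreditedProducer(range(100))
consumer = SlowConsumer(producer, capacity=10)
consumer.run()
print(len(consumer.processed), consumer.max_inbox)
```

Outstanding credit plus buffered items never exceeds `capacity`, so the inbox stays bounded no matter how fast the source could produce.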

The mitigation toolkit

Bounded queues

Cap the buffer. When a bounded queue is full, put() either blocks (propagating backpressure upstream) or rejects (triggering shedding). The bound is not a limitation — it’s the safety valve. A bounded queue forces you to decide what to do when full, which is exactly the decision the unbounded queue let you dodge.
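Both behaviors are available from Python’s standard `queue.Queue` once `maxsize` is set. A minimal sketch; the helper names `put_with_backpressure` and `offer` are made up for illustration, and the bound of 3 is arbitrary.

```python
import queue


def put_with_backpressure(q, item, timeout=None):
    """Block until the consumer frees a slot; give up after `timeout` seconds."""
    try:
        q.put(item, block=True, timeout=timeout)
        return True
    except queue.Full:
        return False


def offer(q, item):
    """Reject immediately when the queue is full (the shedding variant)."""
    try:
        q.put_nowait(item)
        return True
    except queue.Full:
        return False


q = queue.Queue(maxsize=3)  # the bound is the safety valve
results = [offer(q, i) for i in range(5)]
print(results)  # [True, True, True, False, False]
```

With `timeout=None` the blocking variant waits indefinitely, which is what pushes the slowdown onto the caller; a finite timeout turns a stuck consumer into a visible error rather than a hung producer.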

Load shedding

When you genuinely can’t slow the producer (it’s the open internet), drop work deliberately: reject excess requests with a 429 Too Many Requests, sample the firehose, or drop the lowest-value items. Shedding load early is graceful; collapsing under load later is not. A system that sheds 10% stays up for the other 90%; a system that tries to buffer everything eventually falls over and serves 0%.
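One way to shed at the door is to cap the number of requests in flight and reject the rest without queueing them. The sketch below assumes a hypothetical `AdmissionGate` wrapper around a request handler; the `(status, body)` tuple return shape is an illustration, not any particular framework’s API.

```python
import threading


class AdmissionGate:
    """Admit at most max_in_flight requests at once; shed everything else."""

    def __init__(self, max_in_flight):
        self._slots = threading.BoundedSemaphore(max_in_flight)

    def handle(self, request, work):
        # Non-blocking acquire: excess requests are rejected, never queued.
        if not self._slots.acquire(blocking=False):
            return 429, "Too Many Requests"
        try:
            return 200, work(request)
        finally:
            self._slots.release()  # free the slot even if work() raises


gate = AdmissionGate(max_in_flight=1)
print(gate.handle("ping", lambda r: r.upper()))
```

The rejected caller gets an answer immediately, so the cost of overload is a fast, explicit refusal instead of a slow timeout.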

Propagate backpressure end-to-end

Backpressure is only useful if it travels. A blocked consumer must slow its upstream, which slows its upstream, all the way back to the original source. The mechanism differs by hop: TCP’s flow control shrinking the receive window, a client receiving 429s and backing off, a request rejected at the edge. A pipeline where backpressure stops halfway just relocates the unbounded buffer to wherever the chain breaks.
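A small in-process illustration of the same idea, assuming a three-stage `asyncio` pipeline joined by bounded queues. All names, the bound of 2, and the 1 ms sink delay are hypothetical choices for the sketch. Because every hop awaits `put()` on a bounded queue, the slow sink ends up pacing the source.

```python
import asyncio


async def source(out_q, n, stats):
    for i in range(n):
        await out_q.put(i)  # suspends whenever the middle stage is behind
        stats["q1_max"] = max(stats["q1_max"], out_q.qsize())
    await out_q.put(None)  # sentinel: end of stream


async def stage(in_q, out_q, stats):
    while (item := await in_q.get()) is not None:
        await out_q.put(item * 2)  # suspends whenever the sink is behind
        stats["q2_max"] = max(stats["q2_max"], out_q.qsize())
    await out_q.put(None)


async def sink(in_q, results):
    while (item := await in_q.get()) is not None:
        await asyncio.sleep(0.001)  # the slow consumer
        results.append(item)


async def run_pipeline(n=50, bound=2):
    q1 = asyncio.Queue(maxsize=bound)
    q2 = asyncio.Queue(maxsize=bound)
    stats, results = {"q1_max": 0, "q2_max": 0}, []
    await asyncio.gather(
        source(q1, n, stats), stage(q1, q2, stats), sink(q2, results)
    )
    return results, stats


results, stats = asyncio.run(run_pipeline())
print(len(results), stats)
```

Swapping either queue for an unbounded one (`maxsize=0`) breaks the chain at that hop: everything upstream of it runs at full speed again and the backlog accumulates there.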

What does this buy us, and what does it cost?

Backpressure buys survival under overload: the system degrades predictably instead of crashing, latency stays bounded, and memory stays safe. The cost is that you must now say no, and saying no takes the form of slowed producers, dropped messages, or rejected requests. That’s uncomfortable; “we lost some data” feels worse than “we kept everything” right until the unbounded buffer kills the whole process and you lose it all anyway. The mature trade is to accept bounded, visible, controlled loss in exchange for a system that stays alive.

Check your understanding

What are the only three things a system can do when producers outpace consumers?
Why does an unbounded buffer convert a throughput problem into a worse (memory + latency) problem?
How does a pull-based flow model make it structurally impossible for the producer to get ahead?
Why is a bounded queue’s “full” state a feature rather than a failure?
What goes wrong if backpressure is applied at the consumer but not propagated back to the source?