The Fallacies of Distributed Computing

In the 1990s, engineers at Sun Microsystems noticed that newcomers to distributed systems kept making the same wrong assumptions — and that those assumptions kept causing the same failures. They wrote them down. The result, the Fallacies of Distributed Computing, is the single most useful checklist in this field. Each fallacy is a thing that is true on one machine and false the moment a network is involved. Believe one of them and you’ll write code that works perfectly in testing and falls over in production.

This page is the overview’s “why systems are hard” made painfully concrete.

1. The network is reliable

The seductive assumption: if I send a message, it arrives. On a network, packets are dropped, connections reset, and entire links go dark — and crucially, a failure can be silent. You send a request and hear nothing. Did the server never get it? Did it process it and the reply got lost? You cannot tell. This is why “did my payment go through?” is genuinely hard.

   Client                Server
     |  ---- charge $20 ---->   | (processed! deducts balance)
     |  X---- "OK" -------------|  (reply lost)
     |  (timeout — retries)     |
     |  ---- charge $20 ---->   | (processed AGAIN — double charge)

2. Latency is zero

In-process, a function call returns in nanoseconds. A network call across a datacenter takes hundreds of microseconds; across the planet, hundreds of milliseconds — a factor of a million or more. The classic failure is the chatty interface: a loop that makes one network call per item. A hundred items that would be instant in memory becomes a hundred round-trips. The fix — batching, fetching related data together — only exists because someone forgot latency isn’t zero. (See Latency & Throughput.)

3. Bandwidth is infinite

You can move a gigabyte between two functions for free. Across a network, bandwidth is finite and shared. The classic failure: an API that returns the entire object graph “to be safe,” and a mobile client on a slow link that now waits seconds to load a screen — or a fan-out query that ships terabytes between services and saturates the link for everyone else. Pagination, field selection (GraphQL, sparse fieldsets), and compression are all responses to this fallacy.

4. The network is secure

The wire is not yours. Between your client and your server sit routers, ISPs, and possibly a hostile actor on the same coffee-shop Wi-Fi. Anything sent in plaintext can be read; anything unauthenticated can be forged. The failure here is assuming “it’s just our internal services talking” — until an attacker is inside the network and finds every service trusts every other by default. This fallacy is why we have TLS everywhere, mutual authentication, and the modern “zero-trust” stance: the network is hostile until proven otherwise.

5. Topology doesn’t change

You hard-code an IP address. It works. Then the server is replaced, the load balancer reshuffles, a container is rescheduled onto a different host, and your hard-coded address points at nothing — or worse, at someone else’s service. In a cloud or container environment the topology changes constantly and on its own. Service discovery, DNS, and health-checked load balancers exist precisely because you may not assume the map you drew yesterday is still accurate.

6. There is one administrator

On your laptop, you are god. In production, the system spans teams, cloud providers, third-party APIs, and DNS registrars — each with its own change windows, permissions, and pager. The failure: a debug session that requires a config change on a box owned by a team in another time zone, or an outage caused by an upstream provider you can’t even log into. “Who can fix this right now?” sometimes has the answer “nobody on this call.” Designs must assume coordination is slow and partial, which is why loose coupling and graceful degradation matter.

7. Transport cost is zero

Moving data isn’t free in money or in CPU, even when it’s fast enough. Every byte crossing a network must be serialized, sent, received, and deserialized — and cloud providers bill you for egress, often heavily. The failure: a “small” architectural choice to chat between two services across regions that quietly produces a five-figure monthly bandwidth bill, or a JSON-everywhere design whose serialization CPU dwarfs the actual work. This fallacy is why people care about wire formats (Protobuf vs JSON), data locality, and not crossing availability-zone or region boundaries casually.

8. The network is homogeneous

Your tidy world of identical servers meets reality: clients on 5G and on dial-up, services written in five languages, different OS versions, different TCP stacks, a load balancer that interprets a header slightly differently than the app behind it. The failure mode is the bug that only appears for one kind of client or one network path. Interoperability via well-defined, version-tolerant protocols — not “everyone runs my exact stack” — is the only defense.

Why these eight, and why they recur

Notice the shape: every fallacy is a property that holds inside one machine and breaks across the network. Reliable, instant, free, private, stable, singly-governed, costless, uniform — that is an exact description of a single computer, and an exact list of what you lose the moment you distribute.

   IN-PROCESS ASSUMPTION   →   DISTRIBUTED REALITY
   reliable                →   messages vanish, silently
   zero latency            →   ~1,000,000× slower across the planet
   infinite bandwidth      →   finite, shared, billed
   secure                  →   the wire is hostile
   stable topology         →   addresses move on their own
   one admin               →   many owners, slow coordination
   free transport          →   serialization CPU + egress $$$
   homogeneous             →   every client and stack differs

The thread

What does distribution buy us, and what does it cost? It buys scale, redundancy, and the ability to outgrow a single machine. It costs us all eight of these guarantees at once. The fallacies are the bill. Every robust distributed design — retries with idempotency, timeouts, circuit breakers, service discovery, encryption, batching — is a line item paying down one of these costs. You don’t get to avoid the bill; you only get to pay it on purpose or get surprised by it in an incident review.

Check your understanding

Why is a silent network failure worse than a loud one, and what discipline does it force on every non-idempotent operation?
Give the in-process truth and the distributed reality for the “latency is zero” fallacy. What is a “chatty interface”?
“It’s just our internal services talking, so we don’t need encryption.” Which fallacy is this, and why is it dangerous?
Explain how “topology doesn’t change” and “transport cost is zero” each translate into a specific production failure (one operational, one financial).
What single property of a single machine do all eight fallacies describe losing? Use that to explain why the list recurs for every new engineer.