Part 5 · Communication & APIs
A single program is easy: function A calls function B, the value comes back, control resumes. The call is instant, reliable, and typed by the compiler. The moment you split that program across two machines, every one of those guarantees evaporates. The call now crosses a network that can be slow, can drop packets, can deliver them twice, and can fail while looking exactly like success. Communication between services is the discipline of getting useful work done despite that hostile medium. This part of the book is about the choices you make at that boundary — and what each one buys you and costs you.
The one axis that underlies everything
You can organize the entire landscape of service-to-service communication along a single question:
Does the caller wait for the answer, or not?
That is the synchronous vs asynchronous axis, and it is not a minor implementation detail — it decides your latency, your failure behavior, your coupling, and how hard your system is to reason about.
SYNCHRONOUS (request → wait → response)
- caller blocks until callee answers
- REST, RPC/gRPC, GraphQL queries
- feels like a function call
- simple to reason about
- couples caller's fate to callee's

ASYNCHRONOUS (send → move on)
- caller hands off a message and continues
- message queues, events, pub/sub
- feels like dropping a letter in the mail
- resilient to the other side being down
- decouples lifetimes, but adds indirection

Almost every page that follows is a point on this axis. REST, RPC, and GraphQL are mostly the synchronous world: different shapes of “ask and wait.” Messaging, real-time streams, and event-driven architecture are mostly the asynchronous world: different shapes of “send and forget.” Holding this axis in your head turns a pile of acronyms into a map.
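The two ends of the axis can be sketched in a few lines of Python. This is a minimal in-process illustration, not a real network client: `charge_card`, `place_order_sync`, `place_order_async`, `outbox`, and `worker` are all hypothetical names, and a `queue.Queue` drained by a thread stands in for a message broker and its consumer.

```python
import queue
import threading


def charge_card(order_id):
    """Hypothetical callee; stands in for a remote service."""
    return "charged:" + order_id


# Synchronous: request -> wait -> response.
def place_order_sync(order_id):
    receipt = charge_card(order_id)  # the caller blocks here until the callee answers
    return receipt


# Asynchronous: send -> move on.
outbox = queue.Queue()  # assumption: in-memory stand-in for a message broker
processed = []          # results produced later by the worker


def worker():
    """Consumer loop: handles messages whenever it gets to them."""
    while True:
        order_id = outbox.get()
        if order_id is None:  # sentinel value tells the worker to stop
            break
        processed.append(charge_card(order_id))


def place_order_async(order_id):
    outbox.put(order_id)  # hand off the message
    return "accepted"     # return immediately, without the callee's answer
```

Calling `place_order_sync("o-1")` returns the receipt itself, so the caller's success depends on the callee being up at that moment. Calling `place_order_async("o-2")` returns only an acknowledgement; the receipt appears in `processed` once a thread running `worker` has drained the queue, which is the indirection the asynchronous side pays for its resilience.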
Why this is its own part of the book
The protocols here sit on top of the lower layers you have already met: DNS and routing get the bytes to the right machine, load balancers spread them across replicas, and the fallacies of distributed computing explain why none of it is as reliable as a local call. Communication is the layer where application meaning crosses the wire: “create this order,” “give me this user,” “this payment settled.” Getting it right is the difference between a system that degrades gracefully and one that cascades into failure when one service hiccups.
The roadmap
Read these roughly in order; each builds intuition for the next.
- REST APIs — the web’s lingua franca: resources, HTTP verbs, statelessness, idempotency, status codes, versioning, and pagination. The default, and why it’s the default.
- RPC & gRPC — making a remote call feel like a local function. gRPC, protobuf, and HTTP/2 streaming — when typed, fast, internal calls beat REST.
- GraphQL — letting the client specify exactly what it needs, killing over- and under-fetching — and the resolver/N+1 and caching costs that come with that power.
- Synchronous vs Asynchronous Messaging — the axis itself, examined head-on: temporal coupling, when to go async, and the latency-versus-resilience trade-off.
- Real-Time: Polling, WebSockets & SSE — pushing data toward clients: short/long polling, bidirectional WebSockets, and server-sent events, chosen by direction and scale.
- Event-Driven Architecture — building systems around facts that happened: events vs commands, pub/sub, event sourcing, choreography vs orchestration, and the dual-write problem lurking underneath.
The thread
How do independent services cooperate over an unreliable network? By choosing, deliberately, where to wait and where not to. The synchronous protocols give you the clarity of a conversation; the asynchronous ones give you the durability of the postal system. Most real systems are a blend: a synchronous edge for the user-facing request, an asynchronous spine for everything that can happen later. The skill is knowing which is which, and paying the right cost on purpose.
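The blend described above might look like the following sketch. All names here (`handle_checkout`, `reserve_stock`, `spine`, `drain_spine`) are hypothetical, and a `deque` is an assumed in-memory stand-in for a durable broker; the point is only where the handler waits and where it does not.

```python
from collections import deque

# Assumption: an in-memory stand-in for a durable message broker.
spine = deque()


def reserve_stock(item):
    """Hypothetical synchronous dependency: the user needs this answer now."""
    return item != "sold-out-item"


def handle_checkout(user, item):
    # Synchronous edge: the response depends on this result, so we wait for it.
    if not reserve_stock(item):
        return {"status": "rejected"}
    # Asynchronous spine: this work can happen later, so we only record it.
    spine.append(("send_receipt", user, item))
    spine.append(("update_analytics", user, item))
    return {"status": "confirmed"}


def drain_spine():
    """Process deferred work; a separate consumer would do this in a real system."""
    done = []
    while spine:
        task_name, _user, _item = spine.popleft()
        done.append(task_name)
    return done
```

The user-facing call returns as soon as the one answer it truly needs is known. The receipt and analytics updates are not lost if their consumers are briefly unavailable, provided the real queue is durable, which this in-memory sketch is not.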
Check your understanding
- State the single axis that organizes all of service-to-service communication, as one question.
- Which broad category — sync or async — do REST, RPC, and GraphQL mostly fall into, and which category do message queues and pub/sub fall into?
- In one phrase each, what does synchronous communication buy you and cost you? What about asynchronous?
- Why does splitting a program across machines remove the guarantees a local function call enjoys?
- Why do most production systems end up blending both styles rather than picking one?