Replication

A single database server is a single point of failure and a single point of throughput. The disk dies, the box reboots, the read traffic outgrows one CPU — and your whole system goes with it. Replication is the answer: keep complete copies of your data on more than one machine. It is the most common, most important reliability technique in data systems.

But copying introduces the deepest tension in distributed data. Replication buys you availability and read scale, and it charges you in consistency. The instant there are two copies, they can disagree — and reconciling that disagreement is the entire subject of this page.

What replication buys (and the bill it sends)

Three concrete wins, one recurring cost:

Availability — if one node dies, another has the data, so the system survives.
Read scalability — spread read queries across many copies; ten replicas can serve roughly ten times the reads.
Latency — put a copy near your users, so reads don’t cross an ocean.

The bill: the copies must be kept in sync, and they can’t all be perfectly in sync at the same instant. Every replication design is a different answer to who accepts writes, and how fast do the others find out?

Three topologies

Single-leader (primary/replica)

One node is the leader (primary). All writes go to it. The leader streams its changes to one or more followers (replicas), which apply them in order and serve reads.

        writes
          │
          ▼
     ┌─────────┐   replication stream   ┌──────────┐
     │  LEADER  │ ─────────────────────► │ FOLLOWER │ ── reads
     └─────────┘ ─────────────┐         └──────────┘
                               └───────► ┌──────────┐
                                          │ FOLLOWER │ ── reads
                                          └──────────┘

This is the workhorse — Postgres, MySQL, MongoDB, and most relational setups default to it. Writes have one clear ordering (no conflicts), reads scale across followers, and it’s simple to reason about. The catch: the leader is a write bottleneck, and if it dies you must failover (promote a follower), a delicate dance covered alongside Read Replicas & CQRS.

Multi-leader

Several nodes accept writes — typically one leader per data center — and they replicate to each other. This shines for multi-region setups (writes stay local and fast) and offline-capable clients. But two leaders can accept conflicting writes to the same row at the same time, and now you have a conflict that must be resolved (last-write-wins, application merge, CRDTs). Conflict resolution is genuinely hard, which is why multi-leader is a specialist tool, not a default.

Leaderless

No designated leader. The client (or a coordinator) writes to several nodes at once and reads from several too, using quorums to paper over stragglers. Dynamo-style systems — Cassandra, DynamoDB, Riak — work this way.

   Write to W nodes,  Read from R nodes,  total N replicas.
   If  W + R > N , every read overlaps at least one node that saw the latest write.
   e.g. N=3, W=2, R=2  →  2 + 2 > 3  ✓  (tunable consistency)

Leaderless trades the simplicity of one ordering for no single point of write failure and tunable consistency knobs (W and R), at the cost of needing repair mechanisms (read-repair, anti-entropy) to heal divergence.

Synchronous vs asynchronous

Orthogonal to topology is when the leader considers a write “done”:

SYNC:   leader waits for follower to confirm  →  durable on 2 nodes, but SLOW,
                                                  and stalls if the follower lags
ASYNC:  leader replies immediately, ships later → FAST, but a leader crash can
                                                  LOSE the not-yet-shipped writes

Synchronous guarantees the follower has the data before acknowledging the client — no data loss on leader failure — but the write is only as fast as the slowest follower, and a stuck follower can stall all writes.
Asynchronous acknowledges immediately and ships changes in the background — fast and resilient to slow followers — but a crash between ack and ship loses those writes.

Many systems run semi-synchronous: one follower synchronous (durability), the rest async (speed) — buying most of the safety for most of the performance.

Replication lag and the anomalies it causes

Async replication means followers are always a little behind — milliseconds usually, seconds or more when overloaded. This replication lag is invisible until a user trips over it.

Two cousins:

Monotonic reads — read once from a fresh follower (see comment), refresh and hit a stale follower (comment vanishes again). Time appears to run backward. Fix: pin each user to one follower so they never see older state than they already saw.
Consistent prefix — with partitioned data, you might see an answer before its question. Fix: ensure causally-related writes land in the same partition or are ordered.

These anomalies are not bugs in the database; they are the price of asynchronous copying, and they sit on the spectrum formalized in Consistency Models.

The thread

Replication is how data systems refuse to die with a single machine and how they serve far more reads than one box ever could. The moment you make a second copy, though, you’ve signed up for a managing a gap between copies — synchronous closes the gap at the cost of speed, asynchronous keeps speed at the cost of a lag window where users can see stale or vanishing data. Single-leader keeps ordering simple; multi-leader and leaderless trade that simplicity for write availability and geography. There is no configuration that gives you all the reads, all the availability, and perfect consistency at once — pick which two you’ll pay for.

Check your understanding

List the three things replication buys you and the one recurring cost it always charges.
Contrast single-leader, multi-leader, and leaderless replication. What problem does each one’s structure introduce?
In a leaderless quorum with N=5, what W and R guarantee that a read sees the latest write? Why?
Explain synchronous vs asynchronous replication, and what “semi-synchronous” buys.
Describe the read-after-write anomaly and one concrete way to prevent it.