Transactions & ACID

Real operations are rarely a single write. “Transfer $100” is two writes — debit one account, credit another — and if the system crashes between them, money vanishes or appears from nowhere. A transaction is the database’s promise to treat a group of operations as one indivisible, correct unit: all of it happens, or none of it does. This is the bedrock of correctness in data systems.

What does a transaction buy you, and what does it cost? It buys you the right not to reason about every partial-failure and concurrent-access nightmare yourself: the database handles it. It costs performance (coordination isn’t free) and, once your data spans machines, a great deal more.

ACID, defined

The guarantees go by the acronym ACID. Three of the four are enforced by the database itself; consistency, as its definition makes clear, depends partly on the application.

Atomicity: all-or-nothing. If any part of the transaction fails, the whole thing rolls back as if it never happened. (The “A” is misleading: it means abortability and says nothing about concurrency, which is isolation’s job.)
Consistency — the transaction moves the database from one valid state to another, respecting all declared rules (constraints, foreign keys). This C is really about your invariants, partly upheld by you; it’s the odd one out, arguably added to make the acronym pronounceable.
Isolation — concurrent transactions don’t step on each other. Each runs as if it were the only one. This is the deep, expensive, interesting one.
Durability — once committed, it survives crashes. The data is on stable storage (disk, often replicated) and won’t evaporate on a power loss.
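
As a minimal sketch of atomicity and a declared constraint working together, the example below runs the transfer against an in-memory SQLite database using Python's standard sqlite3 module. The accounts table and the helper names make_bank, transfer, and balances are assumptions made up for this illustration. The credit is deliberately written before the debit so that a failed debit has something to undo.

```python
import sqlite3


def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # declared invariant
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 150), ('bob', 50)")
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit broke the CHECK constraint; the credit was rolled back too.
        return False


def balances(conn):
    """Return {account id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))


conn = make_bank()
print(transfer(conn, "alice", "bob", 100), balances(conn))
# True {'alice': 50, 'bob': 150}
print(transfer(conn, "alice", "bob", 100), balances(conn))
# False {'alice': 50, 'bob': 150}
```

The second transfer would overdraw alice, so the whole transaction aborts and bob's credit disappears with it: no money is created.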

Atomicity and durability are largely binary — you have them or you don’t. Isolation is a dial, and turning it costs performance, so databases offer levels.

Why isolation is a spectrum

Perfect isolation (serializability) means the result is as if transactions ran one at a time in some order. It’s the strongest, simplest-to-reason-about guarantee, and usually the slowest, because conflicting transactions must wait on each other or abort and retry. So databases offer weaker levels that allow more concurrency by permitting specific anomalies. To choose a level, you must know which anomalies it lets through.
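
One way to make "as if one-at-a-time in some order" concrete is to enumerate every serial order and ask whether an observed result matches any of them. The sketch below does this for a toy state dictionary. The functions serial_outcomes, is_serializable, add_ten, and double are hypothetical names for this illustration, and comparing only final states is a simplification: real serializability also constrains what each transaction reads.

```python
from itertools import permutations


def serial_outcomes(initial, transactions):
    """Final states reachable by running the transactions one at a time."""
    outcomes = []
    for order in permutations(transactions):
        state = dict(initial)  # each serial order starts from a fresh copy
        for txn in order:
            txn(state)
        if state not in outcomes:
            outcomes.append(state)
    return outcomes


def is_serializable(initial, transactions, observed):
    """True if the observed final state matches some serial order."""
    return observed in serial_outcomes(initial, transactions)


def add_ten(state):
    state["x"] += 10


def double(state):
    state["x"] *= 2


start = {"x": 5}
print(serial_outcomes(start, [add_ten, double]))
# [{'x': 30}, {'x': 20}]
print(is_serializable(start, [add_ten, double], {"x": 15}))
# False
```

A final value of 15 can only arise if both transactions read x = 5 and one write overwrote the other, which no serial order produces.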

The three classic anomalies

DIRTY READ          T1 writes X (uncommitted) → T2 reads that X → T1 rolls back.
                    T2 read a value that never officially existed.

NON-REPEATABLE READ T1 reads X = 5 → T2 commits X = 8 → T1 reads X = 8.
                    Same query, same transaction, two different answers.

PHANTOM READ        T1 reads "all orders > $100" → T2 inserts a new $200 order →
                    T1 re-runs the query and a NEW ROW appears that wasn't there.
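
The non-repeatable read above can be reproduced with a toy in-memory store that keeps every committed version of a key. This is an illustrative sketch, not how any particular engine is implemented; VersionedStore and t1_reads_twice are hypothetical names. Reading "the latest committed value" stands in for Read Committed, and reading "as of the transaction's start" stands in for a snapshot-style read. Because only committed versions are ever stored, dirty reads cannot occur in this model.

```python
class VersionedStore:
    """Toy store that keeps every committed version of each key."""

    def __init__(self):
        self.clock = 0      # logical commit counter
        self.versions = {}  # key -> list of (commit_ts, value), oldest first

    def commit(self, key, value):
        """Record a new committed version of key."""
        self.clock += 1
        self.versions.setdefault(key, []).append((self.clock, value))

    def read(self, key, as_of=None):
        """Newest committed value, limited to commit_ts <= as_of if given."""
        for ts, value in reversed(self.versions.get(key, [])):
            if as_of is None or ts <= as_of:
                return value
        return None


def t1_reads_twice(snapshot):
    """T1 reads X, T2 commits X = 8, T1 reads X again."""
    store = VersionedStore()
    store.commit("X", 5)
    start_ts = store.clock if snapshot else None  # T1 begins here
    first = store.read("X", as_of=start_ts)
    store.commit("X", 8)                          # T2 commits in between
    second = store.read("X", as_of=start_ts)
    return first, second


print(t1_reads_twice(snapshot=False))  # (5, 8)
print(t1_reads_twice(snapshot=True))   # (5, 5)
```

Without a snapshot, T1 sees two different answers to the same read; with one, both reads agree.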

The level/anomaly ladder

Isolation level	Dirty read	Non-repeatable read	Phantom read
Read Uncommitted	possible	possible	possible
Read Committed	prevented	possible	possible
Repeatable Read	prevented	prevented	possible*
Serializable	prevented	prevented	prevented

*Some engines (e.g. Postgres) prevent phantoms at Repeatable Read via snapshot isolation; the SQL standard does not require it. The standard is loose here, so behavior varies by database — always verify what your engine actually does.
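
The ladder can also be written down as data, which makes "pick the weakest level that rules out the anomalies you care about" a simple lookup. The sketch below encodes only the standard's minimum guarantees from the table, not any engine's actual behavior; LEVELS, PERMITS, and weakest_level_preventing are hypothetical names.

```python
# Levels from weakest to strongest, per the SQL standard's minimum guarantees.
LEVELS = ["Read Uncommitted", "Read Committed", "Repeatable Read", "Serializable"]

# Anomalies each level is allowed to let through.
PERMITS = {
    "Read Uncommitted": {"dirty read", "non-repeatable read", "phantom read"},
    "Read Committed": {"non-repeatable read", "phantom read"},
    "Repeatable Read": {"phantom read"},
    "Serializable": set(),
}

KNOWN_ANOMALIES = PERMITS["Read Uncommitted"]


def weakest_level_preventing(anomalies):
    """Return the weakest level that permits none of the given anomalies."""
    unwanted = set(anomalies)
    unknown = unwanted - KNOWN_ANOMALIES
    if unknown:
        raise ValueError(f"unknown anomalies: {sorted(unknown)}")
    for level in LEVELS:
        if not (PERMITS[level] & unwanted):
            return level
    return "Serializable"  # not reached: Serializable permits nothing


print(weakest_level_preventing(["dirty read"]))           # Read Committed
print(weakest_level_preventing(["non-repeatable read"]))  # Repeatable Read
print(weakest_level_preventing(["phantom read"]))         # Serializable
```

An engine that prevents phantoms at Repeatable Read would need its own row in PERMITS, which is the practical meaning of "verify what your engine actually does."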

Distributed transactions and two-phase commit

Everything above assumes one machine. Once a transaction must commit atomically across multiple nodes — two shards, two databases — you need them to agree: all commit or all abort, despite crashes and network gaps. The classic protocol is two-phase commit (2PC).

   PHASE 1 (prepare):  coordinator → all participants: "can you commit?"
                       each writes its changes durably, locks rows, replies YES/NO
   PHASE 2 (commit):   if ALL said YES → "commit!";  if any NO → "abort!"
                       participants finalize and release locks
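
The two phases can be simulated in a few lines. This is a single-process sketch with no network, no durable log, and no failures, so it shows only the voting logic; Participant, State, and two_phase_commit are hypothetical names, and the can_commit flag is a stand-in for whatever local check makes a real participant vote NO.

```python
from enum import Enum


class State(Enum):
    INIT = "init"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class Participant:
    """One node taking part in the distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumption: outcome of a local check
        self.state = State.INIT
        self.holding_locks = False

    def prepare(self):
        """Phase 1: vote YES (and lock rows) or vote NO."""
        if not self.can_commit:
            self.state = State.ABORTED
            return False
        self.state = State.PREPARED
        self.holding_locks = True
        return True

    def commit(self):
        self.state = State.COMMITTED
        self.holding_locks = False

    def abort(self):
        self.state = State.ABORTED
        self.holding_locks = False


def two_phase_commit(participants):
    """Run both phases; return True if the transaction committed."""
    votes = [p.prepare() for p in participants]  # phase 1: collect every vote
    decision = all(votes)
    for p in participants:                       # phase 2: broadcast decision
        if decision:
            p.commit()
        else:
            p.abort()
    return decision


ok = [Participant("shard-a"), Participant("shard-b")]
print(two_phase_commit(ok), [p.state.value for p in ok])
# True ['committed', 'committed']

bad = [Participant("shard-a"), Participant("shard-b", can_commit=False)]
print(two_phase_commit(bad), [p.state.value for p in bad])
# False ['aborted', 'aborted']
```

A single NO vote aborts everyone, including participants that had already prepared and locked their rows.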

It works, but the costs are steep and reveal why distributed transactions are avoided when possible:

Blocking. If the coordinator crashes after participants vote YES but before sending the decision, participants are stuck — holding locks, unable to commit or abort, until it recovers. 2PC is not fault-tolerant against a coordinator failure.
Latency & locks. Two network round-trips, with rows locked across the whole window. Throughput drops and contention rises.
Single point of failure. The coordinator is one unless it is made highly available, which requires real fault-tolerant agreement, i.e. consensus (Raft/Paxos). Consensus is what 2PC should lean on to survive a coordinator crash, and it’s the deep machinery underneath robust distributed commits.
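
The blocking problem is easiest to see by freezing the protocol between the two phases. In the sketch below, participants are plain dictionaries and the coordinator "crash" is simply the gap between two function calls; prepare_all, deliver_decision, and blocked are hypothetical names, and the model assumes every participant votes YES.

```python
def prepare_all(participants):
    """Phase 1: every participant votes YES, locks its rows, and waits."""
    for p in participants:
        p["state"] = "prepared"
        p["locked"] = True


def deliver_decision(participants, commit):
    """Phase 2: the coordinator's decision reaches every participant."""
    for p in participants:
        p["state"] = "committed" if commit else "aborted"
        p["locked"] = False


def blocked(participants):
    """Names of participants stuck in PREPARED while still holding locks."""
    return [p["name"] for p in participants
            if p["state"] == "prepared" and p["locked"]]


shards = [
    {"name": "shard-a", "state": "init", "locked": False},
    {"name": "shard-b", "state": "init", "locked": False},
]

prepare_all(shards)
# Coordinator crashes here: no decision has been sent.
print(blocked(shards))  # ['shard-a', 'shard-b']

# Only after the coordinator recovers can anyone make progress.
deliver_decision(shards, commit=True)
print(blocked(shards))  # []
```

A prepared participant cannot safely decide on its own, because it does not know how the others voted. That is the gap a replicated, consensus-backed coordinator is meant to close.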

The thread

A transaction lets you treat many writes as one correct unit, so you don’t have to hand-code recovery from every partial failure or race. ACID names the guarantees: atomicity and durability you mostly have or don’t, while isolation is a dial you trade against throughput by choosing which anomalies you can tolerate. On one machine this is cheap and well-understood. Stretch it across machines and the cost explodes — 2PC buys atomic cross-node commit at the price of blocking, latency, and a fragile coordinator that ultimately needs consensus to be safe. The recurring wisdom: keep things that must be atomic together close together, and reach for distributed transactions only when you truly must.

Check your understanding

Spell out ACID. Which letter is about concurrency, which is the “odd one out,” and why?
Define dirty read, non-repeatable read, and phantom read with a one-line example each.
Why is isolation a spectrum rather than on/off? What does Serializable guarantee that Read Committed doesn’t?
Contrast locking and MVCC as ways to enforce isolation. Why can MVCC let readers not block writers?
Walk through two-phase commit and name its three big costs. Why does a robust coordinator need consensus?