Part 9 · Case Studies — Designing Real Systems
The previous eight parts built your vocabulary: caching, replication, sharding, queues, consensus, rate limiting. This part puts the vocabulary to work. Each page here is a case study — a full “design X” walkthrough — and they all follow the same skeleton. The skeleton matters more than any single answer, because a system design interview (or a real design doc) is not a trivia quiz. It is a test of whether you can take a vague prompt and drive it toward a defensible architecture under constraints.
The framework: six moves, in order
The single biggest mistake is jumping straight to boxes-and-arrows. Resist it. Walk these six steps in order, out loud, every time.
1. CLARIFY: What are we actually building? (functional + non-functional)
2. ESTIMATE: How big is it? (QPS, storage, bandwidth; back-of-envelope)
3. API: What are the endpoints? (the contract clients depend on)
4. DATA MODEL: What do we store, and how is it shaped/keyed?
5. ARCHITECTURE: High-level boxes and arrows (the happy path)
6. DEEP DIVE: Find the bottleneck, scale it, name the trade-offs
   └─ thread throughout: what does each choice BUY us, and what does it COST?

1. Clarify requirements
Split requirements into two buckets. Functional requirements are what the system does: “shorten a URL,” “deliver a message,” “return autocomplete suggestions.” Non-functional requirements are the qualities it must have while doing it: latency, availability, consistency, durability, scale. Non-functional requirements are where the design actually lives: “a chat app” tells you almost nothing, but “100M users, sub-200ms delivery, messages must never silently disappear” tells you almost everything.
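One way to keep the two buckets honest is to write them down as data, with the non-functional ones stated as thresholds you could actually test. This is a hypothetical sketch: the `Requirements` and `NonFunctional` classes and the `within_latency_budget` helper are invented for illustration, and it reads “sub-200ms delivery” as a strict per-message bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NonFunctional:
    """Qualities the system must have, stated as numbers you can test against."""
    users: int               # scale target
    max_delivery_ms: int     # latency budget for delivering one message
    may_lose_messages: bool  # durability: False means "never silently disappear"


@dataclass(frozen=True)
class Requirements:
    functional: tuple[str, ...]    # what the system does (verbs)
    non_functional: NonFunctional  # how well it must do it


def within_latency_budget(req: Requirements, observed_ms: float) -> bool:
    """True if an observed delivery latency stays under the stated budget."""
    return observed_ms < req.non_functional.max_delivery_ms


chat = Requirements(
    functional=("send a message", "deliver a message"),
    non_functional=NonFunctional(users=100_000_000, max_delivery_ms=200, may_lose_messages=False),
)

print(within_latency_budget(chat, 150))  # True
print(within_latency_budget(chat, 250))  # False
```

Notice that the functional tuple could describe almost any messaging product; it is the `NonFunctional` fields that rule designs in or out.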
2. Back-of-envelope estimation
Turn the scale into numbers you can design against. You need three:
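- QPS: daily active users × actions per user ÷ 86,400 seconds, then multiply by a peak factor (typically 2–3×). Compute read QPS and write QPS separately, and pin down the read:write ratio first: it tells you whether the design should lean on read scaling (caches, replicas) or write scaling (partitioning, queues).
- Storage: bytes per record × records per day × retention. Project to years.
- Bandwidth: QPS × payload size.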
You are not chasing precision; you are chasing the order of magnitude that tells you whether one box suffices or you need a sharded fleet. See Back-of-Envelope Estimation and the latency numbers every engineer should know.
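The three formulas translate directly into a few lines of arithmetic. This is a minimal sketch under made-up inputs: 100M daily active users, 1 write and 100 reads per user per day, 500-byte records kept for five years, and a peak factor of 3. Every number here is an assumption chosen for illustration, not a benchmark.

```python
SECONDS_PER_DAY = 86_400


def average_qps(dau: int, actions_per_user_per_day: float) -> float:
    """Average requests per second: DAU x actions per user / seconds per day."""
    return dau * actions_per_user_per_day / SECONDS_PER_DAY


def peak_qps(dau: int, actions_per_user_per_day: float, peak_factor: float = 3.0) -> float:
    """Average QPS scaled by a peak factor (typically 2-3x)."""
    return average_qps(dau, actions_per_user_per_day) * peak_factor


def storage_bytes(bytes_per_record: int, records_per_day: int, retention_days: int) -> int:
    """Bytes per record x records per day x retention."""
    return bytes_per_record * records_per_day * retention_days


def bandwidth_bytes_per_sec(qps: float, payload_bytes: int) -> float:
    """QPS x payload size."""
    return qps * payload_bytes


# Hypothetical read-heavy workload (100:1 read:write ratio).
dau = 100_000_000
write_peak = peak_qps(dau, 1)
read_peak = peak_qps(dau, 100)
print(f"peak write QPS ~ {write_peak:,.0f}")
print(f"peak read QPS  ~ {read_peak:,.0f}")
print(f"storage over 5 years ~ {storage_bytes(500, dau * 1, 365 * 5) / 1e12:,.0f} TB")
print(f"peak read bandwidth ~ {bandwidth_bytes_per_sec(read_peak, 500) / 1e6:,.0f} MB/s")
```

With these inputs the script reports about 3,472 peak write QPS, 347,222 peak read QPS, roughly 91 TB over five years, and about 174 MB/s of peak read bandwidth. The order of magnitude is the point: the read figure, not the write figure, is what argues for caches and read replicas.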
3. API sketch
A handful of endpoints: method, path, key params, return shape. This forces you to name the core operations and exposes hidden requirements (pagination, auth, idempotency keys). Keep it small; three to five endpoints is plenty.
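As an illustration, here is a hypothetical three-endpoint contract for the URL shortener, backed by an in-memory class so the behavior is concrete. The paths, the `ShortenerApi` class, and its method names are all invented for this sketch; the point is how quickly idempotency keys and cursor pagination show up once you write the operations down.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass


@dataclass
class ShortUrl:
    code: str
    long_url: str
    owner: str


class ShortenerApi:
    """In-memory stand-in for three hypothetical endpoints:
    POST /urls            -> create_short_url
    GET  /{code}          -> resolve
    GET  /users/{u}/urls  -> list_urls (cursor-paginated)
    """

    def __init__(self) -> None:
        self._by_code: dict[str, ShortUrl] = {}
        self._by_idempotency_key: dict[str, str] = {}  # key -> code already issued
        self._by_owner: dict[str, list[str]] = {}      # owner -> codes in creation order

    def create_short_url(self, owner: str, long_url: str, idempotency_key: str) -> ShortUrl:
        # A client retrying after a timeout resends the same key and gets the same
        # code back instead of minting a duplicate. Keys are global here; a real
        # service would likely scope them per client.
        if idempotency_key in self._by_idempotency_key:
            return self._by_code[self._by_idempotency_key[idempotency_key]]
        code = secrets.token_urlsafe(5)
        while code in self._by_code:  # regenerate on the rare collision
            code = secrets.token_urlsafe(5)
        record = ShortUrl(code, long_url, owner)
        self._by_code[code] = record
        self._by_idempotency_key[idempotency_key] = code
        self._by_owner.setdefault(owner, []).append(code)
        return record

    def resolve(self, code: str) -> str | None:
        record = self._by_code.get(code)
        return record.long_url if record else None

    def list_urls(self, owner: str, cursor: int = 0, limit: int = 2) -> tuple[list[ShortUrl], int | None]:
        # Returns one page plus the cursor for the next page (None when exhausted).
        codes = self._by_owner.get(owner, [])
        page = [self._by_code[c] for c in codes[cursor:cursor + limit]]
        next_cursor = cursor + limit if cursor + limit < len(codes) else None
        return page, next_cursor
```

Calling `create_short_url` twice with the same idempotency key returns the same code both times, which is exactly the retry-safety guarantee a client would need to rely on.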
4. Data model
What entities exist, what fields they carry, and (crucially) what you key and index on. The access pattern dictates the model, not the other way around. Decide SQL vs NoSQL here, and anticipate your partition key before you need it.
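To make “the access pattern dictates the model” concrete, consider a chat system whose dominant query is “latest N messages in a conversation.” This sketch assumes that access pattern, a fixed shard count of 8, and hash partitioning; `Message`, `shard_for`, and `ShardedMessageStore` are hypothetical names. Keying on `conversation_id` means that query touches exactly one shard.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    conversation_id: str  # partition key: all of a conversation's messages live together
    message_id: int       # sort key within the partition (monotonic per conversation)
    sender_id: str
    body: str


NUM_SHARDS = 8  # assumption for illustration


def shard_for(conversation_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash of the partition key -> shard number.

    Uses hashlib rather than the built-in hash(), which is randomized per process for str.
    """
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedMessageStore:
    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        # shard -> conversation_id -> messages in append order
        # (assumed to match message_id order)
        self.shards = [defaultdict(list) for _ in range(num_shards)]

    def append(self, msg: Message) -> None:
        shard = self.shards[shard_for(msg.conversation_id, self.num_shards)]
        shard[msg.conversation_id].append(msg)

    def recent(self, conversation_id: str, limit: int = 20) -> list[Message]:
        # The dominant access pattern touches one shard and one contiguous run of rows.
        shard = self.shards[shard_for(conversation_id, self.num_shards)]
        return shard.get(conversation_id, [])[-limit:]
```

The same choice has a cost worth noting for the deep dive: one very busy conversation becomes a hot partition, because every one of its messages lands on the same shard.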
5. High-level design
Now draw the boxes: clients → load balancer → stateless app tier → caches → databases → async workers. Show the happy path of one request first. Lean on the building blocks you already know: load balancers, caching, CDNs, replication.
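The happy path of one read request can be traced in a few lines of code. This is a toy model, not production code: the load balancer, app servers, cache, and database are plain Python objects with hypothetical names, the cache uses the cache-aside pattern (one common choice among several), and async workers are left out.

```python
from __future__ import annotations

import itertools


class Database:
    """Stand-in for the source of truth; counts reads so cache hits are visible."""

    def __init__(self, rows: dict[str, str]) -> None:
        self.rows = rows
        self.reads = 0

    def get(self, key: str) -> str | None:
        self.reads += 1
        return self.rows.get(key)


class AppServer:
    """Stateless: holds no per-user state, so any instance can serve any request."""

    def __init__(self, name: str, cache: dict[str, str], db: Database) -> None:
        self.name = name
        self.cache = cache  # shared cache tier, modeled as a dict
        self.db = db

    def handle_get(self, key: str) -> str | None:
        # Cache-aside read: try the cache, fall back to the database, then fill the cache.
        if key in self.cache:
            return self.cache[key]
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value


class RoundRobinBalancer:
    def __init__(self, servers: list[AppServer]) -> None:
        self._cycle = itertools.cycle(servers)

    def route(self, key: str) -> tuple[str, str | None]:
        server = next(self._cycle)
        return server.name, server.handle_get(key)


db = Database({"abc123": "https://example.com/long/path"})
shared_cache: dict[str, str] = {}
lb = RoundRobinBalancer([AppServer(f"app-{i}", shared_cache, db) for i in range(3)])

print(lb.route("abc123"))  # ('app-0', 'https://example.com/long/path'), from the DB
print(lb.route("abc123"))  # ('app-1', 'https://example.com/long/path'), from the cache
print(db.reads)            # 1
```

Because the app servers keep no state of their own, the second request can land on a different instance and still hit the warm shared cache; that property is what lets you add app servers freely.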
6. Deep dive and trade-offs
Pick the part that breaks first under your estimated load and fix it: hot keys, the celebrity fan-out, connection limits, write amplification. Then state the trade-offs explicitly. This is the move that separates a senior answer from a junior one: every choice you made bought something (latency, simplicity, scale) and cost something (consistency, money, operational burden). Saying so out loud proves you understand the design rather than reciting it.
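Hot keys make a good worked example of naming what a fix buys and costs. One common mitigation is to split a single hot counter into N sub-keys. In this sketch `SplitCounter` is a hypothetical class and a `collections.Counter` stands in for the key-value store; the buy/cost ledger is written into the docstring.

```python
import random
from collections import Counter


class SplitCounter:
    """A hot counter split across N sub-keys.

    Buys: concurrent writes land on N different keys (and, depending on
          partitioning, possibly N different shards), so no single key
          absorbs all the contention.
    Costs: every read must fetch and sum N keys, and in a real store that
           sum is not an atomic snapshot if writes race with the read.
    """

    def __init__(self, name: str, num_subkeys: int, store: Counter) -> None:
        self.name = name
        self.num_subkeys = num_subkeys
        self.store = store  # stands in for a key-value store

    def _subkey(self, i: int) -> str:
        return f"{self.name}#{i}"

    def increment(self, amount: int = 1) -> None:
        # Spread writes by picking a random sub-key.
        self.store[self._subkey(random.randrange(self.num_subkeys))] += amount

    def value(self) -> int:
        # N lookups per read: the price paid for spreading the writes.
        return sum(self.store[self._subkey(i)] for i in range(self.num_subkeys))


store: Counter = Counter()
likes = SplitCounter("post:42:likes", num_subkeys=16, store=store)
for _ in range(10_000):
    likes.increment()

print(likes.value())  # 10000
```

The knob is `num_subkeys`: raising it spreads writes further and makes every read proportionally more expensive, which is the trade-off you would state out loud.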
The roadmap: eight case studies
This part contains eight designs: four covered in depth here (the first four rows of the table: URL shortener, news feed, chat, rate limiter) and four companions in this same directory. Each reuses the framework above; together they cover the major archetypes you’ll meet.
| Case study | Archetype it teaches |
|---|---|
| Design a URL Shortener | Read-heavy KV store, unique ID generation, caching |
| Design a News Feed | Fan-out, the celebrity problem, ranking |
| Design a Chat System | Stateful connections, ordering, real-time delivery |
| Design a Rate Limiter | Distributed counters, algorithm trade-offs |
| Design a Notification System | Multi-channel fan-out, queues, retries |
| Design a Typeahead Autocomplete | Tries, prefix search, latency budgets |
| Design a Web Crawler | Frontier queues, politeness, dedup at scale |
| Design a Payment System | Idempotency, exactly-once, consistency & audit |
Read the four deep dives first. They establish patterns — caching the hot path, fanning out work, holding stateful connections, counting under contention — that the companion four recombine.
The thread
What does this buy us, and what does it cost? Carry that question through every page. A framework is only useful if it makes the trade-offs visible: estimation buys you the grounds to choose between one box and a sharded fleet, the API buys you a contract, the data model buys you predictable access, and the deep dive is where you pay (in consistency, in money, in complexity) for the scale you asked for. Master the six moves and any “design X” prompt becomes the same problem wearing a different hat.
Check your understanding
- Name the six moves of the framework in order. Why is jumping straight to the architecture diagram a mistake?
- What is the difference between functional and non-functional requirements, and why do the non-functional ones do most of the design work?
- Why is the read:write ratio the first number to establish?
- How do you turn “100M daily active users” into a peak write-QPS figure?
- Give an example of a design choice and state explicitly what it buys and what it costs.