System Design, From First Principles
A first-principles guide to designing real systems. We build understanding in dependency order: each topic uses only ideas already covered. For every decision, keep asking one question: what does this buy us, and what does it cost? Every architecture choice is a trade-off; this book is about seeing both sides. Read in order, and after each page answer the “Check your understanding” questions in your own words. Depth over speed.
Part 0 · Foundations — Why Systems Are Hard
- Overview — Why Systems Are Hard
- The Fallacies of Distributed Computing
- Latency, Throughput & the Numbers to Know
- Back-of-the-Envelope Estimation
- Availability, SLAs & the Nines
- The CAP Theorem (Intuition)
Part 1 · Core Building Blocks
- Overview
- DNS & Request Routing
- Load Balancers
- Caching
- Content Delivery Networks (CDNs)
- Databases: A Field Guide
- Message Queues
- Reverse Proxies & API Gateways
Part 2 · Data: Storage & Retrieval
- Overview
- SQL vs NoSQL
- Indexing (B-Tree vs LSM)
- Replication
- Partitioning & Sharding
- Transactions & ACID
- Data Modeling
Part 3 · Scaling & Performance
- Overview
- Vertical vs Horizontal Scaling
- Statelessness & Sessions
- Caching Strategies
- Read Replicas & CQRS
- Database Scaling Patterns
- Finding Performance Bottlenecks
Part 4 · Distributed Systems Theory
- Overview
- CAP & PACELC
- Consistency Models
- Consensus: Raft & Paxos
- Time, Clocks & Ordering
- Idempotency
- Leader Election & Coordination
Part 5 · Communication & APIs
- Overview
- REST APIs
- RPC & gRPC
- GraphQL
- Synchronous vs Asynchronous Messaging
- Real-Time: Polling, WebSockets & SSE
- Event-Driven Architecture
Part 6 · Reliability & Resilience
- Overview
- Redundancy & Failover
- Timeouts, Retries & Backoff
- Circuit Breakers & Bulkheads
- Rate Limiting
- Graceful Degradation & Load Shedding
- Disaster Recovery (RPO/RTO)
Part 7 · Observability & Operations
Part 8 · Security & Trust Boundaries
- Overview
- Authentication
- Authorization
- OAuth & JWT
- Encryption in Transit & at Rest
- Secrets Management
- Threat Modeling & Abuse Prevention
Part 9 · Case Studies — Designing Real Systems
The “design X” track — full walkthroughs with the reasoning and trade-offs at every step.
- Overview — How to Approach a Design
- Design a URL Shortener
- Design a News Feed
- Design a Chat System
- Design a Rate Limiter
- Design a Notification System
- Design a Typeahead / Autocomplete
- Design a Web Crawler
- Design a Payment System
Part 10 · Advanced & Rare Concepts
The senior-level track — the subtle mechanics and failure modes most engineers learn the hard way.
- Overview — Advanced & Rare
- Consistent Hashing
- Vector Clocks & CRDTs
- The Dual-Write Problem & the Outbox Pattern
- Exactly-Once Semantics (and the Myth)
- Hot Partitions & the Celebrity Problem
- Tail Latency & p99
- Backpressure & Flow Control
- Probabilistic Data Structures
- Cache Stampede & the Thundering Herd