Back-of-the-Envelope Estimation

Before you choose a database, a server count, or a caching strategy, you should be able to answer a cruder question first: roughly how big is this thing going to be? A few minutes of arithmetic, the kind you can do on the back of an envelope, tells you whether your data fits in RAM, whether you need one server or a thousand, and whether your idea is even physically possible. This skill is what separates “let’s just try it and see” from “this will need sharding from day one.”

The goal is never precision. It’s getting the order of magnitude right: is this 10 servers or 10,000? 1 GB or 1 PB? Being off by 2× is fine. Being off by 1000× is a career-defining outage.

The round numbers that make it fast

Estimation is fast only if you stop doing exact arithmetic. Memorize a few friendly approximations and everything becomes mental math.

Powers of two (data sizes):

Power	Approx. value	Unit	Use it for
2^10	~1 thousand	KB	a small record
2^20	~1 million	MB	a photo, a page of data
2^30	~1 billion	GB	RAM on a box
2^40	~1 trillion	TB	a disk
2^50	~1 quadrillion	PB	“big data”
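
To see how rough these approximations are, the sketch below compares each power of two with its power-of-ten neighbour. It is only an illustration; the function and constant names are made up for this example.

```python
# Each entry: (unit name, binary exponent, decimal exponent).
UNITS = [("KB", 10, 3), ("MB", 20, 6), ("GB", 30, 9), ("TB", 40, 12), ("PB", 50, 15)]


def approximation_error(binary_exp: int, decimal_exp: int) -> float:
    """Fraction by which 2**binary_exp exceeds 10**decimal_exp."""
    return 2**binary_exp / 10**decimal_exp - 1


for name, b, d in UNITS:
    err = approximation_error(b, d)
    print(f"{name}: 2^{b} is {err:.1%} larger than 10^{d}")
```

The gap grows from about 2% at KB to about 13% at PB, which is still far inside the 2× tolerance that order-of-magnitude work allows.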

Time (the single most useful trick):

   seconds in a day = 86,400  ≈  100,000  (round UP to 10^5)

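Rounding 86,400 up to 100,000 makes per-day → per-second conversions trivial. Note the direction of the error: dividing by the larger number understates the per-second rate by about 14%, which is harmless at order-of-magnitude precision and is dwarfed by the peak multiplier applied later. Also handy: ~2.5 million seconds per month, ~30 million seconds per year.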
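
A few lines of Python make the size of that rounding concrete. This is just a sketch: the daily volume is an arbitrary illustrative figure, and the function name is made up for this example.

```python
SECONDS_PER_DAY_EXACT = 24 * 60 * 60  # 86,400
SECONDS_PER_DAY_ROUNDED = 100_000     # the mental-math shortcut


def per_second(per_day_count: float, seconds_per_day: int) -> float:
    """Convert a per-day count into an average per-second rate."""
    return per_day_count / seconds_per_day


daily_events = 20_000_000  # illustrative daily volume
exact = per_second(daily_events, SECONDS_PER_DAY_EXACT)
rounded = per_second(daily_events, SECONDS_PER_DAY_ROUNDED)
shortfall = 1 - rounded / exact  # how far the shortcut falls below the exact rate

print(f"exact:   {exact:.1f}/sec")
print(f"rounded: {rounded:.1f}/sec")
print(f"rounded figure is {shortfall:.1%} lower")
```

Because the divisor is larger, the rounded rate comes out a little lower than the exact one, by the same fraction whatever the daily volume is.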

Other anchors: an ASCII character ~1 byte; a typical short text record ~1 KB; a thumbnail, tens of KB; a phone photo ~1–5 MB; a minute of video, tens of MB.

The three quantities you almost always need

QPS (queries/requests per second) — drives how many app servers and how much database throughput you need.
Storage — drives your database/disk choice and cost, and grows over time (you must size for years, not launch day).
Bandwidth — drives network and CDN cost, and is just QPS × payload size.

The universal recipe:

   per-day count  ÷  ~100,000 sec/day   =   average QPS
   average QPS    ×  (3 to 10)          ≈   peak QPS   (traffic is bursty)
   QPS            ×  bytes per request  =   bandwidth
   daily new data ×  days retained      =   storage (then × replication factor)
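
The recipe translates almost line for line into small helper functions. The sketch below is one possible rendering, not a canonical one: the function names, the default burst factor of 5, the default replication factor of 3, and the sample inputs (1 million events/day of 2 KB each, kept 30 days) are all assumptions chosen for illustration.

```python
SECONDS_PER_DAY = 100_000  # rounded from 86,400, as in the recipe


def average_qps(per_day_count: float) -> float:
    return per_day_count / SECONDS_PER_DAY


def peak_qps(avg_qps: float, burst_factor: float = 5.0) -> float:
    # burst_factor is an assumption; the recipe suggests 3 to 10.
    return avg_qps * burst_factor


def bandwidth_bytes_per_sec(qps: float, bytes_per_request: float) -> float:
    return qps * bytes_per_request


def storage_bytes(daily_new_bytes: float, days_retained: int,
                  replication_factor: int = 3) -> float:
    return daily_new_bytes * days_retained * replication_factor


# Hypothetical workload: 1M events/day, 2 KB each, retained 30 days.
avg = average_qps(1_000_000)
peak = peak_qps(avg)
bandwidth = bandwidth_bytes_per_sec(peak, 2_000)
storage = storage_bytes(1_000_000 * 2_000, days_retained=30)

print(f"average QPS: {avg:,.0f}")
print(f"peak QPS:    {peak:,.0f}")
print(f"peak bandwidth: {bandwidth / 1e3:,.0f} KB/sec")
print(f"storage:        {storage / 1e9:,.0f} GB")
```

Keeping each step as its own function mirrors how the estimate is reasoned about: any single assumption can be swapped out without touching the others.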

A worked example: sizing a photo-upload service

Suppose we’re sketching a service like a photo-sharing app. We’re told (or we assume): 10 million daily active users, each uploading 2 photos/day; each photo is ~1 MB after compression; and each day’s new photos are viewed ~20 times apiece on average (we ignore views of older photos to keep the arithmetic simple). Photos are kept 5 years. Let’s size it.

Write QPS (uploads):

   10M users × 2 photos = 20M uploads/day
   20M ÷ 100,000 sec   = 200 uploads/sec  (average)
   peak ≈ 200 × 5      = ~1,000 uploads/sec

Read QPS (views):

   20M photos × 20 views = 400M views/day
   400M ÷ 100,000        = 4,000 views/sec (average)
   peak ≈ 4,000 × 5      = ~20,000 views/sec

Immediately we’ve learned the most important architectural fact: reads outnumber writes ~20:1. This is a read-heavy system, which screams caching and a CDN — serving popular photos from edge locations instead of hitting origin storage 20,000 times a second.
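
The two QPS calculations can be captured in a small data structure so the assumptions sit in one place. This is a sketch under the example's stated inputs; the class and field names are invented here, and the peak factor of 5 is the same assumed multiplier used above.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 100_000  # rounded from 86,400


@dataclass(frozen=True)
class PhotoWorkload:
    """Inputs from the worked example; names are illustrative."""
    daily_active_users: int = 10_000_000
    uploads_per_user: int = 2   # photos per user per day
    views_per_photo: int = 20   # views per uploaded photo per day
    peak_factor: int = 5        # assumed burst multiplier

    def avg_write_qps(self) -> float:
        return self.daily_active_users * self.uploads_per_user / SECONDS_PER_DAY

    def avg_read_qps(self) -> float:
        return self.avg_write_qps() * self.views_per_photo

    def peak(self, avg_qps: float) -> float:
        return avg_qps * self.peak_factor


w = PhotoWorkload()
writes, reads = w.avg_write_qps(), w.avg_read_qps()
print(f"writes: {writes:,.0f}/sec avg, {w.peak(writes):,.0f}/sec peak")
print(f"reads:  {reads:,.0f}/sec avg, {w.peak(reads):,.0f}/sec peak")
print(f"read:write ratio = {reads / writes:.0f}:1")
```

Changing one field, for example `PhotoWorkload(views_per_photo=50)`, immediately shows how the read:write ratio and the peak read load shift.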

Storage:

   20M photos/day × 1 MB        = 20 TB/day of new photos
   20 TB × 365 × 5 years        ≈ 36 PB   (5-year footprint)
   × ~3 for replication/backup  ≈ 110 PB, or ~100 PB in round numbers

That single number reframes the whole project: this is not a “put it in Postgres” problem, it’s an “object store (S3-style) plus a metadata database” problem. We learned it in ninety seconds.
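
Storage is the one quantity that keeps growing, so it can help to tabulate it year by year rather than only at the five-year mark. The sketch below does that with the example's inputs; the names are illustrative, decimal units are assumed (1 MB = 10^6 bytes), and the replication factor of 3 is the rough multiplier used above.

```python
MB = 10**6
PB = 10**15

PHOTOS_PER_DAY = 20_000_000
PHOTO_BYTES = 1 * MB
REPLICATION_FACTOR = 3  # the rough "~3" for replication/backup


def raw_bytes_after(years: int) -> int:
    """Unreplicated bytes stored after the given number of years."""
    return PHOTOS_PER_DAY * PHOTO_BYTES * 365 * years


for year in range(1, 6):
    raw = raw_bytes_after(year)
    replicated = raw * REPLICATION_FACTOR
    print(f"year {year}: {raw / PB:5.1f} PB raw, {replicated / PB:6.1f} PB replicated")
```

The first year alone is already about 7 PB raw, so the object-store conclusion holds well before the retention window fills up.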

Bandwidth (egress for views):

   peak 20,000 views/sec × 1 MB = 20 GB/sec out at peak

20 GB/s of egress (160 Gbit/s) confirms that a CDN isn’t optional: it’s the core of the design and the core of the cost. (See why the network bytes dominate in Latency & the Numbers to Know.)
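
Network capacity is usually quoted in bits per second and bills in bytes transferred, so it is worth converting the egress figure both ways. The sketch below reuses the example's view counts; the function names are made up, and decimal units are assumed throughout.

```python
MB = 10**6
GB = 10**9
TB = 10**12

PHOTO_BYTES = 1 * MB
AVG_VIEWS_PER_SEC = 4_000
PEAK_VIEWS_PER_SEC = 20_000
VIEWS_PER_DAY = 400_000_000


def egress_bytes_per_sec(views_per_sec: int) -> int:
    return views_per_sec * PHOTO_BYTES


def to_gbit_per_sec(bytes_per_sec: int) -> float:
    return bytes_per_sec * 8 / 10**9


peak = egress_bytes_per_sec(PEAK_VIEWS_PER_SEC)
avg = egress_bytes_per_sec(AVG_VIEWS_PER_SEC)
daily_egress = VIEWS_PER_DAY * PHOTO_BYTES  # total bytes served per day

print(f"peak:    {peak / GB:.0f} GB/sec = {to_gbit_per_sec(peak):.0f} Gbit/sec")
print(f"average: {avg / GB:.0f} GB/sec = {to_gbit_per_sec(avg):.0f} Gbit/sec")
print(f"daily egress: {daily_egress / TB:.0f} TB/day")
```

Peak rate sizes the links and edge capacity, while the daily total is closer to what a transfer bill is based on.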

Why estimation guides design

Notice what the envelope just told us to build, before we’d named a single technology:

read-heavy (20:1) → cache + CDN are first-class, not afterthoughts
~100 PB of objects → blob store, not a relational DB, for the photos
20 GB/s egress at peak → edge delivery dominates architecture and cost
~1k write QPS vs ~20k read QPS → scale the read and write paths separately

Every one of those is a real architectural decision, derived from arithmetic, not opinion. That’s the power of estimation: it prunes the design space so you spend your detailed effort on the parts that the numbers say actually matter, like the durability and availability targets you’ll need to hit for 36 PB of irreplaceable user photos.

The thread

What does estimation buy us, and what does it cost? It buys early, cheap clarity — the ability to kill an impossible design or spot the dominant cost on a whiteboard, before spending months building the wrong thing. It costs almost nothing (a few minutes), but it tempts you toward false confidence: an estimate is a hypothesis, not a measurement. Size with it, then validate with real load. The envelope tells you where to aim; production tells you whether you hit.
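
Because an estimate is a hypothesis, it helps to see how far the answer moves when the inputs move. The sketch below sweeps two inputs that the text itself gives as ranges: the peak multiplier (3 to 10) and phone-photo size (~1–5 MB). The specific sample points and all names are arbitrary choices for illustration.

```python
from itertools import product

AVG_VIEWS_PER_SEC = 4_000    # average read QPS from the worked example
PEAK_FACTORS = (3, 5, 10)    # sample points within the 3-to-10 range
PHOTO_SIZES_MB = (1, 2, 5)   # sample points within the ~1-5 MB range


def peak_egress_gb_per_sec(peak_factor: int, photo_mb: int) -> float:
    """Peak egress in GB/sec for one combination of assumptions."""
    return AVG_VIEWS_PER_SEC * peak_factor * photo_mb / 1_000


results = {
    (factor, size): peak_egress_gb_per_sec(factor, size)
    for factor, size in product(PEAK_FACTORS, PHOTO_SIZES_MB)
}
low, high = min(results.values()), max(results.values())

for (factor, size), gb in sorted(results.items()):
    print(f"peak x{factor:<2} photo {size} MB -> {gb:6.1f} GB/sec")
print(f"range: {low:.0f} to {high:.0f} GB/sec ({high / low:.1f}x spread)")
```

The spread between the most and least demanding combinations is more than an order of magnitude, which is a concrete reason to validate the two assumptions with real traffic before committing to capacity.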

Check your understanding

Why round 86,400 seconds/day up to 100,000? What does the rounding buy you, and in which direction does it skew the resulting per-second rate?
Walk through converting “500 million events per day” into an average QPS, then into a peak QPS. State your peak-multiplier assumption.
In the photo example, why does the 20:1 read/write ratio change the architecture, and toward what?
Why is storage sized over years and multiplied by a replication factor, while bandwidth is sized per second?
An estimate says you need 100 PB. Why is the right next step “validate with real data,” not “order 100 PB of disk”?