Design a Notification System

Almost every product eventually needs to tell users things — “your order shipped,” “someone replied,” “your password changed.” A notification system is the plumbing that takes a single business event and reliably turns it into a push, an email, and an SMS, to the right people, without spamming them and without losing messages when a downstream provider has a bad day. It looks trivial (“just call the email API”) and is anything but. This walkthrough builds one from scratch.

1. Requirements

Functional

Send notifications over multiple channels: mobile push (APNs/FCM), email, SMS, and in-app.
Support transactional (a password reset — must arrive) and bulk/marketing (a promo blast — can be dropped/deduped) sends.
Respect user preferences (per-channel opt-outs, quiet hours) and per-user rate limits.
Templating: a notification type + data renders into channel-specific content.
Be observable: who got what, when, and with what delivery status.

Non-functional

Reliability first. At-least-once delivery; never silently drop a transactional message.
Scale: handle bursty fan-out — one “live event started” can target 10M users at once.
Latency: transactional within seconds; bulk can tolerate minutes.
Decoupling: a slow or down provider must not block the caller.

2. Estimation

Say 50M daily active users, averaging 4 notifications/day → 200M notifications/day ≈ 2,300/sec average. Bursts (a sports goal, a flash sale) spike to 100K+/sec for short windows. That ratio — ~40× peak-to-average — is the whole reason a queue exists: it absorbs spikes the providers can’t.

Storage: a delivery record of ~500 bytes × 200M/day ≈ 100 GB/day of status logs. Treat that as a floor, since multi-channel sends and retries each add rows. We keep hot records ~30 days (roughly 3 TB) and roll the rest to cold storage.
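The arithmetic in this section is easy to re-run when the inputs change. The sketch below only restates the figures quoted above; the variable names are illustrative, and it assumes one delivery record per notification.

```python
# Back-of-envelope check of the estimates above (inputs taken from the text).
DAU = 50_000_000
NOTIFS_PER_USER_PER_DAY = 4
PEAK_PER_SEC = 100_000          # burst figure quoted above
RECORD_BYTES = 500              # approximate size of one delivery record
HOT_RETENTION_DAYS = 30

per_day = DAU * NOTIFS_PER_USER_PER_DAY
avg_per_sec = per_day / 86_400                  # seconds in a day
peak_to_avg = PEAK_PER_SEC / avg_per_sec
log_gb_per_day = per_day * RECORD_BYTES / 1e9   # decimal gigabytes
hot_tb = log_gb_per_day * HOT_RETENTION_DAYS / 1e3

print(f"{per_day:,} notifications/day")
print(f"{avg_per_sec:,.0f}/sec average")
print(f"{peak_to_avg:.0f}x peak-to-average")
print(f"{log_gb_per_day:.0f} GB/day of status logs, about {hot_tb:.0f} TB hot")
```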

3. API sketch

POST /v1/notifications
{
  "idempotency_key": "order-4412-shipped",   // caller-supplied, dedups retries
  "recipient": { "user_id": "u_991" },
  "type": "order_shipped",                     // selects template + default channels
  "channels": ["push", "email"],               // optional override
  "data": { "order_id": 4412, "eta": "Tue" },
  "priority": "transactional"
}
→ 202 Accepted { "notification_id": "n_8c1f" }   // accepted, not yet delivered

GET /v1/notifications/{id}     → per-channel delivery status

We return 202 Accepted, not 200 — the work is queued, not done. The idempotency_key is the contract that lets the caller safely retry a timed-out request without sending twice.
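To make the idempotency contract concrete, here is a minimal in-memory sketch of the ingest handler. The class name, the dict standing in for a key-value store, and the list standing in for the hand-off to fan-out are all assumptions for illustration; a real service would use a shared store and persist the notification row.

```python
import uuid
from typing import Tuple

class IngestService:
    """Toy ingest handler; in-memory structures stand in for real storage."""

    def __init__(self) -> None:
        self._by_key = {}   # idempotency_key -> notification_id
        self.queue = []     # stand-in for the hand-off to the fan-out service

    def post_notification(self, body: dict) -> Tuple[int, dict]:
        key = body["idempotency_key"]
        existing = self._by_key.get(key)
        if existing is not None:
            # Retry of a request we already accepted: same answer, no new work.
            return 202, {"notification_id": existing}
        notification_id = "n_" + uuid.uuid4().hex[:8]
        self._by_key[key] = notification_id
        self.queue.append({"notification_id": notification_id, **body})
        return 202, {"notification_id": notification_id}

svc = IngestService()
request = {
    "idempotency_key": "order-4412-shipped",
    "recipient": {"user_id": "u_991"},
    "type": "order_shipped",
    "priority": "transactional",
}
first = svc.post_notification(request)
retry = svc.post_notification(request)   # e.g. the caller timed out and resent
```

Both calls return the same notification_id, and only one job is queued.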

4. Data model

notification     id, user_id, type, priority, payload, created_at, status
delivery_attempt id, notification_id, channel, provider, status,
                 attempt_no, error, sent_at        // one row per (channel × try)
user_preference  user_id, channel, enabled, quiet_hours, locale
template         type, channel, version, body      // rendered per channel
device_token     user_id, platform, token, active  // for push fan-out

Splitting notification (the intent) from delivery_attempt (each physical try per channel) is the key modeling move: one logical notification can become three deliveries, each retried independently.
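A small sketch of that split, using Python dataclasses with a subset of the columns above. The status strings and the record_attempt helper are hypothetical; the point is only that attempt numbers advance per channel, so a retry on email never touches push.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DeliveryAttempt:
    notification_id: str
    channel: str
    attempt_no: int
    status: str                  # hypothetical values: "sent" / "failed"
    error: Optional[str] = None

@dataclass
class Notification:
    id: str
    user_id: str
    type: str
    priority: str
    attempts: List[DeliveryAttempt] = field(default_factory=list)

    def record_attempt(self, channel: str, status: str,
                       error: Optional[str] = None) -> DeliveryAttempt:
        # attempt_no counts tries on this channel only
        attempt_no = 1 + sum(a.channel == channel for a in self.attempts)
        attempt = DeliveryAttempt(self.id, channel, attempt_no, status, error)
        self.attempts.append(attempt)
        return attempt

n = Notification("n_8c1f", "u_991", "order_shipped", "transactional")
n.record_attempt("push", "sent")
n.record_attempt("email", "failed", error="provider timeout")
n.record_attempt("email", "sent")   # the retry only concerns the email channel
```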

5. High-level design

                       ┌────────────────────────────────────────────┐
   caller ──POST──►  API / Ingest  ──► dedup (idempotency key)       │
                       │                  │                           │
                       │            writes notification row          │
                       └──────────────────┼───────────────────────── ┘
                                          ▼
                                 ┌──────────────────┐
                                 │   FAN-OUT svc    │  expand recipient → devices,
                                 │ (prefs + rate    │  drop opt-outs, split per channel
                                 │  limit + render) │
                                 └───────┬──────────┘
                       enqueue one job per (recipient × channel)
              ┌────────────────┬─────────┼─────────┬────────────────┐
              ▼                ▼          ▼         ▼                ▼
          push queue      email queue  sms queue  inapp queue   (priority lanes)
              │                │          │
        push workers     email workers  sms workers   ◄── pull, call provider,
              │                │          │                record delivery_attempt
              ▼                ▼          ▼
            APNs/FCM        SES/etc     Twilio/etc
              │                │          │
              └──── delivery receipts / webhooks ──► status updater ──► DB

The spine is a set of per-channel queues drained by stateless worker pools. See Message Queues for why this shape is the backbone of async systems. The queue buys us three things at once: it absorbs bursts (the 40× spike just deepens the queue instead of melting a provider), it isolates failures (SMS provider down → SMS queue backs up while push keeps flowing), and it lets workers scale independently per channel.
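The failure-isolation claim can be shown with a toy simulation. Everything here is an assumption for illustration: deques stand in for real queues, and a boolean flag stands in for provider health.

```python
from collections import deque

queues = {"push": deque(), "email": deque(), "sms": deque()}
provider_up = {"push": True, "email": True, "sms": False}  # SMS provider is down

def enqueue(channel: str, job: str) -> None:
    queues[channel].append(job)

def drain(channel: str) -> list:
    """One worker pass: deliver until the queue is empty or the provider is down."""
    delivered = []
    while queues[channel] and provider_up[channel]:
        delivered.append(queues[channel].popleft())
    return delivered

for i in range(3):
    for channel in queues:
        enqueue(channel, f"job-{i}")

sent = {channel: drain(channel) for channel in queues}
backlog = {channel: len(q) for channel, q in queues.items()}
```

Push and email drain completely while the SMS jobs simply wait in their own queue. With one shared queue, the stuck SMS jobs would sit in front of everything else.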

What does this buy us, and what does it cost? The async queue gives elasticity and fault isolation — but it costs us synchronous certainty. The caller no longer knows at request time whether delivery succeeded; they get a notification_id and must check status or trust the pipeline. We’ve traded “I know it sent” for “it will send.”

6. Scaling and deep dives

Fan-out. The dangerous operation is “notify 10M users.” Don’t expand that inline in the request. Accept the intent, then let the fan-out service expand it in the background, chunking recipients (e.g. 1,000 per job) so a single huge target becomes thousands of small, parallelizable jobs.
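Chunking can be sketched in a few lines. The function names and the job shape are hypothetical, and enqueue is whatever callable puts a job on the queue; recipients are consumed lazily so the full 10M list never has to sit in memory.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List

def chunk_recipients(user_ids: Iterable[str], size: int = 1000) -> Iterator[List[str]]:
    """Yield successive batches of at most `size` recipients."""
    it = iter(user_ids)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def fan_out(notification_id: str, user_ids: Iterable[str],
            enqueue: Callable[[dict], None], size: int = 1000) -> int:
    """Enqueue one job per chunk and return how many jobs were created."""
    count = 0
    for i, batch in enumerate(chunk_recipients(user_ids, size)):
        enqueue({"notification_id": notification_id, "chunk": i, "user_ids": batch})
        count += 1
    return count

jobs = []
job_count = fan_out("n_8c1f", (f"u_{i}" for i in range(2500)), jobs.append)
```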

Retries with backoff. A worker that gets a 5xx or timeout re-enqueues the job with exponential backoff plus jitter (base delays of 1s, 4s, 16s…). After N attempts the job lands in a dead-letter queue for inspection. Jitter matters: without it, all failed jobs retry in lockstep and hammer the recovering provider in a synchronized wave (a thundering herd).
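One way to compute the retry delay is sketched below. It uses the "full jitter" variant, where the delay is drawn uniformly between zero and the exponential ceiling; the text does not say which jitter scheme is intended, so treat this as one reasonable choice. MAX_ATTEMPTS, the 300-second cap, and the function names are assumptions.

```python
import random
from typing import Optional, Tuple

MAX_ATTEMPTS = 5  # hypothetical value for "N"

def backoff_delay(attempt: int, base: float = 1.0, factor: float = 4.0,
                  cap: float = 300.0, rng: Optional[random.Random] = None) -> float:
    """Delay in seconds before retrying after failed attempt number `attempt` (1-based)."""
    rng = rng or random.Random()
    ceiling = min(cap, base * factor ** (attempt - 1))   # 1s, 4s, 16s, ... up to cap
    return rng.uniform(0.0, ceiling)                     # full jitter

def next_step(attempt: int, rng: Optional[random.Random] = None) -> Tuple[str, Optional[float]]:
    """Decide what happens to a job whose attempt number `attempt` just failed."""
    if attempt >= MAX_ATTEMPTS:
        return ("dead_letter", None)
    return ("retry", backoff_delay(attempt, rng=rng))
```

Because each job draws its own random delay, jobs that failed at the same instant spread their retries across the window instead of arriving together.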

Deduplication. There are two sources of duplicates: (a) the caller retrying, caught by the idempotency_key, which is stored with a TTL so a repeat key within the window is a no-op; (b) our own at-least-once queue redelivering a job, caught by a per-delivery dedup key (notification_id:channel) that is recorded once a send succeeds, so redeliveries are skipped while genuine retries of a failed send still go through. This is Idempotency applied at two layers, and it’s what keeps “your card was charged” from arriving five times.
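The worker-side layer might look like the sketch below. TtlSet is a hypothetical in-memory stand-in for a shared store with key expiry, and the 24-hour window is an assumption. The key is marked only after send returns, so a failed send can still be retried. A crash between the send and the mark can still produce a duplicate, which is the residual cost of at-least-once delivery.

```python
import time
from typing import Callable

class TtlSet:
    """Remembers keys for `ttl` seconds (in-memory stand-in for a shared store)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._expires = {}   # key -> expiry timestamp

    def seen(self, key: str) -> bool:
        expiry = self._expires.get(key)
        return expiry is not None and expiry > self.clock()

    def mark(self, key: str) -> None:
        self._expires[key] = self.clock() + self.ttl

def handle_job(job: dict, send: Callable[[dict], None], delivered: TtlSet) -> str:
    key = f'{job["notification_id"]}:{job["channel"]}'
    if delivered.seen(key):
        return "skipped"      # the queue redelivered a job we already sent
    send(job)                 # raises on failure, so the key is not marked
    delivered.mark(key)
    return "sent"

delivered = TtlSet(ttl=24 * 3600)   # assumed dedup window
outbox = []
job = {"notification_id": "n_8c1f", "channel": "push"}
results = [handle_job(job, outbox.append, delivered) for _ in range(3)]
```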

Rate limiting. Cap per-user sends (e.g. ≤ 5 marketing/day) with a token bucket keyed by user; respect quiet hours by deferring (not dropping) non-urgent messages. Marketing and transactional ride separate priority lanes so a promo blast can never delay a password reset.
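A minimal sketch of both mechanisms follows. The refill rate (5 tokens spread over a day), the naive local datetimes, and the function names are assumptions; a real system would keep one bucket per user in a shared store and handle time zones.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass
class TokenBucket:
    capacity: float
    refill_per_sec: float
    tokens: float
    updated_at: float   # seconds, from any monotonic clock

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated_at = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def deliver_at(now: datetime, quiet_start: time, quiet_end: time) -> datetime:
    """Return `now` if outside quiet hours, else the end of the quiet window."""
    t = now.time()
    if quiet_start <= quiet_end:
        in_quiet = quiet_start <= t < quiet_end
    else:  # window wraps midnight, e.g. 22:00 to 07:00
        in_quiet = t >= quiet_start or t < quiet_end
    if not in_quiet:
        return now
    end = datetime.combine(now.date(), quiet_end)
    if end <= now:
        end += timedelta(days=1)
    return end

bucket = TokenBucket(capacity=5, refill_per_sec=5 / 86_400, tokens=5, updated_at=0.0)
decisions = [bucket.allow(0.0) for _ in range(6)]   # sixth marketing send is refused
send_time = deliver_at(datetime(2024, 1, 1, 23, 30), time(22, 0), time(7, 0))
```

A message arriving at 23:30 during a 22:00 to 07:00 quiet window is rescheduled for 07:00 the next morning rather than discarded.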

Status & receipts. Providers report final delivery asynchronously via webhooks. A status updater consumes these and reconciles delivery_attempt rows, giving you the audit trail.
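The reconciliation step can be sketched as below. The provider_message_id field, the status strings, and the dict standing in for the delivery_attempt table are assumptions, not part of the data model above. The one rule worth showing is that a terminal status is never overwritten, because webhooks can arrive late, duplicated, or out of order.

```python
# Statuses that a later receipt must not overwrite (hypothetical values).
TERMINAL = {"delivered", "bounced", "failed"}

# provider_message_id -> delivery_attempt row (dict as a stand-in for the DB)
attempts = {
    "msg-1": {"notification_id": "n_8c1f", "channel": "email", "status": "sent"},
}

def apply_receipt(receipt: dict) -> bool:
    """Apply one provider webhook; return True if the row changed."""
    row = attempts.get(receipt["provider_message_id"])
    if row is None:
        return False          # unknown message: log and ignore
    if row["status"] in TERMINAL:
        return False          # duplicate or late webhook
    row["status"] = receipt["status"]
    return True

changed = [
    apply_receipt({"provider_message_id": "msg-1", "status": "delivered"}),
    apply_receipt({"provider_message_id": "msg-1", "status": "sent"}),      # late, ignored
    apply_receipt({"provider_message_id": "msg-404", "status": "bounced"}), # unknown
]
```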

7. Key trade-offs

Decision                  Buys you                  Costs you
─────────────────────────────────────────────────────────────────────
Async queue + 202         burst absorption,         no synchronous
                          failure isolation         success confirmation
At-least-once delivery    no lost transactionals    must dedup everywhere
Per-channel queues        independent scaling        more moving parts
Priority lanes            transactional never        capacity planning per lane
                          starved by bulk
Fan-out in background     huge targets don't         eventual, not instant,
                          time out the request       delivery for big blasts

The throughline: a notification system is mostly a reliability and fan-out problem wearing a “call the email API” costume. Get the queue, the retries, and the two layers of dedup right, and the rest is templating and config.

Check your understanding

Why does the ingest API return 202 Accepted instead of waiting for delivery, and what does the caller give up by accepting that?
Name the two independent sources of duplicate notifications and which dedup mechanism stops each.
Why are per-channel queues better than one shared queue when a single SMS provider goes down?
What problem does jitter in retry backoff solve, and what goes wrong without it?
Why expand a 10M-user fan-out in a background service rather than inside the original request?