Design a Notification System

Almost every product eventually needs to tell users things — “your order shipped,” “someone replied,” “your password changed.” A notification system is the plumbing that takes a single business event and reliably turns it into a push, an email, and an SMS, to the right people, without spamming them and without losing messages when a downstream provider has a bad day. It looks trivial (“just call the email API”) and is anything but. This walkthrough builds one from scratch.

Functional

  • Send notifications over multiple channels: mobile push (APNs/FCM), email, SMS, and in-app.
  • Support transactional (a password reset — must arrive) and bulk/marketing (a promo blast — can be dropped/deduped) sends.
  • Respect user preferences (opt-outs, per-channel, quiet hours) and rate limits per user.
  • Templating: a notification type + data renders into channel-specific content.
  • Be observable: who got what, when, and the delivery status.

Non-functional

  • Reliability first. At-least-once delivery; never silently drop a transactional message.
  • Scale: handle bursty fan-out — one “live event started” can target 10M users at once.
  • Latency: transactional within seconds; bulk can tolerate minutes.
  • Decoupling: a slow or down provider must not block the caller.

Say 50M daily active users, averaging 4 notifications/day → 200M notifications/day ≈ 2,300/sec average. Bursts (a sports goal, a flash sale) spike to 100K+/sec for short windows. That ratio, roughly 40× peak-to-average, is the whole reason a queue exists: it absorbs spikes the providers can’t.

Storage: a delivery record of ~500 bytes × 200M/day ≈ 100 GB/day of status logs. We keep hot records ~30 days and roll the rest to cold storage.
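The estimates above can be reproduced in a few lines. This is only a back-of-envelope sketch: every input is one of the assumed figures from the text, and it counts one delivery record per notification (a notification sent over several channels, or retried, would produce more).

```python
# Back-of-envelope capacity numbers; every input is an assumption from the text.
DAU = 50_000_000
NOTIFS_PER_USER_PER_DAY = 4
PEAK_PER_SEC = 100_000
RECORD_BYTES = 500
HOT_DAYS = 30
SECONDS_PER_DAY = 86_400

per_day = DAU * NOTIFS_PER_USER_PER_DAY        # notifications per day
avg_per_sec = per_day / SECONDS_PER_DAY        # average send rate
peak_ratio = PEAK_PER_SEC / avg_per_sec        # burst vs. average
log_gb_per_day = per_day * RECORD_BYTES / 1e9  # status-log volume per day
hot_tb = log_gb_per_day * HOT_DAYS / 1e3       # hot storage for the 30-day window

print(f"{per_day:,} notifications/day, about {avg_per_sec:,.0f}/sec on average")
print(f"peak-to-average ratio: about {peak_ratio:.0f}x")
print(f"status logs: {log_gb_per_day:.0f} GB/day, about {hot_tb:.0f} TB hot")
```

The hot-storage figure is derived here rather than stated in the text; it follows directly from 100 GB/day kept for 30 days.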

POST /v1/notifications
{
  "idempotency_key": "order-4412-shipped",      // caller-supplied, dedups retries
  "recipient": { "user_id": "u_991" },
  "type": "order_shipped",                      // selects template + default channels
  "channels": ["push", "email"],                // optional override
  "data": { "order_id": 4412, "eta": "Tue" },
  "priority": "transactional"
}
→ 202 Accepted { "notification_id": "n_8c1f" }  // accepted, not yet delivered

GET /v1/notifications/{id} → per-channel delivery status

We return 202 Accepted, not 200 — the work is queued, not done. The idempotency_key is the contract that lets the caller safely retry a timed-out request without sending twice.
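A minimal sketch of that contract, assuming an in-memory dictionary in place of the shared key store a real deployment would need. The class name `IngestService`, the 24-hour TTL, and the injectable `clock` are illustrative assumptions, not part of the API above.

```python
import time
import uuid


class IngestService:
    """In-memory stand-in for the ingest API (hypothetical names throughout)."""

    def __init__(self, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._seen = {}   # idempotency_key -> (notification_id, expires_at)
        self.queue = []   # accepted requests waiting for fan-out

    def post_notification(self, body):
        key = body["idempotency_key"]
        now = self._clock()
        hit = self._seen.get(key)
        if hit and hit[1] > now:
            # Repeat of a request already accepted: same id, nothing enqueued.
            return 202, {"notification_id": hit[0]}
        notification_id = "n_" + uuid.uuid4().hex[:8]
        self._seen[key] = (notification_id, now + self._ttl)
        self.queue.append({"notification_id": notification_id, **body})
        return 202, {"notification_id": notification_id}
```

Calling `post_notification` twice with the same body returns the same `notification_id` and leaves a single entry in `queue`, which is the behavior a caller retrying after a timeout relies on.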

notification      id, user_id, type, priority, payload, created_at, status
delivery_attempt  id, notification_id, channel, provider, status,
                  attempt_no, error, sent_at         // one row per (channel × try)
user_preference   user_id, channel, enabled, quiet_hours, locale
template          type, channel, version, body       // rendered per channel
device_token      user_id, platform, token, active   // for push fan-out

Splitting notification (the intent) from delivery_attempt (each physical try per channel) is the key modeling move: one logical notification can become three deliveries, each retried independently.
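To make the split concrete, here is a small sketch using dataclasses with a subset of the columns above. The status strings ("accepted", "failed", "sent") and the helper `record_attempt` are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Notification:
    id: str
    user_id: str
    type: str
    priority: str
    payload: dict
    status: str = "accepted"   # hypothetical status value


@dataclass
class DeliveryAttempt:
    notification_id: str
    channel: str
    provider: str
    attempt_no: int
    status: str
    error: Optional[str] = None


def record_attempt(attempts, notification, channel, provider, status, error=None):
    """Append the next attempt row for this (notification, channel) pair."""
    prior = [a for a in attempts
             if a.notification_id == notification.id and a.channel == channel]
    row = DeliveryAttempt(notification.id, channel, provider,
                          len(prior) + 1, status, error)
    attempts.append(row)
    return row


n = Notification("n_8c1f", "u_991", "order_shipped", "transactional", {"order_id": 4412})
rows = []
record_attempt(rows, n, "push", "fcm", "failed", "timeout")
record_attempt(rows, n, "push", "fcm", "sent")
record_attempt(rows, n, "email", "ses", "sent")
```

One notification row ends up with three attempt rows: push tried twice, email once, and each channel numbers its own attempts.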

                 ┌────────────────────────────────────────────┐
caller ──POST──► │ API / Ingest ──► dedup (idempotency key)   │
                 │         │                                  │
                 │ writes notification row                    │
                 └─────────┼──────────────────────────────────┘
                           ▼
                 ┌──────────────────┐
                 │   FAN-OUT svc    │  expand recipient → devices,
                 │  (prefs + rate   │  drop opt-outs, split per channel
                 │  limit + render) │
                 └─────────┬────────┘
       enqueue one job per (recipient × channel)
      ┌──────────────┬─────┴────────┬──────────────┐
      ▼              ▼              ▼              ▼
 push queue     email queue     sms queue     inapp queue  (priority lanes)
      │              │              │
push workers   email workers   sms workers   ◄── pull, call provider,
      │              │              │            record delivery_attempt
      ▼              ▼              ▼
  APNs/FCM        SES/etc      Twilio/etc
      │              │              │
      └──────────────┴──────────────┴── delivery receipts / webhooks ──► status updater ──► DB

The spine is a set of per-channel queues drained by stateless worker pools. See Message Queues for why this shape is the backbone of async systems. The queue buys us three things at once: it absorbs bursts (the 40× spike just deepens the queue instead of melting a provider), it isolates failures (SMS provider down → SMS queue backs up while push keeps flowing), and it lets workers scale independently per channel.

What does this buy us, and what does it cost? The async queue gives elasticity and fault isolation — but it costs us synchronous certainty. The caller no longer knows at request time whether delivery succeeded; they get a notification_id and must check status or trust the pipeline. We’ve traded “I know it sent” for “it will send.”

Fan-out. The dangerous operation is “notify 10M users.” Don’t expand that inline in the request. Accept the intent, then let the fan-out service expand it in the background, chunking recipients (e.g. 1,000 per job) so a single huge target becomes thousands of small, parallelizable jobs.
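Chunking can be sketched as a generator that never materializes the full recipient list. The job shape and the function names here are hypothetical; only the 1,000-per-job figure comes from the text.

```python
from itertools import islice


def chunked(recipients, size=1000):
    """Yield lists of up to `size` recipients from any iterable."""
    it = iter(recipients)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def fan_out_jobs(notification_id, recipients, size=1000):
    """Turn one huge target into many small jobs (hypothetical job shape)."""
    for n, batch in enumerate(chunked(recipients, size)):
        yield {"notification_id": notification_id, "chunk": n, "user_ids": batch}


jobs = list(fan_out_jobs("n_8c1f", (f"u_{i}" for i in range(2500))))
print(len(jobs), [len(j["user_ids"]) for j in jobs])
```

Because the recipients are consumed lazily, the same code would work when the source is a database cursor over 10M users rather than a range.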

Retries with backoff. A worker that gets a 5xx or timeout re-enqueues with exponential backoff + jitter (1s, 4s, 16s…). After N attempts the job lands in a dead-letter queue for inspection. Jitter matters: without it, all failed jobs retry in lockstep and hammer the recovering provider in a synchronized wave (a thundering herd).
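One way to sketch the retry schedule, assuming "full jitter" (a uniformly random delay between zero and the exponential ceiling). The text does not say which jitter variant is intended, so this choice, the 300-second cap, and the limit of 5 attempts are all assumptions.

```python
import random


def backoff_delay(attempt, base=1.0, factor=4.0, cap=300.0, rng=random):
    """Delay before retry number `attempt` (1-based).

    The ceiling grows 1s, 4s, 16s, ... up to `cap`; the actual delay is
    drawn uniformly below it so failed jobs do not retry in lockstep.
    """
    ceiling = min(cap, base * factor ** (attempt - 1))
    return rng.uniform(0, ceiling)


def next_step(attempt, max_attempts=5, rng=random):
    """Decide what happens after attempt number `attempt` has failed."""
    if attempt >= max_attempts:
        return ("dead_letter", None)
    return ("retry", backoff_delay(attempt, rng=rng))
```

Passing a seeded `random.Random` instance as `rng` makes the schedule reproducible in tests, while production code can use the module-level default.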

Deduplication. Two sources of duplicates: (a) the caller retrying, caught by the idempotency_key, stored with a TTL so a repeat key within the window is a no-op; (b) our own at-least-once queue redelivering a job, caught by a per-delivery dedup key (notification_id:channel) that marks a channel as already sent. This is Idempotency applied at two layers, and it’s what keeps “your card was charged” from arriving five times.
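The second layer, on the worker side, might look like the sketch below. The `DeliveryWorker` class and the `send` callable are hypothetical, and the Python set stands in for a shared store with an atomic set-if-absent operation.

```python
class DeliveryWorker:
    """Worker-side dedup keyed on notification_id:channel (illustrative only)."""

    def __init__(self, send):
        self._send = send        # hypothetical provider call: send(job)
        self._delivered = set()  # stand-in for a shared, durable store

    def handle(self, job):
        key = f"{job['notification_id']}:{job['channel']}"
        if key in self._delivered:
            return "skipped"     # the queue redelivered a job already sent
        self._send(job)          # may raise; the key stays unset so a retry proceeds
        self._delivered.add(key)
        return "sent"
```

This narrows the duplicate window but does not close it: a crash between the provider call and recording the key can still produce one repeat, which is the usual price of at-least-once delivery.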

Rate limiting. Cap per-user sends (e.g. ≤ 5 marketing/day) with a token bucket keyed by user; respect quiet hours by deferring (not dropping) non-urgent messages. Marketing and transactional ride separate priority lanes so a promo blast can never delay a password reset.
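A rough sketch of both pieces: a per-user token bucket and a quiet-hours check that defers instead of dropping. The 22:00 to 08:00 window, the class and function names, and the use of naive datetimes (taken to be the user's local time) are all assumptions.

```python
from datetime import datetime, time, timedelta


class TokenBucket:
    """One bucket per user; the caller supplies the current time."""

    def __init__(self, capacity, refill_per_sec, now):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = now

    def allow(self, now):
        elapsed = (now - self.updated).total_seconds()
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def deliver_at(now, quiet_start=time(22, 0), quiet_end=time(8, 0)):
    """Return when a non-urgent message may go out: now, or when quiet hours end."""
    t = now.time()
    if quiet_start <= quiet_end:      # window inside a single day
        in_quiet = quiet_start <= t < quiet_end
    else:                             # window wraps past midnight
        in_quiet = t >= quiet_start or t < quiet_end
    if not in_quiet:
        return now
    end = datetime.combine(now.date(), quiet_end)
    if end <= now:
        end += timedelta(days=1)
    return end
```

For the "≤ 5 marketing/day" example, a bucket with `capacity=5` and `refill_per_sec=5 / 86400` allows a burst of five and then roughly one more every 4.8 hours. Transactional messages would bypass both checks.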

Status & receipts. Providers report final delivery asynchronously via webhooks. A status updater consumes these and reconciles delivery_attempt rows, giving you the audit trail.
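A minimal sketch of the reconciliation step, assuming each attempt row can be looked up by a provider message id. That id, the event names, and the status values are hypothetical; real providers each define their own webhook payloads.

```python
# Hypothetical mapping from provider event names to delivery_attempt statuses.
EVENT_TO_STATUS = {"delivered": "delivered", "bounce": "bounced", "dropped": "failed"}
TERMINAL = {"delivered", "bounced", "failed"}


def apply_receipt(attempts, receipt):
    """Update the matching attempt row; ignore unknown or repeated receipts.

    `attempts` maps an assumed provider_message_id to a row dict.
    Returns True only when a row was changed.
    """
    row = attempts.get(receipt["provider_message_id"])
    status = EVENT_TO_STATUS.get(receipt["event"])
    if row is None or status is None:
        return False
    if row["status"] in TERMINAL:
        return False   # webhooks can be redelivered or arrive out of order
    row["status"] = status
    return True


attempts = {"m_1": {"status": "sent"}}
first = apply_receipt(attempts, {"provider_message_id": "m_1", "event": "delivered"})
repeat = apply_receipt(attempts, {"provider_message_id": "m_1", "event": "bounce"})
```

Treating the first terminal status as final keeps the updater idempotent, which matters because webhook senders typically retry on their own.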

Decision                  Buys you                   Costs you
─────────────────────────────────────────────────────────────────────────────
Async queue + 202         burst absorption,          no synchronous
                          failure isolation          success confirmation
At-least-once delivery    no lost transactionals     must dedup everywhere
Per-channel queues        independent scaling        more moving parts
Priority lanes            transactional never        capacity planning per lane
                          starved by bulk
Fan-out in background     huge targets don't         eventual, not instant,
                          time out the request       delivery for big blasts

The throughline: a notification system is mostly a reliability and fan-out problem wearing a “call the email API” costume. Get the queue, the retries, and the two layers of dedup right, and the rest is templating and config.

  1. Why does the ingest API return 202 Accepted instead of waiting for delivery, and what does the caller give up by accepting that?
  2. Name the two independent sources of duplicate notifications and which dedup mechanism stops each.
  3. Why are per-channel queues better than one shared queue when a single SMS provider goes down?
  4. What problem does jitter in retry backoff solve, and what goes wrong without it?
  5. Why expand a 10M-user fan-out in a background service rather than inside the original request?