Design WhatsApp — The Senior+ Walkthrough

Jun 23, 2026

This is the question that defines the real-time messaging archetype. Ten questions in the question bank are variants of it, including Slack, Discord, Zoom, Twitter Spaces, iMessage, Facebook Messenger, and Google Chat. They share a spine: a long-lived connection problem, a delivery semantics problem, and a presence problem. Get fluent with all three and you can answer any messaging interview you’re handed.

The surface question looks simple: two people want to send each other messages. The probe underneath is anything but. Interviewers use this question because it exposes how candidates think about distributed systems under failure — specifically what happens when the recipient is offline, when the network drops mid-message, when the same message arrives twice, and when a user has five devices. Those four failure modes, and your response to them, are where the round is decided.

For L4 / mid-level: the basic chat flow and message storage. For L5 / senior: delivery semantics under unreliable networks (at-most-once, at-least-once, exactly-once), presence at scale, and offline message handling. For L6 / staff: the multi-device consistency model, group message fan-out at WhatsApp’s scale (2 billion users), end-to-end encryption architecture, and what happens when a message is delivered to 256 group members across 12 time zones while half of them are offline.

The Question

“Design WhatsApp. Users should be able to send messages to each other in real time. Messages should be delivered even when the recipient is temporarily offline.”

Common variants:

“Design Facebook Messenger.”
“Design iMessage / Apple Messages.”
“Design Slack’s messaging core.”
“Design a real-time chat system.”
“Design the messaging layer for [our product].”

All five are the same architecture. Slack adds channels and threads; the delivery model is identical. iMessage adds end-to-end encryption; the transport is identical. Discord, another variant from the question bank, adds voice and server-based channels; the one-to-one messaging core is identical.

Step 1 — Clarify Before You Draw

Six questions. Two of them are load-bearing for the entire design.

1. One-to-one only, or also group chats? One-to-one is the simpler problem. Group chats add fan-out, sending one message to N recipients, which at WhatsApp’s scale means up to 256 recipients per group (the long-standing cap used throughout this walkthrough; WhatsApp has since raised it to 1,024). State which you’re designing for. For L5+: design for both, but start with one-to-one and explicitly extend.

2. What are the delivery guarantees? This is the load-bearing question. Three options:

At-most-once: messages might be lost but never duplicated. Fine for ephemeral signals (typing indicators). Terrible for chat.
At-least-once: messages are definitely delivered but might be delivered more than once. The receiver handles deduplication. Acceptable for most chat systems.
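Exactly-once: each message is delivered exactly once, regardless of retries. Theoretically ideal, but you cannot guarantee it over an unreliable network: a sender that times out cannot tell a lost message from a lost ack, so it must either retry (risking a duplicate) or give up (risking a loss). In practice, you approximate it with idempotency keys and deduplication at the receiver, which gives exactly-once processing on top of at-least-once delivery. Saying “exactly-once delivery is impossible over an unreliable network” out loud is a senior signal.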

WhatsApp and most production systems implement at-least-once delivery with client-side deduplication. State this explicitly.

3. What devices does each user have? Multiple devices per user (phone, tablet, web, desktop) massively complicates the model. A message sent to a user needs to land on all their active devices. On which device does the “read receipt” trigger? Clarify this early.

4. Offline handling: how long do you store messages for an offline recipient? If a recipient is offline for 30 days, do their messages accumulate on your servers? A common choice, and WhatsApp’s, is to keep undelivered messages for 30 days and drop them after that; whether the sender is notified is a product decision. Clarify the retention window.

5. End-to-end encryption? E2EE (like WhatsApp’s Signal Protocol implementation) means the server cannot read message content — it only stores encrypted ciphertext and routing metadata. This changes the data model significantly. State whether you’re designing with E2EE or not. For L5: acknowledge it exists and where it fits. For L6+: explain the key exchange flow.

6. What scale? 2 billion monthly active users is WhatsApp’s actual scale. For the interview, pick working numbers: 500M DAU and 100 billion messages per day (a daily volume WhatsApp has reported publicly). These numbers drive your storage and throughput estimates.
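The at-least-once-plus-dedup contract from question 2 can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not WhatsApp’s implementation: the channel drops requests and acks independently at a fixed `loss_rate`, and `Receiver`, `send_at_least_once`, and `max_attempts` are hypothetical names. The key detail is that the sender generates `message_id` once and reuses it on every retry, so the receiver can ack a duplicate without storing it twice.

```python
import random
import uuid
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical recipient client that deduplicates on the sender's message_id."""
    seen_ids: set = field(default_factory=set)
    inbox: list = field(default_factory=list)

    def on_message(self, message_id: str, body: str) -> str:
        # Store only the first copy, but ack every copy: a duplicate usually
        # means our earlier ack was lost and the sender is retrying.
        if message_id not in self.seen_ids:
            self.seen_ids.add(message_id)
            self.inbox.append(body)
        return message_id  # the ack carries the ID being acknowledged


def send_at_least_once(receiver: Receiver, body: str, rng: random.Random,
                       loss_rate: float = 0.3, max_attempts: int = 50) -> int:
    """Retry until an ack arrives; return the number of attempts it took.

    Assumption: requests and acks are each lost independently with probability loss_rate.
    """
    message_id = str(uuid.uuid4())  # generated once, reused on every retry
    for attempt in range(1, max_attempts + 1):
        if rng.random() < loss_rate:
            continue  # request lost: the receiver never saw this attempt
        ack = receiver.on_message(message_id, body)
        if rng.random() < loss_rate:
            continue  # ack lost: the receiver has the message, but we retry anyway
        if ack == message_id:
            return attempt
    raise TimeoutError(f"no ack for {message_id} after {max_attempts} attempts")
```

With a 30% loss rate in each direction, an attempt completes with probability 0.7 × 0.7 = 0.49, so each message takes about two attempts on average. The receiver still ends up with every message in its inbox exactly once, because the retried copies are absorbed by `seen_ids`.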

Step 2 — Estimate

Working assumptions:

500M daily active users
100 billion messages per day = ~1.16 million messages per second average, ~5M/sec peak
Average message size: 100 bytes (text only), 500 bytes with metadata
30-day message retention for offline recipients
1 billion messages stored at any moment (roughly 1% of daily volume is “pending delivery”)
Media (images, video, voice): handle separately via object storage (S3 / CDN pre-signed URLs); only metadata and a pointer live in the chat system

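Storage for messages: 100B messages/day × 500 bytes × 30 days retention ≈ 1.5 PB. Treat that as the ceiling: it assumes every message stays on the server for the full window. It is your “hot” storage budget for messages waiting to be delivered or recently delivered; after 30 days, messages drop off the hot store. In practice the server holds far less, because delivered messages can be stored client-side and expunged from the server once delivery is confirmed (this is what WhatsApp does). With ~1B messages pending at any moment, the steady-state pending store is closer to 1B × 500 bytes ≈ 500 GB.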

Connection state: 500M users × ~30% concurrently connected = ~150M concurrent WebSocket connections. At 64KB per connection socket buffer that’s roughly 9.6 TB of connection memory across your fleet. This is one of the hardest scaling problems in messaging — the number of open connections.
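The Step 2 arithmetic is easy to fumble under interview pressure, so here it is as a few lines of Python. The function and parameter names are illustrative; the defaults are the working assumptions listed above, with 64 KB taken as 64,000 bytes. Change any default to see how the totals move.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau: float = 500e6,
             messages_per_day: float = 100e9,
             bytes_per_message: int = 500,        # text plus metadata
             retention_days: int = 30,
             pending_fraction: float = 0.01,      # share of a day's volume awaiting delivery
             concurrent_fraction: float = 0.30,   # share of DAU connected at once
             socket_buffer_bytes: float = 64e3) -> dict:
    """Back-of-envelope numbers; defaults are the Step 2 working assumptions."""
    pending_messages = pending_fraction * messages_per_day
    connections = dau * concurrent_fraction
    return {
        "avg_msgs_per_sec": messages_per_day / SECONDS_PER_DAY,
        "retain_all_bytes": messages_per_day * bytes_per_message * retention_days,
        "pending_store_bytes": pending_messages * bytes_per_message,
        "concurrent_connections": connections,
        "connection_memory_bytes": connections * socket_buffer_bytes,
    }


e = estimate()
print(f"average throughput : {e['avg_msgs_per_sec'] / 1e6:.2f}M msg/s")      # 1.16M
print(f"30-day retain-all  : {e['retain_all_bytes'] / 1e15:.1f} PB")         # 1.5 PB
print(f"pending store      : {e['pending_store_bytes'] / 1e9:.0f} GB")       # 500 GB
print(f"open connections   : {e['concurrent_connections'] / 1e6:.0f}M")      # 150M
print(f"connection memory  : {e['connection_memory_bytes'] / 1e12:.1f} TB")  # 9.6 TB
```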

Write throughput: 1.16M messages/sec average into a message store. This rules out single-node anything. You need a distributed message queue and a sharded persistence layer.
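The walkthrough leaves the sharding scheme open. One common option, assumed here rather than taken from WhatsApp, is to shard the message store by recipient user ID using a consistent-hash ring, so that adding a shard remaps only about 1/N of users instead of nearly all of them (as a plain `hash(user) % N` would). `HashRing`, the shard names, and the 100 virtual nodes per shard are illustrative choices.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent-hash ring mapping a recipient's user ID to a storage shard."""

    def __init__(self, shards: list, vnodes: int = 100) -> None:
        points = []
        for shard in shards:
            for v in range(vnodes):  # many virtual nodes per shard even out the load
                points.append((self._hash(f"{shard}#{v}"), shard))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._shards = [s for _, s in points]

    @staticmethod
    def _hash(key: str) -> int:
        # First 8 bytes of SHA-256 as an unsigned 64-bit ring position.
        return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

    def shard_for(self, user_id: str) -> str:
        # Walk clockwise to the first virtual node at or after the key's position,
        # wrapping around to the start of the ring.
        i = bisect.bisect_left(self._hashes, self._hash(user_id)) % len(self._hashes)
        return self._shards[i]
```

Going from 8 to 9 shards, roughly 1/9 of users change shard, and every one of them moves to the new shard; nobody is shuffled between existing shards. Sharding by recipient also keeps each user’s pending messages on one shard, so draining them on connect is a single-shard read.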

Step 3 — API Design

Four endpoints. The real-time delivery is not REST — it’s a persistent connection.

WebSocket: wss://chat.whatsapp.com/connect

Auth: Bearer token in Upgrade request headers

Purpose: The long-lived connection for receiving messages and sending keep-alives. All message delivery happens here.

On connect: server sends any pending (undelivered) messages accumulated while the client was offline.
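The “on connect” behaviour can be sketched as a per-recipient pending queue. This is a hypothetical in-memory stand-in for the sharded store (`PendingStore`, `Pending`, and `drain_on_connect` are illustrative names), and it assumes the 30-day retention window from Step 1 and delete-on-delivery from Step 2: a message leaves the queue when it ages out or when the client acks it, never merely because it was written to the socket.

```python
from collections import OrderedDict, defaultdict
from dataclasses import dataclass

RETENTION_MS = 30 * 24 * 60 * 60 * 1000  # assumed 30-day retention window


@dataclass(frozen=True)
class Pending:
    message_id: str
    server_timestamp: int  # epoch ms, assigned when the server accepted the message
    ciphertext: bytes


class PendingStore:
    """Hypothetical per-recipient queue of undelivered messages, in server-accept order."""

    def __init__(self) -> None:
        self._queues = defaultdict(OrderedDict)  # user_id -> {message_id: Pending}

    def enqueue(self, user_id: str, msg: Pending) -> None:
        # setdefault ignores a second enqueue of the same message_id (e.g. a retry).
        self._queues[user_id].setdefault(msg.message_id, msg)

    def drain_on_connect(self, user_id: str, now_ms: int) -> list:
        """Expire messages past retention, then return the rest, oldest first.

        Unexpired messages are not removed here: if the socket drops mid-drain,
        the next connect resends them and the client dedups by message_id.
        """
        queue = self._queues[user_id]
        expired = [mid for mid, m in queue.items()
                   if now_ms - m.server_timestamp > RETENTION_MS]
        for mid in expired:
            del queue[mid]
        return list(queue.values())

    def ack(self, user_id: str, message_id: str) -> None:
        # Delete-on-delivery: once the client confirms, the server forgets the message.
        self._queues[user_id].pop(message_id, None)
```

Because removal happens only on ack or expiry, a flaky connection costs duplicate sends (absorbed by client-side dedup) rather than lost messages.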

POST /v1/messages/send

Idempotency-Key: <client-generated UUID> ← THE PROBE

Body: {
  to_user_id: string,
  message_id: string,          (client-generated, for dedup)
  content_encrypted: bytes,    (ciphertext if E2EE)
  content_type: "text" | "image" | "audio" | "video",
  media_url: string,           (pre-uploaded, optional)
  client_timestamp: epoch_ms
}

Response: {
  message_id: string,
  server_timestamp: epoch_ms,
  status: "queued" | "delivered"
}

POST /v1/messages/ack

Body: { message_id: string, ack_type: "delivered" | "read" }

Purpose: Explicit delivery and read acknowledgment from the client.

The server relays both delivery and read receipts back to the sender.
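The ack endpoint, combined with the multi-device question from Step 1, needs an aggregation rule. The sketch below assumes one policy among several reasonable ones: a message counts as delivered once any of the recipient’s devices acks it, and as read once any device reports it read. Acks can arrive out of order, so status only moves forward, and the server relays a receipt only when the aggregate status actually changes. `Receipt` and `ReceiptTracker` are hypothetical names.

```python
from collections import defaultdict
from enum import IntEnum
from typing import Optional


class Receipt(IntEnum):
    SENT = 0        # accepted by the server
    DELIVERED = 1   # reached at least one recipient device (assumed policy)
    READ = 2        # opened on at least one recipient device (assumed policy)


class ReceiptTracker:
    """Hypothetical fold of per-device acks into the single status the sender sees."""

    def __init__(self) -> None:
        self._acks = defaultdict(dict)  # message_id -> {device_id: Receipt}

    def status(self, message_id: str) -> Receipt:
        return max(self._acks.get(message_id, {}).values(), default=Receipt.SENT)

    def on_ack(self, message_id: str, device_id: str, ack: Receipt) -> Optional[Receipt]:
        """Record an ack; return the new status if the sender should be notified."""
        before = self.status(message_id)
        devices = self._acks[message_id]
        # max() keeps each device monotonic: a late "delivered" cannot undo "read".
        devices[device_id] = max(devices.get(device_id, Receipt.SENT), ack)
        after = self.status(message_id)
        return after if after > before else None
```

Other policies are defensible, for example reporting “read” only when the primary phone reads. The point in the interview is to pick one and state it.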

GET /v1/users/:user_id/presence

Response: { status: "online" | "offline", last_seen_at: timestamp | null }

Note: WhatsApp makes presence opt-out, so users can hide their last seen.

This is a privacy setting, not a system constraint.
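Presence can be derived from the keep-alives already flowing over the WebSocket rather than from a separate signal. A minimal sketch, assuming a 30-second heartbeat timeout and an in-memory map (at real scale this state would live in a sharded store with TTLs); the privacy setting is applied at read time. `PresenceService` and `HEARTBEAT_TIMEOUT_MS` are hypothetical.

```python
from typing import Optional

HEARTBEAT_TIMEOUT_MS = 30_000  # assumed: ~3 missed keep-alives at a 10 s interval


class PresenceService:
    """Hypothetical presence store: online/offline from the last keep-alive seen."""

    def __init__(self) -> None:
        self._last_heartbeat: dict = {}     # user_id -> epoch ms of last keep-alive
        self._hides_last_seen: set = set()  # users who opted out of showing last seen

    def heartbeat(self, user_id: str, now_ms: int) -> None:
        self._last_heartbeat[user_id] = now_ms

    def set_hide_last_seen(self, user_id: str, hide: bool) -> None:
        if hide:
            self._hides_last_seen.add(user_id)
        else:
            self._hides_last_seen.discard(user_id)

    def get(self, user_id: str, now_ms: int) -> dict:
        last: Optional[int] = self._last_heartbeat.get(user_id)
        if last is not None and now_ms - last <= HEARTBEAT_TIMEOUT_MS:
            return {"status": "online", "last_seen_at": None}
        # Offline: expose last_seen_at only if it is known and the user has not hidden it.
        visible = last is not None and user_id not in self._hides_last_seen
        return {"status": "offline", "last_seen_at": last if visible else None}
```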

The senior move on the API step: Name the Idempotency-Key and message_id in the same breath. The message_id is client-generated before the send — if the client retries because it didn’t receive an ack, the server detects the duplicate via message_id and does not re-queue. The Idempotency-Key is the HTTP-level dedup. Both are necessary: the HTTP key handles transport-level retries, the message_id handles application-level deduplication across devices and reconnections.

Saying that distinction out loud — two levels of deduplication for two different failure modes — is the L6 signal on this question.
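The two dedup layers can be shown side by side in one handler. This is a hypothetical in-memory sketch: `MessageService`, the `sender_id` taken from the auth token, and the fake clock are assumptions, and a production version would persist both maps with a TTL and check that a replayed Idempotency-Key arrives with the same request body. Both keys are scoped per sender so one user’s IDs can never collide with, or suppress, another user’s messages.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SendResponse:
    message_id: str
    server_timestamp: int
    status: str


@dataclass
class MessageService:
    """Hypothetical send handler with transport-level and application-level dedup."""
    by_idempotency_key: dict = field(default_factory=dict)  # (sender, key) -> response
    by_message_id: dict = field(default_factory=dict)       # (sender, message_id) -> response
    delivery_queue: list = field(default_factory=list)      # stand-in for the real queue
    clock_ms: int = 0                                       # fake server clock

    def send(self, sender_id: str, idempotency_key: str, message_id: str,
             to_user_id: str, ciphertext: bytes) -> SendResponse:
        # Layer 1 (transport): the same HTTP request retried after a timeout.
        # Replay the stored response without touching application state.
        cached = self.by_idempotency_key.get((sender_id, idempotency_key))
        if cached is not None:
            return cached
        # Layer 2 (application): the same message resent in a new request,
        # e.g. after a reconnect, so it carries a fresh Idempotency-Key.
        resp = self.by_message_id.get((sender_id, message_id))
        if resp is None:
            self.clock_ms += 1
            resp = SendResponse(message_id, self.clock_ms, "queued")
            self.by_message_id[(sender_id, message_id)] = resp
            self.delivery_queue.append((to_user_id, message_id, ciphertext))
        self.by_idempotency_key[(sender_id, idempotency_key)] = resp
        return resp
```

A retried request (same key) and a resend after reconnect (new key, same `message_id`) both return the original response, and the message is queued for delivery only once.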


System Design Interview Roadmap