Design Netflix Streaming — The Senior+ Walkthrough

Jun 30, 2026


This is the question that defines the

The surface question sounds manageable. Netflix streams video to 270 million subscribers. How hard can it be to store a file and serve it? The probe underneath is what makes this question a senior filter. The interviewer is not testing whether you know what a CDN is. They are testing whether you understand why naive HTTP file serving collapses at scale, what adaptive bitrate encoding is and why it exists, how a CDN edge network makes cold-start latency irrelevant, and what happens when the most-watched show in the world premieres at 9 PM and 50 million people try to start the same episode simultaneously.

Those four problems — not the basic architecture — are where the round is decided.

For L4 / mid-level: the content storage and delivery pipeline at a high level. For L5 / senior: adaptive bitrate encoding with multiple renditions, CDN edge caching with pre-positioning, and the cold-start problem. For L6 / staff: the encoding pipeline at scale, multi-region active-active CDN strategy, the playback session architecture, quality-of-experience monitoring, and what Netflix actually does differently from a naive CDN deployment.

The Question

“Design Netflix. Users should be able to browse a catalog of movies and TV shows and stream them on demand with high quality and low buffering.”

Common variants:

- “Design YouTube’s video streaming system.”
- “Design Spotify’s audio streaming.”
- “Design Twitch’s live streaming platform.”
- “Design an adaptive bitrate streaming system.”
- “Design the video encoding pipeline for a platform like Netflix.”

The first two are nearly identical to Netflix. Twitch adds a real-time constraint (live, not on-demand) that changes the delivery model but not the encoding fundamentals. Spotify replaces video with audio — smaller files, same adaptive delivery architecture. The encoding pipeline variant focuses specifically on Phase 2 below.

Step 1 — Clarify Before You Draw

Six questions. The first three are load-bearing.

1. On-demand only, or also live streaming? On-demand (Netflix, Spotify) and live (Twitch, live sports) have meaningfully different architectures. On-demand content can be pre-encoded, pre-cached at CDN edges, and optimized offline. Live content has a latency requirement (viewers want near-real-time) that forces a different delivery model. Clarify and scope to on-demand unless told otherwise.

2. What devices must be supported? Smart TVs, mobile (iOS/Android), web browsers, game consoles. This matters because different devices support different codecs (H.264, H.265/HEVC, AV1, VP9) and different DRM systems (Widevine, FairPlay, PlayReady). Acknowledge the codec diversity; don’t try to design the full DRM system in the interview — it’s a separate subsystem.

3. What are the quality and latency SLOs? Netflix targets: playback starts within 2 seconds of pressing play. Buffering events under 0.5% of total play time. 4K HDR delivery available on supported devices. These numbers drive every architecture decision — state them explicitly before drawing anything.

4. What is the catalog size? Netflix has approximately 36,000 titles globally (varies by region). Each title is stored in multiple renditions and formats. State the working assumption: 36,000 titles × 20 renditions × ~4 GB per rendition = ~3 PB of encoded video. This estimate calibrates the storage and CDN design.

5. What scale? 270 million subscribers. Peak concurrent streams: approximately 15–20% of active users simultaneously — roughly 40–50 million concurrent streams. At peak (new season premiere, weekend evening), this spikes significantly. State the peak assumption explicitly.

6. Does the design include recommendations, search, and the catalog UI? These are separate systems (ML recommendation engine, search index, catalog metadata API). Acknowledge they exist; scope the interview to the streaming pipeline — content ingestion, encoding, storage, and delivery. Otherwise you’ll spend 30 minutes on recommendation and never get to the hard part.
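The codec point in question 2 can be made concrete with a small negotiation sketch. It assumes a preference order of AV1, then H265, then H264 (newer codecs generally compress better at the same quality); `pick_codec` and `CODEC_PREFERENCE` are hypothetical names for illustration, not part of any Netflix API.

```python
# Assumed preference order: most bandwidth-efficient codec first.
CODEC_PREFERENCE = ["AV1", "H265", "H264"]


def pick_codec(supported_codecs, encoded_codecs):
    """Return the most preferred codec that the device can decode
    and that the title has actually been encoded in, or None."""
    usable = set(supported_codecs) & set(encoded_codecs)
    for codec in CODEC_PREFERENCE:
        if codec in usable:
            return codec
    return None


if __name__ == "__main__":
    # A device without AV1 support falls back to H265.
    print(pick_codec(["H264", "H265"], ["H264", "H265", "AV1"]))
```

In an interview, the point of a sketch like this is only to show that codec choice is a per-session decision driven by what the device reports.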

Step 2 — Estimate

Working assumptions:

- 36,000 titles × 20 renditions per title = 720,000 encoded files
- Average rendition size: 4 GB (HD, 2-hour movie)
- Total encoded storage: 720,000 × 4 GB ≈ 3 PB of video content
- Metadata storage (titles, posters, descriptions, subtitles): ~5 TB, trivial by comparison
- Peak concurrent streams: 50 million
- Average bitrate served: 5 Mbps (mix of HD and 4K streams)
- Peak egress bandwidth: 50M streams × 5 Mbps = 250 Tbps

That 250 Tbps peak egress number is the most important figure in the whole estimate. Netflix is one of the largest sources of internet traffic on the planet — at peak, it accounts for roughly 15% of global downstream internet bandwidth. No single origin infrastructure serves this. CDN edge servers distributed globally are the only viable answer. State this conclusion explicitly from the estimate.
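The arithmetic behind these figures is short enough to check in a few lines. The sketch below uses only the working assumptions listed above and decimal units (1 PB = 1,000,000 GB, 1 Tbps = 1,000,000 Mbps); the variable names are illustrative.

```python
# Back-of-envelope estimate from the section's working assumptions.
TITLES = 36_000
RENDITIONS_PER_TITLE = 20
AVG_RENDITION_GB = 4                   # HD, 2-hour movie
PEAK_CONCURRENT_STREAMS = 50_000_000
AVG_BITRATE_MBPS = 5                   # mix of HD and 4K streams

encoded_files = TITLES * RENDITIONS_PER_TITLE
storage_pb = encoded_files * AVG_RENDITION_GB / 1_000_000
peak_egress_tbps = PEAK_CONCURRENT_STREAMS * AVG_BITRATE_MBPS / 1_000_000

print(f"encoded files: {encoded_files:,}")
print(f"encoded storage: {storage_pb:.2f} PB")
print(f"peak egress: {peak_egress_tbps:.0f} Tbps")
```

The storage figure comes out at 2.88 PB, which the estimate rounds to 3 PB.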

Content ingestion rate: Netflix adds roughly 500–1,000 hours of new content per month. At 20 renditions per hour of content, that's ~10,000–20,000 rendition-hours of encoding work per month. This is a background batch process, not a latency-sensitive path.


Step 3 — API Design

Four APIs. The streaming API is the non-obvious one.

GET /v1/catalog
  Query: page_token, limit, genre?, region
  Response: { titles: [...], next_page_token }
  Note: region-filtered; different catalogs per country (licensing)

GET /v1/titles/:title_id
  Response: {
    title_id, name, description, genres, cast,
    available_resolutions: ["4K", "1080p", "720p", "480p"],
    subtitles: [{ language, url }],
    poster_url, trailer_url
  }

POST /v1/playback/start
  Body: {
    title_id: string,
    content_id: string,            (specific episode or movie version)
    device_type: string,
    supported_codecs: ["H264", "H265", "AV1"],
    network_speed_mbps: float,     ← client measures and reports
    drm_system: "widevine" | "fairplay" | "playready",
    resume_position_seconds: integer
  }
  Response: {
    session_id: string,
    manifest_url: string,          ← THE KEY RESPONSE FIELD
    license_token: string,         (for DRM decryption)
    cdn_edge_url: string,
    initial_quality: "1080p",
    heartbeat_interval_seconds: 30
  }

POST /v1/playback/heartbeat
  Body: {
    session_id: string,
    position_seconds: integer,
    current_quality: string,
    buffer_health_seconds: float,
    rebuffer_events: integer,
    bandwidth_estimate_mbps: float
  }
  Response: { continue: true, quality_recommendation: "720p" }

The senior move here: The manifest_url in the playback start response is the key design signal. You are not returning a video file URL — you are returning a manifest file URL. This distinction is the entire adaptive bitrate streaming architecture. Explain it before moving on.
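To make the manifest idea concrete, here is a minimal sketch that builds an HLS-style master playlist: one `#EXT-X-STREAM-INF` entry per rendition, each pointing at that rendition's own playlist. The bitrates, resolutions, and URIs are made-up illustrative values, and `build_master_playlist` is a hypothetical helper, not Netflix's manifest generator.

```python
def build_master_playlist(renditions):
    """Build an HLS-style master playlist listing every rendition.

    Each rendition is a dict with bandwidth_bps, width, height, and uri.
    The player reads this file and decides which rendition to fetch.
    """
    lines = ["#EXTM3U"]
    ordered = sorted(renditions, key=lambda r: r["bandwidth_bps"], reverse=True)
    for r in ordered:
        lines.append(
            f'#EXT-X-STREAM-INF:BANDWIDTH={r["bandwidth_bps"]},'
            f'RESOLUTION={r["width"]}x{r["height"]}'
        )
        lines.append(r["uri"])
    return "\n".join(lines) + "\n"


# Illustrative renditions only; real ladders are tuned per title.
EXAMPLE_RENDITIONS = [
    {"bandwidth_bps": 3_000_000, "width": 1280, "height": 720, "uri": "720p/index.m3u8"},
    {"bandwidth_bps": 5_000_000, "width": 1920, "height": 1080, "uri": "1080p/index.m3u8"},
    {"bandwidth_bps": 1_500_000, "width": 854, "height": 480, "uri": "480p/index.m3u8"},
]

if __name__ == "__main__":
    print(build_master_playlist(EXAMPLE_RENDITIONS))
```

The server never picks a single file for the client. It hands over the menu, and the player switches between entries as network conditions change.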

The heartbeat endpoint is the second senior signal. Netflix does not stream and forget. Every 30 seconds, the player reports back: current position, current quality, buffer health, rebuffer events, estimated bandwidth. This telemetry is the basis for quality-of-experience monitoring, A/B testing new streaming algorithms, and detecting degraded CDN nodes before users notice. Including the heartbeat shows that you've thought about the operational layer, not just the happy path.
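One way the `quality_recommendation` field could be derived from a heartbeat is sketched below. This is a deliberately simple heuristic under stated assumptions: the bitrate ladder, the 0.8 safety factor, and the 5-second low-buffer threshold are all illustrative values, and production adaptive-bitrate algorithms are considerably more sophisticated.

```python
# Hypothetical bitrate ladder in kbps; real ladders vary per title and codec.
LADDER_KBPS = {"4K": 16000, "1080p": 5000, "720p": 3000, "480p": 1500, "360p": 700}


def recommend_quality(bandwidth_estimate_mbps, buffer_health_seconds,
                      safety_factor=0.8, low_buffer_seconds=5.0):
    """Pick the highest rendition that fits within a bandwidth budget.

    The budget is the reported bandwidth scaled by a safety factor,
    and halved again when the playback buffer is close to empty.
    """
    budget_kbps = bandwidth_estimate_mbps * 1000 * safety_factor
    if buffer_health_seconds < low_buffer_seconds:
        budget_kbps *= 0.5
    affordable = [(kbps, name) for name, kbps in LADDER_KBPS.items()
                  if kbps <= budget_kbps]
    if not affordable:
        # Nothing fits: fall back to the lowest rendition rather than stall.
        return min(LADDER_KBPS, key=LADDER_KBPS.get)
    return max(affordable)[1]


if __name__ == "__main__":
    print(recommend_quality(8.0, 20.0))  # healthy buffer
    print(recommend_quality(8.0, 2.0))   # same bandwidth, buffer nearly empty
```

With the same reported bandwidth, a nearly empty buffer yields a lower recommendation, which is the behavior the heartbeat fields exist to enable.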

Step 4 — Data Model

Five stores. Video content itself is not in any of these — it lives in object storage.

titles table — catalog metadata

title_id, title, description, genres[], cast[], director,
release_year, rating, available_regions[], duration_seconds,
thumbnail_url, created_at, updated_at

Sharded by title_id. Regional availability (available_regions[]) drives catalog filtering — a subscriber in Germany sees a different catalog than one in the US because Netflix licenses content per territory.
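The region filter and the `page_token` from the catalog API can be sketched together. This is an in-memory illustration only: `list_catalog` is a hypothetical function, and the page token is assumed to be the last `title_id` returned, which is one common keyset-pagination choice rather than anything confirmed about Netflix.

```python
def list_catalog(titles, region, page_token=None, limit=2):
    """Return one page of titles licensed in `region`, ordered by title_id.

    page_token is assumed to be the last title_id of the previous page.
    """
    visible = sorted(
        (t for t in titles if region in t["available_regions"]),
        key=lambda t: t["title_id"],
    )
    start = 0
    if page_token is not None:
        # Resume at the first title after the one named by the token.
        start = next(
            (i for i, t in enumerate(visible) if t["title_id"] > page_token),
            len(visible),
        )
    page = visible[start:start + limit]
    has_more = start + limit < len(visible)
    return {
        "titles": page,
        "next_page_token": page[-1]["title_id"] if has_more else None,
    }


# Illustrative rows; only the fields needed for filtering are shown.
EXAMPLE_TITLES = [
    {"title_id": "t1", "available_regions": ["US", "DE"]},
    {"title_id": "t2", "available_regions": ["US"]},
    {"title_id": "t3", "available_regions": ["DE"]},
    {"title_id": "t4", "available_regions": ["US", "DE"]},
]

if __name__ == "__main__":
    first = list_catalog(EXAMPLE_TITLES, "DE")
    print([t["title_id"] for t in first["titles"]], first["next_page_token"])
```

A German subscriber never sees `t2` in this example, and a US subscriber never sees `t3`, even though both query the same table.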

content_assets table — encoded renditions index

content_id, title_id, rendition_type (4K/1080p/720p/480p/360p),
codec (H264/H265/AV1), bitrate_kbps, file_size_bytes,
storage_uri (S3 path), manifest_uri, duration_seconds,
encoding_status (pending/processing/ready), created_at

The manifest_uri points to the HLS or DASH manifest file in S3 — the index of all segments for this rendition. When the player starts playback, it downloads the manifest first, then requests individual segments.
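Because the manifest indexes segments, resuming playback reduces to finding which segment contains the resume position. The sketch below assumes fixed-length 4-second segments, which is an illustrative value; `segment_for_position` is a hypothetical helper.

```python
import math

SEGMENT_SECONDS = 4  # assumed fixed segment length, for illustration only


def segment_for_position(position_seconds, duration_seconds,
                         segment_seconds=SEGMENT_SECONDS):
    """Map a playback position to (segment index, offset within segment, total segments)."""
    if not 0 <= position_seconds < duration_seconds:
        raise ValueError("position outside content duration")
    index = position_seconds // segment_seconds
    offset = position_seconds - index * segment_seconds
    total = math.ceil(duration_seconds / segment_seconds)
    return index, offset, total


if __name__ == "__main__":
    # Resume a 2-hour movie at 1:00:05.
    print(segment_for_position(3605, 7200))
```

The player only needs to request that one segment (and the ones after it), not the whole file, which is why seeking and resuming are cheap in a segmented design.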

playback_sessions table — active and recent sessions

session_id, user_id, content_id, device_type, started_at,
last_heartbeat_at, current_position_seconds, current_quality,
total_play_time_seconds, rebuffer_count, total_rebuffer_seconds

Not stored in your OLTP database. High-frequency writes (heartbeat every 30 seconds, 50M concurrent sessions = ~1.7M writes/sec). Use Cassandra or a time-series store. Partition by user_id so “resume where I left off” is a single-partition read. Short TTL — you only need the last 30 days of session data for analytics; older data goes to cold storage.
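The write-rate figure and the partition-by-user access pattern can both be sketched in a few lines. `SessionStore` here is an in-memory stand-in for a store partitioned by `user_id`, written purely for illustration; it does not use or imitate any real Cassandra client API, and TTL expiry is left out.

```python
from collections import defaultdict


def heartbeat_writes_per_second(concurrent_sessions, interval_seconds):
    """Steady-state write rate if every session reports once per interval."""
    return concurrent_sessions / interval_seconds


class SessionStore:
    """In-memory stand-in for a session store partitioned by user_id."""

    def __init__(self):
        # user_id -> {session_id: row}; one dict entry models one partition.
        self._partitions = defaultdict(dict)

    def apply_heartbeat(self, user_id, session_id, position_seconds,
                        current_quality, rebuffer_events, now):
        """Upsert the session row with the latest heartbeat fields."""
        row = self._partitions[user_id].setdefault(session_id, {"started_at": now})
        row.update(
            last_heartbeat_at=now,
            current_position_seconds=position_seconds,
            current_quality=current_quality,
            rebuffer_count=rebuffer_events,
        )
        return row

    def latest_session(self, user_id):
        """Single-partition read: the most recently updated session for a user."""
        sessions = self._partitions.get(user_id)
        if not sessions:
            return None
        return max(sessions.values(), key=lambda row: row["last_heartbeat_at"])


if __name__ == "__main__":
    print(round(heartbeat_writes_per_second(50_000_000, 30)))
```

Every heartbeat touches exactly one partition, and the resume lookup reads exactly one partition, which is the property the `user_id` partition key is chosen for.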

user_watch_history — resume position and completion

user_id, title_id, content_id, position_seconds, completed (boolean), watched_at, device_type

This is what powers “continue watching.” Separate from the playback session — a session is live, watch history is the permanent record. Partition by user_id.
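A minimal sketch of how the watch-history record could drive a "continue watching" row follows. The 95% completion threshold is an assumption chosen for illustration, and `record_progress` and `continue_watching` are hypothetical names operating on a plain dictionary rather than a real datastore.

```python
COMPLETION_THRESHOLD = 0.95  # assumed: past 95% of the runtime counts as finished


def record_progress(history, user_id, content_id, position_seconds,
                    duration_seconds, watched_at):
    """Upsert the permanent watch-history row for one user and one piece of content."""
    completed = position_seconds >= duration_seconds * COMPLETION_THRESHOLD
    history.setdefault(user_id, {})[content_id] = {
        "position_seconds": position_seconds,
        "completed": completed,
        "watched_at": watched_at,
    }


def continue_watching(history, user_id):
    """Unfinished content for a user, most recently watched first."""
    rows = history.get(user_id, {})
    unfinished = [(cid, row) for cid, row in rows.items() if not row["completed"]]
    unfinished.sort(key=lambda item: item[1]["watched_at"], reverse=True)
    return [(cid, row["position_seconds"]) for cid, row in unfinished]


if __name__ == "__main__":
    history = {}
    record_progress(history, "u1", "ep1", 2700, 2800, watched_at=100)
    record_progress(history, "u1", "ep2", 600, 2800, watched_at=200)
    print(continue_watching(history, "u1"))
```

Because everything is keyed by user first, the whole "continue watching" row is one partition read.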

CDN content index — not a database table

The CDN’s internal index of what content is cached at which edge location. You don’t design this — it’s the CDN’s internal state. What you do design is the pre-positioning logic: before a major content release (new season of a hit show), Netflix pre-pushes the encoded files to CDN edge servers before playback demand arrives. This is called content pre-warming and it is one of the most important operational patterns in the whole system.
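The pre-positioning decision can be sketched as a small planning function: given a demand forecast per region, choose how many edge servers in that region should receive the files ahead of release. Everything here is hypothetical, including the `plan_prewarm` name, the per-edge capacity figure, and the idea that edges are interchangeable within a region; real placement logic weighs many more factors.

```python
import math


def plan_prewarm(predicted_viewers_by_region, edges_by_region, viewers_per_edge):
    """Choose which edges in each region to pre-push a release to.

    Uses enough edges to cover the predicted viewers, capped at the
    number of edges that actually exist in the region.
    """
    plan = {}
    for region, viewers in predicted_viewers_by_region.items():
        edges = edges_by_region.get(region, [])
        needed = min(len(edges), math.ceil(viewers / viewers_per_edge))
        plan[region] = edges[:needed]
    return plan


if __name__ == "__main__":
    forecast = {"us-east": 25_000, "eu-west": 4_000, "apac": 0}
    edges = {"us-east": ["e1", "e2", "e3", "e4"], "eu-west": ["e5", "e6"]}
    print(plan_prewarm(forecast, edges, viewers_per_edge=10_000))
```

The key property is that the push happens on a schedule driven by the forecast, so the first viewer at premiere time hits a warm cache instead of triggering an origin fetch.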
