Graceful Degradation vs. Fail Fast
Issue #118: System Design Interview Roadmap • Section 5: Reliability & Resilience
When Netflix Keeps Streaming Despite Chaos
Imagine you're binge-watching your favorite series when Netflix's recommendation engine crashes. Do you want the entire app to stop working, or would you prefer it continues playing videos while showing a simple "Recommended for You" placeholder? This scenario perfectly illustrates the fundamental choice every system architect faces: should components fail fast and loudly, or degrade gracefully to maintain core functionality?
What We'll Master Today
Graceful Degradation: Maintaining partial functionality when dependencies fail
Fail-Fast Principles: When immediate failure prevents bigger problems
Decision Framework: Choosing the right strategy for each system component
Implementation Patterns: Circuit breakers, fallbacks, and monitoring strategies
The Tale of Two Philosophies
Graceful Degradation says: "Keep the lights on, even if some features don't work." Your e-commerce site continues processing orders even when the recommendation engine is down—customers can still buy, just without personalized suggestions.
Fail Fast says: "Stop immediately when something's wrong to prevent corruption." Your payment service refuses all transactions the moment it detects database inconsistency—better to block payments than process them incorrectly.
When to Choose Each Strategy


