Secret Management in Production: Vault, KMS, and Rotation Strategies
Introduction
A team ships a microservice. Six months later, a security scan finds a PostgreSQL password buried in git history, committed in a .env file and pushed before the .gitignore was set up. The password was never rotated in production, and three developers still have the original credential memorized. That’s not a hypothetical; it’s how breaches start. Secret management is the infrastructure that sits between “we have credentials” and “those credentials never leak, expire gracefully, and rotate without a deployment.”
The Three-Layer Hierarchy
Secret management operates across three distinct layers, and conflating them causes architectural mistakes.
KMS (Key Management Service) — AWS KMS, GCP Cloud KMS, Azure Key Vault HSM — manages cryptographic keys only. It does not store your database passwords. Its job is to encrypt and decrypt other keys. You call KMS to wrap a key; KMS never sees your actual application secrets.
Secret Store (HashiCorp Vault, AWS Secrets Manager, GCP Secret Manager) — manages secret lifecycle: storage, access control, rotation, and auditing. Vault encrypts secrets at rest using an internal barrier key. That barrier key is itself encrypted by KMS. Vault without KMS integration requires a manual unseal ceremony (covered below).
Application Layer — consumes secrets via the Vault SDK, environment injection at container start, or a Vault Agent sidecar. The right choice depends on your rotation requirements.
Envelope Encryption
KMS introduces a pattern called envelope encryption, which solves a fundamental problem: you can’t send a 10 GB database to KMS to encrypt — KMS API requests have a 4 KB payload limit, and the cost would be enormous.
Instead: (1) generate a random 256-bit Data Encryption Key (DEK) locally, (2) encrypt your data locally with the DEK using AES-256-GCM, (3) send only the DEK to KMS to be encrypted under your Customer Master Key (CMK), receiving an Encrypted DEK (EDEK) back, (4) store the EDEK alongside the ciphertext.
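To decrypt: call KMS with the EDEK, receive the DEK, decrypt locally. AWS KMS charges $0.03 per 10,000 API calls. At 1,000 decrypt operations per second, that is about 2.6 billion calls and roughly $7,800 per 30-day month (about $260 per day), so meter it and cache decrypted DEKs where your threat model allows.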
The critical benefit: rotating the master key means re-encrypting only the DEK (a few bytes), not re-encrypting all data. Shopify uses exactly this pattern for PCI-DSS compliance: re-wrapping a DEK is a millisecond operation regardless of database size.
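The wrap and re-wrap flow can be sketched in a few lines. This is an illustration under loud assumptions, not production cryptography: ToyKMS is a hypothetical in-memory stand-in for a real KMS, and _keystream_xor is a toy HMAC-based stream cipher substituted for AES-256-GCM so the example needs only the standard library. What it shows is that a master key rotation touches only the wrapped DEK and leaves the data ciphertext byte-for-byte unchanged.

```python
import hashlib
import hmac
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (HMAC-SHA256 in counter mode). XOR is its own
    inverse, so this both encrypts and decrypts. A stand-in for
    AES-256-GCM; it provides no authentication. Do not use in production."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class ToyKMS:
    """Hypothetical in-memory KMS: master keys never leave this object."""

    def __init__(self) -> None:
        self._keys = {1: secrets.token_bytes(32)}
        self.current = 1

    def rotate(self) -> None:
        # Old versions are kept so existing EDEKs can still be unwrapped.
        self.current += 1
        self._keys[self.current] = secrets.token_bytes(32)

    def wrap(self, dek: bytes) -> tuple:
        nonce = secrets.token_bytes(12)
        return (self.current, nonce, _keystream_xor(self._keys[self.current], nonce, dek))

    def unwrap(self, edek: tuple) -> bytes:
        version, nonce, wrapped = edek
        return _keystream_xor(self._keys[version], nonce, wrapped)


def encrypt_record(kms: ToyKMS, plaintext: bytes) -> dict:
    dek = secrets.token_bytes(32)   # (1) generate the DEK locally
    nonce = secrets.token_bytes(12)
    ciphertext = _keystream_xor(dek, nonce, plaintext)  # (2) encrypt locally
    # (3) wrap only the DEK, (4) store the EDEK next to the ciphertext
    return {"edek": kms.wrap(dek), "nonce": nonce, "ciphertext": ciphertext}


def decrypt_record(kms: ToyKMS, record: dict) -> bytes:
    dek = kms.unwrap(record["edek"])
    return _keystream_xor(dek, record["nonce"], record["ciphertext"])


def rewrap(kms: ToyKMS, record: dict) -> dict:
    """Re-encrypt only the DEK under the current master key version."""
    dek = kms.unwrap(record["edek"])
    return {**record, "edek": kms.wrap(dek)}
```

After kms.rotate(), calling rewrap(kms, record) returns a record whose ciphertext field is identical to the original; only the small edek tuple changes.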
HashiCorp Vault Architecture
Vault’s core is a cryptographic barrier. All data written to storage crosses this barrier and gets encrypted. The barrier key is split using Shamir’s Secret Sharing: with a 3-of-5 configuration, five key shares are generated at initialization and any three are required to unseal. On restart, Vault starts sealed and cannot serve any requests until unsealed.
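To make the threshold property concrete, here is a minimal sketch of Shamir’s Secret Sharing with five shares of which any three reconstruct the secret. It is a textbook version over integers modulo a prime; Vault’s own implementation uses different field arithmetic, and the names split and combine are chosen for this illustration only.

```python
import secrets

# A Mersenne prime comfortably larger than the secrets used in this sketch.
PRIME = 2**127 - 1


def split(secret: int, threshold: int, num_shares: int) -> list:
    """Return num_shares points on a random polynomial of degree
    threshold - 1 whose constant term is the secret."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner's rule, highest-degree coefficient first.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, num_shares + 1)]


def combine(points: list) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to recover the secret."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret
```

With shares = split(secret, threshold=3, num_shares=5), any three entries of shares passed to combine return the original secret, while two shares alone do not determine it.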
Inside the barrier, Vault has three primitives:
Auth Methods: How identities are verified. Kubernetes JWT, AWS IAM, AppRole, LDAP. The Kubernetes auth method is the dominant choice for cloud-native: pods present a bound service account token, and Vault validates it against the Kubernetes API.
Secrets Engines: Plugins that generate secrets. KV v2 stores static secrets with versioning. The database engine generates dynamic credentials. The PKI engine issues X.509 certificates.
Policies: HCL rules that map paths to capabilities (create, read, update, delete, list) and are attached to identities.
Dynamic Secrets: The Core Value Proposition
Static secrets in KV have a fundamental problem: they exist until you explicitly rotate them. An attacker who exfiltrates a static credential has indefinite access.
Dynamic secrets invert this model. When an app requests a PostgreSQL credential from Vault’s database engine, Vault connects to PostgreSQL, executes CREATE USER vault_<random> WITH PASSWORD '<random>', grants permissions per the configured role, and returns the credential with a lease TTL (e.g., 1 hour). When the TTL expires — or when explicitly revoked — Vault executes DROP USER vault_<random>. The credential was useful for exactly its lifetime.
Every dynamic credential is unique per requester, per request. A credential leaked from Pod A is not shared with Pod B, so it can be revoked and traced in isolation, and it self-destructs at TTL.
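The lease lifecycle can be modelled in memory to see the moving parts. The sketch below is a simulation, not Vault’s code: ToyDatabaseEngine and its db_users dictionary are hypothetical stand-ins for the database secrets engine and PostgreSQL’s user table, and the clock is injected so expiry can be demonstrated without waiting.

```python
import secrets
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lease:
    lease_id: str
    username: str
    password: str
    expires_at: float


class ToyDatabaseEngine:
    """In-memory model of issue / revoke / expire for dynamic credentials."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self.db_users: Dict[str, str] = {}   # stands in for PostgreSQL users
        self._leases: Dict[str, Lease] = {}

    def issue(self, ttl_seconds: float) -> Lease:
        username = f"vault_{secrets.token_hex(4)}"
        password = secrets.token_urlsafe(24)
        self.db_users[username] = password   # stands in for CREATE USER
        lease = Lease(secrets.token_hex(8), username, password,
                      self._clock() + ttl_seconds)
        self._leases[lease.lease_id] = lease
        return lease

    def revoke(self, lease_id: str) -> None:
        lease = self._leases.pop(lease_id, None)
        if lease is not None:
            self.db_users.pop(lease.username, None)   # stands in for DROP USER

    def expire_due(self) -> int:
        """Revoke every lease whose TTL has elapsed; return how many."""
        now = self._clock()
        due = [l.lease_id for l in self._leases.values() if l.expires_at <= now]
        for lease_id in due:
            self.revoke(lease_id)
        return len(due)
```

Two calls to issue produce two unrelated users, so revoking one lease removes only that user and leaves every other requester’s credential working.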
Critical Insights
Static secrets in environment variables are a rotation anti-pattern. A secret injected at container start cannot be rotated without restarting the container. Worse, env vars are readable by any code in the process — including third-party SDKs. Use Vault Agent with templated files instead: the agent rewrites a credentials file on rotation, and the app watches the file for changes.
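On the application side, “watching the file” can be as simple as checking the file’s metadata before each use. The sketch below is one hedged way to do it; the class name ReloadingCredentialFile and the single-value file layout are assumptions for illustration, not something Vault Agent prescribes.

```python
import os
from typing import Optional, Tuple


class ReloadingCredentialFile:
    """Re-read a credentials file whenever it changes on disk.

    The change signature includes the inode, so a writer that replaces the
    file via an atomic rename is detected as well as an in-place rewrite.
    """

    def __init__(self, path: str) -> None:
        self._path = path
        self._signature: Optional[Tuple[int, int, int]] = None
        self._value = ""

    def get(self) -> str:
        st = os.stat(self._path)
        signature = (st.st_ino, st.st_mtime_ns, st.st_size)
        if signature != self._signature:
            with open(self._path, "r", encoding="utf-8") as fh:
                self._value = fh.read().strip()
            self._signature = signature
        return self._value
```

Calling get() each time a new database connection is opened means a rotated password is picked up on the next connection, with no container restart.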
Auto-unseal via KMS changes the security threat model. Traditional unsealing requires multiple key holders to physically present their key shares — a ceremony analogous to nuclear launch authorization. Auto-unseal (KMS wraps the barrier key) is operationally convenient and necessary for HA deployments, but it means control of the KMS CMK = control of Vault. Document this dependency in your threat model and restrict CMK access rigorously.
Lease renewal storms are a hyperscale failure mode. If 50,000 pods start simultaneously — as happens after a cluster-wide deployment or a mass restart — all their 1-hour leases are issued within seconds of each other. At the 30-minute mark, all 50,000 pods attempt lease renewal simultaneously. Vault’s Raft FSM processes renewals serially. Solution: set renewal trigger at 70% TTL plus ±(TTL * 0.1 * random()) jitter. Vault Agent handles this automatically.
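For clients that renew leases themselves, the jitter rule above is a one-liner. A minimal sketch, assuming the TTL is known in seconds; renewal_delay is a name chosen for this example.

```python
import random
from typing import Optional


def renewal_delay(ttl_seconds: float, rng: Optional[random.Random] = None) -> float:
    """Seconds to wait after lease issue before renewing.

    Base trigger at 70% of TTL, shifted by up to +/-10% of TTL so that
    leases issued in the same second do not all renew in the same second.
    """
    rng = rng or random.Random()
    base = 0.7 * ttl_seconds
    jitter = ttl_seconds * 0.1 * rng.uniform(-1.0, 1.0)
    return base + jitter
```

For a 1-hour lease this spreads renewals across a 12-minute window (between minute 36 and minute 48) instead of concentrating them at a single instant.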
The “secret zero” problem (how an app authenticates to Vault without a pre-shared secret) is solved by platform identity. Kubernetes workload identity tokens, AWS IAM role assumption, and GCP service accounts attached to the workload all establish identity without a manually distributed initial credential. AppRole (Vault’s own auth method) still requires bootstrapping a RoleID and SecretID through an external mechanism; use it only when platform identity isn’t available.
Vault namespace isolation (Enterprise feature) allows teams to operate independent Vault instances within a shared cluster. Each namespace has its own auth methods, secrets engines, and policies. A credential leak in the payments namespace cannot access secrets in the ml-training namespace. Lyft adopted this pattern to enforce blast radius boundaries across 300+ microservices.
Raft leader elections introduce a 1–3 second unavailability window when the active Vault node fails. Applications must implement retry logic with exponential backoff. A naive app that fails immediately on a 503 from Vault will shed the Vault HA benefit entirely.
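A retry wrapper that rides out a short leader election might look like the following. It is a sketch under assumptions: VaultUnavailable is a hypothetical exception that your Vault client code would raise on a 503 or a connection error, and the delay values are illustrative defaults sized to cover a window of a few seconds.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class VaultUnavailable(Exception):
    """Hypothetical error raised by the caller's Vault client on a 503."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.2,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on VaultUnavailable with exponential backoff.

    The wait before retry n is drawn uniformly from [0, min(max_delay,
    base_delay * 2**n)] ("full jitter"), which avoids synchronized retries.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except VaultUnavailable:
            if attempt == max_attempts - 1:
                raise   # out of attempts: surface the failure
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the behaviour can be unit-tested without real waiting.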
Real-World Examples
Netflix manages secrets across 100,000+ microservices using Vault with a Spinnaker pipeline integration that injects Vault tokens at deploy time. Their PKI engine issues 4-hour TLS certificates, eliminating certificate revocation list (CRL) maintenance — short TTLs make revocation unnecessary, since a compromised certificate expires before it can be significantly abused.
Shopify combines AWS KMS envelope encryption for data-at-rest with Vault for runtime secret injection. Their database credentials are dynamic, generated per-deploy with a 24-hour TTL aligned to their deployment cycle. Critically, they pin credential TTL to slightly longer than their P99 deployment duration to prevent mid-deploy credential expiry.
Square uses Vault’s transit secrets engine as a shared encryption-as-a-service layer. Rather than distributing encryption keys to individual services, services send plaintext to Vault’s transit engine and receive ciphertext back — Vault never stores the data, and the encryption key never leaves Vault’s barrier.
GitHub Link
https://github.com/sysdr/sdir/tree/main/Secret_Management_in_Production/vault-secrets-demo
Architectural Considerations
Monitor Vault’s /v1/sys/health for sealed status, vault.token.ttl metric for approaching token expirations, and lease creation/revocation rates for anomalies. An alert on “lease creation rate drops to zero” catches Vault outages before apps notice.
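A health probe only needs the HTTP status code, because /v1/sys/health encodes node state in it. The sketch below assumes Vault’s documented default status codes (they can be overridden with query parameters); the state labels and the function names classify_health and check_vault are choices made for this example.

```python
import urllib.error
import urllib.request

# Default /v1/sys/health status codes, assuming no query-parameter overrides.
HEALTH_STATES = {
    200: "active",               # initialized, unsealed, active
    429: "standby",              # unsealed, standby
    472: "dr-secondary",         # disaster recovery secondary
    473: "performance-standby",
    501: "uninitialized",
    503: "sealed",
}


def classify_health(status_code: int) -> str:
    """Map a health endpoint status code to a short state label."""
    return HEALTH_STATES.get(status_code, "unknown")


def check_vault(base_url: str, timeout: float = 2.0) -> str:
    """Probe a Vault node and return its state label."""
    url = base_url.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_health(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx responses, but the code still carries the state.
        return classify_health(err.code)
    except OSError:
        # Covers URLError, refused connections, and timeouts.
        return "unreachable"
```

An alerting rule would typically page on "sealed" or "unreachable" and treat "standby" as healthy for non-active nodes.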
Cost considerations: KMS API calls, Vault Enterprise licensing (~$30k/year/cluster), and operational overhead of managing Vault HA. For smaller teams, AWS Secrets Manager at $0.40/secret/month with automatic rotation built in can be more cost-effective than operating Vault.
Do not put non-sensitive configuration in Vault. Feature flags, timeout values, and service URLs belong in a config store (etcd, Consul KV, LaunchDarkly). Vault is optimized for secrets — its audit logging, encryption, and access control add overhead inappropriate for high-frequency configuration reads.
Practical Takeaway
Run bash setup.sh to deploy a complete secret management stack: HashiCorp Vault, PostgreSQL, and a Node.js service with a real-time dashboard. The demo shows KV v2 secrets with versioning, dynamic PostgreSQL credentials with live TTL countdowns, lease revocation, and a simulated rotation event stream.
Specifically: watch a dynamic credential get created, connect to PostgreSQL with it directly, then watch Vault revoke the DB user when the lease expires — the credential becomes invalid in real time, no deployment required.
After the demo, explore extending it with Vault Agent sidecar injection (add vault_agent service to the compose file), or enable the PKI engine to issue short-lived TLS certificates. Both patterns are production-standard at major tech companies and directly relevant to Staff+ system design interviews where secret lifecycle management and zero-trust credential issuance are increasingly common topics.
Run bash cleanup.sh to remove all containers and volumes when finished.


