
How Vercel Cut Build Wait Times From 90 Seconds To 5

Vercel achieved an 18x speedup by accepting a harder constraint—treating every customer build as potentially malicious—then building from scratch with Firecracker microVMs instead of containers.


• Containers provide weak isolation (shared kernel); microVMs provide strong isolation (separate kernels) at near-container speed—Firecracker boots in 125ms vs 30-60s for traditional VMs
• The 90s→5s drop came from three stacked optimizations: faster cold starts from local image caching (which alone saved 45s) plus block device snapshotting, warm pools of pre-booted cells (which eliminate the wait in the common case), and Firecracker's baseline speed
• Vercel's architecture assumes hostile multi-tenancy: each build runs in an ephemeral "cell" (one Firecracker microVM + one container), destroyed after every build to prevent state leakage between customers
• Warm pools are expensive (paying for idle compute) but necessary—the tradeoff is between wasted capacity and tail latency during traffic spikes
• Building Hive from primitives instead of using Kubernetes gave Vercel leverage to ship features like enhanced build machines and Secure Compute, but required massive engineering investment
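The arithmetic behind these bullets can be sketched in a few lines. The 18x figure is simply 90/5; the rest is a hypothetical two-outcome model (a build either finds a warm cell or pays a cold start) that shows why an undersized pool hurts the tail more than the mean. The function names and the warm/cold timings are assumptions for illustration, not Vercel's measurements.

```python
# Minimal sketch of warm-pool arithmetic. The 90 s -> 5 s figures come from
# the article; the hit rates and warm/cold wait times are hypothetical.

def speedup(before_s: float, after_s: float) -> float:
    """Ratio of the old wait time to the new one."""
    return before_s / after_s


def expected_wait(hit_rate: float, warm_s: float, cold_s: float) -> float:
    """Mean wait when a fraction `hit_rate` of builds get a pre-booted cell
    and the remaining builds fall back to a cold start."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * warm_s + (1.0 - hit_rate) * cold_s


def percentile_wait(p: float, hit_rate: float, warm_s: float, cold_s: float) -> float:
    """p-quantile (0 < p <= 1) of the two-point wait distribution, assuming
    warm_s < cold_s: once p exceeds the hit rate, the quantile is a cold start."""
    return warm_s if p <= hit_rate else cold_s


if __name__ == "__main__":
    print(f"headline speedup: {speedup(90, 5):.0f}x")  # prints "headline speedup: 18x"
    # Hypothetical: 95% of builds hit the pool, warm = 1 s, cold = 31 s.
    print(expected_wait(0.95, warm_s=1.0, cold_s=31.0))
    print(percentile_wait(0.99, hit_rate=0.95, warm_s=1.0, cold_s=31.0))  # 31.0
```

Under these assumed numbers the mean stays low, but the 99th-percentile build still waits the full cold-start time, which is the tail-latency side of the warm-pool tradeoff.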

Vercel's Hive platform rests on a single foundational assumption: treat every customer build as potentially malicious code running on shared hardware. This hostile multi-tenancy constraint ruled out standard container orchestration. Containers share the host kernel, so a kernel exploit in one customer's build could reach every other build on the same machine. Traditional VMs provide separate kernels but take 30-60 seconds to boot, which is too slow for ephemeral 2-minute builds. Vercel chose Firecracker microVMs, which boot in 125ms and add only a few megabytes of memory overhead each, while providing VM-level isolation enforced by hardware virtualization (KVM). Each build runs in a "cell" (one Firecracker process managing one microVM containing one container), with a strict 1:1 mapping between builds and cells and complete destruction of the cell after every build to prevent state leakage.
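A minimal Python sketch of the cell model follows. The configuration dictionary uses the field names of Firecracker's JSON config file (as passed with `firecracker --config-file`); the kernel path, rootfs path, boot arguments, and sizes are placeholders. `Cell` and `run_isolated` are hypothetical names, and the sketch only models the lifecycle rules described above (one build per cell, destruction afterward) rather than actually launching a microVM.

```python
import json
import uuid
from dataclasses import dataclass, field


def firecracker_config(kernel: str, rootfs: str, vcpus: int, mem_mib: int) -> dict:
    """Config in the shape accepted by `firecracker --config-file`.
    Paths and sizes are placeholders, not Vercel's settings."""
    return {
        "boot-source": {
            "kernel_image_path": kernel,
            "boot_args": "console=ttyS0 reboot=k panic=1",
        },
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs,
            "is_root_device": True,
            "is_read_only": False,
        }],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


@dataclass
class Cell:
    """Hypothetical cell: one Firecracker process -> one microVM -> one container."""
    config: dict
    cell_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    used: bool = False
    destroyed: bool = False

    def run_build(self, build_id: str) -> str:
        # Strict 1:1 mapping: a cell never serves a second build.
        if self.used or self.destroyed:
            raise RuntimeError(f"cell {self.cell_id} is single-use")
        self.used = True
        # A real system would start the build container inside the microVM here.
        return f"build {build_id} ran in cell {self.cell_id}"

    def destroy(self) -> None:
        # Stand-in for killing the Firecracker process and deleting its disk,
        # so nothing from this build survives for the next customer.
        self.destroyed = True


def run_isolated(build_id: str, config: dict) -> str:
    cell = Cell(config)
    try:
        return cell.run_build(build_id)
    finally:
        cell.destroy()  # destroyed after every build, success or failure


if __name__ == "__main__":
    cfg = firecracker_config("vmlinux", "rootfs.ext4", vcpus=2, mem_mib=1024)
    print(json.dumps(cfg, indent=2))
    print(run_isolated("example-build", cfg))
```

Putting destruction in a `finally` clause means a failed build is torn down exactly like a successful one, which is the property the hostile-tenancy model depends on.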

The 18x speedup came from three compounding optimizations on top of this foundation. First, faster cold starts: caching the build container image locally (saving 45 seconds) and using block device snapshotting to start from a known-good disk image instead of assembling one from scratch. Second, warm pools: keeping pre-booted cells idle and waiting, so most builds start immediately instead of waiting for a new cell to be provisioned. Third, Firecracker's baseline speed: with traditional VMs, every replacement cell would take 30-60 seconds to boot, so the pool would have to be enormous to absorb a traffic spike. The warm pool is the primary source of the speedup but comes with a real cost: Vercel pays for compute that does no useful work and must constantly balance waste (too many idle cells) against tail latency (too few cells during traffic spikes).
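The warm-pool and known-good-disk ideas can be combined in a small hypothetical sketch. `WarmPool`, `WarmCell`, and the file layout are invented for illustration. A full file copy stands in for block device snapshotting (a production system would use copy-on-write rather than copying every byte), and the pool refills synchronously on release, where a real one would refill in the background.

```python
from __future__ import annotations

import os
import shutil
import tempfile
from collections import deque
from dataclasses import dataclass


@dataclass
class WarmCell:
    """Hypothetical pre-booted cell with its own private disk."""
    disk_path: str


class WarmPool:
    """Minimal warm-pool sketch (hypothetical, not Vercel's implementation)."""

    def __init__(self, golden_image: str, workdir: str, target_size: int) -> None:
        self.golden_image = golden_image
        self.workdir = workdir
        self.target_size = target_size
        self.idle: deque[WarmCell] = deque()
        self.hits = 0
        self.misses = 0
        self._counter = 0
        self.refill()

    def _boot_cell(self) -> WarmCell:
        self._counter += 1
        disk = os.path.join(self.workdir, f"cell-{self._counter}.img")
        # Every cell starts from the same known-good image, never from old state.
        shutil.copyfile(self.golden_image, disk)
        return WarmCell(disk_path=disk)

    def refill(self) -> None:
        # Keep target_size cells idle; each one is compute paid for while unused.
        while len(self.idle) < self.target_size:
            self.idle.append(self._boot_cell())

    def acquire(self) -> WarmCell:
        if self.idle:
            self.hits += 1
            return self.idle.popleft()  # fast path: already booted
        self.misses += 1
        return self._boot_cell()        # slow path: cold start during a spike

    def release(self, cell: WarmCell) -> None:
        # Cells are never reused: destroy the disk, then top the pool back up.
        os.remove(cell.disk_path)
        self.refill()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        golden = os.path.join(tmp, "golden.img")
        with open(golden, "wb") as f:
            f.write(b"\0" * 1024)
        pool = WarmPool(golden, tmp, target_size=2)
        burst = [pool.acquire() for _ in range(3)]  # third request misses
        for cell in burst:
            pool.release(cell)
        print(pool.hits, pool.misses, len(pool.idle))  # 2 1 2
```

Raising `target_size` converts misses into hits at the price of more idle cells, which is the waste-versus-tail-latency dial described above.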

Building Hive from primitives instead of using Kubernetes or ECS required an enormous engineering investment and carries an ongoing maintenance burden. The payoff is leverage: Vercel can make decisions like destroying every cell after every build, tuning warm pools based on customer patterns, and shipping features like enhanced build machines and Secure Compute without fighting someone else's platform constraints. The lesson isn't "use microVMs"; it's that the threat model drives the architecture. Cooperative tenants can use containers; adversarial tenants at scale require microVMs or sandboxed runtimes. Vercel got faster by accepting the harder problem first, then optimizing for speed on top of the right foundation.
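The closing rule of thumb can be restated as a tiny decision function. The `Tenancy` enum and `isolation_for` are hypothetical names; the function encodes only the mapping stated in the paragraph above and is not a complete security assessment.

```python
from enum import Enum


class Tenancy(Enum):
    COOPERATIVE = "cooperative"  # tenants trust each other
    ADVERSARIAL = "adversarial"  # any tenant may run hostile code


def isolation_for(tenancy: Tenancy) -> str:
    """Map a threat model to an isolation boundary (illustrative sketch)."""
    if tenancy is Tenancy.COOPERATIVE:
        # A shared kernel is acceptable when tenants are not attacking each other.
        return "containers"
    # Hostile code on shared hardware needs more than a shared kernel
    # between it and its neighbors.
    return "microVMs or sandboxed runtimes"


if __name__ == "__main__":
    for t in Tenancy:
        print(t.value, "->", isolation_for(t))
```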