
How to Rack 30 Petabytes of Storage

A small AI lab built 30PB of storage for $354k/year instead of AWS's $12M/year by questioning whether ML training data needs enterprise reliability—spoiler: it doesn't.

infrastructure

• ML training data doesn't need AWS's "13 nines" reliability—you can lose 5% of training data with minimal impact, so 2 nines is plenty
• They achieved $1/TB/month vs AWS's $38/TB/month and Cloudflare's $10/TB/month by racking used enterprise drives in a San Francisco colocation center
• Kept software obsessively simple: 200 lines of Rust + nginx + SQLite instead of Ceph/MinIO, which would require dedicated specialists to debug
• Threw a 36-hour "Storage Stacking Saturday" party where friends helped rack 2,400 hard drives in exchange for food and engraved drives
• Includes complete parts list and setup guide: 100 DS4246 chassis, used Intel servers, Arista switch, specific HBA/NIC recommendations

This is a detailed case study of how an AI research lab built 30 petabytes of storage for video training data at 1/40th the cost of AWS. The core insight: ML training data is a commodity that doesn't need enterprise-grade reliability. Since losing any 5% of pretraining data has minimal impact, they don't need AWS's "13 nines" of reliability—2 nines works fine. This single requirement difference unlocks massive cost savings that cloud providers can't match because their pricing assumes everyone needs maximum redundancy.
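The two-nines argument can be sanity-checked with back-of-envelope arithmetic. Assuming an annualized failure rate (AFR) of about 2% for used enterprise drives (an assumed figure, not one from the article) and no redundancy, the expected fraction of data lost per year stays well under the 5% tolerance:

```python
# Back-of-envelope reliability check with ASSUMED numbers:
# ~2% annualized failure rate (AFR) for used enterprise drives, no redundancy.
drives = 2400
afr = 0.02  # assumed AFR; losing a drive loses its data outright

expected_failures_per_year = drives * afr  # ~48 drives
expected_data_lost = afr                   # fraction of total data, drives being uniform

print(f"expected drive failures/year: {expected_failures_per_year:.0f}")
print(f"expected data lost/year: {expected_data_lost:.1%} (tolerance: 5%)")
```

Even at double that AFR, annual loss stays under the 5% threshold, which is why no redundancy layer is needed at all.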

The technical approach prioritized radical simplicity over features. Instead of Ceph (which requires dedicated specialists to debug) or MinIO (which adds S3-compatibility overhead they didn't need), they wrote 200 lines of Rust for writes, used nginx for reads, and tracked metadata in SQLite. No redundancy, no sharding, just XFS-formatted drives that roughly saturate their 100Gbps connection. Key lessons: frontloaders were a mistake (they had to screw in 2,400 drives individually), daisy-chaining chassis bottlenecked performance (each chassis should have had its own HBA), and having the datacenter a few blocks from their office was worth the premium for hands-on debugging access. They provide a complete replication guide including specific part numbers, vendor recommendations, and networking setup tips.
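The "roughly saturate their 100Gbps connection" claim is easy to sanity-check. Assuming ~150 MB/s sequential reads per used SATA drive (an illustrative figure, not from the article), only a small fraction of the 2,400 drives need to stream concurrently to fill the link:

```python
# Rough throughput arithmetic behind "saturate a 100Gbps connection".
# Per-drive speed is an ASSUMED figure for used SATA drives on sequential reads.
link_gbps = 100
link_bytes_per_s = link_gbps / 8 * 1e9   # 12.5 GB/s
drive_bytes_per_s = 150e6                # assumed ~150 MB/s sequential per drive

drives_needed = link_bytes_per_s / drive_bytes_per_s
print(f"link capacity: {link_bytes_per_s / 1e9:.1f} GB/s")
print(f"drives streaming concurrently to saturate it: ~{drives_needed:.0f} of 2400")
```

This also illustrates why daisy-chaining hurt: funneling many chassis through one HBA caps aggregate drive bandwidth well below what the network link can carry.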

The broader implication: small teams can compete with frontier labs by questioning default assumptions about infrastructure. When your requirements genuinely differ from enterprise defaults, cloud pricing becomes irrational. The article includes full cost breakdowns ($17.5k/month recurring + $12k/month depreciation), failure modes they encountered, and a parts list for anyone wanting to replicate this. They even got rate-limited by Cloudflare's R2 metadata layer during large training runs, confirming their scale genuinely needed custom infrastructure.
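The quoted figures are internally consistent: recurring costs plus depreciation reproduce the $354k/year and roughly $1/TB/month cited at the top of the article.

```python
# Checking the article's cost figures against each other.
recurring = 17_500     # $/month (colo, power, bandwidth)
depreciation = 12_000  # $/month (hardware amortization)
capacity_tb = 30_000   # 30 PB expressed in TB

monthly = recurring + depreciation
yearly = monthly * 12
per_tb = monthly / capacity_tb
print(f"${yearly:,}/year, ${per_tb:.2f}/TB/month")
# → $354,000/year, $0.98/TB/month
```

At $38/TB/month, the same 30PB on AWS runs to eight figures a year, which is the roughly 40x gap the article is built around.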