I taught a bucket to speak git | Tigris Object Storage
A developer pointed a pure-Go git implementation at object storage and built a stateless git server with no filesystem, no git binary, and no database—then had to fix the thousand ways filesystems lie to us about latency.
TLDR
• Git's on-disk format (objects, trees, commits, refs) is mostly immutable and append-only—accidentally perfect for object storage
• The main disasters: S3 has no atomic rename, and cloning a 200KB repo triggered 8,500+ storage requests once stat() calls and small random packfile reads each became an HTTP round trip, adding up to megabytes of transfer
• The solutions exploited the fact that packfiles are content-addressed and immutable (so trivially cacheable), used Tigris's atomic rename extension, and collapsed 256 directory listings (the objects/00 through objects/ff fan-out) into one
• Result: a git server that runs in Kubernetes with zero local state, just temporary caches—challenging the assumption that git servers need mounted filesystems
• Post-receive hooks work by spinning up sandboxed containers with repo checkouts, enabling CI without traditional infrastructure
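Git stores loose objects under 256 fan-out directories, objects/00 through objects/ff. Listing each one costs 256 round trips. A single prefix listing of objects/ returns all of those keys at once, and the code can then regroup them locally. The sketch below shows only the regrouping step. It assumes the keys have already been fetched with one prefix listing, and it ignores pagination. The function name `looseObjectIDs` is hypothetical and does not come from objgit.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// looseObjectIDs (hypothetical helper) turns one flat listing of keys into
// loose object IDs, instead of issuing one LIST per objects/00..objects/ff
// fan-out directory. Non-loose keys (packs, info files, refs) are skipped.
func looseObjectIDs(keys []string) []string {
	var ids []string
	for _, k := range keys {
		rest, ok := strings.CutPrefix(k, "objects/")
		if !ok {
			continue
		}
		dir, file, ok := strings.Cut(rest, "/")
		// Loose objects live at objects/<2 hex chars>/<38 hex chars>.
		if !ok || len(dir) != 2 || len(file) != 38 || !isHex(dir) || !isHex(file) {
			continue
		}
		ids = append(ids, dir+file)
	}
	sort.Strings(ids)
	return ids
}

// isHex reports whether s contains only lowercase hex digits, as git uses.
func isHex(s string) bool {
	for _, c := range s {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	return true
}

func main() {
	// Pretend these keys came back from a single listing with prefix "objects/".
	keys := []string{
		"objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391",
		"objects/3b/18e512dba79e4c8300dd08aeb37f8e728b8dad",
		"objects/pack/pack-abc.pack",
		"objects/info/packs",
	}
	for _, id := range looseObjectIDs(keys) {
		fmt.Println(id)
	}
}
```

A real listing API returns results in pages, so a production version would loop over continuation tokens. It would still need one request per page rather than one per fan-out directory.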
In Detail
The author built objgit by connecting go-git (a pure-Go git implementation) to Tigris object storage through billy, the filesystem abstraction layer go-git uses. The insight: git's on-disk format boils down to four things: blobs (file contents), trees (directories), commits (pointers to trees), and refs (branch pointers). The first three are stored as compressed, content-addressed objects that are immutable and append-only once written, which maps directly onto object storage's write-once, read-many model. The only mutable parts, refs, are tiny files that object storage handles easily.
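The reason git objects suit object storage is that an object's name is a hash of its content. For SHA-1 repositories, git hashes a header of the form `<type> <size>` followed by a NUL byte, then the raw content. The loose object is stored at objects/<first 2 hex>/<remaining 38 hex>. The sketch below computes both. The helper names are hypothetical, and it skips the zlib compression git applies to the stored bytes.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// objectID (hypothetical helper) computes a git SHA-1 object ID:
// SHA-1 over "<type> <size>\x00" followed by the uncompressed content.
func objectID(objType string, content []byte) string {
	h := sha1.New()
	fmt.Fprintf(h, "%s %d\x00", objType, len(content))
	h.Write(content)
	return hex.EncodeToString(h.Sum(nil))
}

// looseKey (hypothetical helper) maps an object ID to its loose-object key.
func looseKey(id string) string {
	return "objects/" + id[:2] + "/" + id[2:]
}

func main() {
	id := objectID("blob", []byte("hello world\n"))
	fmt.Println(id)
	fmt.Println(looseKey(id))
}
```

The same content always yields the same key, so writing an object twice produces identical bytes under the same name. This is why immutable objects need no locking or invalidation on a bucket.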
The implementation exposed how much latency filesystems hide. Git assumes a stat() call costs microseconds, not the 10ms+ of an object storage round trip. Cloning a simple 318-object repo made 8,500+ GetObject calls, because git performs thousands of small random reads into packfiles: fine with the OS page cache, catastrophic over HTTP. A 100,000-object push cost 200,000 storage calls (a stat to check whether each object existed, then a write). The SSH transport exploded packfiles into loose objects as a workaround for a deadlock, making pushes take over 30 minutes. The listing cache was completely broken: it matched recursive prefixes against the wrong root, burning thousands of background calls while caching nothing.
The fixes exploited git's own properties. Packfiles are immutable and content-addressed, so they can be downloaded to a temp folder once and cached with an LRU. Tigris supports atomic rename, which takes one round trip instead of copy-then-delete. Packfiles are also self-delimiting, so the transport can stop depending on EOF to know when a pack ends. The result is a stateless git server that runs in Kubernetes with only temporary caches: no mounted filesystem, no git binary, no database. It challenges the assumption that git servers must be stateful single points of failure, even at GitHub's scale.