Minions: Stripe's one-shot, end-to-end coding agents
Stripe's fully unattended coding agents merge 1,000+ PRs per week with zero human-written code by tightly integrating with their existing developer infrastructure rather than relying on generic off-the-shelf tools.
TLDR
• Minions go from Slack message to ready-to-merge PR with no human interaction, enabling engineers to parallelize tasks by spinning up multiple agents simultaneously
• Built custom because generic agents fail on Stripe's scale: hundreds of millions of lines of Ruby/Sorbet code with custom libraries moving $1T+ in payments
• Run in isolated devboxes (10-second spin-up) connected to 400+ MCP tools via internal "Toolshed" server for context gathering across docs, tickets, CI, and code intelligence
• "Shift feedback left" architecture: automated local lints in <5 seconds, then at most 2 selective CI runs with auto-applied test fixes to balance speed vs completeness
• Core philosophy: "if it's good for humans, it's good for LLMs"—agents use the same developer tooling and productivity infrastructure as human engineers
In Detail
Stripe built Minions because generic coding agents can't handle the complexity of their production codebase. While LLMs excel at greenfield projects, Stripe's hundreds of millions of lines of Ruby/Sorbet code, built on custom libraries and moving $1T+ in payments, require mental models that off-the-shelf tools can't develop within context window constraints. Their solution: build agents that integrate tightly with the same developer productivity infrastructure human engineers use.
The architecture mixes LLM creativity with deterministic guardrails. Minions run on forked goose agents in isolated devboxes that spin up in 10 seconds, connected to 400+ MCP tools via an internal "Toolshed" server for gathering context from docs, tickets, CI systems, and Sourcegraph code intelligence. The orchestration interleaves agent loops with required steps like linters and tests. Agent rules are conditionally applied by subdirectory rather than globally, since blanket rules don't scale. The system "shifts feedback left" by running automated local lints in under 5 seconds, then allowing at most two CI runs with selective test execution and auto-applied fixes—balancing speed against diminishing returns from multiple full CI loops.
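The subdirectory-scoped rules could be resolved roughly as follows. This is a minimal sketch, not Stripe's implementation: the `rules_by_dir` mapping, function name, and example rules are all illustrative assumptions. The idea is simply that a file only picks up rules from directories on its own path, so repo-wide rules stay small and team-specific rules don't pollute every prompt:

```python
from pathlib import PurePosixPath

def rules_for_path(file_path: str, rules_by_dir: dict[str, list[str]]) -> list[str]:
    """Collect agent rules that apply to one file, walking from the repo
    root down to the file's directory so more specific rules come last."""
    applicable = list(rules_by_dir.get(".", []))  # repo-wide rules, if any
    prefix = ""
    for part in PurePosixPath(file_path).parent.parts:
        prefix = f"{prefix}/{part}" if prefix else part
        applicable.extend(rules_by_dir.get(prefix, []))
    return applicable

# Hypothetical rules for illustration only:
rules = {
    ".": ["Use Sorbet signatures on all new methods"],
    "payments": ["Never log raw card numbers"],
    "payments/ledger": ["All amounts are integer cents"],
}
rules_for_path("payments/ledger/entry.rb", rules)  # picks up all three rules
```

A file under `api/` would see only the repo-wide rule, which is the scaling property the article attributes to conditional rules: the prompt grows with the file's path depth, not with the number of teams in the monorepo.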
Engineers invoke Minions from Slack, CLI, web UI, or directly from internal tools like docs platforms and feature flag systems. CI automatically creates tickets for flaky tests with one-click Minion fixes. The result: engineers parallelize work by spinning up multiple agents simultaneously, particularly useful during on-call rotations. While the North Star is zero human code, even incomplete runs provide excellent starting points for focused human work.