What I learned building an opinionated and minimal coding agent
A developer built their own coding agent (pi) with just 4 tools and a sub-1000-token system prompt, and it benchmarks competitively against commercial tools like Cursor and Windsurf, making the case that radical minimalism can match feature bloat.
My Notes (2)
"pi-ai was designed from the beginning to support aborts throughout the entire pipeline, including tool calls"
"My philosophy in all of this was: if I don't need it, it won't be built. And I don't need a lot of things."
Love this
TLDR
• Built pi with 4 tools (read/write/edit/bash) and <1000 token system prompt vs Claude Code's 10k+ tokens—benchmarks show minimal approach works just as well
• No security theater: runs in full YOLO mode because if an agent can write and execute code, safety rails are pointless anyway
• Rejected MCP servers (21 tools, 13.7k tokens) in favor of CLI tools with READMEs—agent only pays token cost when it needs the tool (progressive disclosure)
• Built unified LLM API (pi-ai) handling cross-provider quirks, context handoffs between models mid-session, and proper abort support throughout the pipeline
• Use tmux instead of background bash, markdown files instead of plan mode, bash-spawned instances instead of sub-agents—composable Unix tools beat custom abstractions
In Detail
The author built pi, a minimal coding agent harness, after watching Claude Code evolve from a simple tool into a "spaceship with 80% of functionality I have no use for." The core thesis: existing coding agents are over-engineered, and what matters most is context engineering and full observability—not feature count. pi proves this with Terminal-Bench 2.0 results that place it competitively against commercial tools despite using just 4 tools (read, write, edit, bash) and a system prompt under 1000 tokens.
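To make the minimalism concrete, here is a rough sketch of what a four-tool dispatch can look like. This is not pi's actual code; the function names, signatures, and dispatch shape are assumptions for illustration only.

```python
import subprocess

# Hypothetical sketch of pi's four-tool surface (read/write/edit/bash);
# the real implementation and signatures are not shown in the source.
def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def write_file(path: str, content: str) -> str:
    with open(path, "w") as f:
        f.write(content)
    return f"wrote {len(content)} bytes to {path}"

def edit_file(path: str, old: str, new: str) -> str:
    text = read_file(path)
    if old not in text:
        return "edit failed: old text not found"
    write_file(path, text.replace(old, new, 1))
    return "edit applied"

def run_bash(command: str) -> str:
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"read": read_file, "write": write_file, "edit": edit_file, "bash": run_bash}

def dispatch(name: str, **kwargs) -> str:
    # Everything the agent does flows through these four entries.
    return TOOLS[name](**kwargs)
```

The point of the sketch is how little surface area is needed: bash subsumes most specialized tools (search, git, test runners), so the tool list stays tiny and the system prompt stays short.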
The technical implementation spans three packages. pi-ai is a unified LLM API that handles the messy reality of multiple providers (Anthropic, OpenAI, Google, xAI, etc.)—each with different quirks around reasoning traces, token reporting, and tool calling. It supports context handoffs between providers mid-session, aborts throughout the pipeline (including tool calls), and structured tool results that split LLM content from UI display content. pi-tui implements a minimal terminal UI using differential rendering (only redraw changed lines) and synchronized output sequences to minimize flicker. pi-coding-agent wires it together with radical design decisions: full YOLO mode by default (no permission prompts), no MCP support (use CLI tools with READMEs instead), no background bash (use tmux), no built-in sub-agents (spawn via bash), and no plan mode (use markdown files).
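The differential-rendering idea in pi-tui can be sketched independently of any terminal library: keep the previously rendered frame, diff it line by line against the new frame, and redraw only the lines that changed. This is a simplified illustration, not pi-tui's actual API.

```python
def diff_lines(prev: list[str], next_: list[str]) -> list[tuple[int, str]]:
    """Return (row, new_text) pairs for lines that must be redrawn.

    Rows past the end of the new frame are cleared with an empty string.
    A real TUI would additionally emit cursor-positioning escape codes and
    wrap the writes in a synchronized-output sequence to avoid flicker.
    """
    updates = []
    rows = max(len(prev), len(next_))
    for row in range(rows):
        old = prev[row] if row < len(prev) else None
        new = next_[row] if row < len(next_) else ""
        if old != new:
            updates.append((row, new))
    return updates
```

With this shape, an agent streaming tokens into the last line of output touches one row per frame instead of repainting the whole screen.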
The philosophy challenges conventional wisdom about coding agents. Security measures in existing tools are "mostly security theater"—if an agent can write and execute code with network access, you're already playing whack-a-mole with attack vectors. MCP servers like Playwright dump 13.7k tokens into context on every session for tools you'll never use; CLI tools with READMEs let the agent pay token costs only when needed. Background bash adds complexity for process tracking; tmux provides better observability and lets you co-debug with the agent. Sub-agents mid-session for context gathering are an anti-pattern—do that work in a separate session first and create an artifact. The benchmark results validate this minimalism: pi with Claude Opus 4.5 scored 55.0% on Terminal-Bench 2.0, competitive with sophisticated commercial tools. Even more telling: Terminus 2, which just gives the model a raw tmux session with no fancy tools, ranks competitively on the leaderboard.
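The README-over-MCP argument is essentially lazy loading of tool documentation. A minimal sketch of the idea (the directory layout and function names are assumptions, not pi's implementation): the context carries only tool names up front, and a tool's README is loaded only when the agent actually reaches for that tool.

```python
from pathlib import Path

def list_tools(tool_dir: str) -> list[str]:
    """Cheap upfront context: just the tool names, a few tokens each."""
    return sorted(p.name for p in Path(tool_dir).iterdir() if p.is_dir())

def load_tool_docs(tool_dir: str, name: str) -> str:
    """Expensive context, paid only when the agent decides to use the tool."""
    return (Path(tool_dir) / name / "README.md").read_text()
```

Contrast this with an MCP server that injects every tool schema into every session: the 13.7k tokens cited for Playwright are paid whether or not the browser is ever touched, while the progressive-disclosure approach costs near zero until a tool is needed.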