A love letter to Pi | Lucas Meijer

Ex-Unity engineer shares hard-won lessons on actually using AI coding agents: stop chasing agent swarms, make your repo "agent-friendly," and put the evaluation burden on the agent itself.


• Think of your codebase as a Marble Madness level where the agent is the marble—your job is removing hazards like incomplete docs and build warnings that send it off track
• Ask yourself how you'll evaluate agent work BEFORE sending it off, then put that in the prompt—agents (and humans) perform better when they know the success criteria upfront
• Make agents create "evaluation packs" (videos, screenshots, HTML slide decks) so reviewing their work takes minutes instead of reading hour-long transcripts
• Aggressively manage context: use branching to prune dead-end side quests, never argue with agents (just rewind and rephrase), and stay under 50% of the context window, because intelligence drops beyond that
• Pi's hackability enables "Barbapapa software"—programs that morph themselves to fit your workflow while running, like writing extensions for itself mid-session

Lucas Meijer argues that most developers are approaching AI coding agents wrong—chasing "stage 9" complexity like agent swarms instead of solving actual problems. His core insight: treat your codebase like a Marble Madness level where the agent is the marble rolling down. Your job isn't to orchestrate complex agent systems, but to remove hazards that send agents off track—incomplete AGENTS.md files, build systems spewing ignored warnings, outdated documentation. After each agent session, read the full transcript to identify where it went wrong, then fix the repo to prevent future mistakes.
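As a rough illustration of what "removing hazards" could look like when automated, here is a minimal sketch that flags two of the hazards named above: a missing or empty AGENTS.md and a build log that contains warnings. The function name `find_hazards`, the plain-text build log, and the "any line containing the word warning" heuristic are all assumptions made for this example; they are not taken from the article or from Pi.

```python
from pathlib import Path


def find_hazards(repo_root: Path, build_log: str) -> list[str]:
    """Return human-readable hazards that could send an agent off track.

    Hypothetical helper: the checks below are illustrative, not exhaustive.
    """
    hazards = []

    # Hazard 1: the agent has no (or an empty) AGENTS.md to orient itself.
    agents_md = repo_root / "AGENTS.md"
    if not agents_md.is_file():
        hazards.append("AGENTS.md is missing")
    elif not agents_md.read_text(encoding="utf-8").strip():
        hazards.append("AGENTS.md is empty")

    # Hazard 2: the build emits warnings that humans have learned to ignore.
    # Assumption: a warning is any log line containing the word "warning".
    warning_lines = [
        line for line in build_log.splitlines() if "warning" in line.lower()
    ]
    if warning_lines:
        hazards.append(f"build log contains {len(warning_lines)} warning line(s)")

    return hazards
```

A check like this could run after each session alongside the transcript review, so that recurring hazards are fixed in the repo rather than rediscovered by the next agent.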

The biggest mental shift is thinking about evaluation before execution. Before sending an agent on a task, decide how you'll verify its work, then put that in the prompt. This tells the agent when it is done and forces it to validate its own output. Meijer's "evaluation packs" approach makes agents do the presentation work: record videos demonstrating features, generate screenshots for visual verification, and compile everything into HTML slide decks. With agents doing the work, the bottleneck shifts from agent execution time to your evaluation time, so making evaluation efficient is critical. It also prevents agents from "cheating": if they have to record a video of the website working, they'll catch JavaScript errors themselves.
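The two ideas in this paragraph, stating success criteria in the prompt and collecting evidence into an HTML deck, can be sketched roughly as follows. Everything here is hypothetical scaffolding for illustration: `build_prompt`, `Evidence`, `render_evaluation_pack`, and the file names are invented for this example and do not describe how Meijer or Pi actually produce evaluation packs.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Evidence:
    caption: str  # what the reviewer should see in this slide
    path: str     # relative path to a screenshot or video the agent produced


def build_prompt(task: str, success_criteria: list[str]) -> str:
    """Append the evaluation plan to the task so the agent knows when it is done."""
    lines = [task.strip(), "", "You are done only when all of the following hold:"]
    lines += [f"{i}. {c}" for i, c in enumerate(success_criteria, start=1)]
    lines += [
        "",
        "Before finishing, produce an evaluation pack (screenshots or a video) "
        "that demonstrates each item.",
    ]
    return "\n".join(lines)


def render_evaluation_pack(title: str, evidence: list[Evidence]) -> str:
    """Render one <section> per piece of evidence into a single HTML page."""
    slides = []
    for item in evidence:
        src = escape(item.path, quote=True)
        # Assumption: videos are identified by file extension only.
        if item.path.lower().endswith((".mp4", ".webm")):
            media = f'<video controls src="{src}"></video>'
        else:
            alt = escape(item.caption, quote=True)
            media = f'<img src="{src}" alt="{alt}">'
        slides.append(f"<section><h2>{escape(item.caption)}</h2>{media}</section>")
    body = "".join(slides)
    head = f'<head><meta charset="utf-8"><title>{escape(title)}</title></head>'
    return f"<!doctype html><html>{head}<body><h1>{escape(title)}</h1>{body}</body></html>"
```

For example, `build_prompt("Add a login page.", ["The page loads without JavaScript errors"])` yields a prompt whose numbered list doubles as the reviewer's checklist, and the rendered deck gives one slide per item to check.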

Context management is the other critical discipline. Most agent tools display the percentage of the context window in use; Meijer gets nervous above 50% because he finds that intelligence degrades past that point. He uses Pi's tree/branching feature to prune dead-end side quests (like a tangent about beef chili recipes) so he doesn't keep paying for them in every subsequent interaction. Never "argue" with agents or say "no, I meant this"; just rewind and rephrase.

The ultimate vision is "Barbapapa software" (named after a 1970s cartoon about shape-shifting characters): programs that morph themselves to fit users' needs while running. Pi demonstrates this by writing extensions for itself mid-session, such as a Doom overlay or a custom answer UI for handling interview questions.
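To make the context-management advice concrete, here is a small sketch of why rewinding and branching is cheaper than arguing. It models a conversation as a tree in which only the turns on the path from the root to the current turn count toward the context window. This is a toy model under stated assumptions: the `Turn` class, the token counts, and the 50% threshold check are invented for illustration and say nothing about how Pi implements its tree feature.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Turn:
    text: str
    tokens: int  # assumed token cost of this turn
    parent: Optional["Turn"] = None
    children: list["Turn"] = field(default_factory=list)

    def reply(self, text: str, tokens: int) -> "Turn":
        """Add a new turn as a child of this one and return it."""
        child = Turn(text, tokens, parent=self)
        self.children.append(child)
        return child


def context_tokens(turn: Turn) -> int:
    """Sum tokens on the path from the root to `turn`; sibling branches cost nothing."""
    total = 0
    node: Optional[Turn] = turn
    while node is not None:
        total += node.tokens
        node = node.parent
    return total


def over_budget(turn: Turn, window: int, limit: float = 0.5) -> bool:
    """True when the active path uses more than `limit` of the context window."""
    return context_tokens(turn) / window > limit


root = Turn("Fix the failing build", tokens=1000)
# A dead-end side quest: it stays in the tree but is not on the new path.
tangent = root.reply("Long tangent about beef chili recipes", tokens=4000)
# Rewind to the root and rephrase instead of arguing after the tangent.
retry = root.reply("Fix the failing build; the error is in the linker step", tokens=1200)
```

With an assumed 8,000-token window, the tangent branch is over the 50% line while the rephrased branch is well under it, because the tangent's tokens are never replayed.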