gbrain/docs/ethos/THIN_HARNESS_FAT_SKILLS.md at master · garrytan/gbrain

The 100x productivity gap in AI coding isn't about smarter models—it's about architecture: push intelligence into reusable markdown procedures (fat skills), execution into deterministic tools, and keep the orchestration layer (harness) razor-thin.

May 27, 2026 · ai ml

• Skill files work like method calls: they are markdown procedures that take parameters and encode judgment rather than rigid prompts. The same /investigate skill becomes a medical research analyst or a forensic investigator depending on its arguments.
• Fat harnesses kill performance: 40+ tool definitions and MCP round-trips of 2-5 seconds, versus thin CLI harnesses where a Playwright operation takes about 100ms (screenshot+assert in 200ms against 15 seconds through Chrome MCP, 75x faster).
• Resolvers route context automatically: when a developer changes a prompt, the resolver loads docs/EVALS.md first, so the eval suite runs and blocks bad changes without the developer ever knowing it existed.
• Latent and deterministic work stay separate: the LLM reads 50 documents and produces 1 page of structured judgment (latent), then deterministic code executes on it. Forcing an 800-person seating chart into latent space produces hallucinations.
• YC's 6,000-founder matching system runs nightly enrichments, makes cross-domain judgment calls ("Santos and Oram aren't competitors: different infra layers"), then rewrites its own matching rules based on NPS surveys.
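
The latent/deterministic split in the seating-chart bullet can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: `Judgment`, `latent_step`, and `seat` are hypothetical names, and the LLM call is replaced by a hard-coded stub. The point is only that the model returns a small structured judgment, while plain code places all 800 guests.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Judgment:
    """Hypothetical structured output of the latent step."""
    keep_together: list[list[str]]  # groups the model thinks should share a table


def latent_step(documents: list[str]) -> Judgment:
    # Stand-in for an LLM call that reads the documents and returns judgment.
    # Hard-coded here so the sketch runs without any model or network access.
    return Judgment(keep_together=[["g1", "g2"], ["g3", "g4", "g5"]])


def seat(guests: list[str], judgment: Judgment, table_size: int) -> list[list[str]]:
    """Deterministic step: seat judged groups first, then everyone else (first fit)."""
    if table_size < 1:
        raise ValueError("table_size must be at least 1")
    known = set(guests)
    tables: list[list[str]] = []
    seated: set[str] = set()

    def place(people: list[str]) -> None:
        # Drop unknown names, duplicates, and anyone already seated.
        fresh: list[str] = []
        for person in people:
            if person in known and person not in seated and person not in fresh:
                fresh.append(person)
        # Split oversized groups so no table ever exceeds its capacity.
        for start in range(0, len(fresh), table_size):
            chunk = fresh[start:start + table_size]
            for table in tables:
                if len(table) + len(chunk) <= table_size:
                    table.extend(chunk)
                    break
            else:
                tables.append(list(chunk))
            seated.update(chunk)

    for group in judgment.keep_together:
        place(group)
    for guest in guests:
        place([guest])
    return tables


if __name__ == "__main__":
    guests = [f"g{i}" for i in range(800)]
    chart = seat(guests, latent_step([]), table_size=10)
    print(len(chart), "tables")
```

Because `seat` is ordinary code, its guarantees (everyone seated once, no table over capacity) can be checked with assertions, which is not possible when the whole chart is generated in latent space.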

The leaked Claude Code source (512K lines, March 2026) revealed the secret: 100x productivity comes from the harness architecture, not from model intelligence. The bottleneck is whether the model understands your schema and gets the right context at the right time. Five concepts solve this: skill files (markdown procedures that take parameters like method calls), thin harnesses (a 200-line orchestration layer rather than 40+ bloated tool definitions), resolvers (routing tables that auto-load the right docs), latent vs deterministic separation (judgment in latent space, execution in code), and a three-layer architecture (fat skills on top, a thin harness in the middle, deterministic tools at the bottom).
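
A resolver, described above as a routing table that auto-loads the right docs, might look something like this minimal sketch. Only `docs/EVALS.md` comes from the text; the glob patterns, the `docs/SKILLS.md` entry, and the `resolve` function are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical routing table: glob for the touched file -> docs to load first.
# Only docs/EVALS.md is named in the text; the rest is illustrative.
ROUTES: list[tuple[str, list[str]]] = [
    ("prompts/*", ["docs/EVALS.md"]),
    ("skills/*", ["docs/SKILLS.md"]),
]


def resolve(changed_paths: list[str]) -> list[str]:
    """Return the docs to load into context, in route order, without duplicates."""
    docs: list[str] = []
    for path in changed_paths:
        for pattern, targets in ROUTES:
            if fnmatchcase(path, pattern):
                for doc in targets:
                    if doc not in docs:
                        docs.append(doc)
    return docs


if __name__ == "__main__":
    # A prompt edit pulls in the eval instructions without the developer asking.
    print(resolve(["prompts/match/system.md"]))
```

The design choice worth noting is that the table is data, not logic: adding a new route is a one-line change and the harness code that calls `resolve` stays thin.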

The anti-pattern is a fat harness with thin skills: MCP tools with 2-5 second round-trips, REST API wrappers for every endpoint, and god tools that eat half the context window. The cost is 3x the tokens, 3x the latency, and 3x the failure rate. Instead, build purpose-built CLI tools: a Playwright CLI does screenshot+assert in 200ms, while Chrome MCP takes 15 seconds for the same operation, which makes the CLI 75x faster. The key insight is that skill files are method calls. The /investigate skill takes TARGET, QUESTION, and DATASET parameters, and the same seven-step procedure becomes a medical research analyst or a forensic investigator depending on the arguments. This is software design with markdown as the programming language.
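
To make the "skill files are method calls" idea concrete, here is a small sketch that treats a markdown skill as a template with required parameters. The TARGET, QUESTION, and DATASET names come from the text; the skill body, its steps, and the `invoke` helper are placeholders invented for illustration, and the real seven-step procedure is not reproduced here.

```python
from string import Template

# Hypothetical skill body. The actual /investigate procedure has seven steps;
# the steps below are placeholders, not the author's text.
INVESTIGATE = Template("""\
# /investigate
Target: $TARGET
Question: $QUESTION
Dataset: $DATASET

1. Read everything in the dataset that mentions the target.
2. Note where sources disagree.
3. Answer the question and cite the evidence.
""")


def invoke(skill: Template, **params: str) -> str:
    """Bind parameters to a skill; a missing parameter raises KeyError."""
    return skill.substitute(**params)


if __name__ == "__main__":
    medical = invoke(
        INVESTIGATE,
        TARGET="a clinical trial",
        QUESTION="Do the reported outcomes match the protocol?",
        DATASET="trial filings",
    )
    forensic = invoke(
        INVESTIGATE,
        TARGET="a shell company",
        QUESTION="Who controls the accounts?",
        DATASET="bank records",
    )
    print(medical)
    print(forensic)
```

The procedure text is identical in both calls; only the arguments change, which is what lets one skill act as two different analysts.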

YC's 6,000-founder matching system shows all five concepts working together. A nightly /enrich-founder cron pulls GitHub stats, social signals, and advisor transcripts, then diarizes "SAYS vs ACTUALLY BUILDING" contradictions. The /match skill is invoked three ways (breakout clusters, serendipity lunch tables, real-time 1:1 pairs), each with different parameters but the same judgment process. After events, /improve reads NPS surveys, extracts patterns from the "OK" responses, and writes new matching rules back into the skill file, so the skill rewrites itself. The result: "OK" ratings dropped from 12% to 4% at the next event. When new models drop, every skill improves immediately because the judgment in the latent steps gets better while the deterministic steps stay reliable. The system compounds forever.
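
The /improve loop described above could be sketched along these lines. Everything here is an assumption for illustration: the survey record shape, the "## Learned rules" section name, and the `improve` and `ok_share` helpers are hypothetical, and `extract_rules` stands in for the latent step that would normally be a model call.

```python
from typing import Callable


def ok_share(ratings: list[str]) -> float:
    """Fraction of responses rated "OK" (0.0 for an empty survey)."""
    if not ratings:
        return 0.0
    return sum(1 for rating in ratings if rating == "OK") / len(ratings)


def improve(
    skill_text: str,
    surveys: list[dict[str, str]],
    extract_rules: Callable[[list[str]], list[str]],
) -> str:
    """Append rules learned from "OK" responses to the skill file text."""
    ok_comments = [s["comment"] for s in surveys if s["rating"] == "OK"]
    if not ok_comments:
        return skill_text
    # Latent step (hypothetical): turn lukewarm comments into candidate rules.
    # Rules already present in the skill are skipped so reruns are idempotent.
    new_rules = [rule for rule in extract_rules(ok_comments) if rule not in skill_text]
    if not new_rules:
        return skill_text
    header = "## Learned rules"
    if header not in skill_text:
        skill_text = skill_text.rstrip("\n") + "\n\n" + header + "\n"
    bullets = "\n".join(f"- {rule}" for rule in new_rules)
    return skill_text.rstrip("\n") + "\n" + bullets + "\n"


if __name__ == "__main__":
    skill = "# /match\nPair founders with complementary needs.\n"
    surveys = [
        {"rating": "Great", "comment": "Exactly who I needed to meet."},
        {"rating": "OK", "comment": "We already knew each other."},
    ]
    stub = lambda comments: ["Avoid pairing founders who already know each other"]
    print(improve(skill, surveys, stub))
```

Writing the rules back is deterministic file editing; only deciding what the rules should say is left to the model, which keeps the self-rewriting step auditable in a diff.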