
Prompt Engineering Best Practices with Claude

Anthropic's applied AI team live-codes a production prompt from scratch, transforming Claude's output from "skiing accident" to confident insurance claim analysis—revealing that prompt engineering is less about magic words and more about sequential reasoning and iterative debugging.

Summary

• Order matters critically: analyzing the accident form before the sketch (context before interpretation) dramatically improved accuracy—mimicking how humans would approach the task
• The 10-part prompt structure: task context → background data → detailed step-by-step instructions → examples → output formatting, with XML tags for parsing and structure
• Use extended thinking as a diagnostic tool—analyze Claude's reasoning transcript to understand where your prompt fails and what context to add
• Prefill responses to control output format (start with { for JSON or <final_verdict> for XML) and skip preamble when you just need structured data
• Prompt engineering is empirical science: build test cases for failure modes, iterate based on wrong outputs, and bake edge cases into your system prompt as examples
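The prefill technique from the bullets above can be sketched in a few lines. This is a minimal illustration, not the live API flow: the message payload mirrors the Anthropic Messages API shape, but the field values and the simulated continuation are hypothetical.

```python
import json

# Prefilling the assistant turn with "{" makes Claude continue the JSON
# object directly, skipping any conversational preamble.
prefill = "{"
messages = [
    {"role": "user", "content": "Assess fault for the attached accident report."},
    {"role": "assistant", "content": prefill},  # prefilled response start
]

# The API returns only the continuation, so the prefill must be re-attached
# before parsing. A simulated continuation stands in for a live call here:
continuation = '"fault": "vehicle_b", "confidence": "high"}'
verdict = json.loads(prefill + continuation)
print(verdict["fault"])  # vehicle_b
```

The same trick works for XML output: prefill `<final_verdict>` and Claude completes the element rather than narrating first.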

Anthropic's Hannah and Christian demonstrate building a production prompt for a Swedish car insurance company that needs to analyze accident report forms and hand-drawn sketches to determine fault. They start with a minimal prompt that hilariously fails—Claude thinks it's analyzing a skiing accident. Through iterative refinement, they show the systematic process of prompt engineering.

The core framework is a 10-part structure: (1) task context defining Claude's role, (2) tone guidance (stay factual, don't guess), (3) background data about the form structure (which never changes, making it perfect for prompt caching), (4) the dynamic content (images), (5) detailed step-by-step instructions, (6) examples of edge cases, (7) conversation history if relevant, (8) task reminders to prevent hallucinations, (9) output formatting requirements, and (10) prefilled responses. They emphasize XML tags for structure because Claude is fine-tuned on them and they enable clean parsing.
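The structure above can be sketched as a simple prompt assembler. This is an illustrative skeleton, not the team's actual prompt: the section texts, tag names, and form details are invented placeholders, and the dynamic parts (images, conversation history, prefill) live in the messages array rather than the system prompt.

```python
# Assemble the static parts of the 10-part structure into a system prompt.
# Numbers in comments map to the framework; all wording is hypothetical.
def build_system_prompt(form_description: str) -> str:
    parts = [
        "You are a claims analyst for a Swedish car insurance company.",      # 1. task context
        "Stay factual. If the evidence is ambiguous, say so; never guess.",   # 2. tone guidance
        f"<form_description>{form_description}</form_description>",           # 3. static background (cacheable)
        # 4. dynamic content (the accident images) goes in the user turn
        "<instructions>First analyze the form, then the sketch.</instructions>",  # 5. step-by-step instructions
        "<examples>Edge cases distilled from observed failures.</examples>",  # 6. examples
        # 7. conversation history, if relevant, also goes in the messages array
        "Base your verdict only on the documents provided.",                  # 8. task reminder
        "Answer inside <final_verdict> tags.",                                # 9. output formatting
        # 10. the prefilled response is an assistant turn, not system text
    ]
    return "\n\n".join(parts)

system_prompt = build_system_prompt(
    "The form has 17 numbered checkboxes per vehicle, one column each."
)
```

Keeping part 3 identical across requests is what makes it a good prompt-caching candidate: the cache key covers the unchanging prefix, so only the per-claim images are billed at full price.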

The breakthrough insight is about sequential reasoning: telling Claude to analyze the form first, then the sketch—not simultaneously. Without context from the checkboxes, the hand-drawn diagram is unintelligible. This mirrors human cognition and dramatically improved accuracy. They also show using extended thinking as a debugging tool: enable it, watch Claude's reasoning process, then distill those steps into explicit instructions in your system prompt for token efficiency. The iterative process is key—when Claude fails, add that failure mode as an example or constraint. By the end, they've transformed vague, uncertain outputs into structured XML verdicts ready for database insertion.
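The final step, pulling a structured verdict out of the response for database insertion, can be sketched as a tag extraction pass. The tag and field names below are illustrative (the article only confirms XML verdict output), and a hard-coded string stands in for a live model response.

```python
import re

# Simulated model output ending in the structured block the prompt requests;
# <final_verdict> and its child tags are hypothetical names.
response_text = """The form shows vehicle B turning across vehicle A's lane.
<final_verdict>
  <at_fault>vehicle_b</at_fault>
  <reasoning>Checkbox 12 (turning left) is marked for vehicle B.</reasoning>
</final_verdict>"""

# Pull out the verdict block, then the individual fields.
block = re.search(r"<final_verdict>(.*?)</final_verdict>", response_text, re.DOTALL)
verdict_xml = block.group(1)
at_fault = re.search(r"<at_fault>(.*?)</at_fault>", verdict_xml).group(1)
print(at_fault)  # vehicle_b, ready for database insertion
```

Because Claude is fine-tuned on XML tags, this kind of delimiter is both easy for the model to emit reliably and trivial to parse on the way out.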