GPT-5 prompting guide
OpenAI's official GPT-5 prompting guide reveals that the model's superior instruction-following makes it more sensitive to prompt contradictions than previous models, and it includes real-world tuning insights from Cursor showing how they achieved production-grade agentic coding performance.
TLDR
• GPT-5's "surgical precision" instruction-following means contradictory prompts hurt it more than they hurt other models: it wastes reasoning tokens trying to reconcile conflicts instead of picking one at random
• Cursor's production tuning: set verbosity=low globally but prompted high verbosity for code tools only; removed "maximize context" language that caused tool overuse; emphasized product features to reduce user confirmations
• Responses API with previous_response_id improved Tau-Bench scores from 73.9% to 78.2% by reusing reasoning context across tool calls
• Control agentic "eagerness" via reasoning_effort and explicit context-gathering criteria—model is thorough by default but can be tuned for speed
• New minimal reasoning mode is fastest option but needs more explicit planning prompts and tool preambles than higher reasoning levels
In Detail
OpenAI's comprehensive GPT-5 prompting guide emphasizes a counterintuitive insight: the model's superior instruction-following makes it more vulnerable to poorly constructed prompts than previous models. When faced with contradictory instructions, GPT-5 expends reasoning tokens attempting to reconcile the conflict rather than defaulting to one instruction, leading to degraded performance and increased latency. The guide illustrates this with a healthcare scheduling example in which subtle contradictions about patient consent and lookup procedures significantly impaired the model's reasoning efficiency.
The guide features extensive real-world insights from Cursor, an AI code editor that served as an alpha tester. Cursor's team discovered that setting the verbosity API parameter to low while prompting for high verbosity specifically in code tools achieved optimal balance—concise status updates with readable, well-commented code. They also found that removing "maximize context" language from their prompts prevented tool overuse on smaller tasks where GPT-5's internal knowledge sufficed. By emphasizing product-specific features like Undo/Reject capabilities, they reduced unnecessary user confirmations and enabled more autonomous operation.
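Cursor's two-level verbosity approach can be sketched as a request builder: the API-level verbosity parameter stays low for terse status updates, while the system prompt asks for high verbosity specifically when writing code. The parameter names follow the GPT-5 Responses API; the prompt wording and function name here are illustrative, not Cursor's exact configuration.

```python
# Cursor-style verbosity split (sketch): API parameter low, prompt-level
# override asking for verbose, readable code output.
SYSTEM_PROMPT = (
    "Write code for clarity first. Prefer readable, well-commented solutions "
    "with clear variable names. Use high verbosity when writing code or "
    "calling code tools; keep all other status updates concise."
)

def build_request(user_input: str) -> dict:
    """Assemble kwargs for a hypothetical client.responses.create(**...) call."""
    return {
        "model": "gpt-5",
        "instructions": SYSTEM_PROMPT,
        "input": user_input,
        "text": {"verbosity": "low"},  # global verbosity stays low
    }

request = build_request("Refactor the auth middleware.")
print(request["text"]["verbosity"])  # -> low
```

The asymmetry is deliberate: the API parameter sets the default everywhere, and the natural-language instruction carves out the one context (code) where verbosity should be high.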
The guide strongly recommends the Responses API for agentic workflows, demonstrating that including previous_response_id to reuse reasoning context improved Tau-Bench Retail scores from 73.9% to 78.2%. This allows the model to reference previous reasoning traces rather than reconstructing plans from scratch after each tool call. For controlling agentic behavior, the guide introduces the concept of "agentic eagerness": GPT-5 is thorough by default but can be calibrated via reasoning_effort settings and explicit context-gathering criteria. The new minimal reasoning mode offers the fastest performance while maintaining reasoning benefits, though it requires more explicit planning prompts and tool preambles than higher reasoning levels.

For coding tasks, the guide provides specific framework recommendations (Next.js, Tailwind, shadcn/ui) and demonstrates how self-reflection prompts, which ask the model to construct an excellence rubric and iterate against it, improve zero-to-one app generation quality.