21 Zero-Days in FFmpeg
An autonomous AI security agent found 21 zero-days in FFmpeg, including bugs that had sat latent for more than 20 years and that intensive audits by Google and Anthropic had missed. For $1k, it produced working RCE exploits rather than theoretical reports.
TLDR
• Depthfirst's specialized security agent found 21 zero-days in FFmpeg (1.5M lines of heavily fuzzed C code) that Google's Big Sleep and Anthropic's Mythos missed, at one-tenth of Anthropic's cost ($1k versus $10k)
• Unlike general coding agents, their system does threat modeling, traces data flow through attack surfaces, and validates exploitability—producing concrete PoC inputs that trigger crashes, not theoretical bug reports
• One bug: a heap buffer overflow in the AV1 RTP depacketizer where a "skip temporal delimiter" operation advances the write cursor without allocating memory, letting attackers corrupt a function pointer with a single 183-byte packet
• The exploit is network-reachable with zero user interaction: running ffmpeg -i rtsp://attacker/stream is enough. The malicious packet corrupts AVBuffer.free, and FFmpeg calls the corrupted pointer when it reallocates the packet buffer, which leads to RCE
• Several vulnerabilities were introduced 15 or more years ago (the oldest dates from 2003), showing that even mature, heavily audited codebases harbor exploitable bugs that AI agents can now systematically discover
In Detail
Depthfirst built an autonomous security agent that discovered 21 zero-day vulnerabilities in FFmpeg after recent intensive audits by Google's Big Sleep (which reported 13 bugs) and Anthropic's Mythos. The key insight: security agents need a fundamentally different architecture from coding agents. Instead of writing application code, they threat model the codebase, map the attack surfaces where untrusted input enters, trace data flow through vulnerable components, and validate exploitability with concrete proof-of-concept inputs. The validation step is what filters out the false positives and theoretical bugs that plague general-purpose AI security analysis. The system cost $1k to run versus Anthropic's $10k, and produced reproducible exploits rather than vague warnings.
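The validation step described above can be sketched as a simple gate: a candidate finding is reported only if its proof-of-concept input actually crashes the target. This is a minimal illustration, not Depthfirst's implementation. The names `Candidate`, `Crash`, `confirmed`, and `toy_harness` are hypothetical, and the toy harness stands in for what would presumably be an instrumented build of the real target.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Candidate:
    """A suspected bug plus the concrete input claimed to trigger it (hypothetical type)."""
    location: str
    poc: bytes


class Crash(Exception):
    """Stand-in for a memory-safety fault reported by a test harness."""


def confirmed(candidates: Iterable[Candidate],
              harness: Callable[[bytes], None]) -> list[Candidate]:
    """Keep only the candidates whose PoC input makes the harness crash."""
    kept = []
    for candidate in candidates:
        try:
            harness(candidate.poc)
        except Crash:
            kept.append(candidate)  # reproduced: worth reporting
    return kept


def toy_harness(data: bytes) -> None:
    """Toy target: the first byte declares a length that must fit in the remaining bytes."""
    if data and data[0] > len(data) - 1:
        raise Crash("declared length exceeds payload")


candidates = [
    Candidate("parser_a", b"\x02ab"),  # declared length 2, 2 bytes follow: no crash
    Candidate("parser_b", b"\x09ab"),  # declared length 9, 2 bytes follow: crash
]
reported = confirmed(candidates, toy_harness)
```

Here only the `parser_b` candidate survives the gate. The `parser_a` suspicion never reproduces, so it is dropped instead of being reported as a theoretical bug.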
The technical depth is demonstrated through one specific bug: a heap buffer overflow in FFmpeg's AV1 RTP depacketizer. When processing a Temporal Delimiter (TD) OBU, a frame marker that should be "ignored and removed", the code advances the output write cursor by the attacker-declared obu_size without allocating matching memory. Worse, it does not advance the input pointer past the TD, so the next loop iteration re-parses the TD's bytes as a fabricated OBU with attacker-controlled contents. With carefully tuned values (a TD obu_size of 148 and a fabricated OBU of 16 bytes), the write lands at offset 148 of an 81-byte buffer, well past its end, and a 16-byte write starting there spans the AVBuffer.free function pointer at offset 152. A single 183-byte RTP packet overwrites this pointer with 0xdeadbeef, and when FFmpeg reallocates the buffer during normal operation it calls the hijacked function pointer, achieving RCE with zero user interaction beyond opening an RTSP stream.
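The cursor arithmetic above can be reproduced in a small simulation. This is a sketch under stated assumptions, not FFmpeg's code. The one-byte type and one-byte size header is a toy format rather than the real AV1 RTP framing, the flat `bytearray` stands in for the heap, and the 8-byte pointer width is assumed. The offsets (81, 148, 152) and the 16-byte fabricated OBU come from the write-up. The toy packet is far smaller than the real 183-byte one, which this sketch does not attempt to reconstruct.

```python
import struct

BUF_SIZE = 81                 # size of the allocated packet buffer (from the write-up)
FREE_PTR_OFFSET = 152         # offset of AVBuffer.free relative to the buffer (from the write-up)
OBU_TEMPORAL_DELIMITER = 2    # AV1 OBU type value for a temporal delimiter
OBU_FRAME = 6                 # AV1 OBU type value for a frame


def depacketize(payload: bytes, heap: bytearray) -> int:
    """Toy depacketizer with the two cursor mistakes; returns the final write offset."""
    in_pos = 0   # read cursor into the attacker's payload
    out_pos = 0  # write cursor into the output buffer
    while in_pos + 2 <= len(payload):
        obu_type = payload[in_pos]
        obu_size = payload[in_pos + 1]
        in_pos += 2  # toy header consumed
        if obu_type == OBU_TEMPORAL_DELIMITER:
            # Mistake 1: the write cursor moves by the declared size, with no allocation.
            # Mistake 2: in_pos is NOT moved past the TD body, so the bytes that
            # follow are parsed as a new OBU on the next iteration.
            out_pos += obu_size
            continue
        body = payload[in_pos:in_pos + obu_size]
        heap[out_pos:out_pos + len(body)] = body  # no check against BUF_SIZE
        out_pos += len(body)
        in_pos += obu_size
    return out_pos


def simulate() -> tuple[int, int]:
    """Run the toy packet and return (final write offset, value in the pointer slot)."""
    heap = bytearray(192)  # toy heap: 81-byte buffer followed by adjacent data
    struct.pack_into("<Q", heap, FREE_PTR_OFFSET, 0x1111111111111111)  # placeholder pointer
    # Fabricated 16-byte OBU: 4 filler bytes cover offsets 148..151, then the pointer value.
    fabricated = (bytes([OBU_FRAME, 16]) + b"AAAA"
                  + struct.pack("<Q", 0xDEADBEEF) + b"BBBB")
    packet = bytes([OBU_TEMPORAL_DELIMITER, 148]) + fabricated
    end = depacketize(packet, heap)
    return end, struct.unpack_from("<Q", heap, FREE_PTR_OFFSET)[0]
```

In this model, `simulate()` returns a final write offset of 164, past the 81-byte buffer, and the pointer slot holds 0xdeadbeef. The four filler bytes are what align the attacker's value with offset 152, which is why the TD size and the fabricated OBU length have to be tuned together.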
The findings span components introduced between 2003 and 2025, with eight CVEs already assigned. The oldest bug (CVE-2026-39214) sat in the SDT implementation for 23 years. This demonstrates that even heavily fuzzed, production-critical codebases harbor exploitable vulnerabilities that specialized AI agents can now systematically discover at scale. The shift from theoretical security analysis to concrete exploit generation represents a fundamental change in how AI can be applied to security research: moving from "this might be a problem" to "here's the exact input that gives you PC control."