Nicholas Carlini - Black-hat LLMs | [un]prompted 2026
Current LLMs can autonomously find zero-day vulnerabilities in decades-old production software like the Linux kernel using trivial prompts—no fancy scaffolding required—and we're on an exponential capability curve with a 4-month doubling time.
TLDR
• LLMs now find bugs in battle-tested software that humans missed for 20+ years (Linux kernel heap overflow from 2003, Ghost CMS's first-ever critical CVE) using just "you're in a CTF, find vulnerabilities"
• Capability doubling every 4 months: models from 6 months ago couldn't find these bugs, current models can, and the exponential shows no signs of stopping
• Carlini has hundreds of unvalidated Linux kernel crashes he can't report yet—soon any malicious actor will have this capability, not just researchers
• The transitional period between "now" and "all software is formally verified" is the window of maximum danger, and we're in it; the relevant timescale is months, not years
• Security's 20-year attacker/defender balance is breaking: this is the most significant security development since the internet itself
In Detail
Carlini demonstrates that state-of-the-art LLMs have crossed a critical threshold in autonomous vulnerability research. Using minimal scaffolding (Claude in a VM with "you're playing in a CTF, find vulnerabilities"), models now find zero-days in extensively tested production software: a blind SQL injection in Ghost CMS (50K GitHub stars, never had a critical CVE), and heap buffer overflows in the Linux kernel that predate git. The NFS vulnerability from 2003 required understanding two cooperating adversaries across multiple network packets—not something you'd find with fuzzing. The model even generated the exploit flow diagram Carlini presented.
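The Ghost CMS finding was a blind SQL injection: the attacker never sees query results directly, but extracts data by injecting conditions and observing a true/false signal. A minimal, self-contained sketch of the technique against a toy sqlite3 database (illustrative only; the schema, the `page_exists` oracle, and the payloads are invented here and bear no relation to Ghost's actual code or the vulnerability the model found):

```python
import sqlite3

# Toy in-memory database standing in for an application backend.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('admin', 's3cret')")

def page_exists(user_input: str) -> bool:
    """Vulnerable endpoint: user input is concatenated straight into SQL."""
    query = f"SELECT 1 FROM users WHERE name = '{user_input}'"
    return db.execute(query).fetchone() is not None

def extract_password(max_len: int = 16) -> str:
    """Recover the password one character at a time via the boolean oracle."""
    recovered = ""
    for _ in range(max_len):
        for c in "abcdefghijklmnopqrstuvwxyz0123456789":
            # The injected AND clause is true only if the next character matches,
            # so the page's yes/no response leaks one character per guess.
            probe = f"admin' AND substr(password, {len(recovered) + 1}, 1) = '{c}"
            if page_exists(probe):
                recovered += c
                break
        else:
            break  # no candidate matched: end of password reached
    return recovered

print(extract_password())  # → s3cret
```

The point of the sketch is how little feedback the attacker needs: a single boolean per request is enough to reconstruct secrets, which is why blind injections survive in codebases that look well-tested.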
The exponential is undeniable and accelerating. Models released 6 months ago (Sonnet 3.5, Opus 3) can't find these bugs. Current models can perform 15-hour human tasks at 50% success rate, with capability doubling every 4 months. Smart contract exploit value recovery shows the same exponential on a log scale. Carlini, a former skeptic who spent years breaking early LLMs, now admits these models are better vulnerability researchers than he is—and they'll likely surpass most security professionals within a year if trends continue.
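The 4-month doubling claim compounds quickly, which is why models from only six months back miss bugs current models find. A two-line calculation, assuming the clean exponential the talk describes:

```python
DOUBLING_MONTHS = 4  # doubling time claimed in the talk

def capability_multiplier(months: float) -> float:
    """Relative capability after `months`, under a fixed doubling time."""
    return 2 ** (months / DOUBLING_MONTHS)

print(capability_multiplier(6))   # ~2.8x over the 6-month-old models
print(capability_multiplier(12))  # → 8.0, an 8x gap after one year
```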
The critical insight: we're in the dangerous transitional period between "LLMs can find bugs" and "all software is formally verified and memory-safe." Carlini has hundreds of unvalidated Linux kernel crashes sitting unreported because he can't verify them fast enough to avoid sending slop to maintainers. Soon it won't just be researchers with this capability; any malicious actor will have it. The dual-use problem is acute: strong safeguards block legitimate defenders while bad actors jailbreak anyway. The historical parallel isn't AI hype but cryptographers building post-quantum crypto before quantum computers exist. Security people need to mobilize now, on a timescale of months rather than years, because the 20-year attacker/defender balance is fundamentally breaking.