
Nicholas Carlini - Black-hat LLMs | [un]prompted 2026

Anthropic researcher demonstrates that current LLMs autonomously find zero-day vulnerabilities in decades-old production software like the Linux kernel—without fancy scaffolding—fundamentally breaking the 20-year attacker/defender balance and creating an urgent transitional crisis.


• LLMs now find kernel 0-days with a simple prompt ("you're in a CTF, find bugs")—no complex fuzzing needed, just base model capability anyone can access
• Found the first critical CVE in Ghost CMS (20-year-old project, 50k stars) and a heap overflow in the Linux kernel's NFS v4 server dating to 2003—the latter required multi-client state coordination that humans and fuzzers missed for decades
• Capability doubling every 4 months: models from 6 months ago couldn't do this, current models can, next year's models will be better than all of us
• Carlini has hundreds of unvalidated Linux crashes he can't report fast enough—this isn't a future problem, it's happening now
• The transitional period before defenders catch up (formal verification, memory-safe rewrites) is the danger zone we're in right now

Carlini's core demonstration is disarmingly simple: run Claude in a VM with "you're playing in a CTF, find vulnerabilities in this codebase" and walk away. No sophisticated scaffolding, no fuzzing harness—just the base model capability. This matters because it is what any malicious actor can do right now. The results are sobering: the first-ever critical CVE in Ghost CMS (a 20-year-old project with 50,000 GitHub stars), including an autonomous exploit for blind SQL injection that extracts admin credentials, and a heap buffer overflow in the Linux kernel's NFS v4 daemon that predates git itself—introduced in 2003 and requiring an understanding of multi-client state coordination that no fuzzer would ever find.
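The setup described above amounts to little more than a loop. A minimal sketch follows; the prompt wording is paraphrased from the article, and `query_model` is a hypothetical stand-in for whatever LLM API call is used—this is an illustration of the shape of the harness, not Carlini's actual code:

```python
import subprocess

# Paraphrase of the prompt described in the talk; exact wording is assumed.
CTF_PROMPT = (
    "You are playing in a CTF. Find vulnerabilities in the codebase "
    "mounted at /src. Reply with exactly one shell command per turn, "
    "or DONE when finished."
)

def run_agent(query_model, max_turns=20):
    """Minimal agent loop: send the transcript to the model, run the
    shell command it returns inside the (sandboxed) VM, append the
    output, repeat. `query_model` is an assumed str -> str callable
    wrapping an LLM API."""
    transcript = [CTF_PROMPT]
    for _ in range(max_turns):
        command = query_model("\n".join(transcript)).strip()
        if command == "DONE":
            break
        # Executes model-chosen commands: only safe inside a disposable VM.
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=120
        )
        transcript.append(f"$ {command}\n{result.stdout}{result.stderr}")
    return transcript
```

The point of the demonstration is that nothing more elaborate than this, pointed at a VM snapshot of the target, appears to be needed.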

The exponential is undeniable and accelerating. Models released six months ago (Sonnet 3.5, Opus 3) can't find these bugs; current models can. METR's research shows capability doubling every 4 months as measured by task duration. Smart-contract research shows models can now exploit vulnerabilities worth millions of dollars, also on an exponential curve. Carlini emphasizes that the point isn't where we are—it's the rate of change. The best models can do this today; average models on your laptop will do it in a year. Security people are treating this like a future problem while he literally has hundreds of unvalidated Linux kernel crashes sitting in a queue, because he can't validate them fast enough to responsibly disclose.
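The doubling claim is ordinary compound growth, worth a quick sanity check. The 4-month figure is the talk's; the function itself is just arithmetic, not from the source:

```python
def capability_multiplier(months, doubling_period_months=4):
    """Growth factor implied by a fixed doubling period."""
    return 2 ** (months / doubling_period_months)

# Doubling every 4 months compounds to 8x over a year.
print(capability_multiplier(12))  # → 8.0
```

Under this curve, the gap between "best lab model" and "average laptop model" closes in roughly one such doubling cycle or two—which is why the timeline is measured in months.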

The fundamental challenge is the transitional period. Long term, defenders probably win through memory-safe languages and formal verification. But the gap between "anyone can autonomously find kernel 0-days" and "we've hardened critical infrastructure" is extremely dangerous, and we're in it now. The dual-use dilemma compounds this: weak safeguards stop only legitimate researchers (malicious actors jailbreak), while strong safeguards block defenders who need these same tools. Carlini's call to action is urgent and measured in months, not years: help make the transition go well, whether at Anthropic, DeepMind, OpenAI, or elsewhere. The next 6-12 months will determine whether we navigate this safely.