There is a particular kind of respect reserved for bugs that survive decades. Not because they are admirable, but because they quietly demonstrate how complex systems can outpace human attention. A recently discussed Linux kernel vulnerability—introduced in 2003 and unnoticed for 23 years—falls squarely into that category. What makes this case notable is not just the bug itself, but how it was found: by an AI model with minimal guidance.
Nicholas Carlini, speaking at the [un]prompted AI security conference, described using Claude Code to scan the Linux kernel almost naively. The setup was disarmingly simple: iterate over source files, ask the model to “find a vulnerability,” and collect the results. No elaborate harness, no handcrafted heuristics—just persistence and a prompt framed like a CTF challenge.
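That loop can be sketched in a few lines of Python. To be clear, this is a hypothetical reconstruction, not Carlini's actual script: it assumes the `claude` command-line tool with its `-p` (non-interactive print) flag, and the prompt wording here is invented.

```python
import subprocess
from pathlib import Path

# Hypothetical prompt, framed like a CTF challenge as described in the talk.
PROMPT = "You are playing a CTF. Find a vulnerability in this file."

def build_command(path: Path) -> list[str]:
    """Build one claude CLI invocation (-p runs a single prompt and prints the reply)."""
    return ["claude", "-p", f"{PROMPT}\n\n{path.read_text(errors='replace')}"]

def scan(root: str) -> None:
    """Iterate over C source files and collect the model's findings."""
    for path in Path(root).rglob("*.c"):
        result = subprocess.run(build_command(path), capture_output=True, text=True)
        print(f"=== {path} ===\n{result.stdout}")
```

The point is how little scaffolding is involved: no parsing of kernel internals, no static-analysis pass, just a file walk and a prompt.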
The outcome was not subtle. Among the findings were remotely exploitable heap buffer overflows—precisely the class of bugs that tend to evade even experienced security researchers. Carlini’s own reaction captures the shift: he had never found such bugs manually, yet now had multiple candidates surfaced by the model.
One vulnerability in particular illustrates why this matters.
The issue lives in the NFS implementation. Exploiting it requires coordination between two clients interacting with a server, not just malformed input. The first client establishes a lock with an unusually large owner identifier—1024 bytes, which is valid but uncommon. The second client then attempts to acquire the same lock, triggering a denial response from the server.
That denial message is where things break.
The kernel allocates a buffer of 112 bytes to encode the response. But the actual response includes the original lock owner identifier—up to 1024 bytes—along with additional metadata. The total size reaches 1056 bytes. The kernel proceeds to write all of it into the undersized buffer.
That mismatch is enough. An attacker can overwrite adjacent kernel memory with controlled data originating from the oversized owner field. From there, the usual consequences of memory corruption apply.
What stands out is not just the arithmetic error, but the context required to uncover it. This is not a simple bounds-check oversight in isolation; it depends on understanding protocol flow, state interactions between multiple clients, and the structure of serialized responses. The model effectively reasoned across these layers.
Carlini shared that Claude Code even generated protocol diagrams as part of its analysis, helping articulate the attack path. That detail hints at something deeper: the model is not merely pattern-matching known bug signatures, but constructing explanations that resemble how a human might reason through a system.
The historical detail adds weight. The offending code predates git, introduced via a patch that assumed a fixed buffer size sufficient for certain operations, with plans to extend functionality later. That extension—supporting locking—quietly invalidated the assumption. The buffer remained unchanged.
For more than two decades, the broken assumption simply escaped scrutiny.
The practical bottleneck now is not discovery but validation. Carlini reported hundreds of potential issues identified by the model, many unreported simply because confirming them requires human effort. The asymmetry is striking: machines can generate hypotheses at scale, but humans must still filter signal from noise.
That imbalance suggests a near-term future where vulnerability discovery accelerates faster than remediation. Security researchers will not be the only ones with access to these tools.
If you want the account in detail, including the exact scripting approach and diagrams, the full write-up is worth exploring:
https://mtlynch.io/claude-code-found-linux-vulnerability/
There is no need for dramatic language here. The implications are already clear enough. Systems as mature and widely scrutinized as the Linux kernel still contain deep, non-obvious flaws. What has changed is the cost of finding them—and that cost is dropping quickly.
