
A $1,000 AI Agent Just Found a 23-Year-Old RCE in FFmpeg. The Bill Goes to the Volunteers.

Sumeet Zankar

AI Solutions Specialist & Full-Stack Developer

A single 183-byte RTSP packet. One ordinary command. Twenty-one zero-days for a thousand dollars in compute. The economics of finding critical vulnerabilities just collapsed by an order of magnitude — and the economics of fixing them didn't move at all.

One Packet, One Command, Full Control

A single 183-byte RTSP packet. One command — ffmpeg -i rtsp://attacker/stream — the most ordinary thing a media pipeline does. That's enough to hand an attacker program counter control on a machine running the most widely deployed media library on Earth.

That bug had been sitting in FFmpeg's AV1 RTP depacketizer since 2024, and nobody had found it. The codebase around it has absorbed two decades of fuzzing and manual audits, plus recent passes from Google's Big Sleep and Anthropic's Mythos, and all of them missed it. Then a security startup called depthfirst pointed an autonomous agent at the codebase, spent about a thousand dollars in compute, and walked out with 21 zero-days. Nine got CVEs. The oldest had been latent since 2003.

This story is being told as an AI-finds-bugs story. It isn't. It's a funding story, and the timing is brutal.

How We Got Here

In late 2025, Google's Big Sleep started firehosing FFmpeg with AI-discovered vulnerabilities — including, memorably, a bug in the LucasArts Smush codec from a 1995 video game nobody has shipped in three decades. FFmpeg's maintainers, who are volunteers, started calling it "CVE slop" in public. The argument was simple: if a trillion-dollar company can afford an AI agent to mine bugs in our code, it can afford to send patches or cut a check.

Google's response in April 2026 was to revise its bug bounty program to handle AI submissions more cleanly — concise repros, structured reports — and cap patches at three per month. FFmpeg's response to that was, essentially, a laugh. Then Anthropic's Mythos preview ran the same play and found a 16-year-old H.264 bug. Then, this week, depthfirst published 21 Zero-Days in FFmpeg and the calculus changed.

Twenty-one zero-days. One thousand dollars. About a tenth of what Anthropic reportedly spent on Mythos.

Some of the findings are jaw-dropping:

  • CVE-2026-39214 — a stack buffer overflow in the SDT implementation, introduced in 2003, latent for 23 years.
  • DFVULN-122 — an MPEG4-AAC RTP depacketizer bug from 2005, also over two decades old.
  • CVE-2026-39217 — a heap overflow in the VP9 decoder, a March 2025 regression.
  • CVE-2026-39210 — heap overflow in the TS demuxer, missing length bounds checks since 2010.

This isn't a model finding low-hanging fruit. This is a 1.5-million-line, heavily optimized C codebase that has absorbed two decades of relentless fuzzing, and the agent is finding things humans missed for nearly the entire lifetime of the project.

The AV1 Bug, Because the Details Matter

The headline finding is worth understanding, because it shows what an agent can now do without a human in the loop.

FFmpeg's AV1 RTP depacketizer stitches incoming RTP packets into a clean AV1 bitstream. AV1 uses OBUs (Open Bitstream Units). One OBU type — the Temporal Delimiter — is just a frame separator, and the spec says to "ignore and remove" it.

Here's how that "ignore" is implemented in libavformat/rtpdec_av1.c:

pktpos = pkt->size;
// ...
if ((obu_type == AV1_OBU_TEMPORAL_DELIMITER) ||
    (obu_type == AV1_OBU_TILE_LIST)) {
    pktpos += obu_size;       // advance the output cursor...
    rem_pkt_size -= obu_size; // ...and the input counter
    obu_cnt++;
    continue;                 // ...but never allocate, never advance buf_ptr
}

Two problems fall out of that single continue.

First, the write cursor pktpos jumps forward by an attacker-controlled obu_size, but pkt->data is never grown to match. The invariant the whole routine depends on — that pktpos never runs ahead of the allocation — quietly dies.

Second, buf_ptr isn't advanced, so the next loop iteration re-parses the Temporal Delimiter's own bytes as a fresh OBU. The attacker now controls both the offset of the write and the contents.

On the next iteration the loop grows the packet by a tiny amount and then writes at pkt->data[pktpos]. With the right numbers, the writes start 67 bytes past the end of an 81-byte allocation. And because of how FFmpeg's allocator lays things out — av_buffer_alloc allocates the data buffer, then an AVBuffer struct, then an AVBufferRef, all 64-byte aligned via posix_memalign — what sits immediately after the overflowed buffer is an AVBuffer struct containing a function pointer.

Controlled offset. Controlled contents. Function pointer at the destination. That is about as clean a memory corruption primitive as exists.

The agent didn't just flag it. It produced a reproducible PoC and a working exploit primitive walkthrough.
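The broken invariant is easier to see in a toy model. What follows is a minimal sketch, not FFmpeg code: the ToyPacket and ToyObu types, both function names, and the sizes used are hypothetical, and the "fixed" variant is only one plausible repair, not the patch FFmpeg shipped. No memory is actually written; the model only tracks the allocation size and the write cursor and reports whether a write would land out of bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bookkeeping for an output packet: sizes only, no real buffer. */
typedef struct {
    size_t alloc_size; /* bytes allocated for the output packet */
    size_t pktpos;     /* output write cursor */
} ToyPacket;

/* Hypothetical OBU descriptor: a type tag and an attacker-declared size. */
typedef struct {
    int    type;
    size_t size;
} ToyObu;

enum { TOY_OBU_FRAME = 0, TOY_OBU_TEMPORAL_DELIMITER = 1 };

/* Overflow-safe check: does [pktpos, pktpos + n) fit inside the allocation? */
static bool write_fits(const ToyPacket *pkt, size_t n)
{
    return pkt->pktpos <= pkt->alloc_size &&
           n <= pkt->alloc_size - pkt->pktpos;
}

/* Mirrors the pattern in the excerpt: a skipped OBU still moves the cursor. */
static bool process_buggy(ToyPacket *pkt, const ToyObu *obus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (obus[i].type == TOY_OBU_TEMPORAL_DELIMITER) {
            pkt->pktpos += obus[i].size; /* cursor moves, allocation does not */
            continue;
        }
        pkt->alloc_size += obus[i].size; /* grow by the payload size only */
        if (!write_fits(pkt, obus[i].size))
            return false;                /* a real write here would be out of bounds */
        pkt->pktpos += obus[i].size;
    }
    return true;
}

/* One plausible repair (an assumption): skipping consumes input, never output. */
static bool process_fixed(ToyPacket *pkt, const ToyObu *obus, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (obus[i].type == TOY_OBU_TEMPORAL_DELIMITER)
            continue;                    /* output cursor is left alone */
        pkt->alloc_size += obus[i].size;
        if (!write_fits(pkt, obus[i].size))
            return false;
        pkt->pktpos += obus[i].size;
    }
    assert(pkt->pktpos <= pkt->alloc_size); /* the invariant still holds */
    return true;
}
```

Feeding both functions a Temporal Delimiter with a declared size of 67 followed by a 14-byte frame makes process_buggy report an out-of-bounds write, while process_fixed finishes with the cursor and the allocation both at 14. The explicit write_fits check is the kind of guard that turns a silent corruption into a rejected packet.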

What's Actually New Here

The interesting thing isn't that AI can find bugs. Fuzzers have done that for years. The interesting thing is the shape of the work.

A coding agent and a security agent share the same models, but their objectives are inverted. A coding agent writes plausible code; a security agent threat-models a codebase, identifies attacker-controlled entry points, traces data flow through those paths, and validates whether a sink is actually reachable. depthfirst's writeup is explicit that their agent generates harnesses, executes them, and confirms findings with concrete inputs. The output isn't a vague warning — it's a crashing input.

That last part is the unlock. The reason fuzzers have been viable for years is that they produce reproducible crashes; an AI that produces theoretical warnings is just a more eloquent linter. An AI that produces reproducible crashes at roughly $50 per finding ($1,000 spread across 21 zero-days, or a little over $100 per assigned CVE) is something else entirely.
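For readers who haven't written one, here is roughly what "a harness" means. depthfirst's actual harnesses aren't shown in their writeup, so this is a minimal sketch under assumptions: toy_parse_records is a hypothetical stand-in for a real parser, and the only real interface used is libFuzzer's LLVMFuzzerTestOneInput entry point. The harness feeds arbitrary bytes to the target and asserts a property that must always hold, so any violation becomes a concrete, replayable input.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical target: counts length-prefixed records ([len][len bytes]...).
 * Returns the record count, or -1 if a record claims more bytes than remain. */
int toy_parse_records(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    int count = 0;

    while (pos < len) {
        size_t rec_len = buf[pos++];  /* one-byte length prefix */
        if (rec_len > len - pos)      /* declared length exceeds remaining input */
            return -1;
        pos += rec_len;
        count++;
    }
    return count;
}

/* libFuzzer entry point: called repeatedly with generated inputs. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    int n = toy_parse_records(data, size);

    /* Every record consumes at least one byte, so the count can never
     * exceed the input size. A violation aborts and saves the input. */
    assert(n < 0 || (size_t)n <= size);
    return 0;
}
```

With clang, something like `clang -g -fsanitize=fuzzer,address harness.c` builds a fuzzing binary from this file; AddressSanitizer is what turns an out-of-bounds access inside the target into an immediate, reportable crash.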

For maintainers of widely-deployed C, this is the new baseline. Every codebase will be scanned. The economics make it inevitable.

The Bill

Here's the part nobody at the labs wants to talk about.

The cost of finding a 20-year-old RCE in critical infrastructure used to be measured in elite-researcher quarters. It's now measured in API credits. The marginal cost of the next CVE in FFmpeg is approaching the cost of a nice dinner.

The cost of fixing one hasn't changed at all. A maintainer still has to read the report, confirm the trace, write a patch that doesn't regress the format support of half the planet, get it reviewed, cut a release, coordinate disclosure, and answer the inevitable downstream questions for the next eighteen months.

FFmpeg is maintained by volunteers. The disclosure pipeline that depthfirst, Big Sleep, and Mythos are pointing at terminates in a handful of people with day jobs. Multiply that across cURL, OpenSSL, libxml2, sqlite, every parser anyone depends on, and you can see the shape of the next decade: trillion-dollar companies running cheap agents against unpaid maintainers and shipping the resulting reports as security wins on their quarterly slides.

The FFmpeg maintainers' response — fund us or stop sending bugs — is being framed as grumpy open-source culture. It isn't. It's the only sustainable answer. If your business model depends on a piece of software that processes hostile input on every browser, every CDN, every streaming platform on Earth, and you have invented a $1,000 way to discover that it has 21 zero-days, the bill for fixing them is yours.

What To Do About It, If You Ship Software

A few things are real now in a way they weren't six months ago.

  • Anything that parses untrusted input in C is in scope, today. Not theoretically. Today. If you depend on a media library, an image decoder, a protocol parser — assume an agent has already looked.
  • Pin and patch faster. The window between disclosure and exploitation collapses when reproducible PoCs ship with the writeup. depthfirst published an AV1 exploit primitive walkthrough. That's not a CVE — that's a starter kit.
  • Sponsor your dependencies. This is now an engineering risk question, not a values question. If a project you ship in production is maintained by three volunteers, the failure mode isn't "they get burned out and stop." The failure mode is "they get buried in AI-generated reports and your CVE disclosure lands on a maintainer who hasn't slept."
  • For your own code, run the same agents on yourself. depthfirst, Big Sleep, and Mythos aren't the only players. The asymmetry between "attacker has an agent" and "defender doesn't" is the new shadow IT.

The Story Underneath the Headline

The headline of this story is going to be "AI finds 21 zero-days." The actual story is that the economics of finding critical bugs just collapsed by an order of magnitude, and the economics of fixing them didn't move. Until that gap closes — through funding, paid maintainers, or shared responsibility from the labs running the agents — every disclosure cycle is going to feel a little more like a denial-of-service attack on the people we depend on.

A $1,000 agent found a 23-year-old RCE.

Send the bill to whoever shipped the agent.
