Headline: Researchers Reproduce Anthropic’s “Mythos” Findings Using Public Models — A Wake-Up Call for Crypto Infrastructure
When Anthropic quietly released its Mythos findings earlier this month, the company insisted the vulnerabilities its internal model exposed were too sensitive for open release. The report triggered high-level alarm — reportedly prompting an emergency meeting with Treasury Secretary Scott Bessent and Fed Chair Jerome Powell and reviving fears of a “vulnpocalypse” in security circles. Now a separate team says the story is less about a single lab’s secret weapon and more about a shift in the economics of vulnerability discovery.
Vidoc Security took Anthropic’s publicly shared, patched examples and tried to reproduce them using off-the-shelf models — GPT-5.4 and Claude Opus 4.6 — inside an open-source coding agent called opencode. They did it without Anthropic’s private stack, without special invites, and without private APIs. “We replicated Mythos findings in opencode using public models, not Anthropic's private stack,” researcher Dawid Moczadło wrote on X after publishing the results. His takeaway: the hard part of finding vulnerability signals is getting cheaper; the harder, trusted validation work remains the bottleneck.
What the researchers targeted
Vidoc focused on the same five cases Anthropic highlighted: a server file-sharing protocol, the networking stack of a security-oriented OS, video-processing software embedded in media platforms (FFmpeg), and two cryptographic libraries used for digital identity verification (including wolfSSL). These components matter to the crypto world: cryptographic libraries underpin signatures, key handling and many blockchain-related tools, and networking/file-handling bugs can affect nodes, clients and infrastructure.
What they did and how
Rather than a one-shot prompt, Vidoc mirrored Anthropic’s workflow: feed a codebase to a planning agent that chunks files and assigns detection agents to each chunk; parallelize attempts; and filter for real signals. The planning step produced the exact line ranges each detection agent analyzed — the team emphasized they did not manually pick those ranges. They used open tooling to build the same architecture Anthropic described publicly.
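The workflow above can be sketched as a simple chunk-and-dispatch pipeline. This is an illustrative reconstruction, not Vidoc's actual code: all names (`plan_chunks`, `DetectionTask`, `run_detection`) are hypothetical, and the model calls that the real detection agents would make are stubbed out.

```python
# Hypothetical sketch of the planning/detection pipeline described above.
# In the real workflow, an LLM planner chose the line ranges and LLM
# detection agents analyzed each chunk; both are stubbed here.
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

@dataclass
class DetectionTask:
    path: str
    start_line: int
    end_line: int

def plan_chunks(path: str, total_lines: int, chunk_size: int = 200) -> list[DetectionTask]:
    """Planning step: split a file into line ranges, one per detection agent.
    (Stand-in for the LLM planner, which produced the exact ranges.)"""
    return [
        DetectionTask(path, start, min(start + chunk_size - 1, total_lines))
        for start in range(1, total_lines + 1, chunk_size)
    ]

def run_detection(task: DetectionTask) -> dict:
    """Stub for one detection agent: would send the task's line range to a
    public model and return candidate vulnerability signals."""
    return {"path": task.path, "range": (task.start_line, task.end_line), "signals": []}

def scan_file(path: str, total_lines: int) -> list[dict]:
    tasks = plan_chunks(path, total_lines)
    # Parallelize detection agents across chunks; a real filter step
    # would then prune low-confidence candidates before validation.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(run_detection, tasks))
```

The key structural point from the report survives even in this toy version: the planner, not a human, fixes the line ranges, and the expensive step is not this fan-out but validating whatever signals come back.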
Findings and limits
- Both GPT-5.4 and Claude Opus 4.6 reproduced two of Anthropic’s bug cases in each of their three runs.
- Claude Opus 4.6 independently rediscovered a bug in OpenBSD three times; GPT-5.4 did not find that one.
- Some cases were partial: models identified the right code surface but didn’t always isolate the precise root cause (notably for FFmpeg and wolfSSL).
- Importantly, every scan stayed under $30 per file — showing the discovery phase can be performed cheaply with public models and APIs.
Where public models still fall short
Anthropic’s Mythos model reportedly went further in at least one instance: it not only located a FreeBSD bug but also produced a usable attack blueprint, chaining code fragments across network packets to demonstrate remote takeover. Vidoc’s public-model runs found the same flaw but did not construct the full exploit. That, the researchers argue, is the critical gap: finding a vulnerability is increasingly accessible; reliably turning it into a replicable, weaponized exploit still takes work.
Broader context and implications
Anthropic itself warned that the Cybench benchmark it used “is no longer sufficiently informative of current frontier model capabilities,” noting Mythos had already cleared it and predicting similar capabilities could spread across other labs within six to 18 months. Vidoc’s results suggest the discovery side of that prediction is already happening outside gated programs.
For the crypto sector, the lessons are stark: cryptographic libraries and network-facing software that underpin wallets, validators, node implementations and signature handling are already within the search space of publicly accessible models. That doesn’t mean immediate chaos — validation, exploit construction and responsible disclosure remain non-trivial — but the barrier to surfacing credible leads has fallen considerably.
Documentation and transparency
Vidoc published full prompt excerpts, model outputs and a methodology appendix on its website so others can inspect, reproduce or build on the work. As Moczadło put it, “The moat is moving from model access to validation: finding vulnerability signal is getting cheaper; turning it into trusted security work is still hard.”
If you cover crypto infrastructure or run critical tooling, this study is a reminder to prioritize dependency audits, hardened code review, and rapid, trusted validation of any AI-surfaced findings.