
Can Anthropic Keep Its Exploit-Writing AI Out of the Wrong Hands?

May 18, 2026 Twila Rosenbaum

Anthropic's Mythos model promises major innovations in vulnerability management and security red-teaming, but questions remain regarding how defenders can keep threat actors from taking full advantage.

Anthropic on April 7 unveiled Claude Mythos Preview, a general-purpose large language model (LLM) that, the company said in a blog post, "performs strongly across the board, but it is strikingly capable at computer security tasks." The AI firm said that, at user direction, Mythos could identify and exploit zero-day vulnerabilities in "every major operating system and every major Web browser," including subtle and difficult-to-detect ones. One example involved a 27-year-old flaw in OpenBSD that has since been patched.

Some of these vulnerabilities are complex, but the company says one doesn't need to be a security engineer to properly prompt the model. "In one case, Mythos Preview wrote a Web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both renderer and OS sandboxes," the blog read. "It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR-bypasses. And it autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets."

The vulnerability detection and exploitation enhancements came as a "downstream consequence" of improving Mythos' code and reasoning capabilities rather than as an explicit goal of its developers. "The same improvements that make the model substantially more effective at patching vulnerabilities also make it substantially more effective at exploiting them," Anthropic said.

The aim is to assist defenders and keep Mythos out of attacker hands, and Anthropic claims it has identified "thousands" of high-risk and critical security vulnerabilities that it is responsibly disclosing. Still, it's not much of a leap to see how a model like Mythos Preview could be misused, much as threat actors abuse legitimate penetration testing tools like Cobalt Strike.

Enter Project Glasswing: Anthropic Mythos for Cyber Defenders

Likely in anticipation of such misuse, Anthropic introduced "Project Glasswing," a new initiative the company launched this week in partnership with companies like Apple, AWS, Microsoft, Palo Alto Networks, and CrowdStrike. As part of its product launch, Anthropic claimed Project Glasswing could fundamentally "reshape cybersecurity," and described it as "an urgent attempt to put these capabilities to work for defensive purposes."

In practical terms, the AI vendor has extended Mythos Preview access to a group of more than 40 organizations to scan and secure first-party and open source systems. Lee Klarich, chief product and technology officer of Palo Alto Networks, called early Mythos Preview results "compelling" in a LinkedIn blog post. In addition to granting limited access to partners, Anthropic is committing $100 million in Mythos Preview usage credits to Project Glasswing, as well as $4 million in direct donations to open source security organizations.

As for why Anthropic introduced something so good at exploiting vulnerabilities, Forrester senior analyst Erik Nost tells Dark Reading there are two reasons. First, it's good PR for Anthropic, as the company is basically saying its AI is so good that it can reshape cybersecurity and software development. Second, it calls attention to the vulnerability detection gaps that the industry has dealt with for 30 years.

Keeping Mythos Preview Out of the Wrong Hands

Nost explains that there are controls in place to ensure Mythos stays in the right hands, though it has become "a race [for defenders] to remediate and patch before other AIs, in the wrong hands, discover these zero-days and rapidly write exploits."

"It's a call to action, a heads-up, to defenders that vulnerability management practices are about to get very different," he says.

Julian Totzek-Hallhuber, senior principal solution architect at Veracode, says that because there is no clear answer for how these tools can stay out of attacker hands, defenders should assume the capability will proliferate and prepare accordingly. This means investing in detection rather than prevention alone: identifying the behavioral signatures of AI-assisted exploitation, adopting anomaly-based detection, building out zero-trust architecture, and maintaining aggressive patching cycles.

Melissa Ruzzi, director of AI at AppOmni, points to a deeper truth. "No one can ever keep anything 100% out of attackers' hands," she tells Dark Reading. "The best that can be done is to make it more difficult for them to get access to it."

Mythos' potential comes with a caveat: While the early Anthropic examples of discovered vulnerabilities are compelling, two data points do not make a pattern. Totzek-Hallhuber emphasizes that "Anthropic controls both the model and the narrative; independent replication is impossible when the model isn't publicly available."

He adds, "Until independent researchers with access can run their own evaluations, healthy skepticism is the appropriate posture. This is, frankly, another consequence of the restricted access model: the claims can't be tested, so they can't be fully trusted or refuted."

Dark Reading contacted Anthropic to ask for statistics regarding false positives and error rates; the vendor did not respond by press time.

The emergence of AI capable of autonomous exploit development is not entirely new, but the scale and capability shown by Mythos Preview represent a significant leap. Previous models like GPT-4 had demonstrated basic penetration testing abilities, but none claimed to chain multiple zero-day exploits across different operating systems autonomously. This raises the stakes for the entire cybersecurity industry.

Historically, zero-day vulnerabilities have been the most prized assets for both defenders and attackers. The price of a zero-day exploit on the underground market can range from tens of thousands to millions of dollars, depending on the target. An AI that can automatically discover and weaponize these flaws could drastically reduce the time-to-exploit, potentially making zero-day attacks more common and more severe.

Project Glasswing's defensive focus is crucial, but its success depends on how quickly vulnerabilities can be patched once discovered. Enterprises often take weeks or months to patch a critical vulnerability, while AI-driven exploitation could occur within hours. This asymmetry poses a new challenge for security operations centers.
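To make that asymmetry concrete, here is a minimal Python sketch of the exposure window, meaning the time a flaw remains unpatched after a working exploit could exist. The function name `exposure_window` and the three-week and six-hour figures are illustrative assumptions, not measurements from Anthropic or any source cited here.

```python
from datetime import timedelta


def exposure_window(time_to_patch: timedelta, time_to_exploit: timedelta) -> timedelta:
    """Return how long a flaw stays exploitable before the patch lands.

    Hypothetical helper: the window is the patch delay minus the exploit
    delay, floored at zero when the patch arrives first.
    """
    return max(time_to_patch - time_to_exploit, timedelta(0))


# Illustrative assumptions only: a three-week patch cycle versus an
# exploit that is ready six hours after the flaw is discovered.
patch_delay = timedelta(weeks=3)
exploit_delay = timedelta(hours=6)

print(exposure_window(patch_delay, exploit_delay))  # 20 days, 18:00:00
```

Under these assumed figures, nearly the entire patch cycle is exposure time, which is why shortening the patch delay matters more as the exploit delay shrinks.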

Some industry observers suggest that the best defense might be a strong offense in the form of proactive vulnerability hunting using similar AI capabilities. By deploying Mythos Preview across their own codebases, organizations can identify and remediate flaws before they are ever exploited. This is the promise of the so-called "shift left" security movement, now supercharged with artificial intelligence.

However, the restricted access model creates its own risks. If only a few trusted partners have access to the full capability, a single compromise or insider threat could lead to the model falling into the wrong hands. Moreover, the same technology could be replicated by other AI labs or even by threat actors reverse-engineering the disclosed techniques.

For now, the cybersecurity community watches closely. The race is on between those who would use AI to defend and those who would use it to attack. The outcome will likely shape the future of software security for years to come.

Source: Dark Reading News
