In a new report, AI company Anthropic detailed a “highly sophisticated espionage campaign” that deployed its artificial intelligence tools to launch automated cyberattacks around the globe.
The attackers aimed high, targeting government agencies, big tech companies, banks and chemical companies, and succeeded in “a small number of cases,” according to Anthropic. The company says that its research links the hacking operation to the Chinese government.
The company claims that the findings are a watershed moment for the industry, marking the first instance of a cyber espionage scheme carried out by AI. “We believe this is the first documented case of a large-scale cyberattack executed without substantial human intervention,” Anthropic wrote in a blog post. Fast Company has reached out to China’s embassy in D.C. for comment about the report.
Anthropic says that it first detected the suspicious use of its products in mid-September and conducted an investigation to uncover the scope of the operation. The attacks weren't fully autonomous – human operators set them in motion – but the attackers manipulated Anthropic's Claude Code tool, a version of the AI assistant designed for developers, to execute complex pieces of the campaign.
Tricking Claude into doing crime
To get around Claude’s built-in safety guardrails, the hackers worked to “jailbreak” the AI model, breaking the operation into smaller, benign-seeming tasks so the model never saw the broader malicious context. The attackers also told the AI tool that they were working in a defensive capacity for a legitimate cybersecurity firm, persuading the model to let down its defenses.
After bending Claude to their will, the attackers set the AI assistant to work analyzing their targets, identifying high-value databases, and writing code to exploit weaknesses it found in those targets’ systems and infrastructure.
“… The framework was able to use Claude to harvest credentials (usernames and passwords) that allowed it further access and then extract a large amount of private data, which it categorized according to its intelligence value,” Anthropic wrote. “The highest-privilege accounts were identified, backdoors were created, and data were exfiltrated with minimal human supervision.”
In the last phase, the attackers directed Claude to document its actions, producing files cataloging stolen credentials and the systems that were analyzed, which they could build on in future attacks. The company estimates that at least 80% of the operation was carried out autonomously, without a human directing it.
Anthropic noted in its report that, much as it does with more benign tasks, the AI made errors during the cyberattack, falsely claiming to have harvested secret information and even hallucinating some of the logins it produced. Even with some errors, an agentic AI that’s right most of the time can point itself at a lot of targets, quickly create and execute exploits, and do a lot of damage in the process.
AI on the attack
The new report from Anthropic isn’t the first time that an AI company has discovered its tools being misused in elaborate hacking schemes. It’s not even a first for Anthropic.
In August, the company detailed a handful of cybercrime schemes using its Claude AI tools, including new developments in a long-running employment scam to get North Korean operatives hired into remote positions at American tech companies.
In another recent cybercrime incident, a now-banned user turned to Anthropic’s Claude assistant to create and sell ransomware packages online to other cybercriminals for up to $1,200 each.
“The growth of AI-enhanced fraud and cybercrime is particularly concerning to us, and we plan to prioritize further research in this area,” Anthropic said in the report.
The new attack is noteworthy both for its links to China and for its use of “agentic” AI – AI that can execute complex tasks on its own once set in motion. The ability to work from start to finish with less oversight means these tools operate more like humans do, pursuing a goal and completing the smaller steps needed to get there. An autonomous system that can pull off detailed analysis and even write code at scale has obvious appeal in the world of cybercrime.
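For readers unfamiliar with the pattern, an agentic system is typically a simple loop wrapped around a model: given a goal, it asks the model for the next step, executes that step with a tool, and feeds the result back in until the model reports the goal complete. The sketch below is a deliberately generic, benign illustration of that loop, not code from Anthropic’s report; the `call_model` and `run_tool` functions are hypothetical stand-ins for a real model API and real tool integrations.

```python
# A minimal sketch of an "agentic" loop: the model plans, the harness
# executes, and results are fed back until the goal is met. All names
# here (call_model, run_tool) are hypothetical placeholders, not
# Anthropic's actual API.

def call_model(transcript: list[str]) -> str:
    """Stand-in for a real LLM call; returns the model's next action."""
    # A real agent would send the transcript to a model API here.
    return "DONE: goal satisfied"  # stubbed so the sketch runs as-is

def run_tool(action: str) -> str:
    """Stand-in for tool execution (shell, browser, code runner, etc.)."""
    return f"result of {action!r}"

def run_agent(goal: str, max_steps: int = 10) -> list[str]:
    transcript = [f"GOAL: {goal}"]
    for _ in range(max_steps):           # cap steps so the loop terminates
        action = call_model(transcript)  # model decides the next small step
        if action.startswith("DONE"):    # model reports the goal is finished
            transcript.append(action)
            break
        transcript.append(run_tool(action))  # execute, feed the result back
    return transcript

print(run_agent("summarize a report"))
```

The security concern Anthropic raises follows directly from this structure: once the loop is running, each individual step can look routine on its own, which is exactly what the attackers exploited when they fed Claude benign-seeming fragments of a larger scheme.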
“A fundamental change has occurred in cybersecurity,” Anthropic wrote in its report. “…The techniques described above will doubtless be used by many more attackers—which makes industry threat sharing, improved detection methods, and stronger safety controls all the more critical.”