As soon as new AI products are released, security researchers and pranksters begin probing them for weaknesses, trying to push systems to violate their own safety precautions and coax them into producing anything from offensive content to instructions for building weapons.
After all, AI risks are not just theoretical. In recent months, various AI companies have faced criticism for their software allegedly contributing to mental illness and suicide, nonconsensual fake nude images of real people, and aiding hackers in cybercrime. At the same time, techniques for bypassing safeguards also continue to evolve, with recent methods including everything from malicious prompts disguised with poetry to surreptitiously planting ideas in AI assistant memories via innocuous-looking online tools.
But long before new models reach the public, internal security teams are already stress-testing them. At Microsoft, that responsibility largely falls to the company’s AI Red Team, a group that since 2018 has worked with product teams and the broader AI community to pressure-test models and applications before bad actors can.
In cybersecurity parlance, a red team focuses on simulating attacks against a system, while a blue team focuses on defending it. Microsoft’s AI Red Team is no exception, exploring a wide range of safety and security concerns—from loss-of-control situations where AI evades human oversight to issues around chemical, biological, and nuclear threats—across an assortment of AI software.
“We see a really, really diverse set of tech,” says Tori Westerhoff, principal AI security researcher on the Microsoft AI Red Team. “Part of the kind of magic of the team is that we can see anything from a product feature to a system to a copilot to a frontier model, and we get to see how tech is integrated across all of those, and how AI is growing and evolving.”
In one case, says Pete Bryan, principal AI security research lead on the Red Team, members worked with other Microsoft researchers to test whether AI could be manipulated into assisting with cyberattacks, including generating or refining malware. They experimented with framing questions in benign ways, such as describing a student project or security research scenario, then pushing systems to produce increasingly detailed outputs.
The effort went beyond simple prompt testing. Researchers evaluated whether the AI could generate code that actually compiled and ran, and whether certain programming languages increased the likelihood of harmful outputs. In the worst case, Bryan says, the systems produced code comparable to what a low- to mid-level hacker might already create, but the team still refined detection systems to better flag such behavior.
“In the future, if a more capable model comes along that could add value, we’ve already gotten ahead of this,” Bryan says.
Today, the Red Team includes several dozen specialists with backgrounds ranging from software testing to biology. The group also works closely with external experts and peer teams across the AI industry. Bryan and Westerhoff gave a talk at the RSAC conference on March 24, and the team has released open-source tools including an automated testing framework called PyRIT (which stands for Python Risk Identification Tool), along with guidance for evaluating AI systems.
The team’s efforts have recently been cited in Microsoft’s own work, including the announcement of an image generation AI model unveiled on March 19, and in third-party releases, like the “system card” explaining the functionality and testing of OpenAI’s GPT-5 model. Microsoft has also recently published AI safety research exploring potential risks around AI fine-tuning and methods for spotting hidden backdoors, or purposely hidden security and safety flaws, in open-weight models.
As AI ecosystems expand to include more advanced copilots, autonomous agents, and multimodal systems capable of generating text, images, audio, and video, the Red Team’s mandate has grown more complex. Many of today’s use cases, from automated coding to AI-driven shopping and video generation, would have sounded like science fiction only a few years ago.
“For my team, I think that’s part of the fun, that you see so many diverse things,” Westerhoff says. “It’s not just that we’re testing models day in and day out, but we’re actually testing how models go through the entire technological ecosystem.”