As built-in AI pops up in more aspects of everyday life, laymen are counting on the experts to keep technology safe to use. But one Meta employee’s misadventure with AI has social media users fearful for the future of AI alignment.
Summer Yue is the director of alignment at Meta Superintelligence Labs, the company’s AI research and development division. Her LinkedIn bio states that she’s “passionate about ensuring powerful AIs are aligned with human values and guided by a deep understanding of their risks.”
If anyone would have a handle on keeping AI in check, it’s Yue—and yet, on February 22, she posted about losing control of AI on her own computer.
In a post that’s since garnered nearly nine million views on X, Yue shared screenshots from her messages with AI agent OpenClaw. After using it to organize a small mock inbox, she tried getting OpenClaw to sort through her real email, but things went awry when the agent started deleting every message that was more than a week old.
Yue wrote that she watched OpenClaw “speedrun deleting [her] inbox,” even as she sent it instructions, including: “Do not do that,” “Stop don’t do anything,” and “STOP OPENCLAW.”
“I couldn’t stop it from my phone. I had to RUN to my Mac mini like I was defusing a bomb,” Yue added.
After she’d stopped it from fully nuking her inbox, Yue asked OpenClaw if it remembered her instruction to not perform any actions without her approval.
“Yes, I remember,” it replied. “And I violated it. You’re right to be upset.”
OpenClaw, an open-source AI agent, is controversial for the far-reaching permissions it requires to function as intended, including access to users’ email accounts, messaging platforms, and other private and potentially sensitive information.
Combine that with Yue’s example of it ignoring her explicit instructions, and some online observers are concerned the tool is a bridge too far in terms of AI’s power to override humans.
Yue responded to questions in the replies to her post, including whether she was intentionally pushing the limits of OpenClaw, or if she simply made a mistake.
“Rookie mistake tbh,” she replied. “Turns out alignment researchers aren’t immune to misalignment. Got overconfident because this workflow had been working on my toy inbox for weeks. Real inboxes hit different.”
Yue’s mistake went viral, with X users marveling that someone so well-versed in AI could find herself scrambling to stop an AI agent. Some posters said the incident called Meta’s judgment on AI safety into question.
Meanwhile, at least one poster considered the incident’s broader implications: “A matter of time till these people are begging the AI not to launch nuclear weapons,” the user quipped, “and then the last thing it says is ‘I’m sorry. You’re right to be upset.’”
Meta did not respond to Fast Company’s request for comment.