 
        Welcome to AI Decoded, Fast Company’s weekly newsletter that breaks down the most important news in the world of AI. I’m Mark Sullivan, a senior writer at Fast Company, covering emerging tech, AI, and tech policy.
This week, I’m focusing on a stunning stat showing that OpenAI’s ChatGPT engages with more than a million users a week about suicidal thoughts. I also look at new Anthropic research on AI “introspection,” and at a Texas philosopher’s take on AI and morality.
Sign up to receive this newsletter every week via email here. And if you have comments on this issue and/or ideas for future ones, drop me a line at sullivan@fastcompany.com, and follow me on X (formerly Twitter) @thesullivan.
OpenAI’s vulnerable position
OpenAI says that 0.15% of users active in a given week have conversations that include “explicit indicators of potential suicidal planning or intent.” Considering that ChatGPT has an estimated 700 million weekly active users, that works out to more than a million such conversations every week.
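The arithmetic behind that figure is straightforward. Here's a quick back-of-the-envelope check, using the estimated user count above (both inputs are approximations drawn from this article, not OpenAI's official math):

```python
# Back-of-the-envelope check of the "more than a million" figure.
# Both inputs are estimates, not official OpenAI data.
weekly_active_users = 700_000_000   # estimated ChatGPT weekly active users
share_flagged = 0.0015              # 0.15% of active users, per OpenAI

flagged_per_week = weekly_active_users * share_flagged
print(f"{flagged_per_week:,.0f} users per week")  # -> 1,050,000
```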
That puts OpenAI in a very vulnerable position. There's no telling how many of those users will act on the output of a language model. There's the case of teenager Adam Raine, who died by suicide in April after extended conversations with ChatGPT. His parents are suing OpenAI and its CEO, Sam Altman, charging that their son took his life as a result of his chatbot discussions.
While users may feel they can talk to a nonhuman entity without judgment, there's evidence that chatbots aren't always good therapists. Researchers at Brown University found that AI chatbots routinely violate core mental health ethics standards, underscoring the need for legal standards and oversight as use of these tools grows.
All of this helps explain OpenAI's recent moves around mental health. The company made significant changes to its newest GPT-5 model out of concern for users with mental health issues. For example, it trained the model to be less sycophantic, meaning less likely to constantly validate a user's thoughts even when they're self-destructive.
This week the company introduced further changes. Chatbot responses to distressed users may now include links to crisis hotlines. The chatbot might reroute sensitive conversations to safer models, and some users might see gentle reminders to take breaks during long chat sessions.
OpenAI says it tested its models’ responses to 1,000 challenging self-harm and suicide conversations, finding that the new GPT‑5 model gave 91% satisfactory answers compared to 77% for the previous GPT‑5 model. But those are just evals performed in a lab—how well they emulate real-world conversations is anybody’s guess. As OpenAI itself has said, it’s hard to consistently and accurately pick up on signs of a distressed user.
The problem began coming to light with research showing that ChatGPT users, especially younger ones, spend a lot of time talking to the chatbot about personal matters, including self-esteem issues, friendships, and the like. While such conversations are not the most numerous on ChatGPT, researchers say they are the lengthiest and most engaged.
Anthropic shows that AI models can think about their own thoughts
It may come as a surprise to some people that AI labs cannot explain, in mathematical terms, how large AI models arrive at the answers they give. There’s a whole subfield in AI safety called “mechanistic interpretability” dedicated to trying to look inside these models to understand how they make connections and reason.
Anthropic’s Mechanistic Interpretability team has just released new research showing evidence that large language models can display introspection. That is, they can recognize their own internal thought processes, rather than just fabricate plausible-sounding answers when questioned about their reasoning.
The discovery could be important for safety research. If models can accurately report on their own internal mechanisms, researchers could gain valuable insights into their reasoning processes and more effectively identify and resolve behavioral problems, Anthropic says. It also implies that an AI model might be capable of reflecting on wrong turns in its “thinking” that send it in unsafe directions (perhaps failing to object to a user considering self-harm).
The researchers found the clearest signs of introspection in Anthropic's largest and most advanced models, Claude Opus 4 and Claude Opus 4.1, suggesting that AI models' introspective abilities are likely to become more sophisticated as the technology advances.
Anthropic is quick to point out that AI models don’t think introspectively in the nuanced way we humans do. Despite the limitations, the observation of any introspective behavior at all goes against prevailing assumptions among AI researchers. Such progress in investigating high-level cognitive capabilities like introspection can gradually take the mystery out of AI systems and how they function.
Can AIs be taught morals and values?
Part of the problem of aligning AI systems with human goals and aspirations is that models can’t easily be taught moral frameworks that help guide their outputs. While AI can mimic human decision-making, it can’t act as a “moral agent” that understands the difference between right and wrong, such that it can be held accountable for its actions, says Martin Peterson, a philosophy professor at Texas A&M.
AI can output decisions and recommendations that sound similar to those a human might produce, but the way it reasons toward them isn't very humanlike at all, Peterson adds. Humans make judgments with a sense of free will and moral responsibility, but those qualities can't currently be trained into AI models. In a legal sense (which may reflect society's moral sense), if an AI system causes harm, the blame lies with its developers or users, not the technology itself.
Peterson asserts that AI can be aligned with human values such as fairness, safety, and transparency. But, he says, it’s a hard science problem, and the stakes of succeeding are high. “We cannot get AI to do what we want unless we can be very clear about how we should define value terms such as ‘bias,’ ‘fairness,’ and ‘safety,’” he says, noting that even with improved training data, ambiguity in defining these concepts can lead to questionable outcomes.
More AI coverage from Fast Company:
- Harvey, OpenAI, and the race to use AI to revolutionize Big Law
- The 26 words that could kill OpenAI’s Sora
- Exclusive new data shows Google is winning the AI search wars
- OpenAI finalizes restructure and revises Microsoft partnership
Want exclusive reporting and trend analysis on technology, business innovation, future of work, and design? Sign up for Fast Company Premium.
 
        