OpenAI watchers have spotted something curious over the last week.
References to GPT-5.1 keep showing up in OpenAI’s code, and a “cloaked” model codenamed Polaris Alpha, widely believed to come from OpenAI, quietly appeared on OpenRouter, a platform that AI nerds use to test new systems.
Nothing is official yet. But all of this suggests that OpenAI is quietly preparing to release a new version of their GPT-5 model. Industry sources point to a potential release date as early as November 24.
If GPT-5.1 is for real, what new capabilities will the model have?
As a former OpenAI Beta tester—and someone who burns through millions of GPT-5 tokens every month—here’s what I’m expecting.
A larger context window (but still not large enough)
An AI model’s context window is the amount of data (measured in tokens, which are basically bits of words) that it can process at one time.
As the name implies, a larger context window means that a model can consider more context and external information when processing a given request. This usually results in better output.
I recently spoke to an artist, for example, who hands Google’s Gemini a 300-page document every time he chats with it. The document includes excerpts from his personal journal, full copies of screenplays he’s written, and much else.
This insanely large amount of context lets the model provide him much better, more tailored responses than it would if he simply interacted with it like the average user.
This works largely because Gemini has a 1 million token context window. GPT-5’s, in comparison, is relatively puny at just 196,000 tokens in ChatGPT (expanded to 400,000 tokens when used by developers through the company’s API).
That smaller context window puts GPT-5 and ChatGPT at a major disadvantage. If you want to use the model to edit a book or improve a large codebase, for example, you’ll quickly run out of tokens.
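To see how quickly those limits bite, here’s a back-of-envelope check using the common rule of thumb of roughly 4 characters per token for English prose (an approximation; exact counts require a real tokenizer such as tiktoken). The window sizes are the figures reported for GPT-5 and Gemini above:

```python
# Rough context-window check using the ~4 characters per token heuristic.
# This is an approximation, not a real tokenizer.

CHARS_PER_TOKEN = 4  # rough average for English prose


def estimated_tokens(text: str) -> int:
    """Back-of-envelope token estimate for a block of text."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_window(text: str, window: int) -> bool:
    """Does the text, plus headroom for the model's reply, fit the window?"""
    headroom = 4_000  # leave room for the response
    return estimated_tokens(text) + headroom <= window


# A 300-page document at roughly 3,000 characters per page:
doc = "x" * (300 * 3_000)
print(estimated_tokens(doc))           # ~225,000 tokens
print(fits_in_window(doc, 196_000))    # False: overflows ChatGPT's GPT-5 window
print(fits_in_window(doc, 400_000))    # True: fits the GPT-5 API window
print(fits_in_window(doc, 1_000_000))  # True: fits Gemini's window
```

By this estimate, the artist’s 300-page document blows straight past ChatGPT’s window but fits comfortably in Gemini’s.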
When OpenAI releases GPT-5.1, sources indicate that it will come with a 256,000 token context window when used via the ChatGPT interface, and perhaps double that in the API.
That’s better than today’s GPT-5, to be sure. But it still falls far short of Gemini—especially as Google prepares to make its own upgrades.
OpenAI could make a surprise last-minute upgrade to 1 million tokens. But if it keeps the 256,000 token context window, expect plenty of grumbling from the developer community about why the window still isn’t big enough.
Even fewer hallucinations
OpenAI’s GPT-5 model falls short in many ways. But one thing it’s very good at is providing accurate, largely hallucination-free responses.
I often use OpenAI’s models to perform research. With earlier models like GPT-4o, I found that I had to carefully fact-check everything the model produced to ensure it wasn’t imagining some new software tool that doesn’t actually exist, or lying to me about myriad other small but crucial details.
With GPT-5, I find I have to do that far less. The model isn’t perfect. But OpenAI has largely solved the problem of wild hallucinations.
According to the company’s own data, GPT-5 hallucinates only 26% of the time when solving a complex benchmark problem, versus 75% of the time with older models. In normal usage, that translates to a far lower hallucination rate on simpler, everyday queries that aren’t designed to trip the model up.
With GPT-5.1, expect OpenAI to double down on its new, hallucination-free direction. The updated model is likely to do an even better job at avoiding errors.
There’s a cost, though. Models that hallucinate less tend to take fewer risks, and can thus seem less creative than unconstrained, hallucination-laden ones.
OpenAI will likely try to carefully walk the line between accuracy and creativity with GPT-5.1. But there’s no guarantee they’ll succeed.
Better, more creative writing
In a similar vein, when OpenAI released their GPT-5 model, users quickly noticed that it produced boring, lifeless prose.
At the time, I predicted that OpenAI had essentially given the model an “emotional lobotomy,” killing its emotional intelligence in order to curb a worrying trend of the model sending users down psychotic spirals.
Turns out, I was right. In a post on X last month, Sam Altman admitted that “We made ChatGPT pretty restrictive to make sure we were being careful with mental health issues.”
But Altman also said in the post “now that we have been able to mitigate the serious mental health issues and have new tools, we are going to be able to safely relax the restrictions in most cases.”
That process began with the rollout of new, more emotionally intelligent personalities in the existing GPT-5 model. But it’s likely to continue and intensify with GPT-5.1.
I expect the new model to have the overall intelligence and accuracy of GPT-5, but with a personality to match the emotionally deep GPT-4o.
This will likely be paired with much more robust safeguards to ensure that 5.1 avoids conversations that might hurt someone who is having a mental health crisis.
Hopefully, with GPT-5.1 the company can protect those vulnerable users without bricking the bot’s brain for everyone else.
Naughty bits
If you’re squeamish about NSFW stuff, maybe cover your ears for this part.
In the same X post, Altman subtly dropped a sentence that sent the Internet into a tizzy: “As we roll out age-gating more fully and as part of our ‘treat adult users like adults’ principle, we will allow even more, like erotica for verified adults.”
The idea of America’s leading AI company churning out reams of computer-generated erotica has already sparked feverish commentary from sources as varied as politicians, Christian leaders, tech reporters, and (judging from the upvote counts) much of Reddit.
For their part, though, OpenAI seems quite committed to moving ahead with this promise. In a calculus that surely makes sense in the strange techno-libertarian circles of the AI world, the issue is intimately tied to personal freedom and autonomy.
In a recent article about the future of artificial intelligence, OpenAI again reiterated that “We believe that adults should be able to use AI on their own terms, within broad bounds defined by society,” placing full access to AI “on par with electricity, clean water, or food.”
All that’s to say that with the release of GPT-5.1 (or perhaps slightly after the release, so the inevitable media frenzy doesn’t overshadow the new model’s less interesting aspects), the guardrails around ChatGPT’s naughty bits are almost certainly coming off.
Deeper thought
In addition to blunting GPT-5’s emotional intelligence, OpenAI made another misstep with the model’s release.
The company tried to unify all queries within a single model, letting ChatGPT itself choose whether to use a simpler, lower-effort version of GPT-5, or a slower, more thoughtful one.
The idea was noble—there’s little reason to use an incredibly powerful, slow, resource-intensive LLM to answer a query like, “Is tahini still good after one month in the fridge?”
But in practice, the feature was a failure. ChatGPT was no good at determining how much effort was needed to field a given query, which meant that people asking complex questions were often routed to a cheap, crappy model that gave awful results.
OpenAI fixed the issue in ChatGPT with a user interface kludge. But with GPT-5.1, early indications point to OpenAI once again bifurcating their model into Instant and Thinking versions.
The former will likely respond to simple queries far faster than GPT-5, while the latter will take longer, chew through more tokens, and yield better results on complex tasks.
Crucially, it seems like the user will once again be able to explicitly choose between the two models. That should yield faster results when a query is genuinely simple, and a better ability to solve complicated problems.
OpenAI has hinted that its future models will be “capable of making very small discoveries” in fields like science and medicine next year, with “systems that can make more significant discoveries” coming as soon as 2028. GPT-5.1 will likely be a first step down that path.
An attempt to course correct
Until OpenAI formally releases GPT-5.1 in one of its signature, wonky livestreams, all of this remains speculative. But given my history with OpenAI—going back to the halcyon days of GPT-3—these are some changes I’m expecting when the 5.1 model does go live.
Overall, GPT-5.1 seems like an attempt to correct many of the glaring problems with GPT-5, while also doubling down on OpenAI’s more freedom-oriented, accuracy-focused approach.
The new model will likely be able to think, (ahem) “flirt,” write, and communicate better than its predecessors.
Whether it will do those things better than a growing stable of competing models from Google, Anthropic, and myriad Chinese AI labs, though, is anyone’s guess.