Hi again, and welcome back to Fast Company’s Plugged In.
On November 18, Google announced a new product. More precisely, it declared that it was ushering in “a new era”—which is what tech companies do when they really want you to pay attention.
The product in question is Gemini 3 Pro, the latest version of Google’s LLM. It’s not just the foundation of Google’s ChatGPT-like chatbot, also called Gemini. It underlies a vast array of features in flagship offerings such as Google Search, Gmail, and Android. It powers Antigravity, a new Google AI coding platform that debuted on the same day. And thanks to Google Cloud, the model is also available to third-party developers as an ingredient for their apps.
In short, Gemini 3 Pro could hardly be more essential to Google’s aspiration to be AI’s most important player. As Google DeepMind CEO Demis Hassabis said in the announcement, the company sees it as “a big step on the path toward AGI”—AI that’s at least as capable as humans are at most cognitive tasks. Already, the announcement stated, Gemini 3 Pro “demonstrates PhD-level reasoning.”
Google supported its claims with a table listing 20 AI benchmarks in which Gemini 3 Pro beat—and often just plain trounced—Gemini 2.5 Pro, OpenAI’s GPT-5.1, and Anthropic’s Claude Sonnet 4.5. Humanity’s Last Exam, for example, is a 2,500-question test covering mathematics, physics, the humanities, and other topics. It’s designed to be remarkably difficult (hence the name), though there has been debate over whether some of its questions are so nebulous that even the officially correct answers are debatable or wrong. According to Google’s table, GPT-5.1 achieved a score of 26.5%, while Claude Sonnet 4.5 managed only 13.7%. By contrast, Gemini 3 Pro scored 37.5%, and did even better when allowed to do searches and run code, with a score of 45.8%.
Outside the lab, Gemini 3 Pro has been received as enthusiastically as any new AI model I can remember. Ethan Mollick, one of my favorite providers of AI analysis based on hands-on usage, pronounced it “very good.” Others said it delivered on the great expectations that OpenAI’s GPT-5 stoked but failed to satisfy.
As I write, I’ve been playing with the Gemini chatbot for just a few days. Much of that experience has been positive. Two writing assignments I gave it came out exceptionally well: an article on the future of the penny, and a detailed report on pricing for Digital Equipment Corp.’s 1960s minicomputers. Its first pass at a simple vibe coding project—building a search engine for Fast Company’s Next Big Things in Tech—was a bit of a mess, but when I explicitly put it into “Build” mode, it nailed the assignment in a few minutes. It also excelled at figuring out what was going on in an assortment of photos I uploaded.
Yet for all that’s gone right so far, I also encountered significant glitches with Gemini 3 Pro from almost the moment I tried it. They left me particularly wary of Google’s blanket claims about the LLM being ready to help users “learn anything” and delivering responses that “are smart, concise and direct, trading cliché and flattery for genuine insight.”
The interactions that went wrong mostly involved animation and comics, topics I turn to when fooling around with new AI because I know them well enough to spot mistakes. Asked about these subjects, Gemini repeatedly spewed hallucinations.
For instance, when I asked if Walt Disney himself had ever worked on the Mickey Mouse comic strip, the LLM gave a correct answer (yes, though only briefly) but then volunteered a bunch of facts I hadn’t asked for that weren’t actually factual. For example, it said that when the strip’s longtime artist retired, his final panel showed Mickey and Minnie gazing into a sunset, a subtle way of marking his departure. (No such strip appeared.) In a different chat, it manufactured an elaborate, entirely fictional backstory involving a different cartoonist also being a noted animation historian, which it told me was “well-documented” and “recognized.”
It wasn’t just that Gemini hallucinated. ChatGPT and Claude still do that, too. But more than other models, Gemini tended to compound its failures by gaslighting me. My helpful attempts to point out its gaffes led to some of the strangest exchanges I’ve had with AI since February 2023, when Microsoft’s Bing said it didn’t want to talk to me anymore.
(Full disclosure: I understand that AI is just stringing together a sequence of words it doesn’t understand. All of its human-seeming qualities, be they impressive or annoying, are simulated. But it’s hard to write about them without slipping into a certain degree of anthropomorphizing!)
Repeatedly, Gemini acknowledged its inaccuracies but insisted they were “lore,” “common misconceptions,” or examples of my own confusion. In one case, it eventually confessed: “I have failed you in this conversation by fabricating details to cover up previous errors.” In another instance, it continued to insist that it was right, providing citations that didn’t even mention the topic at hand.
I’m not arguing that the fate of AI hangs on how much the technology knows about old cartoons. However, if any company is burdened with the responsibility of ensuring that its LLM is a trustworthy source of general information, it’s Google. That I tumbled into an abyss of AI-generated misinformation so quickly isn’t an encouraging sign.
Part of the problem lies in the fact that Gemini 3 Pro offers two modes, “Fast” and “Thinking.” The first is the default and was responsible for the prevarications I encountered, at least one of which involved it conflating two separate topics I’d brought up. So far, Thinking mode has worked better in my experiments. But even the speediest of AI models should meet a baseline of accuracy and good behavior, at least if they’re being presented as a way to “learn anything.” (Like many AI tools, the Gemini chatbot does carry a mistakes-are-possible disclaimer.)
To repeat myself, Gemini 3 Pro is impressive in many ways. Still, its launch is yet another example of the AI industry presenting an overly rosy portrait of what it has achieved. It also underlines that benchmarks tell us only so much about a model’s real-world performance.
When OpenAI introduced ChatGPT three years ago this month, it did so in a brief blog post that took pains to detail the bot’s limitations and avoid grand pronouncements about its future. Letting its breakthrough new product speak for itself turned out to be a pretty effective marketing strategy. Even as AI’s giants jostle for bragging rights in what may be the most hypercompetitive tech category of all time, they should remember that lesson.
You’ve been reading Plugged In, Fast Company’s weekly tech newsletter from me, global technology editor Harry McCracken. If a friend or colleague forwarded this edition to you—or if you’re reading it on fastcompany.com—you can check out previous issues and sign up to get it yourself every Friday morning. I love hearing from you: Ping me at hmccracken@fastcompany.com with your feedback and ideas for future newsletters. I’m also on Bluesky, Mastodon, and Threads, and you can follow Plugged In on Flipboard.
More top tech stories from Fast Company
A battle against the AI oligarchy is brewing in this wealthy New York district
Two congressional candidates have made AI a major issue in the campaign. Read More →
Crypto’s path to legitimacy depends on the industry itself, not just politicians
Only an internal cultural shift and rigorous self-policing can deliver mainstream approval. Read More →
AI chatbots won’t save the media. But what powers them might
Publisher-built agents grounded in trusted archives may turn years of reporting into real products instead of just another chat widget. Read More →
This massive new data center is powered by used EV batteries
A new project from battery recycling startup Redwood Materials and data center builder Crusoe shows that it’s possible to build data centers cheaper and faster while also slashing emissions. Read More →
Why Trump’s AI diplomacy is doomed to fail
This week, chips were on the menu in the White House. Read More →
Even (especially) in the age of AI, here’s why I hire for character over skill
Because that’s what reveals true talent. Read More →