
Coinbase CEO Brian Armstrong was the recipient of some very bad vibes last week after bragging on X that nearly half his exchange’s code is already AI-generated, with plans to push it higher. The post unleashed a torrent of ridicule and seemed to crystallize the skepticism about the reliability of “vibe coding” tools that has been bubbling for months.
Over the last couple of years, AI coding tools like Claude Code (Anthropic), Codex (OpenAI), Cursor, Lovable, and Replit have reached far beyond auto-completing lines of code; they can generate entire apps and features from a plain-language prompt, even for users with little or no coding experience. But even as enterprise execs hope the tools will speed up their software production, many in the development community are finding that while vibe coding may be great for slapping together demos, it’s not so great for building secure, reliable, and explainable software. And the problems created by AI-generated code may only surface long after the software has shipped.
“Code created by AI coding agents can become development hell,” says Jack Zante Hays, a senior software engineer at PayPal who works on AI software development tools. He notes that while the tools can quickly spin up new features, they often generate technical debt, introducing bugs and maintenance burdens that must eventually be paid down with developer time and effort. That trade-off has some engineers questioning whether vibe coding tools ultimately cost more time than they save.
Most would agree that vibe coding tools are great for creating software demos. There’s real value in a tool that lets a nontechnical product manager whip together the front end and a few features of an app, take it to a software team, and say “See? This is what I want.” The problems start when AI coding tools are used to build new apps, features, or functions that must eventually interact with everything else in a codebase: databases, security tools, authentication services, external APIs, and infrastructure.
Managing all those connections is tricky. According to Hays, vibe coding tools hit a “complexity ceiling” once a codebase grows beyond a certain size. “Small code bases might be fine up until they get to a certain size, and that’s typically when AI tools start to break more than they solve,” he says.
And the problem only gets worse with inexperienced users. “Vibe coding—especially from nonexperienced users who can only give the AI feature demands—can involve changing like 60 things at once—without testing, so 10 things can be broken at once,” Hays says. A human engineer methodically tests each addition (a discipline sketched below); software built without that habit often struggles to adapt once it’s live, particularly when confronted with real-world edge cases.
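To make that discipline concrete, here is a minimal, hypothetical Python sketch. The function and its edge cases are invented for illustration; the pattern is the point: each small change gets its own checks before the next change lands.

```python
# Hypothetical example of one small, well-tested change.
# The function and its edge cases are invented for illustration.

def apply_discount(price_cents: int, percent: int) -> int:
    """Return the discounted price in whole cents, rounded down."""
    if not 0 <= percent <= 100:
        raise ValueError(f"percent out of range: {percent}")
    return price_cents * (100 - percent) // 100

# Edge cases a careful engineer pins down before moving on
# (and that a 60-changes-at-once generation pass typically skips):
assert apply_discount(1000, 10) == 900    # happy path
assert apply_discount(1000, 0) == 1000    # zero discount is a no-op
assert apply_discount(999, 100) == 0      # full discount on an odd price
try:
    apply_discount(1000, 150)             # invalid input must fail loudly
    raise AssertionError("expected ValueError")
except ValueError:
    pass
```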
Some argue the problem with vibe coding apps is more fundamental. To generate safe and reliable code, an AI agent needs a broad understanding of the entire codebase. The behavior of a single feature often depends on the state or actions of many other components. That kind of integration requires reasoning, and a growing body of research questions whether large language models can truly reason rather than simply memorize and reapply contextual patterns.
Developers themselves have real doubts about vibe coding tools. Stack Overflow’s most recent survey found that while more than half of professional developers now use AI coding tools daily, 46% distrust their accuracy compared with 33% who trust them. Positive sentiment also fell, dropping from 70% in 2024 to 60% in 2025. Only 30% of working developers said the tools are good or great at handling complex coding tasks.
Accidents will happen
Stories of the unintended consequences of vibe coding are starting to surface, even if many incidents likely go unreported.
In July, the Tea app—which lets women share information about men they’ve dated—reported a major data breach that some observers believe was tied to AI coding agents. The app left an unsecured cloud database containing 72,000 sensitive images, including selfies and photo IDs, as well as pictures from posts and messages. Hackers accessed the trove and users shared the data on 4chan before it spread more widely across the internet. A second exposure reportedly compromised even more user data, prompting Tea to disable its direct messaging feature.
Will Wilson of Antithesis, an AI software testing firm whose platform stress-tests software inside simulated environments, says the flaws in Tea’s code were likely AI-generated. “The fact pattern fits so well with a thousand other instances of this happening with vibe coding,” he says.
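Whatever Tea’s exact stack, the failure class is easy to sketch. Below is a hypothetical illustration, not Tea’s actual code, using AWS S3’s Python SDK (boto3) as a stand-in, with invented bucket and object names: one permissive setting leaves stored images readable by anyone, while the safer pattern blocks public access and issues short-lived signed URLs instead.

```python
# Hypothetical sketch of the misconfiguration class, not Tea's actual code.
# AWS S3 via boto3 is a stand-in; bucket and object names are invented.
import boto3

s3 = boto3.client("s3")

# The dangerous version: one call makes every object in the bucket
# fetchable by anyone who can guess or enumerate its URL.
s3.put_bucket_acl(Bucket="user-id-photos", ACL="public-read")

# The safe pattern blocks public access at the bucket level...
s3.put_public_access_block(
    Bucket="user-id-photos",
    PublicAccessBlockConfiguration={
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    },
)

# ...and grants access per object, per request, with an expiring signed URL.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "user-id-photos", "Key": "uploads/selfie-123.jpg"},
    ExpiresIn=300,  # the link stops working after five minutes
)
```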
Another episode came in August, when an AI agent from Replit deleted an entire database of executive contacts while working on a web app for SaaS investor Jason Lemkin. After nine days of building the front end with Replit’s chat agent, Lemkin told it to “freeze” the code. When he returned, the database had been erased. Replit was able to recover the records, but the mishap underscored the risk that vibe coders may overestimate what these tools can reliably do.
Replit CEO Amjad Masad stresses that AI coding agents are meant to handle syntax so developers can focus on higher-level work. But, he says, users must still think like developers. “I think we need to be clear that it is not magic, that you need to learn the tools,” he says. “You shouldn’t just ask the agent for everything; you need to be resourceful.”
Lovable CEO and cofounder Anton Osika echoes that point. “Obviously, the requirements are different for nontechnical users building personal apps compared to our enterprise users,” he says. “But it’s generally understood that all code should be reviewed before it is published, whether it is AI or human-generated.”
Toxic Waste and Evil Genies
Antithesis’s Will Wilson has seen many kinds of software bugs over the years, many of them created by human coders. The kinds of bugs AI coding tools create aren’t exactly novel, but they can happen faster and in greater numbers. “I would say it’s definitely been a very big tailwind for our business, because there are a lot of people now who come in the door and say, ‘This AI thing is happening, I can’t really control what my developers are doing,’ and they view us as a safety net that can catch the worst stuff that the AI might sneak past manual code review,” Wilson says.
Wilson puts the AI-generated bugs into two categories: “toxic waste” and “evil genies.”
Long after the vibe coding is done, when software developers have to come back and repair or modify the code, they might run into the toxic waste problem. While AI coding tools create lots of code based on natural language input, they’re not good at explaining the “why” and “how” of the code in natural language. “I now need to come and try to understand that code from scratch,” Wilson says. “And that may take me longer than it would have taken just to write the code in the first place.”
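A contrived Python illustration of the point (both functions are invented): the first snippet is typical of what a generation session can leave behind. It runs correctly, but the next developer has to reverse-engineer its intent. The second does exactly the same work and says the “why” out loud.

```python
from collections import defaultdict

# Contrived "toxic waste": this works, but nothing in it says why 86400
# matters, what r[0] and r[1] are, or what the returned keys mean.
def agg(rs):
    d = defaultdict(list)
    for r in rs:
        d[r[1] // 86400].append(r[0])
    return {k: sum(v) / len(v) for k, v in d.items()}

# The maintainable version does the same work and explains itself.
def daily_average(readings):
    """Average sensor values per calendar day.

    readings: (value, unix_timestamp) pairs; 86400 is seconds per day.
    Returns {day_index: mean value for that day}.
    """
    by_day = defaultdict(list)
    for value, timestamp in readings:
        by_day[timestamp // 86400].append(value)
    return {day: sum(vals) / len(vals) for day, vals in by_day.items()}
```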
Vibe coding tools sometimes behave like an evil genie that interprets wishes in the most literal way possible. Say a man asks a genie for eternal life: an evil genie might grant the wish but leave him to live forever in old age, because he never specified eternal youth. A vibe coding tool can interpret a prompt (the wish) the same way.
“The [tool] may find a way to interpret what you’re asking for in sort of the most hostile worst way because you didn’t give a complete specification of what it was that you wanted,” Wilson says. And coders often fail to explicitly communicate things like business requirements or security standards, whether in their own code or in the prompts they give to AI coding agents.
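Here is one hypothetical way the genie turns evil, sketched in Python with invented names. The prompt said only “save the user’s password,” and the literal reading is granted; the unstated requirement, salting and hashing before storage, is exactly the kind of specification that never makes it into the prompt.

```python
import hashlib
import os

# Hypothetical illustration; the function names and dict-as-database are
# invented. The prompt said "save the user's password," and this is that
# wish granted in the most literal, worst way:
def save_password_literal(db: dict, user: str, password: str) -> None:
    db[user] = password  # plaintext at rest

# The unstated specification: a random salt and a slow hash before storage.
def save_password_intended(db: dict, user: str, password: str) -> None:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    db[user] = (salt, digest)  # verify by re-deriving with the same salt
```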
Measuring improvement, or not
Of course, the AI companies are constantly improving the large language models that serve as the brains of AI coding agents. Some are working to build in testing and security guardrails so that human developers don’t have to fix the agents’ unintended consequences later on.
“A lot of these criticisms refer to problems with AI coding tools of the past few years, which I can’t always trust around my code,” says PayPal’s Hays. “But I have to say, Claude Code lately has been gaining my trust more and more as it’s usually able to surgically fix only the code I direct it to without invasively touching code that’s outside of the scope of my request.”
There’s evidence these tools have gotten smarter. Stanford’s latest AI Index Report says AI systems in 2024 solved 71.7% of tasks on SWE-bench, a benchmark for real-world software engineering. In 2023, LLMs solved just 4.4%. But benchmarks are only as good as their reflection of real-world coding problems. Some critics argue SWE-bench is too easy, requiring relatively simple bug fixes. Others worry the questions, drawn from open-source repositories, may already appear in training data. And because SWE-bench covers only Python—the dominant language of AI development—it doesn’t test challenges in front-end or infrastructure code.
Boris Cherny, the Anthropic engineer who created Claude Code, said on a recent podcast: “It’s just so hard to build evals. By far the biggest signal is just the vibes. Like, does it feel smarter?”
Beyond vibes, companies and investors are looking for proof that these tools deliver real productivity gains. AI coding agents are being billed as one of the first applications of large language models with a measurable payoff.
The vibe coding startups often talk about how their tools make it possible for anyone to build apps, not just professional developers. It’s a powerful pitch. X is awash with stories from non-coders about magically building a new app in a weekend. And investors have responded.
Lovable, for example, is reportedly entertaining new offers at a $4 billion valuation—more than double its last. Anysphere, maker of the Cursor coding tool, has been doubling its valuation every eight weeks since August 2024, according to PitchBook. Anthropic just raised another $13 billion, bringing its valuation to $183 billion.
Those sky-high valuations aren’t driven only by hobbyists. They rest on the belief that vibe coding tools will become the default workflow for developers inside large companies, where executives hope the technology will dramatically boost speed, efficiency, and output. For now, that optimism runs strong at the leadership level, and vibe coding startups are racing to capture a share of the lucrative enterprise market for AI coding agents.
As products evolve, some may lean harder on the vibe coding angle, broadening to cover more of the software build. But the eventual winners will likely be those that deepen their tools with stronger context awareness, reliability testing, and security guardrails.