New web standards could limit AI’s access to websites in profound ways, and Big Tech companies including Google, Microsoft, and OpenAI are trying to halt or water down the effort.
The battle is being quietly waged inside the Internet Engineering Task Force, an important industry group that creates standards such as TCP/IP, HTML, and robots.txt that underpin much of what happens online.
The IETF is now tackling one of the thorniest issues it’s ever encountered: How to keep the web thriving in the era of chatbots and AI answer engines.
For 25 years, search engines like Google and Microsoft’s Bing crawled and indexed websites, sending users to relevant pages. This drove traffic, supporting the web’s grand bargain: sites let tech firms copy their data for free in exchange for referrals. Ads and subscriptions funded content creation, which in turn improved search results.
In today’s new AI era, products such as Google’s AI Overviews and AI Mode, OpenAI’s ChatGPT, and Perplexity deliver answers directly, often eliminating the need to visit the original source. These systems still rely on content scraped for free, but by cutting off traffic, they threaten the revenue that funds the creation of quality information online.
The IETF is working on new standards that would define search engines and generative AI answer engines as different entities. The standards would also give sites a way to block AI bot crawlers while still allowing the bots that feed traditional search engines and send users to the original sources of information.
In recent weeks, executives and lawyers for Big Tech companies, including Google, Microsoft, OpenAI, and Amazon, have come out against parts of these standards and are trying to postpone or narrow the scope of the initiative.
Meanwhile, publishers and other content owners are trying to push the standards forward in a process that could wrap up by the end of 2025. While the IETF isn’t a regulatory body, the new standards are likely to influence how the internet works in the new AI era, and big tech companies have largely followed such guidelines in the past.
“Whole business models and lots of revenue are riding on the outcome,” said Alissa Cooper, a former Cisco executive who is part of the IETF working group that’s developing the new standards.
Big Tech wants free access to data to train, refine, and run AI models and use content for outputs. Without high-quality information, model performance could degrade and new AI products may not be as useful or accurate. There are trillions of dollars riding on the future success of AI services, so any limits on how bots access data are controversial.
Meanwhile, publishers and other content creators want to be paid for their data and are demanding the ability to block the new AI bot swarm, which can spike traffic costs while sending fewer real users to websites.
Separating search bots from AI bots
The debate focuses on an IETF document that proposes a new standard for how websites and other digital assets are accessed and used by automated systems such as bot crawlers, AI models, and search engines.
It defines different categories so that website owners can decide whether to allow their content to be scraped and collected for AI model training, search engine results, and other uses.
The document, dated July 21, included a diagram that clearly separates AI model training from search engines. It also had a separate category for “AI Use,” which includes how AI models are run (known as AI inference).
(Diagram credit: IETF)
This opened the possibility that website owners and other content companies would be able to block bot crawlers from collecting their data for use in AI model training and AI model outputs — while still approving access for traditional search engines that send traffic to the original sources of information online.
The key definition centers on search engines. The IETF working document describes a search engine as an “application that directs users to the location from which the assets were retrieved.”
That means, under these emerging new internet standards, search engines would have to generate results that send users to the original sources of information to be considered truly a search engine.
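The distinction can be sketched in a few lines of Python. This is only an illustration of the idea described above, not the IETF's actual syntax: the category labels, the `classify` function, and the `SitePreferences` class are all hypothetical names, and the rule that an unmentioned category counts as allowed is an assumption made for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Use(Enum):
    """Hypothetical labels for the usage categories in the July 21 draft."""
    SEARCH = "search"
    AI_TRAINING = "ai-training"
    AI_USE = "ai-use"


def classify(directs_users_to_source: bool) -> Use:
    """Apply the draft's test: an application counts as search only if it
    directs users to the location the content was retrieved from."""
    return Use.SEARCH if directs_users_to_source else Use.AI_USE


@dataclass
class SitePreferences:
    """A site's per-category choices (illustrative structure only)."""
    allowed: Dict[Use, bool] = field(default_factory=dict)

    def permits(self, use: Use) -> bool:
        # Assumption: a category the site has not mentioned is allowed.
        return self.allowed.get(use, True)


# A publisher that wants search traffic but opts out of both AI categories.
prefs = SitePreferences({Use.AI_TRAINING: False, Use.AI_USE: False})

print(prefs.permits(classify(directs_users_to_source=True)))   # True
print(prefs.permits(classify(directs_users_to_source=False)))  # False
```

Under this reading, what matters is whether the product sends users back to the source, not whether AI is involved somewhere in the pipeline.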
That’s different from how search has been evolving recently. Google added AI Overviews and AI Mode to its search engine this year. These new features use website content for free to answer user questions directly, and this can send less traffic to the original sites.
AI answer engines hit traffic
Recent Pew research found that users who encountered an AI summary clicked on a traditional search result link only 8% of the time. Those who did not encounter an AI summary clicked through 15% of the time. And Google users who saw an AI summary rarely clicked on a link in the summary itself, doing so just 1% of the time. Google has questioned the methodology of this study, although other studies have shown that referrals from Google have fallen as AI Overviews expand.
Trusted Reviews, which publishes product reviews, was recently crawled by bots 1.6 million times in a single day, and got just 603 users visiting its website. “That’s dramatically lower than you would expect from traditional search,” Chris Dicker, CEO of Candr Media Group, owner of Trusted Reviews, wrote on LinkedIn recently.
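To put those figures in proportion, here is a short back-of-the-envelope calculation in Python. The inputs are the numbers reported above; the function names are made up for this sketch, and the Trusted Reviews ratio assumes the crawl count and the visitor count cover the same single day.

```python
def crawls_per_visit(crawls: int, visits: int) -> float:
    """Bot requests received for every human visitor."""
    return crawls / visits


def relative_drop(before_pct: float, after_pct: float) -> float:
    """Fractional decline from one click-through rate to another."""
    return (before_pct - after_pct) / before_pct


# Trusted Reviews: 1.6 million bot crawls and 603 visitors in one day.
ratio = crawls_per_visit(1_600_000, 603)
print(f"About {ratio:,.0f} crawls per visitor")

# Pew: 15% click-through without an AI summary, 8% with one.
drop = relative_drop(15, 8)
print(f"Click-through is about {drop:.0%} lower with an AI summary")
```

That works out to roughly 2,650 crawls for each visitor, and a click-through rate a little under half of what it is without an AI summary.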
Cloudflare CEO Matthew Prince estimates it’s about 10 times harder to get traffic from Google since the search giant introduced AI Overviews and AI Mode. This makes the new IETF standards essential, he told Business Insider in a recent interview.
Google says its new AI-powered search features still send traffic to websites, and may even send higher-quality traffic to sites.
That’s not what Dicker found with Trusted Reviews, though. Those 603 users spent 58% less time on the website and viewed 10% fewer pages than the average user.
He’s looking forward to the new IETF standards. “It looks like it is only a matter of time until they are forced to break out the Google search bot from their other services,” Dicker told Business Insider. “At that point we would look to block every Google bot we can, other than the one specifically sent out for search.”
Can you treat AI differently from search?
In emails to the IETF working group, Google, Microsoft, and other big tech companies have argued that it’s hard to separate search engines from AI because the two technologies have become so intertwined.
“There are very many highly problematic areas in the document,” Krishna Madhavan, a principal product manager for Microsoft AI and the Bing web data platform, wrote in an email to the IETF working group on August 13.
Modern search systems rely heavily on AI, even for basic rankings, he wrote in another email on August 4. “Attempting to isolate ‘search’ as a distinct, opt-out category ignores this integration and creates a false dichotomy that is neither meaningful nor enforceable,” Madhavan added.
In late July, Google copyright lawyer Caleb Donaldson warned that the company could drop websites from Search because the IETF definitions are confusing and could leave site owners opting out of broader automated bot crawling by accident.
“There’s no meaningful distinction between ‘AI Use’ and ‘Search’ given all of Search runs on AI,” he wrote in an email to the IETF group.
Lawyers from Amazon and OpenAI concurred with Donaldson in other emails seen by Business Insider. OpenAI’s lawyer even insinuated that the IETF’s guidelines could be cited by regulators.
“We must also acknowledge, not ignore, the legal overhang: once published, this vocabulary will be cited by regulators and others,” Esther Tetruashvily, an AI Standards Specialist at OpenAI, wrote.
Business Insider asked Google, Microsoft, and OpenAI for comment. Microsoft declined to comment. OpenAI said its stance is represented by Tetruashvily’s emails to the IETF, while Google said the views expressed by the Google executives and lawyers in the IETF group are not “official.”
“AI models have been core to how Search works for over a decade, helping surface relevant sites and driving traffic to them,” a Google spokesperson said. “Unlike many others, we respect the choices that sites make through robots.txt, and we also provide snippet controls that let publishers opt out of having their content appear in Search features like AI Overviews.”
A ‘carve-out’ for Search
In another email to the group, Alissa Cooper shared a new diagram that she said represented the reality of the current situation, as well as the desire of some website owners to opt out of AI bot crawling while still getting the benefits of traditional search bots and the traffic they send.
(Diagram credit: Alissa Cooper)
John Mueller, a veteran search expert at Google, warned that if websites opt out of these categories, a search system might not be able to include those sites in its results.
“This feels counter to the goal” of being able to “separate out ‘Search,'” he wrote in an email to the IETF working group on July 31.
Martin Thomson, a distinguished engineer at Mozilla who’s co-authoring the working document, pushed back by noting that the IETF definition of search focuses on whether the technology sends users to the original source of information, rather than what type of AI might be at work in the background.
Indeed, the working document creates a “distinct Search category” that “allows for preferences specific to search applications, even if the use of AI is involved in their implementation.”
“In creating a carve-out for search, we’ve focused in on this question: Does it reasonably lead to someone to visit the original location of the content?” Thomson told Business Insider in a recent interview. “And I think we’re starting to come around to the view that that’s a better framework overall.”
He noted that this would specifically rule out Google’s AI Overviews.
“If you’re producing a summary that is not intended to direct people to the original source of the content, then that’s off limits,” Thomson explained. “You’re not providing a search application, you’re providing something else.”
Intense debate
Cooper went further by saying that AI answer engines are not a natural evolution of search engines. Instead, they are the result of intentional product choices made by Google and other Big Tech and AI companies.
“A tech company crawls and indexes content and sends traffic to the origin of that information. This has been the experience for two decades,” Cooper explained. “To come along now and say ‘we’ve made a product choice that replaces that and conflates two different things’ — that’s not the same. It’s not delivering the same product experience for users or providers.”
The debate inside the IETF got particularly tense when publishers and others pressed their case for the ability to block AI bot crawlers while still allowing traditional search bots and the traffic they bring.
“There is some conflation here,” wrote Bradley Silver, global head of public policy, AI & IP at Advance, the owner of Condé Nast and a major shareholder of Reddit and Warner Bros. Discovery. “The use of AI to rank and re-rank organic search results should be distinguished from the use of generative AI to produce search results which aggregate and summarize the indexed content.”
He said this conflation is being driven by the need to compete with emerging AI answer engines, and argued this ignores creators’ and publishers’ wishes and is “causing real harm.”
Christopher Flammang, from publishing giant Elsevier, summed up those wishes forcefully in an email to the IETF working group.
“Today, there is no effective mechanism for those who produce intellectual and creative content to say: ‘yes, I want to be found through search; no, I do not want to be summarized or rewritten by an AI system,'” he wrote in early August. “Without that line, we blur two fundamentally different uses: pointing to content (as search has historically done) and substituting for it (as AI summaries increasingly do).”
A change pushed through
By early September, pressure from Big Tech members of the IETF working group resulted in the “AI inference” category of the new standards being removed.
A new version of the standards included this updated diagram. The separate Search category remains, but “AI Use” is gone.
(Diagram credit: IETF)
The core definition of Search — as an “application that directs users to the location from which the assets were retrieved” — remains.
Still, the removal of the AI Use category forced Silver, the executive from Advance, to propose more targeted ways to help websites signal they don’t want their content used “in ways that are likely to result in AI-generated outputs that substitute for the original asset,” he wrote in an email in early September.
A nightmare scenario
The IETF group is set to meet again in person in coming weeks. OpenAI said it plans to be there.
Google’s spokesperson said, “we continue to actively engage with the ecosystem and listen to feedback from publishers and creators.”
Some members of the working group expect pressure from tech giants to change more parts of the standards, but they see the central carve-out for search remaining.
“If you crawl for Search, that’s different than if you crawl for AI,” said Prince, the CEO of Cloudflare, which chairs the IETF group pushing this forward. “Everyone will likely fall in line and agree to that, potentially over the objections of Google. I expect Google will kick and scream, but eventually fall in line, too.”
Without this outcome, some members of the group worry what might happen to the internet.
One IETF member described to Business Insider a nightmare scenario in which there will no longer be an incentive to publish independent websites. Instead, publishers will just submit verified information directly to three or four AI companies, possibly via an Application Programming Interface, a common way for apps to share data.
Thomson, the Mozilla engineer who’s co-authoring the IETF document, was more diplomatic. His goal is to produce a new standard that is supported by enough constituents that it’s used and followed widely.
“We are getting a lot more scrutiny and I think the ultimate result will be that it’s going to be much stronger,” he told Business Insider.
“The question of how we integrate AI into society and how you balance the concerns of people who produce content against the need to find advancements and make the world a better place through things like AI — this is a difficult one,” he added.