
Last month, the U.S. Copyright Office released a report on generative AI training, concluding that the use of copyrighted materials to train artificial intelligence models is not fair use. That conclusion is wrong as a matter of both copyright law and AI policy.
AI is too important to allow copyright to impede its progress, especially as America seeks to maintain its global competitiveness in tech innovation.
Fair use is a defense to copyright infringement that allows companies like Google to reproduce web pages in order to develop their search engines. If Google faced liability for copying the material it indexes, it would go out of business. Similarly, generative AI systems should have permission to be trained on material that is often copyrighted, such as images or articles, to achieve sufficient accuracy.
AI systems currently rely on a process called “backend copying,” which often mimics how humans learn words and ideas from reading copyrighted materials, an activity that has never before been seen as infringement. These systems use texts as training data to create a large vocabulary of words, which is analogous to creating a dictionary, while also capturing facts, ideas and expression.
Copyright owners argue that treating backend copying as fair use will stifle creativity and impoverish artists. That’s false. Creators can still make money by selling copies of their work, and any payments from AI use would be minuscule. Without fair use, the VCR, iPhone and Google’s search engine wouldn’t exist. And copyright owners can still sue if AI systems produce replicas of their works as outputs.
Treating backend copying as infringement by denying fair use risks crippling innovation and forfeiting American technological leadership. AI training datasets include hundreds of millions to billions of works; payments for using them would be enormously expensive and difficult to coordinate, hobbling startup companies and reinforcing the dominance of Big Tech. AI is far more than chatbots; the technology is revolutionizing medicine, research, education and the military.
Currently, America is the world leader in AI innovation. However, geopolitical competitors such as China are rapidly advancing, as the release of DeepSeek demonstrates. Limiting AI innovation with copyright threatens American economic and national security.
Humans, like AI systems, learn language by example: by seeing how words are used by different authors in various texts, well-read individuals become more articulate and knowledgeable. When humans learn facts or ideas from an authorized copy of a work, it is not treated as copyright infringement. In the same way, AI systems should be allowed to learn without the presumption that their output will inevitably infringe.
Furthermore, generative AI is increasingly employed in contexts that are traditionally protected by fair use. For example, AI can be used to train medical students to perform surgical procedures or to conduct academic research. To condemn all AI training as beyond fair use short-circuits the crucial inquiry, which is how AI systems are used — not how they are built.
Courts have protected copying by search engines like Google because search is highly transformative and creates important social value. AI-driven search is beginning to replace traditional search and should enjoy the same fair use protection, because it works better and faster. AI-based search serves the same basic purpose as traditional search but provides even more powerful features, such as the ability to summarize information from large numbers of websites simultaneously and to tailor answers to a user’s specific needs.
AI also boosts innovation in other areas. Some important AI models are released as “open-source” or “open-weight” models, under licenses that allow anyone to download and use them free of charge. The potential downstream uses for these models are nearly unlimited and go far beyond the uses contemplated by the companies that initially trained them. While some of these downstream uses could produce works that infringe, others might involve only non-infringing facts and ideas, or be used in contexts that are also fair use.
Generative AI is a technology that is capable of both infringing and non-infringing uses, similar to VCRs and search engines, and should be assessed in context. To stop the fair use inquiry at the training stage is to ignore all these remarkable possibilities and to risk impairing the most important information technology since the printing press.
The U.S. Copyright Office’s report is shortsighted. Protecting AI innovation through fair use fits traditional copyright law and supports American leadership in this vital new technology.
Thinh H. Nguyen, J.D., is a legal skills professor at the University of Florida Levin College of Law and the director of its Innovation and Entrepreneurship Clinic. Derek E. Bambauer, J.D., is the current Irving Cypen Professor of Law at the UF Levin College of Law, a National Science Foundation-funded researcher in law and AI, and a former principal systems engineer at IBM.