Artificial intelligence is headed toward a litigation iceberg of copyright complaints, one that could sink the development of this nascent technology. Actors and writers have expressed fears about AI integration in the entertainment sector, and some are citing AI’s supposed copyright violations to stoke those fears. Now the news media is taking its swing at the AI piñata, with The New York Times’ new lawsuit first in line.
Just before New Year’s, the Times filed a lawsuit against OpenAI and Microsoft, alleging the unauthorized use of its content to train AI tools. The complaint echoes the failed legal theory presented in Sarah Silverman’s recently decided copyright suit against Meta over its AI model, LLaMA.
These legal actions misinterpret the nature of generative AI, which learns from diverse data without directly copying or reproducing it, and thus falls within the bounds of “fair use.” While a court deemed Silverman’s claims “nonsensical” and dismissed them, the Times’ suit requests a jury trial rather than a decision from a judge. That litigation strategy adds a new layer of complexity to the issue and represents a major escalation in the confrontation between traditional media and AI technology.
The suit seeks “billions of dollars” in damages from OpenAI, which, if awarded, could stifle the entire AI industry. The threat of being buried in legal costs and settlements creates a chilling effect, deterring innovation and experimentation.
But we can glean a bit about how the Times’ suit should play out legally by analyzing Sarah Silverman’s swift defeat.
Because Meta’s LLaMA may have been trained on a dataset containing her works, Silverman demanded payment and royalties for all the content the AI model generates related to her work. The court responded with a scathing rejection of her claims as “nonsensical.” Let’s look at why this didn’t work for Silverman from both a legal and a logical perspective.
Starting with the basics, Silverman claimed that Meta’s AI-generated content, derived from its LLaMA program, infringed upon her copyrighted work. In essence, when a generative AI model is trained, it ingests trillions of pieces of information to learn about sentence structure, the world, and the many basic and complex concepts that we as humans understand intrinsically through our lived experiences. Sometimes, that process includes copyrighted material.
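To see why training is not copying, consider a deliberately simplified sketch, orders of magnitude simpler than LLaMA or GPT and purely illustrative: a toy model that “learns” by recording which word tends to follow which, then discards the text itself. The corpus, names, and function below are hypothetical placeholders of my own, not anything from the actual cases or systems.

```python
from collections import Counter, defaultdict

# Toy "training": learn which word tends to follow which, keeping only
# aggregate counts. These sentences are hypothetical placeholders, not
# drawn from any real book.
corpus = [
    "the comedian wrote a funny book about her life",
    "the critic wrote a long review about the book",
]

follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1

# The "model" is just these statistics; the original sentences are gone.
def most_likely_next(word):
    options = follow_counts.get(word)
    return options.most_common(1)[0][0] if options else None

print(most_likely_next("wrote"))  # "a" -- a pattern learned from both sentences
print(most_likely_next("about"))  # "her" or "the", whichever count wins the tie
```

Real generative models encode vastly richer patterns in billions of neural network weights, but the principle is the same: what the model retains is a statistical abstraction of its training data, not a stored reproduction of it.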
The court then considered this question: if an AI model ingests something that Sarah Silverman wrote, does Silverman have a right to all the content that model produces? The core of Silverman’s argument was that the LLaMA models, by virtue of being trained on a vast array of texts (potentially including her own works), somehow adapted her books into AI-generated outputs. The court, rightfully, found this logic flawed.
For one, Silverman’s team couldn’t identify a single instance where the AI’s output directly infringed upon her work. It’s as if someone accused an artist of stealing a painting simply because both had once visited the same museum.
Silverman’s team also failed to establish any direct link between the AI’s training on a diverse dataset and the alleged infringement. The court pointed out that, for an AI output to qualify as a derivative work, it must bear some resemblance or contain elements of the original work. In the absence of this crucial link, the argument fell apart.
The court made clear that there was no substantial similarity in protected expression between the LLaMA AI software and Silverman’s books. The AI’s output was neither a copy nor a derivative creation; it was original and therefore did not violate copyright law.
The court rejected Silverman’s Digital Millennium Copyright Act (DMCA) claims, too. To sustain a DMCA claim, a plaintiff must show that the alleged infringer distributed the copyrighted works without proper copyright management information. Silverman couldn’t demonstrate anything of the sort. The court’s message was clear: just because an AI is trained on a dataset does not mean it distributes or reproduces the works contained in it.
As for Silverman’s claims of unjust enrichment and negligence, these, too, were brushed aside as redundant and lacking in substance. The ruling is a robust affirmation of the transformative nature of AI-generated content. It emphatically declares that the mere use of copyrighted material in AI training does not equate to infringement, especially when the AI’s output is as transformative and distinct as LLaMA’s. To argue otherwise is to misunderstand the nature of AI and the principles of copyright law.
Alongside the legal failures is a logical one. The crux of Silverman’s claim was that if someone once reads her writing, she is entitled to anything that person ever creates. It sounds absurd, but that was essentially her argument. Fortunately, the court called it out, and the same outcome is likely for The New York Times’ suit.
As more industries seek a financial payout from AI businesses, the courts will need to step in to affirm the principle that AI development is transformative and a form of fair use – not subject to copyright complaints like those from Silverman and the Times.
Judicial affirmation is crucial not only to protect AI innovation from being overwhelmed by legal challenges and to allow for small, new competitors to enter the marketplace, but also to ensure that the evolution of technology continues to benefit society at large. Without such judicial guidance, we risk entering an era where legal uncertainty and fear sink the vast potential of AI technology.
Featured image created using DALL-E, a product of OpenAI.