Meta Platforms, the parent company of Facebook, faces a lawsuit brought by well-known authors, including Sarah Silverman, Richard Kadrey, and Ta-Nehisi Coates, who accuse the company of using their copyrighted works to train its Llama family of artificial intelligence models without obtaining the required permission. The dispute has sparked a debate about the use of protected content in the development of generative AI technologies and could set a key precedent.
The case gained prominence after a district judge in the United States allowed the lawsuit to proceed, rejecting Meta's motion to dismiss it. The company has argued that using these materials to train its AI is protected under the doctrine of fair use, but the authors contend that the technology company violated the law by using their texts without a license or compensation.
The plaintiffs allege that Meta collected large volumes of protected content to train Llama, its generative AI model. According to court documents, the company is said to have used datasets that include pirated books, such as those hosted in unauthorized online libraries, to feed its machine learning system. This kind of practice has been a source of controversy in the technology industry because it raises questions about respect for creators' rights.
The litigation is part of a broader set of lawsuits against major tech companies for using protected materials in the training of their AIs. OpenAI has also faced similar lawsuits from writers and artists who seek to protect their intellectual property from AI models that, in their opinion, benefit from their work without offering any compensation.
In this case, beyond seeking monetary compensation, the authors aim to establish a clearer regulatory framework for the use of copyrighted works in AI development. According to the plaintiffs, the lack of transparency in how these models are trained prevents creators from controlling how their texts are used and from receiving fair returns for them.
The district judge assigned to the case, Vince Chhabria, allowed the authors' lawsuit against Meta to proceed, finding that the alleged copyright infringement represents a "sufficient tangible harm" to justify the litigation. However, he criticized the rhetoric used by the plaintiffs' lawyers, describing it as "exaggerated."
In his ruling, Chhabria noted that the authors presented sufficient evidence to allege that Meta intentionally removed copyright management information (CMI) from their works. According to the judge, this action could have been an attempt by the company to prevent Llama from generating output that revealed the use of protected materials in its training.
However, the judge dismissed the claims based on the California Comprehensive Computer Data Access and Fraud Act (CDAFA), a state law that penalizes unauthorized access to computer systems, networks, and data stored on computers.
The judge reasoned that the plaintiffs did not demonstrate that Meta had accessed their computers or servers, only their books, which does not constitute a violation of that statute.
The company led by Mark Zuckerberg has responded to the lawsuit by arguing that its use of the texts in question is legitimate under the doctrine of fair use, a legal principle in the United States that allows certain uses of protected materials without permission or payment of royalties. Meta contends that its model does not directly reproduce the content of the books but uses them as a reference to generate new and original responses.
The authors dispute this defense, arguing that Meta's AI has not only been trained on their books without authorization but can also generate content that mimics their style or reproduces passages close to their original works. This, they argue, violates their rights as creators and endangers the viability of their work in the digital age.
The debate over fair use in AI is one of the key issues in this case. Although United States law permits certain transformative uses of protected content, it is not yet clear whether training AI models fits within those parameters. So far, the courts have not established definitive rules on the matter, which means this case could shape future judicial decisions in the area.
If the courts rule in favor of the authors, Meta and other companies could be required to pay licensing fees or compensation to the creators whose works are used to train AI.
Conversely, a ruling in Meta's favor could set a precedent allowing technology companies to continue using protected content without explicit permission, as long as the use is considered sufficiently transformative. This would alarm the creative community, as it could mean their works can be used freely without any compensation.
Meanwhile, the publishing industry and other affected sectors are watching the case closely, since its outcome could influence future negotiations between creators and technology platforms. As artificial intelligence grows more sophisticated, so does the need for legal frameworks that balance innovation with the protection of copyright.
What do you think about this matter? Is Meta entitled to use the authors’ material to train its AIs, or should it pay for the works’ rights? We would like to hear from you 👇
Photo: Flux