Meta is accused of training artificial intelligence with pirated content

Meta was sued in 2023 for using pirated content to develop its large language model (LLM) Llama. The lawsuit, known as “Kadrey et al. v. Meta Platforms,” was filed by authors including Richard Kadrey and Christopher Golden. The authors claim that Meta used their copyrighted works without permission.

Judge Vince Chhabria of the United States District Court for the Northern District of California ruled that documents Meta had previously submitted to the court, parts of which had been redacted, should be made public. The disclosed documents revealed discussions among Meta employees about artificial intelligence and Llama.

The documents included a message in which a Meta engineer said, “Downloading torrents from a Meta-owned corporate laptop doesn’t feel right.” These statements confirm that the company uses pirated content in artificial intelligence training. It is also implied that Meta CEO Mark Zuckerberg, referred to as “MZ” in another message, approved the use of pirated materials.

Meta allegedly drew on LibGen, a large library of pirated content, to train its artificial intelligence. Founded in Russia in 2008, LibGen hosts books, magazines and academic articles, and has therefore been the subject of many copyright cases. Meta is also alleged to have used other sources known as “shadow libraries”.

In response to these claims, Meta argues that it uses publicly available materials within the framework of the “fair use” doctrine. The company states that copyrighted content is used to create language models and produce original expressions.

Meta is not the first company to face this accusation

Accusations that major technology companies are committing copyright infringement to develop artificial intelligence models are not new. For example, a 2024 investigation found that the training data for Apple’s OpenELM model included subtitles from more than 170,000 YouTube videos. However, Apple clarified that OpenELM is only an open-source research model and is not used to power Apple Intelligence. The company emphasized that its artificial intelligence was trained on licensed and publicly available data.

Meanwhile, major publications such as The New York Times and The Atlantic are known to have opted out of having their content used to train Apple Intelligence.