A California federal court has cleared the way for a billion-dollar class action lawsuit against Anthropic, the company behind the Claude language model, over claims of large-scale copyright infringement.
The suit alleges that Anthropic downloaded as many as seven million books from pirate sites like LibGen and PiLiMi between 2021 and 2022. This puts the company in the crosshairs for potentially massive damages, even after a partial win on fair use grounds just weeks earlier.
A "Napster-style" piracy case
According to the court order from July 17, 2025, Anthropic is accused of using the BitTorrent protocol to download pirated books from LibGen and PiLiMi. These files, typically in .epub, .pdf, or .txt format, were stored in a central internal database, regardless of whether they were later used to train AI models.
Judge William Alsup described the company's actions as "Napster-style downloading of millions of works." The order details how, between January 2021 and July 2022, an Anthropic co-founder first downloaded about 200,000 books from the Books3 collection, followed by roughly five million from LibGen and another two million from PiLiMi, targeting titles not already in LibGen.
The court decided the case should move forward as a class action, given the sheer volume and complexity of the evidence. Only works sourced from LibGen and PiLiMi are included; Books3 was left out due to missing metadata.
The financial risk for Anthropic is significant. Under US law, damages for willful copyright infringement can reach up to $150,000 per work. Even a much smaller amount per title could still total billions.
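The scale of that exposure is easier to grasp with rough numbers. A back-of-the-envelope sketch, using the roughly seven million downloads alleged in the order and the statutory ranges in US copyright law (17 U.S.C. § 504(c): $750 minimum, $30,000 standard maximum, $150,000 for willful infringement); the per-work awards and the use of the full download count as a proxy for class size are illustrative assumptions, not findings from the case:

```python
# Illustrative statutory-damages arithmetic; not a legal estimate.
STATUTORY_MIN = 750        # minimum per work under 17 U.S.C. § 504(c)
STANDARD_MAX = 30_000      # standard maximum per work
WILLFUL_MAX = 150_000      # maximum per work for willful infringement

def total_damages(num_works: int, per_work: int) -> int:
    """Total statutory damages if every work receives the same award."""
    return num_works * per_work

# Alleged LibGen (~5M) + PiLiMi (~2M) downloads; assumed here as a
# stand-in for the eventual class size, which the court has not set.
works = 7_000_000

for per_work in (STATUTORY_MIN, STANDARD_MAX, WILLFUL_MAX):
    print(f"${per_work:>7,} per work -> ${total_damages(works, per_work):,}")
# → $    750 per work -> $5,250,000,000
# → $ 30,000 per work -> $210,000,000,000
# → $150,000 per work -> $1,050,000,000,000
```

Even at the statutory floor of $750 per work, seven million titles already implies more than $5 billion, which is why the class-certification ruling alone carries billion-dollar stakes.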
Anthropic must turn over a complete metadata list of its LibGen and PiLiMi downloads by August 1, 2025, while plaintiffs are required to submit a detailed list of titles and registrations by September 1, 2025.
Fair use doesn't apply to piracy
In June, the same court ruled that training AI models on legally obtained books may qualify as fair use, especially if the use is "transformative" and no copies are distributed. But the court also made it clear: storing pirated works in an internal library doesn't qualify as fair use.
While the legal status of mass web scraping and the use of public data for AI training is still up in the air, the court’s ruling sets a clear boundary: pirated content can't be justified as fair use, even for AI research or innovation.
The Anthropic case could set a major precedent for the industry, making it clear that AI companies can't sidestep copyright laws when sourcing training data, regardless of how they use it later. The decision could ripple out to ongoing lawsuits against Meta, OpenAI, and others accused of using copyrighted material to train language models.