October 25, 2020
3 reports

Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Meta and Bloomberg allegedly used Books3, a dataset containing 191,000 pirated books, to train their AI models, including LLaMA and BloombergGPT, without author consent. Lawsuits from authors such as Sarah Silverman and Michael Chabon claim this constitutes copyright infringement. Books3 includes works from major publishers like Penguin Random House and HarperCollins. Meta argues its AI outputs are not "substantially similar" to the original books, but legal challenges continue.

Deployers
Various generative AI developers
Meta
EleutherAI
Bloomberg
Developers
Various generative AI developers
The Pile
Shawn Presser
Meta
EleutherAI
Bloomberg