SafetyOS

October 25, 2020

3 reports

Meta Allegedly Used Books3, a Dataset of 191,000 Pirated Books, to Train LLaMA AI

Meta and Bloomberg allegedly used Books3, a dataset containing 191,000 pirated books, to train their AI models, including LLaMA and BloombergGPT, without author consent. Lawsuits from authors such as Sarah Silverman and Michael Chabon claim this constitutes copyright infringement. Books3 includes works from major publishers like Penguin Random House and HarperCollins. Meta argues its AI outputs are not "substantially similar" to the original books, but legal challenges continue.

Deployers

Various generative AI developers

Developers

Various generative AI developers

The Pile

Shawn Presser

Reports

Report 1 Report 2 Report 3

Linked Risk Vectors

#12 Corporate Power Concentration

#13 Regulatory Lag & Capture