SafetyOS

February 28, 2025

1 report

Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

A dataset used to train large language models allegedly contained 12,000 live API keys and authentication credentials. Some of these were reportedly still active and allowed unauthorized access. Truffle Security found these secrets in a December 2024 archive from Common Crawl, whose repository spans 250 billion web pages. The affected credentials could have been exploited for unauthorized data access, service disruption, financial fraud, and other malicious uses.

Deployers

OpenAI

Microsoft Azure OpenAI Service

Microsoft

Common Crawl

Developers

OpenAI

Microsoft

Common Crawl

Reports

Report 1

Linked Risk Vectors

#3 Recursive Power Accumulation

#2 Loss of Human Control