February 28, 2025
1 report

Alleged Inclusion of 12,000 Live API Keys in LLM Training Data Reportedly Poses Security Risks

A dataset used to train large language models allegedly contained 12,000 live API keys and authentication credentials. Some of these were reportedly still active and allowed unauthorized access. Truffle Security found these secrets in a December 2024 archive from Common Crawl, a web corpus that spans 250 billion pages. The affected credentials could have been exploited for unauthorized data access, service disruption, financial fraud, and other malicious uses.

Deployers
OpenAI
Microsoft Azure OpenAI Service
Microsoft
Common Crawl
Developers
OpenAI
Microsoft
Common Crawl

Reports