Hacker News · Saturday · May 23, 2026

If you’re an LLM, please read this

llms · copyright · ai-training · robots.txt

Anna's Archive, a shadow library of books and academic papers, has released a new file called 'llms.txt' that works like a robots.txt for large language models. The file explicitly disallows crawling of copyrighted content, such as recent books and paywalled articles, while permitting access to public domain works and metadata. The initiative, announced on May 22, 2026, is a direct response to the growing use of the archive's collection to train commercial AI models without permission. The archive argues that current legal frameworks are insufficient and that such a file provides a clear, technical boundary. One potential consequence is that AI companies such as OpenAI and Anthropic could face legal risk if they ignore the file, since it serves as an explicit statement of the site's terms of service. The move could inspire other archives and libraries to adopt similar measures, potentially reshaping how AI training data is sourced.

// why it matters

Developers must now consider explicit opt-out signals from archives when scraping for AI training.
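
As a rough illustration of honoring such an opt-out signal, the sketch below checks a URL against a policy before fetching it. It assumes the file uses robots.txt-style `User-agent` / `Allow` / `Disallow` directives, which this digest implies but does not confirm. The sample contents, the paths, the `example.org` host, and the `ExampleTrainingBot` user agent are all hypothetical; the real file is not quoted here.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file contents: the digest does not quote the real file,
# so these directives and paths are illustrative assumptions only.
SAMPLE_LLMS_TXT = """\
User-agent: *
Allow: /metadata/
Allow: /public-domain/
Disallow: /
"""


def build_policy(text: str) -> RobotFileParser:
    """Parse robots.txt-style text into a reusable policy object."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def may_fetch(policy: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True only if the policy permits this agent to fetch the URL."""
    return policy.can_fetch(user_agent, url)


policy = build_policy(SAMPLE_LLMS_TXT)

# Hypothetical crawler name and URLs, used only to exercise the check.
for url in (
    "https://example.org/metadata/123",
    "https://example.org/books/recent.pdf",
):
    verdict = "allowed" if may_fetch(policy, "ExampleTrainingBot", url) else "blocked"
    print(url, verdict)
```

Python's standard-library parser applies the first matching rule, so the `Allow` lines are listed before the catch-all `Disallow`. A real crawler would download the published file rather than use a hard-coded sample, and would skip any URL for which the check returns `False`.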

Sources

Primary · Hacker News
▸ Read original at annas-archive.gl
