
llms.txt: The AI Robots.txt

llms.txt is a website-root file that tells AI systems which content they can and cannot use for training and retrieval — the emerging standard for AI crawl governance.

What is llms.txt and Why Does It Exist?

llms.txt is a plain-text file placed at the root of your website (https://yoursite.com/llms.txt) that communicates content access permissions to Large Language Model crawlers. It answers the question AI companies' crawlers ask before processing your content: "Can we use this for training? Or only for real-time retrieval? Or not at all?"

The concept was proposed by Jeremy Howard (co-founder of fast.ai) in 2024, drawing on the existing robots.txt standard while extending it with LLM-specific semantics. The distinction between training access and retrieval access is what makes llms.txt valuable for AEO: you can permit retrieval (citations in AI answers) while blocking training (your content being absorbed into model weights without compensation). This separation is not part of the robots.txt syntax itself.

This pairs directly with your robots.txt for AEO configuration. Both files must be correctly set - robots.txt to allow crawl access, llms.txt to govern what happens with the content once crawled.

llms.txt Syntax and Directives

The llms.txt format is similar to robots.txt - user-agent blocks with Allow/Disallow directives, plus extra semantic declarations. A basic configuration looks like this:

llms.txt - Basic Configuration
# llms.txt - Example Corp
# Version: 1.0
# Updated: 2026-01-15

# Allow all LLMs to retrieve content for citation
User-agent: *
Allow: /

# Sitemap reference for context
File: /sitemap.xml

The File: directive is an llms.txt-specific extension - it points AI crawlers to structured content indexes that help them understand your site's content architecture, supplementing the standard XML sitemap.
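To make the structure of the basic configuration concrete, here is a minimal Python sketch that reads the directive-style format shown above. The parser, the AgentBlock class, and the function name are hypothetical illustrations, not part of any published tool, and the sketch assumes the file follows the robots.txt-like syntax described in this guide (User-agent, Allow, Disallow, plus File).

```python
from dataclasses import dataclass, field


@dataclass
class AgentBlock:
    """One User-agent block and its Allow/Disallow rules (hypothetical model)."""
    user_agents: list[str] = field(default_factory=list)
    rules: list[tuple[str, str]] = field(default_factory=list)  # (directive, path)


def parse_llms_txt(text: str) -> tuple[list[AgentBlock], list[str]]:
    """Return (agent blocks, File: references) from a directive-style llms.txt."""
    blocks: list[AgentBlock] = []
    files: list[str] = []
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # Consecutive User-agent lines share a block; a User-agent line
            # that follows rules starts a new block.
            if current is None or current.rules:
                current = AgentBlock()
                blocks.append(current)
            current.user_agents.append(value)
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key, value))
        elif key == "file":
            files.append(value)
    return blocks, files


BASIC = """\
# llms.txt - Example Corp
User-agent: *
Allow: /

# Sitemap reference for context
File: /sitemap.xml
"""

blocks, files = parse_llms_txt(BASIC)
print(blocks[0].user_agents, blocks[0].rules, files)
# prints: ['*'] [('allow', '/')] ['/sitemap.xml']
```

Unknown directives are skipped rather than rejected, which mirrors how robots.txt parsers usually behave; a stricter validator might prefer to report them.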

Platform Adoption: Who Actually Respects llms.txt?

Adoption varies significantly by platform. Implementing llms.txt is still valuable even with partial adoption - the platforms that respect it (ChatGPT, Perplexity) represent a large share of AI answer traffic.

Platform | Respects llms.txt | Compliance
ChatGPT / GPTBot | Yes - documented | High
Perplexity AI | Yes - documented | High
Anthropic Claude | Yes - partial | Medium
Google AI Overviews | Not confirmed | Low
Meta AI | Not confirmed | Low
Apple Intelligence | Not confirmed | Unknown

Platforms with "Not confirmed" status may still respect llms.txt in future releases. Publishing the file now ensures compliance governance is in place before wider adoption (analogous to early robots.txt adopters before all bots respected it). Pair this with your AI crawler management strategy for comprehensive coverage.

Common llms.txt Mistakes to Avoid

The most damaging mistake is a blanket Disallow: / for all agents that is intended to restrict training but also blocks retrieval access. This prevents AI systems from citing your content in real-time answers, eliminating your AEO value. Always explicitly allow retrieval for key content directories, even when blocking training.
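A check for this mistake is easy to automate. The following Python sketch flags any User-agent block that contains Disallow: / with no Allow rule at all. The function name and sample files are hypothetical, and because this guide does not show a separate syntax for training versus retrieval rules, the lint only inspects path rules.

```python
def find_blanket_disallow(text: str) -> list[str]:
    """Return user agents whose block has 'Disallow: /' and no Allow rule."""
    blocks = []  # each block: {"agents": [...], "allow": [...], "disallow": [...]}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            # A User-agent line after any rule starts a new block.
            if current is None or current["allow"] or current["disallow"]:
                current = {"agents": [], "allow": [], "disallow": []}
                blocks.append(current)
            current["agents"].append(value)
        elif key in ("allow", "disallow") and current is not None:
            current[key].append(value)
    return [
        agent
        for block in blocks
        if "/" in block["disallow"] and not any(block["allow"])
        for agent in block["agents"]
    ]


# Hypothetical sample files for illustration.
RISKY = "User-agent: *\nDisallow: /\n"
SAFER = "User-agent: *\nAllow: /guides/\nAllow: /docs/\nDisallow: /\n"

print(find_blanket_disallow(RISKY))  # prints: ['*']
print(find_blanket_disallow(SAFER))  # prints: []
```

A block that passes this lint can still be too restrictive; it only catches the case where nothing at all is allowed.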

A second common error is publishing an outdated llms.txt that no longer matches your content structure. If new directories like /guides/ or /tools/ fall under a Disallow rule written before they existed, the content they contain receives no AI citation consideration. Audit your llms.txt quarterly alongside your technical AEO audit.
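Part of that quarterly audit can be scripted. The sketch below reuses Python's standard urllib.robotparser to evaluate a directive-style llms.txt against a list of current site directories. This assumes the file is syntactically compatible with robots.txt, as this guide describes; the function name and the sample configuration are hypothetical. Python's parser applies the first matching rule in file order, which may differ from how a given crawler resolves conflicts.

```python
from urllib.robotparser import RobotFileParser


def blocked_directories(llms_txt: str, directories: list[str], agent: str = "*") -> list[str]:
    """Return the directories that the given agent may not access."""
    parser = RobotFileParser()
    parser.parse(llms_txt.splitlines())
    return [d for d in directories if not parser.can_fetch(agent, d)]


# Hypothetical stale configuration: written when /blog/ was the only public section.
STALE = """\
User-agent: *
Allow: /blog/
Disallow: /
"""

print(blocked_directories(STALE, ["/blog/", "/guides/", "/tools/"]))
# prints: ['/guides/', '/tools/']
```

Feeding the function the top-level directories from your sitemap would surface sections that were added after the rules were last reviewed.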

A third error is confusing the scope of llms.txt with that of robots.txt. llms.txt governs content use permissions for LLMs specifically - it does not control regular Googlebot crawl behavior. Your robots.txt and llms.txt must both be configured correctly and consistently. A page allowed in llms.txt but blocked in robots.txt is inaccessible - the crawler never reaches the page to check llms.txt permissions.
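The consistency requirement can be checked mechanically. As a rough sketch, the following Python code lists paths that llms.txt allows but robots.txt blocks for a given crawler. It assumes both files use robots.txt-compatible syntax so that the standard urllib.robotparser can read them; the function names and sample rules are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt-style rules from a string instead of a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def unreachable_allowed_paths(robots_txt: str, llms_txt: str,
                              paths: list[str], agent: str) -> list[str]:
    """Paths permitted by llms.txt that the crawler can never reach."""
    robots, llms = load_rules(robots_txt), load_rules(llms_txt)
    return [
        path for path in paths
        if llms.can_fetch(agent, path) and not robots.can_fetch(agent, path)
    ]


# Hypothetical files that contradict each other for GPTBot.
ROBOTS = "User-agent: GPTBot\nDisallow: /guides/\n"
LLMS = "User-agent: *\nAllow: /\n"

conflicts = unreachable_allowed_paths(
    ROBOTS, LLMS, ["/guides/what-is-aeo/", "/docs/"], "GPTBot"
)
print(conflicts)  # prints: ['/guides/what-is-aeo/']
```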

Implementing llms-full.txt: The Content Index

Beyond the governance file, many publishers create a companion llms-full.txt - a curated, human-readable index of your most important content, structured for AI consumption. This is especially valuable for large sites where sitemaps contain thousands of pages of varying quality. The llms-full.txt acts as a "greatest hits" index: it tells AI crawlers exactly which pages contain your highest-confidence, most comprehensive answers.

llms-full.txt - Content Index Format
# Example Corp - LLM Content Index
# Priority content for AI retrieval
# Updated: 2026-03-01

## Foundational Guides
- /guides/what-is-aeo/ : Complete introduction to Answer Engine Optimization
- /guides/structured-data-basics/ : JSON-LD implementation from scratch
- /guides/ai-crawler-guide/ : All major AI crawlers and how they work

## Technical References
- /docs/llms-txt-spec/ : llms.txt specification and directives
- /docs/schema-types/ : All supported schema.org types for AEO

## FAQ Collections
- /faq/technical-seo/ : 45 technical SEO questions answered
- /faq/ai-search/ : 32 AI search optimization questions

This llms-full.txt approach is especially effective when combined with well-structured XML sitemaps for AI discoverability - together they provide both machine-parseable and human-readable indexes of your highest-value content.
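Because the index format above is regular, it can be read with a few lines of code, for example to verify that every listed path still exists. The Python sketch below parses the "## Section" and "- /path/ : description" layout from the example; the parser and its names are hypothetical, and it assumes entries follow exactly that layout.

```python
import re

# Matches lines such as "- /guides/what-is-aeo/ : Complete introduction ..."
ENTRY = re.compile(r"^-\s+(\S+)\s+:\s+(.+)$")


def parse_index(text: str) -> dict[str, list[tuple[str, str]]]:
    """Map each '## Section' heading to its (path, description) entries."""
    index: dict[str, list[tuple[str, str]]] = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            section = line[3:].strip()
            index[section] = []
        elif section is not None and (match := ENTRY.match(line)):
            index[section].append((match.group(1), match.group(2)))
    return index


SAMPLE = """\
# Example Corp - LLM Content Index
# Updated: 2026-03-01

## Foundational Guides
- /guides/what-is-aeo/ : Complete introduction to Answer Engine Optimization
- /guides/structured-data-basics/ : JSON-LD implementation from scratch

## Technical References
- /docs/llms-txt-spec/ : llms.txt specification and directives
"""

index = parse_index(SAMPLE)
for section, entries in index.items():
    print(section, [path for path, _ in entries])
```

Single-hash lines are treated as comments and skipped, so the title and date at the top of the file do not create sections.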

Strategic llms.txt Decisions by Business Model

The optimal llms.txt configuration depends on your business model. Publishers building AEO citations should allow full retrieval access with selective training restrictions on paywalled content. SaaS companies should allow retrieval for documentation and public marketing content while blocking internal tools directories. E-commerce brands should allow product page retrieval (supporting AI product recommendation citations) while blocking private pricing APIs.

If your revenue model depends on information access (research, data, premium content), llms.txt provides the governance layer to maintain competitive information advantage while still benefiting from AI citation exposure on curated public content. Pair your llms.txt strategy with a log file analysis practice to verify which AI bots are respecting your directives in practice.
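A starting point for that log analysis might look like the Python sketch below, which counts requests from AI crawlers to paths that your llms.txt disallows. It assumes access logs in the common "combined" format, matches crawlers by the user-agent tokens GPTBot, PerplexityBot, and ClaudeBot, and evaluates rules with the standard urllib.robotparser on the assumption that the file uses robots.txt-compatible syntax. The log lines, rules, and function name are hypothetical.

```python
import re
from collections import Counter
from urllib.robotparser import RobotFileParser

# Request path and user agent from a combined-format access log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)
AI_BOTS = ("GPTBot", "PerplexityBot", "ClaudeBot")


def disallowed_hits(log_lines: list[str], llms_txt: str) -> Counter:
    """Count AI-bot requests to paths the rules disallow, keyed by (bot, path)."""
    rules = RobotFileParser()
    rules.parse(llms_txt.splitlines())
    hits: Counter = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match:
            continue
        user_agent = match["ua"].lower()
        bot = next((b for b in AI_BOTS if b.lower() in user_agent), None)
        if bot and not rules.can_fetch(bot, match["path"]):
            hits[(bot, match["path"])] += 1
    return hits


# Hypothetical rules and simplified log lines for illustration.
LLMS = "User-agent: *\nDisallow: /internal/\n"
LOGS = [
    '203.0.113.5 - - [01/Mar/2026:10:00:00 +0000] "GET /internal/report HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.6 - - [01/Mar/2026:10:00:05 +0000] "GET /guides/ HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (compatible; PerplexityBot/1.0)"',
    '203.0.113.7 - - [01/Mar/2026:10:00:09 +0000] "GET /internal/report HTTP/1.1" 200 512 "-" "Mozilla/5.0 (Windows NT 10.0)"',
]

hits = disallowed_hits(LOGS, LLMS)
print(hits)  # prints: Counter({('GPTBot', '/internal/report'): 1})
```

User-agent strings can be spoofed, so a production check would likely also verify requests against the IP ranges each vendor publishes.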
