Add explicit AI crawler allowlist to docs robots.txt #43599

Draft
pelikhan with Copilot wants to merge 2 commits into main from copilot/geo-optimizer-add-robots-txt-another-one
Copilot AI commented Jul 5, 2026

Contributor

The docs site already ships sitemap assets, but its robots.txt did not match the AI crawler directives the GEO audit expects, and it pointed at the wrong sitemap entrypoint. This PR updates the docs-site crawler policy so AI indexers can unambiguously discover and crawl github.github.com/gh-aw/.

  • Robots policy

    • Reduced docs/public/robots.txt to the crawler directives called out by the audit
    • Explicitly allows:
      • GPTBot
      • OAI-SearchBot
      • ChatGPT-User
      • anthropic-ai
      • ClaudeBot
      • PerplexityBot
      • Perplexity-User
      • Google-Extended
      • Google-CloudVertexBot
  • Sitemap discovery

    • Switched the sitemap reference from sitemap.xml to sitemap-index.xml, which matches the built docs output and GitHub Pages deployment shape
  • Focused regression coverage

    • Added a docs Playwright spec that fetches /gh-aw/robots.txt and asserts the expected bot directives and sitemap index URL

User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-CloudVertexBot
Allow: /

Sitemap: https://github.github.com/gh-aw/sitemap-index.xml
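
The regression check described above could be sketched roughly as follows. This is a minimal, hypothetical illustration and not the contents of the PR's actual spec file: the names `parseRobots`, `missingAllowAll`, `RobotsGroup`, and `RobotsPolicy` are invented here, and the parser handles only the `User-agent`, `Allow`, `Disallow`, and `Sitemap` fields that appear in this PR.

```typescript
// Hypothetical helpers for checking a robots.txt body; not taken from the PR.
interface RobotsGroup {
  userAgents: string[];
  allow: string[];
  disallow: string[];
}

interface RobotsPolicy {
  groups: RobotsGroup[];
  sitemaps: string[];
}

// Parse the small subset of robots.txt syntax used in this PR.
function parseRobots(text: string): RobotsPolicy {
  const policy: RobotsPolicy = { groups: [], sitemaps: [] };
  let current: RobotsGroup | null = null;
  let lastWasAgent = false;

  for (const rawLine of text.split(/\r?\n/)) {
    // Drop trailing comments and surrounding whitespace.
    const line = rawLine.replace(/#.*$/, "").trim();
    const sep = line.indexOf(":");
    if (sep === -1) continue; // blank or malformed line

    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === "user-agent") {
      // Consecutive User-agent lines share one group; otherwise start a new one.
      if (current === null || !lastWasAgent) {
        current = { userAgents: [], allow: [], disallow: [] };
        policy.groups.push(current);
      }
      current.userAgents.push(value);
      lastWasAgent = true;
      continue;
    }

    lastWasAgent = false;
    if (field === "sitemap") policy.sitemaps.push(value);
    else if (field === "allow" && current) current.allow.push(value);
    else if (field === "disallow" && current) current.disallow.push(value);
  }
  return policy;
}

// Return the agents that do NOT have an explicit "Allow: /" group.
function missingAllowAll(policy: RobotsPolicy, agents: string[]): string[] {
  return agents.filter(
    (agent) =>
      !policy.groups.some(
        (group) =>
          group.userAgents.some(
            (ua) => ua.toLowerCase() === agent.toLowerCase(),
          ) && group.allow.includes("/"),
      ),
  );
}

// Usage with the excerpt shown in this PR description.
const excerptFromPr = [
  "User-agent: GPTBot",
  "Allow: /",
  "",
  "User-agent: ClaudeBot",
  "Allow: /",
  "",
  "User-agent: Google-CloudVertexBot",
  "Allow: /",
  "",
  "Sitemap: https://github.github.com/gh-aw/sitemap-index.xml",
].join("\n");

const parsedExcerpt = parseRobots(excerptFromPr);
const stillMissing = missingAllowAll(parsedExcerpt, [
  "GPTBot",
  "ClaudeBot",
  "Google-CloudVertexBot",
]);
```

In a Playwright spec, the text passed to `parseRobots` would presumably come from something like `await (await request.get("/gh-aw/robots.txt")).text()`, with the test then asserting that `stillMissing` is empty and that `parsedExcerpt.sitemaps` contains the sitemap-index URL. How the real spec is structured is not shown in this PR page.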

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add robots.txt to docs site for AI crawler indexing" to "Add explicit AI crawler allowlist to docs robots.txt" Jul 5, 2026
Copilot AI requested a review from pelikhan July 5, 2026 17:02
@github-actions

github-actions Bot commented Jul 5, 2026

Contributor

Great work on the robots.txt GEO audit fix! 👋 The change cleanly consolidates the AI crawler allowlist and corrects the sitemap reference to sitemap-index.xml, and the new Playwright spec in docs/tests/robots-txt.spec.ts provides solid regression coverage for both the bot directives and the sitemap URL.

This PR looks well-scoped and ready for review. Nothing blocking here — nice and tidy.

Generated by ✅ Contribution Check · 373.1 AIC · ⌖ 21 AIC · ⊞ 6.2K ·

Development

Successfully merging this pull request may close these issues.

[geo-optimizer] Add robots.txt to docs site to enable AI crawler indexing

2 participants