This AI crawler list is for AEO planning. It helps teams distinguish search inclusion, user-triggered retrieval, model-training controls, and ordinary search crawling, and document a policy for each.

The list is not a magic ranking lever. It is an operational tool. If you want answer engines to find, understand, and cite your pages, you need to know which crawlers you are allowing, which you are blocking, and why.

Which AI crawlers matter for AEO?

The crawlers that matter depend on the surface: ChatGPT search, Claude search, Perplexity, Google AI features, Gemini or Vertex AI controls, and ordinary Google Search. Each crawler name should be tied to a purpose before it is allowed or blocked; a simple code mapping of crawler to purpose follows the table below.

Crawler           Platform    AEO note
OAI-SearchBot     OpenAI      Search inclusion planning.
GPTBot            OpenAI      Model-improvement or training policy planning.
ChatGPT-User      OpenAI      User-triggered access planning.
ClaudeBot         Anthropic   Anthropic crawler access planning.
Claude-SearchBot  Anthropic   Search or retrieval surface planning.
PerplexityBot     Perplexity  Perplexity access planning.
Googlebot         Google      Search crawling and AI feature eligibility through Search.
Google-Extended   Google      Gemini and Vertex AI model-use control, separate from Search crawling.
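To keep those purpose decisions consistent, the same table can live in machine-readable form so scripts and audits share one source of truth. A minimal Python sketch; the user-agent tokens come from the table above, while the dictionary layout and helper function are our own working convention, not any official format:

```
# Crawler-to-purpose mapping for AEO planning. The tokens mirror the
# table above; the structure is a working convention, not a schema.
CRAWLERS = {
    "OAI-SearchBot":    ("OpenAI",     "search inclusion"),
    "GPTBot":           ("OpenAI",     "model improvement / training policy"),
    "ChatGPT-User":     ("OpenAI",     "user-triggered access"),
    "ClaudeBot":        ("Anthropic",  "Anthropic crawler access"),
    "Claude-SearchBot": ("Anthropic",  "search / retrieval surface"),
    "PerplexityBot":    ("Perplexity", "Perplexity access"),
    "Googlebot":        ("Google",     "Search crawling and AI feature eligibility"),
    "Google-Extended":  ("Google",     "Gemini / Vertex AI model-use control"),
}

def describe(user_agent: str) -> str:
    """Return the planning note for a crawler token, or flag it as unknown."""
    if user_agent in CRAWLERS:
        platform, purpose = CRAWLERS[user_agent]
        return f"{user_agent} ({platform}): {purpose}"
    return f"{user_agent}: unknown crawler; research before allowing or blocking"

print(describe("GPTBot"))  # GPTBot (OpenAI): model improvement / training policy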

How should teams use this crawler list?

Use it as a planning checklist before editing robots.txt. Decide what you want to allow or block by crawler purpose, not by brand name alone. The same company may operate more than one crawler, and those crawlers may represent different products or use cases.

For example, a site may want public guides available to search-related crawlers while keeping private research blocked at the server level. Another site may allow ordinary search crawling but opt out of some model-training uses. Those are business choices, not formatting choices.
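A minimal sketch of that kind of purpose-based policy, using Python's standard urllib.robotparser to confirm the file says what was intended. The paths (/guides/ public, /research/ private) and the example.com URLs are hypothetical:

```
import urllib.robotparser

# Hypothetical robots.txt expressing policy by crawler purpose:
# a search-related crawler may read public guides, while a
# training-related crawler is kept out of /research/.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /guides/

User-agent: GPTBot
Disallow: /research/

User-agent: *
Allow: /
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent, url in [
    ("OAI-SearchBot", "https://example.com/guides/aeo-basics"),
    ("GPTBot", "https://example.com/research/private-notes"),
]:
    verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
    print(f"{agent} -> {url}: {verdict}")
```

Running the same check before and after a robots.txt deploy catches edits that accidentally block a crawler you meant to allow.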

What should be tested after changes?

  • Important URLs return 200 (see the fetch-test sketch after this list).
  • robots.txt does not block desired crawlers.
  • Canonical pages remain in sitemaps.
  • Server and CDN rules do not rate-limit the crawlers you want to allow.
  • Pages render useful HTML without requiring client-side interaction.
  • The pages you want cited are linked from hubs, glossary entries, and related-reading blocks.
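The first two checks are easy to script. A minimal sketch, assuming a hypothetical site and URL list; it verifies status codes, then asks the live robots.txt whether each desired crawler may fetch each page:

```
import urllib.error
import urllib.request
import urllib.robotparser

SITE = "https://example.com"                    # hypothetical site
IMPORTANT_URLS = [f"{SITE}/guides/aeo-basics"]  # hypothetical pages
DESIRED_CRAWLERS = ["OAI-SearchBot", "Googlebot", "PerplexityBot"]

# Check 1: important URLs return 200.
for url in IMPORTANT_URLS:
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:
        status = err.code
    print(f"{url}: HTTP {status}")

# Check 2: robots.txt does not block the crawlers you want to allow.
parser = urllib.robotparser.RobotFileParser(f"{SITE}/robots.txt")
parser.read()
for agent in DESIRED_CRAWLERS:
    for url in IMPORTANT_URLS:
        if not parser.can_fetch(agent, url):
            print(f"WARNING: robots.txt blocks {agent} on {url}")
```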

How should crawler rules be documented?

Keep a crawler policy table with each user agent, whether it is allowed, the reason, the date changed, and the person responsible. That prevents accidental policy drift and makes future audits much easier. A machine-readable version is sketched after the example below.

Field        Example
User agent   OAI-SearchBot
Policy       Allow public source pages
Reason       ChatGPT search visibility
Review date  Monthly
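Kept as data, the same documentation can feed scripts and audits from one source of truth. A sketch; the field names mirror the example above, and the file name and extra fields (date changed, owner) follow the paragraph's advice rather than any standard:

```
import csv
from dataclasses import asdict, dataclass, fields

@dataclass
class CrawlerPolicy:
    user_agent: str   # e.g. "OAI-SearchBot"
    policy: str       # e.g. "Allow public source pages"
    reason: str       # e.g. "ChatGPT search visibility"
    changed_on: str   # ISO date of the last change
    owner: str        # person responsible
    review: str       # review cadence, e.g. "Monthly"

# Example row mirroring the table above; date and owner are placeholders.
POLICIES = [
    CrawlerPolicy("OAI-SearchBot", "Allow public source pages",
                  "ChatGPT search visibility", "2025-01-15", "web-team", "Monthly"),
]

# Persist the table so audits and scripts read the same records.
with open("crawler-policy.csv", "w", newline="") as fh:
    writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(CrawlerPolicy)])
    writer.writeheader()
    writer.writerows(asdict(p) for p in POLICIES)
```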

How crawler policy connects to AEO

AEO is not only content formatting. It is access, structure, evidence, and measurement. Crawler policy is the access layer. If the access layer is wrong, the best article on the site can still fail to appear as a cited source.

That is why crawler documentation belongs near your publishing workflow. When a new guide, glossary page, or tool landing page goes live, it should be checked against robots.txt, the sitemap, llms.txt, and internal linking. This is the difference between publishing content and publishing source material.
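That pre-publish check can be scripted. A minimal sketch, assuming the sitemap is a standard XML urlset and llms.txt is a plain text file that lists URLs; the site and page URLs are hypothetical:

```
import urllib.request
import xml.etree.ElementTree as ET

SITE = "https://example.com"              # hypothetical site
NEW_URL = f"{SITE}/guides/new-aeo-guide"  # page going live

def fetch(url: str) -> str:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# Sitemap check: is the canonical URL listed in the XML urlset?
ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
root = ET.fromstring(fetch(f"{SITE}/sitemap.xml"))
in_sitemap = any(loc.text and loc.text.strip() == NEW_URL
                 for loc in root.iterfind(".//sm:loc", ns))

# llms.txt check: is the URL mentioned anywhere in the file?
in_llms = NEW_URL in fetch(f"{SITE}/llms.txt")

print(f"sitemap.xml: {'ok' if in_sitemap else 'MISSING'}")
print(f"llms.txt:    {'ok' if in_llms else 'MISSING'}")
```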

Recommended crawler-policy workflow

  1. List the answer surfaces that matter to the business.
  2. Collect the official crawler documentation for those surfaces.
  3. Decide policy by crawler purpose, not crawler brand.
  4. Update robots.txt and server rules carefully.
  5. Fetch-test the most important source URLs.
  6. Add canonical source pages to the XML sitemap and llms.txt where appropriate.
  7. Review citation behavior monthly.

What are the limits?

A crawler list is not a guarantee of compliance or visibility. Some traffic may come through different infrastructure, and platforms may change crawler names, documentation, or behavior. Treat the list as an operational starting point, then validate with logs, fetch tests, and answer-surface checks.
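Log validation can start small. A sketch that counts access-log lines whose user-agent field mentions a known crawler token, assuming a hypothetical log path; note that user agents can be spoofed, so IP or reverse-DNS verification is a further step:

```
from collections import Counter

# Crawling user agents to look for. Google-Extended is omitted: it is a
# robots.txt control token, not a user agent that appears in logs.
TOKENS = ["OAI-SearchBot", "GPTBot", "ChatGPT-User", "ClaudeBot",
          "Claude-SearchBot", "PerplexityBot", "Googlebot"]

hits = Counter()
with open("/var/log/nginx/access.log") as log:  # hypothetical path
    for line in log:
        for token in TOKENS:
            if token in line:
                hits[token] += 1

for token, count in hits.most_common():
    print(f"{token}: {count} requests")
```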

Also remember that allowing a crawler does not guarantee a citation. It only makes access possible. Pages still need a clear answer, strong structure, useful evidence, internal links, and enough authority to be selected over competing sources.

How to prioritize crawlers

Not every crawler deserves the same amount of attention on day one. Start with the surfaces that can plausibly send visibility or influence buying decisions. For many sites, that means Googlebot for search and Google AI features, OAI-SearchBot for ChatGPT search visibility, and PerplexityBot if Perplexity citations matter in the category.

After that, broaden the list based on your audience. Technical buyers may use different answer engines than consumers. Researchers may rely on source panels more heavily than casual searchers. The crawler policy should follow the real discovery journey, not a generic fear of AI bots.

What to log during an audit

Audit field                 Why it matters
Crawler name                Prevents broad, ambiguous policy notes.
Official documentation URL  Lets the team re-check behavior later.
Current policy              Shows whether the crawler is allowed or blocked.
Business reason             Connects technical rules to visibility goals.
Last verified               Prevents stale crawler assumptions (see the sketch below).
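The last row is the easiest to automate. A minimal sketch that flags stale audit entries, assuming the rows live in a CSV with crawler_name and last_verified columns; the file name, column names, and 30-day threshold are all our own example:

```
import csv
from datetime import date, timedelta

MAX_AGE = timedelta(days=30)  # arbitrary example threshold

with open("crawler-audit.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        verified = date.fromisoformat(row["last_verified"])
        if date.today() - verified > MAX_AGE:
            print(f"STALE: {row['crawler_name']} last verified {verified}")
```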

How this supports first-page strategy

Crawler access will not put a weak page on the first page by itself. It does support the larger ranking strategy by making sure high-quality source pages are available to the systems that discover, summarize, and cite web content. For a young authority site, the win comes from combining crawler access, deep content, internal links, glossary definitions, original examples, and consistent publication.

The crawler list is therefore part of the operating system for the site. It tells you which doors are open, but the pages behind those doors still need to be worth entering.

FAQ

Is Googlebot an AI crawler?

Googlebot is the ordinary Google Search crawler, but pages indexed by Search can be eligible for Google Search AI features, so it still matters for AEO.

Is Google-Extended the same as Googlebot?

No. Google-Extended is a separate control related to Gemini and Vertex AI model use, not ordinary Search crawling.

Should crawler rules be copied from another site?

No. Copy the decision model, not the exact rules. Your goals may differ.

Should crawler lists be updated?

Yes. AI crawler documentation changes, and new user agents appear. Review the list at least monthly for a site that depends on answer-engine visibility.
