Pages get cited when they survive a chain of gates: access, discovery, retrieval, passage selection, answer composition, and citation display. Most answer engine optimization (AEO) work fails because teams rewrite the page before finding out which gate failed.

TL;DR

  • A page cannot be cited if the relevant crawler or search surface cannot reach it. Google, OpenAI, and Anthropic all document access controls that can change whether pages are available to their systems.
  • Citation is not the same as ranking. A page may be indexed, mentioned, summarized, or used as background without being shown as the visible citation.
  • The most reliable AEO workflow is diagnostic: confirm access, confirm indexability, confirm textual clarity, confirm passage-level usefulness, then test prompts and source visibility.

What is the simplest mental model?

Answer engines work like a pipeline, not a magic box. A practical AEO audit should ask one question at each stage:

| Stage | Question | Failure symptom |
| --- | --- | --- |
| Access | Can the system fetch the page? | No crawl, no retrieval, no citation |
| Discovery | Can the system find the URL? | Page exists but is invisible |
| Retrieval | Does the page match the prompt? | Competitors cited instead |
| Passage selection | Is the best section self-contained? | Page mentioned but weakly summarized |
| Answer composition | Does the page support a useful claim? | Generic answer, no source |
| Citation display | Does the platform expose this source? | Used or remembered, but not cited |

This model keeps the team honest. If robots.txt blocks a search bot, rewriting an H2 is theater. If the page is accessible but every claim is vague, crawler access is not the problem.
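
The audit order can be sketched in a few lines of Python. This is a minimal illustration, not a tool from any platform: the function name `first_failed_gate` and the shape of the `results` dictionary are assumptions made for the example. It walks the six gates in pipeline order and stops at the first one that failed or was never checked.

```python
from typing import Optional

# The six gates from the table above, in pipeline order.
GATES = [
    "access",
    "discovery",
    "retrieval",
    "passage selection",
    "answer composition",
    "citation display",
]


def first_failed_gate(results: dict[str, bool]) -> Optional[str]:
    """Return the earliest gate that failed or was never checked, else None."""
    for gate in GATES:
        # A gate with no recorded result counts as unverified, so stop there.
        if not results.get(gate, False):
            return gate
    return None


# Hypothetical audit notes for one page: robots.txt blocks the search bot.
audit = {"access": False, "discovery": True, "retrieval": True}
print(first_failed_gate(audit))  # access
```

Treating an unchecked gate as a failure is a deliberate choice here: it keeps the team from working on later gates before the earlier ones are confirmed.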

How do answer engines discover pages?

Answer engines discover pages through a mix of classic search indexes, their own crawlers, user-triggered retrieval, APIs, sitemaps, links, and web-scale datasets. The exact mix varies by platform, and most platforms do not fully document their citation-selection systems.

Google is the most explicit, at least for its Search AI features. Its AI features documentation says AI Overviews and AI Mode use Search systems and that the same foundational SEO best practices apply. For a page to be eligible as a supporting link, Google says it must be indexed and eligible to appear in Search with a snippet.

OpenAI documents several crawlers with different purposes. OAI-SearchBot is tied to surfacing websites in ChatGPT search results. GPTBot is tied to training use. ChatGPT-User is used for certain user-triggered actions and is not the search opt-out control.

Anthropic also documents separate bots. ClaudeBot relates to model training, Claude-User supports user-initiated page access, and Claude-SearchBot improves search result quality. Anthropic says blocking search or user retrieval bots can reduce visibility in those contexts.

The practical rule: do not talk about "AI bots" as one thing. Build a table by platform, user agent, purpose, and desired policy.
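
One way to keep that table honest is to hold it as data rather than as a slide. The sketch below is only an illustration: the `BotPolicy` class, the purpose labels, and the `allow` values are assumptions, and it covers only the OpenAI and Anthropic user agents described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BotPolicy:
    platform: str
    user_agent: str
    purpose: str  # "training", "search", or "user" (labels chosen for this sketch)
    allow: bool   # the desired policy, decided by the site owner


# User agents and purposes as described above; the allow values are
# placeholder decisions for illustration only.
POLICIES = [
    BotPolicy("OpenAI", "GPTBot", "training", False),
    BotPolicy("OpenAI", "OAI-SearchBot", "search", True),
    BotPolicy("OpenAI", "ChatGPT-User", "user", True),
    BotPolicy("Anthropic", "ClaudeBot", "training", False),
    BotPolicy("Anthropic", "Claude-SearchBot", "search", True),
    BotPolicy("Anthropic", "Claude-User", "user", True),
]


def agents_for(purpose: str) -> list[str]:
    """List the user agents recorded for one purpose."""
    return [p.user_agent for p in POLICIES if p.purpose == purpose]


def blocked_agents() -> list[str]:
    """List the user agents the policy table says to block."""
    return [p.user_agent for p in POLICIES if not p.allow]


print(agents_for("search"))  # ['OAI-SearchBot', 'Claude-SearchBot']
print(blocked_agents())      # ['GPTBot', 'ClaudeBot']
```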

How should crawler access be configured?

Crawler access should separate training, search, and user-triggered retrieval where the platform gives you separate controls. That lets a site protect one use case without accidentally suppressing another.

A policy table can look like this:

| Platform | Training control | Search/retrieval control | AEO risk |
| --- | --- | --- | --- |
| Google Search AI features | Google-Extended affects some non-Search AI uses | Googlebot and Search preview controls affect Search | Blocking Googlebot can remove Search eligibility |
| OpenAI | GPTBot | OAI-SearchBot, with ChatGPT-User for user actions | Blocking OAI-SearchBot can keep pages out of ChatGPT search answers |
| Anthropic | ClaudeBot | Claude-SearchBot, Claude-User | Blocking search or user bots can reduce Claude search visibility |

For example:

User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: ClaudeBot
Disallow: /

This is not a universal prescription. It is a template for making the decision explicit. Legal, security, licensing, and brand teams may choose different policies. The AEO point is that search visibility and training permission should not be conflated.
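
Whatever policy is chosen, it helps to test that the file says what the team thinks it says. A minimal sketch using Python's standard `urllib.robotparser` follows. The `check_policy` helper, the `EXPECTED` mapping, and the example.com URL are assumptions for illustration, and Python's parser is only one interpretation of robots.txt; each crawler applies its own matching rules.

```python
from urllib.robotparser import RobotFileParser

# The example policy from above, held as a string for the test.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: Claude-User
Allow: /

User-agent: ClaudeBot
Disallow: /
"""

# Intended outcome per user agent: True means the bot may fetch the page.
EXPECTED = {
    "OAI-SearchBot": True,
    "GPTBot": False,
    "Claude-SearchBot": True,
    "Claude-User": True,
    "ClaudeBot": False,
}


def check_policy(robots_txt: str, url: str,
                 expected: dict[str, bool]) -> dict[str, bool]:
    """Return {user_agent: True if the parsed rules match the intent}."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {
        agent: parser.can_fetch(agent, url) == allowed
        for agent, allowed in expected.items()
    }


# example.com is a placeholder URL.
results = check_policy(ROBOTS_TXT, "https://example.com/docs/rate-limits", EXPECTED)
print(all(results.values()))  # True
```

A check like this only covers robots.txt. CDN rules and bot-management settings can still block a crawler that robots.txt allows.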

How does retrieval choose what to use?

Retrieval chooses candidate content that appears relevant to the user's query, but each platform can use different sources and techniques. Google documents query fan-out for AI Mode and AI Overviews, where multiple related searches across subtopics and data sources may support a response. OpenAI and Anthropic document crawlers, but not a public formula for visible citations.

For page teams, the useful inference is modest: a page should match a query at the passage level, not only at the title level. If the prompt is "how do I compare SOC 2 Type I and Type II for a vendor review?", the page needs a section that answers that question directly. A generic "Security" page may be accessible and authoritative but still lose retrieval to a clearer explainer.

A retrievable passage usually has:

  • A question-shaped heading.
  • A direct first sentence.
  • The entity named in the passage, not only in the title.
  • A concrete example or table.
  • A date or scope limit when the answer can change.
  • A link to the source-of-truth page or documentation.

That is not a ranking formula. It is content hygiene for retrieval systems that may evaluate chunks, snippets, sections, or passages.
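
Most of that checklist can be approximated with crude text checks. The sketch below is heuristic by design: the function name `passage_checks`, the 30-word limit, and the regular expressions are assumptions, and it skips the "concrete example or table" item because that needs human judgment.

```python
import re


def passage_checks(heading: str, body: str, entity: str) -> dict[str, bool]:
    """Rough, heuristic checks for one section; not a ranking signal."""
    # Take the text up to the first sentence-ending punctuation mark.
    first_sentence = re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]
    return {
        # Heading reads like a question.
        "question_heading": heading.strip().endswith("?"),
        # First sentence is short enough to stand alone (30 words is arbitrary).
        "direct_first_sentence": 0 < len(first_sentence.split()) <= 30,
        # The entity appears in the passage text, not only in the page title.
        "entity_in_passage": entity.lower() in body.lower(),
        # A four-digit year suggests a date or scope limit is stated.
        "has_date_or_scope": bool(re.search(r"\b(19|20)\d{2}\b", body)),
        # At least one link, assumed to point at a source-of-truth page.
        "has_source_link": bool(re.search(r"https?://\S+", body)),
    }


# Hypothetical section text; example.com is a placeholder.
checks = passage_checks(
    heading="How do SOC 2 Type I and Type II differ?",
    body=(
        "SOC 2 Type I describes controls at a point in time. "
        "SOC 2 Type II tests how those controls operated over a period. "
        "Scope: the report period ending in 2025. "
        "Details: https://example.com/security/soc2"
    ),
    entity="SOC 2",
)
print(checks)
```

A failing check is a prompt for an editor to look at the section, not proof that the section will lose retrieval.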

What makes a page citable?

A citable page makes one clear claim per section and gives the answer engine a reason to trust that claim. The source should be both easy to parse and worth citing.

Good citation candidates tend to have one of these roles:

| Page type | Why it gets cited | Example section |
| --- | --- | --- |
| Documentation | It is the source of truth | "API rate limits by plan" |
| Methodology | It explains how data was produced | "How we collect AI visibility prompts" |
| Comparison | It resolves buyer choice | "SOC 2 Type I vs Type II" |
| Research | It contains original data | "Sample: 590M searches analyzed" |
| Policy | It controls a decision | "Crawler access policy" |
| FAQ/support | It answers a narrow task | "How to export raw citation data" |

The page also needs visible text. Google specifically recommends making important content available in textual form for AI features. Hidden content, image-only claims, heavy client-side rendering, and vague copy all make extraction harder.
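
A quick way to spot the problem is to check whether a key claim appears as text in the HTML the server actually returns, before any JavaScript runs. The sketch below uses only the standard library. The class and function names are made up for this example, the sample page is invented, and real crawlers may or may not render scripts, so treat a miss as a warning rather than a verdict.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def claim_is_in_text(html: str, claim: str) -> bool:
    return claim.lower() in visible_text(html).lower()


# Invented page: the heading is text, the actual limit only exists in a script.
page = """
<html><body>
<h2>API rate limits by plan</h2>
<img src="limits.png" alt="">
<script>render({"limit": "100 requests per minute"})</script>
</body></html>
"""
print(claim_is_in_text(page, "API rate limits by plan"))  # True
print(claim_is_in_text(page, "100 requests per minute"))  # False
```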

How are citations chosen or exposed?

Citations are exposed differently by platform, and many platforms do not document the full selection logic. Google says AI features surface relevant links to help users find information and explore. OpenAI says OAI-SearchBot is for appearing in ChatGPT search results. Anthropic says Claude-SearchBot improves search result quality.

That is enough to set expectations but not enough to claim a universal citation algorithm. Do not promise a client that adding a table, schema, or llms.txt file will cause citations. Promise a testable improvement in eligibility, clarity, and evidence.

Track four states separately:

| State | Meaning | What to do |
| --- | --- | --- |
| Not visible | Brand/page absent | Check access, indexability, prompt match |
| Mentioned | Brand named but no source shown | Improve source-of-truth pages and off-site corroboration |
| Cited | Page shown as source | Improve passage accuracy and conversion path |
| Mis-cited | Wrong page or stale source cited | Consolidate, canonicalize, update internal links |

This distinction prevents false negatives. A mention can be commercially useful even without a citation. A citation can be bad if it points to an outdated page.
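
The four states can be assigned mechanically once an answer and its cited URLs are recorded. The sketch below is one possible rule set, not a standard: the `classify` function and its arguments are assumptions, and it treats any citation to your own site other than the intended page as mis-cited. It cannot tell whether a cited page is stale; that still needs a human check.

```python
from enum import Enum
from urllib.parse import urlparse


class State(Enum):
    NOT_VISIBLE = "not visible"
    MENTIONED = "mentioned"
    CITED = "cited"
    MIS_CITED = "mis-cited"


def classify(answer: str, cited_urls: list[str], brand: str,
             site_host: str, target_url: str) -> State:
    """Assign one of the four tracking states to a single recorded answer."""
    own = [u for u in cited_urls if urlparse(u).hostname == site_host]
    # Compare without trailing slashes so /page and /page/ count as the same.
    if any(u.rstrip("/") == target_url.rstrip("/") for u in own):
        return State.CITED
    if own:
        # Our site is cited, but through a page other than the intended one.
        return State.MIS_CITED
    if brand.lower() in answer.lower():
        return State.MENTIONED
    return State.NOT_VISIBLE


# Hypothetical brand, host, and URLs.
TARGET = "https://acme.example/docs/citations"
state = classify(
    answer="Acme tracks AI citations for enterprise teams.",
    cited_urls=["https://reviews.example/acme"],
    brand="Acme",
    site_host="acme.example",
    target_url=TARGET,
)
print(state.value)  # mentioned
```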

How should you verify a page after shipping?

Verification should be a small, repeatable prompt panel, not a one-off screenshot. Use stable prompts and record the engine, mode, date, exact answer, cited URLs, and whether your target page appeared.

Example panel:

1. What is the difference between [problem A] and [problem B]?
2. Best [category] tools for [specific user type]
3. How does [brand] handle [policy or feature]?
4. [brand] vs [competitor] for [use case]
5. How do I implement [task] in [platform]?
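
To keep runs comparable, each result can be stored as a fixed record. The sketch below assumes a simple CSV log. The `PanelResult` fields mirror the items listed above, while the class name, column names, and sample values are invented for illustration.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class PanelResult:
    engine: str
    mode: str
    run_date: str  # ISO date string, e.g. "2025-01-15"
    prompt: str
    answer: str
    cited_urls: list[str]
    target_url: str

    @property
    def target_cited(self) -> bool:
        """True if the target page is among the visible citations."""
        return self.target_url in self.cited_urls


COLUMNS = ["engine", "mode", "date", "prompt", "answer",
           "cited_urls", "target_cited"]


def to_csv(results: list[PanelResult]) -> str:
    """Serialize panel results to CSV text, one row per prompt run."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(COLUMNS)
    for r in results:
        writer.writerow([r.engine, r.mode, r.run_date, r.prompt, r.answer,
                         " ".join(r.cited_urls), r.target_cited])
    return buf.getvalue()


# Invented sample run; engine name and URLs are placeholders.
run = PanelResult(
    engine="example-engine",
    mode="search",
    run_date="2025-01-15",
    prompt="How does Acme handle crawler access?",
    answer="Acme documents a crawler access policy.",
    cited_urls=["https://acme.example/policy/crawlers"],
    target_url="https://acme.example/policy/crawlers",
)
print(to_csv([run]))
```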

Also inspect server logs or CDN logs for relevant crawlers. If no relevant bot ever requests the page, the prompt test is not enough. If bots request the page but the answer cites a competitor, the issue is likely relevance, evidence, or authority.
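
A log check does not need special tooling. The sketch below counts requests for one path by crawler name. It assumes an access log where the request line and user agent appear as quoted text, and the sample lines use invented, simplified user-agent strings. Matching on the user-agent string alone does not verify that a request came from the real crawler.

```python
from collections import Counter

# Crawler names mentioned in this article, matched as substrings.
BOT_TOKENS = [
    "Googlebot", "OAI-SearchBot", "ChatGPT-User", "GPTBot",
    "Claude-SearchBot", "Claude-User", "ClaudeBot",
]


def bot_hits(log_lines, path: str) -> Counter:
    """Count GET requests for one path, grouped by crawler name."""
    hits = Counter()
    for line in log_lines:
        # Only consider requests for the exact path under review.
        if f'"GET {path} ' not in line:
            continue
        lowered = line.lower()
        for token in BOT_TOKENS:
            if token.lower() in lowered:
                hits[token] += 1
                break
    return hits


# Invented sample lines; real user-agent strings are longer.
SAMPLE = [
    '203.0.113.5 - - [15/Jan/2025:10:00:00 +0000] '
    '"GET /docs/rate-limits HTTP/1.1" 200 512 "-" "OAI-SearchBot/1.0"',
    '203.0.113.6 - - [15/Jan/2025:10:01:00 +0000] '
    '"GET /docs/rate-limits HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [15/Jan/2025:10:02:00 +0000] '
    '"GET /pricing HTTP/1.1" 200 512 "-" "ClaudeBot/1.0"',
]
print(bot_hits(SAMPLE, "/docs/rate-limits"))  # Counter({'OAI-SearchBot': 1})
```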

What are the common failure modes?

The first failure is blocked access. A site may block the crawler that matters for search visibility while allowing one that does not, for example by disallowing OAI-SearchBot while leaving GPTBot open.

The second failure is a page that is technically crawlable but semantically vague. "Our platform helps teams work faster" is not a citable claim. "Our SOC 2 Type II report covers the period from January 1 to December 31, 2025" is.

The third failure is buried evidence. If original data sits behind a PDF, image, accordion, script, or gated report, answer systems may cite a third-party article that summarizes it instead.

The fourth failure is no source hierarchy. When a blog post, help article, product page, and docs page all answer the same question differently, the answer engine picks one, and it may not be the one you wanted.

What should an AEO-ready page include?

An AEO-ready page should include a direct answer, evidence, scope, structure, and verification hooks.

Use this template:

# [Specific page title]

Updated: [Month Day, Year]
By: [Author/entity]

## TL;DR
- [Direct answer]
- [Key evidence]
- [Who this applies to]

## [Question the buyer/searcher actually asks]
[Direct answer in sentence one.]
[Concrete example.]
[Caveat or scope.]

## Evidence
[Data, docs, screenshots, methodology, or source-of-truth links.]

## How to verify this
[Prompts, tests, logs, or checklist.]

This structure helps readers and machines. It keeps the answer close to the evidence, and it gives future reviewers a way to test whether the page still deserves citation.

How do you diagnose a mention without a citation?

A mention without a citation usually means the answer engine recognizes the brand or concept, but does not expose your page as the supporting source. That is not automatically a failure. It is a different state with different fixes.

Start by asking what kind of mention it is:

| Mention type | Example | Likely fix |
| --- | --- | --- |
| Brand-only mention | "Tools include Ahrefs, Semrush, and HubSpot" | Build citable product and methodology pages |
| Category association | "Ahrefs offers AI visibility tracking" | Strengthen source-of-truth pages and docs |
| Unsupported recommendation | "Use Brand X for enterprise teams" | Add proof, comparisons, and customer evidence |
| Wrong description | "Brand X is a social listening tool" | Fix entity consistency across owned and third-party pages |
| Old positioning | Old product name or acquired brand appears | Update internal links, redirects, schema, and external profiles |

Then test whether a source page exists for the claim the engine made. If the answer says your product tracks AI citations, but your product page never defines "citation," the system may cite a help doc, review page, or competitor comparison instead. That is not mysterious behavior. It is the answer engine looking for a page that states the claim more clearly.

The fix is to create a source hierarchy:

Primary source:
- Product or documentation page that states the current claim.

Supporting source:
- Methodology page explaining how the claim is measured.

Proof source:
- Case study, dataset, benchmark, screenshot, or customer example.

Clarifying source:
- FAQ, glossary, comparison, or support article answering narrow follow-up questions.

When that hierarchy exists, answer engines have several possible citation targets that agree with each other. When it does not, a third-party article often becomes the cleanest source.

What source architecture helps citations?

Source architecture helps citations by making the site's facts internally consistent and easy to follow. A single long page can work, but clusters usually work better for complex topics.

For an AEO product, a useful cluster might be:

| Page | Job |
| --- | --- |
| Product page | Defines the product and buyer value |
| Methodology page | Explains data sources, limits, and calculations |
| Docs/help page | Explains how to use the feature |
| Comparison page | Clarifies tradeoffs against alternatives |
| Case study | Shows a real implementation |
| Glossary page | Defines ambiguous terms like "mention," "citation," and "visibility" |

Each page should link to the others in a way that expresses authority. The product page should point to methodology for evidence. The methodology page should point back to the product for context. The case study should point to the feature it used. The glossary should link to pages that operationalize the terms.

This is not internal linking for pageviews. It is internal linking for source clarity. If the site itself cannot show which page is the source of truth, an answer engine has to guess.
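
The linking pattern above can be written down as expected edges and compared with what a crawl actually finds. The sketch below is illustrative only: the paths, the `EXPECTED_LINKS` mapping, and the `missing_links` helper are assumptions, and the "actual" links are invented rather than taken from a real crawl export.

```python
# Hypothetical paths; the expected edges follow the paragraph above.
EXPECTED_LINKS = {
    "/product": {"/methodology"},
    "/methodology": {"/product"},
    "/case-study": {"/docs/feature"},
    "/glossary": {"/product", "/methodology", "/docs/feature"},
}


def missing_links(expected: dict[str, set[str]],
                  actual: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return, per page, the expected internal links that are absent."""
    gaps = {}
    for page, targets in expected.items():
        absent = targets - actual.get(page, set())
        if absent:
            gaps[page] = sorted(absent)
    return gaps


# Links found on each page (invented for this example).
actual_links = {
    "/product": {"/methodology", "/pricing"},
    "/methodology": set(),
    "/case-study": {"/docs/feature"},
    "/glossary": {"/product"},
}
print(missing_links(EXPECTED_LINKS, actual_links))
```

In this invented crawl, the methodology page never links back to the product page, and the glossary misses two of its three targets.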

What should you not infer from one citation test?

You should not infer a durable ranking rule from one citation test. Answer engines vary by prompt wording, logged-in state, location, model, mode, freshness, and retrieval path.

A single test can tell you:

  • The page appeared or did not appear for that prompt at that time.
  • The answer engine preferred certain source types.
  • The answer phrasing was accurate or inaccurate.
  • The visible citation was your page, a competitor, a third party, or none.

A single test cannot prove:

  • The platform always prefers that source type.
  • A content change caused a citation.
  • Schema alone changed retrieval.
  • A missing citation means the page was not used at all.
  • A citation means the page is commercially effective.

That is why the output of a test should be a hypothesis and a next action, not a victory lap. If a docs page is cited twice and the product page is never cited, the next action is to add clearer source-of-truth sections to the product page and re-run the prompt panel. If a competitor is cited for every comparison prompt, the next action may be a comparison asset, not another how-to article.
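
Because one run proves little, it helps to summarize repeated runs per prompt and flag prompts with too few runs to read anything into. The sketch below is a minimal tally. The `summarize` function and the `min_runs` threshold of 5 are arbitrary assumptions, and the sample runs are invented.

```python
from collections import defaultdict


def summarize(runs, min_runs: int = 5) -> dict[str, dict]:
    """Tally (prompt, target_cited) pairs into per-prompt citation rates."""
    counts = defaultdict(lambda: [0, 0])  # prompt -> [cited, total]
    for prompt, cited in runs:
        counts[prompt][0] += int(cited)
        counts[prompt][1] += 1
    summary = {}
    for prompt, (cited, total) in counts.items():
        summary[prompt] = {
            "cited": cited,
            "runs": total,
            "rate": cited / total,
            # Below the threshold, treat the rate as a hypothesis only.
            "enough_runs": total >= min_runs,
        }
    return summary


# Invented history: one prompt run four times, another run once.
history = [
    ("acme vs rival for audits", True),
    ("acme vs rival for audits", False),
    ("acme vs rival for audits", True),
    ("acme vs rival for audits", True),
    ("how does acme handle crawler access", False),
]
for prompt, stats in summarize(history).items():
    print(prompt, stats)
```

Even a stable rate across many runs only describes those prompts on those dates. It does not show that a content change caused the citations.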

What to do Monday morning

1. Pick five important pages and map each one to the six gates: access, discovery, retrieval, passage selection, answer composition, citation display.
2. Review robots.txt, CDN rules, and bot-management settings for Googlebot, OAI-SearchBot, Claude-SearchBot, and user-triggered agents.
3. Rewrite the top three H2s on each page so the first sentence directly answers a real prompt.
4. Add one evidence block to every page: methodology, source links, table, example, or original data.
5. Build a 20-prompt verification panel and run it every two weeks.
6. Log mentions and citations separately.
7. Fix stale or wrong cited sources before publishing new pages.

Sources