Answer-engine citation studies need a method because AI answers are variable. A prompt can produce different wording, different sources, or no sources depending on the engine, product surface, time, location, personalization, and whether live search is active.

Our goal is not to pretend the system is perfectly stable. Our goal is to make the observation repeatable enough that patterns can be trusted.

What we record

Each observation logs these fields and why they matter:

  • Engine: ChatGPT, Perplexity, Google AI features, Claude, Copilot, or another answer surface.
  • Prompt: the exact query or instruction tested.
  • Answer summary: the core claim or recommendation returned.
  • Cited URL: the exact page shown as a source.
  • Citation surface: inline citation, source card, source panel, related link, or no visible source.
  • Result type: exact citation, domain mention, wrong-page citation, competitor citation, or no citation.
  • Notes: important caveats, odd behavior, or follow-up checks.
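
For reference, one way to hold a single observation in code is a small record type. This is an illustrative sketch, not the AI Citation Tracker's actual schema; every field name here is an assumption:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Observation:
    """One logged answer-engine response for a single prompt run."""
    engine: str            # e.g. "ChatGPT", "Perplexity", "Google AI features"
    prompt: str            # the exact query or instruction, verbatim
    answer_summary: str    # core claim or recommendation returned
    cited_url: str         # exact page shown as a source; "" if none
    citation_surface: str  # "inline", "source card", "source panel", "related link", "none"
    result_type: str       # "exact citation", "domain mention", "wrong-page citation",
                           # "competitor citation", or "no citation"
    run_date: date = field(default_factory=date.today)
    notes: str = ""        # caveats, odd behavior, follow-up checks
```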

Prompt panel design

A prompt panel is a fixed set of prompts that tests a topic from several angles. For example, an AEO tools panel might include “best free AEO tools,” “how to check if a page is ready for AI citations,” and “local tools for answer engine optimization.” The exact wording is recorded so future runs can be compared.

Prompts are grouped into families: definition, comparison, recommendation, troubleshooting, local tool, and buyer-intent prompts. That helps separate content discovery from commercial recommendation behavior.
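
In code, a panel can be a simple mapping from family name to fixed prompt strings. A sketch only; apart from the three prompts quoted above, the wording below is invented for illustration:

```python
# A hypothetical prompt panel, grouped by family. Wording is frozen
# so future runs can be compared against the same strings.
AEO_TOOLS_PANEL = {
    "definition":      ["what is answer engine optimization"],
    "comparison":      ["AEO vs SEO for a small site"],
    "recommendation":  ["best free AEO tools"],
    "troubleshooting": ["how to check if a page is ready for AI citations"],
    "local tool":      ["local tools for answer engine optimization"],
    "buyer intent":    ["which AEO tool is worth paying for"],
}
```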

Sampling rules

Small studies are useful if they are labeled correctly. A first pass may use 25 to 50 prompts to find directional patterns. A larger pass may collect 100 or more observations across multiple engines. The important rule is that the sample size and prompt mix are stated before the conclusion is drawn.

We avoid mixing unlike prompts into one claim. A definition prompt, a buyer prompt, and a troubleshooting prompt may produce different source behavior. The result table should keep those prompt families separate so the conclusion does not blur the pattern.
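
Keeping families separate is easy to enforce mechanically. A minimal sketch, assuming the Observation records above and a prompt-to-family lookup built from the panel:

```python
from collections import Counter, defaultdict

def summarize_by_family(observations, family_of_prompt):
    """Tally result types per prompt family so one conclusion
    never blends unlike prompts into a single number."""
    tallies = defaultdict(Counter)
    for obs in observations:
        family = family_of_prompt.get(obs.prompt, "unknown")
        tallies[family][obs.result_type] += 1
    return dict(tallies)
```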

How we classify citations

  • Exact citation: the answer cites the target URL that directly supports the claim.
  • Domain mention: the answer mentions the brand or domain but does not cite the exact page.
  • Wrong-page citation: the answer cites the right site but the wrong supporting URL.
  • Competitor citation: another source is used for the answer.
  • No citation: the answer gives information without a visible source.
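
When the target URL is known in advance, most of these labels can be assigned mechanically. A rough sketch only: redirects, tracking parameters, and mirrored pages still need a manual pass, and brand-mention detection is left as a manually set flag:

```python
from urllib.parse import urlparse

def _normalize(url):
    """Host and path, ignoring case, "www.", and trailing slashes."""
    p = urlparse(url)
    return p.netloc.lower().removeprefix("www."), p.path.rstrip("/")

def classify_result(cited_url, target_url, mentions_brand=False):
    """Map one observation onto the five result types above."""
    if not cited_url:
        return "domain mention" if mentions_brand else "no citation"
    cited_host, cited_path = _normalize(cited_url)
    target_host, target_path = _normalize(target_url)
    if cited_host != target_host:
        return "competitor citation"
    if cited_path != target_path:
        return "wrong-page citation"
    return "exact citation"
```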

Source-type classification

Every cited URL is classified by source type. The current categories are official documentation, comparison page, glossary/reference page, long-form guide, tool page, forum/community thread, product page, review site, news article, and unknown. This helps separate “who won” from “what kind of source won.”
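
A first-pass label can come from crude URL heuristics, as long as every label is manually checked before it enters the dataset. The patterns below are illustrative guesses, not a tested ruleset:

```python
# Path and host hints for a provisional source-type label.
SOURCE_TYPE_HINTS = [
    ("/docs", "official documentation"),
    ("-vs-", "comparison page"),
    ("/glossary", "glossary/reference page"),
    ("/guide", "long-form guide"),
    ("/tools", "tool page"),
    ("reddit.com", "forum/community thread"),
    ("/reviews", "review site"),
    ("/news", "news article"),
]

def guess_source_type(url: str) -> str:
    url = url.lower()
    for pattern, label in SOURCE_TYPE_HINTS:
        if pattern in url:
            return label
    return "unknown"
```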

If official documentation dominates crawler prompts, that tells us to cite and explain official docs clearly. If comparison pages win AEO-vs-SEO prompts, that tells us to build more comparison pages. If tool pages appear for implementation prompts, that tells us the tools section can become a ranking and citation asset.

Quality controls

Every study should include enough prompts to produce a pattern, not just a single interesting screenshot. Each result should be logged with the date, engine, citation surface, and exact URL. When possible, we rerun prompts and compare whether the same sources appear again.
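
A rerun check can be as simple as measuring how many cited URLs reappear. A sketch built on the Observation records above; a low overlap does not prove instability, but it flags prompts worth running again:

```python
def rerun_overlap(run_a, run_b):
    """Share of cited URLs from the first run that reappear in a
    rerun of the same panel; a crude stability signal."""
    urls_a = {obs.cited_url for obs in run_a if obs.cited_url}
    urls_b = {obs.cited_url for obs in run_b if obs.cited_url}
    if not urls_a:
        return 0.0
    return len(urls_a & urls_b) / len(urls_a)
```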

We also avoid overclaiming. If a study is small, it is labeled as a small study. If a result depends on a logged-in experience, that limitation is stated. If we do not have enough observations, we publish the protocol before publishing conclusions.

How results change the site

The research is not decorative. Each study should create a backlog of site improvements: pages to deepen, glossary terms to add, tool pages to build, crawler checks to run, internal links to add, and claims that need better evidence. If a study does not change what we publish next, the study was too vague.

Tooling

The AI Citation Tracker is the local logging tool for prompt panels. It does not call an API. It turns rows of observed answers into an exportable CSV that can become the dataset behind a research post.
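
The export step is straightforward to approximate. This sketch reuses the Observation records from earlier and is not the tracker's own code:

```python
import csv
from dataclasses import asdict, fields

def export_csv(observations, path="citation_log.csv"):
    """Write one CSV row per logged answer, header included."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=[fld.name for fld in fields(Observation)]
        )
        writer.writeheader()
        writer.writerows(asdict(obs) for obs in observations)
```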