I keep seeing AEO advice that treats your page as one document the engine reads end-to-end. That’s not how retrieval actually works. Most modern search and answer systems split documents into chunks, embed and index those chunks, and then retrieve the few that match a user’s question. Your page is a collection of retrievable units, not a single object.

Once you internalize that, a lot of AEO advice that previously sounded mystical becomes mechanical. Lead with the answer because the chunk needs to be self-sufficient. Use question-form headings because they create stronger query-to-chunk matches. Keep entity names, definitions, and caveats close together because if they’re split across chunks, the model loses context that would have made the answer correct.

This piece is a reference on chunking — what it is, why it shapes citation behavior, and how to write sections that survive the trip through retrieval.

tl;dr

  • Search systems and RAG pipelines retrieve chunks of pages, not whole pages. Google documents passage ranking explicitly. OpenAI’s retrieval docs describe ~800-token chunking with overlap. Anthropic published research showing how chunks fail without local context.
  • A retrievable section is self-contained: it names the entity, answers the question, and keeps the caveat nearby — all in one local area.
  • Long pages are not punished. Buried answers are. A 4,000-word page can win citations if every section is locally complete.
  • The fix is mechanical, not stylistic: question-form headings, answer-first first sentences, examples and caveats close to the claim they support.

What chunking actually is

Chunking is the process of splitting a document into smaller units before retrieval. The system embeds each chunk into a vector and stores those vectors in an index; at query time it embeds the user's question the same way and pulls the top-N chunks whose vectors sit closest to it. The model then synthesizes an answer from those chunks.
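
In code, the whole loop is small. Here is a minimal sketch, assuming a generic embedding function; `embed()` below is a deterministic random-vector placeholder, not any engine's real API:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder embedding: a deterministic pseudo-random unit vector.
    A real system calls an embedding model here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=384)
    return v / np.linalg.norm(v)

def retrieve(chunks: list[str], query: str, top_n: int = 3) -> list[str]:
    index = np.stack([embed(c) for c in chunks])  # one vector per chunk
    scores = index @ embed(query)                 # cosine similarity (unit vectors)
    best = np.argsort(scores)[::-1][:top_n]       # indices of the top-N chunks
    return [chunks[int(i)] for i in best]
```

The model that writes the answer only ever sees what `retrieve()` returns. The rest of the page never makes the trip.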

This isn’t a hypothetical AI thing. Google’s documentation describes passage ranking as a search system that helps identify individual sections of a page. OpenAI’s retrieval guide describes vector stores with configurable chunking — defaults around 800 tokens with overlap. Anthropic published a research piece on contextual retrieval showing how chunking strategies affect retrieval accuracy.

Different systems chunk differently. Some split by sentence, some by paragraph, some by token window with sliding overlap, some by heading boundary, some by LLM-guided semantic break. The Zhou et al. paper from February 2026 (“Beyond Chunk-Then-Embed”) evaluated several methods and found that for in-corpus retrieval — finding the right document among many — simple structure-based methods (fixed-size, sentence, paragraph) often outperform fancier LLM-guided alternatives. That’s a useful result for AEO: you don’t need to optimize for some exotic chunking strategy. Cleanly structured content survives most chunking algorithms reasonably well.
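
For concreteness, here is what two of those simple structure-based strategies look like. This is a sketch only: whitespace splitting stands in for a real tokenizer, which is an assumption to keep it short — production systems count model tokens.

```python
def fixed_window_chunks(text: str, size: int = 800, overlap: int = 200) -> list[str]:
    """Fixed-size windows with sliding overlap, counted in whitespace 'tokens'."""
    tokens = text.split()
    step = size - overlap
    return [" ".join(tokens[i:i + size]) for i in range(0, len(tokens), step)]

def paragraph_chunks(text: str) -> list[str]:
    """One chunk per blank-line-separated paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]
```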

The AEO consequence is what matters: your page is being evaluated as a collection of chunks, not as a single coherent essay. Even when the engine ultimately cites the page as a whole, the path to that citation runs through one or two chunks that matched the query well enough to surface.

Why this changes how you should write

If chunks are the unit of retrieval, the unit of writing matters more than the unit of the page.

Consider a section that says: “This usually works after a few days, but results vary.” Pulled out as a chunk, that sentence is useless. It contains no entity, no question, no specific answer, no qualifier. A retrieval system can index it, but no model will cite it because the chunk is uninterpretable on its own.

Now consider a section that says: “ChatGPT search pickup can vary by site and query, so treat 7-14 days as a testing window rather than a guarantee.” Same approximate content, but as a chunk it’s complete: it names the entity (ChatGPT search), answers the question (how long does pickup take), gives a usable range (7-14 days), and qualifies it (testing window, not guarantee). A retrieval system will index this and a model can cite it.

The difference isn’t writing skill. It’s locality. The first sentence depends on context the chunk doesn’t have. The second carries its own context.

This is where most AEO advice gets practical. “Lead with the answer” isn’t a stylistic preference. It’s a chunk-survival strategy. “Use question-form headings” isn’t headline trickery. It’s giving the chunk a bound that matches the way users phrase queries. “Keep your caveats close to your claims” isn’t editing convention. It’s making sure the qualifier doesn’t get stranded in a different chunk than the assertion it qualifies.

How context gets lost across chunks

The clearest illustration of chunk failure I’ve seen comes from Anthropic’s contextual retrieval research. Their setup: take a chunk that contains a meaningful statement (like “the company’s revenue grew 3% over the previous quarter”), embed it, retrieve it, and try to use it. The chunk is technically informative — but on its own, “the company” is unidentified. Which company? Which quarter? Which year? The information that disambiguates the claim lived in earlier sections that didn’t make it into the same chunk.

Anthropic’s proposed fix for RAG systems was to prepend a generated context blurb to each chunk before embedding (“In a 2023 SEC filing by ACME Corp, the company reported…”). They reported that contextual embeddings reduced top-20 chunk retrieval failure rates by 35% (from 5.7% to 3.7%). Combined with contextual BM25, the failure rate dropped to 2.9% (49% improvement). With reranking on top, it dropped to 1.9% — a 67% improvement over baseline.
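
The shape of that fix, as a sketch (the blurb generation is stubbed out here; in Anthropic's setup it is an LLM call per chunk, and my stub's behavior is an assumption for illustration):

```python
def generate_context(document: str, chunk: str) -> str:
    """Stub for the LLM call that situates a chunk within its document.
    Crudely reuses the document's first line as the situating blurb."""
    lines = document.strip().splitlines()
    title = lines[0] if lines else "untitled document"
    return f"From {title!r}:"

def contextualize(document: str, chunks: list[str]) -> list[str]:
    """Prepend a situating blurb to each chunk before it is embedded."""
    return [f"{generate_context(document, chunk)}\n\n{chunk}" for chunk in chunks]
```

Each string `contextualize()` returns is what gets embedded, so "the company" in a chunk becomes "In ACME Corp's 2023 SEC filing, the company…" before indexing.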

That’s a meaningful number. It also has direct implications for how you should write web content, even though the research is about closed RAG systems.

The takeaway isn’t “Anthropic improved chunking by 67%, so chunking matters.” That’s a small overclaim. The takeaway is: the systems that read your content are aware of the context-loss problem and are actively engineering around it. The work being done at the retrieval layer is partially compensating for content that isn’t locally complete. But that compensation isn’t perfect, and your content can either help the system or fight it.

You help the system by writing chunks that already carry their context. You fight the system by relying on document-level coherence — the kind that works for human readers reading top-to-bottom but fails when only one section of the page makes it through retrieval.

What a retrievable section looks like

A retrievable section names the entity, answers the question, and keeps the caveat in the same local area. Concretely:

Weak section:

## Setup

This usually works after a few days, but results vary.

The heading is generic (“Setup” of what?). The first sentence has no entity. There’s no specific answer. There’s no methodology to inspect. As a chunk, this is uncitable.

Stronger section:

## How long does ChatGPT search take to pick up a new product page?

ChatGPT search pickup can vary by site and query, so treat 7-14 days as a testing window rather than a guarantee. Log the publication date, crawler access status, sitemap inclusion, and the exact prompts you re-run, so you can tell which factor moved the needle.

The heading is the question itself. The first sentence answers it directly with a specific range. The second sentence adds method. As a chunk pulled out for a query like “how long does ChatGPT search take to index?”, this section carries everything a retrieval system needs.

This pattern scales. Every important section on a page should be writeable in this form: heading-as-question, first-sentence-as-direct-answer, second-and-third-sentences for caveats and method.

Misconceptions worth naming

A few things I see repeated that aren’t quite right:

1. “Schema replaces prose.” It doesn’t. Schema clarifies entity relationships and page structure for parsers, but the model still has to retrieve and read prose chunks. Schema is necessary infrastructure; it isn’t a substitute for writing well-bounded sections. (A sketch follows this list.)

2. “Long content is automatically better for AI citation.” Length itself isn’t the variable. Locally complete sections at any length get cited. A 4,000-word page where every section answers a specific question outperforms a 2,000-word page with one buried answer. A 600-word page with a tight answer outperforms a 2,000-word page that meanders.

3. “Every paragraph should be a standalone FAQ.” This is overcorrection. Pages that read like a list of disconnected Q&As feel robotic and damage trust. The goal is local clarity, not aggressive fragmentation. A flowing piece can have natural transitions while still keeping each H2-bounded section self-sufficient.

4. “Chunking is something the engine does to you.” It’s something you can write toward. The engine will chunk your page either way. The question is whether your chunks happen to be useful or accidentally broken.
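
To make the first point concrete, here is FAQPage markup built as a Python dict (the values are illustrative). Notice that the schema carries the question and answer text, but the engine still retrieves and reads the prose section this mirrors:

```python
import json

faq_schema = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [{
        "@type": "Question",
        "name": "How long does ChatGPT search take to pick up a new product page?",
        "acceptedAnswer": {
            "@type": "Answer",
            "text": "ChatGPT search pickup can vary by site and query, so treat "
                    "7-14 days as a testing window rather than a guarantee.",
        },
    }],
}

# Paste the output into a <script type="application/ld+json"> tag. The prose
# section this mirrors still has to exist and survive chunking on its own.
print(json.dumps(faq_schema, indent=2))
```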

Implementation: rewrite five sections, not the whole site

Most chunking-aware rewrites I’ve seen go bad because the team tries to overhaul everything at once. The result is months of work and no signal on whether it helped.

A better approach: pick five pages that already get traffic or impressions (Search Console gives you this). For each, identify one section that should be answering a real buyer question. Rewrite that section using the heading-question / first-sentence-answer / caveat-and-method pattern. Don’t touch the rest of the page yet.

Then run a stable prompt panel before and after. Same prompts, same engines, same week. Log the following (a minimal logging sketch comes after the list):

  • Whether the page is cited
  • Whether the rewritten section is the part that gets summarized
  • Whether the answer engines describe the claim accurately
  • Whether a competitor still owns the citation
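
A minimal version of that log, as a sketch. The field names and CSV format are my assumptions, not a standard; the example values are hypothetical.

```python
import csv
import datetime
import os

FIELDS = ["date", "prompt", "engine", "page_cited",
          "section_summarized", "claim_accurate", "competitor_cited"]

def log_observation(path: str, prompt: str, engine: str, **outcomes) -> None:
    """Append one dated row per (prompt, engine) run. `outcomes` holds the
    four yes/no answers from the checklist above."""
    new_file = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"date": datetime.date.today().isoformat(),
                         "prompt": prompt, "engine": engine, **outcomes})

# Example call; the values are hypothetical.
log_observation("panel_log.csv",
                prompt="how long does chatgpt search take to index a new page?",
                engine="chatgpt",
                page_cited=True, section_summarized=True,
                claim_accurate=True, competitor_cited=False)
```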

You won’t get a clean causal answer from five pages. But you will get directional evidence — and more importantly, you’ll learn which kinds of rewrites work for your specific audience and content type. That feedback loop is worth more than another 50 pages of speculative optimization.

I haven’t run this experiment systematically yet. The patterns I’m describing are based on what I see in retrieval research, in citation behavior I’ve observed across answer engines, and in how the underlying systems (OpenAI’s retrieval guide, Anthropic’s contextual retrieval research, Google’s passage ranking docs) describe their own machinery. The closer you get to running real experiments on real pages, the more you should weight your own data over my generalizations.

What to do Monday morning

  1. Audit your top 20 informational pages for vague H2 headings. Replace generic ones (“Setup”, “Overview”, “How it works”) with question-form headings that match how buyers phrase the query. (A rough audit script follows this list.)
  2. Rewrite the first sentence under each important H2 so it answers the heading directly. Cut warm-up phrases like “in this section,” “let’s look at,” “many people wonder.”
  3. Keep entity names, dates, caveats, and examples within the same H2 section as the claim they support. If a qualifier lives three sections away from its claim, retrieval may not bring the two together.
  4. Avoid burying the answer below background paragraphs. The first 60 words after a heading should contain the actual answer.
  5. Add tables where comparisons span multiple attributes — they survive chunking better than long discursive paragraphs that compare across sentences.
  6. Re-run a fixed prompt panel two weeks after publishing changes. Same prompts, same engines, dated log. Note which engine started citing differently and which didn’t move.
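
For step 1, a rough audit script. The list of generic headings is illustrative, and the `content` directory of markdown files is an assumption about how your site is stored.

```python
import pathlib
import re

GENERIC = {"setup", "overview", "how it works", "introduction", "background"}

def flag_generic_h2s(content_dir: str = "content") -> list[tuple[str, str]]:
    """Return (file, heading) pairs for H2 headings on the generic list."""
    flagged = []
    for path in pathlib.Path(content_dir).glob("**/*.md"):
        text = path.read_text(encoding="utf-8")
        for match in re.finditer(r"^##\s+(.+)$", text, re.MULTILINE):
            heading = match.group(1).strip()
            if heading.lower().rstrip("?") in GENERIC:
                flagged.append((str(path), heading))
    return flagged
```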

The mechanical version of all of this is one rule: write sections that work when read alone. That’s not a stylistic preference; it’s how the systems that decide whether to cite you actually consume your content.

Sources