Over the last 18 months, AI assistants have gone from tech curiosity to legitimate tools for search, product comparison, and buyer education.
For brands, this raises an increasingly urgent question (one we explored in depth in our State of AI Discovery Report): what type of content do models actually use to build their answers?
To find out, we analyzed 5,000 prompts across four intent types common in buying processes. The goal? Identify which kinds of pages end up cited, paraphrased, or used as knowledge sources. We looked at two things:
- How citation behavior changed by prompt type
- What structural elements the most frequently cited pages had in common
The results are straightforward: LLMs cite pages that are already structured to be credible to humans.
This isn’t a comprehensive study of every possible prompt type or industry vertical. It’s a directional analysis focused on identifying structural patterns worth testing.
How We Designed the Analysis
We grouped prompts into four categories to avoid traditional SEO bias:
- Branded factual: “What features does [brand] have?”, “What is [brand]’s return policy?”
- Branded competitive: “[brand] vs [competitor]”, “alternatives to [brand]”
- Category buying queries: “best CRMs for small business”, “accounting software for freelancers”
- Informational category: “What is a CRM?”, “How does payroll software work?”
For each response we identified URLs that were cited or paraphrased, which formats dominated, and which structural elements appeared most often.
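To make that tagging step concrete, here is a minimal sketch of how structural elements could be counted for a single cited page. Our production pipeline isn't shown here; the selectors, heuristics, and the example URL below are illustrative assumptions, not the exact rules we used.

```python
# Minimal sketch of the structural-element tagging step.
# Selectors and heuristics are illustrative assumptions, not the
# exact rules used in the analysis.
import requests
from bs4 import BeautifulSoup

QUESTION_STARTS = ("what", "how", "why", "which", "when", "is", "does", "can")

def tag_structure(url: str) -> dict:
    """Count the structural elements tracked for one cited page."""
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    headers = soup.find_all(["h2", "h3", "h4"])
    words = len(soup.get_text(" ", strip=True).split())

    return {
        "words_per_header": words / max(len(headers), 1),
        "has_lists": bool(soup.find_all(["ul", "ol"])),
        "has_tables": bool(soup.find_all("table")),
        # FAQ proxy: headings phrased as questions ("What is…?", "How does…?")
        "question_headers": sum(
            1 for h in headers
            if h.get_text(strip=True).lower().startswith(QUESTION_STARTS)
            or h.get_text(strip=True).endswith("?")
        ),
    }

if __name__ == "__main__":
    # Hypothetical URL; swap in any page that appeared in a model response.
    print(tag_structure("https://example.com/pricing"))
```

Running a tagger like this over every URL a model cited, then grouping the results by prompt category, is all it takes to produce the kind of aggregate percentages reported below.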
The Page Format That Works in LLMs Depends on User Intent
Even though structure mattered across the entire dataset, different prompt types leaned on different kinds of page formats and different structural patterns. This is where the analysis gets interesting.
For Branded Prompts, Naming and Lists Were Key
Examples: “What features does [brand] offer?”, “What is the return policy for [brand]?”
The most frequently cited pages in this category shared three patterns:
- 82% used explicit entity naming. Pages mentioned brand + product name explicitly (like “Shopify Payments”), not just “our product”. This likely helps models anchor entities correctly.
- 64% included feature or capability lists. These lists were short and scannable, covering features, requirements, or limitations.
- 71% kept paragraphs under 4 lines. Dense, unbroken blocks of text rarely appeared on pages cited for factual prompts.
For Competitive Prompts, Comparison Tables and Evaluation Criteria Won
Examples: “HubSpot vs Salesforce”, “Alternatives to QuickBooks”
These queries showed a clear preference for evaluative structures:
- 52% used comparison tables or matrices. Tables compared features, plans, or pricing side by side.
- 67% used evaluation criteria headers. Headers like “Pros”, “Cons”, “Best for”, “Pricing”, and “Use cases” made evaluation extractable.
For Category Prompts, Segmentation Was Preferred
Examples: “Best CRM for SMBs”, “Top accounting software for freelancers”
These surfaced a clear pattern around shortlist building:
- 74% were structured as ranked lists. “Top X”, “Best X”, or ranked comparisons dominated.
- 46% segmented by ideal customer profile. SMB vs Enterprise, Freelancers vs Agencies, and similar segmentations appeared frequently.
- 69% included mini product summaries. Short feature capsules per brand covered name, value prop, pricing, and ideal user.
For Informational Prompts, the Clear Winners Were Educational Structures with Low Brand Presence
Examples: “What is a CRM?”, “How does payroll software work?”
These queries highlighted educational patterns:
- 58% followed a Definition → Context → Example structure. This matches textbook and glossary writing styles.
- 42% included taxonomy or categorization. “Types of…”, “Categories of…”, or similar structural taxonomies were common.
- Only 11% mentioned brand names. When they did, it was in examples, not as recommendations.
What All Cited Pages Have in Common
Once we segmented by prompt type, we zoomed in on the structural traits that overlapped across all categories. Of the 12 attributes we analyzed, four patterns showed up consistently.
1. One Header Every 100 to 200 Words
Pages used by models averaged 1 header every 100–200 words. Pages that barely appeared had 1 header every 400+ words.
Headers act as semantic breakpoints for models, making extraction easier.
58% of cited pages used interrogative headers like “What is…?”, “How does… work?”, and “What are the types of…?” This mirrors how LLMs structure their own responses.
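If you want to check your own drafts against this benchmark, a quick script like the sketch below will do it for a Markdown source. The `#`-style headings and the 200-word threshold are assumptions lifted from the range above, not a tool we used in the study.

```python
# Rough draft check: flag Markdown sections that run past ~200 words
# without a subheading. The threshold mirrors the 100-200 word range
# above; the parsing assumes "#"-style headings and is deliberately simple.
import re
import sys

HEADING = re.compile(r"^#{1,6}\s+(.*)", re.MULTILINE)

def long_sections(markdown: str, max_words: int = 200):
    """Yield (heading, word_count) for sections longer than max_words."""
    matches = list(HEADING.finditer(markdown))
    for i, match in enumerate(matches):
        start = match.end()
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        word_count = len(markdown[start:end].split())
        if word_count > max_words:
            yield match.group(1).strip(), word_count

if __name__ == "__main__":
    text = open(sys.argv[1], encoding="utf-8").read()
    for heading, count in long_sections(text):
        print(f"'{heading}' runs {count} words before the next heading")
```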
2. Lists Appeared on 63% of Cited Pages
Lists were the most common structural element across the entire study. They were used to define features, compare alternatives, identify steps, classify categories, and outline pros and cons.
One important observation: when models generated structured answers, 76% of the time they transformed source content into lists, even if the original wasn’t formatted that way.
This suggests LLMs prefer atomic information units.
3. Tables Appeared on 39% of Cited Pages
Tables appeared on 39% of cited pages overall, but this jumped significantly for competitive and buying queries.
Tables provided explicit relationships like Plan → Feature, Tier → Price, and Product → Best use case.
4. FAQ Sections Appeared on 47% of Cited Pages
FAQ-type sections were present in 47% of cited pages, especially for factual and informational prompts.
FAQs mimic the prompt → answer behavior of LLMs. We consistently observed direct phrase extraction like “Yes, [brand] supports…”, “[Brand] offers…”, and “The process is…”
This reduces ambiguity, a key advantage for generative synthesis.
What Didn’t Get Cited
Formats that almost never appeared:
- Opinion pieces or storytelling
- Blogs without intermediate headers
- Pages with more images than text
- Pure conversion landing pages
These formats aren’t “bad for marketing.” They’re just not useful for knowledge synthesis, which is what LLMs do.
What This Actually Means for Marketers
If LLMs are going to accompany buyers through purchase consideration, it’s not enough for brands to rank. They have to be citable.
Three practical implications for earning citations in LLMs:
- Match structure to intent. Factual queries need entity clarity. Competitive queries need comparison units. Buying queries need shortlist formats. Informational queries need definitions and taxonomies.
- Write definitions, not descriptions. Models prioritize what it is, who it’s for, and how it works.
- Produce knowledge, not just content. Pages that win are built like reference material, not campaign assets.
The brands that show up best don’t just publish more. They publish in the formats that machines can reuse.