What We Actually Know About LLM Citation Signals — And What We Don’t

Every week, someone publishes a piece claiming to definitively rank what drives AI citations. Structured data is the #1 factor. No, it’s domain authority. No, it’s third-party citations. No, it’s brand mentions across the web. Each of these claims has truth in it. None of them is the full picture, and most of them are wrong if you treat them as universal.

According to a 2025 HubSpot report, original research is one of the most effective ways to generate high-quality backlinks and citations, a trend that is now directly translating into Generative Engine Optimization (GEO).

This piece is about being honest. What do we actually know about how LLMs decide what to cite? What do we not know yet? And how should marketing leaders make GEO investment decisions in the gap between the two?

I’d rather you read this and walk away with calibrated uncertainty than read a definitive ranking and act on it confidently. The confident rankings out there right now are mostly wrong in ways that will cost you money if you optimize against them.

What we actually know

Some things about LLM citation behavior are stable enough across platforms and time that we can treat them as known.

Authority and Recency Weights
LLMs draw from a corpus of sources weighted by perceived authority and recency. This is the core mechanism. Every major LLM with retrieval (which is most of them now) maintains some weighting system that prefers sources it has cited before, sources from publications with strong trust signals, and sources that are recent. This isn’t surprising — it mirrors how Google’s algorithms work — but it matters because it explains why incumbents in any given category tend to stay incumbents.
The Power of Specificity
Specific, verifiable claims with named sources are weighted higher than vague claims. “78% of B2B organizations” cites better than “most B2B organizations.” A claim with a year, a sample size, and a methodology citation will be lifted into LLM responses more readily than the same claim phrased loosely. This is true across every LLM we’ve tested.
Structural Cleanliness
Structured data helps LLMs parse content, but it isn’t a ranking factor in the way SEO professionals sometimes describe it. Schema markup, FAQ blocks, clean H1/H2 structure — these make it easier for an LLM to extract specific information from a page. They don’t directly determine whether you get cited. They determine whether the LLM can cite you cleanly. Two pages with equivalent authority but different structural quality will be cited differently because one is easier to extract from. But structure without authority underneath it doesn’t move the needle.
Off-Site Ecosystems
Off-site presence matters more than most SEO-focused content suggests. LLMs read Reddit, peer-review sites, LinkedIn, YouTube descriptions, and analyst reports. The proportions vary by tool and by query type, but the general pattern is consistent — companies with strong off-site presence get cited more often than companies optimized only on-site. This is the single biggest pattern we see in our consulting work that conflicts with how most SEO agencies are still thinking.
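
To make the structural cleanliness point concrete, here is a minimal sketch of FAQ markup generated as schema.org FAQPage JSON-LD. The helper name `faq_jsonld` and the sample question and answer are placeholders for illustration, not something taken from a client page. The markup only makes a page easier to extract from, and says nothing about whether the page gets cited.

```python
import json

def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }

# Placeholder content; swap in the real questions and answers from the page.
pairs = [
    ("Does SEO still matter for AI search?",
     "Yes, but the focus shifts to structural cleanliness."),
]

markup = faq_jsonld(pairs)
# The printed JSON would be embedded in a <script type="application/ld+json"> tag.
print(json.dumps(markup, indent=2))
```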

Evidence of these shifts can be seen in our guide on 7 real ways businesses use AI for personalization at scale.
First-Party Data
First-party data and named research get cited disproportionately. If you publish original research, even a small, focused study with a transparent methodology, you get cited at a rate that’s hard to achieve any other way. Industry studies have unusually long citation half-lives because they become the source other writers cite when they need a number.

Our 2026 State of GEO in B2B Marketing study, based on 225 responses from B2B leaders, gives you a data-grounded starting point.

That’s roughly the list of things I would defend confidently. The rest of what gets written about LLM citation signals is either speculation or contingent on factors that change.

What we don’t know

Here are the things that get claimed as universal truths and that we genuinely don’t know yet.

The exact weighting of different signal types.
Anyone claiming to know that structured data weights X% and third-party citations weight Y% is making it up. The LLM vendors don’t publish this, the weighting differs across platforms, and the proxies we use to estimate it (tools like Profound, Athena, etc.) are inference systems with significant variance. We can spot patterns. We can’t claim precision.
How signal weighting changes by buyer context.
I’ve written elsewhere that the way a buyer prompts changes what gets cited. That’s true, but we don’t know exactly how. A CFO asking Claude about treasury management software gets different sources cited than a CMO asking ChatGPT about marketing automation, but the precise reasons (how much is the tool, how much is the prompt type, how much is the user’s prior context) aren’t separable in any clean way.
How much retrieval results shift over time.
We’ve tracked the same prompts month over month for several clients and watched the citation sources reshuffle. Why? New content gets published. Reddit threads gain or lose engagement. Models get retrained. New versions get released. The fact that citation patterns shift is known. The exact half-life of a citation in a given category is not.
The displacement cost.
I’ve written elsewhere that AI search is path-dependent — once a model has learned to associate your category with three specific brands, displacing them is harder than getting there first. That’s true in the directional sense. But how hard? How long does it take? What specifically moves an incumbent out of the cited set? Honest answer: we don’t have good data on this yet. Anyone telling you they do is selling you something.
The interaction between different signal types.
Off-site presence helps. On-site optimization helps. First-party data helps. But how do these interact? Does strong Reddit presence amplify on-site work? Does poor structured data nullify off-site authority? We see patterns suggesting they reinforce each other, but the multiplicative effects are not well-characterized.
How different LLMs differ specifically.
ChatGPT, Claude, Perplexity, Google AI Overviews, and Copilot all cite differently. We see patterns. ChatGPT pulls heavily from Reddit. Perplexity over-indexes on LinkedIn. Google AI Overviews weight YouTube heavily. But the platform-specific differences aren’t stable enough to treat as ranking factors — they shift with new model versions and updated retrieval systems.

Why the universal-ranking content is wrong

When you see someone publish “the 7 factors that determine LLM citations, ranked,” ask yourself what they’re actually measuring. Usually, one of three things is going on:

They’re inferring from output. They prompted an LLM, got a list of cited sources, and reverse-engineered what those sources had in common. This is useful as pattern detection. It’s not a ranking. The same prompt next month would produce a different list.
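
One way to see how unstable an output-based ranking is: compare the cited sources for the same prompt across two runs. The sketch below uses Jaccard overlap (shared sources divided by all sources seen). The domain lists are invented for illustration, and how you collect them (by hand or through a tracking tool) is left open.

```python
def jaccard(a, b):
    """Share of sources common to both runs: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty runs are trivially identical
    return len(a & b) / len(a | b)

# Hypothetical cited domains for one prompt in two consecutive months.
march = ["reddit.com", "reviews.example", "vendor-a.example", "analyst.example"]
april = ["reddit.com", "vendor-b.example", "analyst.example", "youtube.com"]

overlap = jaccard(march, april)
dropped = sorted(set(march) - set(april))  # cited in March, gone in April
added = sorted(set(april) - set(march))    # new in April

print(f"overlap: {overlap:.2f}")
print("dropped:", dropped)
print("added:", added)
```

A low overlap between runs is a sign that whatever the cited sources "had in common" last month is a snapshot, not a ranking.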

They’re describing one buyer’s context. They tested with prompts that match their own usage patterns and generalized. The ranking is true for that buyer in that context. It’s not universal.

They’re describing one platform. They tested heavily on ChatGPT (because that’s what most people test on) and extended the findings to “AI.” The ranking might be roughly right for ChatGPT and significantly wrong for Claude or Perplexity.

None of these methodologies are illegitimate. They just don’t support the confident “here are the universal ranking factors” claim that gets made on top of them.

How to act in the gap

This is the practical question. If we know less than the confident rankings suggest, how should a marketing leader make investment decisions?

Four principles we work from.

Optimize for the things that are stable across LLMs and time.
First-party data, off-site presence, specific verifiable claims with named sources, structural cleanliness on owned content. These are the things that work regardless of which model is running or which buyer is asking. Invest in these first. They compound.
Test in your actual buyer’s environment, not in a generic environment.
The signals that matter for your buyer in your category in their preferred AI tool right now are different from the universal rankings. Build the testing discipline. Run the same prompts your buyer would run, monthly, and track what changes.
Watch the trend, not the level.
Absolute citation counts and visibility scores are noisy. The trend over time — are you moving up or down in citation frequency across the prompts you care about — is meaningful. Make investment decisions based on the directional movement, not the absolute numbers.
Hedge across platforms.
Don’t optimize for ChatGPT alone. Don’t optimize for Google AI Overviews alone. The LLM market is still consolidating, and the brand that’s invisible to one tool is invisible to the share of buyers who use that tool. Build presence across the major surfaces, accepting that the optimal mix will keep shifting.
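
As a rough illustration of "watch the trend, not the level," the sketch below turns monthly prompt-tracking results into a citation rate and fits a simple least-squares slope to it. The numbers are made up, the function names are ours, and a straight-line fit is only one reasonable choice. With data this noisy, a few months is the minimum before the slope means much.

```python
def citation_rate(cited, total):
    """Fraction of tracked prompts in which the brand was cited."""
    return cited / total if total else 0.0

def slope(values):
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0  # a trend needs at least two points
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

# Hypothetical monthly results: (prompts where the brand was cited, prompts run).
monthly = [(6, 40), (9, 40), (8, 40), (12, 40), (13, 40)]

rates = [citation_rate(c, t) for c, t in monthly]
trend = slope(rates)

print("monthly rates:", [round(r, 3) for r in rates])
print(f"change per month: {trend:+.3f}")
```

The sign and rough size of the slope is the number to act on. Any single month's rate, such as the dip in the third month here, is mostly noise.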

What we’re trying to figure out

Here is an honest list of what we’re researching at GNW right now, because we’re not content to leave the uncertainty above unresolved.

We’re tracking citation patterns across the major LLMs for a panel of B2B SaaS clients to map how citations shift month-over-month, and what content changes correlate with movement. We’re testing identical content across different buyer-context simulations to see how prior conversation context shifts what gets cited. We’re building a longer-term dataset of when brands enter and leave the cited set for high-value commercial queries, to start putting real numbers behind the path-dependency claim.

We’ll publish what we learn as we learn it. Some of it will validate the directional patterns I’ve described in this piece. Some of it will probably surprise us. That’s the honest version of doing this work — operating with the certainty we have, naming the uncertainty where it exists, and building the data to close the gap over time.

Frequently Asked Questions

What is the most important factor for AI citations?

While no single factor is universal, original first-party research and specific, data-backed claims consistently see higher citation rates across the LLMs we’ve tested.

Does SEO still matter for AI search?

Yes, but the focus shifts to “structural cleanliness.” Technical SEO elements like H2 structure and Schema markup don’t necessarily rank you higher, but they make it possible for the LLM to extract and cite your data accurately.

Why does ChatGPT cite Reddit so often?

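We don’t know the exact mechanism. The pattern we see is that LLMs with retrieval draw heavily on high-engagement social platforms like Reddit, plausibly as a signal of “human” consensus and real-world sentiment, which is why a strong off-site presence is critical for GEO.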

GNW Consulting runs free competitive GEO audits that include the specific patterns we see in your category — which platforms are citing your brand, which are citing competitors instead, and what content changes correlate with movement. Request your audit here. 5-7 days, no commitment, no sales call unless you want one.

Related reading: What Generative Engine Optimization Actually Is — the canonical definition piece. Why GEO has to start with the buyer — on buyer-prompt patterns and the methodology we use with every client.

AUTHOR

Andrea Lechner-Becker

Chief Strategy Officer at GNW Consulting

Hard problems are Andrea’s favorite to solve. She believes solving big problems requires a forensic approach. Through systematic and scientific methods, all problems can be solved.
