AI Brand Visibility Metrics Are Mostly Noise. Here’s 1 Metric That Isn’t.
Jen Carroll
The Dames
The human brain doesn’t run on probabilities. It runs on stories.
In particular, we love protagonists who beat the odds, lovers who find each other across impossible distances, and underdogs who win the championship on the last play. Probability is what happens across 1,000 coin flips. Narrative is about THIS flip, right now, and why it’s going to change everything.
With humans, narrative wins almost every time because it’s far more emotionally satisfying.
So, you can imagine just how UNsatisfying it was to hear SparkToro’s Rand Fishkin say in his recent webinar, AI Visibility: What Actually Matters (And What Doesn’t), “These [AI assistants and generative search summaries] are probability engines: they’re designed to generate unique answers every time. Thinking of them as sources of truth or consistency is provably nonsensical.”
Flip.
New Research Confirms AI Brand Visibility Rankings Are Unreliable. Here's What the Data Actually Shows.
In January, Fishkin and the teams at SparkToro and Gumshoe published some genuinely useful research on how consistently AI tools recommend brands and products. I highly encourage other marketers to read it.
In brief: AI tools almost never give the same list of brand recommendations twice, and they almost never give the same list in the same order. Fishkin recruited 600 volunteers to run identical prompts through ChatGPT, Claude, and Google AI hundreds of times each, then measured how often the outputs matched. The odds of getting the same list twice were less than 1 in 100. The odds of getting the same list in the same order were less than 1 in 1,000.
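If you want to sanity-check that finding against your own data, the measurement is easy to reproduce in miniature. The sketch below is not SparkToro's actual methodology, and the prompt outputs are made up; it simply compares every pair of repeated runs of a single prompt and reports how often the brand list matches as a set and as an ordered list.

```python
from itertools import combinations

# Hypothetical logged outputs from repeated runs of ONE prompt; each run is the
# ordered list of brands an AI assistant returned. Replace with your own logs.
runs = [
    ["Brand A", "Brand B", "Brand C"],
    ["Brand B", "Brand A", "Brand D"],
    ["Brand A", "Brand B", "Brand C"],
    ["Brand C", "Brand A", "Brand B"],
]

# Compare every pair of runs: same set of brands? same brands in the same order?
pairs = list(combinations(runs, 2))
same_set = sum(set(a) == set(b) for a, b in pairs)
same_order = sum(a == b for a, b in pairs)

print(f"Pairs compared:          {len(pairs)}")
print(f"Same brands, any order:  {same_set / len(pairs):.1%}")
print(f"Same brands, same order: {same_order / len(pairs):.1%}")
```

Run it on real logged responses and you would expect, per Fishkin's data, both match rates to be very low.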
Not exactly the story we want to hear, I know.
Plus, as I watched and listened to the webinar replay, I noticed Fishkin repeatedly referring to “top brands.” I assume he meant the ones identified in the charts as most prominent in AI responses, but I don’t think he ever defined the term. Minor omission or deliberate shorthand? It turns out the answer matters quite a bit—for AI brand visibility strategy, for AI search visibility measurement, and for every marketer being asked to justify what they’re spending on AI brand tracking tools.
"Top Brand" in AI has three definitions. Only one holds up.
When Fishkin mentioned “top brand,” the data supported at least three distinct interpretations, and he used the phrase to mean all of them at different points.
The first was raw frequency: the brand that appeared most often across all responses to a given prompt. By this measure, Smartsites was the top brand for e-commerce marketing consultants; it appeared in 85 of 95 Google AI responses. City of Hope was the top brand for West Coast cancer care hospitals in ChatGPT; it appeared in 69 of 71 responses.
The second was visibility percentage: the frequency of appearance normalized across the prompt set. Fishkin argues this is the most statistically defensible of all AI visibility metrics—and the only one worth building strategy around.
The third was first-position frequency: the brand that was listed first most often. Fishkin’s charts track 1st, 2nd, and 3rd position appearances separately, and City of Hope was in 69 of 71 ChatGPT answers but was first on the list in only 25 of those. That gap matters, and Fishkin notes it. But the phrase “top brand” still floated between these definitions depending on context.
The ambiguity isn’t just academic. When “top brand” can mean three different things depending on which chart you’re looking at, AI visibility tool vendors can cherry-pick whichever definition makes their dashboard look most compelling—and most executives will never know the difference.
AI Search Rankings Are Noise. Here's the Proof.
The “top brand = most often listed first” definition falls apart the moment you accept Fishkin’s core finding: AI tools are probability engines. They are designed to generate unique outputs every time. The position a brand occupies in any given response is an artifact of token probability, not a judgment about quality, relevance, or authority.
Fishkin made this point explicitly. He argued that any tool claiming to track your AI search rankings is, in his words, full of baloney. We’ve been making a version of this argument since we wrote about the answer engine optimization gold rush—that an entire measurement industry has been built on approximations of approximations, and that ranking position in AI is noise dressed up as signal.
If first position is random, calling the brand that lands there most often a “top brand” imports an SEO logic that simply DOES NOT APPLY to a system that generates novel outputs every time. It’s like determining the best swimmer by which lane they happen to occupy at the starting block.
The one AI visibility metric with statistical integrity
That leaves frequency and visibility percentage. Raw frequency is directionally useful but sensitive to how many times a prompt was run and how the prompt set was constructed. Visibility percentage—how often a brand appears across a large, diverse set of prompts, normalized for sample size—is the most defensible of all AI visibility metrics Fishkin’s research produces.
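Fishkin's exact normalization isn't spelled out, so treat the following as one reasonable reading rather than his formula: for each prompt, compute the share of runs in which a brand appears, then average those per-prompt rates across the whole prompt set so a heavily sampled prompt can't dominate. All brand names and counts below are hypothetical.

```python
from collections import defaultdict

# Hypothetical logged responses: prompt -> list of runs, where each run is the
# set of brands that appeared in that response. Replace with your own data.
responses = {
    "best e-commerce marketing consultants": [
        {"Brand A", "Brand B"}, {"Brand A"}, {"Brand A", "Brand C"},
    ],
    "top cancer care hospitals on the West Coast": [
        {"Brand A", "Brand D"}, {"Brand D"},
    ],
}

# Per-prompt appearance rate for every brand that shows up at least once.
per_prompt_rates = defaultdict(list)
for prompt, runs in responses.items():
    for brand in set().union(*runs):
        rate = sum(brand in run for run in runs) / len(runs)
        per_prompt_rates[brand].append(rate)

# Visibility percentage: average the per-prompt rates over ALL prompts, so a
# brand that never appears for a given prompt contributes 0 for that prompt.
n_prompts = len(responses)
visibility = {b: sum(rates) / n_prompts for b, rates in per_prompt_rates.items()}

for brand, pct in sorted(visibility.items(), key=lambda kv: -kv[1]):
    print(f"{brand}: {pct:.0%}")
```

The design choice worth noticing is the normalization step: without it, a brand could look dominant simply because the one prompt it happens to win was run far more times than the others.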
And here’s what visibility percentage is actually measuring: how deeply embedded a brand is in the AI’s statistical model for a given topic space. Not whether the AI thinks the brand is best. Not whether the brand deserves to be recommended. Just how consistently the probability engine draws from it when constructing an answer about that category.
The honest framing, then, is that a “top brand” in AI doesn’t necessarily mean most authoritative, most trusted, or even most recommended. It means most probable.
What Actually Determines AI Brand Visibility
And why it predates AI entirely
Here’s where it gets interesting. If visibility percentage reflects how embedded a brand is in an AI’s probability distribution, the obvious next question is: what built that distribution in the first place?
For pure language models (the trained-prior-only systems underlying tools like base ChatGPT and Claude), the answer is clean. The model was trained on a corpus of text that existed before the model was deployed. Whatever brands had accumulated presence in that corpus before the training cutoff are the brands embedded in the model’s probability distribution. The model didn’t make them prominent. Their prior prominence in the written record made them statistically probable outputs. The brand was there first. Cause precedes effect cleanly.
City of Hope appeared in 69 of 71 ChatGPT responses about West Coast cancer care hospitals. That’s not AI search optimization. That’s decades of accumulated presence in every credible source that ever wrote about cancer care, including medical journals, news coverage, patient forums, industry associations, and health publications. AI didn’t make City of Hope prominent. Prominence made City of Hope probable.
Which means “most probable” and “most authoritative” are closer to the same thing than you might think. Not because AI is making quality judgments, but because the corpus it learned from was shaped by humans who were. The probability distribution inherited the authority judgments of everyone who ever wrote about that category. The brands most embedded in AI’s probability distribution probably got there by being genuinely authoritative in the first place.
Where it gets both circular and concerning is in the retrieval layer. Google AI Overviews, Perplexity, ChatGPT with web search enabled, and Claude in search mode all combine trained priors with live web retrieval. Web search capability doesn’t change a model’s underlying probability distribution because trained weights remain a fixed snapshot until the next training run. What changes is what the model can draw on in the moment of generating a response.
In retrieval mode, these systems pull live indexed content to inform their answers. Today’s AI-surfaced brand mention becomes tomorrow’s indexed content, which the retrieval layer can pull into the next response. The AI feedback loop doesn’t run through the model weights. It runs through the retrievable web, and it runs faster than retraining cycles.
The mechanism compounds when you factor in that the humans doing the downstream writing are increasingly using AI to do it. A brand gets surfaced by AI. Someone uses AI to write about it. That content gets indexed. The retrieval layer pulls it into future responses. The signal and the noise become indistinguishable—laundered through human publishing decisions, but originating from the same probability engine that started the loop. AI may not just be reading the record of brand authority. In retrieval-based systems, it may be participating in writing it, leading to cannibalistic optimization, AI slop, and a retrievable web that increasingly reflects AI’s own conclusions back at itself rather than independent human judgment.
Why citations are not a reliable AI brand tracking strategy
This brings me to a mechanism Fishkin described in his webinar, which others have observed in these systems more broadly: the post-hoc architecture of AI responses.
In retrieval-augmented systems like Google AI Overviews, the model doesn’t search the web for the truth and then report it. It generates a probable answer based on its trained statistical patterns first. Then the retrieval system goes looking for documents that support or elaborate on what the model has already decided to say, and attaches them as citations.
The conclusion precedes the sourcing. The sourcing is selected to fit the conclusion.
This is why optimizing for citations as a primary AI brand tracking strategy misunderstands the architecture. A brand can appear in an AI response as a cited source even when the cited document isn’t what drove the inclusion. The trained prior drove the inclusion, and the citation was likely the closest available match in the retrievable web. Citations are outputs of the process, not inputs to it—like a confident person stating their position and then Googling for evidence afterward.
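To make the ordering concrete, here is a minimal sketch of that generate-then-cite flow. Both functions are stand-in stubs invented for illustration, not any vendor's real API, and the hard-coded answer and URL are placeholders.

```python
def generate_answer(prompt: str) -> str:
    """Stand-in for the generation step: the trained prior drafts the answer."""
    return "Frequently mentioned West Coast cancer centers include City of Hope."


def search_index(draft_answer: str) -> list[str]:
    """Stand-in for the retrieval layer: find documents that support the draft."""
    return ["https://example.com/article-mentioning-city-of-hope"]


def answer_with_citations(prompt: str) -> dict:
    draft = generate_answer(prompt)    # 1. the conclusion comes first
    citations = search_index(draft)    # 2. sources are fetched to fit that conclusion
    return {"answer": draft, "citations": citations}


print(answer_with_citations("best cancer care hospitals on the West Coast"))
```

The only thing the sketch is meant to show is the ordering: the draft exists before any document is fetched, so the cited page can't be what caused the brand to appear.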
As with so much in life, this is both good and bad. Good because gaming citations is, as things stand right now, very difficult to do. Bad because it means the distribution was set (for the time being) before you started paying attention to AI brand visibility at all.
AI brand visibility works the same way for every business. Here's the mechanism.
Whether you sell to businesses or consumers, the mechanism governing brand visibility in AI search is the same. Every credible mention of your brand name in proximity to the language your buyers use—a trade publication feature or a consumer magazine review, an industry association reference or a Reddit thread, an architect’s blog or a food blogger’s post, a Wikipedia citation or a trusted retail platform—can become a data point in the training corpus for the next model version.
Diverse, credible co-occurrence of brand name and relevant language, across the sources your audience actually trusts, is what apparently moves the needle on trained priors. The retrieval layer then potentially amplifies that if the content is indexable and well-structured.
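If you want a rough way to audit where that co-occurrence already exists for your brand, a simple text check over the sources you care about is enough to start. Everything below (the brand name, the buyer phrases, the document snippets) is made up for illustration.

```python
# Hypothetical audit: does the brand co-occur with buyer language in each source?
documents = {
    "trade publication feature": "Acme Robotics was named a leader in warehouse automation.",
    "Reddit thread": "Comparing warehouse automation vendors, Acme Robotics came up a lot.",
    "industry blog": "Our warehouse automation roundup did not cover every vendor.",
}

brand = "Acme Robotics"
buyer_phrases = ["warehouse automation", "fulfillment robotics"]


def co_occurs(text: str, brand: str, phrase: str) -> bool:
    """True if the brand name and a buyer phrase both appear in the same document."""
    t = text.lower()
    return brand.lower() in t and phrase.lower() in t


for source, text in documents.items():
    hits = [p for p in buyer_phrases if co_occurs(text, brand, p)]
    status = "co-occurs with " + ", ".join(hits) if hits else "no co-occurrence"
    print(f"{source}: {status}")
```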
3 AI Search Optimization Mistakes Marketers Make + 3 Strategies to Build Brand Visibility
Don't track (or pay a third party to track) ranking position in AI responses.
Don't optimize for citations as a primary strategy.
Don't mistake a favorable screenshot for a trend.
Do pay attention to visibility percentage...
Do invest in activities that build the kind of presence that moves a probability distribution.
Examples: credible, resilient brand presence built through mentions in diverse sources, editorial recognition in your industry, and consistent association of your brand with the language your audience uses when they’re looking for what you do. This is PR, SEO content, and brand-building, described in terms of what it actually accomplishes in an AI-mediated world.
Do recognize that this work is slow, cumulative, and impossible to shortcut.
TL;DR
Rand Fishkin and the teams at SparkToro and Gumshoe published landmark research confirming that AI brand recommendations are unreliable as ranked outputs: the odds of getting the same list twice are less than 1 in 100, and of getting the same list in the same order less than 1 in 1,000. But in proving that AI search rankings are noise, the research opens a harder question: if AI recommendations are random draws from a probability distribution, what determined that distribution in the first place?
The answer is prior brand authority. The brands with the highest AI brand visibility didn’t earn it through AI search optimization. They earned it through decades of accumulated presence in credible sources—the kind that shaped the training corpus before the model was ever deployed. In that sense, “most probable” and “most authoritative” are closer to the same thing than most AI visibility tool dashboards suggest.
The one AI visibility metric with statistical integrity is visibility percentage: how often a brand appears across a large, diverse set of prompts, normalized for sample size. Everything else, including ranking position and citation count, is either noise or a post-hoc artifact of a conclusion the model had already reached.
For marketers, the strategic implication is clear: stop chasing AI search rankings, stop treating citations as a primary lever, and stop building AI marketing strategy around favorable screenshots. Start investing in the credible brand mentions, editorial recognition, and consistent language association that move a probability distribution over time. That work is slow and impossible to shortcut, which is exactly why it creates the kind of brand visibility that AI search rewards and competitors can’t easily replicate.
Still trying to figure out what your AI brand visibility strategy should actually look like? We can help you sort the signal from the noise—and build the kind of credible, lasting brand presence that moves a probability distribution. Contact us.