Why AI Brand Recommendations Change With Every Query: Research Analysis and Strategic Implications

February 2, 2026

Groundbreaking research reveals that AI tools like ChatGPT, Claude, and Google AI produce different brand recommendation lists more than 99% of the time. Understanding what this means for AI visibility strategy separates informed investment from wasted budget.

Organizations are investing substantial resources into tracking brand visibility across AI search platforms. Conservative estimates suggest over $100 million annually flows into AI tracking tools and services, with the broader AI SEO software market projected to reach nearly $5 billion by 2033. Yet until recently, no public research addressed a fundamental question: are AI recommendation lists consistent enough to make tracking meaningful?

New research from SparkToro, combined with supporting data from SE Ranking and Ahrefs, reveals that AI tools produce dramatically inconsistent results. These findings have profound implications for how organizations should approach AI visibility measurement, budget allocation, and strategic planning.

This analysis synthesizes findings from multiple research sources to provide a comprehensive view of AI recommendation behavior and its strategic implications for marketing leaders.

What Does the Research Reveal About AI Recommendation Consistency?

The SparkToro Methodology

SparkToro co-founder Rand Fishkin partnered with Patrick O'Donnell from Gumshoe.ai to conduct the most comprehensive public study of AI recommendation consistency. The research involved 600 volunteers running 12 different prompts through ChatGPT, Claude, and Google AI (AI Overviews and AI Mode) a combined 2,961 times over November and December 2025.

The prompts covered diverse categories including chef's knives, headphones, digital marketing consultants, cancer care hospitals, science fiction novels, and cloud computing providers. Each prompt ran 60-100 times per platform to generate statistically meaningful sample sizes.

The Core Finding: Near-Total Variability

The research produced a striking conclusion: AI tools almost never produce the same recommendation list twice. When asking ChatGPT or Google AI for brand recommendations 100 times, there is less than a 1-in-100 chance of receiving the same list in any two responses.

The variability extends beyond list composition to ordering. The probability of receiving the same list in the same order drops below 1-in-1,000 across all tested platforms. Claude showed marginally higher consistency in producing identical lists but was actually less likely to produce identical ordering.

Every response varied in three dimensions simultaneously: the brands included, the order of recommendations, and the number of items returned. Some responses contained as few as two or three recommendations while others listed ten or more.
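
These rates are simple to reproduce once you have repeated responses in hand. Below is a minimal sketch, using toy data rather than the study's actual responses, of how a pairwise identical-list rate could be computed:

```python
from itertools import combinations

def identical_list_rate(responses):
    """Fraction of response pairs with matching brand lists.

    `responses` is a list of brand-name lists, one per prompt run.
    Returns (same brands in any order, same brands in same order).
    """
    pairs = list(combinations(responses, 2))
    same_set = sum(1 for a, b in pairs if set(a) == set(b))
    same_order = sum(1 for a, b in pairs if a == b)
    return same_set / len(pairs), same_order / len(pairs)

# Toy data standing in for four runs of the same prompt:
runs = [
    ["Bose", "Sony", "Apple"],
    ["Sony", "Bose", "Apple"],           # same brands, different order
    ["Sennheiser", "Sony"],              # different brands and length
    ["Bose", "Sony", "Apple", "JBL"],
]
print(identical_list_rate(runs))  # -> (0.1666..., 0.0)
```

In the SparkToro data, both rates sit near zero for most platform and prompt combinations.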

Supporting Research: URL-Level Inconsistency

SE Ranking's research analyzing 10,000 keywords through Google AI Mode produced aligned findings at the URL level. Only 9.2% of cited URLs remained consistent when running the same query three times on the same day. In 21.2% of cases, there was zero URL overlap between response sets.

Domain-level consistency performed slightly better at 14.7%, indicating that while AI Mode frequently cites different specific pages, it may return to the same domains. However, in 29.4% of queries, not even a single domain repeated across all three response versions.

Ahrefs' analysis of 730,000 query pairs revealed that Google's AI Mode and AI Overviews cite the same URLs only 13.7% of the time, despite reaching semantically similar conclusions 86% of the time. The systems agree on what to say but disagree on which sources to cite.
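
To make those metrics concrete, here is a minimal sketch of one plausible way to compute URL- and domain-level overlap between two runs of the same query. SE Ranking's exact formula isn't published, so the Jaccard-style definition here is an assumption:

```python
from urllib.parse import urlparse

def overlap_stats(run_a, run_b):
    """Jaccard overlap between two runs' cited URLs,
    at both the exact-URL and the domain level."""
    urls_a, urls_b = set(run_a), set(run_b)
    domains_a = {urlparse(u).netloc for u in urls_a}
    domains_b = {urlparse(u).netloc for u in urls_b}
    url_overlap = len(urls_a & urls_b) / len(urls_a | urls_b)
    domain_overlap = len(domains_a & domains_b) / len(domains_a | domains_b)
    return url_overlap, domain_overlap

# Hypothetical citations from two runs of the same query:
a = ["https://example.com/guide", "https://review-site.com/best-x"]
b = ["https://example.com/pricing", "https://review-site.com/best-x"]
print(overlap_stats(a, b))  # -> (0.333..., 1.0): different pages, same domain
```

The gap between the two numbers in the toy example mirrors the research finding: domains repeat more readily than specific pages.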

Why Does This Variability Occur?

The Probabilistic Nature of Large Language Models

AI systems are fundamentally probability engines designed to generate unique responses. Each output represents a probabilistic selection from potential answers weighted by training data, context, and model architecture. Expecting consistent, deterministic outputs misunderstands how these systems operate.

The research confirms this architectural reality. As Fishkin noted, thinking of AI tools as sources of truth or consistency is "provably nonsensical." They are designed to produce varied responses, making the observed inconsistency an expected feature rather than a bug.

Category Breadth Affects Consistency Patterns

The research revealed that response variability correlates strongly with category breadth. In narrow categories with limited options, like Los Angeles Volvo dealerships or SaaS cloud computing providers, top brands appeared in most responses with relatively high pairwise correlation.

In broader categories like science fiction novels or brand design agencies, the results scattered dramatically. The AI systems simply have more options to choose from, producing wider distribution across potential recommendations.

This pattern explains why some organizations report more consistent AI visibility than others. Businesses operating in concentrated markets with few competitors may see more stable mention patterns than those in fragmented industries with many alternatives.

Prompt Diversity Compounds Variability

Beyond model-level inconsistency, the research examined how real humans craft prompts for identical intents. Across 142 responses to the same underlying question about headphone recommendations, hardly any two prompts resembled each other. The semantic similarity score averaged just 0.081, meaning the prompts were, in the researchers' analogy, as different from one another as "Kung Pao Chicken and Peanut Butter."

This prompt diversity creates additional measurement challenges. Even if AI responses were perfectly consistent for identical prompts, the infinite variation in how humans phrase questions introduces another layer of unpredictability in real-world visibility scenarios.
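
For readers who want to reproduce this kind of analysis, here is a minimal sketch using the open-source sentence-transformers library. The study's actual embedding model and similarity definition aren't published, so both choices below are assumptions:

```python
# pip install sentence-transformers
from itertools import combinations
from sentence_transformers import SentenceTransformer, util

# Three hypothetical phrasings of the same headphone-buying intent:
prompts = [
    "best wireless headphones under $300?",
    "what over-ear cans should I buy for commuting",
    "recommend noise cancelling headphones for long flights",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(prompts)

# Mean pairwise cosine similarity across all prompt pairs
scores = [float(util.cos_sim(embeddings[i], embeddings[j]))
          for i, j in combinations(range(len(prompts)), 2)]
print(sum(scores) / len(scores))
```

Scores near zero indicate prompts with almost no semantic overlap at the surface level, which is what the researchers observed at scale.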

What Does This Mean for AI Visibility Tracking?

Ranking Position Metrics Are Meaningless

The research delivers a definitive verdict on one common AI tracking metric: ranking position in AI responses lacks validity. Any tool claiming to track where your brand ranks in AI recommendation lists is providing essentially random data points that shift with every query.

Fishkin stated this conclusion directly: "any tool that gives a 'ranking position in AI' is full of baloney." The underlying variability makes position tracking statistically meaningless regardless of how sophisticated the tracking methodology appears.

Organizations currently paying for rank-position metrics in AI responses should reconsider that investment. The data cannot support strategic decision-making because it reflects momentary snapshots of probabilistic outputs rather than stable competitive positioning.

Visibility Percentage Emerges as a Defensible Metric

Despite the discouraging findings on consistency, the research identified one metric that survives statistical scrutiny: visibility percentage across many runs of similar prompts.

When tracking how often a brand appears across dozens or hundreds of prompt runs, patterns emerge. In the headphone category, brands like Bose, Sony, Sennheiser, and Apple appeared in 55-77% of responses regardless of dramatic prompt variation. The AI systems captured underlying intent and returned answers from a relatively consistent consideration set.

This finding validates visibility tracking approaches that aggregate results across large sample sizes. If a brand appears in 85% of relevant AI responses versus a competitor appearing in 40%, that difference reflects meaningful positioning within the AI's consideration set, even though individual responses vary unpredictably.
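
Computing visibility percentage is mechanically simple once the response lists are collected. A minimal sketch with toy data:

```python
from collections import Counter

def visibility_percentage(responses):
    """Share of responses in which each brand appears at least once.

    `responses`: list of brand lists, one per prompt run.
    """
    counts = Counter(brand for run in responses for brand in set(run))
    n = len(responses)
    return {brand: count / n for brand, count in counts.most_common()}

runs = [
    ["Bose", "Sony", "Apple"],
    ["Sony", "Sennheiser"],
    ["Bose", "Sony", "JBL"],
    ["Sony", "Apple", "Bose"],
]
print(visibility_percentage(runs))
# {'Sony': 1.0, 'Bose': 0.75, 'Apple': 0.5, 'Sennheiser': 0.25, 'JBL': 0.25}
```

The point of the aggregation is that noisy per-response lists wash out, leaving a stable estimate of how often each brand enters the consideration set.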

Understanding what AI search is and how it's reshaping SEO provides essential context for interpreting these findings. The shift from deterministic ranking to probabilistic visibility fundamentally changes how organizations should measure and optimize for AI-driven discovery.

Statistical Significance Requires Scale

The research implies minimum thresholds for meaningful AI visibility measurement. Running a prompt once or twice produces random noise rather than actionable data. Meaningful visibility tracking requires running prompts 60-100 times to achieve statistical stability.

This requirement has direct implications for tracking tool evaluation. Solutions that sample infrequently or report results from limited query runs cannot provide reliable visibility metrics. Organizations should verify that their tracking providers run sufficient query volume to produce statistically significant results.

Some leading tools now run each prompt five times minimum before reporting, with more sophisticated platforms sampling hundreds of prompt variations to capture the full distribution of potential responses.
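
A quick confidence-interval calculation shows why this scale is necessary. The sketch below uses the standard Wilson score interval; the study doesn't prescribe a particular interval, so that choice is an assumption:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a visibility proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - margin, center + margin

# A brand seen in 60% of runs: the interval tightens with sample size.
print(wilson_interval(3, 5))     # ~ (0.23, 0.88): 5 runs say almost nothing
print(wilson_interval(60, 100))  # ~ (0.50, 0.69): 100 runs give a usable range
```

At five runs, a measured 60% visibility is statistically indistinguishable from anything between roughly 25% and 90%; at one hundred runs, the range narrows enough to support comparisons between brands.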

How Should Organizations Approach AI Visibility Investment?

Evaluate Tracking Providers Critically

The research exposes a significant buyer-beware situation in the AI tracking market. Organizations spending thousands or millions on AI visibility tools should demand transparent methodology documentation before committing budget.

Key questions to ask tracking providers:

- How many times do they run each prompt before reporting results?
- Do they publish research validating their methodology?
- How do they handle the documented variability in AI responses?
- Do they report ranking positions, which the research shows are meaningless?

Providers unable or unwilling to answer these questions transparently may be selling metrics that lack statistical validity. The research suggests many current AI tracking products provide data that cannot support strategic decision-making.

Focus on Visibility Trends Rather Than Snapshots

Given documented variability, single-point-in-time visibility measurements have limited value. Strategic insight comes from tracking visibility trends over time across large sample sizes.

If your brand's visibility percentage increases from 35% to 55% across relevant prompts over several months, that trend reflects genuine improvement in AI consideration set positioning. The individual data points remain noisy, but the directional signal emerges from aggregated measurement.

This approach mirrors how the research validated visibility percentage as a meaningful metric. Individual responses varied dramatically, but aggregate patterns revealed which brands the AI systems consistently associated with specific topics and intents.

Connect AI Visibility to Business Outcomes

Visibility percentage only matters if it connects to business results. Organizations should establish measurement frameworks linking AI visibility to downstream metrics like branded search volume, direct traffic, and conversion events.

Recent research from Ahrefs found that AI search visitors convert 23 times better than traditional search visitors, despite representing less than 1% of total traffic. Understanding how AI search traffic compares to traditional clicks helps contextualize the business value of AI visibility improvements.

This conversion advantage suggests AI visibility may warrant investment despite the measurement challenges. Users who click through from AI recommendations arrive further along the decision journey and demonstrate higher purchase intent.
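
A back-of-envelope calculation shows how much that advantage could matter in aggregate. The baseline conversion rate below is a hypothetical placeholder, and it cancels out of the final ratio:

```python
# If AI search sends 1% of visits but those visits convert 23x better,
# what share of total conversions does it drive? (Illustrative numbers.)
ai_share, base_cr, uplift = 0.01, 0.02, 23   # base_cr is an assumed placeholder

ai_conv = ai_share * base_cr * uplift        # conversions per total visit from AI
other_conv = (1 - ai_share) * base_cr        # conversions from all other traffic
print(ai_conv / (ai_conv + other_conv))      # ~0.188
```

Under those figures, 1% of traffic would account for nearly 19% of conversions, which is why the channel can justify measurement investment well before it dominates traffic reports.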

Maintain Traditional SEO Investment

The research confirms that AI systems rely heavily on traditional search rankings for source selection. SE Ranking found only 14% URL overlap between Google AI Mode citations and top 10 organic results, but the majority of cited sources still come from well-ranked pages.

Ahrefs research shows 76% of AI Overview citations pull from pages ranking in the top 10 organic positions. This means traditional SEO remains the foundation for AI visibility, not a parallel concern that can be deprioritized.

Understanding the four pillars of SEO ensures your AI visibility strategy builds on proven fundamentals rather than chasing emerging channels while neglecting established foundations.

What Are the Implications for Content Strategy?

Comprehensive Content Supports AI Visibility

While the research focused on measurement validity rather than optimization tactics, findings align with broader AI search research about content characteristics that support visibility.

Google's SAGE research on training AI agents for deep search tasks found that comprehensive content allowing agents to find complete answers without additional searching outperforms fragmented content. This suggests that content consolidation strategies supporting traditional SEO may also benefit AI visibility.

Reviewing generative engine optimization strategies provides tactical guidance for structuring content to support AI citation and recommendation patterns.

Brand Authority Influences Consideration Set Inclusion

The research revealed that top brands in each category appeared consistently despite response variability. Bose, Sony, and Apple showed up in 55-77% of headphone-related responses regardless of prompt phrasing. This pattern suggests that established brand authority influences AI consideration set inclusion.

Building brand authority through the signals that influence AI systems, including media coverage, authoritative citations, and consistent brand information across digital touchpoints, may improve visibility percentage over time. Understanding what E-E-A-T means in SEO provides context for the expertise, experience, authoritativeness, and trustworthiness signals that likely influence AI recommendation patterns.

Topic Association Matters More Than Query Matching

The research demonstrated that highly varied prompts with low semantic similarity still produced relatively consistent brand consideration sets. AI systems captured underlying intent and returned brands associated with relevant topics rather than matching specific query language.

This finding suggests optimizing for topic authority rather than specific query variations. Becoming the definitive resource on a topic may improve visibility across the infinite prompt variations users might employ to explore that topic.

What Questions Remain Unanswered?

Sample Size Requirements Need Further Research

The SparkToro research suggests 60-100 prompt runs produce meaningful visibility data, but optimal sample sizes for different category types and business contexts remain undefined. Narrow categories may require fewer runs while broad categories might need hundreds to achieve stability.

API Versus Interface Behavior Differences

Most AI tracking tools rely on API calls rather than mimicking human interface usage. Early research suggests API results may differ from interface results, but the magnitude and implications of these differences require further investigation.

Longitudinal Stability of Visibility Patterns

The research captured visibility patterns over a two-month window. Whether visibility percentages remain stable over longer periods, and how quickly optimization efforts translate to visibility improvements, remains unclear.

Cross-Platform Visibility Correlation

The research showed significant differences between platforms, but systematic analysis of whether brands visible in ChatGPT also appear in Google AI or Perplexity could inform platform prioritization decisions.

Frequently Asked Questions

How consistent are AI brand recommendations?

AI tools like ChatGPT, Claude, and Google AI produce different brand recommendation lists more than 99% of the time when asked the same question repeatedly. The probability of receiving identical lists in the same order drops below 0.1%. This variability is a fundamental characteristic of how large language models operate.

Is AI visibility tracking worthwhile given this inconsistency?

Visibility tracking can be meaningful when measuring visibility percentage across large sample sizes (60-100+ prompt runs) rather than tracking ranking positions. Aggregate patterns reveal which brands AI systems consistently associate with specific topics, even though individual responses vary unpredictably.

What AI visibility metrics should organizations track?

Focus on visibility percentage (how often your brand appears across many prompts) rather than ranking position (where your brand appears in individual responses). Ranking position lacks statistical validity given documented variability. Connect visibility metrics to business outcomes like branded search volume and conversion events.

How should organizations evaluate AI tracking tools?

Ask providers how many times they run each prompt before reporting, whether they publish methodology research, and whether they report ranking positions (which research shows are meaningless). Providers unable to answer these questions may be selling metrics that lack validity.

Does traditional SEO still matter for AI visibility?

Yes. Research shows 76% of AI Overview citations come from pages ranking in the top 10 organic positions. AI systems rely heavily on traditional search rankings for source selection, making SEO the foundation for AI visibility rather than a separate concern.

Why do AI systems produce such varied recommendations?

Large language models are probability engines designed to generate unique responses. Each output represents probabilistic selection weighted by training data and context. Expecting consistent, deterministic outputs misunderstands how these systems fundamentally operate.

Strategic Conclusions

The research on AI recommendation consistency delivers several clear strategic implications for organizations investing in AI visibility.

First, abandon ranking-position tracking. Any metric claiming to show where your brand ranks in AI responses lacks statistical validity. The documented variability makes position tracking meaningless regardless of methodology sophistication.

Second, embrace visibility percentage measured at scale. Tracking how often your brand appears across many prompt runs reveals genuine consideration set positioning even though individual responses vary unpredictably. Ensure tracking providers run sufficient query volume for statistical significance.

Third, maintain traditional SEO investment. AI systems rely on traditional search rankings for source selection. Organizations that deprioritize SEO while chasing AI visibility risk losing visibility in both channels.

Fourth, connect AI visibility to business outcomes. Visibility percentage only matters if it correlates with meaningful metrics. Establish measurement frameworks linking AI visibility trends to branded search, traffic, and conversion events.

Fifth, approach the AI tracking market skeptically. Demand methodology transparency before committing budget. Many providers may be selling metrics that cannot support strategic decision-making given documented variability levels.

Understanding how important SEO is for businesses in the current landscape provides context for balancing traditional and emerging visibility investments. The research suggests traditional SEO remains foundational while AI visibility represents an emerging layer requiring careful measurement and strategic patience.

Ready to develop an AI visibility strategy grounded in research rather than vendor hype? Get started with a consultation to learn how revenue-focused SEO builds durable visibility across traditional and AI search channels.


Dewang Mishra

Content Writer

Senior Content Writer & Growth at Passionfruit, with a decade of experience in blogging and YouTube SEO. I build narratives that behave like funnels. I’ve helped drive over 300 million impressions and 300,000+ clicks for my clients across the board. Between deadlines, I collect miles, books, and poems (sequence: unpredictable). My newest obsession: prompting tiny spells for big outcomes.
