How to Audit Brand Visibility in ChatGPT, Perplexity, and Gemini

The complete methodology. Build a balanced 30-prompt test set, run it across four AI platforms, score responses against four metrics, and turn the output into a prioritized 30-60-90 day fix plan. Research-backed, tool-agnostic, built for in-house teams and agencies.

Most marketing leaders already suspect their brand is invisible in AI search. The problem is proving it. Running ten queries in ChatGPT and eyeballing the results is not an audit; it is a demo. Real AEO decisions require measurement, and measurement requires a methodology strict enough that the score you produce this month is comparable to the score you produce next month. Otherwise lift is unprovable and budget is indefensible.

The stakes are already material. According to joint Forrester and 6sense 2025 research, 94% of B2B buyers now use large language models during their purchase journey. Gartner projects 90% of B2B buying will be AI-agent-intermediated by 2028, moving more than $15 trillion in spend through agent exchanges. On the consumer side, Adobe Analytics found AI-referred retail traffic grew 693.4% year over year during the 2025 holiday season, with AI referrals converting 31% higher than non-AI traffic and driving 254% more revenue per visit. And Pew Research’s 2025 study of 900 US users across 68,879 actual Google searches found that click-through rates drop from 15% to 8% when an AI summary appears. The distribution is shifting to AI platforms faster than most marketing plans are measuring. (Our primer on SEO vs GEO vs AEO covers where each discipline starts and ends, and whether AI search referrals are the new clicks breaks down the channel economics.)

The guide below is the methodology we run at Passionfruit across every client audit. You can execute it manually, with any toolset, or through Passionfruit Labs. The framework is the same. Work through it once and you have a baseline. Work through it monthly and you have a trend line.

Why a Single-Platform Audit Is Not an Audit

The first thing most teams get wrong is treating “run some ChatGPT queries” as an audit. ChatGPT, Perplexity, Gemini, and Google AI Overviews retrieve content differently, cite different source types, and reward different content structures. A content strategy that wins in Perplexity will often lose in ChatGPT, and vice versa. Auditing one platform leaves 75% of your exposure uncovered.

The academic research confirms the fragmentation. A 2024 paper from researchers at Princeton University, Georgia Tech, the Allen Institute for AI, and IIT Delhi (Aggarwal et al., published at ACM SIGKDD 2024) introduced the GEO framework and the GEO-bench benchmark of 10,000 queries. Rigorous evaluation showed that visibility-improvement strategies produced up to 40% lift in generative engine responses, but the effectiveness of each strategy varied significantly across domains. A follow-up arXiv paper from July 2025 analyzed 366,000 citations across 65,000 AI responses and found dramatic differences in which sources each platform cites. Research on citation consistency, including the SourceCheckup framework published in Nature Communications in April 2025 (Wu et al.), found 88.7% agreement between automated LLM citation evaluation and medical expert consensus across 7 different LLM models, but substantial variance in how different LLMs handled the same claims.

The takeaway for audit design: your methodology must cover ChatGPT, Perplexity, Gemini, and Google AI Overviews as separate surfaces. Each needs its own score. The aggregate is only meaningful if the components are tracked. For the broader optimization playbook across these platforms, our GEO guide for ChatGPT, Perplexity, Gemini, Claude, and Copilot covers the platform-specific content and schema patterns.

The 4-Metric Scoring Framework

Every response in your audit set gets scored on four metrics. Keep the rubric tight and consistent. The goal is measurable lift month over month, not a one-time report.


Citation presence
Definition: Does the AI platform mention or cite your brand in the response?
Why it matters: The most direct visibility measure. Zero presence = zero influence.
Scoring: Binary, Yes (1) / No (0)

Position
Definition: Where in the response is your brand? Primary answer, secondary mention, or footnote-level citation?
Why it matters: Position drives attention. Primary mentions are far more influential than footnoted ones.
Scoring: Primary (3) / Secondary (2) / Mentioned (1) / Absent (0)

Sentiment
Definition: Is your brand described positively, neutrally, negatively, or factually?
Why it matters: A mention without positive framing can actually hurt. A “not recommended” is worse than absence.
Scoring: Positive (2) / Neutral (1) / Negative (-1) / Factual (1)

Share of voice
Definition: Of the brands cited in the response, what percentage are yours versus competitors?
Why it matters: Tracks relative dominance. A lone mention in a response citing 8 competitors is weak.
Scoring: Percentage of total brand mentions

Four metrics, four columns in your scoring sheet. That is the audit. Do not add more. Teams that try to track 15 metrics collapse into inconsistency by prompt 30.
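For teams that score in a spreadsheet or a small script, here is a minimal sketch of the rubric as a data structure. The class and field names are illustrative assumptions, not part of the methodology; the point values mirror the scoring column above.

```python
from dataclasses import dataclass

# Point values taken from the scoring rubric above.
POSITION_SCORE = {"primary": 3, "secondary": 2, "mentioned": 1, "absent": 0}
SENTIMENT_SCORE = {"positive": 2, "neutral": 1, "factual": 1, "negative": -1}

@dataclass
class ScoredResponse:
    prompt: str
    platform: str            # "chatgpt" | "perplexity" | "gemini" | "ai_overviews"
    brand_cited: bool        # citation presence
    position: str            # "primary" | "secondary" | "mentioned" | "absent"
    sentiment: str           # "positive" | "neutral" | "factual" | "negative"
    own_mentions: int        # mentions of your brand in this response
    all_brand_mentions: int  # mentions of any brand, yours plus competitors

    def share_of_voice(self) -> float:
        # Share of voice: your mentions as a percentage of all brand mentions.
        if self.all_brand_mentions == 0:
            return 0.0
        return 100.0 * self.own_mentions / self.all_brand_mentions

    def scoring_row(self) -> tuple:
        # The four columns of the scoring sheet, in rubric order.
        return (
            int(self.brand_cited),
            POSITION_SCORE[self.position],
            SENTIMENT_SCORE[self.sentiment],
            round(self.share_of_voice(), 1),
        )
```

One ScoredResponse per prompt-platform pair keeps the four columns identical from month to month, which is what makes the trend line comparable.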

Step 1: Build Your 30-Prompt Balanced Test Set

The 30-prompt test set is the foundation of every rigorous audit. Structure it in three categories of 10 prompts each.

10 brand-direct prompts

Queries that explicitly name your brand, measuring how the AI platform characterizes you when asked. Examples:

  • “What does [Brand] do?”

  • “Is [Brand] good for [ICP]?”

  • “What are [Brand]'s pricing tiers?”

  • “What do people say about [Brand]?”

  • “[Brand] alternatives”

  • “Is [Brand] better than [top competitor]?”

  • “Reviews of [Brand]”

  • “[Brand] integrations / features / use cases”

  • “Who should use [Brand]?”

  • “[Brand] for [specific vertical]”

10 category-level prompts

Queries where you would want to appear but your brand is not named, measuring organic citation frequency. Examples:

  • “Best [category] tools for [ICP] in 2026”

  • “Top [category] vendors”

  • “[Category] software comparison”

  • “Enterprise [category] platforms”

  • “Open-source alternatives to [dominant vendor]”

  • “What is the best [category] for a small team?”

  • “Most affordable [category] solution”

  • “[Category] tools with [key feature]”

  • “Which [category] integrates with [adjacent tool]?”

  • “[Category] software for [use case]”

10 scenario-based prompts

Queries that describe a buyer situation and ask for a recommendation, measuring how AI systems translate outcomes into vendor suggestions. Examples:

  • “I need to [goal] at [company size], which vendor should I use?”

  • “Our [existing tool] is too expensive, what should we switch to?”

  • “I am evaluating [top 3 competitors], what are the pros and cons?”

  • “We need [specific use case], what is the best tool?”

  • “Our team is [size] and we want [outcome], which [category] fits?”

  • “What is the easiest [category] to implement?”

  • “Best [category] for a technical team vs a non-technical team”

  • “Which [category] has the best customer support?”

  • “[Category] with the lowest total cost of ownership”

  • “If I am migrating from [old vendor], which [category] is the closest fit?”

Phrase every prompt in natural buyer language. Do not keyword-stuff. The AI platforms reward natural conversational queries because that is what users actually type.
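If the test set lives in a script rather than a spreadsheet, one way to freeze it for the quarter is to store the 30 prompts as templates and fill the placeholders once per brand. This is a sketch under that assumption, not a required format; the brand, ICP, and competitor values below are hypothetical.

```python
# Illustrative templates; the full set would hold 10 per category.
BRAND_DIRECT = [
    "What does {brand} do?",
    "Is {brand} good for {icp}?",
    "{brand} alternatives",
    # ...seven more brand-direct templates
]
CATEGORY_LEVEL = [
    "Best {category} tools for {icp} in 2026",
    "Top {category} vendors",
    # ...eight more category-level templates
]
SCENARIO_BASED = [
    "Our {existing_tool} is too expensive, what should we switch to?",
    "I am evaluating {competitors}, what are the pros and cons?",
    # ...eight more scenario templates
]

def build_test_set(**fields) -> list[str]:
    # Expand every template with the same brand-specific values,
    # so the wording stays frozen across monthly re-audits.
    templates = BRAND_DIRECT + CATEGORY_LEVEL + SCENARIO_BASED
    return [t.format(**fields) for t in templates]

prompts = build_test_set(
    brand="Acme Analytics",                    # hypothetical brand
    icp="mid-market finance teams",
    category="product analytics",
    existing_tool="LegacyTool",
    competitors="Vendor A, Vendor B, and Vendor C",
)
```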

Step 2: Run the Audit Across 4 Platforms

Run each of the 30 prompts across ChatGPT, Perplexity, Gemini, and Google AI Overviews. For each response, capture:

  • Full response text

  • Whether your brand is mentioned (Yes/No)

  • Position (primary/secondary/mentioned/absent)

  • Sentiment classification

  • All competitor brands cited in the response

  • All external sources/URLs the platform cited

That produces 120 data points (30 prompts × 4 platforms). Two to four hours of focused work if you do it manually. Running this through a tool like Passionfruit Labs collapses it to minutes since the prompts run across every platform automatically and the scoring populates the four metrics directly, but the underlying methodology is the same whether you run it by hand or through a tool.
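A sketch of the capture loop, assuming a fetch_response helper that returns the raw response text for a prompt on a given platform (manual paste, browser capture, or a tool's export all work). The record fields mirror the capture list above; the scoring fields stay empty until a reviewer fills them in against the rubric.

```python
PLATFORMS = ["chatgpt", "perplexity", "gemini", "ai_overviews"]

def run_audit(prompts, fetch_response):
    """Capture one record per prompt per platform.

    fetch_response(prompt, platform) is a stand-in for however you actually
    query each platform; it just needs to return the full response text.
    """
    records = []
    for prompt in prompts:
        for platform in PLATFORMS:
            records.append({
                "prompt": prompt,
                "platform": platform,
                "response_text": fetch_response(prompt, platform),
                # Scored after capture, per the four-metric rubric:
                "brand_mentioned": None,
                "position": None,        # primary | secondary | mentioned | absent
                "sentiment": None,       # positive | neutral | factual | negative
                "competitors_cited": [],
                "source_urls": [],
            })
    return records  # 30 prompts x 4 platforms = 120 records
```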

Platform-specific capture notes

Each AI platform has quirks that affect how you capture responses:

ChatGPT. Use the default model with web browsing enabled for category-level and scenario-based prompts. For brand-direct prompts, also capture the response without browsing to see what the model’s training data alone surfaces about your brand. ChatGPT often retrieves via Bing’s index when browsing is active, so your Bing presence matters.

Perplexity. Capture responses from the default model. Perplexity shows inline citations natively, so record every source URL. According to a July 2025 arXiv paper analyzing 366,000 AI citations, Perplexity tends to cite more sources per response than ChatGPT, so share of voice calculations often look different here.

Gemini. Use Google’s Gemini consumer app. Capture both the primary answer text and any “source” or “related” panel. Gemini inherits heavily from Google’s index and Knowledge Graph, so entity signals matter disproportionately here.

Google AI Overviews. Run each query in a standard Google search (signed out, incognito, or via an AI Overviews testing tool). Capture the AI Overview text, the “show more” expansion, and the cited source panel. According to Pew Research’s 2025 study of 900 users and 68,879 searches, only 1% of users click a link inside the AI Overview itself, so presence in the Overview matters but the click-through economics are different.

Timing and repeatability

AI platforms produce non-deterministic responses. The same prompt can produce different answers minutes apart. Two defenses:

  1. Run each prompt three times per platform and record all three responses. Score the median response (a minimal sketch of the median pick follows this list). Three runs is the practical floor; the Princeton GEO paper went further, using five generations per query to reduce statistical noise.

  2. Standardize the timing. Always run audits on the same day of the week, at roughly the same time, from the same geography. Temporal drift compounds with model-update drift if you do not.
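A minimal sketch of the median pick from item 1, under the assumption that each run has already been scored. Keying the median on the position score is a simplification; it is simply the metric that moves most from run to run.

```python
POSITION_SCORE = {"primary": 3, "secondary": 2, "mentioned": 1, "absent": 0}

def median_run(runs: list[dict]) -> dict:
    """Return the median of three scored runs for one prompt on one platform.

    Sorting the three runs by position score and taking the middle one is
    equivalent to scoring the median response on that metric.
    """
    ordered = sorted(runs, key=lambda r: POSITION_SCORE[r["position"]])
    return ordered[len(ordered) // 2]
```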

Step 3: Classify Your Visibility Pattern

Once the responses are captured and scored, the aggregate patterns usually fall into one of three buckets. Classify each platform separately.

Pattern 1: Invisible

Your brand appears in fewer than 10% of responses across the 30-prompt set. You are not in the consideration set. Most enterprise teams discover they are invisible on at least one of the four platforms when they run their first audit.

Fixes for invisible:

  • Establish citation-worthy content on your top 10 category-level prompts

  • Get mentioned on third-party sources the platform trusts (industry review sites, directories, Wikipedia if applicable)

  • Strengthen entity signals: Organization schema with sameAs links, Wikipedia presence, Crunchbase, LinkedIn, verified accounts (our AI-friendly schema markup guide covers the exact templates)

  • Audit crawler accessibility; if your site is a client-side rendered SPA, AI crawlers may not be able to read it at all (our JavaScript rendering and AI crawlers guide covers the diagnosis)

Pattern 2: Underranked

Your brand appears in 10-40% of responses, but behind competitors, with lower share of voice, or in secondary/footnote positions. You exist in the consideration set but lose the primary-answer slot.

Fixes for underranked:

  • Rewrite the pages AI platforms do cite to lift them into primary-answer position (direct-answer intros in 40-60 words, FAQ schema, citation density)

  • Close the content gaps where competitors get cited and you do not have the equivalent piece

  • Improve the technical foundation (crawler access, structured data, rendering); our AI search readiness audit checklist covers the full technical diagnostic

Pattern 3: Misrepresented

Your brand appears in 40%+ of responses but with wrong positioning, outdated information, negative sentiment, or mischaracterization (e.g., described as a competitor of someone you do not compete with, or missing a core capability you have). Misrepresentation is the most expensive visibility pattern because the buyer is getting bad information about you that they will not bother to verify.

Fixes for misrepresented:

  • Audit the first-party pages AI platforms are likely retrieving from and rewrite them for factual precision

  • Correct third-party sources where possible (reach out to review sites, update listings)

  • Publish a definitive “about / what we do” page that matches the query language buyers actually use

  • Ensure FAQ schema content directly addresses the misrepresented aspects

  • Check that canonical and deduplication signals are not routing AI platforms to an outdated version of the page; our guide on canonical tags and AI search covers the failure modes

Most brands have different patterns on different platforms. Invisible in Perplexity but underranked in ChatGPT, or misrepresented in Google AI Overviews because the AI is pulling from a five-year-old review page. The fix list is platform-specific.
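A sketch of the bucketing logic, using the thresholds above. The misrepresentation test here (negative sentiment, or an inaccuracy flag set by the reviewer during scoring) is a deliberate simplification of what is really a human judgment call, and the field names are assumptions.

```python
def classify_platform(records: list[dict]) -> str:
    """Bucket one platform's scored 30-prompt set into a visibility pattern."""
    if not records:
        return "no data"
    cited = [r for r in records if r["brand_mentioned"]]
    presence = len(cited) / len(records)

    if presence < 0.10:
        return "invisible"
    flagged = any(r["sentiment"] == "negative" or r.get("inaccurate") for r in cited)
    if presence >= 0.40:
        # Above 40% with no problem flags is not one of the three patterns.
        return "misrepresented" if flagged else "visible"
    return "underranked"
```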

Step 4: Build the Prioritized Fix List

The temptation after a baseline audit is to try to fix everything. Do not. Every enterprise team that tries to close all visibility gaps in one quarter ships zero of them well. Prioritize using the 2×2 below.

Impact × Effort prioritization

For every identified fix, score:

Impact: How many prompts in the 30-set would this fix affect? High (6+), Medium (3-5), Low (1-2).

Effort: How much work is this? High (new pillar content, 20+ hours), Medium (rewrite with schema, 5-10 hours), Low (schema addition, metadata fix, 1-3 hours).

Prioritize in this order:

  1. High impact / Low effort: do these first, within 7 days

  2. High impact / Medium effort: month 1

  3. Medium impact / Low effort: month 1 parallel

  4. High impact / High effort: month 2-3, usually new pillar pieces

  5. Medium impact / Medium effort: month 2-3 parallel

  6. Everything else: monitor only, do not invest until baseline is proven

Most baseline audits produce a top-5 list that can be shipped in 30 days. Ship it. Re-audit. Measure lift. Move to the next tier.
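As a sketch, the six-tier order above maps cleanly onto a sort key. The fix dictionary shape is an assumption; anything outside the five scored tiers falls into the monitor-only bucket.

```python
# Lower tier number = ship sooner, matching the numbered order above.
PRIORITY_TIER = {
    ("high", "low"): 1,       # within 7 days
    ("high", "medium"): 2,    # month 1
    ("medium", "low"): 3,     # month 1, in parallel
    ("high", "high"): 4,      # month 2-3
    ("medium", "medium"): 5,  # month 2-3, in parallel
}
MONITOR_ONLY = 6

def prioritize(fixes: list[dict]) -> list[dict]:
    # Each fix looks like {"name": ..., "impact": "high", "effort": "low"}.
    return sorted(
        fixes,
        key=lambda f: PRIORITY_TIER.get((f["impact"], f["effort"]), MONITOR_ONLY),
    )
```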

Fix types that work (what the research shows)

The Aggarwal et al. 2024 GEO paper tested nine content-optimization strategies across the GEO-bench. The methods that produced the largest visibility lift (up to 40% improvement on diverse queries, with Perplexity-specific gains of up to 37%) included:

  • Citation addition: adding citations from credible external sources throughout the content

  • Quotation addition: embedding relevant quotations from authoritative sources

  • Statistics addition: adding specific numerical data with source attribution

  • Fluency optimization: improving readability and clarity

  • Technical term usage: adding domain-appropriate precision

Traditional SEO techniques like keyword stuffing performed poorly. The academic finding matches what we see in field audits: AI platforms reward content that reads like expert reference material, with embedded evidence. The formula is not radical. The discipline of actually doing it at scale is where most teams fail.

For the full content framework, our SEO principles guide covers the foundation, and our guide to writing SEO-optimized articles walks through the AI-optimized version.

Step 5: Set the Monthly Re-Audit Cadence

A baseline audit without a re-audit plan is a report, not a system. Commit to monthly re-audits from day one.

What to re-measure every month

  • The same 30-prompt set (do not rotate prompts mid-quarter; rotate only after the full quarter baseline is established)

  • The same four metrics

  • Month-over-month delta on each metric, per platform (a minimal delta sketch follows this list)

  • New competitors appearing in the responses (someone who was not cited last month but is cited this month)

  • New source types being cited (if Perplexity suddenly starts citing a Reddit thread, that is a signal)
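For teams keeping the monthly aggregates in a script, a minimal delta sketch follows. The input shape (platform mapped to metric mapped to an aggregate score for the 30-prompt set) is an assumption; the only hard rule is that the aggregation you chose at baseline stays the same every month.

```python
METRICS = ["citation_presence", "position", "sentiment", "share_of_voice"]

def month_over_month(current: dict, previous: dict) -> dict:
    """Delta per metric, per platform, between two monthly audits."""
    return {
        platform: {
            metric: round(current[platform][metric] - previous[platform][metric], 2)
            for metric in METRICS
        }
        for platform in current
    }
```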

What to look for in the trend line

  • Upward trajectory on citation presence and position: your fixes are working

  • Flat trajectory after 2-3 months of fixes: your fixes are not the right fixes, rethink before investing more

  • Downward trajectory: either a competitor made a move, a platform changed its retrieval model, or your existing visibility content decayed

Cadence for smaller teams

Monthly is the gold standard. Quarterly is the floor. Below quarterly and you cannot separate platform drift from your own changes. AI platforms update their retrieval models and training data frequently enough that quarterly is the longest interval where cause-and-effect remains legible.

For the analytics side (tracking which AI platforms are driving referral traffic to your site), our guide on tracking AI chatbot traffic in GA4 walks through the regex and the UTM setup.
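The full GA4 walkthrough lives in that guide, but the core of the segmentation is usually a referrer regex. As a hedged illustration only: the domain list below is an assumption, so verify it against the session sources that actually appear in your own reports before building a segment on it.

```python
import re

# Illustrative pattern for AI-assistant referrers (e.g. a "session source
# matches regex" condition). Extend or trim to match your own traffic.
AI_REFERRER = re.compile(
    r"(chatgpt\.com|chat\.openai\.com|perplexity\.ai"
    r"|gemini\.google\.com|copilot\.microsoft\.com|claude\.ai)"
)

for source in ["chatgpt.com", "perplexity.ai", "news.example.com"]:
    print(source, bool(AI_REFERRER.search(source)))
```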

Step 6: Expand the Audit as Your Baseline Stabilizes

Once you have 2-3 months of stable monthly audits, the baseline framework can be extended without breaking comparability.

Geographic audits

Run the 30-prompt set from different geographies using VPN or testing tools. Brand visibility in AI platforms varies significantly by region. Most brands find they are measurably weaker in non-primary markets. Prioritize the markets that matter to your revenue plan. If you are running a multi-region site, CDN configuration can affect which regions AI crawlers actually reach; our guide on CDN configuration and AI crawler access covers the common failure modes.

Device and interface audits

Mobile versus desktop AI interfaces sometimes produce different responses. ChatGPT mobile, Perplexity mobile, and Google’s AI Overviews on mobile all warrant separate capture for any brand with meaningful mobile traffic. According to Adobe Digital Economy Index data from Q2 2025, AI-referred retail traffic in the US runs 82% desktop, but that is an aggregate. Your mix may be different.

Competitor-specific audits

Add a supplementary 15-prompt set focused specifically on your top 5 competitors. The supplementary set reveals queries where competitors dominate and you do not appear, which are the highest-leverage content gap candidates.

7 Audit Mistakes That Invalidate Your Results

1. Running the audit once and calling it done

A single-point audit is a report, not a system. The value compounds with the re-audit cadence. Teams that run one audit, write a deck, and never re-measure cap their own ROI.

2. Rotating prompts mid-quarter

If you change even 3 of 30 prompts in month 2, your month-to-month comparison is broken. Freeze the test set for a full quarter minimum. Rotate at quarter boundaries with clear documentation of what changed.

3. Ignoring response variance

Running each prompt once and scoring a single response over-weights random variation. The academic literature (Aggarwal et al. 2024) used five generations per query. Minimum discipline is three generations per prompt per platform, scoring the median.

4. Treating ChatGPT as representative of all AI platforms

ChatGPT dominates user volume, but the July 2025 arXiv citation study of 366,000 AI citations found substantial variance across platforms. A strategy optimized only for ChatGPT leaves Perplexity, Gemini, and AI Overviews uncovered. Four-platform coverage is the minimum.

5. Skipping the sentiment column

Teams that track presence and position but skip sentiment miss the most painful category: the brand that is mentioned frequently but negatively or inaccurately. Addressing sentiment is often cheaper than building new visibility; first-party content corrections can move misrepresentation scores within 30-60 days.

6. Conflating brand visibility with SEO rankings

AI citations do not correlate neatly with Google rankings. The academic research and field audits both show that pages ranking in positions 3-8 organically sometimes get cited more frequently than the page in position 1, because citation algorithms reward different signals (direct-answer structure, citation density, entity clarity) than Google's ranking algorithm. Build your fix list from the audit data, not from your Search Console rankings.

7. Auditing without a fix plan

The most expensive failure mode. Teams that run audits, identify gaps, and then wait three months for “content ops bandwidth” waste the entire audit. The audit is only as valuable as the 30-day fix plan it feeds. If your team cannot ship the top 5 fixes within the month, the audit is premature. For teams using Claude as the AEO production engine, our Claude for AEO guide walks through the 7-day system that collapses audit-to-fix time.

Frequently Asked Questions

How often should we re-run the full audit?

Monthly is ideal. Quarterly is the floor. AI platforms update retrieval models and training data frequently enough that anything less than quarterly makes cause-and-effect unreadable.

Do we need a tool, or can we do this manually?

Manual is 2-4 hours of focused work for a full 30-prompt, 4-platform baseline. Perfectly doable for a single brand. Tools like Passionfruit Labs automate the run, handle the response capture, and populate the scoring rubric, which matters most when you are running audits across multiple clients or running the methodology monthly across a large enterprise. The methodology is the same either way.

How many prompts should we actually run?

30 is the minimum for a statistically useful baseline. Below 20, random variance dominates the signal. Above 60, the marginal insight per added prompt declines rapidly and the human scoring burden grows. 30 is the sweet spot for monthly re-auditing.

Which platforms matter most?

Depends on where your buyers research. For B2B SaaS audiences, Perplexity and ChatGPT dominate. For consumer research, ChatGPT and Google AI Overviews. For information-dense or technical categories, Claude and Perplexity produce disproportionate share. Four-platform coverage is the safe default until your data proves otherwise.

Can we benchmark our scores against competitors?

Yes, and you should. Run the same 30-prompt set and score competitor citation presence, position, and share of voice using the same rubric. Our AI visibility benchmarking guide walks through the competitor audit variant in detail.

Why are AI responses different every time?

LLM outputs are non-deterministic by design. The same prompt produces different responses across runs, users, and geographies. Research from 2025 documented that even with identical prompts, AI platforms produce meaningfully different recommendations. Our analysis of why AI brand recommendations change with every query covers the variance pattern and what it means for measurement.

Is the 30-prompt methodology biased toward branded queries?

Only if you under-weight category-level and scenario-based prompts. The 10/10/10 balance exists for exactly this reason. A brand-only test set would measure what AI says about you when asked, which matters far less than what AI says when your buyer asks about the problem you solve.

Your Next Move

Run the baseline audit this week. Manually, with Labs, or with any tool. The point is to have a defensible score by the end of the month that you can compare against in month two.

The marketing leaders winning the next 24 months are the ones treating AEO audits as standing infrastructure, not one-off consulting deliverables. Pew Research’s 2025 study found 34% of US adults and 58% of those under 30 have used ChatGPT. Cloudflare Radar’s 2025 Year in Review documented AI bot traffic (excluding Googlebot) averaging 4.2% of all HTML requests across Cloudflare’s network, with ClaudeBot and ChatGPT-User both growing at triple-digit rates through the year. Adobe’s data on AI-referred retail traffic shows 693.4% year-over-year growth during the 2025 holiday season, with AI referrals converting 31% higher than non-AI traffic. The distribution is shifting. The brands that establish baseline measurement now compound over the next 24 months. The brands that wait are still relying on ten-query demos.

If you want an expert set of eyes on where your brand actually shows up across AI search (and where it does not), get a free SEO and AEO audit from Passionfruit. We run the 30-prompt methodology across ChatGPT, Perplexity, Gemini, and Google’s AI Overviews, score against the four-metric rubric, and hand you a prioritized 30-60-90 day plan you can execute whether or not you work with us.

An AI visibility audit is not a report; it is a repeatable measurement discipline that compounds into competitive advantage over time.

Dewang Mishra

Content Writer

Senior Content Writer & Growth at Passionfruit, with a decade of experience in blogging and YouTube SEO. I build narratives that behave like funnels. I’ve helped drive over 300 million impressions and 300,000+ clicks for my clients across the board. Between deadlines, I collect miles, books, and poems (sequence: unpredictable). My newest obsession: prompting tiny spells for big outcomes.


Trusted by teams at high growth companies

Ready to win search?

End-to-end managed experience to drive growth from Google and AI search

Passionfruit
