Keyword clustering in 2026 is no longer about grouping words that look similar. It is about mapping the entities, relationships, and semantic structures that AI search engines use to decide who gets cited. Here is the complete process for clustering keywords around entities instead of strings, with the specific steps, tools, and measurement framework that connect clustering to AI visibility.
Google's Knowledge Graph contained over 500 billion facts about 5 billion entities as of 2020, according to Google's Danny Sullivan. By 2024, independent analysis from Kalicube estimated this had grown to approximately 1.6 trillion facts on 54 billion entities (Search Engine Land, Knowledge Graph Guide). When Gemini generates an AI Overview, when ChatGPT pulls a citation, when Perplexity selects a source, they are not matching keywords. They are traversing entity relationships in knowledge graphs, evaluating which content demonstrates the deepest, most structured understanding of a topic.
This changes what keyword clustering means at a fundamental level. The old process (export keywords from Ahrefs, group by SERP similarity, assign to pages) still produces useful output. But it misses the layer that determines AI visibility: entity coverage, entity relationships, and entity density.
According to BrightEdge's one-year AI Overview analysis (February 2026), only about 17% of sources cited in AI Overviews also rank in the organic top 10. Roughly 5 out of 6 AIO citations pull from content that is not on page one of traditional results (BrightEdge, AI Overviews at the One-Year Mark). A separate Ahrefs study of 863,000 keyword SERPs found that only 38% of URLs cited in AI Overviews appeared within the first 10 result blocks (Ahrefs, February 2026). Pages that rank on page three for keywords can still get cited in AI answers if they provide clearer entity relationships than the pages ranking above them.
This guide gives you the complete process for clustering keywords around entities, not just strings. For how topic clusters work at the strategic level, see our complete topic cluster guide for SaaS and ecommerce. For the broader AI search optimization framework, see our GEO guide.
What Has Changed About Keyword Clustering in 2026
Traditional keyword clustering groups search terms by how similar they look (semantic/NLP clustering) or by how much their search results overlap (SERP clustering). Both methods still work for organizing content. But neither one tells you what AI search engines actually evaluate when deciding which content to cite.
AI search engines evaluate three things that traditional clustering ignores. Google itself described the Knowledge Graph vision as understanding "things, not strings" when it launched the feature in 2012 (Google Blog, Amit Singhal). That principle now drives how AI Overviews select sources. BrightEdge data shows AI Overviews trigger on approximately 48% of all tracked queries as of early 2026, up from about 31% a year earlier, which BrightEdge reports as a 58% year-over-year increase (BrightEdge, AI Overviews One-Year Report).
Entity completeness: Does your content cover all the entities (people, concepts, products, processes) that a comprehensive answer requires? BrightEdge's research explains that AI Overviews use a "query fan-out" mechanism where one conversational query triggers multiple sub-queries, each pulling from different sources (BrightEdge, AI Search Insights). A page about "CRM software for small business" that mentions pricing models, integration partners, deployment types, and compliance frameworks covers more of those sub-query entities than one that just lists features. All else equal, the entity-complete page is the likelier citation.
Entity relationships: Does your content demonstrate how entities connect to each other? Research on GEO applied to visual platforms found that "generative systems preferentially cite well-connected, contextually rich content surfaces over isolated pages: consolidated sources with clear semantic structure provide interpretable evidence chains that support answer generation" (GEO Visual Content Study, ResearchGate). A page that explains how "email deliverability" relates to "sender reputation," which relates to "SPF records," which relates to "domain authentication" shows the AI a chain of entity relationships. A page that mentions all four terms without connecting them does not.
Entity density: The Princeton/Georgia Tech GEO research paper (Aggarwal et al., published at ACM SIGKDD 2024) demonstrated that content optimized with statistics, citations, and structured evidence can boost visibility in generative engine responses by up to 40%. The researchers found that methods like "Statistics Addition" and "Cite Sources" achieved 30-40% improvement on position-adjusted metrics, while traditional keyword stuffing performed poorly (GEO: Generative Engine Optimization, arXiv:2311.09735). Content that gets cited by AI tends to have significantly higher entity density than average web writing: more named tools, specific metrics, real companies, and defined processes per paragraph.
The shift is this: traditional clustering asks "which keywords belong on the same page?" Entity-based clustering asks "which entities must this page cover, connect, and contextualize to be the definitive source on this topic?" The GEO research demonstrated this concretely: the "Cite Sources" method led to a 115.1% increase in visibility for websites ranked fifth in SERP, while the top-ranked website's visibility actually decreased by 30.3% on average (Aggarwal et al., KDD 2024). Entity-rich content from lower-ranked pages outperformed keyword-optimized content from higher-ranked ones.
For understanding how AI search engines retrieve and select sources, see our source gap analysis guide.
The 5-Step Entity-Based Keyword Clustering Process
Step 1: Extract keywords the traditional way (but broader than usual)
Start with conventional keyword research. Use Ahrefs Keywords Explorer, Semrush Keyword Magic Tool, or Google Search Console. But cast a wider net than you normally would.
What to pull:
Your primary seed keyword and all related terms (use "Also rank for" and "Related terms" reports in Ahrefs)
Questions from People Also Ask, AlsoAsked.com, and Reddit threads in your niche
Competitor keyword exports (top 500-1000 keywords from 3-5 competitors)
Zero-volume long-tail queries from customer support logs, sales call transcripts, and community forums
Why broader matters for entity clustering: Traditional clustering filters aggressively before grouping. Entity clustering keeps terms that seem tangential because they often contain the entities that make your clusters comprehensive. "CRM data migration compliance" has low volume but introduces the entities "data migration" and "compliance" that your CRM cluster needs for AI citation.
Export everything into a single spreadsheet with keyword, search volume, and intent columns.
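If your exports come from several tools, a small script can merge them before you cluster. The sketch below is one way to do it, assuming each export is CSV text with `keyword`, `volume`, and `intent` columns (those column names and the sample volumes are made up for illustration; rename them to match your tool). It keeps zero-volume terms rather than filtering them out.

```python
import csv
import io


def merge_keyword_exports(csv_texts):
    """Merge several keyword exports into one list, deduplicated by keyword.

    Assumes each export has keyword, volume, intent columns (hypothetical names).
    """
    merged = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            # Normalize case and spacing so near-identical rows collapse.
            keyword = " ".join(row["keyword"].lower().split())
            volume = int(row.get("volume") or 0)  # blank volume counts as zero
            intent = (row.get("intent") or "").strip()
            current = merged.get(keyword)
            if current is None:
                merged[keyword] = {"keyword": keyword, "volume": volume, "intent": intent}
            else:
                # Keep the highest reported volume and the first non-empty intent.
                current["volume"] = max(current["volume"], volume)
                current["intent"] = current["intent"] or intent
    # Zero-volume terms stay in on purpose; they often carry the missing entities.
    return sorted(merged.values(), key=lambda r: (-r["volume"], r["keyword"]))


# Illustrative data only; the volumes are invented.
tool_export = (
    "keyword,volume,intent\n"
    "CRM for small business,5400,commercial\n"
    "crm data migration compliance,0,informational\n"
)
support_log_export = (
    "keyword,volume,intent\n"
    "crm  for small business,4900,\n"
    "CRM data migration compliance,,informational\n"
)

rows = merge_keyword_exports([tool_export, support_log_export])
```

Sorting by volume is only a convenience for reading the sheet. The low-volume rows at the bottom are the ones to scan for entities your cluster is missing.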
Step 2: Run SERP-based clustering first (your structural foundation)
Use a SERP-based clustering tool (Keyword Insights, KeyClusters, or SE Ranking's clustering feature) to group your keywords by search result overlap. This tells you which keywords Google already treats as the same intent and can be targeted on the same page.
Settings that work for entity-based clustering:
Use a minimum SERP overlap of 3 out of 10 results (the default in most tools). This keeps clusters tight enough to target on one page without fragmenting too aggressively.
Include search intent classification if the tool offers it. Knowing whether a cluster is informational, commercial, or transactional determines the content format.
What you get: A list of keyword clusters, each containing 5-50 keywords that share SERP overlap. This is your structural foundation. Each cluster represents one page on your site.
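To see what the overlap threshold does, here is a minimal sketch of SERP-overlap grouping. It is a simple greedy approach, not the algorithm any of the tools above actually use, and it assumes you already have the top-10 result URLs for each keyword (the `u1`, `u2` values below are placeholder URLs).

```python
def cluster_by_serp_overlap(serps, min_overlap=3):
    """Greedy SERP clustering (a simplification for illustration).

    serps maps keyword -> list of top result URLs, in priority order
    (for example, sorted by search volume). A keyword joins the first
    cluster whose seed keyword shares at least min_overlap URLs with it;
    otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"urls": set of seed URLs, "keywords": [str]}
    for keyword, urls in serps.items():
        url_set = set(urls[:10])  # only the first 10 results count
        for cluster in clusters:
            if len(url_set & cluster["urls"]) >= min_overlap:
                cluster["keywords"].append(keyword)
                break
        else:
            # No existing cluster overlaps enough, so this keyword seeds a new one.
            clusters.append({"urls": url_set, "keywords": [keyword]})
    return [c["keywords"] for c in clusters]


# Placeholder SERP data; real input would hold full result URLs.
serps = {
    "crm for small business": ["u1", "u2", "u3", "u4", "u5"],
    "best small business crm": ["u1", "u2", "u3", "u9"],
    "crm data migration": ["u6", "u7", "u8", "u1"],
    "migrate crm data": ["u6", "u7", "u8", "u2"],
}

clusters = cluster_by_serp_overlap(serps, min_overlap=3)
```

With a threshold of 3, the two "small business CRM" terms land on one page and the two migration terms on another. Raising `min_overlap` splits clusters more aggressively; lowering it merges them.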
But this is where most guides stop. SERP clustering tells you the page boundaries. It does not tell you what entities each page needs to cover to earn AI citations. That is Step 3.
Step 3: Extract entities from top-ranking content (the layer everyone skips)
This is the step that separates keyword-based clustering from entity-based clustering. For each of your top-priority clusters, you need to identify the specific entities that AI-cited content contains.
How to do it:
Take your top 3-5 clusters by business priority: You do not need to do this for every cluster. Start with the ones that drive revenue.
For each cluster, pull the top 5 ranking pages: Read them or use a tool like Surfer SEO, Clearscope, or MarketMuse to extract the entities and topics they cover.
Use Google's Natural Language API to measure entity salience: Paste each top-ranking page's content into the Google Cloud Natural Language API demo. It returns every entity it detects, categorized by type (Person, Organization, Event, Consumer Good, etc.) with a salience score indicating how central each entity is to the content. As Search Engine Land's entity-first optimization guide notes, pages with high salience scores "consistently mention the target entity in contextually rich ways, which helps Google confidently assign relevance" (Search Engine Land, Entity-First Content Optimization).
Build an entity map for each cluster: Create a spreadsheet with columns: Entity Name, Entity Type, Salience Score (averaged across top pages), Present in Your Content (yes/no).
Example entity map for a "CRM for small business" cluster:
| Entity | Type | Avg. salience | In your content? |
|---|---|---|---|
| HubSpot | Organization | 0.12 | Yes |
| Salesforce | Organization | 0.09 | Yes |
| Contact management | Concept | 0.08 | Yes |
| Pipeline management | Concept | 0.07 | No |
| Email integration | Concept | 0.06 | No |
| Data migration | Process | 0.04 | No |
| GDPR compliance | Regulation | 0.03 | No |
| API access | Feature | 0.03 | No |
| Per-user pricing | Pricing model | 0.05 | No |
| Zapier | Organization | 0.03 | No |
The "No" column reveals your entity gaps. These are the specific concepts, tools, processes, and organizations that top-ranking (and AI-cited) content covers but your content does not.
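The entity map above can be built with a short script once you have entity results for each top-ranking page. This sketch assumes you have already extracted `(name, type, salience)` tuples per page, for example from the Natural Language API; it does not call the API itself. Two choices here are assumptions rather than fixed rules: a page that omits an entity counts as salience 0 in the average, and "present in your content" is a plain substring check.

```python
from collections import defaultdict


def build_entity_map(pages, own_text):
    """Build an entity map from per-page entity results.

    pages: list of pages, each a list of (entity_name, entity_type, salience).
    own_text: the text of your own page, used for a naive presence check.
    Salience is averaged over all pages; a page missing an entity counts as 0.
    """
    totals = defaultdict(float)
    types = {}
    for page in pages:
        for name, entity_type, salience in page:
            totals[name] += salience
            types.setdefault(name, entity_type)  # keep the first type seen

    own_lower = own_text.lower()
    entity_map = [
        {
            "entity": name,
            "type": types[name],
            "avg_salience": round(total / len(pages), 3),
            "in_your_content": name.lower() in own_lower,  # naive substring match
        }
        for name, total in totals.items()
    ]
    return sorted(entity_map, key=lambda r: -r["avg_salience"])


# Illustrative input; salience values are invented.
top_pages = [
    [("HubSpot", "ORGANIZATION", 0.14), ("Data migration", "OTHER", 0.04)],
    [("HubSpot", "ORGANIZATION", 0.10), ("Data migration", "OTHER", 0.04),
     ("Zapier", "ORGANIZATION", 0.06)],
]
own_text = "We compare HubSpot and Salesforce for small teams."

entity_map = build_entity_map(top_pages, own_text)
gaps = [row["entity"] for row in entity_map if not row["in_your_content"]]
```

The `gaps` list is the script's version of the "No" column. A substring check will miss synonyms, so treat the output as a first pass and review it by hand.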
Check AI citations directly: Run your cluster's primary query in ChatGPT, Perplexity, and Google AI Overviews. Note which sources get cited and what entities those cited pages contain that yours do not. Use Passionfruit Labs to track your AI citation rates across platforms.
Step 4: Restructure clusters around entity hubs (not just keyword groups)
Now you have two layers: SERP-based keyword clusters (Step 2) and entity maps (Step 3). The next step is to organize your clusters into an entity-based content architecture.
The entity hub model:
Instead of the traditional pillar-cluster model where a broad page links to narrow pages, entity-based clustering organizes content around a central entity and its relationships:
Hub page (pillar): Covers the central entity comprehensively. For a "CRM for small business" hub, this page covers the concept of CRM, the key decision criteria, and the primary entity relationships (pricing, features, integrations, compliance, migration).
Entity-deep pages (cluster): Each cluster page targets a specific entity relationship in depth. "CRM data migration for small business" covers the migration entity with full depth: process steps, common tools, compliance requirements, timeline expectations.
Entity-bridge pages: Pages that connect two entity clusters. "How CRM integrations affect email deliverability" bridges the CRM cluster and the email marketing cluster, creating a cross-cluster entity relationship that signals topical breadth.
The internal linking rule: Every internal link should connect entities, not just pages. The anchor text "CRM data migration" on your hub page links to your migration deep-dive. The anchor text "email deliverability" on your migration page links to your email cluster. This creates the entity relationship chain that AI crawlers traverse.
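The linking rule can be audited mechanically. The sketch below is one possible check, assuming you can export each page's text and its internal links as `(anchor_text, target_url)` pairs; the URLs and the `canonical_pages` mapping are hypothetical. It flags pages that mention an entity with its own deep-dive page but do not link to that page using the entity as anchor text.

```python
def find_unlinked_entities(pages, canonical_pages):
    """Return (source_url, entity) pairs where an entity mention lacks its link.

    pages: url -> {"text": str, "links": [(anchor_text, target_url), ...]}
    canonical_pages: entity name -> url of the page covering it in depth
    """
    missing = []
    for url, page in pages.items():
        text = page["text"].lower()
        linked = {(anchor.lower(), target) for anchor, target in page["links"]}
        for entity, target in canonical_pages.items():
            if target == url:
                continue  # a page does not need to link to itself
            mentioned = entity.lower() in text
            if mentioned and (entity.lower(), target) not in linked:
                missing.append((url, entity))
    return missing


# Hypothetical site structure for illustration.
canonical_pages = {
    "CRM data migration": "/crm-data-migration",
    "email deliverability": "/email-deliverability",
}
pages = {
    "/crm-for-small-business": {
        "text": "Plan your CRM data migration before you switch tools.",
        "links": [("CRM data migration", "/crm-data-migration")],
    },
    "/crm-data-migration": {
        "text": "A CRM data migration can hurt email deliverability if sender domains change.",
        "links": [],
    },
    "/email-deliverability": {
        "text": "Email deliverability depends on sender reputation.",
        "links": [],
    },
}

missing_links = find_unlinked_entities(pages, canonical_pages)
```

Here the hub page passes, and the migration page is flagged because it mentions "email deliverability" without linking to the email cluster.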
For how to implement internal linking at scale within topic clusters, see our SEO internal linking guide.
Step 5: Optimize each page for AI extractability
Entity-based clustering gets your content architecture right. But each individual page within the cluster must also be optimized for AI extraction.
The entity density checklist (apply to every page in your cluster):
Lead each section with a direct answer: After every H2, write a 40-word direct answer to the heading question before expanding. The GEO research paper found that "Fluency Optimization" and clear structure produced 15-30% visibility gains in generative engines, confirming that AI systems value extractable format alongside content depth (Aggarwal et al., KDD 2024).
Include specific entities every 150-200 words: Named tools, specific metrics, real companies, defined processes. Replace vague language ("many tools exist") with entity-rich language ("HubSpot, Pipedrive, and Zoho CRM each handle pipeline management differently").
Use entity-consistent terminology: If your hub page calls it "data migration," every cluster page uses "data migration." Not "data transfer," "moving data," or "switching CRMs" without first connecting these terms to the canonical entity.
Add structured data: Implement Article, FAQPage, and HowTo schema on the cluster pages where each type matches the content. Add Organization schema on your hub page. According to analysis from Decode Growth, integrating schema markup increases AI citation chances by 30-40% because it formally introduces your entities and their relationships to Google's Knowledge Graph (Decode Growth, Entity SEO Guide). See our structured data guide for GEO.
Include comparison tables and decision frameworks: AI systems prefer structured, extractable content. A comparison table with named entities in rows and attribute entities in columns gives AI systems a clean data structure to cite.
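The "specific entities every 150-200 words" item in the checklist above is easy to spot-check with a script. This is a rough sketch, assuming you supply your own list of entity names for the cluster. It splits the page into fixed windows of words and reports the windows that mention none of them; an entity name that straddles a window boundary will be missed, so treat flagged ranges as places to look, not as verdicts.

```python
import re


def windows_without_entities(text, entities, window=200):
    """Report (start_word, end_word) ranges that mention none of the entities.

    The text is split into consecutive windows of `window` words.
    Matching is case-insensitive and respects word boundaries.
    """
    words = text.split()
    patterns = [re.compile(r"\b" + re.escape(e.lower()) + r"\b") for e in entities]
    flagged = []
    for start in range(0, len(words), window):
        chunk = " ".join(words[start:start + window]).lower()
        if not any(p.search(chunk) for p in patterns):
            flagged.append((start, min(start + window, len(words))))
    return flagged


# Illustrative text: one entity-rich sentence followed by 300 words of filler.
sample_text = "HubSpot handles pipeline management well. " + "many tools exist " * 100
cluster_entities = ["HubSpot", "Pipedrive", "Zoho CRM"]

thin_ranges = windows_without_entities(sample_text, cluster_entities, window=200)
```

The first 200 words pass because they name HubSpot. The remaining filler is flagged as a stretch with no named entity.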
SERP Clustering vs. Semantic Clustering vs. Entity Clustering: When to Use Each
| Method | What it groups by | Best for | Limitation |
|---|---|---|---|
| SERP clustering | Search result overlap | Determining page boundaries (what goes on one page vs. separate pages) | Does not reveal entity coverage gaps |
| Semantic/NLP clustering | Word meaning similarity | Quick grouping when you have no SERP data, or for non-Google platforms | May group words that look similar but have different search intents |
| Entity clustering | Entity relationships, salience, and density | Optimizing for AI citations, Knowledge Graph alignment, topical authority | Requires more manual analysis per cluster |
The recommended stack: Use SERP clustering for structural decisions (Step 2), then layer entity analysis on top (Step 3) for your highest-priority clusters. You do not need to run full entity extraction on every cluster. Start with the top 3-5 clusters from Step 3, then extend to the 10-20 clusters that drive revenue.
How to Measure Entity Cluster Performance
Traditional metrics (keyword rankings, organic traffic) still matter. But entity-based clustering requires additional measurements to confirm whether your entity coverage is improving AI visibility.
Entity-level metrics to track:
| Metric | What it measures | Tool |
|---|---|---|
| AI citation rate per cluster | How often your pages are cited in AI answers for cluster queries | Passionfruit Labs, manual prompt testing |
| Entity salience score | How prominently Google perceives your primary entity on each page | Google NLP API |
| Knowledge Panel presence | Whether your brand triggers a Knowledge Panel for branded searches | Google Search (manual check) |
| Topical Share of Voice | What percentage of your cluster keywords you rank in the top 10 for | Ahrefs, Semrush rank tracking |
| Cross-cluster citations | Whether AI cites your content for queries in adjacent clusters | Manual prompt testing across ChatGPT, Perplexity, Gemini |
| Entity gap closure rate | How many entity gaps from Step 3 you have closed over time | Your entity map spreadsheet |
Tracking cadence: Check AI citation rates weekly for your top 10 clusters. Run full entity gap analysis monthly. Reassess cluster architecture quarterly. BrightEdge's 16-month tracking study found that AI Overview citation overlap with organic rankings grew from 32.3% to 54.5%, but this varies wildly by industry: YMYL verticals (Healthcare, Insurance, Education) show 68-75% overlap, while ecommerce stayed nearly flat at 0.6 percentage points of change (BrightEdge, Rank Overlap After 16 Months of AIO). Your measurement strategy must account for your industry's specific convergence pattern.
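Three of the metrics above reduce to simple ratios you can compute from a tracking sheet. The functions below are a sketch under stated assumptions: citation results are recorded as one true/false value per tested prompt, rankings as your best position per keyword, and entity gaps as plain lists of entity names. The function names and sample inputs are illustrative, not part of any tool's API.

```python
def citation_rate(prompt_results):
    """Share of tested prompts where your domain was cited.

    prompt_results: list of booleans, one per prompt tested.
    """
    if not prompt_results:
        return 0.0
    return sum(prompt_results) / len(prompt_results)


def topical_share_of_voice(rankings):
    """Share of cluster keywords where you rank in the top 10.

    rankings: keyword -> best position, or None if you do not rank.
    """
    if not rankings:
        return 0.0
    top10 = sum(1 for pos in rankings.values() if pos is not None and pos <= 10)
    return top10 / len(rankings)


def gap_closure_rate(baseline_gaps, current_gaps):
    """Share of the baseline entity gaps that are no longer gaps."""
    baseline = set(baseline_gaps)
    if not baseline:
        return 0.0
    closed = baseline - set(current_gaps)
    return len(closed) / len(baseline)


# Illustrative inputs only.
weekly_citations = citation_rate([True, False, False, True])
share_of_voice = topical_share_of_voice(
    {"crm for small business": 3, "crm pricing": 14, "crm api access": None, "crm migration": 9}
)
closure = gap_closure_rate(
    ["Pipeline management", "Email integration", "Data migration", "Zapier"],
    ["Data migration"],
)
```

Record the same prompts and keywords at each check so the ratios stay comparable from week to week.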
For the full GA4 setup to track AI referral traffic from these citations, see our GA4 guide for AI traffic.
Common Mistakes in Entity-Based Keyword Clustering
Clustering by keyword similarity instead of entity relationships: "Best CRM software" and "CRM software reviews" look similar and may SERP-cluster together. But "Best CRM software" and "CRM data migration" share deeper entity relationships (both are part of the CRM purchase decision journey) even though they target different pages. Entity clustering connects these at the architecture level through internal links and entity bridges.
Ignoring entity consistency across cluster pages: If your hub page calls the entity "marketing automation" but your cluster page calls it "email campaign tools," you are fragmenting your entity signal. AI systems struggle to connect the two. Use one canonical entity name and introduce synonyms explicitly: "Marketing automation (also called email campaign automation)..."
Building clusters without entity depth: A cluster with 10 pages that each mention the same 5 entities is wide but shallow. AI systems prefer clusters where each page adds new entity relationships. Your hub page covers 15-25 entities at surface level. Each cluster page covers 2-3 of those entities at depth, introducing 5-10 additional entities per page.
Skipping structured data on cluster pages: Schema markup is not optional for entity-based clustering. It is the mechanism by which you formally declare your entities and their relationships to search engines. Pages without schema force AI systems to infer entities from unstructured text, which is less reliable and less likely to earn citations.
Never updating entity maps: New entities emerge constantly. "Generative engine optimization" was not an established entity before the GEO paper appeared on arXiv in late 2023. Quarterly entity audits ensure your clusters stay current and comprehensive.
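On the structured data point above, one way to declare a page's entities is the `about` and `mentions` properties that schema.org defines for Article. The sketch below generates that JSON-LD from an entity list. It is an assumption about how you might model your entities, not a markup pattern the sources cited here prescribe. The page URL is a placeholder and the `sameAs` links are illustrative.

```python
import json


def article_jsonld(headline, url, about, mentions):
    """Build an Article JSON-LD object.

    about: (name, same_as_url_or_None) for the page's main entity.
    mentions: list of (name, same_as_url_or_None) for supporting entities.
    """
    def thing(name, same_as):
        node = {"@type": "Thing", "name": name}
        if same_as:
            node["sameAs"] = same_as  # optional link to a reference page for the entity
        return node

    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "url": url,
        "about": thing(*about),
        "mentions": [thing(name, same_as) for name, same_as in mentions],
    }


# Placeholder page and illustrative entities.
doc = article_jsonld(
    headline="CRM data migration for small business",
    url="https://example.com/crm-data-migration",
    about=("Data migration", "https://en.wikipedia.org/wiki/Data_migration"),
    mentions=[
        ("GDPR compliance", "https://en.wikipedia.org/wiki/General_Data_Protection_Regulation"),
        ("Per-user pricing", None),
    ],
)
markup = json.dumps(doc, indent=2)
```

The resulting `markup` string goes inside a `<script type="application/ld+json">` tag on the page. Use the same entity names here that you use in the page copy, so the markup and the text agree.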
Your Next Move
Pick one revenue-driving topic cluster on your site right now. Run the top 5 ranking pages for that cluster through Google's Natural Language API. Build the entity map from Step 3. Compare it against your current content. The entity gaps you find are your highest-leverage content improvements for AI visibility.
Then implement the entity-deep pages and entity-bridge pages from Step 4 to close those gaps. Track your AI citation rate before and after. The data will tell you whether entity-based clustering works for your specific niche.
If you need expert help building entity-based content clusters that earn AI citations, Passionfruit's team builds AI-native SEO strategies for SaaS and ecommerce brands. See our case studies for measurable results across both traditional search and AI platforms.
AI search engines do not rank pages. They cite entities. Cluster your content around the entities your audience needs, and you become the source AI cannot ignore.
Frequently Asked Questions
What is the difference between keyword clustering and entity clustering?
Keyword clustering groups search terms by their similarity (semantic) or by search result overlap (SERP). Entity clustering goes further by identifying the specific people, organizations, concepts, products, and processes that AI-cited content covers, then organizing your content to ensure complete entity coverage and clear entity relationships. Keyword clustering determines page boundaries. Entity clustering determines what each page must contain to earn AI citations.
Do I need special tools for entity-based keyword clustering?
You need a SERP-based clustering tool (Keyword Insights, SE Ranking, or KeyClusters) for Step 2, plus access to Google's Natural Language API for entity extraction in Step 3. For tracking AI citations, tools like Passionfruit Labs monitor your visibility across ChatGPT, Perplexity, and Google AI Overviews. Most of the entity mapping work happens in a spreadsheet.
How many entities should a cluster page target?
Your hub page should cover 15-25 entities at overview level. Each cluster page should cover 2-3 entities at depth while introducing 5-10 additional related entities. The Princeton/Georgia Tech GEO study found that content with higher information density (more statistics, citations, and specific evidence) performed 30-40% better in generative engine visibility than content relying on general language (Aggarwal et al., KDD 2024). Use this as a benchmark: every 150-200 words should introduce a named entity (a specific tool, metric, company, or process).
Does entity-based clustering replace traditional keyword research?
No. Keywords still determine search volume, intent, and page targeting. Entity clustering adds a layer on top of keyword clustering that optimizes for AI visibility. Think of it as: keywords tell you what pages to create. Entities tell you what those pages must contain to get cited by AI search engines.
How often should I update my entity clusters?
Run a full entity gap analysis monthly for your top 10 clusters. Reassess your cluster architecture quarterly. Update your entity maps whenever you notice new entities appearing in AI answers for your target queries. Search intent evolves, and entity clusters that matched six months ago may have gaps today.
Can small websites compete with entity-based clustering?
Yes. Depth within a focused niche tends to outperform breadth across many topics. A 20-page site with complete entity coverage for one topic cluster can perform better in AI search than a 200-page site with shallow coverage across 20 topics. Entity-based clustering rewards expertise, which favors specialists over generalists.
What is entity salience and why does it matter?
Entity salience is a score (0 to 1) from Google's Natural Language API that measures how central an entity is to a piece of content. A page about "CRM software" with high salience for the entity "CRM" signals to Google that this page is genuinely about CRM, not just mentioning it. Low salience on your primary entity means your page will struggle in AI contexts regardless of keyword rankings. Google's official Knowledge Graph documentation confirms that the system "understands facts and information about entities from materials shared across the web" and uses these entity signals to power search features (Google, Knowledge Panels Help).






