🔬 Keyword & Topic Research Agent
BaC Principle
“Keyword research is not a creative exercise — it is a data aggregation and classification problem. Agents are better at it than humans.” — Business as Code Manifesto
- Stage: S2 (Keyword & Topic Research)
- Type: AUTO (~85% automated)
- Trigger: After the S1 audit_report is ready, or on the monthly research cadence
- Output feeds: 09-Human-Review-Gates (Strategy Review), 05-Content-Optimization-Engine
See Map of Content for full vault navigation.
🎯 Objective
Build a complete keyword universe organized into topic clusters, classify by intent and AEO potential, and map AI-style query variants — so the content strategy has a data-driven foundation that agents can execute against.
🛠️ Tools & Integrations
tools:
  - name: Ahrefs / SEMrush API
    used_for: Keyword volume, difficulty, SERP features, competitor keyword gaps
    api: REST API
  - name: Google Keyword Planner API
    used_for: Search volume validation, seasonal trends
    api: Google Ads API
  - name: People Also Ask scraper
    used_for: Extracting question-format keywords (high AEO potential)
    api: Custom scraper (SerpAPI or Bright Data)
  - name: Reddit / Quora scraper
    used_for: Discovering how real users phrase questions about target topics
    api: SerpAPI (site:reddit.com and site:quora.com searches; Quora offers no public API)
  - name: Claude API
    used_for: >
      (1) Simulate how LLMs answer target queries and identify which content
      gaps exist in AI responses that the site could fill.
      (2) Classify keyword intent.
      (3) Generate AEO query variants per cluster.
    api: Messages API

⚡ Execution Steps
flowchart TD
    A["Load target_definition<br/>seed topics + scoring model"]
    B["Seed expansion<br/>Ahrefs: 'also rank for' + 'keyword ideas'"]
    C["Competitor gap analysis<br/>What competitors rank for that we don't"]
    D["People Also Ask extraction<br/>Question keywords per topic"]
    E["Reddit/Quora mining<br/>Colloquial query patterns"]
    F["AEO query simulation<br/>Claude API: ask each cluster as LLM query"]
    G["Intent classification<br/>informational / navigational / commercial / transactional"]
    H["Scoring<br/>Apply scoring model from [[02-Target-Definition]]"]
    I["Cluster formation<br/>Group by topic + intent + content pillar"]
    J["Output: keyword_topic_map<br/>Structured JSON"]
    A --> B --> C --> D --> E --> F --> G --> H --> I --> J
Step details:
- Seed expansion: Start with the seed topics from 02-Target-Definition. Use the Ahrefs "Also rank for" and "Keyword ideas" reports to expand them into 500–2,000 candidate keywords.
- Competitor gap: Pull competitor organic keyword lists. Keep keywords where a competitor ranks in positions 1–20 and the site ranks below position 50 or not at all.
- PAA extraction: For each seed topic, extract People Also Ask questions from the SERP. These question formats are already validated by search demand and resemble the queries users put to LLMs.
- Reddit/Quora mining: Search Reddit and Quora for the target topics. Extract phrasing patterns that reveal real user language, which is often more AEO-relevant than the output of formal keyword tools.
- AEO query simulation: For each topic cluster, send 3 representative queries to the Claude API. Analyze each response: what does the AI answer, which sources (if any) does it cite, and what content gap could the site fill?
- Intent classification: Classify each keyword as informational, navigational, commercial, or transactional. Use the Claude API for batch classification.
- Scoring: Apply the scoring model from 02-Target-Definition (volume score × difficulty score × intent match × AEO potential).
- Cluster formation: Group keywords into topic clusters mapped to content pillars (CP1–CP5 from the target definition). Each cluster gets a primary keyword (the highest-scoring one) and supporting keywords.
- Output: Generate the keyword_topic_map JSON.
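The scoring and cluster formation steps above can be sketched in Python. This is a minimal illustration, not the actual scoring model, which lives in 02-Target-Definition and is not reproduced here. The normalizations (log-scaled volume, inverted difficulty, a 0.5 penalty for off-target intent), the `AEO_WEIGHT` table, and the `Keyword` fields `topic` and `pillar` are all assumptions made for the example.

```python
import math
from collections import defaultdict
from dataclasses import dataclass

# Assumed weights; the real values belong to the scoring model in 02-Target-Definition.
AEO_WEIGHT = {"high": 1.0, "medium": 0.7, "low": 0.4}


@dataclass
class Keyword:
    text: str
    volume: int         # monthly search volume
    difficulty: int     # 0-100, higher is harder to rank for
    intent: str         # informational / navigational / commercial / transactional
    aeo_potential: str  # high / medium / low
    topic: str          # hypothetical topic label used for grouping
    pillar: str         # content pillar, e.g. "CP1"


def score(kw: Keyword, target_intents: set[str], max_volume: int = 10_000) -> float:
    """Multiply four factors in [0, 1] and scale to 0-100 (assumed normalization)."""
    volume_score = min(math.log10(kw.volume + 1) / math.log10(max_volume + 1), 1.0)
    difficulty_score = 1.0 - kw.difficulty / 100
    intent_match = 1.0 if kw.intent in target_intents else 0.5
    aeo = AEO_WEIGHT.get(kw.aeo_potential, 0.4)
    return round(100 * volume_score * difficulty_score * intent_match * aeo, 1)


def form_clusters(keywords: list[Keyword], target_intents: set[str]) -> list[dict]:
    """Group by (pillar, topic, intent); the highest-scoring keyword becomes primary."""
    groups: dict[tuple[str, str, str], list[Keyword]] = defaultdict(list)
    for kw in keywords:
        groups[(kw.pillar, kw.topic, kw.intent)].append(kw)

    clusters = []
    for i, ((pillar, _topic, intent), members) in enumerate(sorted(groups.items()), start=1):
        ranked = sorted(members, key=lambda k: score(k, target_intents), reverse=True)
        clusters.append({
            "cluster_id": f"C{i:03d}",
            "content_pillar": pillar,
            "intent": intent,
            "primary_keyword": ranked[0].text,
            "primary_keyword_score": score(ranked[0], target_intents),
            "supporting_keywords": [k.text for k in ranked[1:]],
        })
    return clusters
```

Because the factors are multiplied rather than summed, a keyword that scores near zero on any one factor (for example, very high difficulty) drops out of contention regardless of its volume. Whether that is the intended behavior depends on the real scoring model.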
📤 Output Schema
{
  "research_id": "research_2026-04-02",
  "generated_at": "2026-04-02T14:00:00Z",
  "total_keywords_evaluated": 1243,
  "clusters_formed": 18,
  "keyword_clusters": [
    {
      "cluster_id": "C001",
      "content_pillar": "CP1",
      "primary_keyword": "business as code",
      "primary_keyword_volume": 1200,
      "primary_keyword_difficulty": 28,
      "primary_keyword_score": 82,
      "intent": "informational",
      "aeo_potential": "high",
      "supporting_keywords": [
        "what is business as code",
        "business as code methodology",
        "codify business processes"
      ],
      "paa_questions": [
        "What does Business as Code mean?",
        "How is Business as Code different from BPM?",
        "Can AI agents run business processes?"
      ],
      "aeo_query_variants": [
        "Explain Business as Code",
        "What is the Business as Code approach to automation?",
        "How do you make a business process AI-executable?"
      ],
      "aeo_gap_analysis": "Current LLM answers conflate BaC with traditional BPM. Opportunity: create definitive, quotable definition + comparison content.",
      "recommended_content_type": "pillar_page",
      "recommended_word_count": 3000,
      "priority": "high"
    }
  ],
  "content_gap_summary": {
    "missing_pillar_pages": 3,
    "pages_needing_update": 7,
    "new_cluster_pages_needed": 12,
    "quick_wins": [
      "Add FAQ schema to existing /use-cases pages",
      "Create 'What is BaC?' definitional page — currently not indexed"
    ]
  },
  "aeo_landscape": {
    "queries_tested_with_llm": 54,
    "queries_where_site_was_cited": 3,
    "top_cited_competitors": ["zapier.com", "processst.com"],
    "citation_gap_queries": ["business as code definition", "AI agent for operations"]
  }
}

📊 KPIs
| Metric | Target | Alert Condition |
|---|---|---|
| Keywords evaluated | ≥ 500 | < 200 → expand seed list |
| Clusters formed | 10–30 | < 5 → check API connectivity |
| AEO query variants per cluster | ≥ 3 | < 2 → flag cluster |
| High-priority clusters identified | ≥ 5 | 0 → review scoring model |
| AEO gap analysis complete | All clusters | Missing → flag |
| Runtime | < 3 hours | > 6 hours → alert |
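The alert conditions in the table can be checked mechanically against a keyword_topic_map document. The sketch below assumes the field names from the output schema. The function name `check_kpis` and the `runtime_hours` argument are illustrative, since the schema does not record runtime.

```python
def check_kpis(report: dict, runtime_hours: float) -> list[str]:
    """Return one alert string per KPI alert condition that is triggered."""
    alerts = []

    if report.get("total_keywords_evaluated", 0) < 200:
        alerts.append("keywords evaluated < 200: expand seed list")
    if report.get("clusters_formed", 0) < 5:
        alerts.append("clusters formed < 5: check API connectivity")

    clusters = report.get("keyword_clusters", [])
    for cluster in clusters:
        cid = cluster.get("cluster_id", "?")
        if len(cluster.get("aeo_query_variants", [])) < 2:
            alerts.append(f"{cid}: fewer than 2 AEO query variants, flag cluster")
        if not cluster.get("aeo_gap_analysis"):
            alerts.append(f"{cid}: AEO gap analysis missing, flag")

    if not any(c.get("priority") == "high" for c in clusters):
        alerts.append("no high-priority clusters: review scoring model")
    if runtime_hours > 6:
        alerts.append("runtime > 6 hours: alert")

    return alerts
```

An empty list means no alert condition fired. It does not mean every target was met, because the targets (for example, at least 500 keywords) are stricter than the alert thresholds.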
⚠️ Error Handling
error_rules:
  - condition: Ahrefs / SEMrush API quota exhausted
    action: Switch to the other provider for remaining requests
    fallback: Use Google Keyword Planner as backup for volume data
  - condition: PAA scraper blocked (CAPTCHA or rate limit)
    action: Reduce request rate to 1 req/10 sec, add random delays
    fallback: Use pre-cached PAA data from last run if < 30 days old
  - condition: Claude API returns inconsistent intent classifications
    action: Re-run classification with temperature=0 for consistency
    fallback: Default to "informational" for ambiguous keywords
  - condition: AEO simulation returns no usable gap analysis
    action: Retry with more specific query variants
    fallback: Flag cluster for human review in Strategy Gate

📎 Related Files
- 02-Target-Definition — Provides seed topics, scoring model, content pillars
- 03-Site-Audit-Agent — Provides existing page inventory (content gap reference)
- 01-Process-Manifest — S2 stage definition
- 09-Human-Review-Gates — Gate #1 reviews the output of this agent
- 05-Content-Optimization-Engine — Receives approved content strategy
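
Returning to the error rules under Error Handling: each rule follows the same condition, action, fallback shape. As a hedged sketch, the "inconsistent intent classifications" rule could look like the following. The `classify` callable is a hypothetical wrapper around the Claude API call and is not defined in this vault. Detecting inconsistency by classifying twice and comparing the labels is also an assumption.

```python
from typing import Callable

VALID_INTENTS = {"informational", "navigational", "commercial", "transactional"}


def classify_intent(
    keyword: str,
    classify: Callable[[str, float], str],  # hypothetical: (keyword, temperature) -> label
    default_temperature: float = 1.0,
) -> tuple[str, str]:
    """Return (intent, source), where source is 'consistent', 'retry', or 'default'."""
    # Condition: two samples disagree, or the label is outside the allowed set.
    first = classify(keyword, default_temperature)
    second = classify(keyword, default_temperature)
    if first == second and first in VALID_INTENTS:
        return first, "consistent"

    # Action: re-run classification with temperature=0 for consistency.
    retry = classify(keyword, 0.0)
    if retry in VALID_INTENTS:
        return retry, "retry"

    # Fallback: default to "informational" for ambiguous keywords.
    return "informational", "default"
```

Returning the source alongside the label lets a downstream step count how many keywords fell through to the default, which may be a useful signal for the Strategy Review gate.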