🔬 Keyword & Topic Research Agent

BaC Principle

“Keyword research is not a creative exercise — it is a data aggregation and classification problem. Agents are better at it than humans.” (Business as Code Manifesto)

Stage: S2 (Keyword & Topic Research)
Type: AUTO (~85% automated)
Trigger: After the S1 audit_report is ready, or on the monthly research cadence
Output feeds: 09-Human-Review-Gates (Strategy Review), 05-Content-Optimization-Engine
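The trigger rule can be sketched as a small predicate. This is a minimal sketch, not the agent's actual scheduler: `should_trigger_s2` is a hypothetical name, and approximating the monthly cadence as a 30-day window is an assumption.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: the "monthly research cadence" is approximated as 30 days.
MONTHLY_CADENCE = timedelta(days=30)


def should_trigger_s2(audit_report_ready: bool, last_run: Optional[date], today: date) -> bool:
    """Decide whether S2 should start (hypothetical helper, not part of the spec)."""
    # Event trigger: a fresh S1 audit_report is available.
    if audit_report_ready:
        return True
    # Assumption: with no previous run, wait for the S1 audit_report.
    if last_run is None:
        return False
    # Schedule trigger: at least one cadence period has passed since the last run.
    return today - last_run >= MONTHLY_CADENCE
```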

See Map of Content for full vault navigation.


🎯 Objective

Build a complete keyword universe organized into topic clusters, classify each keyword by search intent and AEO (answer engine optimization) potential, and map AI-style query variants, so that the content strategy rests on a data-driven foundation that agents can execute against.


🛠️ Tools & Integrations

tools:
  - name: Ahrefs / SEMrush API
    used_for: Keyword volume, difficulty, SERP features, competitor keyword gaps
    api: REST API
 
  - name: Google Keyword Planner API
    used_for: Search volume validation, seasonal trends
    api: Google Ads API
 
  - name: People Also Ask scraper
    used_for: Extracting question-format keywords — high AEO potential
    api: Custom scraper (SerpAPI or Bright Data)
 
  - name: Reddit / Quora scraper
    used_for: Discovering how real users phrase questions about target topics
    api: SerpAPI (site:reddit.com and site:quora.com searches)
 
  - name: Claude API
    used_for: >
      (1) Simulate how LLMs answer target queries — identify which content gaps
      exist in AI responses that the site could fill.
      (2) Classify keyword intent.
      (3) Generate AEO query variants per cluster.
    api: Messages API

⚡ Execution Steps

flowchart TD
    A["Load target_definition<br/>seed topics + scoring model"]
    B["Seed expansion<br/>Ahrefs: 'also rank for' + 'keyword ideas'"]
    C["Competitor gap analysis<br/>What competitors rank for that we don't"]
    D["People Also Ask extraction<br/>Question keywords per topic"]
    E["Reddit/Quora mining<br/>Colloquial query patterns"]
    F["AEO query simulation<br/>Claude API: ask each cluster as LLM query"]
    G["Intent classification<br/>informational / navigational / commercial / transactional"]
    H["Scoring<br/>Apply scoring model from [[02-Target-Definition]]"]
    I["Cluster formation<br/>Group by topic + intent + content pillar"]
    J["Output: keyword_topic_map<br/>Structured JSON"]

    A --> B --> C --> D --> E --> F --> G --> H --> I --> J
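The flowchart describes a strictly linear pipeline. A minimal orchestrator sketch, assuming each stage is a function that takes and returns a shared context dict; the stage names in `STAGE_ORDER` and the `run_pipeline` helper are hypothetical, and real stages would wrap the tools listed above.

```python
from typing import Any, Callable, Dict

Context = Dict[str, Any]
Stage = Callable[[Context], Context]

# Hypothetical stage names, in the order of the flowchart (A through J).
STAGE_ORDER = [
    "load_target_definition",
    "seed_expansion",
    "competitor_gap",
    "paa_extraction",
    "reddit_quora_mining",
    "aeo_simulation",
    "intent_classification",
    "scoring",
    "cluster_formation",
    "output_keyword_topic_map",
]


def run_pipeline(stages: Dict[str, Stage], context: Context) -> Context:
    """Run every stage in flowchart order, passing the shared context along."""
    # Fail fast if any stage is not wired up, rather than silently skipping it.
    missing = [name for name in STAGE_ORDER if name not in stages]
    if missing:
        raise KeyError(f"missing stages: {missing}")
    for name in STAGE_ORDER:
        context = stages[name](context)
        # Record progress so a failed run shows where it stopped.
        context.setdefault("completed_stages", []).append(name)
    return context
```

Failing fast on a missing stage keeps a partially wired agent from emitting a keyword_topic_map built on skipped steps.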

Step details:

  1. Seed expansion: Start with the seed topics from 02-Target-Definition. Use the Ahrefs “Also rank for” and “Keyword ideas” reports to expand them to 500–2,000 candidate keywords.
  2. Competitor gap: Pull competitor organic keyword lists. Identify keywords that competitors rank for in positions 1–20 where the site ranks below position 50 or does not rank at all.
  3. PAA extraction: For each seed topic, extract the People Also Ask questions from the SERP. These questions are already validated by real search behavior, and LLM-style queries tend to mirror their phrasing.
  4. Reddit/Quora mining: Search Reddit and Quora for the target topics. Extract phrasing patterns that reveal real user language; these are often more AEO-relevant than the output of formal keyword tools.
  5. AEO query simulation: For each topic cluster, send 3 representative queries to the Claude API and analyze each response. What does the AI answer? Which sources does it cite? What content gap exists that the site could fill?
  6. Intent classification: Classify each keyword as informational, navigational, commercial, or transactional. Use the Claude API for batch classification.
  7. Scoring: Apply the scoring model from 02-Target-Definition: volume score × difficulty score × intent match × AEO potential.
  8. Cluster formation: Group keywords into topic clusters mapped to content pillars (CP1–CP5 from the target definition). Each cluster gets a primary keyword (the highest-scoring one) and a set of supporting keywords.
  9. Output: Generate the keyword_topic_map JSON.
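Step 2's gap rule becomes a simple filter once rankings are in hand. A sketch, assuming competitor data has already been reduced to one best position per keyword (for example from an Ahrefs or SEMrush organic keywords export); `competitor_gap` and the shape of its inputs are hypothetical.

```python
from typing import Dict, List, Optional


def competitor_gap(
    competitor_positions: Dict[str, int],
    site_positions: Dict[str, int],
    competitor_max_pos: int = 20,
    site_min_pos: int = 50,
) -> List[str]:
    """Keywords a competitor ranks 1-20 for, where the site ranks below 50 or not at all."""
    gaps = []
    for keyword, comp_pos in competitor_positions.items():
        # Only consider keywords where a competitor is on page 1 or 2.
        if not 1 <= comp_pos <= competitor_max_pos:
            continue
        site_pos: Optional[int] = site_positions.get(keyword)
        # A gap: the site does not rank, or ranks worse than position 50.
        if site_pos is None or site_pos > site_min_pos:
            gaps.append(keyword)
    return sorted(gaps)
```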
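For step 6, batch classification comes down to a prompt plus defensive parsing. A sketch, assuming the model is asked to reply with a JSON object that maps each keyword to an intent. `send` stands in for a hypothetical wrapper around the Claude Messages API (called with temperature=0, per the error-handling rules) and is injected so the example runs offline. Labels outside the four intents, missing keywords, and unparseable replies fall back to "informational", mirroring the fallback in the error rules.

```python
import json
from typing import Callable, Dict, List

INTENTS = {"informational", "navigational", "commercial", "transactional"}
DEFAULT_INTENT = "informational"  # fallback taken from the error-handling rules


def build_intent_prompt(keywords: List[str]) -> str:
    """Hypothetical prompt asking for a JSON object keyword -> intent."""
    lines = "\n".join(f"- {kw}" for kw in keywords)
    return (
        "Classify the search intent of each keyword as one of: "
        "informational, navigational, commercial, transactional.\n"
        "Reply with only a JSON object mapping each keyword to its intent.\n"
        f"Keywords:\n{lines}"
    )


def parse_intents(reply: str, keywords: List[str]) -> Dict[str, str]:
    """Parse the model reply; anything unusable falls back to DEFAULT_INTENT."""
    try:
        raw = json.loads(reply)
    except json.JSONDecodeError:
        raw = {}
    if not isinstance(raw, dict):
        raw = {}
    result = {}
    for kw in keywords:
        label = str(raw.get(kw, "")).strip().lower()
        result[kw] = label if label in INTENTS else DEFAULT_INTENT
    return result


def classify_batch(keywords: List[str], send: Callable[[str], str]) -> Dict[str, str]:
    # `send` is a hypothetical prompt -> reply-text wrapper around the Claude API.
    return parse_intents(send(build_intent_prompt(keywords)), keywords)
```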
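Steps 7 and 8 can be sketched together. The real scoring model lives in 02-Target-Definition and is not reproduced here, so this sketch assumes each factor is already normalized to 0..1 (with a higher difficulty score meaning easier to rank) and scales the product to 0..100. The `Keyword` fields, the helper names, and the grouping key of pillar + topic + intent (taken from the flowchart) are illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Keyword:
    text: str
    pillar: str              # content pillar, e.g. "CP1"
    topic: str               # topic label assigned upstream (assumed)
    intent: str              # result of intent classification
    volume_score: float      # assumed normalized to 0..1 by the scoring model
    difficulty_score: float  # assumed 0..1, higher = easier to rank
    intent_match: float      # assumed 0..1
    aeo_potential: float     # assumed 0..1


def score(kw: Keyword) -> float:
    """Step 7: multiply the four factors; the 0-100 scaling is an assumption."""
    return round(100 * kw.volume_score * kw.difficulty_score * kw.intent_match * kw.aeo_potential, 1)


def form_clusters(keywords: List[Keyword]) -> List[Dict]:
    """Step 8: group by pillar + topic + intent; the top-scoring keyword becomes primary."""
    groups: Dict[Tuple[str, str, str], List[Keyword]] = defaultdict(list)
    for kw in keywords:
        groups[(kw.pillar, kw.topic, kw.intent)].append(kw)

    clusters = []
    for (pillar, topic, intent), members in sorted(groups.items(), key=lambda item: item[0]):
        ranked = sorted(members, key=score, reverse=True)
        clusters.append({
            "content_pillar": pillar,
            "topic": topic,
            "intent": intent,
            "primary_keyword": ranked[0].text,
            "primary_keyword_score": score(ranked[0]),
            "supporting_keywords": [k.text for k in ranked[1:]],
        })
    return clusters
```

Because the score is a product, a keyword that scores near zero on any single factor (for example, no AEO potential) drops out of contention for primary keyword regardless of its volume.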

📤 Output Schema

{
  "research_id": "research_2026-04-02",
  "generated_at": "2026-04-02T14:00:00Z",
  "total_keywords_evaluated": 1243,
  "clusters_formed": 18,
  "keyword_clusters": [
    {
      "cluster_id": "C001",
      "content_pillar": "CP1",
      "primary_keyword": "business as code",
      "primary_keyword_volume": 1200,
      "primary_keyword_difficulty": 28,
      "primary_keyword_score": 82,
      "intent": "informational",
      "aeo_potential": "high",
      "supporting_keywords": [
        "what is business as code",
        "business as code methodology",
        "codify business processes"
      ],
      "paa_questions": [
        "What does Business as Code mean?",
        "How is Business as Code different from BPM?",
        "Can AI agents run business processes?"
      ],
      "aeo_query_variants": [
        "Explain Business as Code",
        "What is the Business as Code approach to automation?",
        "How do you make a business process AI-executable?"
      ],
      "aeo_gap_analysis": "Current LLM answers conflate BaC with traditional BPM. Opportunity: create definitive, quotable definition + comparison content.",
      "recommended_content_type": "pillar_page",
      "recommended_word_count": 3000,
      "priority": "high"
    }
  ],
  "content_gap_summary": {
    "missing_pillar_pages": 3,
    "pages_needing_update": 7,
    "new_cluster_pages_needed": 12,
    "quick_wins": [
      "Add FAQ schema to existing /use-cases pages",
      "Create 'What is BaC?' definitional page — currently not indexed"
    ]
  },
  "aeo_landscape": {
    "queries_tested_with_llm": 54,
    "queries_where_site_was_cited": 3,
    "top_cited_competitors": ["zapier.com", "processst.com"],
    "citation_gap_queries": ["business as code definition", "AI agent for operations"]
  }
}
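The aeo_landscape block can be derived mechanically from the step 5 simulation results. A sketch, assuming each simulated query has already been reduced to the list of domains the answer cited (extracting those domains from Claude's response is out of scope here). `summarize_aeo` is hypothetical, and treating every uncited query as a citation gap is a simplification, since the example above lists only a curated subset.

```python
from collections import Counter
from typing import Dict, List


def summarize_aeo(results: List[Dict], site_domain: str, top_n: int = 2) -> Dict:
    """Build the aeo_landscape block from per-query simulation results.

    Each result is assumed to look like {"query": str, "cited_domains": [str, ...]}.
    """
    competitor_counts: Counter = Counter()
    for r in results:
        # Count each domain at most once per query; dict.fromkeys keeps order.
        for domain in dict.fromkeys(r["cited_domains"]):
            if domain != site_domain:
                competitor_counts[domain] += 1
    return {
        "queries_tested_with_llm": len(results),
        "queries_where_site_was_cited": sum(site_domain in r["cited_domains"] for r in results),
        "top_cited_competitors": [d for d, _ in competitor_counts.most_common(top_n)],
        "citation_gap_queries": [r["query"] for r in results if site_domain not in r["cited_domains"]],
    }
```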

📊 KPIs

| Metric | Target | Alert condition |
|---|---|---|
| Keywords evaluated | ≥ 500 | < 200 → expand seed list |
| Clusters formed | 10–30 | < 5 → check API connectivity |
| AEO query variants per cluster | ≥ 3 | < 2 → flag cluster |
| High-priority clusters identified | ≥ 5 | 0 → review scoring model |
| AEO gap analysis complete | All clusters | Missing → flag |
| Runtime | < 3 hours | > 6 hours → alert |
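The alert conditions in the table map directly onto fields of the keyword_topic_map shown in the output schema. A minimal sketch; `kpi_alerts` is hypothetical, runtime is assumed to be measured by the caller, and the high-priority check fires when no cluster has `priority` set to "high".

```python
from typing import Dict, List


def kpi_alerts(report: Dict, runtime_hours: float) -> List[str]:
    """Return alert messages for a keyword_topic_map report (empty list = healthy)."""
    alerts = []
    if report["total_keywords_evaluated"] < 200:
        alerts.append("keywords evaluated < 200: expand seed list")
    if report["clusters_formed"] < 5:
        alerts.append("clusters formed < 5: check API connectivity")
    clusters = report["keyword_clusters"]
    for c in clusters:
        # Per-cluster checks: AEO variants and gap analysis.
        if len(c.get("aeo_query_variants", [])) < 2:
            alerts.append(f"{c['cluster_id']}: fewer than 2 AEO query variants")
        if not c.get("aeo_gap_analysis"):
            alerts.append(f"{c['cluster_id']}: AEO gap analysis missing")
    if not any(c.get("priority") == "high" for c in clusters):
        alerts.append("no high-priority clusters: review scoring model")
    if runtime_hours > 6:
        alerts.append("runtime > 6 hours")
    return alerts
```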

⚠️ Error Handling

error_rules:
  - condition: Ahrefs / SEMrush API quota exhausted
    action: Switch to the other provider for remaining requests
    fallback: Use Google Keyword Planner as backup for volume data
 
  - condition: PAA scraper blocked (CAPTCHA or rate limit)
    action: Reduce request rate to 1 req/10 sec, add random delays
    fallback: Use pre-cached PAA data from last run if < 30 days old
 
  - condition: Claude API returns inconsistent intent classifications
    action: Re-run classification with temperature=0 for consistency
    fallback: Default to "informational" for ambiguous keywords
 
  - condition: AEO simulation returns no usable gap analysis
    action: Retry with more specific query variants
    fallback: Flag cluster for human review in Strategy Gate
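Two of these rules, provider fallback and the PAA cache window, can be sketched in a few functions. The exception classes, helper names, and the 0–5 s jitter range are hypothetical; the 10-second spacing and the 30-day cache limit come from the rules above. `sleep` and `rng` are injectable so the retry path can be tested without actually waiting.

```python
import random
import time
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Tuple


class QuotaExhausted(Exception):
    """Hypothetical: raised by a provider client when its API quota is used up."""


class ScraperBlocked(Exception):
    """Hypothetical: raised when the PAA scraper hits a CAPTCHA or rate limit."""


def fetch_volume(keyword: str, providers: List[Tuple[str, Callable[[str], int]]]) -> Tuple[str, int]:
    """Try providers in order, e.g. Ahrefs, then SEMrush, then Google Keyword Planner."""
    for name, fetch in providers:
        try:
            return name, fetch(keyword)
        except QuotaExhausted:
            continue  # quota gone: move on to the next provider
    raise RuntimeError(f"all volume providers exhausted for {keyword!r}")


MAX_CACHE_AGE = timedelta(days=30)


def polite_delay(rng: random.Random) -> float:
    """1 request per 10 s plus 0-5 s of random jitter (the jitter range is an assumption)."""
    return 10.0 + rng.uniform(0.0, 5.0)


def paa_questions(
    topic: str,
    scrape: Callable[[str], List[str]],
    cache: Dict[str, Tuple[datetime, List[str]]],
    now: datetime,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Optional[List[str]]:
    """Scrape PAA questions; if blocked, slow down and retry once, then use a fresh cache."""
    if rng is None:
        rng = random.Random()
    for attempt in range(2):
        try:
            return scrape(topic)
        except ScraperBlocked:
            if attempt == 0:
                sleep(polite_delay(rng))
    cached = cache.get(topic)
    # Cached PAA data is only usable if it is less than 30 days old.
    if cached is not None and now - cached[0] < MAX_CACHE_AGE:
        return cached[1]
    return None  # no usable data: the caller should flag this topic
```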