πŸ” Site Audit Agent

BaC Principle

β€œYou cannot optimize what you haven’t measured. Before any agent writes a word or changes a tag, the audit agent takes a complete snapshot of reality.” β€” Business as Code Manifesto

Stage: S1 β€” Site Audit & Baseline
Type: AUTO (~95% automated)
Trigger: New target definition created, or monthly scheduled run
Output feeds: [[04-Keyword-and-Topic-Research-Agent]], [[07-Technical-SEO-Agent]]

See Map of Content for full vault navigation.


🎯 Objective

Produce a complete technical health snapshot and content baseline for the target domain before any optimization begins. This report is the source of truth for all subsequent agents in the pipeline β€” every optimization decision traces back to this audit.


πŸ› οΈ Tools & Integrations

tools:
  - name: Screaming Frog SEO Spider
    used_for: Full site crawl β€” URLs, status codes, meta tags, headings, internal links
    api: CLI / scheduled crawl export
 
  - name: Google Search Console API
    used_for: Indexing status, search performance (impressions, clicks, CTR, position)
    api: REST API (site:domain filter)
 
  - name: PageSpeed Insights API
    used_for: Core Web Vitals per page β€” LCP, CLS, INP (successor to FID), TTFB
    api: REST API (strategy: mobile + desktop)
 
  - name: Ahrefs / SEMrush API
    used_for: Domain authority, backlink profile, top organic pages, existing keyword rankings
    api: REST API
 
  - name: Claude API
    used_for: Classify issues by priority, generate remediation summaries
    api: Messages API
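The HTTP tools above can be driven with thin request builders. A minimal sketch for the PageSpeed Insights call, assuming the public v5 `runPagespeed` endpoint; the helper name and the `API_KEY` placeholder are illustrative, not part of the agent spec:

```python
from urllib.parse import urlencode

PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"

def psi_request_url(page_url: str, strategy: str, api_key: str) -> str:
    """Build one PageSpeed Insights v5 request URL (one per page per strategy)."""
    if strategy not in ("mobile", "desktop"):
        raise ValueError(f"unknown strategy: {strategy}")
    query = urlencode({"url": page_url, "strategy": strategy, "key": api_key})
    return f"{PSI_ENDPOINT}?{query}"

# The audit samples the top 20 pages; shown here for a single page.
urls = [psi_request_url("https://businessascode.co/", s, "API_KEY")
        for s in ("mobile", "desktop")]
```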

⚑ Execution Steps

flowchart TD
    A["Load target_definition<br/>from [[02-Target-Definition]]"]
    B["Crawl site<br/>Screaming Frog: all URLs"]
    C["Pull GSC data<br/>Last 90 days performance"]
    D["Run PageSpeed audit<br/>Sample top 20 pages by traffic"]
    E["Pull Ahrefs profile<br/>DA, backlinks, top pages"]
    F["Classify issues by priority<br/>Critical / High / Medium / Low"]
    G["Score all pages<br/>content_quality_score + aeo_readiness_score"]
    H["Generate audit_report<br/>JSON output"]

    A --> B --> C --> D --> E --> F --> G --> H
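The flow above is strictly sequential: each stage reads the context accumulated by the stages before it. A minimal orchestration sketch; the stage names and lambdas are stand-ins, not the agent's actual implementation:

```python
def run_pipeline(target: dict, stages) -> dict:
    """Execute audit stages strictly in order (A -> H); each stage
    sees the context produced by every previous stage."""
    context = {"target": target}
    for name, stage in stages:
        context[name] = stage(context)
    return context

# Dummy stages standing in for crawl, GSC pull, scoring, report generation.
result = run_pipeline(
    {"domain": "businessascode.co"},
    [("crawl", lambda ctx: {"urls_found": 45}),
     ("report", lambda ctx: {"domain": ctx["target"]["domain"]})],
)
```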

Step details:

  1. Crawl β€” Full crawl of target.domain. Capture: URL, HTTP status, title, meta description, H1, word count, canonical, indexable flag, internal link count, page depth.
  2. GSC pull β€” Last 90 days: clicks, impressions, CTR, avg position per URL and per query.
  3. PageSpeed β€” Mobile + desktop CWV for top 20 pages by organic clicks.
  4. Ahrefs pull β€” Domain Rating, total backlinks, top 10 pages by organic traffic, top 10 keywords by ranking position.
  5. Issue classification β€” Agent classifies each crawl finding: critical (blocks indexing), high (hurts rankings), medium (optimization opportunity), low (nice to have).
  6. Page scoring β€” Each page gets:
    • content_quality_score (0–100): length, heading structure, internal links, media
    • aeo_readiness_score (0–100): FAQ presence, answer-first structure, schema markup, definition clarity
  7. Report generation β€” Outputs structured JSON (schema below).
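Step 6's scoring could look like the sketch below for content_quality_score (aeo_readiness_score would follow the same pattern over its own signals). The weights and the `media_count` input are assumptions for illustration, not the agent's actual rubric:

```python
def content_quality_score(page: dict) -> int:
    """Illustrative 0-100 score from length, heading structure,
    internal links, and media. Weights are assumptions, not the rubric."""
    score = 0.0
    score += min(page.get("word_count", 0) / 2000, 1.0) * 40   # length
    score += 20 if page.get("h1") else 0                       # heading structure
    score += min(page.get("internal_links_in", 0) / 10, 1.0) * 20  # inbound links
    score += 20 if page.get("media_count", 0) > 0 else 0       # media present
    return round(score)

page = {"word_count": 2400, "h1": "Cold Email Outreach as Code",
        "internal_links_in": 8, "media_count": 3}
```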

πŸ“€ Output Schema

{
  "audit_id": "audit_2026-04-02_businessascode.co",
  "domain": "businessascode.co",
  "crawled_at": "2026-04-02T10:00:00Z",
  "crawl_stats": {
    "total_urls_found": 45,
    "indexable_pages": 38,
    "non_indexable_pages": 7,
    "broken_links": 3,
    "redirect_chains": 2
  },
  "technical_issues": [
    {
      "issue_id": "issue_001",
      "type": "missing_meta_description",
      "severity": "high",
      "affected_urls": ["https://businessascode.co/blog/post-1"],
      "count": 12,
      "remediation": "Add unique meta descriptions targeting primary keyword + CTA"
    }
  ],
  "pages_inventory": [
    {
      "url": "https://businessascode.co/use-cases/cold-email/",
      "http_status": 200,
      "indexable": true,
      "title": "Cold Email Outreach β€” Business as Code",
      "word_count": 2400,
      "h1": "Cold Email Outreach as Code",
      "internal_links_in": 8,
      "internal_links_out": 12,
      "gsc_clicks_90d": 145,
      "gsc_impressions_90d": 3200,
      "gsc_avg_position": 14.2,
      "content_quality_score": 72,
      "aeo_readiness_score": 45,
      "lcp_mobile_ms": 2100,
      "cls_mobile": 0.04
    }
  ],
  "baseline_metrics": {
    "organic_sessions_30d": 890,
    "total_indexed_pages": 38,
    "avg_serp_position": 18.4,
    "domain_rating": 22,
    "total_backlinks": 87,
    "referring_domains": 31,
    "pages_with_aeo_score_below_50": 28,
    "pages_passing_core_web_vitals": 31
  },
  "priority_actions": [
    {
      "priority": "critical",
      "action": "Fix 3 broken internal links",
      "effort": "low",
      "agent": "TechnicalSEOAgent"
    },
    {
      "priority": "high",
      "action": "Add meta descriptions to 12 pages",
      "effort": "medium",
      "agent": "ContentOptimizationEngine"
    },
    {
      "priority": "high",
      "action": "Improve AEO readiness on 28 pages scoring < 50",
      "effort": "high",
      "agent": "AEOStructuringAgent"
    }
  ]
}
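Downstream agents consuming this report may want to validate the payload before acting on it. A minimal structural check, assuming only the top-level keys shown above; the function name is illustrative:

```python
REQUIRED_KEYS = {"audit_id", "domain", "crawled_at", "crawl_stats",
                 "technical_issues", "pages_inventory",
                 "baseline_metrics", "priority_actions"}

def validate_audit_report(report: dict) -> list:
    """Return the sorted list of missing top-level keys (empty = structurally valid)."""
    return sorted(REQUIRED_KEYS - report.keys())
```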

πŸ“Š KPIs

| Metric | Target | Alert Condition |
| --- | --- | --- |
| Crawl completion rate | 100% | < 95% β†’ re-run |
| Issues classified | All findings categorized | Unclassified issues > 0 |
| Baseline captured | All 5 data sources pulled | Any source unavailable β†’ flag |
| Page scoring complete | All indexable pages scored | Any page missing scores β†’ flag |
| Audit runtime | < 2 hours | > 4 hours β†’ alert CMO |
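The alert conditions in the table reduce to a small predicate. A sketch with the thresholds taken directly from the table; the alert strings are illustrative:

```python
def kpi_alerts(crawl_completion: float, unclassified_issues: int,
               sources_pulled: int, runtime_hours: float) -> list:
    """Evaluate the alert conditions from the KPI table above."""
    alerts = []
    if crawl_completion < 0.95:       # crawl completion rate below 95%
        alerts.append("re-run crawl")
    if unclassified_issues > 0:       # every finding must be categorized
        alerts.append("unclassified issues")
    if sources_pulled < 5:            # all 5 data sources expected
        alerts.append("flag missing data source")
    if runtime_hours > 4:             # runtime target < 2 h, hard alert at 4 h
        alerts.append("alert CMO")
    return alerts
```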

⚠️ Error Handling

error_rules:
  - condition: Screaming Frog crawl times out or fails
    action: Retry once with reduced crawl speed (2 req/sec)
    fallback: Flag to human β€” run crawl manually, upload CSV
 
  - condition: GSC API returns no data (new site or auth error)
    action: Log as "no GSC history" in baseline_metrics
    fallback: Use Ahrefs estimated traffic as proxy
 
  - condition: PageSpeed API returns errors for > 50% of URLs
    action: Flag pages as "CWV_unknown" β€” skip in this cycle
    fallback: Manual PageSpeed check on top 5 pages
 
  - condition: Ahrefs / SEMrush API rate limit hit
    action: Queue remaining requests, retry after 1 hour
    fallback: Use cached data from last run if < 14 days old
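The error_rules share one shape: try, retry, then fall back. A generic wrapper sketch; the failing GSC fetch and the Ahrefs-proxy fallback below are stand-ins, not real client code:

```python
def call_with_fallback(fetch, fallback, retries=1):
    """Error-rule pattern: try, retry up to `retries` times, then fall back."""
    for _ in range(retries + 1):
        try:
            return fetch()
        except Exception:
            continue
    return fallback()

def _gsc_fetch():  # stand-in for a GSC pull hitting an auth error
    raise RuntimeError("GSC auth error")

# A failing primary source falls over to the Ahrefs estimated-traffic proxy.
gsc_or_proxy = call_with_fallback(_gsc_fetch, lambda: {"source": "ahrefs_estimated"})
```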