🔍 Site Audit Agent
BaC Principle
> "You cannot optimize what you haven't measured. Before any agent writes a word or changes a tag, the audit agent takes a complete snapshot of reality." – Business as Code Manifesto
Stage: S1 – Site Audit & Baseline
Type: AUTO (~95% automated)
Trigger: New target definition created, or monthly scheduled run
Output feeds: [[04-Keyword-and-Topic-Research-Agent]], [[07-Technical-SEO-Agent]]
See Map of Content for full vault navigation.
🎯 Objective
Produce a complete technical health snapshot and content baseline for the target domain before any optimization begins. This report is the source of truth for all subsequent agents in the pipeline: every optimization decision traces back to this audit.
🛠️ Tools & Integrations
```yaml
tools:
  - name: Screaming Frog SEO Spider
    used_for: Full site crawl – URLs, status codes, meta tags, headings, internal links
    api: CLI / scheduled crawl export
  - name: Google Search Console API
    used_for: Indexing status, search performance (impressions, clicks, CTR, position)
    api: REST API (site:domain filter)
  - name: PageSpeed Insights API
    used_for: Core Web Vitals per page – LCP, CLS, FID/INP, TTFB
    api: REST API (strategy: mobile + desktop)
  - name: Ahrefs / SEMrush API
    used_for: Domain authority, backlink profile, top organic pages, existing keyword rankings
    api: REST API
  - name: Claude API
    used_for: Classify issues by priority, generate remediation summaries
    api: Messages API
```

⚡ Execution Steps
```mermaid
flowchart TD
    A["Load target_definition<br/>from [[02-Target-Definition]]"]
    B["Crawl site<br/>Screaming Frog: all URLs"]
    C["Pull GSC data<br/>Last 90 days performance"]
    D["Run PageSpeed audit<br/>Sample top 20 pages by traffic"]
    E["Pull Ahrefs profile<br/>DA, backlinks, top pages"]
    F["Classify issues by priority<br/>Critical / High / Medium / Low"]
    G["Score all pages<br/>content_quality_score + aeo_readiness_score"]
    H["Generate audit_report<br/>JSON output"]
    A --> B --> C --> D --> E --> F --> G --> H
```
Step details:
- Crawl – Full crawl of `target.domain`. Capture: URL, HTTP status, title, meta description, H1, word count, canonical, indexable flag, internal link count, page depth.
- GSC pull – Last 90 days: clicks, impressions, CTR, avg position per URL and per query.
- PageSpeed – Mobile + desktop CWV for top 20 pages by organic clicks.
- Ahrefs pull – Domain Rating, total backlinks, top 10 pages by organic traffic, top 10 keywords by ranking position.
- Issue classification – Agent classifies each crawl finding: critical (blocks indexing), high (hurts rankings), medium (optimization opportunity), low (nice to have).
- Page scoring – Each page gets:
  - `content_quality_score` (0–100): length, heading structure, internal links, media
  - `aeo_readiness_score` (0–100): FAQ presence, answer-first structure, schema markup, definition clarity
- Report generation – Outputs structured JSON (schema below).
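The two page scores can be sketched as simple weighted heuristics. This is an illustrative sketch only: the spec lists the input signals for each score but not the weights or caps, so the numbers below are assumptions.

```python
def content_quality_score(word_count, heading_count, internal_links, media_count):
    """Heuristic 0-100 content score; weights and caps are illustrative assumptions."""
    score = 0.0
    score += min(word_count / 2000, 1.0) * 40    # length, capped around 2000 words
    score += min(heading_count / 6, 1.0) * 20    # heading structure
    score += min(internal_links / 10, 1.0) * 25  # internal linking
    score += min(media_count / 3, 1.0) * 15      # images / embedded media
    return round(score)

def aeo_readiness_score(has_faq, answer_first, has_schema, has_definitions):
    """Heuristic 0-100 AEO score from the four signals named in the spec."""
    weights = [30, 30, 25, 15]  # FAQ, answer-first structure, schema markup, definition clarity
    signals = [has_faq, answer_first, has_schema, has_definitions]
    return sum(w for w, present in zip(weights, signals) if present)
```

With these assumed weights, the example page in the output schema below (2400 words, 8 inbound links) would land in the 90s on content quality, which is why real weights should be tuned against pages whose quality is already known.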
📤 Output Schema
```json
{
  "audit_id": "audit_2026-04-02_businessascode.co",
  "domain": "businessascode.co",
  "crawled_at": "2026-04-02T10:00:00Z",
  "crawl_stats": {
    "total_urls_found": 45,
    "indexable_pages": 38,
    "non_indexable_pages": 7,
    "broken_links": 3,
    "redirect_chains": 2
  },
  "technical_issues": [
    {
      "issue_id": "issue_001",
      "type": "missing_meta_description",
      "severity": "high",
      "affected_urls": ["https://businessascode.co/blog/post-1"],
      "count": 12,
      "remediation": "Add unique meta descriptions targeting primary keyword + CTA"
    }
  ],
  "pages_inventory": [
    {
      "url": "https://businessascode.co/use-cases/cold-email/",
      "http_status": 200,
      "indexable": true,
      "title": "Cold Email Outreach – Business as Code",
      "word_count": 2400,
      "h1": "Cold Email Outreach as Code",
      "internal_links_in": 8,
      "internal_links_out": 12,
      "gsc_clicks_90d": 145,
      "gsc_impressions_90d": 3200,
      "gsc_avg_position": 14.2,
      "content_quality_score": 72,
      "aeo_readiness_score": 45,
      "lcp_mobile_ms": 2100,
      "cls_mobile": 0.04
    }
  ],
  "baseline_metrics": {
    "organic_sessions_30d": 890,
    "total_indexed_pages": 38,
    "avg_serp_position": 18.4,
    "domain_rating": 22,
    "total_backlinks": 87,
    "referring_domains": 31,
    "pages_with_aeo_score_below_50": 28,
    "pages_passing_core_web_vitals": 31
  },
  "priority_actions": [
    {
      "priority": "critical",
      "action": "Fix 3 broken internal links",
      "effort": "low",
      "agent": "TechnicalSEOAgent"
    },
    {
      "priority": "high",
      "action": "Add meta descriptions to 12 pages",
      "effort": "medium",
      "agent": "ContentOptimizationEngine"
    },
    {
      "priority": "high",
      "action": "Improve AEO readiness on 28 pages scoring < 50",
      "effort": "high",
      "agent": "AEOStructuringAgent"
    }
  ]
}
```

📊 KPIs
| Metric | Target | Alert Condition |
|---|---|---|
| Crawl completion rate | 100% | < 95% → re-run |
| Issues classified | All findings categorized | Unclassified issues > 0 |
| Baseline captured | All 5 data sources pulled | Any source unavailable → flag |
| Page scoring complete | All indexable pages scored | Any page missing scores → flag |
| Audit runtime | < 2 hours | > 4 hours → alert CMO |
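The alert conditions in the table can be checked mechanically at the end of each run. A minimal sketch, assuming the runner tracks these four values itself (none of them are fields in the output schema):

```python
def kpi_alerts(crawl_completion, unclassified_issues, sources_pulled, runtime_hours):
    """Return the alert actions triggered by the KPI table's conditions."""
    alerts = []
    if crawl_completion < 0.95:          # crawl completion rate below 95%
        alerts.append("re-run crawl")
    if unclassified_issues > 0:          # every finding must be categorized
        alerts.append("flag unclassified issues")
    if sources_pulled < 5:               # all 5 data sources must be pulled
        alerts.append("flag unavailable data source")
    if runtime_hours > 4:                # runtime hard limit
        alerts.append("alert CMO")
    return alerts
```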
⚠️ Error Handling
```yaml
error_rules:
  - condition: Screaming Frog crawl times out or fails
    action: Retry once with reduced crawl speed (2 req/sec)
    fallback: Flag to human → run crawl manually, upload CSV
  - condition: GSC API returns no data (new site or auth error)
    action: Log as "no GSC history" in baseline_metrics
    fallback: Use Ahrefs estimated traffic as proxy
  - condition: PageSpeed API returns errors for > 50% of URLs
    action: Flag pages as "CWV_unknown" → skip in this cycle
    fallback: Manual PageSpeed check on top 5 pages
  - condition: Ahrefs / SEMrush API rate limit hit
    action: Queue remaining requests, retry after 1 hour
    fallback: Use cached data from last run if < 14 days old
```

🔗 Related Files
- [[02-Target-Definition]] – Provides domain + scope for this agent
- [[01-Process-Manifest]] – S1 stage definition
- [[04-Keyword-and-Topic-Research-Agent]] – Receives `audit_report` as input
- [[07-Technical-SEO-Agent]] – Receives `technical_issues` from this report
- [[10-Metrics-and-Self-Improvement]] – Baseline metrics feed the improvement loop
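Since two of these files consume specific sections of the report, a downstream dispatcher might route `priority_actions` to their owning agents like this. A minimal sketch against the output schema above; `report` stands in for the parsed JSON:

```python
def route_priority_actions(report):
    """Group each priority action under the agent named in its 'agent' field."""
    queues = {}
    for action in report["priority_actions"]:
        queues.setdefault(action["agent"], []).append(action["action"])
    return queues
```

Because each action already names its responsible agent, routing needs no extra mapping table; adding a new agent to the pipeline only requires emitting its name in the audit output.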