AuditForge Architecture¶
At a glance¶
AuditForge is a corpus-agnostic deep-audit engine. It runs against a customer's document corpus (contracts, policies, SOPs, attestations, training records) and produces a partner-billable deliverable: a set of evidence-anchored findings with severity ratings, root-cause framing, remediation scopes, systemic patterns, and an executive summary.
The system is a structured first-pass review for a senior partner, not an autonomous decision-maker. Every finding is reviewed by a partner before the deliverable goes to the end client. The deliverable carries the audit firm's brand, not AuditForge's.
┌─────────────────────────────────────────────────────────┐
│ Web Frontend                                            │
│ (React SPA at /?view=auditforge — landing + dashboard)  │
└──────────────────────────────────┬──────────────────────┘
                                   │ x-admin-token gated
                                   ▼
┌─────────────────────────────────────────────────────────┐
│ FastAPI app/main.py                                     │
│ (auditforge_router under /auditforge/*)                 │
└──────────────────────────────────┬──────────────────────┘
                                   │
                                   ▼
                       ┌─────────────────────┐
                       │     run_audit()     │
                       │   pipeline driver   │
                       └──────────┬──────────┘
                                  │
    ┌───────────┬───────────┬─────┴─────┬───────────┬────────────┐
    ▼           ▼           ▼           ▼           ▼            ▼
┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐   ┌────────┐   ┌────────┐
│   A   │ → │   B   │ → │   C   │ → │   D   │ → │   E    │ → │  E.5   │
│Profile│   │Catalog│   │Synth. │   │Valid. │   │Invest. │   │Consol. │
└───────┘   └───────┘   └───────┘   └───────┘   └────────┘   └───┬────┘
                                                                 │
                                                                 ▼
                               ┌────────┐     ┌────────┐     ┌────────┐
                               │   G    │  ←  │  F.5   │  ←  │   F    │
                               │ Report │     │ Filter │     │ Deepen │
                               └───┬────┘     └───┬────┘     └───┬────┘
                                   │              │              │
                                   ▼              ▼              ▼
                         MD / DOCX / JSON      Reject        Patterns +
                         (white-label)         appendix      follow-up
                                                              targets
                                                       loop back to B → ... if
                                                       iterations remain
Pipeline stages¶
| Stage | What it does | LLM tier | Doc |
|---|---|---|---|
| A. Profile | Corpus metadata, cluster topology, citation graph, stratified sample | gpt-4o-mini for cluster labels; Sonnet for domain inference | 02-stage-a-profiler.md |
| B. Catalog | Ranked target lists per primitive (concepts, doc-pairs, required-elements, currency rules, defined terms, citation tuples, temporal relations, quantitative facts, obligation checkpoints, ambiguity checkpoints) | Sonnet | 03-stage-b-catalog.md |
| C. Synthesize | (primitive, target) → concrete scoped Question with prompt template | None (deterministic templating) | 04-stage-c-synthesizer.md |
| D. Validate | Cheap relevance check + near-duplicate dedupe; drop questions with no corpus purchase | Embedding only | 05-stage-d-validator.md |
| E. Investigate | Per-question parallel execution: hybrid retrieve → primitive-templated reason → evidence-anchor → adversarial verifier | Sonnet for investigation; Opus for adversarial verification | 06-stage-e-orchestrator.md |
| E.5 Consolidate | Cluster raw findings by root cause; merge into canonical findings; reclassify primitives; apply cross-primitive corroboration boost | Opus | 14-stage-e5-consolidation.md |
| F. Deepen | Cross-finding pattern synthesis; generate follow-up targets for next iteration | Opus for patterns; Sonnet for follow-up generation | 07-stage-f-deepening.md |
| F.5 Filter | Two-pass classifier: LLM rates findings (definitive / likely / speculative / rejected); upgrade-only override ruleset protects against false negatives | Opus | 15-stage-f5-filter.md |
| G. Report | White-label deliverable rendering — JSON + Markdown + DOCX + Methodology, with firm branding applied | Opus for executive summary | 08-stage-g-report.md |
Stage iteration¶
The B → C → D → E → F sequence runs as an iteration loop. Stage F generates follow_up_targets from the canonical findings and patterns; the next iteration's Stage B picks them up and seeds a focused round. Iterations terminate on:
- max_iterations (default 3) reached
- Convergence: no new high-confidence findings in the last iteration
- Budget cap: graceful tier downshift at 80% of budget; abort at 92%
E.5 (consolidate) and F.5 (filter) run once at the end of all iterations, not per iteration. The deliverable renders the consolidated, filtered set.
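For orientation, here is a minimal sketch of that control flow. The stage_* helpers, the config fields, and the budget methods are hypothetical names, not the real run_audit() internals:

```python
def run_iterations(corpus, config, budget):
    """Sketch of the iteration loop described above (hypothetical helper names)."""
    profile = stage_a_profile(corpus)                          # Stage A runs once
    follow_up_targets, raw_findings, patterns = [], [], []

    for _ in range(config.max_iterations):                     # default 3
        catalog = stage_b_catalog(profile, follow_up_targets)  # B seeded by follow-ups
        questions = stage_d_validate(stage_c_synthesize(catalog))
        new_findings = stage_e_investigate(questions, budget)
        raw_findings.extend(new_findings)
        patterns, follow_up_targets = stage_f_deepen(raw_findings)

        # Convergence: no new high-confidence findings in the last iteration
        if not any(f.confidence >= config.high_confidence_floor for f in new_findings):
            break
        # Budget cap: the LLM layer downshifts tiers as spend grows (see Cost governance)
        if budget.exhausted():
            break

    canonical = stage_e5_consolidate(raw_findings)   # E.5 runs once, after all iterations
    kept = stage_f5_filter(canonical)                # F.5 runs once
    return stage_g_report(kept, patterns)            # G renders the deliverable
```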
The ten primitives¶
Each primitive is a structured way to interrogate a corpus. They are composable: a new audit type = a different combination + weighting of the same primitives.
| Primitive | Pattern |
|---|---|
| conflict_check | Two documents take contradictory positions on the same concept |
| consistency_check | A defined term used with different meanings across documents |
| coverage_check | A required document or control category is absent given the frameworks in scope |
| currency_check | A reference to a superseded standard or stale clause language |
| flow_down_check | A parent-contract obligation not propagated into subcontracts or implementing policies |
| citation_integrity_check | Citation to an authority that does not say what the citing document claims |
| temporal_check | A required temporal precedence is violated or unconstrained |
| quantitative_check | A quantitative fact internally inconsistent or violating an external standard |
| obligation_check | A one-sided obligation when bilateral coverage is expected |
| ambiguity_check | Language ambiguous enough to cause downstream dispute |
The primitives' implementations are in app/auditforge/catalog.py (catalog generation) and app/auditforge/orchestrator.py (per-question investigation).
Engagement archetypes¶
Four pre-tuned configurations for different sales motions:
| Archetype | Catalog weighting | Validator strictness | Deliverable framing |
|---|---|---|---|
| Capability + Leverage | Balanced | Standard | Comprehensive, junior-staff-readable |
| Remediation Pipeline | Coverage / flow-down / obligation heavy | Standard | Each finding sized as remediation scope |
| Premium / Defensibility | Citation / consistency / conflict heavy | Tight | Evidence-rich, three-corroboration where possible |
| Continuous Monitoring | Currency / temporal heavy | Loose (recall>precision) | Period-over-period delta findings |
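To make the composability concrete, here is a purely illustrative sketch of an archetype tuning table. The keys, weights, and strictness labels are invented for illustration and are not the shipped configuration:

```python
# Illustrative only: the real tuning lives in the catalog and validator configuration.
ARCHETYPE_TUNING = {
    "remediation_pipeline": {
        "primitive_weights": {"coverage_check": 3.0, "flow_down_check": 3.0, "obligation_check": 2.5},
        "validator_strictness": "standard",
    },
    "premium_defensibility": {
        "primitive_weights": {"citation_integrity_check": 3.0, "consistency_check": 2.5, "conflict_check": 2.5},
        "validator_strictness": "tight",
    },
    "continuous_monitoring": {
        "primitive_weights": {"currency_check": 3.0, "temporal_check": 3.0},
        "validator_strictness": "loose",  # recall over precision
    },
}
```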
Data model¶
Engagement¶
Top-level record. Persisted at s3://{bucket}/auditforge/engagements.json.
class AuditEngagement:
    id: str                      # eng-<hex>
    firm_id: str | None          # white-label branding
    client_id: str | None        # Metis tenant whose corpus to audit
    client_name: str
    archetype: ArchetypeKind
    status: EngagementStatus
    intake: IntakeData | None
    cost: CostTelemetry | None
    findings: FindingCounts      # aggregated counts by status + severity
    findings_key: str            # S3 key for the per-engagement findings.json
    created_at, updated_at, started_at, completed_at, delivered_at   # lifecycle timestamps
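As a rough sketch of how these records can be read and written under the documented key (the helper names and the direct boto3 calls are assumptions about the persistence layer, not its actual API):

```python
import json
import boto3

ENGAGEMENTS_KEY = "auditforge/auditforge/engagements.json".removeprefix("auditforge/")  # documented storage key

def load_engagements(bucket: str) -> list[dict]:
    """Fetch all engagement records from the shared bucket (sketch)."""
    s3 = boto3.client("s3")
    try:
        obj = s3.get_object(Bucket=bucket, Key=ENGAGEMENTS_KEY)
    except s3.exceptions.NoSuchKey:
        return []  # no engagements created yet
    return json.loads(obj["Body"].read())

def save_engagements(bucket: str, engagements: list[dict]) -> None:
    """Write the full engagement list back (sketch; last-writer-wins)."""
    s3 = boto3.client("s3")
    s3.put_object(
        Bucket=bucket,
        Key=ENGAGEMENTS_KEY,
        Body=json.dumps(engagements).encode("utf-8"),
        ContentType="application/json",
    )
```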
Finding¶
Persisted at s3://{bucket}/{findings_key} (per engagement).
class Finding:
    id: str                          # f-<hex> (raw) or cf-<hex> (canonical)
    engagement_id: str
    primitive: str                   # one of the ten primitives
    severity: Severity               # critical / high / medium / low
    confidence: float                # 0.0-1.0
    description: str
    root_cause: str
    evidence: list[EvidenceCitation] # verbatim quotes anchored to source
    remediation: RemediationFraming  # scope, hours, dependencies, risk
    status: FindingStatus            # pending / accepted / rejected / refined
    auditor_notes: str               # partner's notes on the finding

    # Stage E.5 metadata
    is_canonical: bool               # produced by consolidation
    merged_finding_ids: list[str]    # raw findings that this canonical swallowed
    primitive_angles_agreed: list[str]
    corroboration_score: float

    # Stage F.5 metadata
    filter_status: str               # pending / definitive / likely / speculative / rejected
    filter_rationale: str
    filter_overridden_by: str | None # rule id, set when the upgrade-only override ruleset fired
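As an example of how the consolidation and filter metadata compose for partner review, here is a small sketch over the raw findings.json records; the ordering policy is illustrative, not how the UI actually sorts:

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

def partner_review_queue(findings: list[dict]) -> list[dict]:
    """Canonical, non-rejected findings ordered for partner review (sketch)."""
    reviewable = [
        f for f in findings
        if f.get("is_canonical")                    # Stage E.5 canonical findings only
        and f.get("filter_status") != "rejected"    # F.5 rejects go to the reject appendix
    ]
    return sorted(
        reviewable,
        key=lambda f: (
            SEVERITY_ORDER.get(f["severity"], 99),  # critical first
            -f.get("corroboration_score", 0.0),     # stronger corroboration first
            -f.get("confidence", 0.0),
        ),
    )
```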
Firm (Phase 2.5 white-label)¶
Persisted at s3://{bucket}/auditforge/firms.json.
class Firm:
    id: str
    display_name: str
    short_name: str
    tagline: str
    logo_url: str
    primary_color: str
    accent_color: str
    methodology_disclaimer: str
    footer_text: str
    confidentiality_notice: str
    default_archetype: str
    default_budget_cents: int
When an engagement has no firm or its firm was deleted, the deliverable renderer falls back to firm-default (neutral AuditForge branding).
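A sketch of that fallback (hypothetical helper; the neutral defaults shown are placeholders, not the renderer's actual values):

```python
NEUTRAL_BRANDING = {
    "display_name": "AuditForge",   # neutral branding; colors below are placeholders
    "primary_color": "#1f2937",
    "accent_color": "#2563eb",
}

def resolve_branding(engagement: dict, firms: dict[str, dict]) -> dict:
    """Return the firm's branding profile, or fall back to neutral AuditForge branding."""
    firm_id = engagement.get("firm_id")
    if firm_id and firm_id in firms:   # engagement attached to a firm that still exists
        return firms[firm_id]
    return NEUTRAL_BRANDING            # no firm, or the firm record was deleted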
Storage layout¶
s3://{APP_BUCKET}/
  auditforge/
    engagements.json                              # all engagement records
    firms.json                                    # all firm branding profiles
    engagements/{engagement_id}/findings.json     # per-engagement findings
    deliverables/{engagement_id}.json             # cached JSON deliverable
    deliverables/{engagement_id}.md               # cached Markdown
    deliverables/{engagement_id}.docx             # cached DOCX
    deliverables/{engagement_id}-methodology.md   # cached methodology
    audit_logs/{engagement_id}/shard-NNNNN.jsonl  # per-call audit log shards
  {client_id}/                                    # Metis tenant corpus index
    index/index.faiss
    index/metadata.pkl
    index/bm25.pkl
    index/supersession.json
The shared bucket uses prefix isolation by default. Per-engagement bucket isolation is supported behind the AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET=true feature flag — when enabled, each new engagement gets a dedicated S3 bucket (metis-af-{eng-short}-{account-id}) with versioning, AES-256 encryption, and a strict public-access block. See 20-per-engagement-s3-isolation.md.
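A sketch of how the per-engagement bucket name can be derived from that pattern; the helper and the short-id derivation are assumptions:

```python
import os

def engagement_bucket_name(engagement_id: str, account_id: str) -> str | None:
    """Dedicated bucket name when per-engagement isolation is enabled (sketch)."""
    if os.getenv("AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET", "false").lower() != "true":
        return None  # shared bucket with prefix isolation (the default)
    short = engagement_id.removeprefix("eng-")[:8]  # assumed short-id scheme
    return f"metis-af-{short}-{account_id}"         # documented naming pattern
```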
Cost governance¶
CostBudget (in app/auditforge/llm.py) tracks cumulative compute spend per engagement against a hard cap.
| Threshold | Behavior |
|---|---|
| 0–80% | Normal operation, requested tier always honored |
| 80–92% | Graceful downshift: REASONING_HIGH → REASONING_MID, REASONING_MID → MECHANICAL |
| 92–100% | Strict downshift: REASONING_HIGH and REASONING_MID both pinned to MECHANICAL |
| ≥100% | BudgetExceeded raised; pipeline aborts cleanly with partial deliverable |
The --max-questions (CLI) / max_questions_per_iteration (API) flag places a separate cap on the investigate stage to keep cost predictable for synthetic-corpus validation runs.
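A sketch of the threshold behavior in the table above (the Tier names mirror the LLM tiers below; the function is illustrative, not CostBudget's real API):

```python
from enum import Enum

class Tier(str, Enum):
    REASONING_HIGH = "reasoning_high"
    REASONING_MID = "reasoning_mid"
    MECHANICAL = "mechanical"

class BudgetExceeded(RuntimeError):
    pass

def effective_tier(requested: Tier, spent_cents: int, cap_cents: int) -> Tier:
    """Apply the documented downshift thresholds to a requested tier (sketch)."""
    used = spent_cents / cap_cents
    if used >= 1.0:
        raise BudgetExceeded(f"spent {spent_cents} of {cap_cents} cents")
    if used >= 0.92:
        return Tier.MECHANICAL                      # strict downshift: everything pinned
    if used >= 0.80:
        if requested is Tier.REASONING_HIGH:        # graceful downshift
            return Tier.REASONING_MID
        if requested is Tier.REASONING_MID:
            return Tier.MECHANICAL
    return requested                                # under 80%: requested tier honored
```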
LLM clients¶
LLMClient is a multi-provider async wrapper with a concurrency cap, retries on transient errors (rate limits, 5xx, timeouts), and per-call cost telemetry.
| Tier | Model | Used for |
|---|---|---|
| REASONING_HIGH | claude-opus-4-7 | Adversarial verification, consolidation, filter, executive summary, pattern detection |
| REASONING_MID | claude-sonnet-4-6 | Catalog generation, investigation, synthesizer, validator, intake extraction |
| MECHANICAL | gpt-4o-mini | Cluster labeling, simple text extraction, low-stakes parsing |
Anthropic credentials come from ANTHROPIC_API_KEY. OpenAI uses AUDITFORGE_OPENAI_API_KEY (with optional AUDITFORGE_OPENAI_BASE_URL for self-hosted endpoints) so it can be decoupled from the chat path's OPENAI_API_KEY (which may point at Groq for chat).
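A sketch of the tier-to-model routing and credential resolution described above; the dictionary shape and helper are assumptions, and only the model ids and environment variable names come from this section:

```python
import os

# Tier -> (provider, model id), mirroring the table above
TIER_MODELS = {
    "REASONING_HIGH": ("anthropic", "claude-opus-4-7"),
    "REASONING_MID": ("anthropic", "claude-sonnet-4-6"),
    "MECHANICAL": ("openai", "gpt-4o-mini"),
}

def provider_credentials(provider: str) -> dict[str, str | None]:
    """Resolve credentials per the documented environment variables (sketch)."""
    if provider == "anthropic":
        return {"api_key": os.getenv("ANTHROPIC_API_KEY")}
    # OpenAI-compatible calls use AuditForge-specific variables so they stay
    # decoupled from the chat path's OPENAI_API_KEY (which may point at Groq)
    return {
        "api_key": os.getenv("AUDITFORGE_OPENAI_API_KEY"),
        "base_url": os.getenv("AUDITFORGE_OPENAI_BASE_URL"),
    }
```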
REST API surface¶
See api-reference.md for the full endpoint catalog. High-level grouping:
- Engagements: create, list, get, delete, intake, run, stream, findings, deliverable
- Findings: accept, reject, refine, edit, investigate-further, search across engagements
- Firms (white-label): list, get, create, update, delete
- Intake helper: extract-from-description (AI-assisted intake)
All endpoints gated by x-admin-token. Per-user auth (replacing the shared admin token) is an open hardening item.
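A minimal client-side sketch of hitting the API with the admin token; the endpoint path and the ADMIN_TOKEN variable name are assumptions, and api-reference.md has the real routes:

```python
import os
import httpx

BASE_URL = "https://metis-demo.base2ml.com"              # documented domain
HEADERS = {"x-admin-token": os.environ["ADMIN_TOKEN"]}   # shared admin token (env var name assumed)

# List engagements; exact path assumed, the router lives under /auditforge/*
resp = httpx.get(f"{BASE_URL}/auditforge/engagements", headers=HEADERS, timeout=30)
resp.raise_for_status()
for engagement in resp.json():
    print(engagement["id"], engagement.get("status"))
```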
Frontend¶
React SPA, served by FastAPI as static files from frontend/dist. View switching via ?view=auditforge query param (matches PilotForge precedent).
| Component | Purpose |
|---|---|
| AuditForgeLanding.tsx | Public marketing page + token gate |
| AuditForge.tsx | Shell with Engagements / Firms tab toggle |
| EngagementList.tsx | Portfolio dashboard — summary cards + filter chips + grid |
| EngagementDetail.tsx | Header + live progress banner + finding list / detail two-column |
| FindingList.tsx | Severity / status / primitive filterable list |
| FindingDetail.tsx | Full finding view + accept/reject/refine/edit + investigate-further |
| FirmManagement.tsx | Firm CRUD with logo / color / disclaimer |
| NewEngagementForm.tsx | Create-engagement form with AI-assisted intake |
| FindingsSearch.tsx | Cross-engagement findings search with filters |
| useEngagementStream.ts | SSE consumer for live progress events |
Deployment¶
Production runs as a single FastAPI process on AWS ECS Fargate (Graviton2 / ARM64) behind an ALB, with the React SPA served as static files from the same container. ECR holds the image; the ECS service is redeployed via aws ecs update-service --force-new-deployment. CD runs through .github/workflows/deploy.yml.
| Resource | Identifier |
|---|---|
| AWS account | 741783034843 |
| Region | us-east-1 |
| ECS cluster / service | metis-demo |
| ECR repo | metis-demo |
| S3 bucket | mobilemetis-metis-indexes-741783034843 |
| Domain | metis-demo.base2ml.com |
See deployment.md for the runbook.