AuditForge Architecture

At a glance

AuditForge is a corpus-agnostic deep-audit engine. It runs against a customer's document corpus (contracts, policies, SOPs, attestations, training records) and produces a partner-billable deliverable: a set of evidence-anchored findings with severity ratings, root-cause framing, remediation scopes, systemic patterns, and an executive summary.

The system is a structured first-pass review for a senior partner, not an autonomous decision-maker. Every finding is reviewed by a partner before the deliverable goes to the end client. The deliverable carries the audit firm's brand, not AuditForge's.

                   ┌─────────────────────────────────────────────────────────────┐
                   │                       Web Frontend                          │
                   │   (React SPA at /?view=auditforge — landing + dashboard)    │
                   └───────────────────────────┬─────────────────────────────────┘
                                               │  x-admin-token gated
                   ┌─────────────────────────────────────────────────────────────┐
                   │                FastAPI app/main.py                          │
                   │   (auditforge_router under /auditforge/*)                   │
                   └───────────────────────────┬─────────────────────────────────┘
                                  ┌─────────────────────┐
                                  │   run_audit()       │
                                  │   pipeline driver   │
                                  └──────────┬──────────┘
            ┌────────────┬───────────┬───────┴───────┬───────────┬──────────────┐
            ▼            ▼           ▼               ▼           ▼              ▼
        ┌───────┐    ┌───────┐   ┌───────┐      ┌───────┐   ┌────────┐    ┌────────┐
        │   A   │ →  │   B   │ → │   C   │  →   │   D   │ → │   E    │ →  │  E.5   │
        │Profile│    │Catalog│   │Synth. │      │Valid. │   │Invest. │    │Consol. │
        └───────┘    └───────┘   └───────┘      └───────┘   └────────┘    └────────┘
                                          ┌────────┐    ┌────────┐    ┌────────┐
                                          │   G    │ ←  │  F.5   │ ←  │   F    │
                                          │ Report │    │ Filter │    │ Deepen │
                                          └────────┘    └────────┘    └────────┘
                                              │              │             │
                                              ▼              ▼             ▼
                                      MD / DOCX / JSON   Reject       Patterns +
                                      (white-label)      appendix     follow-up
                                                                      targets

                                                          loop back to B → ... if
                                                          iterations remain

Pipeline stages

| Stage | What it does | LLM tier | Doc |
| --- | --- | --- | --- |
| A. Profile | Corpus metadata, cluster topology, citation graph, stratified sample | gpt-4o-mini for cluster labels; Sonnet for domain inference | 02-stage-a-profiler.md |
| B. Catalog | Ranked target lists per primitive (concepts, doc-pairs, required-elements, currency rules, defined terms, citation tuples, temporal relations, quantitative facts, obligation checkpoints, ambiguity checkpoints) | Sonnet | 03-stage-b-catalog.md |
| C. Synthesize | (primitive, target) → concrete scoped Question with prompt template | None (deterministic templating) | 04-stage-c-synthesizer.md |
| D. Validate | Cheap relevance check + near-duplicate dedupe; drop questions that get no purchase on the corpus | Embedding only | 05-stage-d-validator.md |
| E. Investigate | Per-question parallel execution: hybrid retrieve → primitive-templated reason → evidence-anchor → adversarial verifier | Sonnet for investigation; Opus for adversarial verification | 06-stage-e-orchestrator.md |
| E.5 Consolidate | Cluster raw findings by root cause; merge into canonical findings; reclassify primitives; apply cross-primitive corroboration boost | Opus | 14-stage-e5-consolidation.md |
| F. Deepen | Cross-finding pattern synthesis; generate follow-up targets for the next iteration | Opus for patterns; Sonnet for follow-up generation | 07-stage-f-deepening.md |
| F.5 Filter | Two-pass classifier: LLM rates findings (definitive / likely / speculative / rejected); upgrade-only override ruleset protects against false negatives | Opus | 15-stage-f5-filter.md |
| G. Report | White-label deliverable rendering: JSON + Markdown + DOCX + methodology, with firm branding applied | Opus for executive summary | 08-stage-g-report.md |
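Stage D's near-duplicate dedupe is embedding-only, so it can be pictured as a greedy cosine-similarity filter. The function names and the 0.92 threshold below are illustrative assumptions, not the shipped implementation:

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def dedupe_questions(questions: list[str],
                     embeddings: list[list[float]],
                     threshold: float = 0.92) -> list[str]:
    """Greedy near-duplicate filter: keep a question only if its
    embedding stays below `threshold` similarity to every kept one."""
    kept: list[str] = []
    kept_vecs: list[list[float]] = []
    for q, vec in zip(questions, embeddings):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(q)
            kept_vecs.append(vec)
    return kept
```

A greedy pass like this is order-dependent (the first of a duplicate pair wins), which is acceptable when the catalog is already ranked by priority.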

Stage iteration

The B → C → D → E → F sequence runs as an iteration loop. Stage F generates follow_up_targets from the canonical findings and patterns; the next iteration's Stage B picks them up and seeds a focused round. Iterations terminate on:

  • max_iterations (default 3) reached
  • Convergence: no new high-confidence findings in the last iteration
  • Budget cap: graceful tier downshift at 80% of budget; abort at 92%

E.5 (consolidate) and F.5 (filter) run once at the end of all iterations, not per iteration. The deliverable renders the consolidated, filtered set.

The ten primitives

Each primitive is a structured way to interrogate a corpus. They are composable: a new audit type = a different combination + weighting of the same primitives.

| Primitive | Pattern |
| --- | --- |
| conflict_check | Two documents take contradictory positions on the same concept |
| consistency_check | A defined term used with different meanings across documents |
| coverage_check | A required document or control category is absent given the frameworks in scope |
| currency_check | A reference to a superseded standard or stale clause language |
| flow_down_check | A parent-contract obligation not propagated into subcontracts or implementing policies |
| citation_integrity_check | Citation to an authority that does not say what the citing document claims |
| temporal_check | A required temporal precedence is violated or unconstrained |
| quantitative_check | A quantitative fact internally inconsistent or violating an external standard |
| obligation_check | A one-sided obligation when bilateral coverage is expected |
| ambiguity_check | Language ambiguous enough to cause downstream dispute |

The primitives' implementations are in app/auditforge/catalog.py (catalog generation) and app/auditforge/orchestrator.py (per-question investigation).
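Because Stage C is deterministic templating, a (primitive, target) pair maps to a concrete question by pure string substitution, with no LLM call. The templates below are hypothetical stand-ins; the real prompt templates in the synthesizer are richer:

```python
# Hypothetical prompt templates keyed by primitive name.
TEMPLATES = {
    "conflict_check": (
        "Do '{doc_a}' and '{doc_b}' take contradictory positions "
        "on the concept '{concept}'?"
    ),
    "currency_check": (
        "Does any document still cite the superseded standard "
        "'{standard}'?"
    ),
}


def synthesize_question(primitive: str, target: dict) -> str:
    """Deterministic Stage C step: template the target's fields
    into the primitive's question template."""
    return TEMPLATES[primitive].format(**target)
```

Determinism here matters for cost governance: Stage C spends no tokens, and identical (primitive, target) pairs always yield identical questions, which makes Stage D's dedupe effective.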

Engagement archetypes

Four pre-tuned configurations for different sales motions:

| Archetype | Catalog weighting | Validator strictness | Deliverable framing |
| --- | --- | --- | --- |
| Capability + Leverage | Balanced | Standard | Comprehensive, junior-staff-readable |
| Remediation Pipeline | Coverage / flow-down / obligation heavy | Standard | Each finding sized as remediation scope |
| Premium / Defensibility | Citation / consistency / conflict heavy | Tight | Evidence-rich, three-corroboration where possible |
| Continuous Monitoring | Currency / temporal heavy | Loose (recall > precision) | Period-over-period delta findings |
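Since a new audit type is just a different combination and weighting of the same ten primitives, an archetype reduces to a small preset. The keys and weight values below are illustrative assumptions, not the shipped configuration:

```python
# Illustrative archetype presets; names and weights are assumptions.
ARCHETYPES = {
    "remediation_pipeline": {
        "primitive_weights": {
            "coverage_check": 2.0,
            "flow_down_check": 2.0,
            "obligation_check": 2.0,
        },
        "validator_strictness": "standard",
    },
    "continuous_monitoring": {
        "primitive_weights": {
            "currency_check": 2.0,
            "temporal_check": 2.0,
        },
        "validator_strictness": "loose",  # recall over precision
    },
}


def weight_for(archetype: str, primitive: str) -> float:
    """Unlisted primitives keep a neutral weight of 1.0."""
    return ARCHETYPES[archetype]["primitive_weights"].get(primitive, 1.0)
```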

Data model

Engagement

Top-level record. Persisted at s3://{bucket}/auditforge/engagements.json.

class AuditEngagement:
    id: str                          # eng-<hex>
    firm_id: str | None              # white-label branding
    client_id: str | None            # Metis tenant whose corpus to audit
    client_name: str
    archetype: ArchetypeKind
    status: EngagementStatus
    intake: IntakeData | None
    cost: CostTelemetry | None
    findings: FindingCounts          # aggregated counts by status + severity
    findings_key: str                # S3 key for the per-engagement findings.json
    created_at, updated_at, started_at, completed_at, delivered_at

Finding

Persisted at s3://{bucket}/{findings_key} (per engagement).

class Finding:
    id: str                          # f-<hex> (raw) or cf-<hex> (canonical)
    engagement_id: str
    primitive: str                   # one of the ten primitives
    severity: Severity               # critical / high / medium / low
    confidence: float                # 0.0-1.0
    description: str
    root_cause: str
    evidence: list[EvidenceCitation] # verbatim quotes anchored to source
    remediation: RemediationFraming  # scope, hours, dependencies, risk
    status: FindingStatus            # pending / accepted / rejected / refined
    auditor_notes: str               # partner's notes on the finding

    # Stage E.5 metadata
    is_canonical: bool               # produced by consolidation
    merged_finding_ids: list[str]    # raw findings that this canonical swallowed
    primitive_angles_agreed: list[str]
    corroboration_score: float

    # Stage F.5 metadata
    filter_status: str               # pending / definitive / likely / speculative / rejected
    filter_rationale: str
    filter_overridden_by: str | None # rule id when override ruleset upgraded

Firm (Phase 2.5 white-label)

Persisted at s3://{bucket}/auditforge/firms.json.

class Firm:
    id: str
    display_name: str
    short_name: str
    tagline: str
    logo_url: str
    primary_color: str
    accent_color: str
    methodology_disclaimer: str
    footer_text: str
    confidentiality_notice: str
    default_archetype: str
    default_budget_cents: int

When an engagement has no firm or its firm was deleted, the deliverable renderer falls back to firm-default (neutral AuditForge branding).

Storage layout

s3://{APP_BUCKET}/
  auditforge/
    engagements.json                              # all engagement records
    firms.json                                    # all firm branding profiles
    engagements/{engagement_id}/findings.json     # per-engagement findings
    deliverables/{engagement_id}.json             # cached JSON deliverable
    deliverables/{engagement_id}.md               # cached Markdown
    deliverables/{engagement_id}.docx             # cached DOCX
    deliverables/{engagement_id}-methodology.md   # cached methodology
    audit_logs/{engagement_id}/shard-NNNNN.jsonl  # per-call audit log shards
  {client_id}/                                    # Metis tenant corpus index
    index/index.faiss
    index/metadata.pkl
    index/bm25.pkl
    index/supersession.json

The shared bucket uses prefix isolation by default. Per-engagement bucket isolation is supported behind the AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET=true feature flag — when enabled, each new engagement gets a dedicated S3 bucket (metis-af-{eng-short}-{account-id}) with versioning, AES-256 encryption, and a strict public-access block. See 20-per-engagement-s3-isolation.md.
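The provisioning behind the feature flag can be pictured as a handful of S3 API calls. This is a hedged sketch, not the shipped code: the exact short form of the engagement id in the bucket name is an assumption (hex suffix truncated to 8 characters), and the boto3 calls show only the three hardening steps named above:

```python
def bucket_name(engagement_id: str, account_id: str) -> str:
    """Build the per-engagement bucket name metis-af-{eng-short}-{account-id}.
    The 8-char truncation of the hex suffix is an assumption."""
    short = engagement_id.removeprefix("eng-")[:8]
    return f"metis-af-{short}-{account_id}"


def provision_bucket(engagement_id: str, account_id: str) -> str:
    """Sketch of per-engagement bucket provisioning: versioning,
    AES-256 default encryption, and a full public-access block."""
    import boto3  # imported lazily so the name helper stays dependency-free

    name = bucket_name(engagement_id, account_id)
    s3 = boto3.client("s3")
    s3.create_bucket(Bucket=name)
    s3.put_bucket_versioning(
        Bucket=name, VersioningConfiguration={"Status": "Enabled"})
    s3.put_bucket_encryption(
        Bucket=name,
        ServerSideEncryptionConfiguration={
            "Rules": [{"ApplyServerSideEncryptionByDefault":
                       {"SSEAlgorithm": "AES256"}}]})
    s3.put_public_access_block(
        Bucket=name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True, "IgnorePublicAcls": True,
            "BlockPublicPolicy": True, "RestrictPublicBuckets": True})
    return name
```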

Cost governance

CostBudget (in app/auditforge/llm.py) tracks cumulative compute spend per engagement against a hard cap.

| Threshold | Behavior |
| --- | --- |
| 0–80% | Normal operation; requested tier always honored |
| 80–92% | Graceful downshift: REASONING_HIGH → REASONING_MID, REASONING_MID → MECHANICAL |
| 92–100% | Strict downshift: REASONING_HIGH and REASONING_MID both pinned to MECHANICAL |
| ≥100% | BudgetExceeded raised; pipeline aborts cleanly with partial deliverable |
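The downshift rules amount to a pure mapping from (requested tier, spend fraction) to the tier actually used. The tier names follow the table above; the function itself is an illustrative sketch of what CostBudget enforces:

```python
class BudgetExceeded(RuntimeError):
    """Raised at >=100% of budget; the pipeline aborts cleanly."""


def effective_tier(requested: str, spend_fraction: float) -> str:
    """Map a requested LLM tier to the tier actually used, given the
    fraction of the engagement budget already spent."""
    if spend_fraction >= 1.0:
        raise BudgetExceeded(f"{spend_fraction:.0%} of budget spent")
    if spend_fraction >= 0.92:
        # Strict downshift: both reasoning tiers pinned to MECHANICAL.
        if requested in ("REASONING_HIGH", "REASONING_MID"):
            return "MECHANICAL"
    elif spend_fraction >= 0.80:
        # Graceful downshift: each reasoning tier drops one step.
        if requested == "REASONING_HIGH":
            return "REASONING_MID"
        if requested == "REASONING_MID":
            return "MECHANICAL"
    return requested
```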

The --max-questions CLI flag (max_questions_per_iteration in the API) places a further cap on the investigate stage to keep cost predictable for synthetic-corpus validation runs.

LLM clients

LLMClient is a multi-provider async wrapper with concurrency cap, retries on transient errors (rate-limit, 5xx, timeout), and per-call cost telemetry.
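The concurrency cap and retry behavior can be sketched with an asyncio.Semaphore and jittered exponential backoff. This is an illustration of the pattern, not LLMClient's actual code; the names and attempt counts are assumptions:

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for rate-limit / 5xx / timeout errors worth retrying."""


async def call_with_retries(fn, *, sem: asyncio.Semaphore,
                            attempts: int = 4, base_delay: float = 0.5):
    """Run one LLM call under a concurrency cap, retrying transient
    failures with jittered exponential backoff."""
    async with sem:
        for attempt in range(attempts):
            try:
                return await fn()
            except TransientError:
                if attempt == attempts - 1:
                    raise  # out of retries; propagate to the caller
                delay = base_delay * (2 ** attempt) * (1 + random.random())
                await asyncio.sleep(delay)
```

Holding the semaphore across retries (rather than per attempt) keeps a flapping provider from multiplying in-flight load.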

| Tier | Model | Used for |
| --- | --- | --- |
| REASONING_HIGH | claude-opus-4-7 | Adversarial verification, consolidation, filter, executive summary, pattern detection |
| REASONING_MID | claude-sonnet-4-6 | Catalog generation, investigation, synthesizer, validator, intake extraction |
| MECHANICAL | gpt-4o-mini | Cluster labeling, simple text extraction, low-stakes parsing |

Anthropic credentials come from ANTHROPIC_API_KEY. OpenAI uses AUDITFORGE_OPENAI_API_KEY (with optional AUDITFORGE_OPENAI_BASE_URL for self-hosted endpoints) so it can be decoupled from the chat path's OPENAI_API_KEY (which may point at Groq for chat).

REST API surface

See api-reference.md for the full endpoint catalog. High-level grouping:

  • Engagements: create, list, get, delete, intake, run, stream, findings, deliverable
  • Findings: accept, reject, refine, edit, investigate-further, search across engagements
  • Firms (white-label): list, get, create, update, delete
  • Intake helper: extract-from-description (AI-assisted intake)

All endpoints are gated by the x-admin-token header. Per-user auth (replacing the shared admin token) is an open hardening item.

Frontend

React SPA, served by FastAPI as static files from frontend/dist. View switching via ?view=auditforge query param (matches PilotForge precedent).

| Component | Purpose |
| --- | --- |
| AuditForgeLanding.tsx | Public marketing page + token gate |
| AuditForge.tsx | Shell with Engagements / Firms tab toggle |
| EngagementList.tsx | Portfolio dashboard: summary cards + filter chips + grid |
| EngagementDetail.tsx | Header + live progress banner + two-column finding list / detail |
| FindingList.tsx | Severity / status / primitive filterable list |
| FindingDetail.tsx | Full finding view + accept/reject/refine/edit + investigate-further |
| FirmManagement.tsx | Firm CRUD with logo / color / disclaimer |
| NewEngagementForm.tsx | Create-engagement form with AI-assisted intake |
| FindingsSearch.tsx | Cross-engagement findings search with filters |
| useEngagementStream.ts | SSE consumer for live progress events |

Deployment

Production runs as a single FastAPI process on AWS ECS Fargate (Graviton2 / ARM64) behind an ALB, with the React SPA served as static files from the same container. ECR holds the image. The ECS service is redeployed with aws ecs update-service --force-new-deployment; CD runs through .github/workflows/deploy.yml.

| Resource | Identifier |
| --- | --- |
| AWS account | 741783034843 |
| Region | us-east-1 |
| ECS cluster / service | metis-demo |
| ECR repo | metis-demo |
| S3 bucket | mobilemetis-metis-indexes-741783034843 |
| Domain | metis-demo.base2ml.com |

See deployment.md for the runbook.