AuditForge Architecture

At a glance

AuditForge is a corpus-agnostic deep-audit engine. It runs against a customer's document corpus (contracts, policies, SOPs, attestations, training records) and produces a partner-billable deliverable: a set of evidence-anchored findings with severity ratings, root-cause framing, remediation scopes, systemic patterns, and an executive summary.

The system is a structured first-pass review for a senior partner, not an autonomous decision-maker. Every finding is reviewed by a partner before the deliverable goes to the end client. The deliverable carries the audit firm's brand, not AuditForge's.

                   ┌─────────────────────────────────────────────────────────────┐
                   │                       Web Frontend                          │
                   │   (React SPA at /?view=auditforge — landing + dashboard)    │
                   └───────────────────────────┬─────────────────────────────────┘
                                               │  x-admin-token gated
                   ┌─────────────────────────────────────────────────────────────┐
                   │                FastAPI app/main.py                          │
                   │   (auditforge_router under /auditforge/*)                   │
                   └───────────────────────────┬─────────────────────────────────┘
                                  ┌─────────────────────┐
                                  │   run_audit()       │
                                  │   pipeline driver   │
                                  └──────────┬──────────┘
            ┌────────────┬───────────┬───────┴───────┬───────────┬──────────────┐
            ▼            ▼           ▼               ▼           ▼              ▼
        ┌───────┐    ┌───────┐   ┌───────┐      ┌───────┐   ┌────────┐    ┌────────┐
        │   A   │ →  │   B   │ → │   C   │  →   │   D   │ → │   E    │ →  │  E.5   │
        │Profile│    │Catalog│   │Synth. │      │Valid. │   │Invest. │    │Consol. │
        └───────┘    └───────┘   └───────┘      └───────┘   └────────┘    └────────┘
                                          ┌────────┐    ┌────────┐    ┌────────┐
                                          │   G    │ ←  │  F.5   │ ←  │   F    │
                                          │ Report │    │ Filter │    │ Deepen │
                                          └────────┘    └────────┘    └────────┘
                                              │              │             │
                                              ▼              ▼             ▼
                                      MD / DOCX / JSON   Reject       Patterns +
                                      (white-label)      appendix     follow-up
                                                                      targets

                                                          loop back to B → ... if
                                                          iterations remain

Pipeline stages

| Stage | What it does | LLM tier | Doc |
| --- | --- | --- | --- |
| A. Profile | Corpus metadata, cluster topology, citation graph, stratified sample | gpt-4o-mini for cluster labels; Sonnet for domain inference | 02-stage-a-profiler.md |
| B. Catalog | Ranked target lists per primitive (concepts, doc-pairs, required-elements, currency rules, defined terms, citation tuples, temporal relations, quantitative facts, obligation checkpoints, ambiguity checkpoints) | Sonnet | 03-stage-b-catalog.md |
| C. Synthesize | (primitive, target) → concrete scoped Question with prompt template | None (deterministic templating) | 04-stage-c-synthesizer.md |
| D. Validate | Cheap relevance check + near-duplicate dedupe; drop questions that get no purchase on the corpus | Embedding only | 05-stage-d-validator.md |
| E. Investigate | Per-question parallel execution: hybrid retrieve → primitive-templated reason → evidence-anchor → adversarial verifier | Sonnet for investigation; Opus for adversarial verification | 06-stage-e-orchestrator.md |
| E.5 Consolidate | Cluster raw findings by root cause; merge into canonical findings; reclassify primitives; apply cross-primitive corroboration boost | Opus | 14-stage-e5-consolidation.md |
| F. Deepen | Cross-finding pattern synthesis; generate follow-up targets for the next iteration | Opus for patterns; Sonnet for follow-up generation | 07-stage-f-deepening.md |
| F.5 Filter | Two-pass classifier: LLM rates findings (definitive / likely / speculative / rejected); upgrade-only override ruleset protects against false negatives | Opus | 15-stage-f5-filter.md |
| G. Report | White-label deliverable rendering: JSON + Markdown + DOCX + methodology, with firm branding applied | Opus for executive summary | 08-stage-g-report.md |
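Stage D's near-duplicate dedupe is embedding-only, so it can be pictured as a greedy cosine-similarity filter. The function names and the 0.92 threshold below are illustrative assumptions, not the shipped implementation:

```python
from math import sqrt


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0


def dedupe_questions(questions: list[str],
                     embeddings: list[list[float]],
                     threshold: float = 0.92) -> list[str]:
    """Greedy near-duplicate filter: keep a question only if its
    embedding stays below `threshold` similarity to every kept one."""
    kept: list[str] = []
    kept_vecs: list[list[float]] = []
    for q, vec in zip(questions, embeddings):
        if all(cosine(vec, kv) < threshold for kv in kept_vecs):
            kept.append(q)
            kept_vecs.append(vec)
    return kept
```

A greedy pass like this is order-dependent (the first of a duplicate pair wins), which is acceptable when the catalog is already ranked by priority.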

Stage iteration

The B → C → D → E → F sequence runs as an iteration loop. Stage F generates follow_up_targets from the canonical findings and patterns; the next iteration's Stage B picks them up and seeds a focused round. Iterations terminate on:

  • max_iterations (default 3) reached
  • Convergence: no new high-confidence findings in the last iteration
  • Budget cap: graceful tier downshift at 80% of budget; abort at 92%

E.5 (consolidate) and F.5 (filter) run once at the end of all iterations, not per iteration. The deliverable renders the consolidated, filtered set.

The ten primitives

Each primitive is a structured way to interrogate a corpus. They are composable: a new audit type = a different combination + weighting of the same primitives.

| Primitive | Pattern |
| --- | --- |
| conflict_check | Two documents take contradictory positions on the same concept |
| consistency_check | A defined term used with different meanings across documents |
| coverage_check | A required document or control category is absent given the frameworks in scope |
| currency_check | A reference to a superseded standard or stale clause language |
| flow_down_check | A parent-contract obligation not propagated into subcontracts or implementing policies |
| citation_integrity_check | Citation to an authority that does not say what the citing document claims |
| temporal_check | A required temporal precedence is violated or unconstrained |
| quantitative_check | A quantitative fact internally inconsistent or violating an external standard |
| obligation_check | A one-sided obligation when bilateral coverage is expected |
| ambiguity_check | Language ambiguous enough to cause downstream dispute |

The primitives' implementations are in app/auditforge/catalog.py (catalog generation) and app/auditforge/orchestrator.py (per-question investigation).
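Because Stage C is deterministic templating, a (primitive, target) pair maps to a concrete question by pure string substitution, with no LLM call. The templates below are hypothetical stand-ins; the real prompt templates in the synthesizer are richer:

```python
# Hypothetical prompt templates keyed by primitive name.
TEMPLATES = {
    "conflict_check": (
        "Do '{doc_a}' and '{doc_b}' take contradictory positions "
        "on the concept '{concept}'?"
    ),
    "currency_check": (
        "Does any document still cite the superseded standard "
        "'{standard}'?"
    ),
}


def synthesize_question(primitive: str, target: dict) -> str:
    """Deterministic Stage C step: template the target's fields
    into the primitive's question template."""
    return TEMPLATES[primitive].format(**target)
```

Determinism here matters for cost governance: Stage C spends no tokens, and identical (primitive, target) pairs always yield identical questions, which makes Stage D's dedupe effective.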

Engagement archetypes

Four pre-tuned configurations for different sales motions:

| Archetype | Catalog weighting | Validator strictness | Deliverable framing |
| --- | --- | --- | --- |
| Capability + Leverage | Balanced | Standard | Comprehensive, junior-staff-readable |
| Remediation Pipeline | Coverage / flow-down / obligation heavy | Standard | Each finding sized as remediation scope |
| Premium / Defensibility | Citation / consistency / conflict heavy | Tight | Evidence-rich, three-corroboration where possible |
| Continuous Monitoring | Currency / temporal heavy | Loose (recall > precision) | Period-over-period delta findings |
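Since a new audit type is just a different combination and weighting of the same ten primitives, an archetype reduces to a small preset. The keys and weight values below are illustrative assumptions, not the shipped configuration:

```python
# Illustrative archetype presets; names and weights are assumptions.
ARCHETYPES = {
    "remediation_pipeline": {
        "primitive_weights": {
            "coverage_check": 2.0,
            "flow_down_check": 2.0,
            "obligation_check": 2.0,
        },
        "validator_strictness": "standard",
    },
    "continuous_monitoring": {
        "primitive_weights": {
            "currency_check": 2.0,
            "temporal_check": 2.0,
        },
        "validator_strictness": "loose",  # recall over precision
    },
}


def weight_for(archetype: str, primitive: str) -> float:
    """Unlisted primitives keep a neutral weight of 1.0."""
    return ARCHETYPES[archetype]["primitive_weights"].get(primitive, 1.0)
```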

Data model

Engagement

Top-level record. Persisted at s3://{bucket}/auditforge/engagements.json.

class AuditEngagement:
    id: str                          # eng-<hex>
    firm_id: str | None              # white-label branding
    client_id: str | None            # Metis tenant whose corpus to audit
    client_name: str
    archetype: ArchetypeKind
    status: EngagementStatus
    intake: IntakeData | None
    cost: CostTelemetry | None
    findings: FindingCounts          # aggregated counts by status + severity
    findings_key: str                # S3 key for the per-engagement findings.json
    created_at, updated_at, started_at, completed_at, delivered_at

Finding

Persisted at s3://{bucket}/{findings_key} (per engagement).

class Finding:
    id: str                          # f-<hex> (raw) or cf-<hex> (canonical)
    engagement_id: str
    primitive: str                   # one of the ten primitives
    severity: Severity               # critical / high / medium / low
    confidence: float                # 0.0-1.0
    description: str
    root_cause: str
    evidence: list[EvidenceCitation] # verbatim quotes anchored to source
    remediation: RemediationFraming  # scope, hours, dependencies, risk
    status: FindingStatus            # pending / accepted / rejected / refined
    auditor_notes: str               # partner's notes on the finding

    # Stage E.5 metadata
    is_canonical: bool               # produced by consolidation
    merged_finding_ids: list[str]    # raw findings that this canonical swallowed
    primitive_angles_agreed: list[str]
    corroboration_score: float

    # Stage F.5 metadata
    filter_status: str               # pending / definitive / likely / speculative / rejected
    filter_rationale: str
    filter_overridden_by: str | None # rule id when override ruleset upgraded

Firm (Phase 2.5 white-label)

Persisted at s3://{bucket}/auditforge/firms.json.

class Firm:
    id: str
    display_name: str
    short_name: str
    tagline: str
    logo_url: str
    primary_color: str
    accent_color: str
    methodology_disclaimer: str
    footer_text: str
    confidentiality_notice: str
    default_archetype: str
    default_budget_cents: int

When an engagement has no firm or its firm was deleted, the deliverable renderer falls back to firm-default (neutral AuditForge branding).

Storage layout

s3://{APP_BUCKET}/
  auditforge/
    engagements.json                              # all engagement records
    firms.json                                    # all firm branding profiles
    engagements/{engagement_id}/findings.json     # per-engagement findings
    deliverables/{engagement_id}.json             # cached JSON deliverable
    deliverables/{engagement_id}.md               # cached Markdown
    deliverables/{engagement_id}.docx             # cached DOCX
    deliverables/{engagement_id}-methodology.md   # cached methodology
    audit_logs/{engagement_id}/shard-NNNNN.jsonl  # per-call audit log shards
  {client_id}/                                    # Metis tenant corpus index
    index/index.faiss
    index/metadata.pkl
    index/bm25.pkl
    index/supersession.json

The shared bucket uses prefix isolation by default. Per-engagement bucket isolation is supported behind the AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET=true feature flag — when enabled, each new engagement gets a dedicated S3 bucket (metis-af-{eng-short}-{account-id}) with versioning, AES-256 encryption, and a strict public-access block. See 20-per-engagement-s3-isolation.md.
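The provisioning behind the feature flag can be pictured as a handful of S3 API calls. This is a hedged sketch, not the shipped code: the exact short form of the engagement id in the bucket name is an assumption (hex suffix truncated to 8 characters), and the boto3 calls show only the three hardening steps named above:

```python
def bucket_name(engagement_id: str, account_id: str) -> str:
    """Build the per-engagement bucket name metis-af-{eng-short}-{account-id}.
    The 8-char truncation of the hex suffix is an assumption."""
    short = engagement_id.removeprefix("eng-")[:8]
    return f"metis-af-{short}-{account_id}"


def provision_bucket(engagement_id: str, account_id: str) -> str:
    """Sketch of per-engagement bucket provisioning: versioning,
    AES-256 default encryption, and a full public-access block."""
    import boto3  # imported lazily so the name helper stays dependency-free

    name = bucket_name(engagement_id, account_id)
    s3 = boto3.client("s3")
    s3.create_bucket(Bucket=name)
    s3.put_bucket_versioning(
        Bucket=name, VersioningConfiguration={"Status": "Enabled"})
    s3.put_bucket_encryption(
        Bucket=name,
        ServerSideEncryptionConfiguration={
            "Rules": [{"ApplyServerSideEncryptionByDefault":
                       {"SSEAlgorithm": "AES256"}}]})
    s3.put_public_access_block(
        Bucket=name,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True, "IgnorePublicAcls": True,
            "BlockPublicPolicy": True, "RestrictPublicBuckets": True})
    return name
```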

Cost governance

CostBudget (in app/auditforge/llm.py) tracks cumulative compute spend per engagement against a hard cap.

| Threshold | Behavior |
| --- | --- |
| 0–80% | Normal operation; requested tier always honored |
| 80–92% | Graceful downshift: REASONING_HIGH → REASONING_MID, REASONING_MID → MECHANICAL |
| 92–100% | Strict downshift: REASONING_HIGH and REASONING_MID both pinned to MECHANICAL |
| ≥100% | BudgetExceeded raised; pipeline aborts cleanly with partial deliverable |
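The downshift rules amount to a pure mapping from (requested tier, spend fraction) to the tier actually used. The tier names follow the table above; the function itself is an illustrative sketch of what CostBudget enforces:

```python
class BudgetExceeded(RuntimeError):
    """Raised at >=100% of budget; the pipeline aborts cleanly."""


def effective_tier(requested: str, spend_fraction: float) -> str:
    """Map a requested LLM tier to the tier actually used, given the
    fraction of the engagement budget already spent."""
    if spend_fraction >= 1.0:
        raise BudgetExceeded(f"{spend_fraction:.0%} of budget spent")
    if spend_fraction >= 0.92:
        # Strict downshift: both reasoning tiers pinned to MECHANICAL.
        if requested in ("REASONING_HIGH", "REASONING_MID"):
            return "MECHANICAL"
    elif spend_fraction >= 0.80:
        # Graceful downshift: each reasoning tier drops one step.
        if requested == "REASONING_HIGH":
            return "REASONING_MID"
        if requested == "REASONING_MID":
            return "MECHANICAL"
    return requested
```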

The --max-questions CLI flag (max_questions_per_iteration in the API) places a further cap on the investigate stage to keep cost predictable for synthetic-corpus validation runs.

LLM clients

LLMClient is a multi-provider async wrapper with concurrency cap, retries on transient errors (rate-limit, 5xx, timeout), and per-call cost telemetry.
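The concurrency cap and retry behavior can be sketched with an asyncio.Semaphore and jittered exponential backoff. This is an illustration of the pattern, not LLMClient's actual code; the names and attempt counts are assumptions:

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for rate-limit / 5xx / timeout errors worth retrying."""


async def call_with_retries(fn, *, sem: asyncio.Semaphore,
                            attempts: int = 4, base_delay: float = 0.5):
    """Run one LLM call under a concurrency cap, retrying transient
    failures with jittered exponential backoff."""
    async with sem:
        for attempt in range(attempts):
            try:
                return await fn()
            except TransientError:
                if attempt == attempts - 1:
                    raise  # out of retries; propagate to the caller
                delay = base_delay * (2 ** attempt) * (1 + random.random())
                await asyncio.sleep(delay)
```

Holding the semaphore across retries (rather than per attempt) keeps a flapping provider from multiplying in-flight load.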

| Tier | Model | Used for |
| --- | --- | --- |
| REASONING_HIGH | claude-opus-4-7 | Adversarial verification, consolidation, filter, executive summary, pattern detection |
| REASONING_MID | claude-sonnet-4-6 | Catalog generation, investigation, synthesizer, validator, intake extraction |
| MECHANICAL | gpt-4o-mini | Cluster labeling, simple text extraction, low-stakes parsing |

Anthropic credentials come from ANTHROPIC_API_KEY. OpenAI uses AUDITFORGE_OPENAI_API_KEY (with optional AUDITFORGE_OPENAI_BASE_URL for self-hosted endpoints) so it can be decoupled from the chat path's OPENAI_API_KEY (which may point at Groq for chat).

REST API surface

See api-reference.md for the full endpoint catalog. High-level grouping:

  • Engagements: create, list, get, delete, intake, run, stream, findings, deliverable
  • Findings: accept, reject, refine, edit, investigate-further, search across engagements
  • Firms (white-label): list, get, create, update, delete
  • Intake helper: extract-from-description (AI-assisted intake)

All endpoints are gated by the x-admin-token header. Per-user auth (replacing the shared admin token) is an open hardening item.

Frontend

React SPA, served by FastAPI as static files from frontend/dist. View switching via ?view=auditforge query param (matches PilotForge precedent).

| Component | Purpose |
| --- | --- |
| AuditForgeLanding.tsx | Public marketing page + token gate |
| AuditForge.tsx | Shell with Engagements / Firms tab toggle |
| EngagementList.tsx | Portfolio dashboard: summary cards + filter chips + grid |
| EngagementDetail.tsx | Header + live progress banner + two-column finding list / detail |
| FindingList.tsx | Severity / status / primitive filterable list |
| FindingDetail.tsx | Full finding view + accept/reject/refine/edit + investigate-further |
| FirmManagement.tsx | Firm CRUD with logo / color / disclaimer |
| NewEngagementForm.tsx | Create-engagement form with AI-assisted intake |
| FindingsSearch.tsx | Cross-engagement findings search with filters |
| useEngagementStream.ts | SSE consumer for live progress events |

Deployment

Production runs as a single FastAPI process on AWS ECS Fargate (Graviton2 / ARM64) behind an ALB, with the React SPA served as static files from the same container. ECR holds the image. The ECS service is redeployed with aws ecs update-service --force-new-deployment; CD runs through .github/workflows/deploy.yml.

| Resource | Identifier |
| --- | --- |
| AWS account | 741783034843 |
| Region | us-east-1 |
| ECS cluster / service | metis-demo |
| ECR repo | metis-demo |
| S3 bucket | mobilemetis-metis-indexes-741783034843 |
| Domain | metis-demo.base2ml.com |

See deployment.md for the runbook.