AuditForge — Investor Brief¶
Last updated: 2026-05-08
A condensed snapshot of AuditForge for fundraising conversations: product state, market thesis, traction, technical bets, and capital ask framing.
TL;DR¶
Corpus-agnostic deep-audit engine for audit and advisory firms. Frontier LLMs finally compress document-review portions of an audit from weeks to hours; AuditForge is the first product built specifically for partner-review-as-the-product (rather than blind LLM-output-as-the-product). Working product live at https://metis-demo.base2ml.com/?view=auditforge. Built by a sole founder over ~6 weeks in 2026; ICP is mid-market CPA / consulting firms; pricing target is $50–100K/yr platform + $25–75K per-engagement variable to firm.
Market thesis¶
The problem. Document-review-heavy audits (compliance pre-assessments, contract reviews, M&A due diligence) require an associate army to read thousands of pages and a senior partner to review their work. Mid-market firms can't afford the headcount; Big-4 firms charge premium prices and book 8–12 weeks. Both leave money on the table for the deep audits that an increasingly demanding regulatory environment requires.
The shift. Frontier-model reasoning (Claude Opus 4.7, GPT-5-class) crossed a threshold in 2024: document-grounded reasoning at junior-attorney quality became reliable enough that multiplying partner time is now a real product motion. The bottleneck shifts from "can the AI read this?" to "can the partner trust the AI's read enough to put their firm's name on it?"
The product motion. Build for partner trust, not for autonomy. Every finding cites verbatim source quotes; every reasoning step is logged; the deliverable carries the firm's brand. The partner reviews, edits, and signs — same as today, but on a 2-week clock instead of an 8-week one.
The market sizing.
- US: ~50K CPA firms, ~10K with audit practices, ~3K mid-market+. Adjacent: consulting firms with audit lines, internal-audit teams at large enterprises, in-house compliance functions in regulated industries.
- The average mid-market firm runs 50–200 document-heavy audits / year.
- AuditForge captures $5–15K per engagement on the firm side. At even 30 engagements/year per firm × $10K = $300K ARR per firm at maturity.
- 100-firm portfolio → $30M ARR. 1,000-firm portfolio (national footprint) → $300M ARR. Both achievable on the existing market structure.
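The sizing arithmetic above can be sketched as a one-function revenue model. The engagement count and per-engagement revenue are the brief's own assumptions (30 engagements/year, $10K capture); everything else follows mechanically.

```python
# Illustrative ARR model using the assumptions stated above;
# not a forecast, just the arithmetic behind the headline numbers.
ENGAGEMENTS_PER_FIRM = 30        # conservative end of 50-200 audits/yr
REVENUE_PER_ENGAGEMENT = 10_000  # midpoint of the $5-15K capture range

def portfolio_arr(firms: int) -> int:
    """ARR at maturity for a portfolio of `firms` firms."""
    return firms * ENGAGEMENTS_PER_FIRM * REVENUE_PER_ENGAGEMENT

print(portfolio_arr(1))     # 300000     -> $300K ARR per firm
print(portfolio_arr(100))   # 30000000   -> $30M ARR
print(portfolio_arr(1000))  # 300000000  -> $300M ARR
```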
ICP¶
Primary: Mid-market regional CPA firms and consulting boutiques with audit practices doing CMMC pre-assessment, SOC 2 readiness, FedRAMP advisory, contract compliance reviews. Specifically firms whose partners want to take on more audit volume but can't proportionally grow their associate bench.
Secondary: Big-4 audit groups looking to standardize their document-review tooling. Higher revenue-per-firm but longer sales cycle and more scrutiny.
Tertiary: Internal audit teams at large enterprises (esp. defense contractors, healthcare systems, financial services). Direct-to-enterprise rather than via firm.
Product state (May 2026)¶
Live in production (last update: 2026-05-08):
- Full nine-stage pipeline (profile, catalog, synthesize, validate, investigate, consolidate, deepen, filter, report)
- 10 audit primitives (conflict, coverage, currency, consistency, flow-down, citation-integrity, temporal, quantitative, obligation, ambiguity)
- 4 engagement archetypes (capability, remediation, premium, monitoring)
- White-label deliverable rendering (firm logo, colors, methodology disclaimer, footer)
- Adversarial verification per finding
- Upgrade-only filter override ruleset (protects against false negatives via regulatory-pattern matching)
- Web UI: portfolio dashboard, cross-engagement findings search, finding review/accept/reject/refine/edit, investigate-further per finding, AI-assisted intake
- Live progress streaming via SSE
- DOCX / Markdown / JSON deliverable export
- Server-side persistence with S3-backed lazy-load + best-effort upload
- AWS ECS Fargate deployment with continuous deployment via GitHub Actions
- ~10-page methodology white paper for procurement review
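The pipeline above can be pictured as a simple staged runner. The stage names are the ones listed in this brief; the runner itself is a hypothetical sketch, not the production orchestrator, and the handler signature is an assumption for illustration.

```python
from typing import Callable

# Stage names as listed above; the runner is illustrative only.
PIPELINE_STAGES = ["profile", "catalog", "synthesize", "validate",
                   "investigate", "consolidate", "deepen", "filter", "report"]

def run_pipeline(state: dict,
                 handlers: dict[str, Callable[[dict], dict]]) -> dict:
    """Thread engagement state through each stage handler in order."""
    for stage in PIPELINE_STAGES:
        # Unimplemented stages pass state through unchanged.
        state = handlers.get(stage, lambda s: s)(state)
    return state
```

The point of the shape is that later stages (consolidate, filter) only ever see what earlier stages emitted, which is what makes the "permissive funnel, aggressive filter" bet (below) enforceable at the architecture level.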
Test corpus validated: - Synthetic 12-document defense-contractor corpus with 13 planted flaws - Run 10 produced 75 raw findings → 17 canonical findings → 12 critical / 5 high (no medium / low) - 38% planted-flaw recall on the validated runs (limited by pre-fix bugs, not engine quality) - Total cost ~$7 per audit on this corpus
Hardening items — substantively complete (awaiting external audit kickoff):
- ✅ Per-user authentication (Phase 8) — argon2 passwords, opaque session tokens
- ✅ Per-engagement firm scoping (Phase 9) — closes ID-enumeration risk; cross-firm access returns 404
- ✅ Per-engagement S3 bucket isolation (Phase 7) — behind feature flag, default off for backward compat. Phase 18 migration script (scripts/auditforge_migrate_buckets.py) moves existing engagements into isolated buckets without disrupting reads.
- ✅ TOTP MFA (Phase 11) — RFC 6238 + 10 single-use backup codes; works under SES sandbox
- ✅ Brute-force lockout (Phase 12) — 5 fails / 15 min → 15 min lock; bounds password guessing
- ✅ Fine-grained roles (Phase 15) — admin / partner / associate; associate is read-only, enforces partner-reviews-before-decision discipline
- ✅ Admin recovery (Phase 16) — clear-MFA / clear-lockout / reset-password endpoints close the SOC 2 readiness gap where a user who lost their TOTP device or forgot their password had no recovery path
- ✅ Engagement freeze on deliver (Phase 20) — once a partner marks a deliverable as handed to the client, finding mutations are blocked until an admin unfreezes; the chain-of-custody break is logged. Procurement-grade defensibility against the "what if you change findings after we approved them?" question.
- ✅ Encrypted at rest (S3 default AES-256) + in transit (ALB HTTPS, ACM cert)
- ✅ Per-engagement audit log of every LLM call
- Rate-limiting per token
- Pagination on long-list endpoints
- Cross-engagement search and analytics
Technical bets¶
Permissive funnel + aggressive downstream filter. Stages B–E over-generate; Stage E.5 consolidates by root cause; Stage F.5 filters with upgrade-only override. The cost of a missed finding is far worse than the cost of an investigated duplicate. Cross-primitive agreement on the same root cause is treated as the strongest possible corroboration signal.
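The "upgrade-only override" described above can be made concrete with a small sketch: pattern matches may raise a finding's severity floor, but nothing in the filter may lower a severity or drop a finding. The severity scale, pattern table, and field names here are illustrative assumptions, not the actual ruleset.

```python
# Upgrade-only severity override: matches can only raise severity.
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}

# Hypothetical regulatory-pattern floors (primitive -> minimum severity).
REGULATORY_PATTERNS = [
    ("flow-down", "critical"),
    ("citation-integrity", "high"),
]

def apply_override(finding: dict) -> dict:
    """Return the finding with its severity raised to any matching floor."""
    severity = finding["severity"]
    for primitive, floor in REGULATORY_PATTERNS:
        if (finding["primitive"] == primitive
                and SEVERITY_RANK[floor] > SEVERITY_RANK[severity]):
            severity = floor              # upgrade only, never downgrade
    return {**finding, "severity": severity}  # finding is never dropped
```

Because the operation is monotone (severity can only move up), no filter bug can silently suppress a finding, which is the false-negative guarantee the bet rests on.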
Composable primitives, not pre-built vertical packs. Adding a new audit type (HIPAA, ISO 27001, ABA Model Rules) means tuning the catalog priorities and intake prompts — not writing thousands of lines of vertical-specific rules. The intake captures the framework; the engine adapts.
Compound knowledge as moat. Cross-engagement findings search (Phase 5) and portfolio clustering (Phase 10) mean a firm's first 5 audits seed a corpus of partner-validated judgment that informs every subsequent audit. The 10th audit sees the prior 9; the 100th sees the prior 99. The Patterns view in particular surfaces recurring themes across the portfolio with suggested standard remediation language — the firm productizes its judgment into reusable playbooks. Cluster diff over time (Phase 22) makes this evolution legible: each Recompute captures a snapshot, and the next Recompute's diff shows which themes emerged, grew, or got resolved. Engagement templates (Phase 23) close the loop — once a partner refines an intake on a CMMC audit, "Save as template" turns it into a reusable starter for every subsequent CMMC engagement at that firm. Over time the firm's accumulated decisions, patterns, and templates become a structured reference asset they can't easily abandon — switching costs go up with each engagement.
Frontier models for reasoning, cheap models for plumbing. Opus 4.7 for adversarial verification, consolidation, filter, exec summary; Sonnet 4.6 for catalog/investigate; gpt-4o-mini for cluster labeling. Cost-per-audit currently ~$5–50 depending on corpus size.
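The model split above amounts to a routing table: reasoning-heavy tasks go to a frontier model, plumbing to cheaper ones. The task-to-model mapping below follows the split stated in this brief, but the model ID strings and the router function are illustrative assumptions, not actual API identifiers.

```python
# Illustrative routing table per the frontier/cheap split described above.
# Model ID strings are placeholders, not real API identifiers.
MODEL_ROUTES = {
    "adversarial_verification": "claude-opus-4.7",
    "consolidation":            "claude-opus-4.7",
    "filter":                   "claude-opus-4.7",
    "exec_summary":             "claude-opus-4.7",
    "catalog":                  "claude-sonnet-4.6",
    "investigate":              "claude-sonnet-4.6",
    "cluster_labeling":         "gpt-4o-mini",
}

def model_for(task: str) -> str:
    """Route a pipeline task to its model; default to the mid-tier model."""
    return MODEL_ROUTES.get(task, "claude-sonnet-4.6")
```

The economics follow from the table: the expensive model only touches the handful of per-finding judgment calls, so corpus size drives cost mostly through the cheap tiers, keeping the $5–50 per-audit range plausible.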
Frontend = partner workbench, not API console. The product is a polished dashboard with portfolio analytics, severity-weighted finding lists, two-column review surface, inline edit, white-label export. The partner doesn't see the API.
Self-serve onboarding (Phase 25–27). Closes the critical "demo to billable" gap: a partner with a directory of PDFs goes from "I have files on my laptop" to "I'm running an audit" in 5–10 minutes via drag-and-drop, no Base2ML hand-holding. Removes the ~4-hour founder tax per prospect and the 3–5-day wall-clock onboarding delay that would have throttled GTM scaling.
What we got right¶
- Pivoting from "discriminate at catalog time" (false-negative-prone) to "permissive funnel + aggressive filter" (corroboration-as-signal) on Day 7 of dogfooding. Live testing surfaced the architectural mistake; founder authorized the pivot in one decision.
- Building white-label from Phase 2.5 instead of deferring. The deliverable is the product; the partner's name on it is the value.
- Methodology white paper as a Day-1 artifact, not a Day-100 one. Procurement reviewers will demand it; better to write it before than after.
- Adversarial verifier and filter ruleset designed to never silently reject a finding. Audit firms can't tolerate false negatives in the way an autonomous-agent product can.
What we're still figuring out¶
- Exact pricing. The $50–100K/yr platform license is grounded in firm revenue-per-audit, but no firm has paid yet. First paid engagement target: a friendly partner pilot in late 2026.
- GTM motion. Direct outbound to mid-market CPA firms vs. partnership channel via existing audit-tech vendors. Both being explored.
- Vertical specialization vs. horizontal engine. Resisting building vertical packs (NIST/CMMC, HIPAA, etc.) — the corpus carries the framework. But sales motion may pull toward "the CMMC audit tool" rather than "the corpus-agnostic engine."
Capital ask framing¶
Bootstrap path (where we are): self-funded sole founder; ~$1,000 in AWS credits remaining; Anthropic and OpenAI credit balances of ~$10–20 each. Sufficient to land the first paid engagement in Q3 2026 if focused.
Seed scenario ($500K–1M): 12-month runway extension, dedicated frontend designer (in-house design system per the original plan), 1 sales-engineer hire to manage partner-firm onboardings, SOC 2 Type 1 audit + Type 2 prep, dedicated AWS infra per partner firm. Funds the path from first paid engagement to repeatable sales motion (5–10 firms paying).
Series A scenario ($3–8M, 12–18 months out): national footprint, second-tier vertical specializations, enterprise tier with SSO/SAML, dedicated client-success team, ~50-firm portfolio.
Company¶
Base2ML LLC — Pittsburgh-based, founded 2026. AuditForge is the company's primary product, the next iteration after PilotForge (a fixed-question SMB demo product capped at $10K/engagement — useful for capability demos, not the actual product motion).
Direct: chris@base2ml.com.
Appendix: live URLs¶
- Product: https://metis-demo.base2ml.com/?view=auditforge
- Methodology white paper: docs/auditforge/methodology-white-paper.md
- Architecture reference: docs/auditforge/01-architecture.md
- API reference: docs/auditforge/api-reference.md
- User manual: docs/auditforge/user-manual.md
- Sales one-pager: docs/auditforge/sales-one-pager.md