
Stage G — Report

Status: ✅ Complete (JSON + Markdown deliverable)
File: app/auditforge/report.py
Tests: tests/test_auditforge_report.py — 20 cases passing

PDF generation at executive-deck quality is deferred to a focused Phase 2 commit: a PDF without a real design system would undercut AuditForge's polish bar (a Microsoft/Oracle-grade trust signal) more than shipping no PDF at all.

Purpose

Stage G produces the engagement deliverable — the artifact the firm hands to its client. v1 outputs three artifacts:

  1. JSON — machine-readable engagement record. Consumed by downstream tooling, dashboards, exports, and integrations. Captures the full engagement state.
  2. Markdown — human-readable narrative. The partner reads this, edits it in their preferred editor, and drops the content into the firm's branded template tool.
  3. Methodology text — standalone description of the audit method, evidence anchoring, and auditor's role. Defensibility material the firm can hand to its client's GC.

PDF generation is deliberately deferred. Building partner-grade PDFs requires real design system work (custom typography, branded covers, pixel-precise layout). A reportlab quick-and-dirty PDF would damage the polish signal. Phase 2 lands PDF properly.

Output: Deliverable

from dataclasses import dataclass

@dataclass
class Deliverable:
    json_text: str = ""           # in-memory copy (always populated)
    markdown_text: str = ""
    methodology_text: str = ""
    json_path: str | None = None  # disk paths populated when output_dir provided
    markdown_path: str | None = None
    methodology_path: str | None = None

Pipeline

engagement, findings, deepen
generate_executive_summary  ── LLM (REASONING_HIGH, Opus 4.7)
build_methodology_text       ── pure (templated)
build_json_deliverable       ── pure (full state serialization)
build_markdown_deliverable   ── pure (archetype-aware narrative)
[optional] write to output_dir
       Deliverable

Executive summary (generate_executive_summary)

The most important LLM output of the audit. The partner reads this and signs off on it. Uses REASONING_HIGH (Opus 4.7) — the only place in the report stage that does. Targets 4–8 sentences of clear professional prose, leading with the most material risk and closing with the single most important next step.

System prompt enforces:

  • Audience: client's CFO/GC/sponsor
  • No marketing language, no superlatives, no surprised tone
  • Theme-level synthesis, not finding restatement
  • Plain prose, no headers / bullets

Empty findings → returns a canned no-issues paragraph (no LLM call). LLM failure → empty summary; markdown rendering shows a placeholder.
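The fallback behavior can be sketched as follows. This is an illustrative simplification, not the shipped `app/auditforge/report.py` code; `llm.summarize` and `NO_ISSUES_SUMMARY` are hypothetical names standing in for the real call and canned paragraph.

```python
import asyncio

# Hypothetical canned paragraph; the real wording lives in report.py.
NO_ISSUES_SUMMARY = (
    "The review identified no material findings in the corpus provided."
)

async def generate_executive_summary(findings, llm):
    """Sketch of the two fallback paths described above."""
    if not findings:
        # Empty findings short-circuit: no LLM call is made at all.
        return NO_ISSUES_SUMMARY
    try:
        # The real implementation calls the REASONING_HIGH tier here.
        return await llm.summarize(findings)
    except Exception:
        # LLM failure degrades to an empty summary; the markdown
        # renderer substitutes a placeholder for an empty string.
        return ""
```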

JSON deliverable (build_json_deliverable)

Pure function. Captures full engagement state in machine-readable form. Top-level shape:

{
  "version": 1,
  "engagement": {id, firm_id, client_name, archetype, status, timestamps},
  "intake": {domain, audit_purpose, frameworks, focus_areas, ...},
  "cost": {budget_cents, spent_cents, by_stage, by_model},
  "summary": {total_findings, by_severity, executive_summary},
  "patterns": [...],
  "clusters": [{cluster_id, finding_ids, ...}],
  "findings": [{id, primitive, severity, ..., evidence, remediation}]
}

Findings sorted by severity (CRITICAL → HIGH → MEDIUM → LOW) for predictable consumer ordering. Evidence chains preserve doc/section/page/chunk_id/quote/score so consumers can trace each citation.
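The severity ordering can be sketched with a simple rank map; this is a hypothetical helper, not the actual serializer, and the dict-shaped findings are an assumption for illustration.

```python
# Rank map implementing CRITICAL → HIGH → MEDIUM → LOW ordering.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

def sort_findings(findings):
    # Unknown severities sort last, so malformed data never breaks export.
    return sorted(findings, key=lambda f: SEVERITY_ORDER.get(f["severity"], 99))
```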

Markdown deliverable (build_markdown_deliverable)

Section order:

  1. Header — engagement id, archetype, date
  2. Executive Summary — partner-reviewed prose
  3. At a glance — severity counts table
  4. Systemic Patterns (if any) — cross-finding patterns from Stage F
  5. Remediation Roadmap (REMEDIATION_PIPELINE archetype only) — sellable scope-of-work table with effort estimates and severity ordering
  6. Findings (Detailed) — per-finding with description, root cause, verbatim evidence quotes (rendered as blockquotes), remediation
  7. Methodology — appendix
  8. Engagement Metadata — id, firm, archetype, compute spend

Archetype-specific variations

| Archetype | Markdown variation |
| --- | --- |
| Capability + Leverage | Base template (no extra section) |
| Remediation Pipeline | Adds "Remediation Roadmap" section after patterns |
| Premium / Defensibility | Base template (Phase 2: add defensibility statement appendix) |
| Continuous Monitoring | Base template (Phase 2: add period-over-period comparison section) |

For v1, only the most distinctive archetype variation (Remediation Pipeline's roadmap) is implemented. Other archetypes ship base template with archetype-specific sections landing in Phase 2.
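The archetype dispatch can be pictured as a section-list builder. The section names and the `sections_for` helper are illustrative assumptions; only the insertion point of the roadmap (after patterns, before detailed findings) comes from the section order above.

```python
def sections_for(archetype):
    """Sketch: v1 section ordering, with the one archetype-specific variation."""
    base = ["header", "executive_summary", "at_a_glance", "patterns",
            "findings", "methodology", "metadata"]
    if archetype == "REMEDIATION_PIPELINE":
        # Roadmap slots in after Systemic Patterns, before detailed findings.
        i = base.index("patterns") + 1
        return base[:i] + ["remediation_roadmap"] + base[i:]
    # All other archetypes ship the base template in v1.
    return base
```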

Evidence rendering

Each finding's evidence quotes render as Markdown blockquotes anchored to their source:

**Evidence**

- _Master Contract — 3.2 (page 4)_
  > Contractor shall provide annual cybersecurity training.
- _Subcontract A — 5_
  > Subcontractor shall complete training every 90 days.

This is what the partner shows their client when defending a finding.
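A minimal sketch of the blockquote rendering, assuming evidence quotes carry the doc/section/page/quote fields from the JSON shape above (the function name and exact formatting are illustrative, not the shipped renderer):

```python
def render_evidence(quotes):
    """Render evidence quotes as source-anchored Markdown blockquotes."""
    lines = ["**Evidence**", ""]
    for q in quotes:
        anchor = f"{q['doc']} — {q['section']}"
        if q.get("page") is not None:
            anchor += f" (page {q['page']})"
        lines.append(f"- _{anchor}_")
        lines.append(f"  > {q['quote']}")
    return "\n".join(lines)
```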

Methodology text (build_methodology_text)

Pure templated text — no LLM call. Renders a partner-grade methodology appendix covering:

  • Provenance — every finding anchored to verbatim corpus quotes
  • LLM use — tiered model approach with audit logging
  • Auditor verification — findings PENDING auditor review; professional judgment is essential
  • Limitations — corpus-time-bounded, semantic relationship inferences, false-positive/negative caveats
  • Configuration snapshot — archetype, audit type, frameworks, budget, spend

This text is what the firm's licensed professional points to when their client asks "how was this generated?"
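Because the methodology appendix is pure string templating, it can be sketched in a few lines. The template wording and parameter names below are placeholders; only the covered topics and the zero-LLM-cost property come from the description above.

```python
# Hypothetical template; the real appendix text lives in report.py.
METHODOLOGY_TEMPLATE = """\
## Methodology

- Provenance: every finding is anchored to verbatim corpus quotes.
- LLM use: tiered model approach with audit logging.
- Auditor verification: findings remain PENDING until auditor review.
- Configuration: archetype={archetype}, frameworks={frameworks}, spend=${spend:.2f}
"""

def build_methodology_text(archetype, frameworks, spent_cents):
    # Pure templating: deterministic output, zero LLM cost.
    return METHODOLOGY_TEMPLATE.format(
        archetype=archetype,
        frameworks=", ".join(frameworks),
        spend=spent_cents / 100,
    )
```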

Top-level (generate_deliverable)

async def generate_deliverable(
    engagement: AuditEngagement,
    findings: list[Finding],
    *,
    deepen: DeepenResult | None = None,
    llm: LLMClient,
    output_dir: str | None = None,
) -> Deliverable

If output_dir is provided, writes {engagement.id}.json, {engagement.id}.md, and {engagement.id}-methodology.md to disk. Always returns the in-memory Deliverable with text fields populated.

Cost shape

Single REASONING_HIGH call (executive summary) at ~$0.05–0.20 per audit depending on findings count. Markdown / JSON / methodology are pure — zero LLM cost.

Total Stage G cost per audit: typically $0.05–0.25.

Test coverage

| Area | Cases |
| --- | --- |
| JSON deliverable | 6 (metadata, intake+cost, severity counts, severity-sort, clusters/patterns, evidence) |
| Markdown deliverable | 8 (headers, exec summary, finding+evidence blockquote, roadmap inclusion, roadmap exclusion, patterns, no findings, severity table) |
| Methodology text | 2 (archetype + cost rendered, missing intake handled) |
| Executive summary | 2 (REASONING_HIGH tier used, empty findings → canned) |
| End-to-end generate_deliverable | 2 (writes to disk, no output_dir) |

All 20 cases passing. Full suite: 381 pass, no regressions.

Public API

from app.auditforge.report import generate_deliverable

deliverable = await generate_deliverable(
    engagement, findings,
    deepen=deepen_result,
    llm=client,
    output_dir="/path/to/deliverables",
)
print(deliverable.markdown_path)

Known limits / future work — explicit

  • PDF deferred to Phase 2. Requires:
      • Custom design system (typography, palette, motion)
      • Branded cover page per firm
      • Professional charts (severity distribution, cluster diagrams)
      • Pixel-precise layout
      • Accessibility (PDF/UA-compliant tags)

Tooling options under consideration: WeasyPrint for HTML→PDF (likely), reportlab for fully programmatic (current dep, less polished output), or LaTeX (highest polish but heaviest dependency). Phase 2 commit picks one.

  • DOCX deferred to Phase 2. python-docx is in requirements (PilotForge uses it). When PDF lands, DOCX comes alongside as the editable copy.

  • Archetype-specific markdown variations partial. Only REMEDIATION_PIPELINE has a distinctive section (Remediation Roadmap). PREMIUM/DEFENSIBILITY needs a defensibility statement appendix. CONTINUOUS_MONITORING needs a period-over-period comparison. CAPABILITY+LEVERAGE may need a partner-review checkpoint structure. Phase 2.

  • No firm-branding data. AuditEngagement doesn't carry firm-branding fields (logo, color, footer text). Per-firm branding plumbs in Phase 2 when the firm onboarding flow is built.

  • Executive summary token cap is generous. 900 max_tokens at REASONING_HIGH may produce longer-than-ideal summaries. After first paid engagement we'll tune via prompt iteration based on partner feedback.

  • No localization / multi-language. Output is English-only. International firms need Phase 3 i18n work.