Stage G — Report¶
Status: ✅ Complete (JSON + Markdown deliverable)
File: app/auditforge/report.py
Tests: tests/test_auditforge_report.py — 20 cases passing
PDF generation at executive-deck quality is deferred to a focused Phase 2 commit. For AuditForge's polish bar (a Microsoft/Oracle-grade trust signal), a PDF without a real design system is worse than no PDF at all.
Purpose¶
Stage G produces the engagement deliverable — the artifact the firm hands to its client. v1 outputs three artifacts:
- JSON — machine-readable engagement record. Consumed by downstream tooling, dashboards, exports, and integrations. Captures the full engagement state.
- Markdown — human-readable narrative. The partner reads this, edits it in their preferred editor, and drops the content into the firm's branded template tool.
- Methodology text — standalone description of the audit method, evidence anchoring, and auditor's role. Defensibility material the firm can hand to its client's GC.
PDF generation is deliberately deferred. Building partner-grade PDFs requires real design system work (custom typography, branded covers, pixel-precise layout). A reportlab quick-and-dirty PDF would damage the polish signal. Phase 2 lands PDF properly.
Output: Deliverable¶
@dataclass
class Deliverable:
    json_text: str = ""                    # in-memory copy (always populated)
    markdown_text: str = ""
    methodology_text: str = ""
    json_path: str | None = None           # disk paths populated when output_dir provided
    markdown_path: str | None = None
    methodology_path: str | None = None
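A minimal consumption sketch, assuming the Deliverable comes back from generate_deliverable; the persist_or_print helper is hypothetical, not part of report.py:

# Hypothetical helper: text fields are always populated; the *_path fields
# stay None unless generate_deliverable was given an output_dir.
def persist_or_print(d: Deliverable) -> None:
    if d.markdown_path is not None:
        print(f"Markdown written to {d.markdown_path}")
    else:
        print(d.markdown_text[:200])  # in-memory only; caller decides where it goes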
Pipeline¶
engagement, findings, deepen
│
▼
generate_executive_summary ── LLM (REASONING_HIGH, Opus 4.7)
│
▼
build_methodology_text ── pure (templated)
│
▼
build_json_deliverable ── pure (full state serialization)
│
▼
build_markdown_deliverable ── pure (archetype-aware narrative)
│
▼
[optional] write to output_dir
│
▼
Deliverable
Executive summary (generate_executive_summary)¶
The most important LLM output of the audit. The partner reads this and signs off on it. Uses REASONING_HIGH (Opus 4.7) — the only place in the report stage that does. Targets 4–8 sentences of clear professional prose, leading with the most material risk and closing with the single most important next step.
System prompt enforces:
- Audience: client's CFO/GC/sponsor
- No marketing language, no superlatives, no surprised tone
- Theme-level synthesis, not finding restatement
- Plain prose, no headers / bullets
Empty findings → returns a canned no-issues paragraph (no LLM call). LLM failure → empty summary; markdown rendering shows a placeholder.
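A sketch of that control flow; the llm.complete signature, the tier string, and the findings' severity/title attributes are illustrative assumptions, not the actual report.py API:

from typing import Any, Sequence

NO_ISSUES_SUMMARY = (
    "No material issues were identified in the corpus reviewed for this engagement."
)

async def summarize_sketch(findings: Sequence[Any], llm: Any) -> str:
    if not findings:
        # Empty findings: return the canned no-issues paragraph without an LLM call.
        return NO_ISSUES_SUMMARY
    prompt = "\n".join(f"- [{f.severity}] {f.title}" for f in findings)
    try:
        # The single REASONING_HIGH call in the report stage.
        return await llm.complete(prompt, tier="REASONING_HIGH", max_tokens=900)
    except Exception:
        # Graceful degradation: empty summary; markdown rendering shows a placeholder.
        return ""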
JSON deliverable (build_json_deliverable)¶
Pure function. Captures full engagement state in machine-readable form. Top-level shape:
{
  "version": 1,
  "engagement": {id, firm_id, client_name, archetype, status, timestamps},
  "intake": {domain, audit_purpose, frameworks, focus_areas, ...},
  "cost": {budget_cents, spent_cents, by_stage, by_model},
  "summary": {total_findings, by_severity, executive_summary},
  "patterns": [...],
  "clusters": [{cluster_id, finding_ids, ...}],
  "findings": [{id, primitive, severity, ..., evidence, remediation}]
}
Findings sorted by severity (CRITICAL → HIGH → MEDIUM → LOW) for predictable consumer ordering. Evidence chains preserve doc/section/page/chunk_id/quote/score so consumers can trace each citation.
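A sketch of the severity ordering, with findings represented as plain dicts for brevity; the real function works over Finding objects and serializes the full engagement state:

import json

# Sort rank: CRITICAL → HIGH → MEDIUM → LOW; unknown severities sink to the end.
SEVERITY_ORDER = {"CRITICAL": 0, "HIGH": 1, "MEDIUM": 2, "LOW": 3}

def build_json_sketch(findings: list[dict]) -> str:
    ordered = sorted(findings, key=lambda f: SEVERITY_ORDER.get(f["severity"], 99))
    payload = {
        "version": 1,
        "summary": {"total_findings": len(ordered)},
        "findings": ordered,
    }
    return json.dumps(payload, indent=2)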
Markdown deliverable (build_markdown_deliverable)¶
Section order:
- Header — engagement id, archetype, date
- Executive Summary — partner-reviewed prose
- At a glance — severity counts table
- Systemic Patterns (if any) — cross-finding patterns from Stage F
- Remediation Roadmap (REMEDIATION_PIPELINE archetype only) — sellable scope-of-work table with effort estimates and severity ordering
- Findings (Detailed) — per-finding with description, root cause, verbatim evidence quotes (rendered as blockquotes), remediation
- Methodology — appendix
- Engagement Metadata — id, firm, archetype, compute spend
Archetype-specific variations¶
| Archetype | Markdown variation |
|---|---|
| Capability + Leverage | Base template (no extra section) |
| Remediation Pipeline | Adds "Remediation Roadmap" section after patterns |
| Premium / Defensibility | Base template (Phase 2: add defensibility statement appendix) |
| Continuous Monitoring | Base template (Phase 2: add period-over-period comparison section) |
For v1, only the most distinctive archetype variation (Remediation Pipeline's roadmap) is implemented. The other archetypes ship the base template, with their archetype-specific sections landing in Phase 2.
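A sketch of the archetype-aware assembly; the section names mirror the list above, and passing the archetype as a plain string (rather than the project's enum) is an assumption for illustration:

def assemble_markdown_sketch(archetype: str, sections: dict[str, str]) -> str:
    order = ["Header", "Executive Summary", "At a glance", "Systemic Patterns"]
    if archetype == "REMEDIATION_PIPELINE":
        # v1's only archetype-specific section: the sellable scope-of-work table.
        order.append("Remediation Roadmap")
    order += ["Findings (Detailed)", "Methodology", "Engagement Metadata"]
    # Skip sections the caller didn't render (e.g. no patterns, no Systemic Patterns).
    return "\n\n".join(sections[name] for name in order if name in sections)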
Evidence rendering¶
Each finding's evidence quotes render as Markdown blockquotes anchored to their source:
**Evidence**
- _Master Contract — 3.2 (page 4)_
> Contractor shall provide annual cybersecurity training.
- _Subcontract A — 5_
> Subcontractor shall complete training every 90 days.
This is what the partner shows their client when defending a finding.
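A rendering sketch that produces the shape shown above; the dict keys (doc, section, page, quote) are illustrative names for the evidence-chain fields, not the actual model:

def render_evidence_sketch(quotes: list[dict]) -> str:
    lines = ["**Evidence**"]
    for q in quotes:
        # Source anchor: italicized doc/section line, with page when available.
        anchor = f"{q['doc']} — {q['section']}"
        if q.get("page") is not None:
            anchor += f" (page {q['page']})"
        lines.append(f"- _{anchor}_")
        lines.append(f"  > {q['quote']}")
    return "\n".join(lines)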
Methodology text (build_methodology_text)¶
Pure templated text — no LLM call. Renders a partner-grade methodology appendix covering:
- Provenance — every finding anchored to verbatim corpus quotes
- LLM use — tiered model approach with audit logging
- Auditor verification — findings PENDING auditor review; professional judgment is essential
- Limitations — corpus-time-bounded, semantic relationship inferences, false-positive/negative caveats
- Configuration snapshot — archetype, audit type, frameworks, budget, spend
This text is what the firm's licensed professional points to when their client asks "how was this generated?"
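A condensed templating sketch; the real appendix covers all five areas above, and the parameter names here are assumptions:

METHODOLOGY_SKETCH = """\
Methodology

Every finding is anchored to verbatim quotes from the engagement corpus.
LLM use follows a tiered model approach with audit logging; all findings
remain PENDING auditor review and subject to professional judgment.

Configuration: archetype={archetype}, frameworks={frameworks}, spend=${spend:.2f}
"""

def build_methodology_sketch(archetype: str, frameworks: list[str], spent_cents: int) -> str:
    # Pure string templating: deterministic, no LLM call.
    return METHODOLOGY_SKETCH.format(
        archetype=archetype,
        frameworks=", ".join(frameworks) or "none specified",
        spend=spent_cents / 100,
    )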
Top-level (generate_deliverable)¶
async def generate_deliverable(
    engagement: AuditEngagement,
    findings: list[Finding],
    *,
    deepen: DeepenResult | None = None,
    llm: LLMClient,
    output_dir: str | None = None,
) -> Deliverable
If output_dir is provided, writes {engagement.id}.json,
{engagement.id}.md, and {engagement.id}-methodology.md to disk.
Always returns the in-memory Deliverable with text fields populated.
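A sketch of the disk-writing behavior under that naming scheme, where d is the Deliverable dataclass above; the real function presumably also records the resulting paths back onto the Deliverable:

from pathlib import Path

def write_deliverable_sketch(engagement_id: str, d: Deliverable, output_dir: str) -> None:
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Naming scheme: <id>.json, <id>.md, <id>-methodology.md
    (out / f"{engagement_id}.json").write_text(d.json_text)
    (out / f"{engagement_id}.md").write_text(d.markdown_text)
    (out / f"{engagement_id}-methodology.md").write_text(d.methodology_text)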
Cost shape¶
Single REASONING_HIGH call (executive summary) at ~$0.05–0.20 per audit depending on findings count. Markdown / JSON / methodology are pure — zero LLM cost.
Total Stage G cost per audit: typically $0.05–0.25.
Test coverage¶
| Area | Cases |
|---|---|
| JSON deliverable | 6 (metadata, intake+cost, severity counts, severity-sort, clusters/patterns, evidence) |
| Markdown deliverable | 8 (headers, exec summary, finding+evidence blockquote, roadmap inclusion, roadmap exclusion, patterns, no findings, severity table) |
| Methodology text | 2 (archetype + cost rendered, missing intake handled) |
| Executive summary | 2 (REASONING_HIGH tier used, empty findings → canned) |
| End-to-end generate_deliverable | 2 (writes to disk, no output_dir) |
All 20 cases passing. Full suite: 381 pass, no regressions.
Public API¶
from app.auditforge.report import generate_deliverable
deliverable = await generate_deliverable(
    engagement, findings,
    deepen=deepen_result,
    llm=client,
    output_dir="/path/to/deliverables",
)
print(deliverable.markdown_path)
Known limits / future work — explicit¶
- PDF deferred to Phase 2. Requires:
  - Custom design system (typography, palette, motion)
  - Branded cover page per firm
  - Professional charts (severity distribution, cluster diagrams)
  - Pixel-precise layout
  - Accessibility (PDF/UA-compliant tags)

  Tooling options under consideration: WeasyPrint for HTML→PDF (likely), reportlab for fully programmatic output (current dep, less polished), or LaTeX (highest polish but heaviest dependency). The Phase 2 commit picks one.
- DOCX deferred to Phase 2. python-docx is already in requirements (PilotForge uses it). When PDF lands, DOCX comes alongside as the editable copy.
- Archetype-specific markdown variations are partial. Only REMEDIATION_PIPELINE has a distinctive section (Remediation Roadmap). PREMIUM/DEFENSIBILITY needs a defensibility statement appendix. CONTINUOUS_MONITORING needs a period-over-period comparison. CAPABILITY+LEVERAGE may need a partner-review checkpoint structure. Phase 2.
- No firm-branding data. AuditEngagement doesn't carry firm-branding fields (logo, color, footer text). Per-firm branding plumbs in during Phase 2, when the firm onboarding flow is built.
- Executive summary token cap is generous. 900 max_tokens at REASONING_HIGH may produce longer-than-ideal summaries. After the first paid engagement we'll tune via prompt iteration based on partner feedback.
- No localization / multi-language. Output is English-only. International firms need Phase 3 i18n work.