Planted Flaws — AuditForge Test Corpus

This corpus is synthetic. It simulates a defense contractor's task-order package, with deliberate flaws planted across the document set so we can score AuditForge's recall objectively.

Do NOT distribute this directory as a customer-facing demo — these documents are a scoring rig, not real audit content.

Domain framing

Northstar Defense Inc. — a notional mid-tier defense contractor. The corpus represents documents Northstar would assemble for a CMMC L2 pre-assessment audit:

  • 1 master contract (prime contract with the federal agency)
  • 2 task orders under the master
  • 2 subcontracts (one to each of 2 lower-tier vendors)
  • 4 internal policies / SOPs
  • 2 training records / attestations
  • 1 quality assurance plan — actually missing (planted gap)
  • 1 incident response plan — actually present but in DRAFT status (planted gap-with-caveat)

Total: 12 documents (the Quality Assurance Plan is deliberately absent, so only 12 of the 13 listed items exist as files).
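
For scripting against the corpus, the expected document set can also be written down as a small manifest. A minimal Python sketch; the filenames are hypothetical placeholders, not the corpus's real file names:

    # Hypothetical manifest of the document set (filenames are placeholders, not real corpus paths).
    EXPECTED_DOCUMENTS = {
        "master_contract.txt":         "prime contract with the federal agency",
        "task_order_01.txt":           "task order under the master",
        "task_order_02.txt":           "task order under the master",
        "subcontract_alpha.txt":       "subcontract to lower-tier vendor Alpha",
        "subcontract_bravo.txt":       "subcontract to lower-tier vendor Bravo",
        "cybersecurity_policy_v3.txt": "internal policy",
        "project_management_sop.txt":  "internal SOP",
        "internal_policy_03.txt":      "internal policy / SOP",
        "internal_policy_04.txt":      "internal policy / SOP",
        "training_record_01.txt":      "training record / attestation",
        "training_record_02.txt":      "training record / attestation",
        "incident_response_plan.txt":  "incident response plan (present, marked DRAFT)",
        # The Quality Assurance Plan is deliberately absent from the corpus (see P-3).
    }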

Planted flaws (ground truth for scoring)

Conflict findings (conflict_check)

P-1: Cybersecurity training cadence conflict
  • Master contract clause 3.2: "Contractor shall provide annual cybersecurity awareness training to all personnel."
  • Subcontract Alpha §5.4: "Subcontractor shall complete cybersecurity training every 90 days."
  • Subcontract Bravo §5.4: "Subcontractor shall provide annual training, with refresher every 6 months."
  • Expected finding: Three different cadences for the same requirement. HIGH severity.

P-2: Key Personnel substitution authority conflict
  • Master contract §7.1: "No substitution of Key Personnel without 30-day prior written notice and Contracting Officer approval."
  • Project Management SOP §4.2: "Key Personnel substitutions may be made by the Program Manager with internal sign-off."
  • Expected finding: Internal SOP authorizes substitutions the master forbids. CRITICAL severity.
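
As a rough illustration of what the conflict check has to catch for P-1, the Python sketch below extracts training-cadence statements and flags disagreement. It is not AuditForge's implementation; the document excerpts and regex are assumptions made for the example:

    import re

    # Hypothetical excerpts; the real corpus documents would be loaded from disk.
    DOCS = {
        "master_contract": "Contractor shall provide annual cybersecurity awareness training to all personnel.",
        "subcontract_alpha": "Subcontractor shall complete cybersecurity training every 90 days.",
        "subcontract_bravo": "Subcontractor shall provide annual training, with refresher every 6 months.",
    }

    # Very rough cadence extractor: looks for "annual" or "every N days/months".
    CADENCE = re.compile(r"annual|every\s+\d+\s+(?:days|months)", re.IGNORECASE)

    def training_cadences(docs):
        found = {}
        for name, text in docs.items():
            if "training" in text.lower():
                found[name] = tuple(m.group(0).lower() for m in CADENCE.finditer(text))
        return found

    cadences = training_cadences(DOCS)
    if len(set(cadences.values())) > 1:
        print("CONFLICT (P-1): training cadence differs across documents:")
        for name, vals in cadences.items():
            print(f"  {name}: {vals}")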

Coverage gaps (coverage_check)

P-3: Quality Assurance Plan absent
  • Master contract §6.1 explicitly requires "a Quality Assurance Plan submitted within 30 days of contract award."
  • No QAP document is in the corpus.
  • Expected finding: Required deliverable missing. CRITICAL severity.

P-4: Incident Response Plan in draft status
  • Master contract §8.2 requires "an approved Incident Response Plan."
  • The IR Plan in the corpus is marked [DRAFT - NOT APPROVED] in its header.
  • Expected finding: Required element present but not in approved state. HIGH severity.

P-5: Subcontractor cybersecurity attestation missing for Subcontract Bravo
  • Master contract §3.4 requires each subcontractor to provide a cybersecurity self-attestation.
  • Subcontract Alpha has its attestation file. Subcontract Bravo does not.
  • Expected finding: Coverage gap on one subcontractor. HIGH severity.
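
Coverage checks reduce to comparing a required-deliverables list against what exists on disk and in what state. A minimal sketch, assuming a flat corpus directory and the filenames shown (both are placeholders, not the tool's real configuration):

    from pathlib import Path

    # Required deliverables per the master contract (assumed mapping for illustration).
    REQUIRED = {
        "Quality Assurance Plan": "quality_assurance_plan.txt",       # P-3: absent
        "Incident Response Plan": "incident_response_plan.txt",       # P-4: present but DRAFT
        "Bravo cybersecurity attestation": "attestation_bravo.txt",   # P-5: absent
    }

    def coverage_findings(corpus_dir="corpus"):
        findings = []
        for label, filename in REQUIRED.items():
            path = Path(corpus_dir) / filename
            if not path.exists():
                findings.append(f"MISSING: {label} ({filename})")
            elif "[DRAFT" in path.read_text(errors="ignore"):
                findings.append(f"PRESENT BUT DRAFT: {label} ({filename})")
        return findings

    for finding in coverage_findings():
        print(finding)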

Currency / supersession (currency_check)

P-6: Superseded NIST publication reference
  • Cybersecurity Policy v3 cites "NIST SP 800-171 r2" as the basis for control mappings.
  • The corpus's most recent docs (post-2024) should reference r3.
  • Expected finding: Outdated standard reference. MEDIUM severity.

P-7: Stale export-control (EAR) clause language
  • Subcontract Alpha quotes EAR Part 744 language from a 2021 revision.
  • The corpus operates in a 2025+ context where Part 744 was amended.
  • Expected finding: Stale regulatory text. MEDIUM severity.
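
Currency flaws like P-6 and P-7 can be approximated with a supersession table mapping known-outdated references to a note about their current status. The table below covers only the two planted cases and is illustrative, not authoritative:

    # Known-outdated references relevant to this corpus (illustrative, not exhaustive).
    SUPERSEDED = {
        "NIST SP 800-171 r2": "superseded by NIST SP 800-171 r3 for post-2024 documents",
        "EAR Part 744 (2021 rev.)": "Part 744 was amended; 2021-era quoted text is stale in a 2025+ context",
    }

    def currency_findings(doc_name, text):
        findings = []
        for stale, note in SUPERSEDED.items():
            if stale.lower() in text.lower():
                findings.append(f"{doc_name}: cites '{stale}' -- {note}")
        return findings

    # Hypothetical excerpt from Cybersecurity Policy v3.
    sample = "Control mappings in this policy are based on NIST SP 800-171 r2."
    print(currency_findings("cybersecurity_policy_v3", sample))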

Consistency / definitional drift (consistency_check)

P-8: "Controlled Unclassified Information" defined inconsistently - Master contract §2.0: "CUI means information requiring safeguarding under 32 CFR 2002, including For Official Use Only and Sensitive But Unclassified categories." - Cybersecurity Policy v3 §1.1: "CUI is any document marked CONFIDENTIAL or higher." - Expected finding: Two materially different definitions. HIGH severity.

P-9: "Effective Date" defined inconsistently - Master contract §1.0: "Effective Date means the date of last signature on this contract." - Subcontract Alpha §1.0: "Effective Date means the date Subcontractor begins performance, which may differ from contract execution." - Expected finding: Definition drift; downstream date references are ambiguous. MEDIUM severity.

Flow-down failures (flow_down_check)

P-10: Cybersecurity audit-rights clause not flowed down to Subcontract Alpha
  • Master contract §4.1 grants the Contracting Officer audit rights over cybersecurity practices.
  • Subcontract Alpha contains no parallel clause.
  • Expected finding: Master flow-down clause absent in subcontract. HIGH severity.

P-11: Personnel security clearance flow-down absent in Subcontract Bravo
  • Master contract §5.0 requires all personnel to hold a SECRET clearance for CUI handling.
  • Subcontract Bravo §5.0 has no clearance requirement.
  • Expected finding: Personnel security flow-down absent. HIGH severity.
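
Flow-down checks ask whether each obligation in the master appears in some form in every subcontract. A keyword-level sketch (keyword lists and subcontract excerpts are assumptions; real matching would need semantic rather than keyword comparison):

    # Master-contract obligations that must flow down, with crude keyword signatures.
    FLOW_DOWN = {
        "cybersecurity audit rights (master §4.1)": ["audit", "cybersecurity"],
        "personnel clearance requirement (master §5.0)": ["clearance"],
    }

    # Hypothetical subcontract texts.
    SUBCONTRACTS = {
        "subcontract_alpha": "Subcontractor shall maintain SECRET clearances for all personnel handling CUI.",
        "subcontract_bravo": "Subcontractor shall grant the Contracting Officer audit rights over cybersecurity practices.",
    }

    for obligation, keywords in FLOW_DOWN.items():
        for sub, text in SUBCONTRACTS.items():
            if not all(kw in text.lower() for kw in keywords):
                print(f"FLOW-DOWN GAP: {obligation} not found in {sub}")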

Citation integrity (citation_integrity_check)

P-12: Misrepresented FAR clause
  • Project Management SOP §3.1 states: "Per FAR 52.204-21, contractors are required to provide quarterly cybersecurity reports."
  • FAR 52.204-21 is the Basic Safeguarding clause; it does NOT specify quarterly reporting.
  • Expected finding: Citation misrepresented. MEDIUM severity.

P-13: Misidentified NIST publication
  • Cybersecurity Policy v3 cites "NIST SP 800-53 r5" as the basis for FedRAMP control selection.
  • For CMMC L2 / NIST 800-171 environments, 800-53 is not the operative standard. (800-171 is.)
  • Expected finding: Wrong standard cited for the context. LOW severity.
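
Citation-integrity flaws are the hardest to check mechanically because the checker has to know what the cited authority actually says. The sketch below hard-codes that knowledge for the two planted cases and simply surfaces claim/fact pairs for review, which is roughly what external citation verification (Phase 2) would automate:

    # What the cited authorities actually cover, for the two planted cases.
    CITATION_FACTS = {
        "FAR 52.204-21": "Basic Safeguarding of Covered Contractor Information Systems; no quarterly reporting requirement",
        "NIST SP 800-53": "control catalog used by FedRAMP; NIST SP 800-171 is the operative standard for CMMC L2",
    }

    # Hypothetical claims extracted from the corpus, keyed by the authority they cite.
    CLAIMS = {
        "FAR 52.204-21": "contractors are required to provide quarterly cybersecurity reports",
        "NIST SP 800-53": "basis for control selection in this CMMC L2 environment",
    }

    for authority, claim in CLAIMS.items():
        fact = CITATION_FACTS.get(authority)
        if fact:
            # A human (or an LLM with retrieval) decides whether the claim matches the fact;
            # this sketch only pairs them up for review.
            print(f"REVIEW {authority}:\n  claim: {claim}\n  actual scope: {fact}\n")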

Total flaws by severity

Severity   Count
CRITICAL   2 (P-2, P-3)
HIGH       6 (P-1, P-4, P-5, P-8, P-10, P-11)
MEDIUM     4 (P-6, P-7, P-9, P-12)
LOW        1 (P-13)
Total      13

Total flaws by primitive

Primitive                  Count   Flaws
conflict_check             2       P-1, P-2
coverage_check             3       P-3, P-4, P-5
currency_check             2       P-6, P-7
consistency_check          2       P-8, P-9
flow_down_check            2       P-10, P-11
citation_integrity_check   2       P-12, P-13
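
For scoring scripts, the same ground truth can be kept in machine-readable form. A sketch in Python; the structure is an assumption, not an AuditForge schema:

    # Ground truth for recall scoring: flaw ID -> (primitive, severity).
    PLANTED_FLAWS = {
        "P-1":  ("conflict_check",           "HIGH"),
        "P-2":  ("conflict_check",           "CRITICAL"),
        "P-3":  ("coverage_check",           "CRITICAL"),
        "P-4":  ("coverage_check",           "HIGH"),
        "P-5":  ("coverage_check",           "HIGH"),
        "P-6":  ("currency_check",           "MEDIUM"),
        "P-7":  ("currency_check",           "MEDIUM"),
        "P-8":  ("consistency_check",        "HIGH"),
        "P-9":  ("consistency_check",        "MEDIUM"),
        "P-10": ("flow_down_check",          "HIGH"),
        "P-11": ("flow_down_check",          "HIGH"),
        "P-12": ("citation_integrity_check", "MEDIUM"),
        "P-13": ("citation_integrity_check", "LOW"),
    }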

Recall scoring

After running an audit, compare findings to this ground truth:

  • Recall = (planted flaws found) / 13
  • Precision = (planted flaws found) / (total findings produced)
  • False positives = findings that don't correspond to any planted flaw

Target for the first dogfood run: ≥70% recall on HIGH+CRITICAL flaws and ≥40% recall overall. Precision is harder to measure (some "false positives" may be real findings the synthetic corpus accidentally introduced), so manual review is needed.
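
A minimal scoring sketch against the severity data above, assuming findings have already been manually mapped to P-numbers; the example inputs at the end are placeholders, not real results:

    # Severity per planted flaw (copied from the tables above).
    SEVERITY = {
        "P-1": "HIGH", "P-2": "CRITICAL", "P-3": "CRITICAL", "P-4": "HIGH",
        "P-5": "HIGH", "P-6": "MEDIUM", "P-7": "MEDIUM", "P-8": "HIGH",
        "P-9": "MEDIUM", "P-10": "HIGH", "P-11": "HIGH", "P-12": "MEDIUM",
        "P-13": "LOW",
    }

    def score(matched_flaws, total_findings):
        """matched_flaws: planted-flaw IDs that some finding corresponds to.
        total_findings: total number of findings the audit produced."""
        matched = set(matched_flaws)
        high_crit = {f for f, s in SEVERITY.items() if s in ("CRITICAL", "HIGH")}
        return {
            "recall_overall": len(matched) / len(SEVERITY),
            "recall_high_critical": len(matched & high_crit) / len(high_crit),
            "precision": len(matched) / total_findings if total_findings else 0.0,
        }

    # Placeholder example: 7 planted flaws matched out of 11 findings produced.
    print(score(["P-1", "P-2", "P-3", "P-4", "P-5", "P-8", "P-10"], 11))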

Notes

  • The corpus is intentionally lightweight — short, clearly structured documents. Real-world audits will have far more noise. Performance here is an upper bound; real-corpus performance will be lower.
  • Flaws are deliberately diverse — at least one per primitive — so every primitive gets exercised at least once per audit run.
  • Some flaws cross-cut (e.g., P-1 and P-2 are both 'conflict' findings but about different topics), so cluster behavior in Stage F gets tested.
  • Citation flaws (P-12, P-13) test the LLM's training-data knowledge. External citation verification (Phase 2) would catch these more reliably.