Per-engagement S3 bucket isolation (Phase 7)

What

Each engagement gets its own dedicated S3 bucket. Source documents, findings, deliverables, and audit logs for engagement A are stored in a different physical bucket than engagement B's. A bug or misconfiguration affecting one engagement cannot expose another's data.

Behind a feature flag: AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET=true. Default is off, so local dev and existing engagements keep using the shared platform bucket with prefix isolation.

Why this matters

Prefix isolation in a shared bucket (today's default) is workable for early customers but doesn't pass the SOC 2 Type 2 bar:

  • An IAM mistake (bucket policy too permissive, principal granted blanket read access) exposes every engagement at once
  • An application bug that constructs the wrong key prefix can leak engagement A's findings into engagement B's response
  • Audit-log review of "who read what" requires parsing application logs; CloudTrail S3 access logs on a shared bucket can't attribute access to a single engagement
  • Compliance auditors regularly flag shared-bucket multi-tenancy as inadequate isolation

Per-engagement buckets eliminate the prefix-construction risk class. Each bucket has its own ACL, its own bucket policy, its own CloudTrail-able access pattern. A misconfiguration affects one engagement, not the whole platform.

Architecture

engagement-create flow:
  1. Generate engagement_id
  2. If AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET set:
     a. Provision bucket: metis-af-{engagement_short}-{aws_account_id}
     b. Apply security controls: versioning, AES256 encryption,
        public-access-block (all four: BlockPublicAcls / IgnorePublicAcls /
        BlockPublicPolicy / RestrictPublicBuckets = true)
     c. Tag for cost tracking + discovery
  3. Persist engagement.source_bucket = bucket name (or empty)
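
A minimal provisioning sketch with boto3; this is illustrative rather than the exact body of app/auditforge/buckets.py, and it assumes eng-short is the first 12 characters of the engagement id and invents the tag key:

  import boto3

  def provision_engagement_bucket(engagement_id: str, region: str = "us-east-1") -> str:
      """Create and harden a dedicated engagement bucket; returns its name."""
      s3 = boto3.client("s3", region_name=region)
      account_id = boto3.client("sts").get_caller_identity()["Account"]
      bucket = f"metis-af-{engagement_id[:12]}-{account_id}"  # assumed eng-short derivation

      # us-east-1 rejects an explicit LocationConstraint; every other region needs one
      if region == "us-east-1":
          s3.create_bucket(Bucket=bucket)
      else:
          s3.create_bucket(Bucket=bucket,
                           CreateBucketConfiguration={"LocationConstraint": region})

      s3.put_bucket_versioning(Bucket=bucket,
                               VersioningConfiguration={"Status": "Enabled"})
      s3.put_bucket_encryption(Bucket=bucket, ServerSideEncryptionConfiguration={
          "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]})
      s3.put_public_access_block(Bucket=bucket, PublicAccessBlockConfiguration={
          "BlockPublicAcls": True, "IgnorePublicAcls": True,
          "BlockPublicPolicy": True, "RestrictPublicBuckets": True})
      s3.put_bucket_tagging(Bucket=bucket, Tagging={"TagSet": [
          {"Key": "auditforge:engagement", "Value": engagement_id}]})  # hypothetical tag key
      return bucket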

read/write flow:
  1. Caller has engagement object
  2. FindingsStore.list/get/edit/etc accept optional bucket=...
  3. Pass engagement.source_bucket or None
  4. Store internally: bucket override or fall back to self._bucket
     (which is the shared platform bucket)
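
The override pattern, sketched in Python (illustrative; the real FindingsStore has many more methods, and the key layout here is inferred from the migration commands later in this doc):

  import json
  import boto3

  class FindingsStore:
      def __init__(self, shared_bucket: str):
          self._s3 = boto3.client("s3")
          self._bucket = shared_bucket  # shared platform bucket

      def _resolve(self, bucket: str | None) -> str:
          # Dedicated engagement bucket when set; empty/None falls back to shared.
          return bucket or self._bucket

      def list(self, engagement_id: str, bucket: str | None = None) -> list:
          key = f"auditforge/engagements/{engagement_id}/findings.json"
          obj = self._s3.get_object(Bucket=self._resolve(bucket), Key=key)
          return json.loads(obj["Body"].read())

Because the fallback is `bucket or self._bucket`, an empty source_bucket on a pre-flag engagement routes to the shared bucket with no special-casing.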

engagement-delete flow:
  1. Read engagement.source_bucket
  2. If non-empty, decommission_engagement_bucket():
     a. List all object versions + delete markers
     b. Batch-delete in 1000-object chunks
     c. delete_bucket()
  3. Best-effort — engagement deletion succeeds even if cleanup fails
     (cleanup can be retried offline)
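
A teardown sketch under the same boto3 assumption; a versioned bucket refuses delete_bucket until every object version and delete marker is gone, hence the drain loop:

  import boto3

  def decommission_engagement_bucket(bucket: str) -> None:
      """Drain all object versions and delete markers, then remove the bucket."""
      s3 = boto3.client("s3")
      for page in s3.get_paginator("list_object_versions").paginate(Bucket=bucket):
          doomed = [{"Key": v["Key"], "VersionId": v["VersionId"]}
                    for v in page.get("Versions", []) + page.get("DeleteMarkers", [])]
          # delete_objects accepts at most 1000 keys per call
          for i in range(0, len(doomed), 1000):
              s3.delete_objects(Bucket=bucket, Delete={"Objects": doomed[i:i + 1000]})
      s3.delete_bucket(Bucket=bucket)

The caller would wrap this in a try/except so engagement deletion stays best-effort, per step 3.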

What's stored where

Artifact                                                     | Per-engagement bucket | Shared bucket
findings.json                                                | ✅ when flag on        | ✅ otherwise
Cached deliverable ({id}.json, .md, .docx, -methodology.md)  | ✅ when flag on        | ✅ otherwise
Audit log shards (shard-NNNNN.jsonl)                         | ✅ when flag on        | ✅ otherwise
engagements.json (engagement index)                          |                       | ✅ always (it's the index)
firms.json (firm branding)                                   |                       | ✅ always (firm-level, not engagement-level)
Per-tenant corpus index ({client_id}/index/*)                |                       | ✅ always (Metis tenant data, not engagement data)

The engagement index lives in the shared bucket because it's how the system finds engagements. Per-engagement buckets are provisioned for engagements; the index that points at them is platform-level.

Bucket name format

metis-af-{engagement_short}-{aws_account_id}

example: metis-af-3f06f14ea94a-741783034843

Total length: 34 characters (metis-af- = 9, plus 12 + 1 + 12). Well under S3's 63-character limit. The account ID at the end gives global uniqueness.
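
The arithmetic as a check (hypothetical helper; assumes eng-short is a fixed 12 characters and the account ID is the usual 12 digits):

  def bucket_name(engagement_short: str, account_id: str) -> str:
      name = f"metis-af-{engagement_short}-{account_id}"
      assert len(name) <= 63, name  # 9 + 12 + 1 + 12 = 34 for standard inputs
      return name

  bucket_name("3f06f14ea94a", "741783034843")  # metis-af-3f06f14ea94a-741783034843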

Feature flag

Environment variable: AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET

Accepted values (case-insensitive): 1, true, or yes enable provisioning. Any other value (including unset) keeps the shared-bucket default.
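
One plausible reading of that rule as code (hypothetical helper name, shown for precision; the real check lives somewhere in the app config):

  import os

  def per_engagement_buckets_enabled() -> bool:
      # Explicit opt-in only; unset, "0", "false", etc. keep the shared-bucket default.
      raw = os.environ.get("AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET", "")
      return raw.strip().lower() in {"1", "true", "yes"}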

For production rollout: set the flag on the ECS task definition. Existing engagements (created before the flag flipped) continue to work — source_bucket is empty on their record, code path falls back to the shared bucket. Only new engagements get dedicated buckets.

Migration path for existing engagements (manual)

Today: existing engagements use the shared bucket with prefix isolation. To migrate one to a dedicated bucket:

  1. Provision the bucket: call provision_engagement_bucket(engagement_id) from a script
  2. Copy artifacts:
    aws s3 sync s3://shared-bucket/auditforge/engagements/{id}/ s3://metis-af-{id}-{acct}/auditforge/engagements/{id}/
    aws s3 sync s3://shared-bucket/auditforge/deliverables/ s3://metis-af-{id}-{acct}/auditforge/deliverables/ --exclude "*" --include "{id}*"
    aws s3 sync s3://shared-bucket/auditforge/audit_logs/{id}/ s3://metis-af-{id}-{acct}/auditforge/audit_logs/{id}/
    
  3. Update the engagement record: set source_bucket to the new bucket name
  4. (Optional) Delete the artifacts from the shared bucket once confirmed working

A one-shot migration script is on the roadmap. For now, manual migration is fine — most existing engagements are pre-paid-customer test engagements and don't warrant migration.

AWS account limits

Default soft limit: 100 buckets per account. Each engagement = 1 bucket → ~100 engagements before requesting a quota increase. AWS routinely raises this limit on request to 1000+ for legitimate use cases.

Per-account quota math:

  • 100 engagements: default
  • 1,000 engagements: standard quota raise
  • 10,000+: requires an AWS account-team conversation

A firm running 50 engagements/year hits the default limit in ~2 years. Plan for the quota raise as the firm crosses ~30 engagements.

Cost characteristics

S3 buckets themselves are free; storage costs are unchanged (data is the same data, just in a different physical bucket). Slight overhead in:

  • API calls: each bucket-provisioning is ~5 API calls (CreateBucket + 4 security-control PUTs)
  • IAM: bucket policies add to the IAM evaluation cost on read; trivial at engagement scale
  • CloudTrail: per-bucket S3 data events are billable separately if enabled; recommend enabling them on per-engagement buckets specifically (not the shared one) to keep costs proportional to high-value data

Code

  • app/auditforge/buckets.py — provisioning, security controls, decommissioning, upload helper
  • app/auditforge/engagement.py — source_bucket field + provisioning at create-time + decommissioning at delete-time
  • app/auditforge/findings.py — FindingsStore methods accept bucket=... override
  • app/auditforge/report.py — deliverable cache lookup + sync respects per-engagement bucket
  • app/auditforge/runner.py — audit log writer + findings store calls thread engagement.source_bucket
  • app/auditforge_endpoints.py — endpoint handlers pass eng.source_bucket or None to all store calls

Tests

The bucket-threading change is covered by the existing FindingsStore + endpoint tests (105 passing as of Phase 7). Provisioning is exercised manually via the env flag; automated provisioning tests (using moto or LocalStack) are an open hardening item.
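
A sketch of what the moto variant could look like (assumes moto >= 5's mock_aws decorator and the provision helper's signature from the migration section):

  import boto3
  from moto import mock_aws

  from app.auditforge.buckets import provision_engagement_bucket

  @mock_aws
  def test_provisioned_bucket_blocks_public_access():
      bucket = provision_engagement_bucket("3f06f14ea94a")
      cfg = boto3.client("s3").get_public_access_block(Bucket=bucket)
      assert all(cfg["PublicAccessBlockConfiguration"].values())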

Open follow-ups

  • Automated provisioning tests with moto / LocalStack
  • One-shot migration script for existing engagements (scripts/auditforge_migrate_to_per_engagement_bucket.py)
  • Per-bucket lifecycle policy: auto-archive findings.json older than 90 days to Glacier Deep Archive (a sketch of the rule follows this list)
  • CloudTrail S3 data-events enablement template for per-engagement buckets
  • Bucket policy template restricting principals to a single ECS task role + the engagement's own audit log writer
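
For the lifecycle item above, the rule would be roughly this shape (illustrative only; the rule id is made up and the prefix is a guess against the key layout used elsewhere in this doc):

  import boto3

  boto3.client("s3").put_bucket_lifecycle_configuration(
      Bucket="metis-af-3f06f14ea94a-741783034843",  # example bucket from above
      LifecycleConfiguration={"Rules": [{
          "ID": "archive-findings-90d",  # hypothetical rule id
          "Status": "Enabled",
          "Filter": {"Prefix": "auditforge/engagements/"},  # assumed key layout
          "Transitions": [{"Days": 90, "StorageClass": "DEEP_ARCHIVE"}],
      }]},
  )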