Per-engagement S3 bucket isolation (Phase 7)

What

Each engagement gets its own dedicated S3 bucket. Source documents, findings, deliverables, and audit logs for engagement A are stored in a different physical bucket than engagement B's. A bug or misconfiguration affecting one engagement cannot expose another's data.

Behind a feature flag: AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET=true. Default is off, so local dev and existing engagements keep using the shared platform bucket with prefix isolation.

Why this matters

Prefix isolation in a shared bucket (today's default) is workable for early customers but doesn't pass the SOC 2 Type 2 bar:

  • An IAM mistake (bucket policy too permissive, principal granted blanket read access) exposes every engagement at once
  • An application bug that constructs the wrong key prefix can leak engagement A's findings into engagement B's response
  • Audit-log review of "who read what" requires parsing application logs; CloudTrail S3 access logs on a shared bucket can't attribute access to a single engagement
  • Compliance auditors regularly flag shared-bucket multi-tenancy as inadequate isolation

Per-engagement buckets eliminate the prefix-construction risk class. Each bucket has its own ACL, its own bucket policy, its own CloudTrail-able access pattern. A misconfiguration affects one engagement, not the whole platform.

Architecture

engagement-create flow:
  1. Generate engagement_id
  2. If AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET set:
     a. Provision bucket: metis-af-{engagement_short}-{aws_account_id}
     b. Apply security controls: versioning, AES256 encryption,
        public-access-block (all four: BlockPublicAcls / IgnorePublicAcls /
        BlockPublicPolicy / RestrictPublicBuckets = true)
     c. Tag for cost tracking + discovery
  3. Persist engagement.source_bucket = bucket name (or empty)
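
A minimal provisioning sketch with boto3; this is illustrative rather than the exact body of app/auditforge/buckets.py, and it assumes eng-short is the first 12 characters of the engagement id and invents the tag key:

  import boto3

  def provision_engagement_bucket(engagement_id: str, region: str = "us-east-1") -> str:
      """Create and harden a dedicated engagement bucket; returns its name."""
      s3 = boto3.client("s3", region_name=region)
      account_id = boto3.client("sts").get_caller_identity()["Account"]
      bucket = f"metis-af-{engagement_id[:12]}-{account_id}"  # assumed eng-short derivation

      # us-east-1 rejects an explicit LocationConstraint; every other region needs one
      if region == "us-east-1":
          s3.create_bucket(Bucket=bucket)
      else:
          s3.create_bucket(Bucket=bucket,
                           CreateBucketConfiguration={"LocationConstraint": region})

      s3.put_bucket_versioning(Bucket=bucket,
                               VersioningConfiguration={"Status": "Enabled"})
      s3.put_bucket_encryption(Bucket=bucket, ServerSideEncryptionConfiguration={
          "Rules": [{"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}]})
      s3.put_public_access_block(Bucket=bucket, PublicAccessBlockConfiguration={
          "BlockPublicAcls": True, "IgnorePublicAcls": True,
          "BlockPublicPolicy": True, "RestrictPublicBuckets": True})
      s3.put_bucket_tagging(Bucket=bucket, Tagging={"TagSet": [
          {"Key": "auditforge:engagement", "Value": engagement_id}]})  # hypothetical tag key
      return bucket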

read/write flow:
  1. Caller has engagement object
  2. FindingsStore.list/get/edit/etc accept optional bucket=...
  3. Pass engagement.source_bucket or None
  4. Store internally: bucket override or fall back to self._bucket
     (which is the shared platform bucket)
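
The override pattern, sketched in Python (illustrative; the real FindingsStore has many more methods, and the key layout here is inferred from the migration commands later in this doc):

  import json
  import boto3

  class FindingsStore:
      def __init__(self, shared_bucket: str):
          self._s3 = boto3.client("s3")
          self._bucket = shared_bucket  # shared platform bucket

      def _resolve(self, bucket: str | None) -> str:
          # Dedicated engagement bucket when set; empty/None falls back to shared.
          return bucket or self._bucket

      def list(self, engagement_id: str, bucket: str | None = None) -> list:
          key = f"auditforge/engagements/{engagement_id}/findings.json"
          obj = self._s3.get_object(Bucket=self._resolve(bucket), Key=key)
          return json.loads(obj["Body"].read())

Because the fallback is `bucket or self._bucket`, an empty source_bucket on a pre-flag engagement routes to the shared bucket with no special-casing.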

engagement-delete flow:
  1. Read engagement.source_bucket
  2. If non-empty, decommission_engagement_bucket():
     a. List all object versions + delete markers
     b. Batch-delete in 1000-object chunks
     c. delete_bucket()
  3. Best-effort — engagement deletion succeeds even if cleanup fails
     (cleanup can be retried offline)
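
A teardown sketch under the same boto3 assumption; a versioned bucket refuses delete_bucket until every object version and delete marker is gone, hence the drain loop:

  import boto3

  def decommission_engagement_bucket(bucket: str) -> None:
      """Drain all object versions and delete markers, then remove the bucket."""
      s3 = boto3.client("s3")
      for page in s3.get_paginator("list_object_versions").paginate(Bucket=bucket):
          doomed = [{"Key": v["Key"], "VersionId": v["VersionId"]}
                    for v in page.get("Versions", []) + page.get("DeleteMarkers", [])]
          # delete_objects accepts at most 1000 keys per call
          for i in range(0, len(doomed), 1000):
              s3.delete_objects(Bucket=bucket, Delete={"Objects": doomed[i:i + 1000]})
      s3.delete_bucket(Bucket=bucket)

The caller would wrap this in a try/except so engagement deletion stays best-effort, per step 3.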

What's stored where

Artifact                                                     | Per-engagement bucket | Shared bucket
findings.json                                                | ✅ when flag on        | ✅ otherwise
Cached deliverable ({id}.json, .md, .docx, -methodology.md)  | ✅ when flag on        | ✅ otherwise
Audit log shards (shard-NNNNN.jsonl)                         | ✅ when flag on        | ✅ otherwise
engagements.json (engagement index)                          |                       | ✅ always (it's the index)
firms.json (firm branding)                                   |                       | ✅ always (firm-level, not engagement-level)
Per-tenant corpus index ({client_id}/index/*)                |                       | ✅ always (Metis tenant data, not engagement data)

The engagement index lives in the shared bucket because it's how the system finds engagements. Per-engagement buckets are provisioned for engagements; the index that points at them is platform-level.

Bucket name format

metis-af-{engagement_short}-{aws_account_id}

example: metis-af-3f06f14ea94a-741783034843

Total length: 34 characters (metis-af- = 9, plus 12 + 1 + 12). Well under S3's 63-character limit. The account ID at the end gives global uniqueness.
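
The arithmetic as a check (hypothetical helper; assumes eng-short is a fixed 12 characters and the account ID is the usual 12 digits):

  def bucket_name(engagement_short: str, account_id: str) -> str:
      name = f"metis-af-{engagement_short}-{account_id}"
      assert len(name) <= 63, name  # 9 + 12 + 1 + 12 = 34 for standard inputs
      return name

  bucket_name("3f06f14ea94a", "741783034843")  # metis-af-3f06f14ea94a-741783034843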

Feature flag

Environment variable: AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET

Accepted values (case-insensitive): 1, true, or yes enable provisioning. Any other value (including unset) keeps the shared-bucket default.
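
One plausible reading of that rule as code (hypothetical helper name, shown for precision; the real check lives somewhere in the app config):

  import os

  def per_engagement_buckets_enabled() -> bool:
      # Explicit opt-in only; unset, "0", "false", etc. keep the shared-bucket default.
      raw = os.environ.get("AUDITFORGE_PROVISION_PER_ENGAGEMENT_BUCKET", "")
      return raw.strip().lower() in {"1", "true", "yes"}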

For production rollout: set the flag on the ECS task definition. Existing engagements (created before the flag flipped) continue to work — source_bucket is empty on their record, code path falls back to the shared bucket. Only new engagements get dedicated buckets.

Migration path for existing engagements (manual)

Today: existing engagements use the shared bucket with prefix isolation. To migrate one to a dedicated bucket:

  1. Provision the bucket: call provision_engagement_bucket(engagement_id) from a script
  2. Copy artifacts:
    aws s3 sync s3://shared-bucket/auditforge/engagements/{id}/ s3://metis-af-{id}-{acct}/auditforge/engagements/{id}/
    aws s3 sync s3://shared-bucket/auditforge/deliverables/ s3://metis-af-{id}-{acct}/auditforge/deliverables/ --exclude "*" --include "{id}*"
    aws s3 sync s3://shared-bucket/auditforge/audit_logs/{id}/ s3://metis-af-{id}-{acct}/auditforge/audit_logs/{id}/
    
  3. Update the engagement record: set source_bucket to the new bucket name
  4. (Optional) Delete the artifacts from the shared bucket once confirmed working

A one-shot migration script is on the roadmap. For now, manual migration is fine — most existing engagements are pre-paid-customer test engagements and don't warrant migration.

AWS account limits

Default soft limit: 100 buckets per account. Each engagement = 1 bucket → ~100 engagements before requesting a quota increase. AWS routinely raises this limit on request to 1000+ for legitimate use cases.

Per-account quota math:

  • 100 engagements: default
  • 1,000 engagements: standard quota raise
  • 10,000+: requires an AWS account-team conversation

A firm running 50 engagements/year hits the default limit in ~2 years. Plan for the quota raise as the firm crosses ~30 engagements.

Cost characteristics

S3 buckets themselves are free; storage costs are unchanged (data is the same data, just in a different physical bucket). Slight overhead in:

  • API calls: each bucket-provisioning is ~5 API calls (CreateBucket + 4 security-control PUTs)
  • IAM: bucket policies add to the IAM evaluation cost on read; trivial at engagement scale
  • CloudTrail: per-bucket S3 data events are billable separately if enabled; recommend enabling them on per-engagement buckets specifically (not the shared one) to keep costs proportional to high-value data

Code

  • app/auditforge/buckets.py — provisioning, security controls, decommissioning, upload helper
  • app/auditforge/engagement.py — source_bucket field + provisioning at create-time + decommissioning at delete-time
  • app/auditforge/findings.py — FindingsStore methods accept bucket=... override
  • app/auditforge/report.py — deliverable cache lookup + sync respects per-engagement bucket
  • app/auditforge/runner.py — audit log writer + findings store calls thread engagement.source_bucket
  • app/auditforge_endpoints.py — endpoint handlers pass eng.source_bucket or None to all store calls

Tests

The bucket-threading change is covered by the existing FindingsStore + endpoint tests (105 passing as of Phase 7). Provisioning is exercised manually via the env flag; automated provisioning tests (using moto or LocalStack) are an open hardening item.
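
A sketch of what the moto variant could look like (assumes moto >= 5's mock_aws decorator and the provision helper's signature from the migration section):

  import boto3
  from moto import mock_aws

  from app.auditforge.buckets import provision_engagement_bucket

  @mock_aws
  def test_provisioned_bucket_blocks_public_access():
      bucket = provision_engagement_bucket("3f06f14ea94a")
      cfg = boto3.client("s3").get_public_access_block(Bucket=bucket)
      assert all(cfg["PublicAccessBlockConfiguration"].values())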

Open follow-ups

  • Automated provisioning tests with moto / LocalStack
  • One-shot migration script for existing engagements (scripts/auditforge_migrate_to_per_engagement_bucket.py)
  • Per-bucket lifecycle policy: auto-archive findings.json older than 90 days to Glacier Deep Archive (a sketch of the rule follows this list)
  • CloudTrail S3 data-events enablement template for per-engagement buckets
  • Bucket policy template restricting principals to a single ECS task role + the engagement's own audit log writer
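
For the lifecycle item above, the rule would be roughly this shape (illustrative only; the rule id is made up and the prefix is a guess against the key layout used elsewhere in this doc):

  import boto3

  boto3.client("s3").put_bucket_lifecycle_configuration(
      Bucket="metis-af-3f06f14ea94a-741783034843",  # example bucket from above
      LifecycleConfiguration={"Rules": [{
          "ID": "archive-findings-90d",  # hypothetical rule id
          "Status": "Enabled",
          "Filter": {"Prefix": "auditforge/engagements/"},  # assumed key layout
          "Transitions": [{"Days": 90, "StorageClass": "DEEP_ARCHIVE"}],
      }]},
  )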