Metis Product Overview
What is Metis?
Metis is an AI-powered internal knowledge assistant that answers questions grounded in an organization's actual documents. Unlike keyword search, which returns links and leaves users to piece together meaning, Metis reads across your document library to surface patterns, resolve ambiguity, and synthesize answers with traceable citations. Every response includes a confidence level, a clear evidence basis, coverage gaps, and follow-up questions designed to guide the user toward the next decision. When sources contradict each other, Metis detects the conflict and presents both positions rather than silently picking one. The result is a system that operates less like a search engine and more like a knowledgeable colleague who has read every document and remembers where each detail came from.
Key Capabilities
Hybrid Retrieval
Metis combines two complementary search strategies. Semantic search (FAISS vector similarity) captures conceptual matches: a question about "termination procedures" finds documents that discuss "ending an agreement" even when those exact words are absent. Lexical search (BM25 keyword scoring) catches precise terminology: part numbers, policy names, and domain-specific acronyms that vector similarity can miss. The two ranked lists are merged using Reciprocal Rank Fusion, a technique that promotes documents appearing in both channels while preserving strong single-channel hits. This dual approach means the system handles both vague conceptual questions and exact-match lookups without manual tuning.
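To make the fusion step concrete, here is a minimal sketch of Reciprocal Rank Fusion in Python. The constant k=60 is the value from the original RRF paper, and the function signature is illustrative rather than Metis's actual internal API:

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked result lists: docs ranked well in multiple lists rise to the top."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each appearance contributes 1 / (k + rank); k damps the
            # influence of any single channel's top-ranked outliers.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Docs appearing in both channels (doc_a, doc_b) outrank single-channel hits.
semantic = ["doc_a", "doc_b", "doc_c"]
lexical  = ["doc_d", "doc_b", "doc_a"]
print(reciprocal_rank_fusion([semantic, lexical]))
# ['doc_a', 'doc_b', 'doc_d', 'doc_c']
```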
Cross-Encoder Reranking
Raw retrieval returns 20 candidate passages. A cross-encoder model then scores each (query, passage) pair jointly, evaluating how well the passage actually answers the question rather than how similar the embeddings are. This step narrows 20 candidates down to the 5 most relevant, dramatically improving precision over raw vector similarity alone. Because cross-encoders consider the full interaction between query and passage, they catch nuances that bi-encoder retrieval misses, such as negation, conditional statements, and subtle scope differences.
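A sketch of this stage using the sentence-transformers CrossEncoder API; the checkpoint name is an assumption based on the ms-marco-MiniLM reranker mentioned under Architecture at a Glance:

```python
from sentence_transformers import CrossEncoder

# Assumed checkpoint; the exact model name Metis ships with may differ.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, passages: list[str], top_k: int = 5) -> list[str]:
    # Score each (query, passage) pair jointly: the model attends across
    # both texts, so it can weigh negation and scope, not just similarity.
    scores = reranker.predict([(query, p) for p in passages])
    ranked = sorted(zip(passages, scores), key=lambda x: x[1], reverse=True)
    return [p for p, _ in ranked[:top_k]]

# Usage: narrow ~20 fused candidates down to the 5 strongest.
# top5 = rerank("What is the warranty claim deadline?", candidate_passages)
```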
Structured Intelligence
Every answer Metis produces includes a structured envelope of metadata beyond the prose response itself. This includes a confidence level (high, medium, or low), a one-sentence evidence basis explaining which documents grounded the answer, scannable key points for readers who need the summary without the detail, explicit coverage gaps identifying what the user asked about but the documents do not address, and suggested follow-up questions crafted as expert guidance rather than generic prompts. This structure transforms a raw LLM response into an accountable, auditable knowledge artifact.
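As a sketch, the envelope could be modeled as a Pydantic schema like the following; the field names here are illustrative, not Metis's exact wire format:

```python
from enum import Enum
from pydantic import BaseModel

class Confidence(str, Enum):
    HIGH = "high"
    MEDIUM = "medium"
    LOW = "low"

class AnswerEnvelope(BaseModel):
    answer: str                     # prose response with inline citation markers
    confidence: Confidence          # high / medium / low
    evidence_basis: str             # one sentence: which documents grounded the answer
    key_points: list[str]           # scannable summary bullets
    coverage_gaps: list[str]        # asked about, but not addressed by the corpus
    follow_up_questions: list[str]  # expert guidance toward the next decision
```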
Conflict Detection
When retrieved passages span different document ages, types, or modification dates, Metis automatically checks whether the sources give materially different answers to the question. If they do, the system presents each position with its supporting citations rather than silently choosing one. Confidence is capped at medium whenever a conflict is detected, making it immediately clear that the answer requires human judgment. This is particularly valuable in environments where policies evolve over time and legacy documents coexist with current ones.
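A minimal sketch of the gating check and the confidence cap, assuming chunk metadata fields named doc_age, doc_type, and modified_date (the actual field names may differ):

```python
def needs_conflict_check(passages: list[dict]) -> bool:
    # Gate: only run the conflict pass when the retrieved passages actually
    # span different ages, types, or modification dates.
    return any(
        len({p.get(field) for p in passages}) > 1
        for field in ("doc_age", "doc_type", "modified_date")
    )

def apply_conflict_cap(confidence: str, conflict_detected: bool) -> str:
    # A detected conflict caps confidence at medium, signaling that the
    # answer requires human judgment.
    return "medium" if conflict_detected and confidence == "high" else confidence
```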
Conversation Memory
Metis maintains multi-turn conversation sessions using a sliding window of the most recent 5 question-and-answer turns, with a 30-minute inactivity timeout. When a follow-up question contains referential language ("What about those?" or "How does that compare?"), the system rewrites the query into a standalone form that retrieval can act on without conversation context. Rewritten queries pass through drift guardrails that reject rewrites introducing entities not present in the conversation, preventing the system from hallucinating context that was never discussed.
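The sliding window and timeout policy can be sketched as follows; this illustrates the behavior described above, not the shipped session store:

```python
import time
from collections import deque

MAX_TURNS = 5                       # sliding window of recent Q&A turns
SESSION_TIMEOUT_SECONDS = 30 * 60   # 30-minute inactivity timeout

class Session:
    def __init__(self) -> None:
        # deque(maxlen=...) evicts the oldest turn automatically.
        self.turns: deque[tuple[str, str]] = deque(maxlen=MAX_TURNS)
        self.last_active = time.monotonic()

    def add_turn(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        self.last_active = time.monotonic()

    @property
    def expired(self) -> bool:
        return time.monotonic() - self.last_active > SESSION_TIMEOUT_SECONDS
```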
Query Decomposition
Complex questions that compare, contrast, or ask about multiple topics are automatically detected and split into focused sub-queries. Each sub-query retrieves independently against the full document index, and the results are merged, deduplicated, and globally reranked before synthesis. This ensures that multi-part questions receive balanced coverage across all their components rather than skewing toward whichever topic happens to dominate the vector space.
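In sketch form, assuming hypothetical retrieve and rerank_fn callables and passages carried as dicts with an "id" key:

```python
def retrieve_for_multipart(sub_queries: list[str], retrieve, rerank_fn,
                           original_query: str, top_k: int = 5) -> list[dict]:
    # Each sub-query retrieves independently against the full index.
    seen_ids, merged = set(), []
    for sub_query in sub_queries:
        for passage in retrieve(sub_query):
            if passage["id"] not in seen_ids:   # dedupe across sub-queries
                seen_ids.add(passage["id"])
                merged.append(passage)
    # One global rerank against the original question keeps coverage balanced
    # across all parts instead of favoring the dominant topic.
    return rerank_fn(original_query, merged)[:top_k]
```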
Living Knowledge Layer (Admin Curation)
An admin panel allows authorized users to mark documents as outdated, link them to their replacements, add contextual notes, and override automatic age classifications. These changes take effect immediately without re-ingesting documents or rebuilding the FAISS index; they are stored as a lightweight overrides layer that merges onto chunk metadata at load time. In-memory telemetry tracks which overrides actually surface in answers, giving administrators visibility into whether their curation decisions are reaching users. Manual overrides outrank both regex heuristics and LLM classifiers — they are the deterministic, zero-cost path for keeping a corpus accurate without re-running expensive classification.
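A sketch of the load-time merge, with assumed field names for the override records:

```python
def apply_overrides(chunks: list[dict], overrides: dict[str, dict]) -> list[dict]:
    """Merge the admin overrides layer onto chunk metadata at load time.

    `overrides` maps a document ID to patch fields such as
    {"status": "outdated", "replaced_by": "policy-2024.pdf",
     "note": "...", "age": "legacy"}. The FAISS index itself is untouched,
    so no re-ingest or rebuild is required.
    """
    for chunk in chunks:
        patch = overrides.get(chunk["doc_id"])
        if patch:
            # Manual overrides win over heuristic and LLM-derived labels.
            chunk.update(patch)
    return chunks
```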
Per-User Authentication (V2.0)
Real user accounts replace the shared admin token model. Three roles (viewer, editor, owner) provide hierarchical permissions, backed by a hardened password store (argon2id with a per-record salt). Server-side sessions support three lifetime presets: an 8-hour hard cap, 7-day sliding renewal, or 30-day sliding renewal, configurable per pilot to match the customer's security posture. Owners invite teammates from inside the pilot UI; password reset and forgot-password flows work end-to-end via Resend transactional email, with a copy/paste fallback when email isn't yet configured. A separate super-admin allow-list grants Base2ML staff a 1-hour break-glass impersonation session; every use is recorded in a per-pilot audit log visible to the customer's owner, so there is no silent vendor access.
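The password store behavior maps directly onto the argon2-cffi library, whose PasswordHasher defaults to argon2id and generates a fresh random salt for every hash; a minimal sketch:

```python
from argon2 import PasswordHasher
from argon2.exceptions import VerifyMismatchError

ph = PasswordHasher()  # argon2id by default; new random salt per hash

def store_password(password: str) -> str:
    # The returned string embeds the algorithm parameters and the
    # per-record salt alongside the digest.
    return ph.hash(password)

def check_password(stored_hash: str, candidate: str) -> bool:
    try:
        return ph.verify(stored_hash, candidate)
    except VerifyMismatchError:
        return False
```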
Source-System Connector Layer (V2.1)
Metis can now keep its corpus in sync with the customer's source-of-truth document system instead of requiring manual upload. Microsoft 365 SharePoint is the first connector: the customer's IT team performs a one-time Azure AD app registration with read-only Graph permissions, and the operator pastes the credentials into the in-pilot connector panel. The system verifies credentials immediately, then runs scheduled diff-based syncs at the cadence the operator chooses: manual, hourly, daily, or weekly. Each sync identifies adds, updates, and deletes against the previous manifest, downloads only what changed, and triggers a re-ingest. A lightweight_sync knob (default ON) skips LLM classifiers on scheduled runs, keeping daily-rebuild costs under $1/month per pilot for corpora up to 10,000 documents. A live cost projection appears in the connector setup UI so operators see the cost impact before saving.
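The diff step reduces to comparing two manifests. A sketch, assuming each manifest maps a file ID to a content hash or ETag:

```python
def diff_manifest(previous: dict[str, str], current: dict[str, str]):
    """Compare sync manifests (file ID -> content hash / ETag).

    Only changed files are downloaded and re-ingested; unchanged files are
    skipped entirely, which is what keeps scheduled syncs cheap.
    """
    adds    = [fid for fid in current if fid not in previous]
    deletes = [fid for fid in previous if fid not in current]
    updates = [fid for fid in current
               if fid in previous and current[fid] != previous[fid]]
    return adds, updates, deletes
```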
Coverage-Gap Surfacing (V1.7.7)
The health report doesn't just list low-confidence questions individually — it clusters them by topic to surface documentation gaps the corpus consistently fails to answer. Topics with three or more queries and at least 50% low-confidence rate are flagged as actionable documentation targets, with severity badges (critical for high-volume + high-failure-rate; warning for borderline). The PDF and the in-product Reports panel both show the cluster view, ranked by impact, with example questions per topic.
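The flagging rule can be expressed compactly. The 3-query and 50% thresholds come from the description above; the severity cutoffs in this sketch are illustrative, since the overview does not specify them:

```python
MIN_QUERIES = 3           # topics need at least 3 queries to be actionable
LOW_CONF_THRESHOLD = 0.5  # ...and at least a 50% low-confidence rate

def flag_gaps(topic_clusters: dict[str, list[str]]) -> list[dict]:
    """`topic_clusters` maps a topic label to the confidence level of each query."""
    flagged = []
    for topic, confidences in topic_clusters.items():
        low_rate = confidences.count("low") / len(confidences)
        if len(confidences) >= MIN_QUERIES and low_rate >= LOW_CONF_THRESHOLD:
            flagged.append({
                "topic": topic,
                "queries": len(confidences),
                "low_confidence_rate": low_rate,
                # Illustrative cutoffs for "high-volume + high-failure-rate".
                "severity": "critical"
                            if len(confidences) >= 10 and low_rate >= 0.8
                            else "warning",
            })
    # Rank by impact: query volume weighted by failure rate.
    return sorted(flagged,
                  key=lambda g: g["queries"] * g["low_confidence_rate"],
                  reverse=True)
```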
Notifications (V2.1.4)
Operators stop polling the UI to find out what happened. Eight events are emitted to per-pilot configurable channels (email + Slack incoming-webhook): ingest completion, ingest failure, sync completion, sync failure, conflict detection, pilot expiring, pilot expired, and super-admin impersonation. The impersonation channel cannot be disabled — security audit notifications are non-removable. Slack webhooks are customer-supplied, so notifications post to the customer's own workspace.
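A sketch of the dispatch rule; the event identifiers are assumptions derived from the list above:

```python
EVENTS = {
    "ingest_completed", "ingest_failed",
    "sync_completed", "sync_failed",
    "conflict_detected",
    "pilot_expiring", "pilot_expired",
    "super_admin_impersonation",
}
NON_REMOVABLE = {"super_admin_impersonation"}  # security audit events cannot be muted

def should_notify(event: str, pilot_config: dict) -> bool:
    # Per-pilot channel config decides delivery, except for audit events,
    # which are always sent.
    if event in NON_REMOVABLE:
        return True
    return pilot_config.get("events", {}).get(event, False)
```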
Multi-Tenancy
Each client organization receives isolated FAISS and BM25 indexes stored in S3 under a client-specific prefix. Client identity is resolved from the request host header: subdomain-based routing for production deployments (e.g., acmelaw.base2ml.com) and URL-parameter routing for demo environments. An LRU cache with per-client load locks prevents memory growth from unbounded index loading and protects against cache stampedes when multiple concurrent requests hit a cold client simultaneously.
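A sketch of the cache discipline, using standard-library threading primitives for illustration; the loader callable stands in for the S3 download-and-build step:

```python
import threading
from collections import OrderedDict

class ClientIndexCache:
    """LRU cache of per-client indexes with per-client load locks.

    The per-client lock ensures that when many requests hit a cold client
    at once, only one of them downloads and builds the index (no stampede);
    LRU eviction bounds total memory across clients.
    """
    def __init__(self, max_clients: int, loader):
        self.max_clients = max_clients
        self.loader = loader                  # e.g. fetch FAISS/BM25 from S3
        self.cache: OrderedDict = OrderedDict()
        self.locks: dict[str, threading.Lock] = {}
        self.meta_lock = threading.Lock()

    def get(self, client_id: str):
        with self.meta_lock:
            if client_id in self.cache:
                self.cache.move_to_end(client_id)   # mark as recently used
                return self.cache[client_id]
            lock = self.locks.setdefault(client_id, threading.Lock())
        with lock:  # only one thread loads a cold client
            with self.meta_lock:
                if client_id in self.cache:         # another thread won the race
                    return self.cache[client_id]
            index = self.loader(client_id)
            with self.meta_lock:
                self.cache[client_id] = index
                if len(self.cache) > self.max_clients:
                    self.cache.popitem(last=False)  # evict least recently used
            return index
```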
Structure-Aware Chunking
During ingestion, Metis detects document structure including markdown headings, numbered section labels (Section 1, Phase 2, Step 3), ALL-CAPS header lines, and colon-terminated labels (Coverage:, Warranty Claims:). Chunks respect these boundaries and carry the section title as metadata, so retrieved passages arrive with their original structural context intact. This means the LLM sees not just the text but where in the document it came from, and the frontend can render meaningful source cards.
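Illustrative detection patterns for the structural markers listed above; the regexes Metis actually ships may differ:

```python
import re

SECTION_PATTERNS = [
    re.compile(r"^#{1,6}\s+(?P<title>.+)$"),                   # markdown headings
    re.compile(r"^(?P<title>(Section|Phase|Step)\s+\d+.*)$"),  # numbered section labels
    re.compile(r"^(?P<title>[A-Z][A-Z0-9 /&-]{3,})$"),         # ALL-CAPS header lines
    re.compile(r"^(?P<title>[A-Z][\w /&-]{2,40}):\s*$"),       # colon-terminated labels
]

def detect_section_title(line: str) -> str | None:
    """Return the section title if this line looks like a structural boundary."""
    for pattern in SECTION_PATTERNS:
        match = pattern.match(line.strip())
        if match:
            return match.group("title").rstrip(":").strip()
    return None

# detect_section_title("Warranty Claims:") -> "Warranty Claims"
# detect_section_title("Section 3. Termination") -> "Section 3. Termination"
```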
Document Type and Age Classification
Every document is automatically classified by type (policy, SOP, contract, proposal, checklist, manual, template, compliance, safety, billing, training, warranty, dispatch, and others) and by age (current, legacy, or draft). The default classification path uses path-based and content-based heuristics with no LLM calls, keeping ingestion fast and deterministic; optional LLM classifiers can refine these labels, and manual admin overrides outrank both. These labels feed into conflict detection gating, source card rendering, and admin panel filtering.
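A sketch of the path-based side of these heuristics; the folder names and rules here are illustrative, and the shipped version also examines document content:

```python
from pathlib import Path

PATH_TYPE_HINTS = {
    "policies": "policy", "sops": "sop", "contracts": "contract",
    "warranty": "warranty", "training": "training",
}

def classify_by_path(path: str) -> tuple[str, str]:
    """Return (doc_type, age) from folder names alone: fast and deterministic."""
    parts = {p.lower() for p in Path(path).parts}
    doc_type = next(
        (t for hint, t in PATH_TYPE_HINTS.items() if hint in parts), "other"
    )
    if "drafts" in parts:
        age = "draft"
    elif parts & {"archive", "legacy", "old"}:
        age = "legacy"
    else:
        age = "current"
    return doc_type, age
```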
How It Works (High-Level Flow)
- A user asks a question in the chat UI.
- The session store loads any existing conversation history for the active session.
- If the question contains referential signals (pronouns, phrases like "what about those"), the query is rewritten into a standalone form using the conversation context.
- If the rewritten query is a complex multi-part question, it is decomposed into 2-3 focused sub-queries.
- Each sub-query runs through hybrid retrieval: semantic search (FAISS) and lexical search (BM25) each produce ranked candidate lists, which are merged via Reciprocal Rank Fusion into a single list of approximately 20 candidates.
- A cross-encoder reranks the merged candidates, selecting the top 5 most relevant passages. For decomposed queries, sub-query results are merged, deduplicated, and globally reranked.
- If the top passages span different document ages, types, or modification dates, a conflict detection pass checks whether the sources give materially different answers.
- The LLM synthesizes an answer from the retrieved context and conversation history, producing a structured JSON response with inline citation markers.
- The structured response is parsed into its component fields: answer text, confidence level, evidence basis, key points, coverage gaps, follow-up questions, and per-source explanations.
- The answer streams to the chat UI via Server-Sent Events, with the answer text appearing token-by-token followed by metadata, source cards, and confidence indicators in a staged reveal.
Supported Document Types
Metis accepts .txt, .md, .pdf, and .docx files. Documents are automatically loaded, classified by type and age, chunked with structure awareness, and indexed into both FAISS (semantic) and BM25 (lexical) stores. No manual tagging or format-specific configuration is required.
Architecture at a Glance
The backend is a Python FastAPI application that serves both the API and the built frontend as static files from a single deployment. The frontend is a React application built with Vite and styled with Tailwind CSS. Retrieval runs against per-client FAISS vector indexes and BM25 keyword indexes, with a cross-encoder reranker (ms-marco-MiniLM) for passage scoring and an embedding model (all-MiniLM-L6-v2) for vector encoding. The LLM layer accepts any OpenAI-compatible API endpoint, currently configured for Groq. Client indexes and override files are stored in S3, downloaded on demand, and cached locally with LRU eviction. The application runs on AWS ECS Fargate behind an Application Load Balancer with ACM-managed TLS certificates. Infrastructure is defined in Terraform, covering networking, compute, storage, DNS, IAM, and load balancing.