One platform where identity, credentials, knowledge, and policy enforcement follow every action – regardless of which AI model powers it, which cloud runs it, or how you access it. Click any block for details.
🖥️ Browser Chat UI – Current. Full-featured admin + user views with SSE streaming, session history, Visual Designer.
📱 Mobile App / REST API – The POST /api/agent/chat/sync endpoint already returns JSON, so any mobile app can call it. Future: dedicated mobile SDKs (iOS/Android), push notifications for workflow approvals.
🎤 Voice Assistants – Future: POST /api/agent/voice endpoint. Flow: Speech-to-Text (Azure/Whisper) → Agent Loop → Text-to-Speech (Azure TTS) → audio response. Siri Shortcuts, Alexa Skills, and Google Actions can all call the REST API.
🔗 Webhooks – Already supported for workflow triggers. Any external system can kick off an agent action.
Key: All channels authenticate the same way (OIDC/SSO token) and pass through the same 5 security layers. Typing, tapping, or talking – the security is identical.
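Since the sync endpoint already returns plain JSON, any channel can drive it with the same OIDC token it would use in the browser. A minimal sketch of assembling such a call (the hostname and request field names are assumptions, not the documented contract):

```python
import json

def build_chat_request(question: str, server_ids: list[str], oidc_token: str) -> dict:
    """Assemble headers and body for POST /api/agent/chat/sync.
    The host and body field names here are illustrative assumptions."""
    return {
        "url": "https://contextweaver.example.com/api/agent/chat/sync",
        "headers": {
            "Authorization": f"Bearer {oidc_token}",  # same OIDC/SSO token on every channel
            "Content-Type": "application/json",
        },
        "body": json.dumps({"question": question, "server_ids": server_ids}),
    }

req = build_chat_request("Summarize my open tickets", ["mcp1"], "example-token")
```

Whether the caller is a mobile SDK, a voice bridge, or a webhook consumer, only the transport wrapper changes; the authenticated request shape stays the same.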
Entry point: User types a question in the Chat UI (browser)
Identity source: Entra ID / OIDC session → session["user"]
What flows forward: Question text + selected MCP server IDs
What comes back: SSE event stream → tokens, tool calls, citations, final answer
File: app.py lines 4495–4720
Endpoints: POST /api/agent/chat (streaming) · POST /api/agent/chat/sync (JSON)
Step 1: Extract identity → _current_user_email(), _user_max_role(), _current_user_groups()
Step 2: Create/resume chat session in Cosmos DB
Step 3: Call agent_mod.agent_chat(question, server_ids, ...user_email, user_role, user_groups)
Step 4: Stream events back to browser via SSE (/api/stream/<sid>)
File: agent.py lines 987–1333
Entry: agent_chat(question, server_ids, ...)
Step 1: Sync user identity to all MCP servers via _sync_user_to_server() → POST /api/set-user
Step 2: Discover tools from each server via MCP protocol → discover_tools() → ClientSession.list_tools()
Step 3: Build tool registry mapping tool_name → {server_url, parameters}
Step 4: Build system prompt with available tools + RAG context + policies
Step 5: Enter loop: LLM call → if tool_calls → execute each → feed results back → repeat until no more tool_calls
Max iterations: 10 (prevents infinite loops)
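Step 5's loop is the heart of the agent. A minimal sketch, with the LLM and tool executor injected as stand-ins (the real agent_chat streams events and carries identity; this shows only the control flow):

```python
MAX_ITERATIONS = 10  # matches the loop cap described above

def agent_loop(llm, execute_tool, messages: list[dict]) -> str:
    """Call the LLM; if it requests tools, run each one and feed the results
    back; stop when a reply arrives with no tool_calls, or at the cap."""
    for _ in range(MAX_ITERATIONS):
        reply = llm(messages)             # expected shape: {"content": ..., "tool_calls": [...]}
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply["content"]       # final answer, no more work requested
        for call in reply["tool_calls"]:
            result = execute_tool(call["name"], call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped after max iterations"
```

The cap turns a potential infinite tool-calling cycle into a bounded, auditable run.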
Current: Azure OpenAI GPT-4o – azure_clients.get_active_openai_client()
Planned backends:
• Claude (Anthropic) – Extended thinking, parallel tool calls, large context windows. Paid API ($3–75 per 1M tokens).
• Ollama (Local) – Free; runs Llama/Mistral/Qwen locally. Already in sim_mode. Great for development and air-gapped deployments.
• Google Gemini – Free tier available, good tool-use support. Planned.
How it works: All backends produce the same output format – a tool_calls[] array. The agent loop doesn't care which model produced it. Switch via the LLM_BACKEND env var or per-engine in the Visual Designer.
Key insight: The LLM is a commodity brain – swappable. The 5 security layers wrapping every tool call are the real differentiator. No other platform has that.
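The env-var switch can be sketched as a small factory. The adapter class names below are illustrative placeholders, not the real implementations; what matters is that every adapter emits the same tool_calls[] shape:

```python
import os

# Hypothetical adapters: each must return replies in the common tool_calls[] shape.
class AzureOpenAIBackend:
    name = "azure"

class ClaudeBackend:
    name = "claude"

class OllamaBackend:
    name = "ollama"

BACKENDS = {"azure": AzureOpenAIBackend, "claude": ClaudeBackend, "ollama": OllamaBackend}

def get_llm_backend():
    """Select the backend from the LLM_BACKEND env var; default to Azure.
    The agent loop is unchanged whichever adapter this returns."""
    return BACKENDS[os.environ.get("LLM_BACKEND", "azure").lower()]()
```

Because the selection happens once at the edge, swapping models is a deployment decision, not a code change.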
File: agent.py lines 1253–1267
What happens: After the LLM returns tool_calls, before execution:
• func_args["_user_email"] = user_email
• func_args["_user_role"] = user_role
• func_args["_user_groups"] = user_groups
• HTTP header: X-User-Email: sarah@acme.com
• SSO token: X-SSO-Token: base64(payload).hmac_sig
Why both args and headers? FastMCP strips unknown kwargs (the args starting with _), so identity must also flow via HTTP headers, with a ContextVar fallback.
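A sketch of the dual injection described above, including the base64(payload).hmac_sig token shape. The payload fields and the choice of SHA-256 are assumptions; the real signer in agent.py may differ:

```python
import base64
import hashlib
import hmac
import json

def sign_sso_token(payload: dict, secret: bytes) -> str:
    """Produce an X-SSO-Token value of the form base64(payload).hmac_sig.
    Field names and HMAC-SHA256 are assumptions for illustration."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode()).decode()
    sig = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def inject_identity(func_args: dict, headers: dict, email: str, role: str,
                    groups: list[str], secret: bytes) -> None:
    """Mirror identity into both kwargs and HTTP headers, since the server
    side may strip the underscore-prefixed kwargs."""
    func_args.update({"_user_email": email, "_user_role": role, "_user_groups": groups})
    headers["X-User-Email"] = email
    headers["X-SSO-Token"] = sign_sso_token(
        {"email": email, "role": role, "groups": groups}, secret)
```

The HMAC signature lets the receiving engine verify the identity claim came from the gateway rather than a caller-forged header.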
File: agent.py lines 839–890
Protocol: MCP (Model Context Protocol) by Anthropic
Transport options:
• Streamable HTTP (streamablehttp_client) – default, stateless
• SSE (sse_client) – fallback, persistent connection
Operations: session.list_tools() (discover) · session.call_tool(name, args) (execute)
URL pattern: http://mcp1.mcp1.svc:5001/mcp (K8s service DNS)
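Discovery over the default transport can be sketched with the `mcp` Python SDK. This is a sketch under the assumption that the SDK's streamable-HTTP client is used directly (the real discover_tools also handles the SSE fallback); the SDK imports are kept inside the function so the file loads without it installed:

```python
def engine_url(name: str, port: int = 5001) -> str:
    """Build the in-cluster service URL following the K8s DNS pattern above."""
    return f"http://{name}.{name}.svc:{port}/mcp"

async def discover_tools(url: str) -> dict:
    """Sketch of tool discovery over Streamable HTTP (SSE fallback omitted).
    Returns the tool_name -> {server_url, parameters} registry described above."""
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async with streamablehttp_client(url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return {t.name: {"server_url": url, "parameters": t.inputSchema}
                    for t in result.tools}
```

Keeping the registry keyed by tool name is what lets the agent loop route each tool_call back to the engine that owns it.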
Each engine runs in its own K8s namespace with its own plugins, credentials, policies, and RAG indexes. Deploy as many as needed.
Every tool call passes through 5 checks: Security Wrapper → Credential Resolve → Policy Check → Audit Log → Metrics.
mcp1 (Primary): Email, Payments, GitHub, Travel – the core business tools.
Plugin sharing: Other engines can import plugins from mcp1 via scoped dependencies. mcp2 can use Email and Payments from mcp1 without reinstalling – but only the cherry-picked tools are visible, and RAG indexes are filtered to the imported scope.
Result: ✅ ALLOW → execute plugin | ❌ BLOCK → return error. The LLM is never told why it was blocked – it just gets a "permission denied" message and adapts.
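The check-then-execute flow can be sketched as a decorator around each plugin tool. The hook signatures below are assumptions, not the real middleware API; the point is that a BLOCK becomes a soft "permission denied" result the LLM can adapt to, with credentials injected only on the ALLOW path:

```python
def secure_tool(policy_check, audit_log, resolve_credentials):
    """Hypothetical per-tool security wrapper: policy check and audit run
    before execution; credentials are resolved at the last moment."""
    def wrap(tool_fn):
        def guarded(user: str, **kwargs):
            decision = policy_check(user, tool_fn.__name__, kwargs)
            audit_log(user, tool_fn.__name__, decision)
            if decision != "ALLOW":
                # Soft failure: the LLM sees only this message, not the reason.
                return {"error": "permission denied"}
            kwargs["api_key"] = resolve_credentials(user, tool_fn.__name__)
            return tool_fn(**kwargs)
        return guarded
    return wrap

audit = []

@secure_tool(policy_check=lambda u, t, a: "ALLOW" if u.endswith("@acme.com") else "BLOCK",
             audit_log=lambda u, t, d: audit.append((u, t, d)),
             resolve_credentials=lambda u, t: "key-123")
def send_email(to: str, api_key: str = ""):
    return {"sent": to}
```

Because the wrapper sits between the registry and the plugin, every tool gets the same five checks without any plugin-side code.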
Native plugins: Jira, Datadog โ tools specific to this engine's purpose.
Shared plugins (via dependency): Email and Payments imported from mcp1. Shown with dashed borders.
How sharing works:
1. Admin wires a dependency in the Visual Designer: mcp2 → mcp1
2. Cherry-pick which plugins to import (not all – just Email + Payments)
3. Imported tools get an X-Dependency-Scope header – mcp1 enforces scoped access
4. RAG search is filtered to imported plugin indexes only
5. Credentials resolve independently per engine – mcp2 users have their own vault keys
Security: Prompt injection in mcp2 cannot reach the GitHub or Travel tools – they're not in mcp2's scope. The dependency boundary is a code-level firewall.
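The server-side half of step 3 can be sketched as a visibility check on mcp1. The scope table and tool names below are hypothetical; the mechanism is simply that a dependency caller only ever sees its cherry-picked tools:

```python
# Hypothetical scope table: which tools each dependent engine imported.
DEPENDENCY_SCOPES = {"mcp2": {"email.send", "payments.charge"}}

def tool_visible_to_caller(headers: dict, tool_name: str) -> bool:
    """Enforce the X-Dependency-Scope boundary on the providing engine.
    Direct (non-dependency) calls see the engine's full scope."""
    caller = headers.get("X-Dependency-Scope")
    if caller is None:
        return True
    return tool_name in DEPENDENCY_SCOPES.get(caller, set())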
Each industry vertical or department can have its own MCP engine:
🏥 mcp-healthcare: EHR, FHIR, clinical trials plugins – HIPAA namespace isolation
💰 mcp-finance: Bloomberg, Plaid, KYC plugins – SOX-compliant audit trail
⚖️ mcp-legal: Westlaw, DocuSign, billing – ethical wall enforcement
Each engine is a separate Helm release: helm install mcp-healthcare ./helm/mcp-engine -n mcp-healthcare
Share common plugins (Email, Calendar) across engines via dependencies while keeping specialized plugins isolated.
File: plugin_loader.py · Each plugin is a ZIP with manifest.json + Python code
Available: Email, GitHub, Payments, Travel (+ industry verticals on roadmap)
Each plugin registers MCP tools with the FastMCP server. The engine wraps each tool with the security middleware.
Credentials: Resolved per-user from the vault cascade – the plugin function receives ready-to-use API keys, never raw vault secrets.
Scoping: Plugins only see indexes they own (scoped RAG search).
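The ZIP-plus-manifest packaging can be sketched end to end with the standard library. The manifest fields shown are assumptions (the real manifest.json schema may carry more metadata); the toy ZIP is built in memory purely to exercise the loader:

```python
import io
import json
import zipfile

def load_manifest(zip_bytes: bytes) -> dict:
    """Sketch of the plugin_loader.py entry step: read manifest.json
    out of the plugin ZIP before any code is registered."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return json.loads(zf.read("manifest.json"))

# Build a toy plugin ZIP in memory to exercise the loader.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.json", json.dumps({"name": "email", "tools": ["email.send"]}))
    zf.writestr("plugin.py", "def send(**kwargs): ...")
manifest = load_manifest(buf.getvalue())
```

Reading the manifest first lets the engine decide which tools to register and wrap with the security middleware before importing any plugin code.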
The actual services that plugins call: Gmail, Stripe, GitHub, Amadeus, etc.
Credentials used: User's personal API keys (from vault cascade), NOT shared org keys.
The AI never sees these keys – they're injected by the engine's credential resolver at the last moment.
File: vault_client.py
Cascade: User key → Group key → Org key (most specific wins)
Storage: Azure Key Vault (production), HashiCorp Vault, or a local encrypted store
Access: Managed Identity โ no API keys stored in pods
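The most-specific-wins cascade can be sketched as a three-tier lookup. The in-memory vault layout below is an assumption for illustration; in production the tiers would be fetched from Key Vault rather than a dict:

```python
def resolve_credential(vault: dict, service: str, user: str, groups: list[str]):
    """User key first, else the first matching group key, else the org key.
    Returns None if no tier holds a key for this service."""
    key = vault.get("user", {}).get((user, service))
    if key:
        return key
    for g in groups:
        key = vault.get("group", {}).get((g, service))
        if key:
            return key
    return vault.get("org", {}).get(service)
```

The cascade is what lets one user bring a personal GitHub token while teammates fall back to a team or org key, with no plugin-side branching.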
File: cerebro_client.py · search_all_indexes()
5 priority levels: P5 Personal → P4 Group → P3 Org → P2 Plugin → P1 Engine
Identity-scoped: Search filters by _user_email, _user_groups, ACLs on each index
Result: Injected into the LLM system prompt as context before the agent loop starts
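A sketch of the identity-scoped selection: filter out indexes the caller's identity cannot see, then order the rest most-personal-first. The index record shape and ACL field names are assumptions; the real search also runs the actual retrieval query per index:

```python
PRIORITY = {"personal": 5, "group": 4, "org": 3, "plugin": 2, "engine": 1}

def visible_indexes(indexes: list[dict], user: str, groups: list[str]) -> list[dict]:
    """Drop personal/group indexes the caller isn't ACL'd into, then
    sort P5 -> P1 so personal context outranks shared context."""
    visible = [ix for ix in indexes
               if ix["level"] in ("org", "plugin", "engine")       # shared tiers
               or user in ix.get("acl_users", [])                  # personal ACL
               or set(groups) & set(ix.get("acl_groups", []))]     # group ACL
    return sorted(visible, key=lambda ix: PRIORITY[ix["level"]], reverse=True)
```

Scoping before retrieval means a user's question can never pull context out of another user's personal index, regardless of what the LLM asks for.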
Persists: Chat sessions, Visual Designer blueprints, MCP server registry, workflow definitions & runs, user preferences, connector configs.
Azure: Cosmos DB · AWS: DynamoDB · GCP: Firestore · Local: SQLite
The app code uses a common interface – swap the database by changing one Terraform module.
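The common interface can be sketched as a tiny document-store contract with the local SQLite backend filled in. The two-method shape and names are illustrative assumptions; the Cosmos, DynamoDB, and Firestore modules would expose the same surface:

```python
import sqlite3

class SqliteStore:
    """Local implementation of the assumed common document-store interface.
    Cloud modules (Cosmos DB, DynamoDB, Firestore) would expose the same
    put/get surface so the app never touches a cloud-specific API."""
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS docs (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key: str, doc: str) -> None:
        self.db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (key, doc))

    def get(self, key: str):
        row = self.db.execute("SELECT v FROM docs WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None
```

Keeping the interface this narrow is what makes "swap the database by changing one Terraform module" possible: the app only ever sees put and get.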
Prometheus: Metrics – tool calls, latency, error rates, RAG queries
Grafana: Dashboards – platform overview, MCP engine, security audit, user activity
Loki: Logs – all pod logs with level detection and colored volume charts
Specialized plugin bundles for regulated industries. Each ships as a ZIP and installs in 2 clicks via the Visual Designer.
🏥 Healthcare: EHR integration, HL7 FHIR, clinical trials, drug interactions – HIPAA-enforced at code level
💰 Finance: Bloomberg, Plaid, QuickBooks, compliance/KYC – SOX audit trail built in
⚖️ Legal: Westlaw, DocuSign, e-filing, billing – ethical walls + attorney-client privilege enforced
🚀 Aerospace: Supply chain, fleet management – clearance-level data isolation
🏫 Education: Canvas/Blackboard, student info – FERPA compliance
🛒 Retail: Shopify, CRM, inventory – per-merchant data isolation
✨ Build Your Own: No-code wizard imports any OpenAPI spec and wraps it with ContextWeaver's 5 security layers automatically.
How it works: 9 Terraform modules abstract each cloud service into a common interface. The app code never references cloud-specific APIs.
Azure (Current): AKS, Cosmos DB, AI Search, Azure OpenAI, Key Vault, ACR – fully wired, running in production.
AWS (Planned): EKS, DynamoDB, OpenSearch, Bedrock, Secrets Manager, ECR – same Terraform modules, different providers.
GCP (Planned): GKE, Firestore, Vertex AI Search, Vertex AI, Secret Manager, Artifact Registry.
Local/On-Prem: Docker Compose, SQLite, Ollama, file-based vault – for air-gapped deployments and development.
Key command: cd environments/aws && terraform apply – swaps the entire cloud backend. The app, security, plugins, and user data remain untouched.
Security stays constant: Identity injection, credential vault, policy enforcement, and audit logging work identically regardless of which cloud runs underneath.