One platform where identity, credentials, knowledge, and policy enforcement follow every action – regardless of which AI model powers it, which cloud runs it, or how you access it. Click any block for details.
🖥️ Browser Chat UI – Current. Full-featured admin + user views with SSE streaming, session history, Visual Designer.
📱 Mobile App / REST API – The POST /api/agent/chat/sync endpoint already returns JSON, so any mobile app can call it. Future: dedicated mobile SDKs (iOS/Android), push notifications for workflow approvals.
🎤 Voice Assistants – Future: POST /api/agent/voice endpoint. Flow: Speech-to-Text (Azure/Whisper) → Agent Loop → Text-to-Speech (Azure TTS) → audio response. Siri Shortcuts, Alexa Skills, and Google Actions can all call the REST API.
🔗 Webhooks – Already supported for workflow triggers. Any external system can kick off an agent action.
Key: All channels authenticate the same way (OIDC/SSO token) and pass through the same 5 security layers. Typing, tapping, or talking – the security is identical.
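Since the sync endpoint already returns plain JSON, any channel can drive it with the same OIDC token it would use in the browser. A minimal sketch of assembling such a call (the hostname and request field names are assumptions, not the documented contract):

```python
import json

def build_chat_request(question: str, server_ids: list[str], oidc_token: str) -> dict:
    """Assemble headers and body for POST /api/agent/chat/sync.
    The host and body field names here are illustrative assumptions."""
    return {
        "url": "https://contextweaver.example.com/api/agent/chat/sync",
        "headers": {
            "Authorization": f"Bearer {oidc_token}",  # same OIDC/SSO token on every channel
            "Content-Type": "application/json",
        },
        "body": json.dumps({"question": question, "server_ids": server_ids}),
    }

req = build_chat_request("Summarize my open tickets", ["mcp1"], "example-token")
```

Whether the caller is a mobile SDK, a voice bridge, or a webhook consumer, only the transport wrapper changes; the authenticated request shape stays the same.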
Entry point: User types a question in the Chat UI (browser)
Identity source: Entra ID / OIDC session → session["user"]
What flows forward: Question text + selected MCP server IDs
What comes back: SSE event stream → tokens, tool calls, citations, final answer
File: app.py lines 4495–4720
Endpoints: POST /api/agent/chat (streaming) · POST /api/agent/chat/sync (JSON)
Step 1: Extract identity → _current_user_email(), _user_max_role(), _current_user_groups()
Step 2: Create/resume chat session in Cosmos DB
Step 3: Call agent_mod.agent_chat(question, server_ids, ...user_email, user_role, user_groups)
Step 4: Stream events back to browser via SSE (/api/stream/<sid>)
File: agent.py lines 987–1333
Entry: agent_chat(question, server_ids, ...)
Step 1: Sync user identity to all MCP servers via _sync_user_to_server() → POST /api/set-user
Step 2: Discover tools from each server via MCP protocol → discover_tools() → ClientSession.list_tools()
Step 3: Build tool registry mapping tool_name → {server_url, parameters}
Step 4: Build system prompt with available tools + RAG context + policies
Step 5: Enter loop: LLM call → if tool_calls → execute each → feed results back → repeat until no more tool_calls
Max iterations: 10 (prevents infinite loops)
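Step 5's loop is the heart of the agent. A minimal sketch, with the LLM and tool executor injected as stand-ins (the real agent_chat streams events and carries identity; this shows only the control flow):

```python
MAX_ITERATIONS = 10  # matches the loop cap described above

def agent_loop(llm, execute_tool, messages: list[dict]) -> str:
    """Call the LLM; if it requests tools, run each one and feed the results
    back; stop when a reply arrives with no tool_calls, or at the cap."""
    for _ in range(MAX_ITERATIONS):
        reply = llm(messages)             # expected shape: {"content": ..., "tool_calls": [...]}
        messages.append(reply)
        if not reply.get("tool_calls"):
            return reply["content"]       # final answer, no more work requested
        for call in reply["tool_calls"]:
            result = execute_tool(call["name"], call["args"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "Stopped after max iterations"
```

The cap turns a potential infinite tool-calling cycle into a bounded, auditable run.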
Current: Azure OpenAI GPT-4o – azure_clients.get_active_openai_client()
Planned backends:
• Claude (Anthropic) – Extended thinking, parallel tool calls, large context windows. Paid API ($3–75 per 1M tokens).
• Ollama (Local) – Free; runs Llama/Mistral/Qwen locally. Already in sim_mode. Great for development and air-gapped deployments.
• Google Gemini – Free tier available, good tool-use support. Planned.
How it works: All backends produce the same output format – a tool_calls[] array. The agent loop doesn't care which model produced it. Switch via the LLM_BACKEND env var or per-engine in the Visual Designer.
Key insight: The LLM is a commodity brain – swappable. The 5 security layers wrapping every tool call are the real differentiator. No other platform has that.
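The env-var switch can be sketched as a small factory. The adapter class names below are illustrative placeholders, not the real implementations; what matters is that every adapter emits the same tool_calls[] shape:

```python
import os

# Hypothetical adapters: each must return replies in the common tool_calls[] shape.
class AzureOpenAIBackend:
    name = "azure"

class ClaudeBackend:
    name = "claude"

class OllamaBackend:
    name = "ollama"

BACKENDS = {"azure": AzureOpenAIBackend, "claude": ClaudeBackend, "ollama": OllamaBackend}

def get_llm_backend():
    """Select the backend from the LLM_BACKEND env var; default to Azure.
    The agent loop is unchanged whichever adapter this returns."""
    return BACKENDS[os.environ.get("LLM_BACKEND", "azure").lower()]()
```

Because the selection happens once at the edge, swapping models is a deployment decision, not a code change.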
File: agent.py lines 1253–1267
What happens: After the LLM returns tool_calls, before execution:
• func_args["_user_email"] = user_email
• func_args["_user_role"] = user_role
• func_args["_user_groups"] = user_groups
• HTTP header: X-User-Email: sarah@acme.com
• SSO token: X-SSO-Token: base64(payload).hmac_sig
Why both args and headers? FastMCP strips unknown kwargs (the args starting with _), so identity must also flow via HTTP headers, with a ContextVar fallback.
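A sketch of the dual injection described above, including the base64(payload).hmac_sig token shape. The payload fields and the choice of SHA-256 are assumptions; the real signer in agent.py may differ:

```python
import base64
import hashlib
import hmac
import json

def sign_sso_token(payload: dict, secret: bytes) -> str:
    """Produce an X-SSO-Token value of the form base64(payload).hmac_sig.
    Field names and HMAC-SHA256 are assumptions for illustration."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode()).decode()
    sig = hmac.new(secret, body.encode(), hashlib.sha256).hexdigest()
    return f"{body}.{sig}"

def inject_identity(func_args: dict, headers: dict, email: str, role: str,
                    groups: list[str], secret: bytes) -> None:
    """Mirror identity into both kwargs and HTTP headers, since the server
    side may strip the underscore-prefixed kwargs."""
    func_args.update({"_user_email": email, "_user_role": role, "_user_groups": groups})
    headers["X-User-Email"] = email
    headers["X-SSO-Token"] = sign_sso_token(
        {"email": email, "role": role, "groups": groups}, secret)
```

The HMAC signature lets the receiving engine verify the identity claim came from the gateway rather than a caller-forged header.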
File: agent.py lines 839–890
Protocol: MCP (Model Context Protocol) by Anthropic
Transport options:
• Streamable HTTP (streamablehttp_client) – default, stateless
• SSE (sse_client) – fallback, persistent connection
Operations: session.list_tools() (discover) · session.call_tool(name, args) (execute)
URL pattern: http://mcp1.mcp1.svc:5001/mcp (K8s service DNS)
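Discovery over the default transport can be sketched with the `mcp` Python SDK. This is a sketch under the assumption that the SDK's streamable-HTTP client is used directly (the real discover_tools also handles the SSE fallback); the SDK imports are kept inside the function so the file loads without it installed:

```python
def engine_url(name: str, port: int = 5001) -> str:
    """Build the in-cluster service URL following the K8s DNS pattern above."""
    return f"http://{name}.{name}.svc:{port}/mcp"

async def discover_tools(url: str) -> dict:
    """Sketch of tool discovery over Streamable HTTP (SSE fallback omitted).
    Returns the tool_name -> {server_url, parameters} registry described above."""
    from mcp import ClientSession
    from mcp.client.streamable_http import streamablehttp_client

    async with streamablehttp_client(url) as (read, write, _):
        async with ClientSession(read, write) as session:
            await session.initialize()
            result = await session.list_tools()
            return {t.name: {"server_url": url, "parameters": t.inputSchema}
                    for t in result.tools}
```

Keeping the registry keyed by tool name is what lets the agent loop route each tool_call back to the engine that owns it.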
Each engine runs in its own K8s namespace with its own plugins, credentials, policies, and RAG indexes. Deploy as many as needed.
Every tool call passes through 5 checks: Security Wrapper → Credential Resolve → Policy Check → Audit Log → Metrics.
mcp1 (Primary): Email, Payments, GitHub, Travel – the core business tools.
Plugin sharing: Other engines can import plugins from mcp1 via scoped dependencies. mcp2 can use Email and Payments from mcp1 without reinstalling – but only the cherry-picked tools are visible, and RAG indexes are filtered to the imported scope.
Result: ✅ ALLOW → execute plugin | ❌ BLOCK → return error. The LLM is never told why it was blocked – it just gets a "permission denied" message and adapts.
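The check-then-execute flow can be sketched as a decorator around each plugin tool. The hook signatures below are assumptions, not the real middleware API; the point is that a BLOCK becomes a soft "permission denied" result the LLM can adapt to, with credentials injected only on the ALLOW path:

```python
def secure_tool(policy_check, audit_log, resolve_credentials):
    """Hypothetical per-tool security wrapper: policy check and audit run
    before execution; credentials are resolved at the last moment."""
    def wrap(tool_fn):
        def guarded(user: str, **kwargs):
            decision = policy_check(user, tool_fn.__name__, kwargs)
            audit_log(user, tool_fn.__name__, decision)
            if decision != "ALLOW":
                # Soft failure: the LLM sees only this message, not the reason.
                return {"error": "permission denied"}
            kwargs["api_key"] = resolve_credentials(user, tool_fn.__name__)
            return tool_fn(**kwargs)
        return guarded
    return wrap

audit = []

@secure_tool(policy_check=lambda u, t, a: "ALLOW" if u.endswith("@acme.com") else "BLOCK",
             audit_log=lambda u, t, d: audit.append((u, t, d)),
             resolve_credentials=lambda u, t: "key-123")
def send_email(to: str, api_key: str = ""):
    return {"sent": to}
```

Because the wrapper sits between the registry and the plugin, every tool gets the same five checks without any plugin-side code.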
Native plugins: Jira, Datadog โ tools specific to this engine's purpose.
Shared plugins (via dependency): Email and Payments imported from mcp1. Shown with dashed borders.
How sharing works:
1. Admin wires a dependency in the Visual Designer: mcp2 → mcp1
2. Cherry-pick which plugins to import (not all – just Email + Payments)
3. Imported tools get an X-Dependency-Scope header – mcp1 enforces scoped access
4. RAG search is filtered to imported plugin indexes only
5. Credentials resolve independently per engine – mcp2 users have their own vault keys
Security: Prompt injection in mcp2 cannot reach the GitHub or Travel tools – they're not in mcp2's scope. The dependency boundary is a code-level firewall.
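The server-side half of step 3 can be sketched as a visibility check on mcp1. The scope table and tool names below are hypothetical; the mechanism is simply that a dependency caller only ever sees its cherry-picked tools:

```python
# Hypothetical scope table: which tools each dependent engine imported.
DEPENDENCY_SCOPES = {"mcp2": {"email.send", "payments.charge"}}

def tool_visible_to_caller(headers: dict, tool_name: str) -> bool:
    """Enforce the X-Dependency-Scope boundary on the providing engine.
    Direct (non-dependency) calls see the engine's full scope."""
    caller = headers.get("X-Dependency-Scope")
    if caller is None:
        return True
    return tool_name in DEPENDENCY_SCOPES.get(caller, set())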
Each industry vertical or department can have its own MCP engine:
🏥 mcp-healthcare: EHR, FHIR, clinical trials plugins – HIPAA namespace isolation
💰 mcp-finance: Bloomberg, Plaid, KYC plugins – SOX-compliant audit trail
⚖️ mcp-legal: Westlaw, DocuSign, billing – ethical wall enforcement
Each engine is a separate Helm release: helm install mcp-healthcare ./helm/mcp-engine -n mcp-healthcare
Share common plugins (Email, Calendar) across engines via dependencies while keeping specialized plugins isolated.
File: plugin_loader.py · Each plugin is a ZIP with manifest.json + Python code
Available: Email, GitHub, Payments, Travel (+ industry verticals on roadmap)
Each plugin registers MCP tools with the FastMCP server. The engine wraps each tool with the security middleware.
Credentials: Resolved per-user from the vault cascade – the plugin function receives ready-to-use API keys, never raw vault secrets.
Scoping: Plugins only see indexes they own (scoped RAG search).
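The ZIP-plus-manifest packaging can be sketched end to end with the standard library. The manifest fields shown are assumptions (the real manifest.json schema may carry more metadata); the toy ZIP is built in memory purely to exercise the loader:

```python
import io
import json
import zipfile

def load_manifest(zip_bytes: bytes) -> dict:
    """Sketch of the plugin_loader.py entry step: read manifest.json
    out of the plugin ZIP before any code is registered."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return json.loads(zf.read("manifest.json"))

# Build a toy plugin ZIP in memory to exercise the loader.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("manifest.json", json.dumps({"name": "email", "tools": ["email.send"]}))
    zf.writestr("plugin.py", "def send(**kwargs): ...")
manifest = load_manifest(buf.getvalue())
```

Reading the manifest first lets the engine decide which tools to register and wrap with the security middleware before importing any plugin code.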
The actual services that plugins call: Gmail, Stripe, GitHub, Amadeus, etc.
Credentials used: User's personal API keys (from vault cascade), NOT shared org keys.
The AI never sees these keys – they're injected by the engine's credential resolver at the last moment.
File: vault_client.py
Cascade: User key → Group key → Org key (most specific wins)
Storage: Azure Key Vault (production), HashiCorp Vault, or a local encrypted store
Access: Managed Identity โ no API keys stored in pods
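The most-specific-wins cascade can be sketched as a three-tier lookup. The in-memory vault layout below is an assumption for illustration; in production the tiers would be fetched from Key Vault rather than a dict:

```python
def resolve_credential(vault: dict, service: str, user: str, groups: list[str]):
    """User key first, else the first matching group key, else the org key.
    Returns None if no tier holds a key for this service."""
    key = vault.get("user", {}).get((user, service))
    if key:
        return key
    for g in groups:
        key = vault.get("group", {}).get((g, service))
        if key:
            return key
    return vault.get("org", {}).get(service)
```

The cascade is what lets one user bring a personal GitHub token while teammates fall back to a team or org key, with no plugin-side branching.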
File: cerebro_client.py · search_all_indexes()
5 priority levels: P5 Personal → P4 Group → P3 Org → P2 Plugin → P1 Engine
Identity-scoped: Search filters by _user_email, _user_groups, ACLs on each index
Result: Injected into the LLM system prompt as context before the agent loop starts
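A sketch of the identity-scoped selection: filter out indexes the caller's identity cannot see, then order the rest most-personal-first. The index record shape and ACL field names are assumptions; the real search also runs the actual retrieval query per index:

```python
PRIORITY = {"personal": 5, "group": 4, "org": 3, "plugin": 2, "engine": 1}

def visible_indexes(indexes: list[dict], user: str, groups: list[str]) -> list[dict]:
    """Drop personal/group indexes the caller isn't ACL'd into, then
    sort P5 -> P1 so personal context outranks shared context."""
    visible = [ix for ix in indexes
               if ix["level"] in ("org", "plugin", "engine")       # shared tiers
               or user in ix.get("acl_users", [])                  # personal ACL
               or set(groups) & set(ix.get("acl_groups", []))]     # group ACL
    return sorted(visible, key=lambda ix: PRIORITY[ix["level"]], reverse=True)
```

Scoping before retrieval means a user's question can never pull context out of another user's personal index, regardless of what the LLM asks for.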
Persists: Chat sessions, Visual Designer blueprints, MCP server registry, workflow definitions & runs, user preferences, connector configs.
Azure: Cosmos DB · AWS: DynamoDB · GCP: Firestore · Local: SQLite
The app code uses a common interface – swap the database by changing one Terraform module.
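The common interface can be sketched as a tiny document-store contract with the local SQLite backend filled in. The two-method shape and names are illustrative assumptions; the Cosmos, DynamoDB, and Firestore modules would expose the same surface:

```python
import sqlite3

class SqliteStore:
    """Local implementation of the assumed common document-store interface.
    Cloud modules (Cosmos DB, DynamoDB, Firestore) would expose the same
    put/get surface so the app never touches a cloud-specific API."""
    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS docs (k TEXT PRIMARY KEY, v TEXT)")

    def put(self, key: str, doc: str) -> None:
        self.db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (key, doc))

    def get(self, key: str):
        row = self.db.execute("SELECT v FROM docs WHERE k = ?", (key,)).fetchone()
        return row[0] if row else None
```

Keeping the interface this narrow is what makes "swap the database by changing one Terraform module" possible: the app only ever sees put and get.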
Prometheus: Metrics – tool calls, latency, error rates, RAG queries
Grafana: Dashboards – platform overview, MCP engine, security audit, user activity
Loki: Logs – all pod logs with level detection and colored volume charts
Specialized plugin bundles for regulated industries. Each ships as a ZIP and installs in 2 clicks via the Visual Designer.
🏥 Healthcare: EHR integration, HL7 FHIR, clinical trials, drug interactions – HIPAA-enforced at code level
💰 Finance: Bloomberg, Plaid, QuickBooks, compliance/KYC – SOX audit trail built in
⚖️ Legal: Westlaw, DocuSign, e-filing, billing – ethical walls + attorney-client privilege enforced
🚀 Aerospace: Supply chain, fleet management – clearance-level data isolation
🏫 Education: Canvas/Blackboard, student info – FERPA compliance
🛒 Retail: Shopify, CRM, inventory – per-merchant data isolation
✨ Build Your Own: No-code wizard imports any OpenAPI spec and wraps it with ContextWeaver's 5 security layers automatically.
How it works: 9 Terraform modules abstract each cloud service into a common interface. The app code never references cloud-specific APIs.
Azure (Current): AKS, Cosmos DB, AI Search, Azure OpenAI, Key Vault, ACR – fully wired, running in production.
AWS (Planned): EKS, DynamoDB, OpenSearch, Bedrock, Secrets Manager, ECR – same Terraform modules, different providers.
GCP (Planned): GKE, Firestore, Vertex AI Search, Vertex AI, Secret Manager, Artifact Registry.
Local/On-Prem: Docker Compose, SQLite, Ollama, file-based vault – for air-gapped deployments and development.
Key command: cd environments/aws && terraform apply – swaps the entire cloud backend. The app, security, plugins, and user data remain untouched.
Security stays constant: Identity injection, credential vault, policy enforcement, and audit logging work identically regardless of which cloud runs underneath.