BrainzBytes — Cloud-Agnostic Architecture

The Customer / Platform Provider Model

Identity and infrastructure are completely separated. Nothing crosses the boundary except a signed JWT.

Customer Identity

Identity Provider

Users authenticate here. This tenant owns nothing except identity.

Entra ID • AWS Cognito • Okta • Keycloak

App Registration / OAuth Client
OIDC authorization code flow
JWT tokens with roles + groups
MFA via Conditional Access (customer-controlled)

🔒 OIDC JWT Token

The ONLY thing that crosses tenants

Platform Provider Infrastructure

All Compute + Data

Every workload, every database, every secret lives here.

K8s Cluster • Container Registry • Vault

LLM / AI endpoints
Vector search + database
Secret vault + managed identity

Users NEVER access BrainzBytes infrastructure directly. Infrastructure NEVER stores user passwords. The JWT is validated on every request — if it's invalid, access is denied at the ingress.
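A minimal sketch of the per-request claim checks, assuming the token signature has already been verified against the identity provider's published keys by a JWT library. The names check_claims and TokenRejected, the issuer URL, and the audience value are hypothetical and do not come from the platform's code.

```python
import time
from typing import Any, List, Mapping, Optional


class TokenRejected(Exception):
    """Raised when a token's claims fail validation (hypothetical name)."""


def check_claims(
    claims: Mapping[str, Any],
    expected_issuer: str,
    expected_audience: str,
    now: Optional[float] = None,
) -> Mapping[str, Any]:
    """Validate issuer, audience, and expiry of already-verified claims.

    Assumption: signature verification happened before this call.
    """
    now = time.time() if now is None else now

    if claims.get("iss") != expected_issuer:
        raise TokenRejected("unexpected issuer")

    # "aud" may be a single string or a list of strings.
    aud = claims.get("aud")
    audiences: List[str]
    if isinstance(aud, str):
        audiences = [aud]
    elif isinstance(aud, (list, tuple)):
        audiences = list(aud)
    else:
        audiences = []
    if expected_audience not in audiences:
        raise TokenRejected("unexpected audience")

    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or exp <= now:
        raise TokenRejected("token expired or missing exp")

    return claims


# Usage with made-up values: a valid token passes, an expired one is rejected.
claims = {"iss": "https://idp.example.com", "aud": "cerebro", "exp": 2000,
          "roles": ["admin"], "groups": ["eng"]}
check_claims(claims, "https://idp.example.com", "cerebro", now=1000)
```

Any failure raises, so the caller can deny the request before it reaches application code.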

🧠 Cerebro — The Constant Application Layer

These two pods are always the same regardless of which cloud you deploy to.

Pod 1

cerebro-app

Flask application • Port 5080

AI Hub — LLM orchestration, RAG pipeline
Designer — Plugin builder, connector config
Agent — Agentic reasoning loop
OIDC — Token validation, session management

2/2 containers ready: app container + istio-proxy sidecar

Pod 2

mcp-engine

FastMCP server • Dashboard :5000 • MCP :5001

33+ MCP Tools from loaded plugins
Plugin Runtime — Hot-reload, scoped access
Dashboard — Tool inspection, health check
Vault Client — Credential resolution

2/2 containers ready: engine container + istio-proxy sidecar

🔃 Istio / service mesh mTLS between all pods

🏗️ Current AKS Production Topology

contextweaver-aks — 2–6 nodes (Standard_D4ds_v5) with Cluster Autoscaler + HPA

cerebro namespace

cerebro-app — Flask backend 2–8 replicas (HPA)
contextweaver-frontend — React SPA 2–5 replicas (HPA)
Valkey — Redis-compatible (sessions, rate limiting)
code-server — Standalone workspace (data source)
ConfigMaps + Secrets (tokens, config)

workspaces namespace

ws-{username} — Per-user code-server 4.115.0
Azure Disk PVC per user (ext4, persistent)
zsh + Oh-My-Zsh + kubectl/helm autocomplete
Copilot CLI, Node.js 20, Claude Code pre-installed
4 cores / 4Gi per workspace pod

jupyterhub namespace

JupyterHub — Multi-user notebook server
jupyter-{username} — Per-user notebook pods
Entra ID authentication
Custom zsh prompts with username

monitoring namespace

Prometheus
Grafana
Loki + Promtail
Alertmanager

ingress-nginx namespace

NGINX Ingress Controller
TLS termination
SSL redirect

Cluster Autoscaler: 2–6 nodes • HPA: cerebro-app 2–8, frontend 2–5 • Valkey: sessions + rate limiting • 20 req/min per-user rate limit
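The 20 req/min per-user limit could be enforced with a fixed-window counter. The sketch below keeps counters in a Python dict as a stand-in for Valkey. The fixed-window algorithm and the class name FixedWindowLimiter are assumptions; the source does not say which algorithm the platform uses.

```python
import time
from typing import Dict, Optional, Tuple

LIMIT_PER_MINUTE = 20  # per-user limit stated in the topology summary
WINDOW_SECONDS = 60


class FixedWindowLimiter:
    """In-memory stand-in for a per-user counter kept in Valkey (assumed design)."""

    def __init__(self, limit: int = LIMIT_PER_MINUTE, window: int = WINDOW_SECONDS):
        self.limit = limit
        self.window = window
        # (user, window index) -> request count.
        # A real store would expire old keys; this dict never does.
        self._counts: Dict[Tuple[str, int], int] = {}

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Count the request and report whether it is within the limit."""
        now = time.time() if now is None else now
        key = (user_id, int(now // self.window))
        count = self._counts.get(key, 0) + 1
        self._counts[key] = count
        return count <= self.limit


limiter = FixedWindowLimiter()
decisions = [limiter.allow("alice", now=0.0) for _ in range(21)]
# The first 20 requests in the window are allowed; the 21st is refused.
```

A fixed window is the simplest option but permits short bursts across a window boundary; a sliding window or token bucket would smooth that out.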

🌐 Loom — The Replaceable Platform Layer

Six swappable backend categories. Application code stays THE SAME — only config changes.

LLM Provider

Azure OpenAI
Amazon Bedrock
Google Vertex AI
Ollama (local)

Vector Search

Azure AI Search
Amazon OpenSearch
Elasticsearch
ChromaDB (local)

Database

Azure Cosmos DB
Amazon DynamoDB
MongoDB
PostgreSQL

Secret Vault

Azure Key Vault
AWS Secrets Manager
HashiCorp Vault
GCP Secret Manager

Auth / SSO

Entra ID
AWS Cognito
Google Identity
Okta / Keycloak

Container Infra

AKS (Azure)
EKS (AWS)
GKE (Google)
Docker Desktop (local)

Application code stays THE SAME. Switch from Azure to AWS by changing config.yaml — zero code modifications required.
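One common way to get config-only switching is a registry that maps a config value to a backend class. The sketch below shows the idea for the LLM category only. The config keys, class names, and the dict standing in for a parsed config.yaml are all hypothetical, and the providers return placeholder strings instead of calling any cloud API.

```python
from typing import Callable, Dict


class LLMProvider:
    """Interface the application codes against (hypothetical)."""

    name = "base"

    def complete(self, prompt: str) -> str:
        raise NotImplementedError


class AzureOpenAIProvider(LLMProvider):
    name = "azure_openai"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"  # placeholder, no network call


class BedrockProvider(LLMProvider):
    name = "bedrock"

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] {prompt}"  # placeholder, no network call


# Config value -> factory. Adding a cloud means adding an entry here.
LLM_REGISTRY: Dict[str, Callable[[], LLMProvider]] = {
    "azure_openai": AzureOpenAIProvider,
    "bedrock": BedrockProvider,
}


def build_llm(config: dict) -> LLMProvider:
    """Pick the provider named in the config (assumed key: llm.provider)."""
    kind = config["llm"]["provider"]
    if kind not in LLM_REGISTRY:
        raise ValueError(f"unknown LLM provider: {kind!r}")
    return LLM_REGISTRY[kind]()


# The calling code is identical for both clouds; only the config differs.
azure = build_llm({"llm": {"provider": "azure_openai"}})
aws = build_llm({"llm": {"provider": "bedrock"}})
```

The same pattern would repeat for the other five categories, each with its own interface and registry.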

Customer Isolation — One Platform, Complete Separation

Multiple customers share the same infrastructure while having complete data and resource isolation.

How It Works

🏢 Customer A — Acme Corp

acme.brainzbytes.com
Namespace: acme
Key Vault: acme-kv
Database: acme-db
Search: acme-search
Managed Identity: acme-mi
🔒 accesses ONLY acme resources

🏭 Customer B — MegaCorp

mega.brainzbytes.com
Namespace: megacorp
Key Vault: mega-kv
Database: mega-db
Search: mega-search
Managed Identity: mega-mi
🛡️ completely separate

🚀 Customer C — StartupXYZ

startup.brainzbytes.com
Namespace: startupxyz
Key Vault: startup-kv
Database: startup-db
Search: startup-search
Managed Identity: startup-mi
✅ same pattern, fully isolated

⚙️ Shared Layer: BrainzBytes Platform — ACR Images • Connector Marketplace • Control Plane

4 Layers of Isolation

🌐 1. Namespace Isolation

K8s NetworkPolicy: Customer A pods can ONLY talk to Customer A services. Istio AuthorizationPolicy enforces pod-to-pod auth.

👤 2. Identity Isolation

Each customer has its own Managed Identity (for example, acme-mi). acme-mi can access ONLY acme-kv, acme-db, and acme-search; it cannot touch mega-kv.

🗄️ 3. Data Isolation

Separate Key Vault (credentials never mix). Separate Database (plugins, state, connectors). Separate Search indexes (RAG data never crosses).

🛡️ 4. Network Isolation

Istio mTLS encrypts all pod-to-pod traffic. No cross-namespace traffic allowed. App Gateway routes by hostname to correct namespace.
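The hostname routing and identity scoping described in these four layers can be modeled in a few lines. This is only an illustrative model using the Acme and MegaCorp values listed earlier. In a real deployment the gateway and the cloud's access-control assignments enforce these rules, and the function names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Tenant:
    host: str
    namespace: str
    identity: str
    resources: FrozenSet[str]


# Values taken from the customer cards; the data structure itself is assumed.
TENANTS: List[Tenant] = [
    Tenant("acme.brainzbytes.com", "acme", "acme-mi",
           frozenset({"acme-kv", "acme-db", "acme-search"})),
    Tenant("mega.brainzbytes.com", "megacorp", "mega-mi",
           frozenset({"mega-kv", "mega-db", "mega-search"})),
]

BY_HOST: Dict[str, Tenant] = {t.host: t for t in TENANTS}
BY_IDENTITY: Dict[str, Tenant] = {t.identity: t for t in TENANTS}


def namespace_for_host(host: str) -> str:
    """Route a request hostname to its tenant namespace."""
    return BY_HOST[host.lower()].namespace


def can_access(identity: str, resource: str) -> bool:
    """Deny by default: an identity reaches only its own tenant's resources."""
    tenant = BY_IDENTITY.get(identity)
    return tenant is not None and resource in tenant.resources


print(namespace_for_host("acme.brainzbytes.com"))  # prints "acme"
print(can_access("acme-mi", "mega-kv"))            # prints "False"
```

The mapping is an explicit table because the namespace is not always derivable from the hostname (mega.brainzbytes.com maps to megacorp).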

What Each Customer Sees

💬 AI Hub

Chat with YOUR connectors only

🧩 My Plugins

Only plugins YOUR org subscribed to

🔑 My Credentials

YOUR vault — YOUR Stripe, Gmail, Jira keys

⚙️ MCP Engines

YOUR engine instances only

🛡️ You never see other customers. They never see you.

Onboarding a New Customer

📝 1. Sign Up

Customer signs up at brainzbytes.com

⚙️ 2. Provision

Control Plane creates isolated namespace + resources (~5 min)

🔑 3. Connect IdP

Connect identity provider (Entra / Okta / Cognito)

🧩 4. Select Plugins

Choose plugins from the Connector Marketplace

🔐 5. Credentials

Configure credentials via My Credentials vault

🎉 6. Done!

Team logs in and starts using AI immediately

Economics

💰 Per-Customer Cost	~$60–75/mo
💵 Revenue (10 users × $99)	~$990/mo
📈 Margin	92%+
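The margin figure follows from the cost and revenue rows. A short worked calculation, using only the numbers above:

```python
def gross_margin(revenue: float, cost: float) -> float:
    """Return gross margin as a fraction of revenue."""
    return (revenue - cost) / revenue


users, price = 10, 99
revenue = users * price  # 990 per month

low = gross_margin(revenue, 75)   # cost at the high end of the range
high = gross_margin(revenue, 60)  # cost at the low end of the range

print(f"{low:.1%} to {high:.1%}")  # prints "92.4% to 93.9%"
```

Even at the top of the cost range the margin stays above 92%, which is where the "92%+" figure comes from.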

🧠 Cerebro — Security Principles

These apply to ALL cloud deployments — non-negotiable.

Workload Identity

Azure Workload Identity / AWS IRSA / GCP Workload Identity Federation — zero secrets in pods. Pods authenticate via platform-native identity, never API keys.

mTLS Between Pods

Istio / service mesh encrypts all pod-to-pod traffic with mutual TLS. No unencrypted internal communication.

HTTPS Ingress + WAF

App Gateway / ALB / Cloud Load Balancer with WAF. All external traffic is TLS-terminated with managed certificates.

Vault-Backed Credentials

Connector credentials are never in environment variables. They're resolved at runtime from the vault with per-request authentication.

Per-User Isolation

Vault cascade: user → group → org. Each user's credentials are isolated. No lateral access.
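A sketch of how a user → group → org cascade might resolve a connector credential, assuming the most specific scope wins. The path layout, the sample secrets, and the function name resolve_credential are hypothetical; a dict stands in for the real vault client.

```python
from typing import List, Mapping, Optional, Sequence

# Hypothetical vault contents keyed by scope path.
VAULT: Mapping[str, str] = {
    "user/alice/jira": "alice-jira-token",
    "group/eng/jira": "eng-jira-token",
    "org/acme/jira": "org-jira-token",
    "org/acme/stripe": "org-stripe-token",
}


def resolve_credential(
    connector: str,
    user: str,
    groups: Sequence[str],
    org: str,
    vault: Mapping[str, str] = VAULT,
) -> Optional[str]:
    """Return the most specific credential: user first, then group, then org."""
    candidates: List[str] = [f"user/{user}/{connector}"]
    candidates += [f"group/{g}/{connector}" for g in groups]
    candidates.append(f"org/{org}/{connector}")
    for path in candidates:
        if path in vault:
            return vault[path]
    return None  # no credential at any scope


# alice has a personal Jira token; bob falls back to his group's token.
print(resolve_credential("jira", "alice", ["eng"], "acme"))  # prints "alice-jira-token"
print(resolve_credential("jira", "bob", ["eng"], "acme"))    # prints "eng-jira-token"
```

The lookup only ever builds paths for the calling user, their groups, and their org, so another user's path is never consulted.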

Two-Tier MFA

Customer login via Entra Conditional Access MFA. Provider admin access requires MFA. Runtime uses Managed Identity (X.509 certs) — stronger than password+MFA.

No Card Data on Platform

Card data never touches the platform. Stripe Checkout handles PCI compliance — we only store Stripe tokens.

RAG Policy Enforcement at org / group / user levels — AI responses are governed by scoped policies ingested from documents, URLs, wikis, and databases.

🧶 Weave — Visual Designer Architecture

An interactive SVG canvas for designing, configuring, and deploying MCP engine topologies — entirely no-code

Frontend — SVG Canvas

Pure SVG canvas rendered in app.js (~2,000 lines)
Drag-and-drop engine nodes, plugin cards, connector slots, and source groups
Inline toggles for plugin enable/disable and security settings
Real-time tooltips on every interactive element for self-documenting UX
No external canvas library — lightweight, zero dependencies

State Model

_vdEngines[] — array of engine objects with plugins, connectors, sources
Each engine holds its own plugin list, connector configurations, and source mappings
Connector credentials reference vault paths — never stored in client state
Sources grouped by scope level (User, Group, Org) within each plugin
Full state serializable to JSON for blueprint save/load
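To make the shape of this state concrete, here is a rough Python rendering of a blueprint that round-trips through JSON. The actual state lives in app.js; every field name below is a guess for illustration, not the real schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


def _empty_scopes() -> Dict[str, List[str]]:
    # Sources grouped by scope level, as described in the state model.
    return {"user": [], "group": [], "org": []}


@dataclass
class Plugin:
    name: str
    enabled: bool = True
    sources: Dict[str, List[str]] = field(default_factory=_empty_scopes)


@dataclass
class Connector:
    name: str
    vault_path: str  # a reference only; the secret never enters the state


@dataclass
class Engine:
    name: str
    plugins: List[Plugin] = field(default_factory=list)
    connectors: List[Connector] = field(default_factory=list)


def to_blueprint(engines: List[Engine]) -> str:
    """Serialize the full designer state to a JSON blueprint."""
    return json.dumps({"engines": [asdict(e) for e in engines]}, sort_keys=True)


engine = Engine("engine-1", [Plugin("jira")], [Connector("jira", "org/acme/jira")])
blueprint = to_blueprint([engine])
restored = json.loads(blueprint)  # plain dicts and lists, ready to save or load
```

Because connectors carry only a vault path, a saved blueprint can be shared or stored without exposing credentials.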

Backend — API Endpoints

/api/visual-designer/save — persist blueprint to Cosmos DB
/api/visual-designer/load — retrieve saved blueprints
/api/visual-designer/deploy — deploy topology as live MCP engines
/api/visual-designer/ingest — trigger chunk → embed → index pipeline
/api/visual-designer/cleanup — remove deployed resources

Storage & Ingest Pipeline

Storage: Cosmos DB cerebro-visual-designs container for blueprints
Ingest: Azure AI Search indexes + Azure OpenAI embeddings
Pipeline: source → chunk → embed (text-embedding-ada-002) → vector index
Hierarchical RAG cascade: User → Group → Org → Plugin → Engine
Proxy: /dash/{slug}/ reverse proxy routes to MCP engine dashboards
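The source → chunk → embed → vector index pipeline can be sketched as follows. The chunk size, the overlap, the ID format, and the function names are assumptions. The embed step is a caller-supplied function so that no real embeddings API is invoked, and a dict stands in for the search index.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap (assumed strategy)."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += step
    return chunks


def ingest(
    text: str,
    embed: Callable[[str], Vector],
    index: Dict[str, Tuple[Vector, str]],
    source_id: str,
) -> int:
    """Chunk the source, embed each chunk, and store vector plus text in the index."""
    chunks = chunk_text(text)
    for n, chunk in enumerate(chunks):
        index[f"{source_id}#{n}"] = (embed(chunk), chunk)
    return len(chunks)


# A toy embed function (chunk length as a 1-d vector) stands in for the model call.
index: Dict[str, Tuple[Vector, str]] = {}
count = ingest("x" * 300, lambda s: [float(len(s))], index, "doc1")
print(count)  # prints "2"
```

Overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.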

🌐 Loom — Cloud Deployments

See the concrete implementation for each cloud provider

☁️ Azure Architecture

AKS • Entra ID • Cosmos DB • Azure OpenAI

Live — Production

📦 AWS Architecture

EKS • Cognito • DynamoDB • Bedrock

Reference Architecture

🌍 GCP Architecture

GKE • Cloud Identity • Firestore • Vertex AI

Reference Architecture

🌐 Loom — Infrastructure as Code

Deploy the entire platform from scratch in ~20 minutes with Terraform

9 Cloud-Agnostic Modules

Kubernetes, database, LLM, search, vault, registry, identity, networking, and DNS — each module provisions the correct cloud-native resource based on a single cloud variable.

Azure Primary

Fully tested Azure environment: AKS, Cosmos DB, Azure OpenAI, AI Search, Key Vault, ACR, and DNS. AWS and GCP modules are ready — swap by changing one directory.

Helm Charts

Two Helm charts (cerebro-app rev 5 and mcp-engine rev 3) deploy Kubernetes workloads. Day-2 operations via helm upgrade / helm rollback. Infrastructure endpoints are injected automatically by Terraform.

Bootstrap Script

One-time bootstrap.sh creates remote state storage and a service principal. Then terraform apply handles everything else — infrastructure + workloads in a single command.


🌐 Loom — SRE Observability

Production-grade monitoring with Prometheus + Grafana + Loki — embedded directly in the platform

Prometheus + Grafana

Full metrics stack deployed alongside the application. Prometheus scrapes all services; Grafana provides real-time visualization via the Monitoring tab — no context-switching to external tools.

Loki 3 Log Aggregation

Loki 3.x + Promtail for centralized log aggregation. Search logs via LogQL in Grafana Explore. Log volume histogram colored by detected_level. 7-day retention.

7 Dashboards

Overview, MCP Engine, RAG, Security, Users, Alerts, and Logs — embedded via iframe with a dashboard switcher in the Monitoring tab.

12 Alerting Rules

Alertmanager rules cover API latency, error rates, MCP tool failures, RAG indexing delays, auth failures, pod restarts, and resource saturation — with severity-based routing.

Zero PII in Metrics

All user identifiers are SHA-256-hashed before emission. No raw emails, names, or IPs appear in Prometheus or Loki, in support of GDPR, SOC 2, and HIPAA audit requirements.
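A minimal sketch of hashing an identifier before it becomes a metric label. The normalization step (trim and lowercase) and the function names are assumptions; the source states only that identifiers are hashed with SHA-256.

```python
import hashlib
from typing import Dict


def hash_identifier(value: str) -> str:
    """Return the SHA-256 hex digest of a normalized identifier."""
    normalized = value.strip().lower()  # assumed normalization
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def metric_labels(email: str, route: str) -> Dict[str, str]:
    """Build labels that carry a stable pseudonym instead of the raw email."""
    return {"user": hash_identifier(email), "route": route}


labels = metric_labels("Alice@Example.com", "/chat")
# labels["user"] is a 64-character hex string; the email itself is not emitted.
```

The same input always yields the same digest, so per-user dashboards still work. Because an unkeyed hash of a guessable value such as an email can be matched by hashing candidate values, a keyed hash (for example HMAC with a secret key) is a common hardening step.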

Cluster View

Horizontal card layout showing all namespaces (default, mcp1, ingress-nginx, monitoring). Expandable pod details with images, digests, resources, and Helm metadata.

Helm-Managed Deployments

Both workloads managed via Helm charts (cerebro-app rev 5, mcp1 rev 3). Day-2 operations via helm upgrade and helm rollback.

Dashboards are embedded in the app via the Monitoring tab; deploy them with deploy-monitoring.sh. Logs are aggregated via Loki 3 + Promtail with 7-day retention.

Security Guarantees — All Clouds

Zero API Keys in Pods • Workload Identity • Istio mTLS • HTTPS + WAF • Vault-Backed Secrets • Per-User Isolation • Two-Tier MFA • PCI via Stripe