# Enterprise AI Hub – Architecture Pattern
A design pattern for building a centralised, multi-tenant AI platform that serves multiple business use cases and countries from a single deployment. Learned from designing the Motul APAC AI Hub.
## The Problem with Siloed AI
Most enterprises start AI projects the naive way: one project = one AI infrastructure. This leads to:
| Problem | Impact |
|---|---|
| Each project builds its own GPU cluster | 5–10× higher total infrastructure cost |
| Models trained/hosted separately | No shared learning, inconsistent quality |
| Security/governance duplicated per project | Compliance risk, audit complexity |
| New use case takes 6–12 months | Slow to deliver value |
## The Hub-and-Spoke Pattern
One centralised AI Hub provides shared AI capabilities. Business use cases ("spokes") consume them via API – no spoke builds its own AI engine.
The economics: Layers 1–4 (Cloud, GPU Compute, Platform, AI Models) are fixed costs that barely change whether you run 1 or 40 use cases. Only the application layer varies.
| Metric | Siloed | Hub |
|---|---|---|
| Time to deploy new use case | 6–12 months | 6–8 weeks |
| Cost per use case (1 active) | High | High |
| Cost per use case (10 active) | Still high (each has its own infra) | ~10× cheaper |
| Data intelligence | Siloed | Cross-domain (all data flows through one hub) |
| Governance | Duplicated | Centralised |
## The 5-Layer AI Stack
| Layer | Purpose | Components |
|---|---|---|
| Layer 5 – Applications | Business automation & intelligence | 40+ use cases per domain |
| Layer 4 – AI Models | Cognitive & prediction engine | VLM, LLM, Embedding, ML |
| Layer 3 – Platform | Enterprise AI runtime | AKS, pgvector, Redis, Service Bus, API Management |
| Layer 2 – GPU Compute | High-performance AI processing | A100 nodes, vLLM, auto-scaling |
| Layer 1 – Cloud | Reliable & sustainable base | Azure Singapore, 99.9% SLA |
Key insight: Layers 1–4 are built once and shared. Only Layer 5 changes per use case. The incremental cost of a new use case is near-zero once the hub is live.
## Multi-Pool AKS Topology
The hub runs on AKS with three node pools, each with a different operating contract:
| Pool | VM type | Billing | Priority | Purpose |
|---|---|---|---|---|
| gpu-realtime | NC24ads_A100_v4 | On-demand | High | Real-time inference (< 60s SLA) |
| gpu-batch | NC24ads_A100_v4 | Spot (~60% cheaper) | Low | Batch jobs, model training, nightly sync |
| cpu-app | D8s_v3 | On-demand | Medium | Portal, API, pgvector, Redis, email |
Why split? A batch job (e.g., nightly model re-training, 4 hours) on a shared pool starves real-time inference. Kubernetes priority classes + node taints enforce separation automatically.
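As an illustrative sketch of that separation (the pool names come from the table above, but the priority-class and taint names are assumptions, not real cluster configuration), the contract each workload type gets can be captured in a small placement map that job submitters consult:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Placement:
    node_pool: str       # AKS node pool (selected via the pool's node label)
    priority_class: str  # Kubernetes PriorityClass the pod runs under
    toleration: str      # taint key=value the pod must tolerate to land on the pool

# Illustrative mapping: pool names match the table; class/taint names are assumed.
PLACEMENTS = {
    "realtime-inference": Placement("gpu-realtime", "high-priority", "gpu=realtime"),
    "batch-training":     Placement("gpu-batch", "low-priority", "gpu=batch"),
    "app-service":        Placement("cpu-app", "medium-priority", "workload=app"),
}

def place(workload_type: str) -> Placement:
    """Resolve scheduling hints for a workload; raises KeyError for unknown types."""
    return PLACEMENTS[workload_type]
```

Because the batch pool carries a low priority class and its own taint, a four-hour re-training job can never be scheduled onto, or evict, real-time inference pods.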
## Country Identification Before Processing
For a multi-country hub, country must be identified at intake – before the document is processed. Never infer country from document content.
Three mechanisms:
1. **Dedicated email inbox per country**
   - `po-japan@hub.com`, `po-thailand@hub.com`, etc.
   - Intake service knows the country from which mailbox received the email
   - Most common, works for all countries
2. **Fax DID number** (Japan / fax-based)
   - Each fax line has a unique number → maps to a country
   - XDW format is also Japan-specific (double signal)
3. **API endpoint path / header** (EDI / direct integration)
   - `POST /api/v1/submit/JP` or header `X-Country-Code: JP`
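The three mechanisms can be sketched as a single intake resolver. This is a minimal illustration: the mailbox addresses and API route come from the examples above, while the fax DID value and the function shape are assumptions, not real configuration.

```python
import re

# Illustrative lookup tables; the fax DID is a placeholder value.
MAILBOX_TO_COUNTRY = {"po-japan@hub.com": "JP", "po-thailand@hub.com": "TH"}
FAX_DID_TO_COUNTRY = {"+81-3-0000-0001": "JP"}

def resolve_country(channel: str, meta: dict) -> str:
    """Resolve the country tag at intake, before any document processing."""
    if channel == "email":
        return MAILBOX_TO_COUNTRY[meta["mailbox"]]
    if channel == "fax":
        return FAX_DID_TO_COUNTRY[meta["did"]]
    if channel == "api":
        # Prefer the explicit header; fall back to the path segment.
        headers = meta.get("headers", {})
        if "X-Country-Code" in headers:
            return headers["X-Country-Code"]
        m = re.fullmatch(r"/api/v1/submit/([A-Z]{2})", meta.get("path", ""))
        if m:
            return m.group(1)
    raise ValueError(f"cannot resolve country for channel {channel!r}")
```

Note that nothing here looks at the document body: the country tag is derived purely from the intake channel, which is what makes downstream SAP routing deterministic.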
Why this matters:
- SAP routing – each country has its own SAP system. Without a country tag, you don't know which BAPI endpoint to call.
- Per-country KEDA scaling – queue depth per country drives scaling decisions
- Monitoring – auto-processing rate, HITL queue depth per country
- Email confirmation language – determined by country tag, not document language inference
## The Confidence-Gated Automation Pattern
Never post AI results directly to production systems. Use a confidence score to decide routing:
- Confidence ≥ 90% → auto-post to ERP (no human touch)
- Confidence 75–89% → HITL review queue (human verifies flagged fields only)
- Confidence < 75% → full manual processing
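A minimal sketch of this gate, with the thresholds expressed as fractions (the function name and routing labels are illustrative):

```python
def route(confidence: float) -> str:
    """Route an AI extraction result by its confidence score."""
    if confidence >= 0.90:
        return "auto-post"    # straight to ERP, no human touch
    if confidence >= 0.75:
        return "hitl-review"  # human verifies flagged fields only
    return "manual"           # full manual processing
```

Keeping the thresholds in one place like this also makes them easy to tune later, per country or per document type.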
Why this matters: The AI is not always right. But it doesn't need to be right 100% of the time – it needs to be right often enough that humans only review exceptions. Starting at an 85% auto-processing rate and improving through continuous learning is better than 100% manual from day one.
Continuous learning loop: Every HITL correction is stored back in the vector database. Future documents with similar patterns get higher confidence scores automatically.
## Material / Entity Mapping Pattern (Two-Strategy Search)
For mapping unstructured customer text to structured master data (e.g., Japanese product name → SAP material code):
**Strategy A: exact / rule match (fast, O(1))**
- Check the Redis cache for known mappings
- If found: confidence 1.00, return immediately

**Strategy B: semantic vector search (handles new descriptions)**
- BGE-M3 converts the text to an embedding vector
- pgvector finds the top-5 nearest matches by cosine similarity
- The LLM picks the best match and explains why
- Store the result for future Strategy A hits
The system learns: the more Strategy B runs, the more Strategy A hits in the future.
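A minimal sketch of the two-strategy lookup. A plain dict stands in for the Redis cache, and `vector_search` / `llm_pick` are assumed interfaces standing in for pgvector and the LLM, not real APIs:

```python
def map_material(text: str, cache: dict, vector_search, llm_pick):
    """Map free text to a material code: exact cache hit first, semantic search second."""
    # Strategy A: exact match against known mappings (O(1), confidence 1.00)
    if text in cache:
        return cache[text], 1.00
    # Strategy B: semantic search over master data, then LLM adjudication
    candidates = vector_search(text, top_k=5)      # nearest neighbours by cosine similarity
    code, confidence = llm_pick(text, candidates)  # best match + its confidence
    cache[text] = code                             # becomes a Strategy A hit next time
    return code, confidence
```

The last line is the learning loop in miniature: every Strategy B resolution seeds the fast path, so the expensive path runs less and less often on repeat traffic.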
## Cost Amortisation – The Key Economic Argument
The strongest argument for the Hub pattern is infrastructure amortisation:
| Active use cases | Annual hub cost | Cost per use case / year |
|---|---|---|
| 1 | $13,320 | $13,320 |
| 7 | $13,320 | $1,903 |
| 13 | $25,368 | $1,951 |
| 23+ | $29,608 | < $1,287 |
Infrastructure scales sublinearly (add 1 batch node per 2–3 new use cases). Use case count scales linearly. The ROI compounds with every wave.
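The per-use-case column in the table is simply the annual hub cost divided by the number of active use cases; a one-line helper reproduces the figures:

```python
def cost_per_use_case(annual_hub_cost: float, active_use_cases: int) -> float:
    """Amortised annual cost per active use case."""
    return annual_hub_cost / active_use_cases

# Figures from the table above:
cost_per_use_case(13_320, 1)   # $13,320/year
cost_per_use_case(13_320, 7)   # ≈ $1,903/year
cost_per_use_case(25_368, 13)  # ≈ $1,951/year
```

The numerator grows only when new batch nodes are added, while the denominator grows with every use case, which is why the per-use-case cost keeps falling.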
## When to Use This Pattern
Good fit:
- Multiple AI use cases across the same enterprise
- Multi-country deployment with centralised governance requirements
- Mix of real-time (< 60s) and batch workloads
- Open-source models (self-hosted, not Azure OpenAI) for cost control
Not a good fit:
- Single use case with no roadmap for more
- Each use case has completely different data residency requirements (e.g., can't route Japan data through Singapore)
- Teams that need full isolation between use cases for org/security reasons