Kubernetes: Pod vs Node
One of the most confusing distinctions when starting with Kubernetes. The office-building analogy is what made it click for me.
GPU Inference Serving with vLLM
Notes on running LLM/VLM inference in production on GPUs, specifically using vLLM on Kubernetes (AKS).
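vLLM exposes an OpenAI-compatible HTTP API when run as a server, so clients talk to it like any OpenAI endpoint. A minimal sketch of building a request for that API follows; the endpoint URL and model name are assumptions for illustration, not taken from the post.

```python
import json

# Hypothetical endpoint and model name -- adjust to your own deployment.
VLLM_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "meta-llama/Llama-3.1-8B-Instruct"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build a request body for vLLM's OpenAI-compatible chat endpoint."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # deterministic decoding for reproducible checks
    }

if __name__ == "__main__":
    body = build_chat_request("Why did my pod get OOMKilled?")
    print(json.dumps(body, indent=2))
    # Send with e.g. requests.post(VLLM_URL, json=body)
```

Keeping request construction separate from the network call makes the payload easy to unit-test without a live GPU server.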
AKS Auto-Scaling: A 3-Layer System
How to design auto-scaling for AI inference workloads on Azure Kubernetes Service that delivers SLA compliance and cost efficiency simultaneously. Three independent layers work together, each reacting at a different speed.
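The pod-level layer of such a system is typically the Kubernetes Horizontal Pod Autoscaler, whose documented scaling rule is desired = ceil(currentReplicas × currentMetric / targetMetric). A small sketch of that rule, with replica bounds as illustrative parameters:

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_metric: float,
                         target_metric: float,
                         min_replicas: int = 1,
                         max_replicas: int = 10) -> int:
    """Kubernetes HPA scaling rule:
    desired = ceil(current * currentMetric / targetMetric),
    clamped to the configured [min, max] replica range."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

# GPU utilisation at 90% against a 60% target: scale 4 -> 6 replicas.
print(hpa_desired_replicas(4, 90.0, 60.0))  # -> 6
```

The other layers (e.g. node pool scaling) react more slowly and are not modelled here.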
Enterprise AI Hub: Architecture Pattern
A design pattern for building a centralised, multi-tenant AI platform that serves multiple business use cases and countries from a single deployment. Lessons from designing the Motul APAC AI Hub.
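At its core, a multi-tenant hub resolves each incoming request (country + use case) to tenant-specific configuration while sharing one deployment. A minimal sketch of that routing idea follows; all tenant names, use cases, and model IDs are hypothetical and not taken from the actual Motul APAC AI Hub design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TenantConfig:
    country: str
    use_case: str
    model: str  # which served model this tenant's requests go to

# Hypothetical routing table -- illustrative only.
ROUTES: dict[tuple[str, str], TenantConfig] = {
    ("SG", "support-chat"): TenantConfig("SG", "support-chat", "llm-small"),
    ("JP", "doc-search"): TenantConfig("JP", "doc-search", "llm-large"),
}

def resolve_tenant(country: str, use_case: str) -> TenantConfig:
    """Map a (country, use case) pair to its config within one shared deployment."""
    try:
        return ROUTES[(country, use_case)]
    except KeyError:
        raise ValueError(f"No route for tenant ({country}, {use_case})") from None

print(resolve_tenant("SG", "support-chat").model)  # -> llm-small
```

Centralising the routing table is what lets new countries or use cases onboard without a new deployment.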