# GPU Inference Serving with vLLM

Notes on running LLM/VLM inference in production on GPUs, specifically using vLLM on Kubernetes (AKS).