Cloud Native AI: ML Infrastructure on Kubernetes
The Cloud Native AI (CNAI) category is the fastest-growing part of the CNCF landscape (2.0x trend multiplier). AI workloads need the same infrastructure patterns cloud native was built for — containerization, orchestration, auto-scaling, and observability. As organizations deploy LLMs and vector databases at scale, they're reaching for Kubernetes-native tools.
ML Serving
vLLM — 74K stars
The de facto standard for serving large language models. Provides high-throughput inference with PagedAttention for memory-efficient KV-cache management, continuous batching, speculative decoding, and an OpenAI-compatible API. If you're serving LLMs in production, vLLM is almost certainly in your stack.
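Continuous batching is the scheduling trick behind much of vLLM's throughput: instead of waiting for an entire batch to finish, a request joins the running batch the moment a slot frees up. Below is a minimal pure-Python sketch of that idea under toy assumptions (each request is just an id plus a token budget); it is an illustration of the scheduling pattern, not vLLM's actual scheduler.

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy continuous-batching scheduler.

    `requests` is a list of (id, tokens_to_generate) pairs. Each loop
    iteration is one decode step in which every running request emits
    one token; finished requests free their slot immediately, so
    waiting requests are admitted mid-flight rather than at batch
    boundaries (the "continuous" part).
    """
    waiting = deque(requests)
    running = {}      # id -> tokens still to generate
    completed = []
    steps = 0
    while waiting or running:
        # Admit new requests into any free slots before the next step.
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        # One decode step across the whole running batch.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                completed.append(rid)
                del running[rid]  # slot is free for the next admit phase
        steps += 1
    return completed, steps
```

With static batching, a short request admitted alongside a long one would hold its slot idle until the whole batch drained; here it exits early and a waiting request takes its place on the very next step.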
Vector Databases
The backbone of RAG (Retrieval-Augmented Generation) architectures.
| Project | Stars | Status | Best For |
|---|---|---|---|
| Milvus | 43K | Incubating | Billion-scale RAG, hybrid search |
| Qdrant | 29K | — | Filtered semantic search |
| Chroma | 27K | — | Prototyping, developer-friendly |
Milvus is the most popular purpose-built vector database in the CNCF landscape. It supports multiple index types (IVF_FLAT, HNSW, DiskANN), hybrid dense+sparse search, and multi-tenancy, and it scales to billions of vectors across distributed storage. The default choice for production RAG at scale.
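Hybrid dense+sparse search needs a way to merge two ranked result lists into one; Milvus exposes Reciprocal Rank Fusion (RRF) as one of its rerankers for this. The sketch below is the generic RRF formula in plain Python, not Milvus's implementation, and the `k=60` smoothing constant is just the commonly used default.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked id lists into one.

    Each document's fused score is the sum of 1 / (k + rank + 1) over
    every ranking it appears in, so documents ranked highly by both
    the dense and the sparse retriever rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Fusing a dense ranking `["a", "b", "c"]` with a sparse ranking `["b", "c", "a"]` promotes `b`, which both retrievers rank near the top.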
Qdrant uses a Rust-based engine for fast, filtered vector search. Excels at combining dense similarity with precise metadata filtering — "documents similar to X that also match filters Y and Z." Strong choice for real-time applications that need both semantic search and structured queries.
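The "similar to X, matching filters Y and Z" pattern can be sketched in a few lines of pure Python: apply the metadata predicate, then rank the survivors by cosine similarity. This brute-force version is only an illustration of the query semantics; Qdrant evaluates the filter during index traversal rather than post-filtering, which is what keeps filtered search fast at scale.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(query_vec, points, must, top_k=3):
    """Brute-force filtered vector search.

    Keep only points whose payload matches every key/value in `must`,
    then rank the remainder by cosine similarity to the query vector.
    Point shape (hypothetical): {"id", "vector", "payload"}.
    """
    candidates = [
        p for p in points
        if all(p["payload"].get(k) == v for k, v in must.items())
    ]
    candidates.sort(key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return [p["id"] for p in candidates[:top_k]]
```

Note that post-filtering like this can starve the result set when the filter is selective (the top-k nearest neighbors may all fail the predicate), which is exactly the problem filter-aware index traversal solves.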
Chroma is the vector database built for developers. It offers a Pythonic API, in-memory and persistent storage, automatic embedding generation, and built-in metadata filtering. The go-to for prototyping RAG applications and local development.
The Broader CNAI Ecosystem
- Distributed Training: Ray, DeepSpeed, Kubeflow Training
- Model Observability: Langfuse, EvidentlyAI, Arize Phoenix
- Data Architecture: LakeFS, DVC, Great Expectations
- AutoML: AutoGluon, FLAML, Optuna
When to Use What
- Serving LLMs in production? → vLLM
- RAG at scale (billions of vectors)? → Milvus
- RAG with metadata filtering? → Qdrant
- Prototyping RAG locally? → Chroma
- MLOps on Kubernetes? → Kubeflow + Ray
- Model monitoring? → Langfuse or Phoenix
Part of the CNCF Cloud Native Landscape.