Cloud Native AI: ML Infrastructure on Kubernetes
The Cloud Native AI (CNAI) category is the fastest-growing part of the CNCF landscape (2.0x trend multiplier). AI workloads need the same infrastructure patterns cloud native was built for — containerization, orchestration, auto-scaling, and observability. As organizations deploy LLMs and vector databases at scale, they're reaching for Kubernetes-native tools.
ML Serving
vLLM — 74K stars
The de facto standard for serving large language models. Provides high-throughput inference with PagedAttention for memory-efficient KV-cache management, continuous batching, speculative decoding, and an OpenAI-compatible API. If you're serving LLMs in production, vLLM is almost certainly in your stack.
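Continuous batching is the scheduling trick behind much of vLLM's throughput: instead of waiting for an entire batch to finish, a request joins the running batch the moment a slot frees up. Below is a minimal pure-Python sketch of that idea under toy assumptions (each request is just an id plus a token budget); it is an illustration of the scheduling pattern, not vLLM's actual scheduler.

```python
from collections import deque

def continuous_batching(requests, max_batch=4):
    """Toy continuous-batching scheduler.

    `requests` is a list of (id, tokens_to_generate) pairs. Each loop
    iteration is one decode step in which every running request emits
    one token; finished requests free their slot immediately, so
    waiting requests are admitted mid-flight rather than at batch
    boundaries (the "continuous" part).
    """
    waiting = deque(requests)
    running = {}      # id -> tokens still to generate
    completed = []
    steps = 0
    while waiting or running:
        # Admit new requests into any free slots before the next step.
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        # One decode step across the whole running batch.
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                completed.append(rid)
                del running[rid]  # slot is free for the next admit phase
        steps += 1
    return completed, steps
```

With static batching, a short request admitted alongside a long one would hold its slot idle until the whole batch drained; here it exits early and a waiting request takes its place on the very next step.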
Vector Databases
The backbone of RAG (Retrieval-Augmented Generation) architectures.
| Project | Stars | Status | Best For |
|---|---|---|---|
| Milvus | 43K | Incubating | Billion-scale RAG, hybrid search |
| Qdrant | 29K | — | Filtered semantic search |
| Chroma | 27K | — | Prototyping, developer-friendly |
Milvus is the most popular purpose-built vector database in the CNCF landscape. It supports multiple index types (IVF_FLAT, HNSW, DiskANN), hybrid dense+sparse search, and multi-tenancy, and it scales to billions of vectors across distributed storage. The default choice for production RAG at scale.
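Hybrid dense+sparse search needs a way to merge two ranked result lists into one; Milvus exposes Reciprocal Rank Fusion (RRF) as one of its rerankers for this. The sketch below is the generic RRF formula in plain Python, not Milvus's implementation, and the `k=60` smoothing constant is just the commonly used default.

```python
def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: merge several ranked id lists into one.

    Each document's fused score is the sum of 1 / (k + rank + 1) over
    every ranking it appears in, so documents ranked highly by both
    the dense and the sparse retriever rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```

Fusing a dense ranking `["a", "b", "c"]` with a sparse ranking `["b", "c", "a"]` promotes `b`, which both retrievers rank near the top.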
Qdrant uses a Rust-based engine for fast, filtered vector search. Excels at combining dense similarity with precise metadata filtering — "documents similar to X that also match filters Y and Z." Strong choice for real-time applications that need both semantic search and structured queries.
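The "similar to X, matching filters Y and Z" pattern can be sketched in a few lines of pure Python: apply the metadata predicate, then rank the survivors by cosine similarity. This brute-force version is only an illustration of the query semantics; Qdrant evaluates the filter during index traversal rather than post-filtering, which is what keeps filtered search fast at scale.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filtered_search(query_vec, points, must, top_k=3):
    """Brute-force filtered vector search.

    Keep only points whose payload matches every key/value in `must`,
    then rank the remainder by cosine similarity to the query vector.
    Point shape (hypothetical): {"id", "vector", "payload"}.
    """
    candidates = [
        p for p in points
        if all(p["payload"].get(k) == v for k, v in must.items())
    ]
    candidates.sort(key=lambda p: cosine(query_vec, p["vector"]), reverse=True)
    return [p["id"] for p in candidates[:top_k]]
```

Note that post-filtering like this can starve the result set when the filter is selective (the top-k nearest neighbors may all fail the predicate), which is exactly the problem filter-aware index traversal solves.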
Chroma is the vector database built for developers. It offers a Pythonic API, in-memory and persistent storage, automatic embedding generation, and built-in metadata filtering. The go-to for prototyping RAG applications and local development.
The Broader CNAI Ecosystem
- Distributed Training: Ray, DeepSpeed, Kubeflow Training
- Model Observability: Langfuse, EvidentlyAI, Arize Phoenix
- Data Architecture: LakeFS, DVC, Great Expectations
- AutoML: AutoGluon, FLAML, Optuna
When to Use What
- Serving LLMs in production? → vLLM
- RAG at scale (billions of vectors)? → Milvus
- RAG with metadata filtering? → Qdrant
- Prototyping RAG locally? → Chroma
- MLOps on Kubernetes? → Kubeflow + Ray
- Model monitoring? → Langfuse or Phoenix
Part of the CNCF Cloud Native Landscape.