
Cloud Native AI: ML Infrastructure on Kubernetes

Tags: devops, ai, cncf, vllm, milvus, qdrant, vector-database, llm-serving


The Cloud Native AI (CNAI) category is the fastest-growing part of the CNCF landscape (2.0x trend multiplier). AI workloads need the same infrastructure patterns cloud native was built for — containerization, orchestration, auto-scaling, and observability. As organizations deploy LLMs and vector databases at scale, they're reaching for Kubernetes-native tools.


ML Serving

vLLM — 74K stars

The de facto standard for serving large language models. Provides high-throughput inference with PagedAttention for memory-efficient KV-cache management, continuous batching, speculative decoding, and an OpenAI-compatible API. If you're serving LLMs in production, vLLM is almost certainly in your stack.
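On Kubernetes, vLLM is typically run as a Deployment that exposes its OpenAI-compatible HTTP port behind a Service. A minimal sketch follows; the image tag, model name, and single-GPU request are assumptions to adapt to your cluster:

```yaml
# Sketch of a vLLM Deployment. Assumptions: the vllm/vllm-openai image,
# a Llama model pulled from Hugging Face, and one NVIDIA GPU per replica.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 1
  selector:
    matchLabels: {app: vllm}
  template:
    metadata:
      labels: {app: vllm}
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:latest
          args: ["--model", "meta-llama/Llama-3.1-8B-Instruct"]
          ports:
            - containerPort: 8000   # OpenAI-compatible API endpoint
          resources:
            limits:
              nvidia.com/gpu: 1
```

Because the server speaks the OpenAI API, existing clients can point at the Service URL instead of api.openai.com without code changes.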


Vector Databases

The backbone of RAG (Retrieval-Augmented Generation) architectures.

Project   Stars   Status       Best For
Milvus    43K     Incubating   Billion-scale RAG, hybrid search
Qdrant    29K     —            Filtered semantic search
Chroma    27K     —            Prototyping, developer-friendly
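All three answer the same core query: given a query embedding, return the stored vectors nearest to it. A toy brute-force version in pure Python makes the operation concrete; production engines replace the linear scan with ANN indexes such as HNSW, and cosine is only one of several supported distance metrics:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, docs, k=2):
    # docs: list of (doc_id, embedding) pairs.
    # Returns the ids of the k most similar documents, best first.
    ranked = sorted(docs, key=lambda d: cosine(query, d[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Tiny 2-D example; real embeddings have hundreds or thousands of dimensions.
docs = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [0.9, 0.1])]
```

The brute-force scan is O(n) per query, which is exactly why billion-scale deployments need the index structures Milvus and Qdrant provide.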

Milvus is the most popular purpose-built vector database in the CNCF landscape. It supports multiple index types (IVF_FLAT, HNSW, DiskANN), hybrid dense+sparse search, and multi-tenancy, and it scales to billions of vectors across distributed storage. The go-to for production RAG at scale.

Qdrant uses a Rust-based engine for fast, filtered vector search. Excels at combining dense similarity with precise metadata filtering — "documents similar to X that also match filters Y and Z." Strong choice for real-time applications that need both semantic search and structured queries.
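The filtered-search pattern can be sketched in a few lines: restrict candidates by metadata first, then rank the survivors by similarity. A pure-Python toy follows; the field names and exact-match filter are illustrative only, and Qdrant's actual filter DSL (must/should/range clauses) is considerably richer:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def filtered_search(query, docs, filters, k=3):
    # docs: list of dicts with "id", "vector", and "meta" keys.
    # Keep only docs whose metadata matches every filter, then rank by similarity.
    candidates = [
        d for d in docs
        if all(d["meta"].get(key) == val for key, val in filters.items())
    ]
    candidates.sort(key=lambda d: cosine(query, d["vector"]), reverse=True)
    return [d["id"] for d in candidates[:k]]

docs = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.8, 0.2], "meta": {"lang": "en"}},
]
```

Filtering before ranking is what makes "documents similar to X that also match Y and Z" cheap: the similarity computation only touches rows that already satisfy the structured predicate.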

Chroma is the vector database built for developers. Pythonic API, in-memory and persistent storage, automatic embedding generation, and built-in metadata filtering. The go-to for prototyping RAG applications and local development.


The Broader CNAI Ecosystem

  • Distributed Training: Ray, DeepSpeed, Kubeflow Training
  • Model Observability: Langfuse, EvidentlyAI, Arize Phoenix
  • Data Architecture: LakeFS, DVC, Great Expectations
  • AutoML: AutoGluon, FLAML, Optuna

When to Use What

  • Serving LLMs in production? → vLLM
  • RAG at scale (billions of vectors)? → Milvus
  • RAG with metadata filtering? → Qdrant
  • Prototyping RAG locally? → Chroma
  • MLOps on Kubernetes? → Kubeflow + Ray
  • Model monitoring? → Langfuse or Phoenix

Part of the CNCF Cloud Native Landscape.