Tag: vllm

4 articles

llm inference qwen ai agentic-ai vllm sglang machine-learning

vLLM vs SGLang: Choosing an LLM Inference Framework in 2026

April 13, 2026 · 7 min read

A technical comparison of vLLM and SGLang, the two leading open-source LLM inference engines, covering architecture, performance, and when to pick each one.

vllm sglang llm inference machine-learning gpu serving

Cloud Native AI: ML Infrastructure on Kubernetes

April 6, 2026 · 2 min read

The fastest-growing CNCF category: ML serving, vector databases, and the open AI stack running on Kubernetes.

cncf ai vllm milvus qdrant vector-database llm-serving

Qwen3.5-35B-A3B: Production Deployment on GB10 Grace Blackwell

March 1, 2026 · 4 min read

Deploy Qwen's latest agentic coding model with vLLM on NVIDIA DGX Spark. Complete configuration for tool calling, extended context, and optimal performance on the GB10 Grace Blackwell Superchip.

qwen vllm llm self-hosted docker nvidia gb10 agentic-ai

Self-Hosted LLM Inference: A Complete vLLM Setup Guide

February 25, 2026 · 8 min read

A practical guide to deploying production-ready LLM inference using vLLM on NVIDIA DGX Spark hardware, covering configuration, troubleshooting, and performance optimization.

vllm llm self-hosted docker nvidia inference qwen