Tag: qwen

4 articles

llm dgx-spark atlas multi-model inference qwen moe open-source

Atlas Engine: Sub-2-Minute Cold Start for Multi-Model Orchestration on DGX Spark

May 10, 2026 · 7 min read

Cold-start 3 specialised LLMs on a single DGX Spark in under 2 minutes, with 100+ tok/s throughput. Covers production orchestration patterns.

atlas dgx-spark multi-model llm inference qwen

Qwen3.6-35B-A3B: What the Numbers Actually Show

April 18, 2026 · 8 min read

Alibaba released Qwen3.6-35B-A3B on 16 April 2026, the first open-weight model in the Qwen3.6 series. The benchmarks show real gains in agentic coding, but the architecture is unchanged from Qwen3.5 and the red flags warrant scrutiny.

qwen moe llm open-source agentic coding alibaba

Qwen3.5-35B-A3B: Production Deployment on GB10 Grace Blackwell

March 1, 2026 · 4 min read

Deploy Qwen's latest agentic coding model with vLLM on NVIDIA DGX Spark. Complete configuration for tool calling, extended context, and optimal performance on the GB10 Grace Blackwell Superchip.

qwen vllm llm self-hosted docker nvidia gb10 agentic-ai

Self-Hosted LLM Inference: A Complete vLLM Setup Guide

February 25, 2026 · 8 min read

A practical guide to deploying production-ready LLM inference using vLLM on NVIDIA DGX Spark hardware, covering configuration, troubleshooting, and performance optimisation.

vllm llm self-hosted docker nvidia inference qwen