graphwiz.ai

DeepSeek V4: 1.6T Parameters, FP4 Precision, and the Huawei NPU Question

DeepSeek V4 ships two open-weight MoE models — a 1.6T Pro and a 284B Flash — with novel sparse attention, FP4 quantisation, 1M token context, and validated Huawei Ascend NPU support. Here's what actually changed.

deepseek moe llm open-source huawei npu inference fp4

Qwen3.6-35B-A3B: What the Numbers Actually Show

Alibaba released Qwen3.6-35B-A3B on 16 April 2026, the first open-weight model in the Qwen3.6 series. The benchmarks show real gains in agentic coding, but the architecture is unchanged from Qwen3.5 and there are red flags that warrant scrutiny.

qwen moe llm open-source agentic coding alibaba

CoreCoder: Claude Code's Architecture in 950 Lines of Python

How CoreCoder reverse-engineered Anthropic's Claude Code, reducing 512K lines to a minimal 950-line implementation that reveals the essential architecture of modern AI coding agents.

claude-code ai-agents corecoder reverse-engineering llm coding-agent python

Arcee AI Trinity-Large-Thinking: The $20M Open Model Chasing Claude

A 26-person startup spent $20M training a 400B MoE model on 2,048 B300 GPUs — and produced the strongest open reasoning model outside China. Trinity-Large-Thinking ranks #1 on τ²-Airline at 1/28th the cost of Claude Opus 4.6.

arcee-ai trinity moe open-source apache-2 llm agentic-ai reasoning

Gemma 4: Google DeepMind's Most Intelligent Open Models

Gemma 4 brings frontier-level multimodal intelligence to open-source — with models ranging from 2B to 31B parameters, MoE efficiency, and native audio support for edge devices.

gemma google-deepmind llm open-source moe multimodal edge-ai apache-2

Prompting Techniques for Agentic AI

A practical guide to engineering prompts for autonomous AI systems that plan, act, and iterate toward goals.

ai prompting agentic-systems llm autonomous-agents

Qwen3.5-35B-A3B: Production Deployment on GB10 Grace Blackwell

Deploy Qwen's latest agentic coding model with vLLM on NVIDIA DGX Spark. Complete configuration for tool calling, extended context, and optimal performance on the GB10 Grace Blackwell Superchip.

qwen vllm llm self-hosted docker nvidia gb10 agentic-ai

Self-Hosted LLM Inference: A Complete vLLM Setup Guide

A practical guide to deploying production-ready LLM inference using vLLM on NVIDIA DGX Spark hardware, covering configuration, troubleshooting, and performance optimization.

vllm llm self-hosted docker nvidia inference qwen

LLM Prompt Engineering: Best Practices for Production Systems

Comprehensive guide to prompt engineering techniques that work reliably in production environments, including chain-of-thought, few-shot learning, and output formatting strategies.

prompt-engineering llm production best-practices