Kong: From Nginx-Based API Gateway to Unified AI Traffic Control Plane
There are few open-source projects that survive a decade and a pivot, and fewer still that accumulate 43,000 GitHub stars on the way to becoming the reference architecture for API and AI traffic management. Kong, first committed in November 2014 and now at version 3.14+, is one of them. Built on OpenResty (Nginx plus LuaJIT), it started as a reverse proxy for REST APIs and has since absorbed AI workloads, the Model Context Protocol (MCP), and Agent-to-Agent (A2A) communication into a single control plane.
This article is a technical deep dive into Kong's architecture, its plugin system, its AI Gateway capabilities, and where it fits in the 2026 API gateway landscape.
Architecture: OpenResty Under the Hood
Kong is not a from-scratch proxy. It is an application running on OpenResty, the Nginx distribution bundled with LuaJIT, and it executes Lua plugins inside Nginx's request lifecycle. Every request flows through a fixed set of phases (rewrite, access, balancer, header_filter, body_filter, log), and Kong inserts plugin execution at precisely these points.
Request Processing
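When Kong receives a request, the phases that run before routing (certificate and rewrite) can execute only global plugins, because no Route or Service has been matched yet. The router then identifies the matching Route and Service at the start of the access phase. Plugins are loaded via a three-tier iterator: a global iterator serves the pre-routing phases, a collecting iterator runs during access to gather every plugin that applies to the matched Route, Service, and Consumer, and a collected iterator replays that gathered set during the response phases (header_filter, body_filter, log).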
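The three iterators can be modelled in a few lines of Python. This is a simplified sketch, not Kong's plugins_iterator code: PluginConfig, EXAMPLE_PLUGINS, and the illustrative priority values are hypothetical; the precedence rule (Route+Service beats Route, which beats Service, which beats global) is simplified; consumer scoping is omitted; and every plugin is assumed to implement every phase.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

PRE_ROUTING_PHASES = ("certificate", "rewrite")
RESPONSE_PHASES = ("header_filter", "body_filter", "log")


@dataclass(frozen=True)
class PluginConfig:
    name: str
    priority: int
    route: Optional[str] = None    # None: not bound to a specific Route
    service: Optional[str] = None  # None: not bound to a specific Service

    def is_global(self) -> bool:
        return self.route is None and self.service is None

    def specificity(self) -> int:
        # Simplified precedence: Route+Service > Route > Service > global.
        return (2 if self.route else 0) + (1 if self.service else 0)


def by_priority(plugins):
    return sorted(plugins, key=lambda p: p.priority, reverse=True)


def global_iterator(plugins):
    """Pre-routing phases: nothing is matched yet, so only global plugins qualify."""
    return by_priority(p for p in plugins if p.is_global())


def collecting_iterator(plugins, route, service):
    """Access phase: keep the most specific instance of each plugin that applies."""
    chosen = {}
    for p in plugins:
        if p.route in (None, route) and p.service in (None, service):
            best = chosen.get(p.name)
            if best is None or p.specificity() > best.specificity():
                chosen[p.name] = p
    return by_priority(chosen.values())


def run_request(plugins, route, service) -> List[Tuple[str, str]]:
    """Return the (phase, plugin name) execution trace for one request."""
    trace = []
    for phase in PRE_ROUTING_PHASES:
        trace += [(phase, p.name) for p in global_iterator(plugins)]
    collected = collecting_iterator(plugins, route, service)  # built once per request
    trace += [("access", p.name) for p in collected]
    for phase in RESPONSE_PHASES:  # collected iterator: replay, no re-matching
        trace += [(phase, p.name) for p in collected]
    return trace


# Hypothetical configuration; priorities are illustrative.
EXAMPLE_PLUGINS = [
    PluginConfig("correlation-id", 100000),
    PluginConfig("rate-limiting", 910),                    # global default
    PluginConfig("rate-limiting", 910, route="checkout"),  # route-level override
    PluginConfig("key-auth", 1250, service="orders"),
]
```

For a request matched to route checkout on service orders, the rewrite phase sees only the two global plugins, while access and the response phases run correlation-id, key-auth, and the route-scoped rate-limiting instance in priority order.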
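The router itself is versioned through shared Nginx memory: when configuration changes (via the Admin API, declarative config, or hybrid mode sync), the router version changes and each worker rebuilds its router. With worker_consistency = strict, the next request waits for the rebuild; with worker_consistency = eventual, the rebuild runs in the background and requests keep using the previous router until it finishes. Either way no restart is needed, but eventual mode leaves a brief window in which stale routes may be served.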
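A minimal Python model of version-based invalidation follows. SharedDict, Worker, and the "router:version" key are hypothetical names for this sketch, and routes are reduced to path prefixes, whereas Kong matches far richer attributes. It models two rebuild policies, strict (rebuild on the request that notices the change) and eventual (rebuild from a background timer), a choice Kong exposes through its worker_consistency setting.

```python
class SharedDict:
    """Stand-in for an Nginx lua_shared_dict that every worker can read."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def incr(self, key, step=1):
        self._data[key] = self._data.get(key, 0) + step
        return self._data[key]


class Worker:
    def __init__(self, shm, load_routes, consistency="strict"):
        self.shm = shm
        self.load_routes = load_routes  # callable returning the current path prefixes
        self.consistency = consistency  # "strict" or "eventual"
        self.prefixes = None
        self.version = None

    def _rebuild(self):
        self.version = self.shm.get("router:version", 0)
        # Longest prefix first, so /api/v2 wins over /api.
        self.prefixes = sorted(self.load_routes(), key=len, reverse=True)

    def find_route(self, path):
        if self.prefixes is None:
            self._rebuild()  # the very first request always builds synchronously
        elif self.consistency == "strict" and self.shm.get("router:version", 0) != self.version:
            self._rebuild()  # strict: this request waits for the new router
        # eventual: keep serving the current (possibly stale) router
        return next((p for p in self.prefixes if path.startswith(p)), None)

    def background_tick(self):
        """What a periodic timer would do in eventual mode."""
        if self.shm.get("router:version", 0) != self.version:
            self._rebuild()
```

After a new /api/v2 prefix is added and the version is bumped, a strict worker matches it on the next request, while an eventual worker keeps returning /api until its next background_tick.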
Hybrid Mode (Control Plane / Data Plane)
Since version 2.0, Kong has supported a hybrid deployment model:
- Control Plane: Runs with a PostgreSQL database (Cassandra support was removed in Kong 3.0), exposes the Admin API, and pushes configuration to data planes.
- Data Plane: Operates in DB-less mode, receives configuration over WebSocket, and handles traffic only.
The Control Plane serialises the entire gateway configuration via declarative.export_config(), gzip-compresses it, calculates a hash for change detection, and pushes it to every connected Data Plane. Data planes send periodic pings carrying their current config hash; if that hash does not match the Control Plane's, the Control Plane pushes the configuration again. The model is push-first, with the ping-and-compare loop acting as a self-healing fallback for any data plane that missed an update.
Kong 3.10+ also supports RPC-based Sync V2, which exchanges JSON-RPC messages between control and data planes, sends incremental configuration changes instead of the full export, and negotiates capabilities in both directions.
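The hash-based reconciliation described above can be sketched as a ping/compare loop. This models only the fallback path, not the proactive push on change; the class names, the JSON serialisation, and the use of SHA-256 are assumptions for illustration, not Kong's wire format.

```python
import gzip
import hashlib
import json


def export_config(entities: dict) -> bytes:
    # Deterministic serialisation: equal configurations must hash equally.
    return json.dumps(entities, sort_keys=True, separators=(",", ":")).encode()


class ControlPlane:
    def __init__(self):
        self.config_hash = ""
        self.payload = b""

    def update(self, entities: dict) -> None:
        raw = export_config(entities)
        self.config_hash = hashlib.sha256(raw).hexdigest()  # hash choice is an assumption
        self.payload = gzip.compress(raw)

    def on_ping(self, dp_hash: str):
        """Reply to a data plane ping; re-send the config only when hashes differ."""
        if dp_hash == self.config_hash:
            return None
        return self.config_hash, self.payload


class DataPlane:
    def __init__(self):
        self.config_hash = ""
        self.config = {}

    def ping(self, cp: ControlPlane) -> bool:
        """Return True if a new configuration was received and applied."""
        update = cp.on_ping(self.config_hash)
        if update is None:
            return False
        self.config_hash, blob = update
        self.config = json.loads(gzip.decompress(blob))
        return True
```

Because the data plane reports the hash of what it actually applied, a missed or failed push is corrected on the next ping without any extra bookkeeping on the control plane.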
Plugin System: Kong's Moat
If architecture is Kong's foundation, plugins are its moat. Kong's Plugin Hub lists over 100 plugins (bundled, Enterprise, and partner-built), and the Plugin Development Kit (PDK) makes writing custom plugins straightforward.
Plugin Phases
| Phase | Purpose | Plugin Execution |
|---|---|---|
| init_worker | Per-worker startup | All loaded plugins |
| certificate | SSL certificate resolution | Global plugins only |
| rewrite | Pre-routing transforms | Global plugins only (no Route or Service matched yet) |
| access | Auth, rate limiting, request enrichment | All applicable plugins (collected here) |
| balancer | Upstream selection | None (internal) |
| header_filter | Response header modification | Collected plugins |
| body_filter | Response body modification | Collected plugins |
| log | Logging and metrics | Collected plugins |
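Plugins define a PRIORITY integer, and higher values execute first. Kong Gateway Enterprise adds dynamic plugin ordering through the ordering.before and ordering.after fields of a plugin instance (keyed by phase, for example ordering.before.access), so you can force rate limiting to run before authentication even though the bundled key-auth plugin (PRIORITY 1250) outranks rate-limiting (PRIORITY 910).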
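One way to reason about combining numeric priorities with explicit before/after constraints is a topological sort that uses PRIORITY to break ties. The sketch below uses Kahn's algorithm with a max-heap; Kong's own resolution logic may differ, and execution_order and PRIORITIES are hypothetical names (request-transformer's priority is illustrative).

```python
import heapq


def execution_order(priorities, before=None, after=None):
    """Order plugins: explicit constraints win, PRIORITY breaks ties.

    priorities: {plugin_name: PRIORITY}
    before:     {plugin_name: [plugins it must run before]}
    after:      {plugin_name: [plugins it must run after]}
    """
    indegree = {name: 0 for name in priorities}
    edges = {name: [] for name in priorities}
    for first, later_list in (before or {}).items():
        for later in later_list:
            edges[first].append(later)
            indegree[later] += 1
    for later, earlier_list in (after or {}).items():
        for earlier in earlier_list:
            edges[earlier].append(later)
            indegree[later] += 1

    # Max-heap on PRIORITY (negated for heapq); the name is a deterministic tie-breaker.
    ready = [(-priorities[n], n) for n, deg in indegree.items() if deg == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, name = heapq.heappop(ready)
        order.append(name)
        for nxt in edges[name]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                heapq.heappush(ready, (-priorities[nxt], nxt))

    if len(order) != len(priorities):
        raise ValueError("ordering constraints contain a cycle")
    return order


PRIORITIES = {"key-auth": 1250, "rate-limiting": 910, "request-transformer": 801}
```

Without constraints the order follows PRIORITY (key-auth, rate-limiting, request-transformer); adding before={"rate-limiting": ["key-auth"]} moves rate limiting to the front while the unconstrained plugin keeps its priority-based slot.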
Plugin Development Kit (PDK)
The PDK is exposed through the global kong variable and provides forward-compatible APIs:
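```lua
-- Example: custom rate-limiting plugin (handler.lua), a minimal fixed-window sketch.
-- conf.limit and conf.window are assumed fields declared in this plugin's schema.lua.
-- `kong` and `ngx` are globals provided by the gateway, so no require is needed.
local MyPlugin = {
  PRIORITY = 1000, -- runs before the bundled rate-limiting plugin (910)
  VERSION = "1.0.0",
}

function MyPlugin:access(conf)
  local now = ngx.time()
  local window_start = now - (now % conf.window)
  local key = "my-plugin:" .. kong.client.get_ip() .. ":" .. window_start
  -- Per-node counter in a shared dict from Kong's Nginx template; incr() creates
  -- the key at 0 with a TTL of conf.window seconds when it does not exist yet.
  local count, err = ngx.shared.kong_rate_limiting_counters:incr(key, 1, 0, conf.window)
  if not count then
    kong.log.err("rate limit counter failed: ", err)
    return -- fail open rather than block traffic
  end
  if count > conf.limit then
    return kong.response.exit(429, { message = "Rate limit exceeded" })
  end
end

return MyPlugin
```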
The PDK modules cover request inspection (kong.request), response manipulation (kong.response), upstream modification (kong.service.request), logging (kong.log), and client metadata (kong.client). PDK functions are phase-gated: calling kong.response.get_status() in the access phase, before any upstream response exists, raises an error.
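Phase gating is easy to picture as a decorator that checks the current phase before running a function. This Python sketch is only an analogy for the Lua PDK's behaviour: current_phase, phase_checked, get_status, and run_in_phase are hypothetical, and the allowed-phase list merely approximates kong.response.get_status.

```python
import contextvars
from functools import wraps

current_phase = contextvars.ContextVar("current_phase", default=None)


class PhaseError(RuntimeError):
    pass


def phase_checked(*allowed_phases):
    """Decorator: refuse to run outside the listed request phases."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            phase = current_phase.get()
            if phase not in allowed_phases:
                raise PhaseError(
                    f"{fn.__name__} cannot be called in {phase} phase "
                    f"(only in: {', '.join(allowed_phases)})"
                )
            return fn(*args, **kwargs)
        return wrapper
    return decorate


# There is no upstream status until response headers arrive.
@phase_checked("header_filter", "response", "body_filter", "log")
def get_status(ctx):
    return ctx["status"]


def run_in_phase(phase, fn, *args):
    """Run fn as if the request were in the given phase, then restore the old phase."""
    token = current_phase.set(phase)
    try:
        return fn(*args)
    finally:
        current_phase.reset(token)
```

Calling run_in_phase("log", get_status, ctx) returns the status, while the same call in "access" raises PhaseError, turning a subtle ordering bug into an immediate, descriptive failure.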
Multi-Language Plugins
Beyond Lua, Kong supports external plugins through plugin servers, configured with the pluginserver_names setting in kong.conf. Plugins can be written in Go, Python, or JavaScript and run as separate processes that talk to Kong over Unix domain sockets using a MessagePack- or Protocol Buffers-based RPC protocol. For performance-sensitive paths, Kong 3.x can also run Proxy-Wasm filters in its embedded WebAssembly runtime.
The AI Gateway Transformation
Kong's most significant evolution began in 2024 with the AI Gateway, a set of plugins and capabilities built on top of the core gateway. It is not a separate product: the AI plugins load into the same gateway runtime as every other plugin, although several of them, such as AI Proxy Advanced, require an Enterprise license.
Universal LLM API
The AI Proxy plugin (and its Advanced variant) normalises requests to any LLM provider behind a single API:
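```yaml
# decK declarative config
_format_version: "3.0"
plugins:
- name: ai-proxy
  config:
    route_type: llm/v1/chat
    model:
      provider: openai
      name: gpt-4o
    auth:
      header_name: Authorization
      # decK only substitutes environment variables prefixed with DECK_, using this syntax
      header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}
```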
Supported providers include OpenAI, Anthropic, GCP Gemini, AWS Bedrock, Azure AI, Databricks, Mistral, Hugging Face, xAI/Grok, Aliyun/Qwen, Cerebras, and Ollama for local deployments. Switching providers is a gateway configuration change (provider, model name, and credentials) rather than a client code change: applications keep calling the same route with the same request format.
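To see what "normalising" involves, here is a sketch of translating an OpenAI-style chat body into the shape of Anthropic's Messages API, which takes the system prompt as a top-level field and requires max_tokens. This is not Kong's translator: openai_chat_to_anthropic is hypothetical, only plain-string message content is handled, and letting the configured model name override the client's model field is an assumption of the sketch.

```python
def openai_chat_to_anthropic(body: dict, model_name: str, default_max_tokens: int = 1024) -> dict:
    """Translate an OpenAI-style chat body into an Anthropic Messages body.

    model_name plays the role of the gateway's configured model.name and
    overrides whatever the client sent. Only plain-string content is handled.
    """
    system_parts = [m["content"] for m in body["messages"] if m["role"] == "system"]
    out = {
        "model": model_name,
        # Anthropic's Messages API requires max_tokens; OpenAI's chat API does not.
        "max_tokens": body.get("max_tokens", default_max_tokens),
        "messages": [
            {"role": m["role"], "content": m["content"]}
            for m in body["messages"]
            if m["role"] in ("user", "assistant")
        ],
    }
    if system_parts:
        # The system prompt becomes a top-level field instead of a message.
        out["system"] = "\n\n".join(system_parts)
    if "temperature" in body:
        out["temperature"] = body["temperature"]
    return out
```

The client keeps sending one request format; only the gateway-side mapping changes when the provider does, which is why the switch lives in configuration rather than in application code.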
Semantic Caching and Routing
Two plugins that fundamentally change how LLM traffic behaves:
- AI Semantic Cache: Caches LLM responses based on semantic similarity, not exact string match. If one user asks "What's the capital of France?" and another asks "Capital of France?", the second request hits the cache, provided the two prompts' embeddings are similar enough to clear the configured similarity threshold (default 0.9). Embeddings are stored in a supported vector store such as Redis or AWS MemoryDB.
- AI Semantic Router: Routes requests to different models based on prompt semantics. Simple queries go to a fast, cheap model; complex reasoning goes to a frontier model, all from a single endpoint.
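Both plugins rest on the same primitive: cosine similarity between embedding vectors. The sketch below uses a toy vocabulary-count embedding in place of a real embedding model, and the model names in ROUTES are hypothetical; SemanticCache and route illustrate the idea rather than reproduce Kong's implementation.

```python
import math
import re

# Toy vocabulary-count embedding; a real deployment calls an embedding model.
VOCAB = ["capital", "france", "paris", "translate", "summarise", "prove", "theorem", "step"]


def embed(text):
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(w)) for w in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # (embedding, cached response)

    def get(self, prompt):
        """Return the closest cached response if it clears the threshold, else None."""
        query = embed(prompt)
        scored = [(cosine(query, vec), resp) for vec, resp in self.entries]
        best = max(scored, key=lambda s: s[0], default=(0.0, None))
        return best[1] if best[0] >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))


# Hypothetical model names, each described by example vocabulary.
ROUTES = {
    "fast-cheap-model": "translate summarise capital france",
    "frontier-model": "prove theorem step",
}


def route(prompt):
    """Pick the model whose description is most similar; ties go to the first entry."""
    query = embed(prompt)
    return max(ROUTES, key=lambda model: cosine(query, embed(ROUTES[model])))
```

With "What's the capital of France?" cached, "Capital of France?" has a cosine similarity of 1.0 under this toy embedding and is served from the cache, while "Prove this theorem step by step" misses the cache and is routed to frontier-model.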
PII Sanitisation and RAG Injection
Kong AI Gateway 3.10 introduced:
- AI PII Sanitisation: Detects and redacts 20+ categories of personally identifiable information across 12 languages — including passwords, credit card numbers, API keys, and addresses. Crucially, it can reinsert redacted data in the response (a "de-redact" mode) so end users receive complete information without developers manually coding sanitisation into every app.
- AI RAG Injector: Automatically queries a vector database on every LLM request, appends relevant context to the prompt, and sends it upstream. This eliminates per-application RAG pipeline coding — the gateway handles embedding generation, vector search, and context injection.
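The redact-then-restore round trip can be sketched with placeholder tokens and a per-request vault. Two regexes stand in for the plugin's much broader detection; PATTERNS, redact, restore, and the <KIND_n> token format are hypothetical choices for this sketch.

```python
import re

# Two regex detectors as stand-ins for far broader PII detection.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(text):
    """Replace PII with placeholder tokens; return the text and a token -> value vault."""
    vault = {}

    def replacer(kind):
        def repl(match):
            token = f"<{kind}_{len(vault)}>"
            vault[token] = match.group(0)
            return token
        return repl

    for kind, pattern in PATTERNS.items():
        text = pattern.sub(replacer(kind), text)
    return text, vault


def restore(text, vault):
    """Put the original values back into the model's response."""
    for token, original in vault.items():
        text = text.replace(token, original)
    return text
```

Only the redacted prompt leaves the gateway; the vault stays local for the lifetime of the request, so placeholders the model echoes back can be swapped for the real values before the response reaches the user.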
MCP Traffic Gateway
Kong's MCP support, introduced in version 3.12 and expanded quickly since, is built around the AI MCP Proxy plugin, which operates in four modes:
| Mode | Behaviour |
|---|---|
| passthrough-listener | Proxies MCP requests to upstream MCP servers |
| conversion-listener | Converts REST APIs → MCP tools and accepts MCP requests |
| conversion-only | Converts REST APIs → MCP tools (no request handling) |
| listener | Aggregates tools from multiple conversion plugins |
The plugin reads OpenAPI schemas and dynamically generates MCP tool definitions — no additional code. ACLs can be applied at the tool level, so an agent authenticated as developer may invoke deploy_service while viewer cannot, even though both tools live on the same MCP server.
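The OpenAPI-to-tool conversion and tool-level ACL can be sketched as follows. The output follows the MCP tool shape (name, description, inputSchema as a JSON Schema object), but the mapping rules, the fallback naming scheme, and TOOL_ACL are assumptions of this sketch; only path and query parameters are mapped and request bodies are ignored.

```python
import re

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}


def openapi_to_mcp_tools(spec: dict) -> list:
    """Derive MCP tool definitions from OpenAPI operations (parameters only)."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters"
            params = op.get("parameters", [])
            fallback = re.sub(r"[^A-Za-z0-9]+", "_", f"{method}{path}").strip("_")
            tools.append({
                "name": op.get("operationId") or fallback,
                "description": op.get("summary", ""),
                "inputSchema": {
                    "type": "object",
                    "properties": {p["name"]: p.get("schema", {"type": "string"}) for p in params},
                    "required": [p["name"] for p in params if p.get("required")],
                },
            })
    return tools


# Hypothetical tool-level ACL: tool name -> consumer groups allowed to call it.
TOOL_ACL = {"list_services": {"developer", "viewer"}, "deploy_service": {"developer"}}


def can_invoke(tool_name: str, consumer_groups) -> bool:
    return bool(TOOL_ACL.get(tool_name, set()) & set(consumer_groups))
```

Because the ACL is keyed by tool name rather than by MCP server, the developer/viewer split described above needs no change to the upstream API: both tools come from the same spec, and the gateway simply refuses deploy_service for viewers.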
Kong 3.12 also added native OAuth 2.1 support for MCP, aligning with the MCP specification's authentication model. The gateway acts as an OAuth Resource Server, delegating token issuance to external Authorisation Servers.
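For a sense of what acting as a Resource Server means in practice, here is a sketch of the checks such a server performs under the MCP authorization model: point unauthenticated clients at OAuth Protected Resource Metadata (RFC 9728), reject expired tokens and tokens minted for another audience, and enforce scopes with RFC 6750 error codes. This is not Kong's implementation; RESOURCE, the tools:call scope, and authorize are hypothetical, and JWT signature verification is assumed to have happened already.

```python
import time

# Hypothetical values for this sketch.
RESOURCE = "https://mcp.example.com"
METADATA_URL = RESOURCE + "/.well-known/oauth-protected-resource"


def challenge(**params):
    """Build a Bearer WWW-Authenticate header that points at the resource metadata."""
    fields = [f'{k}="{v}"' for k, v in params.items()]
    fields.append(f'resource_metadata="{METADATA_URL}"')
    return {"WWW-Authenticate": "Bearer " + ", ".join(fields)}


def authorize(claims, required_scope, now=None):
    """claims: a JWT payload whose signature has ALREADY been verified (not shown)."""
    now = time.time() if now is None else now
    if claims is None:
        return 401, challenge()  # no token: tell the client where to discover the AS
    if claims.get("exp", 0) <= now:
        return 401, challenge(error="invalid_token")
    audiences = claims.get("aud", [])
    if isinstance(audiences, str):
        audiences = [audiences]
    if RESOURCE not in audiences:  # token minted for a different resource
        return 401, challenge(error="invalid_token")
    if required_scope not in claims.get("scope", "").split():
        return 403, challenge(error="insufficient_scope", scope=required_scope)
    return 200, {}
```

Token issuance stays with the external Authorisation Server; the gateway only validates what it receives, which is what lets one policy layer sit in front of many MCP servers.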
Agent-to-Agent (A2A) Gateway
Announced in April 2026 with AI Gateway 3.14, Kong now supports the A2A protocol for agent-to-agent communication. This extends the same governance model (rate limiting, audit logging, cost tracking, and access control) to inter-agent traffic, which Kong positions as making it the only gateway that covers LLM, MCP, and A2A traffic in a single control plane.
Kubernetes Integration
Kong runs natively on Kubernetes via the Kong Ingress Controller (KIC), now at v3.5.9. KIC supports both the standard Kubernetes Ingress resource and the Gateway API (GatewayClass, Gateway, HTTPRoute, ReferenceGrant).
```shell
helm install kong --namespace kong --create-namespace \
  --repo https://charts.konghq.com ingress
```
KIC translates Kubernetes resources into Kong configuration automatically: an Ingress or HTTPRoute becomes Kong Routes, the referenced Kubernetes Service becomes a Kong Service and Upstream, and annotations such as konghq.com/plugins attach plugins, all without touching the Admin API.
For new deployments, Kong recommends the Kong Operator, which combines KIC and Kong Gateway Operator into a single way to deploy, manage, and configure Kong products on Kubernetes.
Kong in the 2026 Gateway Landscape
| Gateway | Core Tech | Plugins | Best For |
|---|---|---|---|
| Kong | OpenResty (Nginx + LuaJIT) | 100+ (Lua, Go, Python, JS, Wasm) | API management + AI + MCP |
| Envoy | C++ | Wasm / C++ filters | Service mesh data plane |
| APISIX | OpenResty + etcd | 80+ (Lua, Go, Java, Python, Wasm) | High-throughput API routing |
| Traefik | Go | 20+ middleware | Simple Kubernetes ingress |
Kong's performance is competitive but not class-leading in raw throughput. Commonly cited figures for DB-less mode are roughly 25,000 requests/second per core with ~2ms P99 latency, though results depend heavily on hardware, enabled plugins, and payload size, and Envoy and APISIX typically post higher numbers in synthetic benchmarks. In practice, backend latency (tens to hundreds of milliseconds) dominates end-to-end response time, and Kong's plugin ecosystem and AI/MCP capabilities offset the performance gap for most deployments.
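The claim that backend latency dominates is simple arithmetic. The sketch below uses the figures quoted in this section (about 25,000 req/s per core and ~2 ms gateway latency) in a deliberately rough additive model: P99 values do not strictly add, and plugin cost is ignored. gateway_share and cores_needed are illustrative helpers, not benchmark results.

```python
import math


def gateway_share(gateway_ms: float, backend_ms: float) -> float:
    """Fraction of end-to-end latency spent in the gateway (simple additive model)."""
    return gateway_ms / (gateway_ms + backend_ms)


def cores_needed(peak_rps: float, rps_per_core: float) -> int:
    """Whole cores required to absorb a peak load at a given per-core throughput."""
    return math.ceil(peak_rps / rps_per_core)


for backend_ms in (20, 100, 300):
    print(f"backend {backend_ms} ms -> gateway share {gateway_share(2, backend_ms):.1%}")
print("cores for 100k rps:", cores_needed(100_000, 25_000))
```

Against a 20 ms backend the gateway accounts for about 9% of the total, against 100 ms about 2%, and against 300 ms under 1%, so a faster proxy mostly matters for very low-latency backends or very high request volumes.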
Where Kong Excels
- API management at scale: Developer portal, API versioning, monetisation, service catalog — features no standalone proxy provides.
- Unified AI infrastructure: One gateway for REST APIs, LLM routing, MCP servers, and agent traffic, with consistent auth and observability policies across all four.
- Plugin customisation: A basic Lua plugin is just a handler module and a schema module, so simple plugins can be written quickly. The PDK is well-documented and phase-gated to prevent foot-guns.
Where It Falls Short
- Enterprise features are paywalled: RBAC, OIDC, advanced rate limiting, and the full-featured Kong Manager UI all require a Kong Enterprise or Konnect subscription (open-source Kong has included a basic Kong Manager since 3.4). The OSS edition is powerful but has clear ceilings.
- Lua dependency: Core plugins are Lua. While external plugin servers support Go, Python, and JS, the latency overhead of out-of-process execution means performance-sensitive plugins should be Lua.
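- Memory footprint: Idles at roughly 80MB in DB-less mode, and usage grows with worker count and configuration size because each Nginx worker runs its own LuaJIT VM with its own router and plugin state. That is well above raw Nginx, which idles at a few megabytes, and roughly double a typical Traefik instance.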
Deploying Kong in Production
Kong supports four deployment modes:
- Traditional: Every Kong node connects directly to the database (PostgreSQL) and serves both the Admin API and traffic. Simplest setup for small deployments.
- Hybrid: Control Plane + Data Plane separation. Recommended for production: CP manages config, DP handles traffic in DB-less mode.
- DB-less: Nodes run from a declarative YAML/JSON config with no database. The config is loaded at startup; to change it, post a new declarative file to the Admin API's /config endpoint or reload Kong, since the file is not watched for changes.
- Konnect: Managed SaaS control plane with self-hosted or cloud data planes. Includes analytics, developer portal, and multi-cloud management.
For Kubernetes, the standard path is:
```yaml
# Gateway API HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-api
  annotations:
    konghq.com/plugins: rate-limit  # attaches the KongPlugin named rate-limit
spec:
  parentRefs:
  - name: kong
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api/v1
    backendRefs:
    - name: my-service
      port: 8080
```
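Rate limiting is configured as a KongPlugin resource for the bundled rate-limiting plugin and attached to a route or Service with the konghq.com/plugins annotation (here, konghq.com/plugins: rate-limit). With policy: local, each Kong node keeps its own counters, so the effective cluster-wide limit scales with the number of gateway replicas: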
```yaml
apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit
plugin: rate-limiting
config:
  minute: 100
  policy: local  # counters are kept per Kong node
```
Should You Run Kong in 2026?
If you need an API gateway and an AI gateway, and your infrastructure runs on Kubernetes or needs CP/DP separation, Kong is the most mature option with the deepest ecosystem. The plugin hub and PDK mean you can extend it to fit almost any traffic management pattern without writing infrastructure from scratch.
If you need maximum throughput on simple routing, APISIX or raw Envoy are likely to offer better price-performance. If you need a service mesh sidecar, Envoy (via Istio) is the industry standard. But if you need a single control plane for APIs, LLMs, MCP servers, and agent traffic, Kong has no real equivalent.
The 145 releases, 340 contributors, and decade of production deployments speak to its staying power. The pivot from API proxy to AI control plane was not a rebranding exercise: the architecture, plugin system, and hybrid deployment model proved general enough to absorb this expansion, with LLM, MCP, and A2A support delivered as plugins rather than as a new product.