Kong: From Nginx-Based API Gateway to Unified AI Traffic Control Plane

Few open-source projects survive a decade and a pivot, and fewer still collect 43,000 GitHub stars and become a reference architecture for API and AI traffic management along the way. Kong, first committed in November 2014 and now at version 3.14+, is one of them. Built on OpenResty (Nginx plus LuaJIT), it started as a reverse proxy for REST APIs and has since absorbed AI workloads, the Model Context Protocol (MCP), and Agent-to-Agent (A2A) communication into a single control plane.

This article is a technical deep dive into Kong's architecture, its plugin system, its AI Gateway capabilities, and where it fits in the 2026 API gateway landscape.

Architecture: OpenResty Under the Hood

Kong is not a rewrite. It is an Nginx distribution running LuaJIT-compiled plugins inside the request lifecycle. Every request flows through a fixed set of phases — rewrite, access, balancer, header_filter, body_filter, log — and Kong inserts plugin execution at precisely these points.

Request Processing

When Kong receives a request, the router identifies the matching Route and Service. Plugins are loaded via a three-tier iterator: global plugins execute for every request (typically used in certificate and rewrite), a collecting iterator runs during access to gather Route- and Service-scoped plugins, and a collected iterator replays that gathered set during the response phases.
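
The three-tier iteration described above can be sketched in a few lines of Python. This is a simplified model rather than Kong's implementation: the `Plugin`, `RequestContext`, and `run_phase` names are hypothetical, and plugin scoping is reduced to "global" or "attached to one route".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Plugin:
    name: str
    phases: tuple                 # handler phases this plugin implements
    route: Optional[str] = None   # None marks a global plugin (hypothetical model)


@dataclass
class RequestContext:
    route: str                                     # route chosen by the router
    collected: list = field(default_factory=list)  # filled during access
    trace: list = field(default_factory=list)      # (phase, plugin) pairs in run order


def run_phase(phase: str, plugins: list, ctx: RequestContext) -> None:
    if phase in ("certificate", "rewrite"):
        # Tier 1: only global plugins run in the early phases.
        candidates = [p for p in plugins if p.route is None]
    elif phase == "access":
        # Tier 2: the collecting pass gathers every plugin that applies
        # to the matched route and remembers that set on the context.
        ctx.collected = [p for p in plugins if p.route in (None, ctx.route)]
        candidates = ctx.collected
    else:
        # Tier 3: response phases replay the set gathered during access.
        candidates = ctx.collected
    for plugin in candidates:
        if phase in plugin.phases:
            ctx.trace.append((phase, plugin.name))
```

Running `rewrite`, `access`, `header_filter`, and `log` in order against one context shows that a plugin attached to a different route never enters the collected set, so it is skipped in the response phases without being re-evaluated.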

The router itself uses a rebuild counter in shared Nginx memory — when configuration changes (via Admin API, declarative config, or hybrid mode sync), the router is invalidated and rebuilt lazily on the next request. This avoids restarts but means there is a brief window where stale routes may be served.
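
A minimal sketch of the lazy-rebuild idea, assuming a version counter that every worker can read. The `LazyRouter` class and its longest-prefix matching are illustrative stand-ins, not Kong's router.

```python
class LazyRouter:
    """Rebuilds its route table only when the shared version counter changes."""

    def __init__(self, load_routes):
        self._load_routes = load_routes   # callable returning {path_prefix: service}
        self._built_version = None
        self._routes = {}

    def match(self, path, current_version):
        # Compare the shared counter with the version this table was built from.
        if current_version != self._built_version:
            self._routes = dict(self._load_routes())
            self._built_version = current_version
        # Longest-prefix match (a simplification of real route matching).
        best = None
        for prefix, service in self._routes.items():
            if path.startswith(prefix) and (best is None or len(prefix) > len(best[0])):
                best = (prefix, service)
        return best[1] if best else None
```

Until the counter is bumped, `match` keeps answering from the old table, which is the stale-route window mentioned above. The rebuild cost is paid by the first request that observes the new version.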

Hybrid Mode (Control Plane / Data Plane)

Since version 2.0, Kong has supported a hybrid deployment model:

  • Control Plane: Runs with a PostgreSQL database (Cassandra support was removed in Kong 3.4), exposes the Admin API, and pushes configuration to data planes.
  • Data Plane: Operates in DB-less mode, receives configuration over WebSocket, and handles traffic only.

The Control Plane serialises the entire gateway configuration via declarative.export_config(), gzip-compresses it, calculates a hash for change detection, and pushes it to every connected Data Plane. Data planes send periodic pings with their config hash; if the hash mismatches, the Control Plane re-pushes. This is essentially a CDN-style config propagation model with sub-second convergence on push.
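
The export, compress, hash, and re-push cycle can be sketched as follows. The function names are hypothetical, and SHA-256 over canonical JSON is an assumption made for the example; the article does not say which hash or serialisation Kong uses.

```python
import gzip
import hashlib
import json


def build_payload(config: dict):
    """Serialise the config deterministically, hash it, and gzip it."""
    raw = json.dumps(config, sort_keys=True, separators=(",", ":")).encode("utf-8")
    config_hash = hashlib.sha256(raw).hexdigest()   # assumed hash function
    return config_hash, gzip.compress(raw)


def handle_ping(reported_hash: str, current_hash: str, payload: bytes):
    """Control plane side: re-push only when the data plane's hash is out of date."""
    return payload if reported_hash != current_hash else None


def apply_payload(payload: bytes) -> dict:
    """Data plane side: decompress and load the pushed configuration."""
    return json.loads(gzip.decompress(payload))
```

Sorting keys before hashing matters: two exports of the same configuration must produce the same hash, otherwise every ping would trigger a redundant push.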

Kong 3.10+ also supports RPC-based Sync V2 over JSON-RPC, replacing the legacy WebSocket loop with bidirectional capability negotiation.

Plugin System: Kong's Moat

If architecture is Kong's foundation, plugins are its moat. Kong Gateway ships with over 100 official plugins in the Plugin Hub, and the Plugin Development Kit (PDK) makes writing custom plugins straightforward.

Plugin Phases

| Phase | Purpose | Plugin Execution |
| --- | --- | --- |
| init_worker | Per-worker startup | Configuration reload |
| certificate | SSL cert resolution | Global plugins only |
| rewrite | Pre-routing transforms | Global plugins only (the route is not yet known) |
| access | Auth, routing, enrichment | All applicable plugins |
| balancer | Upstream selection | None (internal) |
| header_filter | Response header modification | Collected plugins |
| body_filter | Response body modification | Collected plugins |
| log | Logging and metrics | Collected plugins |

Plugins define a PRIORITY integer — higher values execute first. Kong also supports dynamic plugin ordering via ordering.before and ordering.after fields, so you can force rate limiting to run before authentication even if the numeric priorities disagree.
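
As a rough illustration of how the two ordering rules interact, the sketch below sorts plugins by priority and then applies "run before" overrides. The `execution_order` function, the plugin names, and the priority values are illustrative only, and the override handling is a simplification of what `ordering.before` expresses.

```python
def execution_order(priorities: dict, run_before: dict = None) -> list:
    """Return plugin names in execution order.

    priorities: {plugin_name: PRIORITY}; higher values run first.
    run_before: {plugin_name: [names it must precede]}, a stand-in for
                an ordering.before override.
    """
    # Default order: descending priority, name as a deterministic tie-break.
    order = sorted(priorities, key=lambda name: (-priorities[name], name))
    for name, targets in (run_before or {}).items():
        present = [t for t in targets if t in order]
        if name not in order or not present:
            continue
        if order.index(name) > min(order.index(t) for t in present):
            # Move the plugin to sit just ahead of its earliest target.
            order.remove(name)
            order.insert(min(order.index(t) for t in present), name)
    return order
```

With priorities `{"cors": 2000, "key-auth": 1250, "rate-limiting": 910}`, the default order puts authentication ahead of rate limiting. Adding `{"rate-limiting": ["key-auth"]}` as an override moves rate limiting ahead of authentication without touching either priority.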

Plugin Development Kit (PDK)

The PDK is exposed through the global kong variable and provides forward-compatible APIs:

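-- Example: custom rate-limiting plugin (handler.lua)
-- The PDK is reached through the global `kong` variable; no require is needed.
-- Sketch of a fixed-window counter. conf.limit and conf.window (seconds) are
-- assumed to be declared in the plugin's schema.lua.

local MyPlugin = {
  PRIORITY = 1000,
  VERSION = "1.0.0",
}

-- Assumption: this shared dict is declared in Kong's default Nginx template.
local counters = ngx.shared.kong_rate_limiting_counters

function MyPlugin:access(conf)
  local key = "my-plugin:" .. kong.client.get_ip()
  -- Atomic increment; a missing key is created at 0 with a TTL of one window.
  local count, err = counters:incr(key, 1, 0, conf.window)
  if not count then
    kong.log.err("rate-limit counter failed: ", err)
    return -- fail open rather than block traffic
  end
  if count > conf.limit then
    return kong.response.exit(429, { message = "Rate limit exceeded" })
  end
end

return MyPlugin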

The PDK modules cover request inspection (kong.request), response manipulation (kong.response), upstream modification (kong.service.request), logging (kong.log), and client metadata (kong.client). Every PDK function is phase-gated — calling kong.response.get_status() in the access phase raises an error.
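
Phase gating is easy to picture as a decorator that checks the current phase before letting a call through. The sketch below models the idea in Python. The `phases` decorator, `PhaseError`, and the module-level phase variable are hypothetical; only the rule that `get_status` is unavailable in `access` comes from the text above.

```python
import functools


class PhaseError(RuntimeError):
    """Raised when a function is called outside its allowed phases."""


_state = {"phase": None}   # stand-in for the per-request phase tracked by the gateway


def set_phase(name: str) -> None:
    _state["phase"] = name


def phases(*allowed):
    """Restrict a function to the listed request phases."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            current = _state["phase"]
            if current not in allowed:
                raise PhaseError(
                    f"{fn.__name__} is not available in the {current!r} phase"
                )
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@phases("header_filter", "body_filter", "log")
def get_status(response: dict) -> int:
    # A response status only exists once the upstream has answered.
    return response["status"]
```

Failing loudly at call time turns a subtle bug (reading a status that does not exist yet) into an immediate, descriptive error during plugin development.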

Multi-Language Plugins

Beyond Lua, Kong supports external plugins via plugin servers: separate processes that Kong talks to over Unix sockets using a protobuf-based RPC protocol. Plugins can be written in Go, Python, or JavaScript. For performance-sensitive paths, Kong 3.x also supports Wasm plugins via the Wasm runtime.

The AI Gateway Transformation

Kong's most significant evolution began in 2024 with the AI Gateway — a set of plugins and capabilities built on top of the core gateway. This is not a separate product; it is a plugin suite that runs on every Kong Gateway instance.

Universal LLM API

The AI Proxy plugin (and its Advanced variant) normalises requests to any LLM provider behind a single API:

# decK declarative config
plugins:
  - name: ai-proxy
    config:
      route_type: llm/v1/chat
      model:
        provider: openai
        name: gpt-4o
      auth:
        header_name: Authorization
        header_value: Bearer ${{ env "DECK_OPENAI_API_KEY" }}

Supported providers include OpenAI, Anthropic, GCP Gemini, AWS Bedrock, Azure AI, Databricks, Mistral, Hugging Face, xAI/Grok, Aliyun/Qwen, Cerebras, and Ollama for local deployments. Switching providers is a gateway configuration change (provider, model name, and credentials); client code stays the same.
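
To make the normalisation concrete, here is a small sketch of the kind of translation a universal API layer performs: an OpenAI-style chat request reshaped for Anthropic's Messages API, which takes the system prompt as a top-level field and requires `max_tokens`. The `to_anthropic` function and its defaults are hypothetical and say nothing about how the AI Proxy plugin is implemented.

```python
def to_anthropic(openai_request: dict, model: str, default_max_tokens: int = 1024) -> dict:
    """Reshape an OpenAI-style chat request into an Anthropic Messages request body."""
    messages = openai_request["messages"]
    # Anthropic expects system instructions outside the messages array.
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    body = {
        "model": model,   # the gateway, not the client, picks the upstream model
        "max_tokens": openai_request.get("max_tokens", default_max_tokens),
        "messages": [m for m in messages if m["role"] != "system"],
    }
    if system_parts:
        body["system"] = "\n".join(system_parts)
    return body
```

The client keeps sending one request shape; only the gateway-side mapping changes when the upstream provider does.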

Semantic Caching and Routing

Two plugins fundamentally change how LLM traffic behaves:

  • AI Semantic Cache: Caches LLM responses based on semantic similarity, not exact string match. If one user asks "What's the capital of France?" and another asks "Capital of France?", the second request can be served from the cache. The similarity threshold is configurable (default 0.9), and embeddings live in a supported vector store such as Redis or AWS MemoryDB.
  • AI Semantic Router: Routes requests to different models based on prompt semantics. Simple queries go to a fast cheap model; complex reasoning goes to a frontier model — all from a single endpoint.
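
The cache lookup reduces to "embed the prompt, find the nearest stored prompt, compare against a threshold". The sketch below shows that logic with a toy bag-of-words embedding standing in for a real embedding model. `SemanticCache`, `bag_of_words`, and the in-memory list are all hypothetical; a production setup would use a vector store.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Real deployments use a model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embed=bag_of_words, threshold: float = 0.9):
        self.embed = embed
        self.threshold = threshold
        self.entries = []   # (vector, response) pairs; a vector store in practice

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))

    def get(self, prompt: str):
        """Return the cached response of the most similar prompt, or None."""
        vector = self.embed(prompt)
        best_score, best_response = 0.0, None
        for stored, response in self.entries:
            score = cosine(vector, stored)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

With this toy embedding the two example prompts score about 0.77, so they only match when the threshold is lowered. A real embedding model would be expected to place such paraphrases much closer together. The same nearest-neighbour comparison, pointed at a set of route descriptions instead of cached prompts, is the core of semantic routing.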

PII Sanitisation and RAG Injection

Kong AI Gateway 3.10 introduced:

  • AI PII Sanitisation: Detects and redacts 20+ categories of personally identifiable information across 12 languages — including passwords, credit card numbers, API keys, and addresses. Crucially, it can reinsert redacted data in the response (a "de-redact" mode) so end users receive complete information without developers manually coding sanitisation into every app.
  • AI RAG Injector: Automatically queries a vector database on every LLM request, appends relevant context to the prompt, and sends it upstream. This eliminates per-application RAG pipeline coding — the gateway handles embedding generation, vector search, and context injection.
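
The redact-then-restore flow can be sketched with placeholders and a per-request mapping. This example covers only two illustrative patterns (email addresses and card-like digit runs), far fewer than the categories listed above. The `redact` and `restore` functions and the placeholder format are hypothetical.

```python
import re

# Illustrative patterns only; real PII detection needs far more than two regexes.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def redact(text: str):
    """Replace detected values with placeholders and remember the originals."""
    mapping = {}
    for label, pattern in PATTERNS.items():
        def swap(match, label=label):
            placeholder = f"<{label}_{len(mapping) + 1}>"
            mapping[placeholder] = match.group(0)
            return placeholder
        text = pattern.sub(swap, text)
    return text, mapping


def restore(text: str, mapping: dict) -> str:
    """Reinsert the original values into a response that echoes placeholders."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text
```

The upstream model only ever sees the placeholders. Because the mapping stays at the gateway, the response can be made whole again on the way back to the client.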

MCP Traffic Gateway

Kong's MCP support, introduced in version 3.12, is built around the AI MCP Proxy plugin, which operates in four modes:

| Mode | Behaviour |
| --- | --- |
| passthrough-listener | Proxies MCP requests to upstream MCP servers |
| conversion-listener | Converts REST APIs → MCP tools + accepts MCP requests |
| conversion-only | Converts REST APIs → MCP tools (no request handling) |
| listener | Aggregates tools from multiple conversion plugins |

The plugin reads OpenAPI schemas and dynamically generates MCP tool definitions, with no additional code. ACLs can be applied at the tool level, so an agent authenticated as developer may invoke deploy_service while an agent authenticated as viewer cannot, even though both connect to the same MCP server.
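
A rough sketch of the conversion and the tool-level ACL, assuming each OpenAPI operation becomes one MCP tool with a `name`, a `description`, and a JSON Schema `inputSchema`. The `openapi_to_tools` and `visible_tools` functions and the ACL shape are hypothetical, not the plugin's configuration format.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def openapi_to_tools(spec: dict) -> list:
    """Turn each OpenAPI operation into an MCP-style tool definition."""
    tools = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue   # skip non-operation keys such as "parameters"
            params = op.get("parameters", [])
            fallback = f"{method}_{path.strip('/').replace('/', '_')}"
            tools.append({
                "name": op.get("operationId") or fallback,
                "description": op.get("summary", ""),
                "inputSchema": {
                    "type": "object",
                    "properties": {p["name"]: p.get("schema", {}) for p in params},
                    "required": [p["name"] for p in params if p.get("required")],
                },
            })
    return tools


def visible_tools(tools: list, acl: dict, group: str) -> list:
    """Keep only the tools the caller's group is allowed to invoke."""
    return [t for t in tools if group in acl.get(t["name"], ())]
```

With an ACL of `{"deploy_service": {"developer"}, "list_services": {"developer", "viewer"}}`, a viewer is offered only `list_services`, while a developer sees both tools from the same generated set.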

Kong 3.12 also added native OAuth 2.1 support for MCP, aligning with the MCP specification's authentication model. The gateway acts as an OAuth Resource Server, delegating token issuance to external Authorisation Servers.

Agent-to-Agent (A2A) Gateway

Announced in April 2026 with AI Gateway 3.14, Kong now supports the A2A protocol for agent-to-agent communication. This extends the same governance model — rate limiting, audit logging, cost tracking, and access control — to inter-agent traffic, making Kong the only gateway that covers LLM, MCP, and A2A traffic in a single control plane.

Kubernetes Integration

Kong runs natively on Kubernetes via the Kong Ingress Controller (KIC), now at v3.5.9. KIC supports both the standard Kubernetes Ingress resource and the Gateway API (GatewayClass, Gateway, HTTPRoute, ReferenceGrant).

helm install kong --namespace kong --create-namespace \
  --repo https://charts.konghq.com ingress

KIC translates Kubernetes resources into Kong configuration automatically — adding a Service with annotations provisions routes, plugins, and upstreams in the gateway without touching the Admin API.

For new deployments, Kong recommends the Kong Operator, which combines KIC and Kong Gateway Operator into a single way to deploy, manage, and configure Kong products on Kubernetes.

Kong in the 2026 Gateway Landscape

| Gateway | Core Tech | Plugins | Best For |
| --- | --- | --- | --- |
| Kong | OpenResty (Nginx + LuaJIT) | 100+ (Lua, Go, Python, JS, Wasm) | API management + AI + MCP |
| Envoy | C++ | Wasm / C++ filters | Service mesh data plane |
| APISIX | OpenResty + etcd | 80+ (Lua, Go, Java, Python, Wasm) | High-throughput API routing |
| Traefik | Go | 20+ middleware | Simple Kubernetes ingress |

Kong's performance is competitive but not class-leading in raw throughput — DB-less mode achieves roughly 25,000 requests/second per core with ~2ms P99 latency. Envoy and APISIX each post higher numbers in synthetic benchmarks. In practice, backend latency (tens to hundreds of milliseconds) dominates, and Kong's plugin ecosystem and AI/MCP capabilities offset the performance gap for most deployments.

Where Kong Excels

  • API management at scale: Developer portal, API versioning, monetisation, service catalog — features no standalone proxy provides.
  • Unified AI infrastructure: One gateway for REST APIs, LLM routing, MCP servers, and agent traffic, with consistent auth and observability policies across all four.
  • Plugin customisation: Writing a Lua plugin takes minutes. The PDK is well-documented and phase-gated to prevent foot-guns.

Where It Falls Short

  • Enterprise features are paywalled: RBAC, OIDC, advanced rate limiting, Kong Manager UI — all require Kong Enterprise or Konnect subscription. The OSS edition is powerful but has clear ceilings.
  • Lua dependency: Core plugins are Lua. While external plugin servers support Go, Python, and JS, the latency overhead of out-of-process execution means performance-sensitive plugins should be Lua.
  • Memory footprint: Idles at ~80MB in DB-less mode, roughly double what Traefik or raw Nginx consume.

Deploying Kong in Production

Kong supports four deployment modes:

  1. Traditional: One Kong node with direct database access (PostgreSQL). Simplest setup for small deployments.
  2. Hybrid: Control Plane + Data Plane separation. Recommended for production: CP manages config, DP handles traffic in DB-less mode.
  3. DB-less: Single node with declarative YAML/JSON config. No database required. The file is loaded at startup; later changes are applied by reloading Kong or by posting a new configuration to the Admin API's /config endpoint.
  4. Konnect: Managed SaaS control plane with self-hosted or cloud data planes. Includes analytics, developer portal, and multi-cloud management.

For Kubernetes, the standard path is:

# Gateway API HTTPRoute
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: my-api
spec:
  parentRefs:
    - name: kong
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /api/v1
      backendRefs:
        - name: my-service
          port: 8080

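Apply rate limiting with Kong's bundled rate-limiting plugin, declared as a KongPlugin resource and attached to the route or Service through the konghq.com/plugins annotation: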

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: rate-limit
config:
  minute: 100
  policy: local
plugin: rate-limiting

Should You Run Kong in 2026?

If you need an API gateway and an AI gateway, and your infrastructure runs on Kubernetes or needs CP/DP separation, Kong is the most mature option with the deepest ecosystem. The plugin hub and PDK mean you can extend it to fit almost any traffic management pattern without writing infrastructure from scratch.

If you need maximum throughput on simple routing, APISIX or raw Envoy will outperform it on price-performance. If you need a service mesh sidecar, Envoy (via Istio) is the industry standard. But if you need a single control plane for APIs, LLMs, MCP servers, and agent traffic — Kong has no real equivalent.

The 145 releases, 340 contributors, and decade of production deployments speak to its staying power. The pivot from API proxy to AI control plane was not a rebranding exercise: the phase-based architecture, plugin system, and hybrid deployment model were general enough to carry the expansion without a rewrite.