
Higress: Alibaba's Cloud-Native AI Gateway Built on Istio and Envoy

Higress is a cloud-native AI gateway from Alibaba supporting multi-model LLM proxy, token rate limiting, AI caching, and MCP server hosting.


As AI applications move from prototypes to production, the infrastructure layer for managing LLM API traffic has become critical. Organizations need to route requests to the right model, control costs with token-level rate limiting, cache responses intelligently, and monitor usage across teams and applications. Higress addresses all of these needs as a cloud-native AI gateway built on the battle-tested Istio and Envoy foundations.

Developed by Alibaba, Higress extends the traditional API gateway concept with native AI capabilities. It understands LLM request semantics – tokens, models, streaming responses, and prompt structures – enabling intelligent traffic management that goes far beyond what generic API gateways can provide.

The gateway’s Istio-based architecture means it integrates seamlessly with Kubernetes environments, supporting service mesh deployment patterns, declarative configuration, and GitOps workflows. For organizations already using Istio, Higress slots into the existing infrastructure without architectural changes.


What AI-Specific Features Does Higress Offer?

Higress’s AI features are what set it apart from traditional API gateways, providing capabilities specifically designed for LLM-based applications.

```mermaid
graph TD
    A[Client Applications] --> B[Higress AI Gateway]
    B --> C[Multi-Model LLM Proxy]
    B --> D[Token Rate Limiting]
    B --> E[Semantic AI Cache]
    B --> F[MCP Server Hosting]
    B --> G[Prompt Management]
    C --> H[OpenAI API]
    C --> I[Anthropic API]
    C --> J[Self-Hosted Models]
    C --> K[Model Fallback Chain]
    E --> L[Semantic Cache Store]
    F --> M[MCP Tools]
```
| AI Feature | Purpose | Benefit |
|---|---|---|
| Multi-Model LLM Proxy | Route API calls to different models | Vendor flexibility, failover |
| Token-Based Rate Limiting | Control API spend per key | Cost governance |
| Semantic AI Cache | Cache similar prompts automatically | Reduce costs by 40-60% |
| MCP Server Hosting | Host tools via Model Context Protocol | Unified tool access |
| Prompt Engineering | Templates and transformation | Consistent prompts |
| AI Observability | Token counts, latency, costs | Usage visibility |
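The idea behind token-based (rather than request-based) rate limiting can be sketched as a token bucket keyed by API key, where the budget is spent in LLM tokens. The following Python is an illustrative sketch of the concept, not Higress's implementation:

```python
import time


class TokenRateLimiter:
    """Token-bucket limiter that counts LLM tokens (not requests) per API key.

    Conceptual sketch only; Higress enforces this at the gateway layer.
    """

    def __init__(self, tokens_per_minute):
        self.capacity = tokens_per_minute
        self.refill_rate = tokens_per_minute / 60.0  # tokens regained per second
        self.buckets = {}  # api_key -> (available_tokens, last_refill_timestamp)

    def allow(self, api_key, requested_tokens, now=None):
        now = time.monotonic() if now is None else now
        available, last = self.buckets.get(api_key, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at bucket capacity.
        available = min(self.capacity, available + (now - last) * self.refill_rate)
        if requested_tokens <= available:
            self.buckets[api_key] = (available - requested_tokens, now)
            return True
        self.buckets[api_key] = (available, now)
        return False
```

Each API key gets an independent bucket, so one team exhausting its budget does not starve another, and the budget refills continuously rather than resetting at fixed window boundaries.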

The semantic caching feature is particularly valuable for production deployments. When users ask similar questions, the gateway can return cached responses – not just identical ones, but semantically similar ones – dramatically reducing API costs.
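The mechanism can be sketched in a few lines of Python: embed the incoming prompt, compare it against cached entries, and return a stored response when similarity clears a threshold. This is a conceptual sketch with toy vectors, not Higress's implementation, which relies on a real embedding model and vector store:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


class SemanticCache:
    """Return a cached LLM response when a new prompt's embedding is
    semantically close to a previously answered one."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, embedding):
        best_response, best_sim = None, 0.0
        for cached_emb, response in self.entries:
            sim = cosine(embedding, cached_emb)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

The threshold is the key tuning knob: too low and users receive stale answers to genuinely different questions, too high and the cache degrades into exact-match lookup.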


How Does Higress Compare to Other API Gateways?

The API gateway landscape includes many options, but Higress’s AI-native design gives it distinct advantages for LLM workloads.

| Feature | Higress | Kong | APISIX | Envoy (Standalone) | AWS API Gateway |
|---|---|---|---|---|---|
| AI Multi-Model Proxy | Native | Plugin | Plugin | Manual config | Limited |
| Token Rate Limiting | Built-in | Custom | Custom | Custom | No |
| Semantic Caching | Built-in | No | No | No | No |
| MCP Server | Native | No | No | No | No |
| Istio Integration | Native | Plugin | Plugin | Native | N/A |
| Kubernetes CRDs | Yes | Yes (KIC) | Yes | Yes | No |
| Open Source | Full | Partial | Full | Full | No |

For teams building AI applications on Kubernetes, Higress offers the most complete out-of-the-box feature set for LLM API management, reducing the need to cobble together multiple plugins or custom middleware.


What Traditional API Gateway Features Does Higress Support?

Beyond its AI capabilities, Higress is a fully featured enterprise API gateway suitable for all service-to-service communication.

| Feature Category | Capabilities |
|---|---|
| Traffic Management | Load balancing, circuit breaking, retries, timeouts, rate limiting |
| Security | JWT validation, OAuth2/OIDC, HMAC, basic auth, WAF integration |
| Observability | Prometheus metrics, access logging, tracing (OpenTelemetry), dashboards |
| Protocol Support | HTTP/1.1, HTTP/2, gRPC, WebSocket, Dubbo |
| Deployment | Canary, blue-green, A/B testing, weighted routing |
| Performance | Sub-millisecond proxy latency, hot reload of configuration |

These standard gateway features combined with AI-specific capabilities make Higress a unified ingress solution that can handle both traditional microservices and AI workloads through a single control plane.
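Circuit breaking, one of the traffic-management features listed above, deserves a brief illustration: after repeated upstream failures the gateway stops forwarding requests for a cooldown period, then probes the backend again. A minimal Python sketch of the pattern follows (Higress enforces this inside the Envoy data plane, not in application code):

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive failures,
    reject calls for `reset_timeout` seconds, then allow a trial request."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def allow_request(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        # Half-open: permit a trial request once the cooldown elapses.
        return now - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now
```

Rejecting fast while the circuit is open protects both the failing backend (which gets breathing room to recover) and the client (which avoids queuing behind doomed requests).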


FAQ

What is Higress? Higress is a cloud-native AI gateway developed by Alibaba, built on Istio and Envoy. It provides enterprise-grade API management with native AI features including multi-model LLM proxy, token-based rate limiting, semantic caching for AI responses, MCP server hosting, and AI-specific observability.

What AI-specific features does Higress offer? Higress offers AI-specific features including: multi-model LLM proxy (route requests to different models), token-based rate limiting (cost control per API key), semantic AI caching (cache and reuse LLM responses), MCP server hosting (expose tools via Model Context Protocol), prompt engineering (prompt templates and transformation), and AI-specific metrics and logging.
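The failover aspect of the multi-model proxy can be sketched as an ordered chain of backends, where the next provider is tried when one fails. This Python sketch illustrates the pattern only; the backend callables are hypothetical stand-ins for real provider clients:

```python
class FallbackLLMProxy:
    """Try each LLM backend in order and return the first successful
    response. Conceptual sketch of a model fallback chain."""

    def __init__(self, backends):
        self.backends = backends  # list of (name, callable) pairs, in priority order

    def complete(self, prompt):
        errors = {}
        for name, call in self.backends:
            try:
                return name, call(prompt)
            except Exception as exc:  # in practice: timeouts, 429s, 5xx errors
                errors[name] = str(exc)
        raise RuntimeError(f"all backends failed: {errors}")
```

A gateway-level chain like this means individual applications do not need their own retry and failover logic for each provider.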

Can Higress be used without AI features? Yes, Higress is a fully functional cloud-native API gateway for traditional workloads as well. It supports standard API gateway features including routing, load balancing, circuit breaking, authentication (OAuth2, JWT, OIDC), rate limiting, TLS termination, canary deployments, and gRPC proxy. The AI features are optional add-ons.

How do you get started with Higress? Higress can be deployed via Helm on Kubernetes: `helm repo add higress.io https://higress.io/helm-charts`, followed by `helm install higress -n higress-system higress.io/higress --create-namespace`. For local testing, Docker Compose is also supported. Configuration is done through Kubernetes CRDs or a web-based console.

What enterprises use Higress in production? Higress is used by numerous enterprises within and outside of Alibaba’s ecosystem. It handles production traffic for Alibaba Cloud, Taobao, and various enterprise customers. The gateway has been battle-tested at Alibaba’s scale, processing billions of API calls daily across thousands of services.

