
Higress: Alibaba's Cloud-Native AI Gateway Built on Istio and Envoy

Higress is a cloud-native AI gateway from Alibaba supporting multi-model LLM proxy, token rate limiting, AI caching, and MCP server hosting.


As AI applications move from prototypes to production, the infrastructure layer for managing LLM API traffic has become critical. Organizations need to route requests to the right model, control costs with token-level rate limiting, cache responses intelligently, and monitor usage across teams and applications. Higress addresses all of these needs as a cloud-native AI gateway built on the battle-tested Istio and Envoy foundations.

Developed by Alibaba, Higress extends the traditional API gateway concept with native AI capabilities. It understands LLM request semantics – tokens, models, streaming responses, and prompt structures – enabling intelligent traffic management that goes far beyond what generic API gateways can provide.

The gateway’s Istio-based architecture means it integrates seamlessly with Kubernetes environments, supporting service mesh deployment patterns, declarative configuration, and GitOps workflows. For organizations already using Istio, Higress slots into the existing infrastructure without architectural changes.


What AI-Specific Features Does Higress Offer?

Higress’s AI features are what set it apart from traditional API gateways, providing capabilities specifically designed for LLM-based applications.

```mermaid
graph TD
    A[Client Applications] --> B[Higress AI Gateway]
    B --> C[Multi-Model LLM Proxy]
    B --> D[Token Rate Limiting]
    B --> E[Semantic AI Cache]
    B --> F[MCP Server Hosting]
    B --> G[Prompt Management]
    C --> H[OpenAI API]
    C --> I[Anthropic API]
    C --> J[Self-Hosted Models]
    C --> K[Model Fallback Chain]
    E --> L[Semantic Cache Store]
    F --> M[MCP Tools]
```
| AI Feature | Purpose | Benefit |
|---|---|---|
| Multi-Model LLM Proxy | Route API calls to different models | Vendor flexibility, failover |
| Token-Based Rate Limiting | Control API spend per key | Cost governance |
| Semantic AI Cache | Cache similar prompts automatically | Reduce costs by 40-60% |
| MCP Server Hosting | Host tools via Model Context Protocol | Unified tool access |
| Prompt Engineering | Templates and transformation | Consistent prompts |
| AI Observability | Token counts, latency, costs | Usage visibility |
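The idea behind token-based (rather than request-based) rate limiting can be sketched as a token bucket keyed by API key, where the budget is spent in LLM tokens. The following Python is an illustrative sketch of the concept, not Higress's implementation:

```python
import time


class TokenRateLimiter:
    """Token-bucket limiter that counts LLM tokens (not requests) per API key.

    Conceptual sketch only; Higress enforces this at the gateway layer.
    """

    def __init__(self, tokens_per_minute):
        self.capacity = tokens_per_minute
        self.refill_rate = tokens_per_minute / 60.0  # tokens regained per second
        self.buckets = {}  # api_key -> (available_tokens, last_refill_timestamp)

    def allow(self, api_key, requested_tokens, now=None):
        now = time.monotonic() if now is None else now
        available, last = self.buckets.get(api_key, (self.capacity, now))
        # Refill proportionally to elapsed time, capped at bucket capacity.
        available = min(self.capacity, available + (now - last) * self.refill_rate)
        if requested_tokens <= available:
            self.buckets[api_key] = (available - requested_tokens, now)
            return True
        self.buckets[api_key] = (available, now)
        return False
```

Each API key gets an independent bucket, so one team exhausting its budget does not starve another, and the budget refills continuously rather than resetting at fixed window boundaries.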

The semantic caching feature is particularly valuable for production deployments. When users ask similar questions, the gateway can return cached responses – not just identical ones, but semantically similar ones – dramatically reducing API costs.
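The mechanism can be sketched in a few lines of Python: embed the incoming prompt, compare it against cached entries, and return a stored response when similarity clears a threshold. This is a conceptual sketch with toy vectors, not Higress's implementation, which relies on a real embedding model and vector store:

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


class SemanticCache:
    """Return a cached LLM response when a new prompt's embedding is
    semantically close to a previously answered one."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, embedding):
        best_response, best_sim = None, 0.0
        for cached_emb, response in self.entries:
            sim = cosine(embedding, cached_emb)
            if sim > best_sim:
                best_response, best_sim = response, sim
        return best_response if best_sim >= self.threshold else None

    def put(self, embedding, response):
        self.entries.append((embedding, response))
```

The threshold is the key tuning knob: too low and users receive stale answers to genuinely different questions, too high and the cache degrades into exact-match lookup.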


How Does Higress Compare to Other API Gateways?

The API gateway landscape includes many options, but Higress’s AI-native design gives it distinct advantages for LLM workloads.

| Feature | Higress | Kong | APISIX | Envoy (Standalone) | AWS API Gateway |
|---|---|---|---|---|---|
| AI Multi-Model Proxy | Native | Plugin | Plugin | Manual config | Limited |
| Token Rate Limiting | Built-in | Custom | Custom | Custom | No |
| Semantic Caching | Built-in | No | No | No | No |
| MCP Server | Native | No | No | No | No |
| Istio Integration | Native | Plugin | Plugin | Native | N/A |
| Kubernetes CRDs | Yes | Yes (KIC) | Yes | Yes | No |
| Open Source | Full | Partial | Full | Full | No |

For teams building AI applications on Kubernetes, Higress offers the most complete out-of-the-box feature set for LLM API management, reducing the need to cobble together multiple plugins or custom middleware.


What Traditional API Gateway Features Does Higress Support?

Beyond its AI capabilities, Higress is a fully featured enterprise API gateway suitable for all service-to-service communication.

| Feature Category | Capabilities |
|---|---|
| Traffic Management | Load balancing, circuit breaking, retries, timeouts, rate limiting |
| Security | JWT validation, OAuth2/OIDC, HMAC, basic auth, WAF integration |
| Observability | Prometheus metrics, access logging, tracing (OpenTelemetry), dashboards |
| Protocol Support | HTTP/1.1, HTTP/2, gRPC, WebSocket, Dubbo |
| Deployment | Canary, blue-green, A/B testing, weighted routing |
| Performance | Sub-millisecond proxy latency, hot reload of configuration |

These standard gateway features combined with AI-specific capabilities make Higress a unified ingress solution that can handle both traditional microservices and AI workloads through a single control plane.
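Circuit breaking, one of the traffic-management features listed above, deserves a brief illustration: after repeated upstream failures the gateway stops forwarding requests for a cooldown period, then probes the backend again. A minimal Python sketch of the pattern follows (Higress enforces this inside the Envoy data plane, not in application code):

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: after `max_failures` consecutive failures,
    reject calls for `reset_timeout` seconds, then allow a trial request."""

    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed (healthy)

    def allow_request(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        # Half-open: permit a trial request once the cooldown elapses.
        return now - self.opened_at >= self.reset_timeout

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self, now=None):
        now = time.monotonic() if now is None else now
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = now
```

Rejecting fast while the circuit is open protects both the failing backend (which gets breathing room to recover) and the client (which avoids queuing behind doomed requests).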


FAQ

What is Higress? Higress is a cloud-native AI gateway developed by Alibaba, built on Istio and Envoy. It provides enterprise-grade API management with native AI features including multi-model LLM proxy, token-based rate limiting, semantic caching for AI responses, MCP server hosting, and AI-specific observability.

What AI-specific features does Higress offer? Higress offers AI-specific features including: multi-model LLM proxy (route requests to different models), token-based rate limiting (cost control per API key), semantic AI caching (cache and reuse LLM responses), MCP server hosting (expose tools via Model Context Protocol), prompt engineering (prompt templates and transformation), and AI-specific metrics and logging.
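The failover aspect of the multi-model proxy can be sketched as an ordered chain of backends, where the next provider is tried when one fails. This Python sketch illustrates the pattern only; the backend callables are hypothetical stand-ins for real provider clients:

```python
class FallbackLLMProxy:
    """Try each LLM backend in order and return the first successful
    response. Conceptual sketch of a model fallback chain."""

    def __init__(self, backends):
        self.backends = backends  # list of (name, callable) pairs, in priority order

    def complete(self, prompt):
        errors = {}
        for name, call in self.backends:
            try:
                return name, call(prompt)
            except Exception as exc:  # in practice: timeouts, 429s, 5xx errors
                errors[name] = str(exc)
        raise RuntimeError(f"all backends failed: {errors}")
```

A gateway-level chain like this means individual applications do not need their own retry and failover logic for each provider.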

Can Higress be used without AI features? Yes, Higress is a fully functional cloud-native API gateway for traditional workloads as well. It supports standard API gateway features including routing, load balancing, circuit breaking, authentication (OAuth2, JWT, OIDC), rate limiting, TLS termination, canary deployments, and gRPC proxy. The AI features are optional add-ons.

How do you get started with Higress? Higress can be deployed via Helm on Kubernetes: `helm repo add higress.io https://higress.io/helm-charts`, followed by `helm install higress -n higress-system higress.io/higress --create-namespace`. For local testing, Docker Compose is also supported. Configuration is done through Kubernetes CRDs or a web-based console.

What enterprises use Higress in production? Higress is used by numerous enterprises within and outside of Alibaba’s ecosystem. It handles production traffic for Alibaba Cloud, Taobao, and various enterprise customers. The gateway has been battle-tested at Alibaba’s scale, processing billions of API calls daily across thousands of services.

