Distributed computing is the hidden tax on AI and data-intensive applications. The logic of your application — the training loop, the batch processor, the inference pipeline — is straightforward. But distributing that logic across multiple machines introduces a cascade of complexity: task scheduling, data serialization, fault tolerance, resource management, and cluster coordination.
Ray was created at UC Berkeley’s RISELab to eliminate this tax. It provides a minimal set of distributed computing primitives — tasks for stateless remote execution, actors for stateful remote computation, and a distributed object store for data sharing — that are powerful enough to build any distributed application and simple enough that a single developer can use them productively. The Ray ecosystem extends these primitives into specialized libraries for AI workloads that have become the de facto standard for production AI infrastructure.
How Does Ray’s Programming Model Simplify Distributed Computing?
Ray’s core insight is that distributed computing primitives should look like normal Python function and class definitions. The @ray.remote decorator transforms any Python function into a remote task that can execute on any machine in the cluster. The same decorator on a class creates an actor — a remote object with state that persists across calls.
Tasks are the foundation. A @ray.remote function returns a future (ObjectRef) that represents the eventual result. Ray handles scheduling — finding an available machine, serializing arguments, transferring data, executing the function, and returning the result. Dependencies between tasks are expressed naturally by passing object references as arguments — Ray constructs a dependency graph and executes tasks in parallel where possible.
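As a minimal sketch (assuming a local `ray.init()` and toy functions; `load_shard` and `process` are illustrative names, not Ray APIs), tasks and their dependencies look like this:

```python
import ray

ray.init()  # starts a local Ray instance if no cluster address is given

@ray.remote
def load_shard(path):
    # Placeholder for real I/O; returns a small list of numbers.
    return list(range(10))

@ray.remote
def process(shard):
    # Runs in parallel with other process() calls.
    return sum(shard)

# Each .remote() call returns an ObjectRef (a future), not a result.
shard_refs = [load_shard.remote(f"s3://bucket/part-{i}") for i in range(4)]

# Passing ObjectRefs as arguments expresses dependencies; Ray schedules
# each process() only after the corresponding load_shard() finishes.
result_refs = [process.remote(ref) for ref in shard_refs]

# ray.get blocks until the results are ready and fetches them.
print(ray.get(result_refs))  # [45, 45, 45, 45]
```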
| Distributed Concept | Ray Primitive | Python Equivalent |
|---|---|---|
| Remote procedure call | @ray.remote on function | def |
| Remote stateful service | @ray.remote on class | class |
| Distributed object | ray.put() / ray.get() | Variable assignment |
| Parallel execution | ray.wait() on futures | asyncio.wait() |
| Resource management | @ray.remote(num_gpus=1) | N/A |
Actors provide stateful computation. An actor is instantiated on a specific machine and maintains its state across method calls. This is essential for model serving (models loaded in GPU memory), simulation state (game state across iterations), and coordination services (shared counters, rate limiters).
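A small sketch of a stateful actor, here an illustrative rate limiter (the class and its methods are hypothetical application code, not part of Ray):

```python
import ray

ray.init()

@ray.remote
class RateLimiter:
    """Toy stateful actor: state persists across remote method calls."""
    def __init__(self, limit):
        self.limit = limit
        self.count = 0

    def acquire(self):
        if self.count < self.limit:
            self.count += 1
            return True
        return False

# The actor is instantiated on one node in the cluster and stays there.
limiter = RateLimiter.remote(limit=2)

# Method calls also return futures; all callers share the same state.
print(ray.get([limiter.acquire.remote() for _ in range(3)]))
# [True, True, False]
```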
What Is the Ray AI Runtime Ecosystem?
Built on Ray Core, the Ray AI Runtime (Ray AIR) provides specialized libraries for the full AI development lifecycle. Each library handles a specific phase while sharing Ray’s distributed runtime — meaning data processing results flow directly into training, which flows into tuning, which flows into serving, without data movement or framework switching.
Ray Data provides distributed data loading and preprocessing. It reads from S3, GCS, HDFS, or local storage, transforms data with map/batch operations, and feeds directly into training. Ray Train handles distributed training with PyTorch, TensorFlow, and JAX, managing device placement, gradient synchronization, and checkpointing. RLlib provides production-ready reinforcement learning with built-in algorithms and distributed environment execution.
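A minimal Ray Data sketch, assuming a Parquet dataset at a hypothetical S3 path and a hypothetical `value` column; `read_parquet`, `map_batches`, and `iter_batches` are the core read, transform, and consume calls:

```python
import ray

# Hypothetical dataset location.
ds = ray.data.read_parquet("s3://example-bucket/training-data/")

def normalize(batch):
    # Batches arrive as a dict of NumPy arrays keyed by column name.
    batch["value"] = batch["value"] / batch["value"].max()
    return batch

# Transformation runs in parallel across the cluster.
ds = ds.map_batches(normalize)

# Stream batches directly into a training loop or a Ray Train worker.
for batch in ds.iter_batches(batch_size=1024):
    ...
```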
| Ray AIR Component | Purpose | What It Replaces |
|---|---|---|
| Ray Data | Distributed data processing | Spark DataFrame, Dask |
| Ray Train | Distributed training | Custom torch.distributed setup |
| Ray Tune | Hyperparameter optimization | Optuna, Hyperopt |
| RLlib | Reinforcement learning | OpenAI Baselines, Stable Baselines |
| Ray Serve | Model serving (CPU/GPU) | Custom FastAPI + Kubernetes |
| Ray Cluster | Cluster management | Kubernetes YAML, manual setup |
Ray Tune automates hyperparameter search across a cluster, supporting Bayesian optimization, population-based training, and asynchronous hyperband. RLlib provides implementations of popular RL algorithms (PPO, SAC, DQN, APEX) that scale from single-GPU to multi-node distributed training without code changes.
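As a hedged sketch of a Tune run (the objective function and search space are illustrative stand-ins for a real training loop):

```python
from ray import tune

def objective(config):
    # Illustrative objective: a real trainable would train and evaluate
    # a model here, then return its final metrics.
    return {"score": config["lr"] * config["batch_size"]}

tuner = tune.Tuner(
    objective,
    param_space={
        "lr": tune.loguniform(1e-5, 1e-1),
        "batch_size": tune.choice([32, 64, 128]),
    },
    tune_config=tune.TuneConfig(num_samples=20, metric="score", mode="max"),
)
results = tuner.fit()
print(results.get_best_result().config)
```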
How Does Ray Serve Handle Production Model Serving?
Ray Serve is the serving component of the Ray ecosystem, designed for production model deployment. It handles HTTP request routing, model loading, request batching, autoscaling, and deployment management. Serve integrates with Ray’s object store for efficient data passing and supports multi-model deployments with A/B testing and canary releases.
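A minimal Serve sketch, assuming Ray Serve is installed alongside Ray; the `Translator` deployment is an illustrative placeholder for a real model:

```python
from ray import serve
from starlette.requests import Request

@serve.deployment(num_replicas=2)
class Translator:
    def __init__(self):
        # A real deployment would load a model into (GPU) memory here.
        self.prefix = "echo: "

    async def __call__(self, request: Request) -> str:
        body = await request.json()
        return self.prefix + body.get("text", "")

# Binds the deployment and starts serving it over HTTP on the Ray cluster.
serve.run(Translator.bind())
```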
For LLM serving, Ray Serve integrates with vLLM and Hugging Face Text Generation Inference. It handles the complexities of continuous batching, KV cache management, and GPU memory allocation. Deployments can span multiple GPUs with automatic request routing and load balancing.
```mermaid
flowchart TD
A[HTTP Request] --> B[Ray Serve Router]
B --> C[Deployment 1<br/>GPT-style Model]
B --> D[Deployment 2<br/>Embedding Model]
B --> E[Deployment N<br/>Custom Model]
C --> F[Request Batching]
F --> G[vLLM Inference]
G --> H[Response]
D --> I[Batching]
I --> J[Embedding Inference]
J --> K[Response]
E --> L[Custom Logic]
L --> M[Response]
H --> N[Client]
K --> N
M --> N
```

Deployments can be updated without downtime. A canary deployment routes a percentage of traffic to a new model version while the current version handles the rest. If metrics degrade, traffic shifts back. If performance is satisfactory, the canary graduates to full deployment.
How Do You Deploy Ray in Production?
Ray supports multiple deployment modes. For development, ray start --head starts a single-node cluster on your laptop. For production, the Ray Cluster launcher handles cloud (AWS, GCP, Azure) and Kubernetes deployment. Auto-scaling cluster configurations specify minimum, maximum, and desired node counts, with Ray automatically provisioning and terminating nodes based on workload.
Kubernetes deployment uses the Ray Kubernetes Operator, which manages Ray clusters as custom resources. The operator handles cluster creation, scaling, upgrades, and failure recovery. Ray’s autoscaler adjusts cluster size based on pending tasks and available resources, scaling down to zero when idle.
| Deployment Feature | Ray | Kubernetes (bare) |
|---|---|---|
| Task scheduling | Ray scheduler | Kubernetes scheduler |
| GPU management | Automatic | Manual (node selectors) |
| Autoscaling | Workload-based | Metrics-based (HPA) |
| Fault tolerance | Built-in task retry | Pod restart |
| Multi-tenant | Per-job resource isolation | Namespaces + quotas |
Ray’s resource management ensures efficient GPU utilization. Tasks and actors declare resource requirements (num_gpus=1, memory=4GB), and Ray’s scheduler packs them onto nodes optimally. This is significantly more efficient than Kubernetes-style per-pod GPU allocation, as Ray can multiplex multiple tasks on a single GPU.
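For example, a hedged sketch of fractional GPU sharing (the `embed` task is illustrative placeholder work, not a Ray API):

```python
import ray

ray.init()

# Fractional GPU requests let several tasks share one physical GPU;
# Ray's scheduler packs four of these onto a single-GPU node.
@ray.remote(num_gpus=0.25)
def embed(batch):
    # Placeholder for GPU work.
    return len(batch)

refs = [embed.remote(list(range(100))) for _ in range(8)]
print(ray.get(refs))
```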
FAQ
What is Ray and what problems does it solve? Ray is an open-source unified framework for scaling Python and AI applications. It provides simple primitives (tasks, actors, objects) that abstract distributed systems complexity.
What are the key components of the Ray ecosystem? Ray Core, Ray Train, Ray Serve, RLlib, Ray Data, Ray Tune, and the Ray Cluster launcher for cloud and Kubernetes deployment.
How does Ray Serve handle LLM serving? Ray Serve provides HTTP routing, request batching, autoscaling, and model deployment with OpenAI-compatible APIs and vLLM integration for production LLM serving.
Can Ray run on a single machine? Yes. Ray runs on a single machine for development and scales to multi-node clusters for production with the same code.
Who uses Ray in production? OpenAI, Uber, Amazon, Shopify, and LinkedIn use Ray for production AI workloads, including GPT-4 training.