vLLM

AI May 05, 2026

vLLM: A High-Throughput LLM Inference Engine with PagedAttention

Serving LLMs in production is fundamentally a memory management problem. The KV cache — the set of attention key-value pairs stored during …

AI May 03, 2026

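IndexTTS-vLLM is an accelerated version of the IndexTTS text-to-speech system that ports the model's inference pipeline to vLLM, a high-performance inference engine originally developed for serving large language models. The result is TTS inference that runs 2.5 to 3.5 times faster, bringing to consumer-grade GPUs a zero-shot voice cloning and multi-character audio mixing …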