PowerInfer: High-Speed LLM Inference on Consumer GPUs via CPU-GPU Hybrid Design
Running large language models locally has always been constrained by a hard wall: GPU memory. A 175-billion parameter model in FP16 requires …
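To make the scale of that wall concrete, here is a minimal back-of-the-envelope sketch in Python (assuming 2 bytes per FP16 parameter and counting weights only, ignoring KV cache, activations, and framework overhead):

```python
# Rough weight-only memory footprint for FP16 models.
# Assumption: 2 bytes per parameter; real deployments also need
# memory for the KV cache, activations, and runtime overhead.
BYTES_PER_PARAM_FP16 = 2

def weight_memory_gb(n_params: float) -> float:
    """Approximate FP16 weight memory in gigabytes."""
    return n_params * BYTES_PER_PARAM_FP16 / 1e9

for name, n_params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9), ("175B", 175e9)]:
    print(f"{name:>5}: ~{weight_memory_gb(n_params):,.0f} GB")
# 175B -> ~350 GB of weights alone, far beyond the 8-24 GB of VRAM
# found on typical consumer GPUs.
```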