LLaMA-VID: An Image is Worth 2 Tokens -- Efficient Long Video Understanding with LLMs
LLaMA-VID (Large Language and Video Assistant) is an ECCV 2024 research project that tackles the fundamental bottleneck in video understanding …