An implementation of a native video-audio-text interleaved multimodal architecture.
pytorch video-understanding qlora multimodal-llm llama-3 audio-visual-understanding native-architecture perceiver-resampler
Updated Dec 13, 2025 - Python