AgentNav: Zero-shot sparsely grounded long-range visual navigation in real-world cities using Multimodal Large Language Models (MLLMs).
computer-vision visual-navigation embodied-ai vision-language-navigation multimodal-llm agentnav citynav long-range-navigation
Updated Jan 6, 2026 - Python