A list of research papers and other related resources on Vision-Language-Action/Navigation (VLA/VLN) models for UAVs.
Contributions are welcome!
- [Review] UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility (Information Fusion 2025.3) [paper][code]
- OpenFly: A Comprehensive Platform for Aerial Vision-Language Navigation (arXiv 2025.7) [paper][code]
- TypeFly: Low-Latency Drone Planning With Large Language Models (IEEE Transactions on Mobile Computing 2025.9) [paper][code]
- Towards Realistic UAV Vision-Language Navigation: Platform, Benchmark, and Methodology (OpenUAV) (ICLR 2025) [paper][code]
- UAV-Flow Colosseo: A Real-World Benchmark for Flying-on-a-Word UAV Imitation Learning (arXiv 2025.5) [paper][code]
- UAV-ON: A Benchmark for Open-World Object Goal Navigation with Aerial Agents (ACM MM Dataset Track 2025) [paper][code]
- AeroDuo: Aerial Duo for UAV-based Vision and Language Navigation (ACM MM 2025) [paper][[code]]
- CityNav: A Large-Scale Dataset for Real-World Aerial Navigation (ICCV 2025) [paper][code]
- CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory (ACL 2025) [paper][code]
- VLM-Nav: Mapless UAV-Navigation Using Monocular Vision Driven by Vision-Language Model (SSRN) [paper][code]
- Learning Fine-Grained Alignment for Aerial Vision-Dialog Navigation (AAAI 2025) [paper][code]
- UAV-VLA: Vision-Language-Action System for Large Scale Aerial Mission Generation (Int. Conf. on Human Robot Interaction, HRI 2025) [paper][code]
- General-Purpose Aerial Intelligent Agents Empowered by Large Language Models (arXiv 2025.5) [paper][[code]]
- RAVEN: Resilient Aerial Navigation via Open-Set Semantic Memory and Behavior Adaptation (arXiv 2025.9 "Best Paper Finalist at IROS 2025 Active Perception Workshop") [paper][project]
- [Review] Large Language Models for UAVs: Current State and Pathways to the Future (IEEE Open Journal of Vehicular Technology 2024.8) [paper][[code]]
- AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models (arXiv 2024.8) [paper][[code]]
- TPML: Task Planning for Multi-UAV System with Large Language Models (2024 IEEE 18th International Conference on Control & Automation (ICCA)) [paper][code]
- EAI-SIM: An Open-Source Embodied AI Simulation Framework with Large Language Models (2024 IEEE 18th International Conference on Control & Automation (ICCA)) [paper][code]
- Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning (STMR) (Submitted to ICRA 2025) [paper][[code]]
- Visual Agents as Fast and Slow Thinkers (ICLR 2025) [paper][code]
- Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces (arXiv 2025) [paper][[code]]
- Helix: A "System 1, System 2" VLA for Whole Upper Body Control (figure.ai) [link]
- DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models (Conference on Robot Learning (CoRL) 2024) [paper][project]
- Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models (Physical Intelligence (π)) (ICML 2025) [paper][blog]
- HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers (Conference on Robot Learning (CoRL) 2024) [paper][[code]]
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots (arXiv 2025.3) [paper][code][tech]
- GR00T N1.5: An Improved Open Foundation Model for Generalist Humanoid Robots [tech][code][blog]