| Paper | Presenter | Date |
| --- | --- | --- |
| MiniVLN: Efficient Vision-and-Language Navigation by Progressive Knowledge Distillation | 依庭 | 2026/03/30 |
| Run-time Observation Interventions Make Vision-Language-Action Models More Visually Robust | Teresa | 2026/03/30 |
| SpatialBot: Precise Spatial Understanding with Vision Language Models | Munir | 2026/04/06 |
| Dur360BEV: A Real-world 360-degree Single Camera Dataset and Benchmark for Bird-Eye View Mapping in Autonomous Driving | 桂茹 | 2026/04/06 |
| Domain Adaptation-Based Crossmodal Knowledge Distillation for 3D Semantic Segmentation | 旻璇 | 2026/04/13 |
| Towards Robust Autonomous Driving: Conditional Multimodal Large Language Models for Fine-Grained Perception | 新哲 | 2026/04/13 |
| DP-Habitat: Bridging the Gap Between Simulation and Reality for Visual Navigation in Dynamic Pedestrian Environments | 宇廷 | 2026/04/20 |
| A Comprehensive LLM-powered Framework for Driving Intelligence Evaluation | 汶璇 | 2026/04/20 |