
SGDrive

SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving

Jingyu Li1,2*, Junjie Wu3*, Dongnan Hu4,2, Xiangkai Huang3, Bin Sun3†, Zhihui Hao3†,
Xianpeng Lang3, Xiatian Zhu5, Li Zhang1,2✉

1Fudan University  2Shanghai Innovation Institute  3Li Auto Inc.  4Tongji University  5University of Surrey

(*) Equal contribution. (†) Project leader. (✉) Corresponding author.

arXiv 2026

Paper PDF

News

  • Jan. 15th, 2026: We released results on NAVSIM v2 navhard_two_stage!
  • Jan. 09th, 2026: We released our paper on arXiv. Code and models are coming soon. Please stay tuned! ☕️

Updates

  • Release Paper
  • Release results on navhard_two_stage
  • Release Full Models
  • Release Training/Evaluation Framework

Table of Contents

  • News
  • Updates
  • Abstract
  • Getting Started
  • Qualitative Results on NAVSIM Navtest
  • Contact
  • Acknowledgement
  • Citation

Abstract

Recent end-to-end autonomous driving approaches have leveraged Vision-Language Models (VLMs) to enhance planning capabilities in complex driving scenarios. However, VLMs are inherently trained as generalist models, lacking specialized understanding of driving-specific reasoning in 3D space and time. When applied to autonomous driving, these models struggle to establish structured spatial-temporal representations that capture geometric relationships, scene context, and motion patterns critical for safe trajectory planning. To address these limitations, we propose SGDrive, a novel framework that explicitly structures the VLM's representation learning around driving-specific knowledge hierarchies. Built upon a pre-trained VLM backbone, SGDrive decomposes driving understanding into a scene-agent-goal hierarchy that mirrors human driving cognition: drivers first perceive the overall environment (scene context), then attend to safety-critical agents and their behaviors, and finally formulate short-term goals before executing actions. This hierarchical decomposition provides the structured spatial-temporal representation that generalist VLMs lack, integrating multi-level information into a compact yet comprehensive format for trajectory planning. Extensive experiments on the NAVSIM benchmark demonstrate that SGDrive achieves state-of-the-art performance among camera-only methods on both PDMS and EPDMS, validating the effectiveness of hierarchical knowledge structuring for adapting generalist VLMs to autonomous driving.
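To make the hierarchy concrete, the sketch below shows one way the scene-agent-goal decomposition could be organized as data and serialized into a compact planning context. It is purely illustrative: every class name, field, and the prompt format here are assumptions made for this README, not the released SGDrive interface (code and models are still to come).

```python
# Illustrative sketch of the scene-agent-goal hierarchy described in the abstract.
# All class names and fields are hypothetical; the official SGDrive code is not yet released.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneContext:
    """Coarse description of the overall driving environment (top of the hierarchy)."""
    weather: str
    road_type: str
    ego_speed_mps: float
    navigation_command: str  # e.g. "turn left", "keep forward"


@dataclass
class CriticalAgent:
    """A safety-critical agent the planner should attend to (middle of the hierarchy)."""
    category: str                     # e.g. "vehicle", "pedestrian"
    position_xy: Tuple[float, float]  # meters, ego-centric BEV frame
    velocity_xy: Tuple[float, float]  # meters per second
    intent: str                       # e.g. "crossing", "yielding"


@dataclass
class ShortTermGoal:
    """Short-horizon goal formulated before trajectory execution (bottom of the hierarchy)."""
    target_xy: Tuple[float, float]    # goal point in the ego frame, meters
    target_speed_mps: float
    maneuver: str                     # e.g. "lane keep", "nudge right"


def build_planning_context(scene: SceneContext,
                           agents: List[CriticalAgent],
                           goal: ShortTermGoal) -> str:
    """Serialize the scene-agent-goal hierarchy into one compact textual context."""
    agent_lines = [
        f"- {a.category} at {a.position_xy}, velocity {a.velocity_xy}, intent: {a.intent}"
        for a in agents
    ]
    return "\n".join([
        f"Scene: {scene.road_type}, {scene.weather}, ego speed {scene.ego_speed_mps:.1f} m/s, "
        f"command: {scene.navigation_command}",
        "Critical agents:",
        *agent_lines,
        f"Goal: {goal.maneuver} toward {goal.target_xy} at {goal.target_speed_mps:.1f} m/s",
    ])


if __name__ == "__main__":
    scene = SceneContext("clear", "urban intersection", 6.2, "turn left")
    agents = [CriticalAgent("pedestrian", (8.0, 3.5), (-1.2, 0.0), "crossing")]
    goal = ShortTermGoal((12.0, 6.0), 4.0, "yield, then turn left")
    print(build_planning_context(scene, agents, goal))
```

In the full model, a context like this would condition the trajectory planner; here it only demonstrates how multi-level information can be packed into a compact, structured format.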

Getting Started

Checkpoint

We are still working toward achieving better results!

Results on NAVSIM v1 navtest

| Method | Model Size | Training Method | PDMS | Weight Download (coming soon) |
|---|---|---|---|---|
| SGDrive-VLM | 2B | Q&A SFT | 85.5 | Model |
| SGDrive-IL | 2B | SFT | 87.4 | Model |
| SGDrive-RL | 2B | RFT | 91.1 | Model |
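For reference, PDMS is the aggregate driving score computed by the NAVSIM benchmark from per-scenario sub-metrics. The snippet below is a hedged sketch of that aggregation, assuming the NAVSIM v1 weighting (hard multiplicative penalties for at-fault collisions and drivable-area violations, plus a weighted average of ego progress, time-to-collision, and comfort). The official NAVSIM repository remains the authoritative definition; EPDMS in NAVSIM v2 extends this with additional terms not shown here.

```python
# Hedged sketch of how a PDMS-style score aggregates NAVSIM sub-metrics.
# Terms and weights follow our reading of the NAVSIM v1 PDM score; treat the
# official NAVSIM repository as the authoritative implementation.
from dataclasses import dataclass


@dataclass
class SubScores:
    no_at_fault_collision: float      # 0 or 1 (hard penalty)
    drivable_area_compliance: float   # 0 or 1 (hard penalty)
    ego_progress: float               # in [0, 1]
    time_to_collision: float          # in [0, 1]
    comfort: float                    # in [0, 1]


def pdm_score(s: SubScores) -> float:
    """Multiplicative hard penalties times a weighted average of the soft terms."""
    weighted = (5.0 * s.ego_progress + 5.0 * s.time_to_collision + 2.0 * s.comfort) / 12.0
    return s.no_at_fault_collision * s.drivable_area_compliance * weighted


if __name__ == "__main__":
    # A scenario with full compliance and strong soft scores yields a high PDMS.
    print(round(pdm_score(SubScores(1.0, 1.0, 0.92, 0.95, 1.0)), 3))
```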

Results on NAVSIM v2 navtest

| Method | Model Size | Training Stage | EPDMS | Weight Download (coming soon) |
|---|---|---|---|---|
| SGDrive-IL | 2B | SFT | 86.2 | Model |

Results on NAVSIM v2 navhard_two_stage (* denotes results reproduced with the official code repository or official checkpoint)

| Method | Model Size | Training Stage | EPDMS |
|---|---|---|---|
| DiffusionDrive* | -- | IL | 24.2 |
| GTRS-DP* | -- | IL | 23.8 |
| GuideFlow | -- | IL | 27.1 |
| ReCogDrive-IL* | 2B | SFT | 26.0 |
| SGDrive-IL | 2B | SFT | 27.1 |

Qualitative Results on NAVSIM Navtest

Our qualitative results demonstrate strong alignment with ground truth across the scene–agent–goal hierarchy, indicating rich driving-world knowledge and reliable short-horizon representation.

SGDrive adaptively perceives the driving scene according to the ego-vehicle's motion state and navigation command. This demonstrates a more structured and effective representation of driving-relevant world knowledge, providing strong evidence that SGDrive successfully elicits the VLM's world-modeling ability.

We compare SGDrive (SFT) with ReCogDrive, both of which leverage structured driving-world knowledge and can extrapolate it reasonably to ensure safe and rational driving behavior. More visualizations are in the supplementary material.

Contact

If you have any questions, please contact Jingyu Li via email (jingyuli24@m.fudan.edu.cn).

Acknowledgement

SGDrive is greatly inspired by the following outstanding contributions to the open-source community: NAVSIM, ReCogDrive, GR00T.

Citation

If you find SGDrive useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entries.

@article{li2026sgdrive,
  title={SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving},
  author={Li, Jingyu and Wu, Junjie and Hu, Dongnan and Huang, Xiangkai and Sun, Bin and Hao, Zhihui and Lang, Xianpeng and Zhu, Xiatian and Zhang, Li},
  journal={arXiv preprint arXiv:2601.05640},
  year={2026}
}

@misc{li2026sgdrivescenetogoalhierarchicalworld,
      title={SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving}, 
      author={Jingyu Li and Junjie Wu and Dongnan Hu and Xiangkai Huang and Bin Sun and Zhihui Hao and Xianpeng Lang and Xiatian Zhu and Li Zhang},
      year={2026},
      eprint={2601.05640},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.05640}, 
}
