Awesome-LLM-for-Autonomous-Driving-Resources

This is a collection of research papers on LLM-for-Autonomous-Driving (LLM4AD). The repository will be continuously updated to track the frontier of LLM4AD. Maintained by SJTU-ReThinklab.

Welcome to follow and star! If you find any related materials that could be helpful, feel free to contact us ([email protected] or [email protected]) or make a PR.

Citation

Our survey paper is available at https://arxiv.org/abs/2311.01043. It includes more detailed discussions, and we will continue to update it as well.

If you find our repo helpful, please consider citing it.

@misc{yang2023survey,
      title={LLM4Drive: A Survey of Large Language Models for Autonomous Driving}, 
      author={Zhenjie Yang and Xiaosong Jia and Hongyang Li and Junchi Yan},
      year={2023},
      eprint={2311.01043},
      archivePrefix={arXiv},
      primaryClass={cs.AI}
}

Awesome LLM-for-Autonomous-Driving(LLM4AD)

Overview of LLM4AD

LLM-for-Autonomous-Driving (LLM4AD) refers to the application of Large Language Models (LLMs) in autonomous driving. We categorize existing works by how LLMs are applied: planning, perception, question answering, and generation.

Motivation of LLM4AD

The orange circle represents the ideal level of driving competence, akin to that of an experienced human driver. There are two main ways to acquire such proficiency: first, through learning-based techniques within simulated environments; second, by applying similar methods to offline data. Because of discrepancies between simulation and the real world, these two domains do not fully coincide; this is the sim2real gap. Offline data, in turn, is a subset of real-world data, since it is collected directly from actual surroundings. However, it is also difficult for offline data to fully cover the real-world distribution, because of the notoriously long-tailed nature of autonomous driving tasks. The final goal of autonomous driving is to elevate driving abilities from a basic green stage to a more advanced blue level through extensive data collection and deep learning.

ICLR 2024 Under Review

format:
- [title](paper link) [links]
  - task
  - keyword
  - code or project page
  - datasets or environment or simulator
  - summary

Large Language Models as Decision Makers for Autonomous Driving
- Keywords: Large language model, Autonomous driving
- Previous summary
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
- Keywords: Interpretable autonomous driving, large language model, robotics, computer vision
- Previous summary
BEV-CLIP: Multi-modal BEV Retrieval Methodology for Complex Scene in Autonomous Driving
- Keywords: Autonomous Driving, BEV, Retrieval, Multi-modal, LLM, prompt learning
- Task: Contrastive learning, Retrieval tasks
- Datasets: nuScenes
- Summary:
  - Propose a multimodal retrieval method powered by LLM and knowledge graph to achieve contrastive learning between text description and BEV feature retrieval for autonomous driving.
LangProp: A code optimization framework using Language Models applied to driving
- Keywords: optimization, autonomous driving, Large Language Models, code generation
- Task: Code generation, Planning
- Code: LangProp
- Env: CARLA
- Summary:
  - LangProp is a framework for iteratively optimizing code generated by large language models (LLMs) in a supervised/reinforcement learning setting.
GPT-Driver: Learning to Drive with GPT
- Keywords: Motion Planning, Autonomous Driving, Large Language Models (LLMs), GPT
- Previous summary
Radar Spectra-language Model for Automotive Scene Parsing
- Keywords: radar spectra, radar perception, radar object detection, free space segmentation, autonomous driving, radar classification
- Task: Detection
- Datasets:
  - RADIal, CRUW, nuScenes
  - For RADIal and CRUW, both images and ground truth labels are used. From nuScenes, only images are taken.
  - Random captions for a frame from CRUW dataset based on ground truth object positions and pseudo ground-truth classes. (not open)
- Summary:
  - Conduct a benchmark comparison of off-the-shelf vision-language models (VLMs) for classification in automotive scenes.
  - Propose to fine-tune a large VLM specially for automated driving scenes.
GeoDiffusion: Text-Prompted Geometric Control for Object Detection Data Generation
- Keywords: diffusion model, controllable generation, object detection, autonomous driving
- Task: Detection, Data generation
- Datasets: nuScenes, which consists of 60K training samples and 15K validation samples with high-quality bounding box annotations from 10 semantic classes.
- Summary:
  - Propose GeoDiffusion, an embarrassingly simple framework to integrate geometric controls into pre-trained diffusion models for detection data generation via text prompts.
SPOT: Scalable 3D Pre-training via Occupancy Prediction for Autonomous Driving
- Keywords: 3D pre-training, object detection, autonomous driving
- Task: Detection
- Summary:
  - Propose SPOT, a scalable 3D pre-training paradigm for LiDAR data.
3D Dense Captioning Beyond Nouns: A Middleware for Autonomous Driving
- Keywords: Autonomous Driving, Dense Captioning, Foundation model
- Task: Caption, Dataset Construction
- Datasets: nuScenes
- Summary:
  - Design a scalable rule-based auto-labelling methodology to generate 3D dense captioning.
  - Construct a large-scale dataset nuDesign based upon nuScenes, which consists of an unprecedented number of 2300k sentences.

Papers

format:
- [title](paper link) [links]
  - author1, author2, and author3...
  - publisher
  - task
  - keyword
  - code or project page
  - datasets or environment or simulator
  - publish date
  - summary
  - metrics

VLP: Vision Language Planning for Autonomous Driving
- Chenbin Pan, Burhaneddin Yaman, Tommaso Nesti, Abhirup Mallik, Alessandro G Allievi, Senem Velipasalar, Liu Ren
- Publisher: Syracuse University, Bosch Research North America & Bosch Center for Artificial Intelligence (BCAI)
- Publish Date: 2024.01.14
- Datasets: nuScenes
- Summary:
  - Propose VLP, a Vision Language Planning model, which is composed of novel components ALP and SLP, aiming to improve the ADS from self-driving BEV reasoning and self-driving decision-making aspects, respectively.
  - ALP (agent-wise learning paradigm) aligns the produced BEV with a true bird’s-eye-view map.
  - SLP (self-driving-car-centric learning paradigm) aligns the ego-vehicle query feature with the ego-vehicle textual planning feature.
DME-Driver: Integrating Human Decision Logic and 3D Scene Perception in Autonomous Driving
- Wencheng Han, Dongqian Guo, Cheng-Zhong Xu, and Jianbing Shen
- Publisher: SKL-IOTSC, CIS, University of Macau
- Publish Date: 2024.01.08
- Summary:
  - DME-Driver = Decision-Maker + Executor + CL
  - The Executor network, which is based on UniAD, incorporates textual information into the OccFormer and the Planning module.
  - The Decision-Maker, which is based on LLaVA, processes inputs from three different modalities: visual inputs from the current and previous scenes, textual inputs in the form of prompts, and current status information detailing the vehicle’s operating state.
  - CL is a consistency loss mechanism that slightly reduces performance metrics but significantly enhances decision alignment between the Executor and the Decision-Maker.
Holistic Autonomous Driving Understanding by Bird’s-Eye-View Injected Multi-Modal Large Models
- Xinpeng Ding, Jianhua Han, Hang Xu, Xiaodan Liang, Wei Zhang, Xiaomeng Li
- Publisher: Hong Kong University of Science and Technology, Huawei Noah’s Ark Lab, Sun Yat-Sen University
- Publish Date: 2023.12.21
- Task: Datasets + VQA
- Code: official
- Summary:
  - Introduce NuInstruct, a novel dataset with 91K multi-view video-QA pairs across 17 subtasks, which is based on nuScenes.
  - Propose BEV-InMLLM to integrate instruction-aware BEV features with existing MLLMs, enhancing them with a full suite of information, including temporal, multi-view, and spatial details.
LLM-ASSIST: Enhancing Closed-Loop Planning with Language-Based Reasoning
- S P Sharan, Francesco Pittaluga, Vijay Kumar B G, Manmohan Chandraker
- Publisher: UT Austin, NEC Labs America, UC San Diego
- Publish Date: 2023.12.30
- Task: Planning
- Env/Datasets: nuPlan Closed-Loop Non-Reactive Challenge
- Project: LLM-ASSIST
- Summary:
  - LLM-Planner takes over scenarios that PDM-Closed cannot handle
  - Propose two LLM-based planners.
    - LLM-ASSIST(unc) considers the most unconstrained version of the planning problem, in which the LLM must directly return a safe future trajectory for the ego car.
    - LLM-ASSIST(par) considers a parameterized version of the planning problem, in which the LLM must only return a set of parameters for a rule-based planner, PDM-Closed.
DriveLM: Driving with Graph Visual Question Answering
- Chonghao Sima, Katrin Renz, Kashyap Chitta, Li Chen, Hanxue Zhang, Chengen Xie, Ping Luo, Andreas Geiger, Hongyang Li
- Publisher: OpenDriveLab, University of Tübingen, Tübingen AI Center, University of Hong Kong
- Code: official
- Publish Date: 2023.12.21
- Summary:
  - DriveLM-Task
    - Graph VQA formulates P1-3 (Perception, Prediction, Planning) reasoning as a series of question-answer pairs (QAs) in a directed graph.
  - DriveLM-Data
    - DriveLM-Carla
      - Collect data using CARLA 0.9.14 in the Leaderboard 2.0 framework with a privileged rule-based expert.
    - DriveLM-nuScenes
      - Select key frames from video clips, choose key objects within these key frames, and then annotate the frame-level P1-3 QAs for these key objects. A portion of the Perception QAs is generated from the nuScenes and OpenLane-V2 ground truth, while the remaining QAs are manually annotated.
  - DriveLM-Agent
    - DriveLM-Agent is built upon a general vision-language model and can therefore exploit underlying knowledge gained during pre-training.
LingoQA: Video Question Answering for Autonomous Driving
- Ana-Maria Marcu, Long Chen, Jan Hünermann, Alice Karnsund, Benoit Hanotte, Prajwal Chidananda, Saurabh Nair, Vijay Badrinarayanan, Alex Kendall, Jamie Shotton, Oleg Sinavski
- Publisher: Wayve
- Task: VQA + Evaluation/Datasets
- Code: official
- Publish Date: 2023.12.21
- Summary:
  - Introduce a novel benchmark for autonomous driving video QA using a learned text classifier for evaluation.
  - Introduce a Video QA dataset of central London consisting of 419k samples with its free-form questions and answers.
  - Establish a new baseline based on Vicuna-1.5-7B for this field with an identified model combination.
DriveMLM: Aligning Multi-Modal Large Language Models with Behavioral Planning States for Autonomous Driving
- Wenhai Wang, Jiangwei Xie, ChuanYang Hu, Haoming Zou, Jianan Fan, Wenwen Tong, Yang Wen, Silei Wu, Hanming Deng, Zhiqi Li, Hao Tian, Lewei Lu, Xizhou Zhu, Xiaogang Wang, Yu Qiao, Jifeng Dai
- Publisher: OpenGVLab, Shanghai AI Laboratory, The Chinese University of Hong Kong, SenseTime Research, Stanford University, Nanjing University, Tsinghua University
- Task: Planning + Explanation
- Code: official
- Env: Carla
- Publish Date: 2023.12.14
- Summary:
  - DriveMLM, the first LLM-based AD framework that can perform closed-loop autonomous driving in realistic simulators.
  - Design an MLLM planner for decision prediction, and develop a data engine that can effectively generate decision states and corresponding explanation annotation for model training and evaluation.
  - Achieve 76.1 DS, 0.955 MPI results on CARLA Town05 Long.
Large Language Models for Autonomous Driving: Real-World Experiments
- Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, Tianren Gao, Erlong Li, Kun Tang, Zhipeng Cao, Tong Zhou, Ao Liu, Xinrui Yan, Shuqi Mei, Jianguo Cao, Ziran Wang, Chao Zheng
- Publisher: Purdue University
- Publish Date: 2023.12.14
- Project: official
- Summary:
  - Introduce a Large Language Model (LLM)-based framework Talk-to-Drive (Talk2Drive) to process verbal commands from humans and make autonomous driving decisions with contextual information, satisfying their personalized preferences for safety, efficiency, and comfort.
LMDrive: Closed-Loop End-to-End Driving with Large Language Models
- Hao Shao, Yuxuan Hu, Letian Wang, Steven L. Waslander, Yu Liu, Hongsheng Li
- Publisher: CUHK MMLab, SenseTime Research, CPII under InnoHK, University of Toronto, Shanghai Artificial Intelligence Laboratory
- Task: Planning + Datasets
- Code: official
- Env: Carla
- Publish Date: 2023.12.12
- Summary:
  - LMDrive, a novel end-to-end, closed-loop, language-based autonomous driving framework.
  - Release a dataset of 64K clips, including navigation instructions, notice instructions, multi-modal multi-view sensor data, and control signals.
  - Present the benchmark LangAuto for evaluating the autonomous agents.
Evaluation of Large Language Models for Decision Making in Autonomous Driving
- Kotaro Tanahashi, Yuichi Inoue, Yu Yamaguchi, Hidetatsu Yaginuma, Daiki Shiotsuka, Hiroyuki Shimatani, Kohei Iwamasa, Yoshiaki Inoue, Takafumi Yamaguchi, Koki Igari, Tsukasa Horinouchi, Kento Tokuhiro, Yugo Tokuchi, Shunsuke Aoki
- Publisher: Turing Inc., Japan
- Task: Evaluation
- Publish Date: 2023.12.11
- Summary:
  - Evaluate two core capabilities:
    - Spatial-aware decision-making: whether LLMs can accurately identify the spatial layout based on coordinate information.
    - Traffic rule compliance: whether LLMs strictly abide by traffic laws while driving.
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
- Yunsheng Ma, Can Cui, Xu Cao, Wenqian Ye, Peiran Liu, Juanwu Lu, Amr Abdelraouf, Rohit Gupta, Kyungtae Han, Aniket Bera, James M. Rehg, Ziran Wang
- Publisher: Purdue University, University of Illinois Urbana-Champaign, University of Virginia, InfoTech Labs, Toyota Motor North America
- Task: Benchmark
- Publish Date: 2023.12.07
- Summary:
  - LaMPilot is the first interactive environment and dataset designed for evaluating LLM-based agents in a driving context.
  - It contains 4.9K scenes and is specifically designed to evaluate command tracking tasks in autonomous driving.
Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving
- Ming Nie, Renyuan Peng, Chunwei Wang, Xinyue Cai, Jianhua Han, Hang Xu, Li Zhang
- Publisher: Fudan University, Huawei Noah’s Ark Lab
- Task: VQA + Datasets
- Code: official
- Datasets:
  - nuScenes
  - Waymo
  - ONCE
- Publish Date: 2023.12.06
- Summary:
  - Reason2Drive, a benchmark dataset with over 600K video-text pairs, aimed at facilitating the study of interpretable reasoning in complex driving.
  - Introduce a novel evaluation metric to assess chain-based reasoning performance in autonomous driving environments, and address the semantic ambiguities of existing metrics such as BLEU and CIDEr.
  - Introduce a straightforward yet effective framework that enhances existing VLMs with two new components: a prior tokenizer and an instructed vision decoder.
Dolphins: Multimodal Language Model for Driving
- Yingzi Ma, Yulong Cao, Jiachen Sun, Marco Pavone, Chaowei Xiao
- Publisher: University of Wisconsin-Madison, NVIDIA, University of Michigan, Stanford University
- Task: VQA
- Project: Dolphins
- Code: Dolphins
- Datasets:
  - Image instruction-following dataset
    - GQA
    - MSCOCO: VQAv2, OK-VQA, TDIUC, Visual Genome dataset
  - Video instruction-following dataset
    - BDD-X
- Publish Date: 2023.12.01
- Summary:
  - Dolphins, which is based on the OpenFlamingo architecture, is a VLM-based conversational driving assistant.
  - Devise grounded CoT (GCoT) instruction tuning and develop datasets.
Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving
- Yuqi Wang, Jiawei He, Lue Fan, Hongxin Li, Yuntao Chen, Zhaoxiang Zhang
- Publisher: CASIA, CAIR, HKISI, CAS
- Task: Generation
- Project: Drive-WM
- Code: Drive-WM
- Datasets: nuScenes, Waymo Open Dataset
- Publish Date: 2023.11.29
- Summary:
  - Drive-WM, a multiview world model, which is capable of generating high-quality, controllable, and consistent multiview videos in autonomous driving scenes.
  - The first to explore the potential application of the world model in end-to-end planning for autonomous driving.
Empowering Autonomous Driving with Large Language Models: A Safety Perspective
- Yixuan Wang, Ruochen Jiao, Chengtian Lang, Sinong Simon Zhan, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu
- Publisher: Northwestern University, University of Liverpool, Yale University
- Task: Planning
- Env: HighwayEnv
- Code: official
- Publish Date: 2023.11.28
- Summary:
  - Deploys the LLM as an intelligent decision-maker in planning, incorporating safety verifiers for contextual safety learning to enhance overall AD performance and safety.
GPT-4V Takes the Wheel: Evaluating Promise and Challenges for Pedestrian Behavior Prediction
- Jia Huang, Peng Jiang, Alvika Gautam, Srikanth Saripalli
- Publisher: Texas A&M University, College Station, USA
- Task: Evaluation(Pedestrian Behavior Prediction)
- Datasets:
  - JAAD
  - PIE
  - WiDEVIEW
- Summary:
  - Provides a comprehensive evaluation of the potential of GPT-4V for pedestrian behavior prediction in autonomous driving using publicly available datasets.
  - It still falls short of the state-of-the-art traditional domain-specific models.
  - While GPT-4V represents a considerable advancement in AI capabilities for pedestrian behavior prediction, ongoing development and refinement are necessary to fully harness its capabilities in practical applications.
ADriver-I: A General World Model for Autonomous Driving
- Fan Jia, Weixin Mao, Yingfei Liu, Yucheng Zhao, Yuqing Wen, Chi Zhang, Xiangyu Zhang, Tiancai Wang
- Publisher: MEGVII Technology, Waseda University, University of Science and Technology of China, Mach Drive
- Task: Generation + Planning
- Datasets: nuScenes, large-scale private datasets
- Publish Date: 2023.11.22
- Summary:
  - ADriver-I takes the vision-action pairs as inputs and autoregressively predicts the control signal of current frame. The generated control signals together with the historical vision-action pairs are further conditioned to predict the future frames.
  - MLLM(Multimodal large language model)=LLaVA-7B-1.5, VDM(Video Diffusion Model)=latent-diffusion
- Metrics:
  - L1 error including speed and steer angle of current frame.
  - Quality of Generation: Frechet Inception Distance(FID), Frechet Video Distance(FVD).
A Language Agent for Autonomous Driving
- Jiageng Mao, Junjie Ye, Yuxi Qian, Marco Pavone, Yue Wang
- Publisher: University of Southern California, Stanford University, NVIDIA
- Task: Generation + Planning
- Project: Agent-Driver
- Datasets: nuScenes
- Publish Date: 2023.11.17
- Summary:
  - Agent-Driver integrates a tool library for dynamic perception and prediction, a cognitive memory for human knowledge, and a reasoning engine that emulates human decision-making.
  - For motion planning, follow GPT-Driver and fine-tune the LLM with human driving trajectories in the nuScenes training set for one epoch.
  - For neural modules, adopt the modules in UniAD.
- Metric:
  - L2 error (in meters) and collision rate (in percentage).
Human-Centric Autonomous Systems With LLMs for User Command Reasoning
- Yi Yang, Qingwen Zhang, Ci Li, Daniel Simões Marta, Nazre Batool, John Folkesson
- Publisher: KTH Royal Institute of Technology, Scania AB
- Task: QA
- Code: DriveCmd
- Datasets: UCU Dataset
- Publish Date: 2023.11.14
- Summary:
  - Propose to leverage the reasoning capabilities of Large Language Models (LLMs) to infer system requirements from in-cabin users’ commands.
  - LLVM-AD Workshop @ WACV 2024
- Metric:
  - Accuracy at the question level(accuracy for each individual question).
  - Accuracy at the command level(accuracy is only acknowledged if all questions for a particular command are correctly identified).
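
The two accuracy metrics above differ only in how answers are grouped. The following is a minimal sketch in Python, assuming each command comes with a fixed list of yes/no questions. The function names and the toy data are illustrative assumptions and are not taken from the DriveCmd code.

```python
from typing import Dict, List

# Hypothetical layout: command id -> list of yes/no answers, one per question.
Answers = Dict[str, List[bool]]


def question_level_accuracy(preds: Answers, labels: Answers) -> float:
    """Fraction of individual questions answered correctly."""
    total = 0
    correct = 0
    for command, truth in labels.items():
        for predicted, expected in zip(preds[command], truth):
            total += 1
            correct += int(predicted == expected)
    return correct / total


def command_level_accuracy(preds: Answers, labels: Answers) -> float:
    """Fraction of commands for which every question is answered correctly."""
    hits = sum(1 for command, truth in labels.items() if preds[command] == truth)
    return hits / len(labels)


# Toy data (made up for illustration): one wrong answer in the second command.
labels = {"cmd_a": [True, False, True], "cmd_b": [False, False, True]}
preds = {"cmd_a": [True, False, True], "cmd_b": [False, True, True]}

print(question_level_accuracy(preds, labels))
print(command_level_accuracy(preds, labels))
```

In this toy example a single wrong answer costs one of six questions at the question level (5/6), but it invalidates the whole second command at the command level (1/2), which is why the command-level score is the stricter of the two.
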
On the Road with GPT-4V(ision): Early Explorations of Visual-Language Model on Autonomous Driving
- Licheng Wen, Xuemeng Yang, Daocheng Fu, Xiaofeng Wang, Pinlong Cai, Xin Li, Tao Ma, Yingxuan Li, Linran Xu, Dengke Shang, Zheng Zhu, Shaoyan Sun, Yeqi Bai, Xinyu Cai, Min Dou, Shuanglu Hu, Botian Shi
- Publisher: Shanghai Artificial Intelligence Laboratory, GigaAI, East China Normal University, The Chinese University of Hong Kong, WeRide.ai
- Project: official
- Datasets:
  - Scenario Understanding: nuScenes, BDD-X, Carla, TSDD, Waymo, DAIR-V2X, CitySim.
  - Reasoning Capability: nuScenes, D2-city, Carla, CODA and the internet
  - Act as a driver: Real-world driving scenarios.
- Publish Date: 2023.11.09
- Summary:
  - Conduct a comprehensive and multi-faceted evaluation of GPT-4V in various autonomous driving scenarios.
  - Test the capabilities of GPT-4V in scenario understanding, reasoning, and acting as a driver.
ChatGPT as Your Vehicle Co-Pilot: An Initial Attempt
- Shiyi Wang, Yuxuan Zhu, Zhiheng Li, Yutong Wang, Li Li, Zhengbing He
- Publisher: Tsinghua University, Institute of Automation, Chinese Academy of Sciences, Massachusetts Institute of Technology
- Task: Planning
- Publish Date: 2023.10.17
- Summary:
  - Design a universal framework that embeds LLMs as a vehicle "Co-Pilot" of driving, which can accomplish specific driving tasks with human intention satisfied based on the information provided.
MagicDrive: Street View Generation with Diverse 3D Geometry Control
- Ruiyuan Gao, Kai Chen, Enze Xie, Lanqing Hong, Zhenguo Li, Dit-Yan Yeung, Qiang Xu
- Publisher: The Chinese University of Hong Kong, Hong Kong University of Science and Technology, Huawei Noah’s Ark Lab
- Task: Generation
- Project: MagicDrive
- Code: MagicDrive
- Datasets: nuScenes
- Publish Date: 2023.10.13
- Summary:
  - MagicDrive generates highly realistic images, exploiting geometric information from 3D annotations by independently encoding road maps, object boxes, and camera parameters for precise, geometry-guided synthesis. This approach effectively solves the challenge of multi-camera view consistency.
  - It still faces significant challenges in some complex scenes, such as night views and unseen weather conditions.
Receive, Reason, and React: Drive as You Say with Large Language Models in Autonomous Vehicles
- Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Ziran Wang
- Publisher: Purdue University, University of Illinois Urbana-Champaign, University of Virginia, PediaMed.AI
- Task: Planning
- Project: video
- Env: HighwayEnv
- Publish Date: 2023.10.12
- Summary:
  - Utilize LLMs’ linguistic and contextual understanding abilities with specialized tools to integrate the language and reasoning capabilities of LLMs into autonomous vehicles.
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model
- Xiaofan Li, Yifu Zhang, Xiaoqing Ye
- Publisher: Baidu Inc.
- Task: Generation
- Project: official
- Datasets: nuScenes
- Summary:
  - Address the new problem of multi-view video data generation from 3D layout in complex urban scenes.
  - Propose a generative model DrivingDiffusion to ensure the cross-view, cross-frame consistency and the instance quality of the generated videos.
  - Achieve state-of-the-art video synthesis performance on nuScenes dataset.
- Metrics:
  - Quality of Generation: Frechet Inception Distance(FID), Frechet Video Distance(FVD)
  - Segmentation Metrics: mIoU
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
- Hao Sha, Yao Mu, Yuxuan Jiang, Li Chen, Chenfeng Xu, Ping Luo, Shengbo Eben Li, Masayoshi Tomizuka, Wei Zhan, Mingyu Ding
- Publisher: Tsinghua University, The University of Hong Kong, University of California, Berkeley
- Task: Planning/Control
- Code: official
- Env:
  - ComplexUrbanScenarios
  - Carla
- Publish Date: 2023.10.04
- Summary:
  - Leverage LLMs to provide high-level decisions through chain-of-thought.
  - Convert high-level decisions into mathematical representations to guide the bottom-level controller(MPC).
  - Metrics: Number of failure/collision cases, Inefficiency, time, Penalty
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
- Long Chen, Oleg Sinavski, Jan Hünermann, Alice Karnsund, Andrew James Willmott, Danny Birch, Daniel Maund, Jamie Shotton
- Publisher: Wayve
- Task: Planning + VQA
- Code: official
- Simulator: a custom-built realistic 2D simulator.(The simulator is not open source.)
- Datasets: Driving QA, data collection using RL experts in simulator.
- Publish Date: 2023.10.03
- Summary:
  - Propose a unique object-level multimodal LLM architecture (Llama2 + LoRA), using only vectorized representations as input.
  - Develop a new dataset of 160k QA pairs derived from 10k driving scenarios(control commands collected by RL(PPO), QA pair generated by GPT-3.5)
  - Metrics:
    - Accuracy of traffic light detection
    - MAE for traffic light distance prediction
    - MAE for acceleration
    - MAE for brake pressure
    - MAE for steering wheel angle
Talk2BEV: Language-enhanced Bird’s-eye View Maps for Autonomous Driving
- Vikrant Dewangan, Tushar Choudhary, Shivam Chandhok, Shubham Priyadarshan, Anushka Jain, Arun K. Singh, Siddharth Srivastava, Krishna Murthy Jatavallabhula, K. Madhava Krishna
- Publisher: IIIT Hyderabad, University of British Columbia, University of Tartu, TensorTour Inc, MIT
- Project Page: official
- Code: Talk2BEV
- Publish Date: 2023.10.03
- Summary:
  - Introduces Talk2BEV, a large vision-language model (LVLM) interface for bird’s-eye view (BEV) maps in autonomous driving contexts.
  - Does not require any training or finetuning, relying instead on pre-trained image-language models
  - Develop and release Talk2BEV-Bench, a benchmark encompassing 1000 human-annotated BEV scenarios, with more than 20,000 questions and ground-truth responses from the nuScenes dataset.
DriveGPT4: Interpretable End-to-end Autonomous Driving via Large Language Model
- Zhenhua Xu, Yujia Zhang, Enze Xie, Zhen Zhao, Yong Guo, Kenneth K. Y. Wong, Zhenguo Li, Hengshuang Zhao
- Publisher: The University of Hong Kong, Zhejiang University, Huawei Noah’s Ark Lab, University of Sydney
- Project Page: official
- Task: Planning/Control + VQA
- Datasets:
  - BDD-X dataset.
- Publish Date: 2023.10.02
- Summary:
  - Develop a new visual instruction tuning dataset(based on BDD-X) for interpretable AD assisted by ChatGPT/GPT4.
  - Present a novel multimodal LLM called DriveGPT4(Valley + LLaVA).
- Metrics:
  - BLEU4, CIDEr, METEOR, and ChatGPT Score.
  - RMSE for control signal prediction.
GPT-Driver: Learning to Drive with GPT
- Jiageng Mao, Yuxi Qian, Hang Zhao, Yue Wang
- Publisher: University of Southern California, Tsinghua University
- Task: Planning(Fine-tuning Pre-trained Model)
- Project: official
- Datasets: nuScenes
- Code: GPT-Driver
- Publish Date: 2023.10.02
- Summary:
  - Motion planning as a language modeling problem.
  - Align the output of the LLM with human driving behavior through fine-tuning strategies using the OpenAI fine-tuning API.
  - Leverage the LLM to generate driving trajectories.
- Metrics:
  - L2 metric and Collision rate
GAIA-1: A Generative World Model for Autonomous Driving
- Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, Gianluca Corrado
- Publisher: Wayve
- Task: Generation
- Datasets:
  - Training dataset consists of 4,700 hours at 25Hz of proprietary driving data collected in London, UK between 2019 and 2023. It corresponds to approximately 420M unique images.
  - Validation dataset contains 400 hours of driving data from runs not included in the training set.
  - Text comes from either online narration or offline metadata sources.
- Publish Date: 2023.09.29
- Summary:
  - Introduce GAIA-1, a generative world model that leverages video(pre-trained DINO), text(T5-large), and action inputs to generate realistic driving scenarios.
  - Serve as a valuable neural simulator, allowing the generation of unlimited data.
DiLu: A Knowledge-Driven Approach to Autonomous Driving with Large Language Models
- Licheng Wen, Daocheng Fu, Xin Li, Xinyu Cai, Tao Ma, Pinlong Cai, Min Dou, Botian Shi, Liang He, Yu Qiao
- Publisher: Shanghai AI Laboratory, East China Normal University, The Chinese University of Hong Kong
- Publish Date: 2023.09.28
- Task: Planning
- Env:
  - HighwayEnv
  - CitySim, a Drone-Based vehicle trajectory dataset.
- Summary:
  - Propose the DiLu framework, which combines a Reasoning and a Reflection module to enable the system to perform decision-making based on common-sense knowledge and evolve continuously.
SurrealDriver: Designing Generative Driver Agent Simulation Framework in Urban Contexts based on Large Language Model
- Ye Jin, Xiaoxi Shen, Huiling Peng, Xiaoan Liu, Jingli Qin, Jiayang Li, Jintao Xie, Peizhong Gao, Guyue Zhou, Jiangtao Gong
- Keywords: human-AI interaction, driver model, agent, generative AI, large language model, simulation framework
- Env: CARLA
- Publisher: Tsinghua University
- Summary: Propose a generative driver agent simulation framework based on large language models (LLMs), capable of perceiving complex traffic scenarios and providing realistic driving maneuvers.
Drive as You Speak: Enabling Human-Like Interaction with Large Language Models in Autonomous Vehicles
- Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Ziran Wang
- Publisher: Purdue University, PediaMed.AI Lab, University of Virginia
- Task: Planning
- Publish Date: 2023.09.18
- Summary:
  - Provide a comprehensive framework for integrating Large Language Models (LLMs) into AD.
DriveDreamer: Towards Real-world-driven World Models for Autonomous Driving
- Xiaofeng Wang, Zheng Zhu, Guan Huang, Xinze Chen, Jiwen Lu
- Publisher: GigaAI, Tsinghua University
- Task: Generation
- Project Page: official
- Datasets: nuScenes
- Publish Date: 2023.09.18
- Summary:
  - Harness the powerful diffusion model to construct a comprehensive representation of the complex environment.
  - Generate future driving videos and driving policies by a multimodal(text, image, HDMap, Action, 3DBox) world model.
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving
- Ali Keysan, Andreas Look, Eitan Kosman, Gonca Gürsun, Jörg Wagner, Yu Yao, Barbara Rakitsch
- Publisher: Bosch Center for Artificial Intelligence, University of Tübingen
- Task: Prediction
- Datasets: nuScenes
- Publish Date: 2023.09.13
- Summary:
  - Integrate pre-trained language models as text-based input encoders for the AD trajectory prediction task.
- Metrics:
  - minimum Average Displacement Error (minADEk)
  - minimum Final Displacement Error (minFDEk)
  - MissRate over 2 meters
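
The three trajectory metrics above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code. It assumes 2D waypoints, k candidate trajectories per sample, and a miss defined as the best candidate's maximum pointwise distance exceeding 2 m (some benchmarks use the endpoint distance instead). All names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Trajectory = Sequence[Point]


def _dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2D waypoints."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def min_ade_k(modes: Sequence[Trajectory], gt: Trajectory) -> float:
    """Smallest average displacement error over the k candidate trajectories."""
    return min(sum(_dist(p, g) for p, g in zip(m, gt)) / len(gt) for m in modes)


def min_fde_k(modes: Sequence[Trajectory], gt: Trajectory) -> float:
    """Smallest final (endpoint) displacement error over the k candidates."""
    return min(_dist(m[-1], gt[-1]) for m in modes)


def is_miss(modes: Sequence[Trajectory], gt: Trajectory, threshold: float = 2.0) -> bool:
    """Assumed convention: a miss if no candidate stays within `threshold` at every waypoint."""
    best = min(max(_dist(p, g) for p, g in zip(m, gt)) for m in modes)
    return best > threshold


def miss_rate(samples: List[Tuple[Sequence[Trajectory], Trajectory]], threshold: float = 2.0) -> float:
    """Fraction of samples counted as misses."""
    return sum(is_miss(modes, gt, threshold) for modes, gt in samples) / len(samples)


# Toy data (made up): a straight ground-truth path and two sets of candidates.
gt = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
close_modes = [[(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)], [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)]]
far_modes = [[(0.0, 3.0), (1.0, 3.0), (2.0, 3.0)]]

print(min_ade_k(close_modes, gt))
print(min_fde_k(close_modes, gt))
print(miss_rate([(close_modes, gt), (far_modes, gt)]))
```

Taking the minimum over candidates rewards a multi-modal predictor for having at least one good hypothesis, so these numbers should always be reported together with the value of k.
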
TrafficGPT: Viewing, Processing and Interacting with Traffic Foundation Models
- Siyao Zhang, Daocheng Fu, Zhao Zhang, Bin Yu, Pinlong Cai
- Publisher: Beihang University, Key Laboratory of Intelligent Transportation Technology and System, Shanghai Artificial Intelligence Laboratory
- Task: Planning
- Code: official
- Publish Date: 2023.09.13
- Summary:
  - Present TrafficGPT—a fusion of ChatGPT and traffic foundation models.
  - Bridges the critical gap between large language models and traffic foundation models by defining a series of prompts.
HiLM-D: Towards High-Resolution Understanding in Multimodal Large Language Models for Autonomous Driving
- Xinpeng Ding, Jianhua Han, Hang Xu, Wei Zhang, Xiaomeng Li
- Publisher: The Hong Kong University of Science and Technology, Huawei Noah’s Ark Lab
- Task: Detection + VQA
- Datasets: DRAMA
- Publish Date: 2023.09.11
- Summary:
  - Propose HiLM-D (Towards High-Resolution Understanding in MLLMs for Autonomous Driving), an efficient method to incorporate HR information into MLLMs for the ROLISP task.
  - ROLISP that aims to identify, explain and localize the risk object for the ego-vehicle meanwhile predicting its intention and giving suggestions.
- Metrics:
  - LLM metrics: BLEU4, CIDEr, METEOR, SPICE.
  - Detection metrics: mIoU, IoUs, and so on.
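
As a rough illustration of the detection metric, the sketch below computes IoU for axis-aligned boxes and averages it over samples. It assumes boxes in `(x1, y1, x2, y2)` format and that mIoU means the mean IoU between each predicted risk-object box and its ground-truth box; the exact protocol used in the paper may differ, and the function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)  # zero when boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_iou(preds, gts):
    """Mean IoU over paired predicted and ground-truth boxes."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and the same length")
    return sum(box_iou(p, g) for p, g in zip(preds, gts)) / len(preds)


# Toy example (made-up boxes): one partial overlap, one exact match.
preds = [(0, 0, 2, 2), (5, 5, 7, 7)]
gts = [(1, 1, 3, 3), (5, 5, 7, 7)]
miou = mean_iou(preds, gts)
```
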
Language Prompt for Autonomous Driving
- Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Xiangyu Zhang, Jianbing Shen
- Publisher: Beijing Institute of Technology, University of Macau, MEGVII Technology, Beijing Academy of Artificial Intelligence
- Task: Tracking
- Code: official
- Datasets: NuPrompt(not open), based on nuScenes.
- Publish Date: 2023.09.08
- Summary:
  - Propose a new large-scale language prompt set(based on nuScenes) for driving scenes, named NuPrompt(3D object-text pairs).
  - Propose an efficient prompt-based tracking model with prompt reasoning modification on PFTrack, called PromptTrack.
MTD-GPT: A Multi-Task Decision-Making GPT Model for Autonomous Driving at Unsignalized Intersections
- Jiaqi Liu, Peng Hang, Xiao Qi, Jianqiang Wang, Jian Sun. ITSC 2023
- Publisher: Tongji University, Tsinghua University
- Task: Prediction
- Env: HighwayEnv
- Publish Date: 2023.07.30
- Summary:
  - Design a pipeline that leverages RL algorithms to train single-task decision-making experts and utilize expert data.
  - Propose the MTD-GPT model for multi-task(left-turn, straight-through, right-turn) decision-making of AV at unsignalized intersections.
Domain Knowledge Distillation from Large Language Model: An Empirical Study in the Autonomous Driving Domain
- Yun Tang, Antonio A. Bruto da Costa, Xizhe Zhang, Irvine Patrick, Siddartha Khastgir, Paul Jennings. ITSC 2023
- Publisher: University of Warwick
- Task: QA
- Publish Date: 2023.07.17
- Summary:
  - Develop a web-based distillation assistant enabling supervision and flexible intervention at runtime by prompt engineering and the LLM ChatGPT.
Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
- Daocheng Fu, Xin Li, Licheng Wen, Min Dou, Pinlong Cai, Botian Shi, Yu Qiao
- Publisher: Shanghai AI Lab, East China Normal University
- Task: Planning
- Code: official
- Env: HighwayEnv
- Publish Date: 2023.07.14
- Summary:
  - Identify three key abilities: Reasoning, Interpretation and Memorization(accumulate experience and self-reflection).
  - Use an LLM as the decision-maker in AD to handle long-tail corner cases and increase interpretability.
  - Verify interpretability in closed-loop offline data.
Language-Guided Traffic Simulation via Scene-Level Diffusion
- Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, Baishakhi Ray
- Publisher: Columbia University, NVIDIA Research, Stanford University, Georgia Tech
- Task: Diffusion
- Publish Date: 2023.07.10
- Summary:
  - Present CTG++, a language-guided scene-level conditional diffusion model for realistic query-compliant traffic simulation.
  - Leverage an LLM for translating a user query into a differentiable loss function and propose a scene-level conditional diffusion model (with a spatial-temporal transformer architecture) to translate the loss function into realistic, query compliant trajectories.
ADAPT: Action-aware Driving Caption Transformer
- Bu Jin, Xinyu Liu, Yupeng Zheng, Pengfei Li, Hao Zhao, Tong Zhang, Yuhang Zheng, Guyue Zhou, Jingjing Liu. ICRA 2023
- Publisher: Chinese Academy of Sciences, Tsinghua University, Peking University, Xidian University, Southern University of Science and Technology, Beihang University
- Code: ADAPT
- Datasets: BDD-X dataset
- Summary:
  - Propose ADAPT, a new end-to-end transformer-based action narration and reasoning framework for self-driving vehicles.
  - Propose a multi-task joint training framework that aligns both the driving action captioning task and the control signal prediction task.

WorkShop

Large Language and Vision Models for Autonomous Driving(LLVM-AD) Workshop @ WACV 2024
- Publisher: Tencent Maps HD Map T.Lab, University of Illinois Urbana-Champaign, Purdue University, University of Virginia
- Challenge 1: MAPLM: A Large-Scale Vision-Language Dataset for Map and Traffic Scene Understanding
  - Datasets: Download
  - Task: QA
  - Code: https://github.com/LLVM-AD/MAPLM
  - Description: MAPLM combines point cloud BEV (Bird's Eye View) and panoramic images to provide a rich collection of road scenario images. It includes multi-level scene description data, which helps models navigate through complex and diverse traffic environments.
  - Metric:
    - Frame-overall-accuracy (FRM): A frame is considered correct if all closed-choice questions about it are answered correctly.
    - Question-overall-accuracy (QNS): A question is considered correct if its answer is correct.
    - LAN: How many lanes in current road?
    - INT: Is there any road cross, intersection or lane change zone in the main road?
    - QLT: What is the point cloud data quality in current road area of this image?
    - SCN: What kind of road scene is it in the images?
- Challenge 2: In-Cabin User Command Understanding (UCU)
  - Datasets: Download
  - Task: QA
  - Code: https://github.com/LLVM-AD/ucu-dataset
  - Description:
    - This dataset focuses on understanding user commands in the context of autonomous vehicles. It contains 1,099 labeled commands. Each command is a sentence that describes a user’s request to the vehicle.
  - Metric:
    - Command-level accuracy: A command is considered correctly understood if all eight answers are correct.
    - Question-level accuracy: Evaluation at the individual question level.
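
Both challenges pair an "all answers correct" score (FRM per frame, command-level accuracy per command) with a per-question score (QNS, question-level accuracy). The sketch below shows that pattern under assumed inputs: the `(group_id, predicted, expected)` record layout and the function name are hypothetical and do not reflect the official evaluation scripts.

```python
from collections import defaultdict


def grouped_accuracy(records):
    """Return (group_accuracy, question_accuracy).

    records: iterable of (group_id, predicted, expected) tuples, where group_id
    identifies a frame (MAPLM) or a command (UCU).
    """
    groups = defaultdict(list)
    for group_id, predicted, expected in records:
        groups[group_id].append(predicted == expected)
    if not groups:
        raise ValueError("records must not be empty")
    n_questions = sum(len(hits) for hits in groups.values())
    # A group counts only if every one of its questions is answered correctly.
    group_acc = sum(all(hits) for hits in groups.values()) / len(groups)
    question_acc = sum(sum(hits) for hits in groups.values()) / n_questions
    return group_acc, question_acc


# Toy example (made-up answers): frame_2 has one wrong answer.
records = [
    ("frame_1", "3", "3"),
    ("frame_1", "yes", "yes"),
    ("frame_2", "2", "2"),
    ("frame_2", "no", "yes"),
]
frm, qns = grouped_accuracy(records)
```

Here one of two frames is fully correct while three of four questions are, which shows why the group-level score is the stricter of the two.
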

Datasets

format:
- [title](dataset link) [links]
  - author1, author2, and author3...
  - keyword
  - experiment environments or tasks

Rank2Tell: A Multimodal Driving Dataset for Joint Importance Ranking and Reasoning
- Enna Sachdeva, Nakul Agarwal, Suhas Chundi, Sean Roelofs, Jiachen Li, Behzad Dariush, Chiho Choi, Mykel Kochenderfer
- Publisher: Honda Research Institute, Stanford University
- Publish Date: 2023.09.10
- Summary:
  - A multi-modal ego-centric dataset for Ranking the importance level and Telling the reason for the importance.
  - Introduce a model for joint importance-level ranking and natural language caption generation to benchmark the dataset.
DriveLM: Drive on Language
- Publisher: DriveLM Contributors
- Dataset: DriveLM
- Publish Date: 2023.08
- Summary:
  - Construct dataset based on the nuScenes dataset.
  - Perception questions require the model to recognize objects in the scene.
  - Prediction questions ask the model to predict the future status of important objects in the scene.
  - Planning questions prompt the model to give reasonable planning actions and avoid dangerous ones.
WEDGE: A multi-weather autonomous driving dataset built from generative vision-language models
- Aboli Marathe, Deva Ramanan, Rahee Walambe, Ketan Kotecha. CVPR 2023
- Publisher: Carnegie Mellon University, Symbiosis International University
- Dataset: WEDGE
- Publish Date: 2023.05.12
- Summary:
  - A multi-weather autonomous driving dataset built from generative vision-language models.
NuScenes-QA: A Multi-modal Visual Question Answering Benchmark for Autonomous Driving Scenario
- Tianwen Qian, Jingjing Chen, Linhai Zhuo, Yang Jiao, Yu-Gang Jiang
- Publisher: Fudan University
- Dataset: NuScenes-QA
- Summary:
  - NuScenes-QA provides 459,941 question-answer pairs based on 34,149 visual scenes: 376,604 questions from 28,130 scenes for training and 83,337 questions from 6,019 scenes for testing.
  - The multi-view images and point clouds are first processed by the feature extraction backbone to obtain BEV features.
DRAMA: Joint Risk Localization and Captioning in Driving
- Srikanth Malla, Chiho Choi, Isht Dwivedi, Joon Hee Choi, Jiachen Li
- Publisher:
- Datasets: DRAMA
- Summary:
  - Introduce a novel dataset DRAMA that provides linguistic descriptions (with the focus on reasons) of driving risks associated with important objects and that can be used to evaluate a range of visual captioning capabilities in driving scenarios.
Language Prompt for Autonomous Driving
- Datasets: NuPrompt(not open)
- Summary: see the entry for this paper earlier in the list.
Driving with LLMs: Fusing Object-Level Vector Modality for Explainable Autonomous Driving
- Datasets: official, data collection using RL experts in simulator.
- Summary: see the entry for this paper earlier in the list.
Textual Explanations for Self-Driving Vehicles
- Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, Zeynep Akata. ECCV 2018
- Publisher: University of California, Berkeley, Saarland Informatics Campus, University of Amsterdam
- Datasets: BDD-X dataset
Grounding Human-To-Vehicle Advice for Self-Driving Vehicles
- Jinkyu Kim, Teruhisa Misu, Yi-Ting Chen, Ashish Tawari, John Canny. CVPR 2019
- Publisher: UC Berkeley, Honda Research Institute USA, Inc.
- Datasets: HAD dataset

License

Awesome LLM for Autonomous Driving Resources is released under the Apache 2.0 license.
