OccWorld

OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving

Wenzhao Zheng*, Weiliang Chen*, Yuanhui Huang, Borui Zhang, Yueqi Duan, Jiwen Lu

* Equal contribution

News

  • [2024/3/13] We release the code for visualization, our training log, and our pretrained model.
  • [2023/12/7] We update the code and config files for OccWorld.

OccWorld models the joint evolutions of 3D scenes and ego movements.

Combined with self-supervised (SelfOcc), LiDAR-collected (TPVFormer), or machine-annotated (SurroundOcc) 3D occupancy, OccWorld has the potential to scale to large-scale training, paving the way for interpretable end-to-end large driving models.

Overview

Given past 3D occupancy observations, our self-supervised OccWorld can jointly forecast future scene evolutions and ego movements. This task requires a spatial understanding of the 3D scene and temporal modeling of how driving scenarios develop. We observe that OccWorld successfully forecasts the movements of surrounding agents and future map elements such as drivable areas. OccWorld even generates more reasonable drivable areas than the ground truth, demonstrating its ability to understand the scene rather than memorize training data. Still, it fails to forecast new vehicles entering the field of view, which is difficult given their absence from the inputs.
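To make the forecasting setup concrete, here is a minimal, hypothetical sketch of an autoregressive rollout. The `predict_next` callable and the constant-velocity `toy_predictor` are stand-ins invented for illustration, not OccWorld's model or API; the only idea carried over from the description above is that each predicted frame and ego position is fed back as input for the next step. The 0.5 step interval is an assumption chosen to match the timestamps listed under Visualization.

```python
from typing import Callable, List, Tuple

import numpy as np

# Hypothetical types: an occupancy frame is an (X, Y, Z) grid of class ids and
# an ego pose is an (x, y) position in meters.
Frame = np.ndarray
Pose = np.ndarray
Predictor = Callable[[List[Frame], List[Pose]], Tuple[Frame, Pose]]


def rollout(past_frames: List[Frame], past_poses: List[Pose],
            predict_next: Predictor, num_steps: int,
            step: float = 0.5) -> List[Tuple[float, Frame, Pose]]:
    """Autoregressively forecast `num_steps` future (time, frame, pose) triples.

    Each prediction is appended to the history, so later steps condition on
    earlier predictions instead of on ground truth.
    """
    frames, poses = list(past_frames), list(past_poses)
    outputs = []
    for k in range(1, num_steps + 1):
        frame, pose = predict_next(frames, poses)
        frames.append(frame)
        poses.append(pose)
        outputs.append((k * step, frame, pose))
    return outputs


def toy_predictor(frames: List[Frame], poses: List[Pose]) -> Tuple[Frame, Pose]:
    """Toy stand-in: keep the last scene and extrapolate ego motion linearly."""
    velocity = poses[-1] - poses[-2] if len(poses) > 1 else np.zeros_like(poses[-1])
    return frames[-1].copy(), poses[-1] + velocity


if __name__ == "__main__":
    grid = np.zeros((4, 4, 2), dtype=np.int64)  # tiny placeholder occupancy grid
    history = [grid, grid]
    ego = [np.array([0.0, 0.0]), np.array([1.0, 0.0])]
    for t, _, pose in rollout(history, ego, toy_predictor, num_steps=6):
        print(f"t={t:.1f} ego={pose.tolist()}")
```

Because predictions are fed back into the history, errors can compound over the horizon, which is one reason long-range forecasts are harder than one-step ones.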

Installation

  1. Create a conda environment with Python 3.8.0.

  2. Install all the packages listed in environment.yaml.

  3. For anything related to installing mmdetection3d, please refer to mmdetection3d.
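After installation, a quick sanity check can confirm the interpreter version and that the core packages are importable. This is a sketch under assumptions: the module list (`torch`, `mmcv`, `mmdet`, `mmdet3d`) is a guess at the main dependencies, with `mmdet3d` being the import name of mmdetection3d; environment.yaml remains the authoritative package list.

```python
import importlib.util
import sys
from typing import Iterable, List, Tuple

# Assumption: a guess at the key top-level modules; environment.yaml is authoritative.
REQUIRED_MODULES = ("torch", "mmcv", "mmdet", "mmdet3d")
EXPECTED_PYTHON = (3, 8)


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(expected: Tuple[int, int] = EXPECTED_PYTHON) -> bool:
    """True if the running interpreter's (major, minor) version equals `expected`."""
    return tuple(sys.version_info[:2]) == expected


if __name__ == "__main__":
    status = "OK" if python_matches() else "(expected 3.8)"
    print("Python", sys.version.split()[0], status)
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates modules without importing them, so the check is fast and does not trigger CUDA initialization.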

Preparing

  1. Create a soft link from data/nuscenes to your_nuscenes_path.

  2. Prepare the ground-truth semantic occupancy (gts) introduced in Occ3d.

  3. Download our generated train/val pickle files and put them in data/:

    nuscenes_infos_train_temporal_v3_scene.pkl

    nuscenes_infos_val_temporal_v3_scene.pkl
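Step 1 amounts to creating a directory symlink. A minimal Python equivalent of `ln -s your_nuscenes_path data/nuscenes`, run from the repository root, might look like the following; `link_nuscenes` and `demo` are hypothetical helper names, and the demo only exercises the function on a throwaway temporary directory.

```python
import os
import tempfile
from pathlib import Path


def link_nuscenes(nuscenes_root: str, repo_root: str = ".") -> Path:
    """Create <repo_root>/data/nuscenes as a symlink pointing at nuscenes_root."""
    target = Path(nuscenes_root).expanduser().resolve()
    if not target.is_dir():
        raise FileNotFoundError(f"nuScenes directory not found: {target}")
    link = Path(repo_root) / "data" / "nuscenes"
    link.parent.mkdir(parents=True, exist_ok=True)  # create data/ if needed
    if link.exists() or link.is_symlink():
        raise FileExistsError(f"{link} already exists; remove it first")
    os.symlink(target, link, target_is_directory=True)
    return link


def demo() -> bool:
    """Link a throwaway 'nuScenes' folder into a throwaway repo and verify it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "nuscenes_src"
        src.mkdir()
        link = link_nuscenes(str(src), repo_root=str(Path(tmp) / "OccWorld"))
        return link.is_symlink() and link.resolve() == src.resolve()


if __name__ == "__main__":
    print("symlink demo ok:", demo())
```

From the repository root, calling `link_nuscenes("your_nuscenes_path")` with your real nuScenes path would create data/nuscenes. The helper refuses to overwrite an existing entry rather than silently replacing it.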

The dataset should be organized as follows:

OccWorld/data
    nuscenes                 -    downloaded from www.nuscenes.org
        lidarseg
        maps
        samples
        sweeps
        v1.0-trainval
        gts                  -    downloaded from Occ3d
    nuscenes_infos_train_temporal_v3_scene.pkl
    nuscenes_infos_val_temporal_v3_scene.pkl
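Before training, it can help to verify this layout programmatically. The sketch below checks only the directories and pickle files listed above; `check_data_layout` is a hypothetical helper, and it does not validate file contents. Because `Path.is_dir` follows symlinks, it also works when data/nuscenes is the soft link from step 1.

```python
from pathlib import Path
from typing import List

# Expected entries relative to OccWorld/data, copied from the layout above.
EXPECTED_DIRS = (
    "nuscenes/lidarseg", "nuscenes/maps", "nuscenes/samples",
    "nuscenes/sweeps", "nuscenes/v1.0-trainval", "nuscenes/gts",
)
EXPECTED_FILES = (
    "nuscenes_infos_train_temporal_v3_scene.pkl",
    "nuscenes_infos_val_temporal_v3_scene.pkl",
)


def check_data_layout(data_root: str = "data") -> List[str]:
    """Return the expected directories and files that are missing under data_root."""
    root = Path(data_root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    problems = check_data_layout("data")
    print("Layout OK" if not problems else "Missing: " + ", ".join(problems))
```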
Our pretrained model is available at https://cloud.tsinghua.edu.cn/d/ff4612b2453841fba7a5/

Getting Started

Training

Train the VQVAE on an RTX 4090 with 24 GB of GPU memory.

python train.py --py-config config/train_vqvae.py --work-dir out/vqvae

Train OccWorld on an RTX 4090 with 24 GB of GPU memory. (Remember to set the VQVAE checkpoint path in the config file.)

python train.py --py-config config/train_occworld.py --work-dir out/occworld
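Since the OccWorld config must point at a trained VQVAE checkpoint, a small helper can locate the most recent one. This hypothetical sketch assumes checkpoints are saved as `.pth` files directly under the work directory (e.g. out/vqvae); it only returns a path and does not edit the config, whose checkpoint field name is not specified here.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


def latest_checkpoint(work_dir: str, pattern: str = "*.pth") -> Optional[Path]:
    """Return the most recently modified file in work_dir matching pattern, or None.

    Assumption: checkpoints are *.pth files directly inside work_dir.
    """
    root = Path(work_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.glob(pattern) if p.is_file()]
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


def demo() -> str:
    """Create two fake checkpoints with fixed mtimes and return the newer file name."""
    with tempfile.TemporaryDirectory() as tmp:
        for name, mtime in (("epoch_1.pth", 1_000), ("epoch_2.pth", 2_000)):
            path = Path(tmp) / name
            path.write_bytes(b"")
            os.utime(path, (mtime, mtime))  # set (atime, mtime) explicitly
        return latest_checkpoint(tmp).name


if __name__ == "__main__":
    ckpt = latest_checkpoint("out/vqvae")
    print(ckpt if ckpt else "no VQVAE checkpoint found in out/vqvae")
```

Selecting by modification time rather than by file name avoids depending on any particular checkpoint naming scheme.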

Evaluation

Evaluate the model on an RTX 4090 with 24 GB of GPU memory. (Remember to set the OccWorld checkpoint path in the config file.)

python eval_metric_stp3.py --py-config config/occworld.py --work-dir out/occworld
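For reference, occupancy forecasts are commonly scored with voxel IoU (occupied vs. empty) and semantic mIoU. The sketch below is a generic NumPy implementation for a single pair of grids, not a reproduction of eval_metric_stp3.py; the free-space label `empty_class=0` and `ignore_index=255` are illustrative assumptions, and it averages only over classes that appear in the prediction or the ground truth.

```python
from typing import Tuple

import numpy as np


def occupancy_iou_miou(pred: np.ndarray, gt: np.ndarray, num_classes: int,
                       empty_class: int = 0,
                       ignore_index: int = 255) -> Tuple[float, float]:
    """Compute binary occupancy IoU and semantic mIoU for one predicted grid.

    Assumptions: `empty_class` marks free space and `ignore_index` marks voxels
    excluded from evaluation; both defaults are illustrative.
    """
    valid = gt != ignore_index
    pred, gt = pred[valid], gt[valid]  # flatten to evaluated voxels only

    # Geometry: occupied vs. empty, regardless of semantic class.
    pred_occ, gt_occ = pred != empty_class, gt != empty_class
    union = np.logical_or(pred_occ, gt_occ).sum()
    iou = np.logical_and(pred_occ, gt_occ).sum() / union if union else float("nan")

    # Semantics: per-class IoU, averaged over non-empty classes that appear.
    class_ious = []
    for c in range(num_classes):
        if c == empty_class:
            continue
        p, g = pred == c, gt == c
        u = np.logical_or(p, g).sum()
        if u:
            class_ious.append(np.logical_and(p, g).sum() / u)
    miou = float(np.mean(class_ious)) if class_ious else float("nan")
    return float(iou), miou
```

In practice, dataset-level mIoU is usually computed by accumulating per-class intersections and unions over all samples before dividing, rather than averaging per-sample scores.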

Visualization

Visualize the results with the following command.

python visualize_demo.py --py-config config/train_occworld.py --work-dir out/occworld

You can also specify which scenes to visualize by adding --scene-idx i0 i1 i2 ... to the command. After running the command above, two folders are created for a scene index i: "i_input" and "i". The "i" folder contains the autoregressively predicted results for t = 0.5, 1, 1.5, 2, 2.5, and 3, and the "i_input" folder contains the visualized ground truth for t = 0, 0.5, 1, 1.5, 2, 2.5, and 3.
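The command and the resulting folder names can be scripted. In this hypothetical sketch, `visualize_cmd` builds the visualization command line for a list of scene indices (passed as integers, which is an assumption), and `expected_outputs` maps each folder name to the timestamps described in the previous paragraph, assuming a 0.5 step and a horizon of 3; it does not inspect where visualize_demo.py actually writes the folders.

```python
import sys
from typing import Dict, List, Sequence


def visualize_cmd(scene_ids: Sequence[int],
                  py_config: str = "config/train_occworld.py",
                  work_dir: str = "out/occworld") -> List[str]:
    """Build the visualize_demo.py command line for the given scene indices."""
    return [sys.executable, "visualize_demo.py", "--py-config", py_config,
            "--work-dir", work_dir, "--scene-idx", *map(str, scene_ids)]


def expected_outputs(scene_ids: Sequence[int], horizon: float = 3.0,
                     step: float = 0.5) -> Dict[str, List[float]]:
    """Map each output folder name to the timestamps it should contain."""
    n = int(round(horizon / step))
    times = [k * step for k in range(n + 1)]  # 0.0, 0.5, ..., 3.0
    out: Dict[str, List[float]] = {}
    for i in scene_ids:
        out[f"{i}_input"] = list(times)  # ground truth, t = 0 ... 3
        out[f"{i}"] = times[1:]          # predictions, t = 0.5 ... 3
    return out
```

For example, `subprocess.run(visualize_cmd([0, 5]), check=True)` would launch the visualization for scenes 0 and 5 from the repository root.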

Related Projects

Our code is based on TPVFormer, SelfOcc, and PointOcc.

We also thank these excellent open-source repos: SurroundOcc, OccFormer, and BEVFormer.

Citation

If you find this project helpful, please consider citing the following paper:

@article{zheng2023occworld,
    title={OccWorld: Learning a 3D Occupancy World Model for Autonomous Driving},
    author={Zheng, Wenzhao and Chen, Weiliang and Huang, Yuanhui and Zhang, Borui and Duan, Yueqi and Lu, Jiwen},
    journal={arXiv preprint arXiv:2311.16038},
    year={2023}
}
