
OpenScene

The Largest 3D Occupancy Prediction Benchmark in Autonomous Driving

OpenScene: v1.0 License: Apache2.0

Grad-and-Go

  • [2023/08/04] OpenScene v1.0 released


Highlights

🚘 Representing 3D Scene as Occupancy

As we quote from OccNet:

Occupancy serves as a general representation of the scene and could facilitate perception and planning in the full-stack of autonomous driving. 3D Occupancy is a geometry-aware representation of the scene.

Compared to the formulation of 3D bounding box and BEV segmentation, 3D occupancy could capture the fine-grained details of critical obstacles in the driving scene.
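To make the representation concrete, here is a minimal sketch of turning a point cloud into a set of occupied voxels. The grid origin, voxel size, and grid shape are hypothetical values chosen for illustration; they are not the OpenScene settings, and the function name `voxelize` is our own.

```python
import math
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]
Point = Tuple[float, float, float]


def voxelize(
    points: Iterable[Point],
    origin: Point,
    voxel_size: float,
    grid_shape: Tuple[int, int, int],
) -> Set[Voxel]:
    """Return the indices of all voxels that contain at least one point."""
    occupied: Set[Voxel] = set()
    for x, y, z in points:
        idx = (
            math.floor((x - origin[0]) / voxel_size),
            math.floor((y - origin[1]) / voxel_size),
            math.floor((z - origin[2]) / voxel_size),
        )
        # Points that fall outside the grid are discarded.
        if all(0 <= i < n for i, n in zip(idx, grid_shape)):
            occupied.add(idx)
    return occupied


# Hypothetical toy input: two points share a voxel, one is out of range.
points = [(0.2, 0.3, 0.1), (0.4, 0.1, 0.2), (1.7, 0.2, 0.9), (9.0, 9.0, 9.0)]
occupied = voxelize(points, origin=(0.0, 0.0, 0.0), voxel_size=0.5, grid_shape=(4, 4, 2))
print(sorted(occupied))  # [(0, 0, 0), (3, 0, 1)]
```

Unlike a bounding box, the occupied set follows the shape of whatever the points describe, which is what lets occupancy capture irregular obstacles.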

🔥 OpenScene: The Largest Benchmark for 3D Occupancy Prediction

Driving behavior on a sunny day does not carry over to driving in falling snow. For machine learning, data is essential. To that end, we build OpenScene on top of nuPlan, covering over 120 hours of occupancy labels collected in several cities: Boston, Pittsburgh, Las Vegas, and Singapore. The dataset statistics are summarized below.

| Dataset | Original Database | Sensor Data (hr) | Flow | Semantic Category |
|---|---|---|---|---|
| MonoScene | NYUv2 / SemanticKITTI | 5 / 6 | ❌ | 10 / 19 |
| Occ3D | nuScenes / Waymo | 5.5 / 5.7 | ❌ | 16 / 14 |
| Occupancy-for-nuScenes | nuScenes | 5.5 | ❌ | 16 |
| SurroundOcc | nuScenes | 5.5 | ❌ | 16 |
| OpenOccupancy | nuScenes | 5.5 | ❌ | 16 |
| SSCBench | KITTI-360 / nuScenes / Waymo | 1.8 / 4.7 / 5.6 | ❌ | 19 / 16 / 14 |
| OccNet | nuScenes | 5.5 | ❌ | 16 |
| OpenScene | nuPlan | 💥 120 | ✔️ | TODO |

  • The time span of LiDAR frames accumulated for each occupancy annotation is 20 seconds.
  • Flow: the annotation of motion direction and velocity for each occupancy grid.
  • TODO: Full semantic labels of grids will be released in a future version.
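
As an illustration of what a per-grid flow annotation conveys, the sketch below converts a planar flow vector into speed and motion direction. The storage layout (a `(vx, vy)` pair in m/s keyed by voxel index) is an assumption made for this example and is not the documented OpenScene format.

```python
import math
from typing import Dict, Tuple

Voxel = Tuple[int, int, int]
Flow = Tuple[float, float]  # hypothetical layout: (vx, vy) in m/s


def speed_and_heading(flow: Flow) -> Tuple[float, float]:
    """Return (speed in m/s, heading in degrees from the +x axis)."""
    vx, vy = flow
    speed = math.hypot(vx, vy)
    heading_deg = math.degrees(math.atan2(vy, vx))
    return speed, heading_deg


# Hypothetical annotations: one moving voxel and one static voxel.
flow_by_voxel: Dict[Voxel, Flow] = {
    (10, 4, 1): (3.0, 4.0),
    (2, 7, 0): (0.0, 0.0),
}

for voxel, flow in flow_by_voxel.items():
    speed, heading = speed_and_heading(flow)
    print(f"voxel {voxel}: speed={speed:.2f} m/s, heading={heading:.1f} deg")
```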

🔥 OpenScene: Empowering DriveAGI in the era of Foundation Model

Which formulation is good for modeling the autonomous driving scenarios?

We posit that incorporating the motion information of occupancy flow can help bridge the gap between decision-making and scene representation. Besides, the OpenScene dataset provides a semantic label for each foreground grid, serving as a crucial initial step toward achieving DriveAGI.


Task and Evaluation Metric

Disclaimer: The following task (or title) is prone to change as we are shaping the 2024 edition of the Autonomous Driving Challenge.

Large-Scale Occupancy Prediction

Given large-scale multi-camera images from OpenScene, the goal is to predict the current occupancy state and semantics of each voxel in the scene. In this task, we use the mean intersection-over-union (mIoU) over all classes to evaluate model performance.
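
For reference, a minimal sketch of how mIoU can be computed from flattened per-voxel class labels. This is a generic formulation, not the official evaluation code: whether the benchmark skips classes absent from both prediction and ground truth, or excludes a free-space class, is an assumption here.

```python
from typing import List, Optional, Sequence


def per_class_iou(
    pred: Sequence[int], target: Sequence[int], num_classes: int
) -> List[Optional[float]]:
    """IoU per class; None for classes absent from both pred and target."""
    intersection = [0] * num_classes
    union = [0] * num_classes
    for p, t in zip(pred, target):
        if p == t:
            intersection[p] += 1
            union[p] += 1
        else:
            # A mismatch enlarges the union of both classes involved.
            union[p] += 1
            union[t] += 1
    return [
        intersection[c] / union[c] if union[c] else None
        for c in range(num_classes)
    ]


def mean_iou(ious: Sequence[Optional[float]]) -> float:
    """Average IoU over the classes that actually occur (assumption)."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else 0.0


# Toy example with three classes over six voxels.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(pred, target, num_classes=3)
print(f"mIoU = {mean_iou(ious):.3f}")  # mIoU = 0.500
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so their mean is 0.5.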

Here we provide a naive baseline for Large-Scale Occupancy Prediction on the OpenScene mini set, trained with 8 NVIDIA A100 GPUs.

| Backbone | mIoU | IoU@Car | Precision | Recall | Memory | Time |
|---|---|---|---|---|---|---|
| ResNet-50 | 7.5 (not fully trained) | 21.4 | 24.4 | 65.3 | 9260 | 43 |
| VoVNet-99 | 14.4 (not fully trained) | 35.9 | 46.7 | 76.1 | 14537 | 81 |

  • mIoU (%), IoU@Car (%), Precision (%), and Recall (%) are evaluated on 20% of the OpenScene mini set.
  • Memory (MB/GPU) and Time (hr) are recorded as a reference for resource consumption during training.

Foundation Model Challenge

In this task, given arbitrary data and architecture, we aim for a unified backbone (a.k.a. foundation model) that effectively addresses multifaceted downstream tasks. The OpenScene metric (OSM) is adopted to evaluate the effectiveness of such a foundation model in all aspects. To train the large model, you may use OpenScene or any other data and method at your discretion.

| Downstream Task | KITTI | nuScenes | Waymo | Scene Diversity | OSM |
|---|---|---|---|---|---|
| 3D Detection | | ✔️ | | downtown crowded | NDS |
| Semantic Segmentation | | ✔️ | | downtown crowded | mIoU |
| Scene Completion | | ✔️ | | downtown crowded | mIoU |
| Map Construction | | ✔️ | | downtown crowded | mAP |
| Object Tracking | | | ✔️ | suburb nighttime rainy | MOTA |
| Depth Estimation | ✔️ | | | countryside highway | SILog |
| Visual Odometry | ✔️ | | | countryside highway | Translation |
| Flow Estimation | ✔️ | | | countryside highway | Fl-all |
| 3D Lane Detection | | | ✔️ | suburb nighttime rainy | F1-Score |

  • We consolidate the above metrics to OSM by computing a weighted sum.
  • The listed datasets and tasks are tentative. Please refer to the AD24 challenge (TBA) for details.
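
The weighted-sum consolidation can be sketched as follows. The task names, weights, and scores are all hypothetical, since the actual OSM weights have not been announced. The sketch also assumes every per-task score has already been normalized to [0, 1] with higher being better; error-style metrics such as SILog or Fl-all would need to be inverted first.

```python
from typing import Dict


def weighted_sum(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-task scores into one number with a weighted sum."""
    missing = set(weights) - set(scores)
    if missing:
        raise KeyError(f"missing scores for tasks: {sorted(missing)}")
    return sum(weights[task] * scores[task] for task in weights)


# Hypothetical normalized scores and weights (not the official OSM values).
scores = {"detection": 0.6, "segmentation": 0.4, "tracking": 0.5}
weights = {"detection": 0.5, "segmentation": 0.25, "tracking": 0.25}

print(f"combined score = {weighted_sum(scores, weights):.3f}")  # 0.525
```

Choosing weights that sum to 1 keeps the combined score on the same [0, 1] scale as the inputs.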


Ecosystem and Leaderboard

Upcoming Challenge in 2024

We plan to release a trailer version of the upcoming challenge. Please stay tuned for more details in late August.

CVPR 2023 3D Occupancy Prediction Challenge (Server Remains Active)

  • Please submit your work; we will maintain this leaderboard regularly.
  • Challenge website: AD23Challenge

Leaderboard


TODO

  • OpenScene v1.0
  • Full-stack annotation update: background label and camera-view mask
  • Official Announcement for Autonomous Driving Challenge 2024


Getting Started


License and Citation

Our dataset is based on the nuPlan Dataset, and therefore we distribute the data under the Creative Commons Attribution-NonCommercial-ShareAlike license and the nuPlan Dataset License Agreement for Non-Commercial Use. You are free to share and adapt the data, but you must give appropriate credit and may not use the work for commercial purposes. All code within this repository is under the Apache License 2.0.

If the project helps your research, please consider citing our paper with the following BibTeX:

@misc{openscene2023,
      title = {OpenScene: The Largest Up-to-Date 3D Occupancy Prediction Benchmark in Autonomous Driving},
      author = {OpenScene Contributors},
      howpublished={\url{https://github.com/OpenDriveLab/OpenScene}},
      year = {2023}
}

@article{sima2023_occnet,
      title={Scene as Occupancy}, 
      author={Chonghao Sima and Wenwen Tong and Tai Wang and Li Chen and Silei Wu and Hanming Deng  and Yi Gu and Lewei Lu and Ping Luo and Dahua Lin and Hongyang Li},
      year={2023},
      eprint={2306.02851},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


Related Resources

Awesome


