- Medium Blog | Zhihu (in Chinese)
- CVPR 2023 Autonomous Driving Challenge - Occupancy Track
- Point of contact: [email protected]
[2023/08/04] OpenScene v1.0 released
- Highlights
- Task and Evaluation Metric
- Ecosystem and Leaderboard
- TODO
- Getting Started
- License and Citation
- Related Resources
As we quote from OccNet:

> Occupancy serves as a general representation of the scene and could facilitate perception and planning in the full-stack of autonomous driving. 3D occupancy is a geometry-aware representation of the scene. Compared to the formulations of 3D bounding boxes and BEV segmentation, 3D occupancy can capture the fine-grained details of critical obstacles in the driving scene.
Driving behavior on a sunny day does not apply to driving through dancing snowflakes. For machine learning, data is the must-have food. To this end, we build OpenScene on top of nuPlan, covering over 120 hours of occupancy labels collected in various cities, from Boston, Pittsburgh, and Las Vegas to Singapore.
The stats of the dataset are summarized below.
| Dataset | Original Database | Sensor Data (hr) | Flow | Semantic Category |
|---|---|---|---|---|
| MonoScene | NYUv2 / SemanticKITTI | 5 / 6 | ✗ | 10 / 19 |
| Occ3D | nuScenes / Waymo | 5.5 / 5.7 | ✗ | 16 / 14 |
| Occupancy-for-nuScenes | nuScenes | 5.5 | ✗ | 16 |
| SurroundOcc | nuScenes | 5.5 | ✗ | 16 |
| OpenOccupancy | nuScenes | 5.5 | ✗ | 16 |
| SSCBench | KITTI-360 / nuScenes / Waymo | 1.8 / 4.7 / 5.6 | ✗ | 19 / 16 / 14 |
| OccNet | nuScenes | 5.5 | ✗ | 16 |
| OpenScene | nuPlan | 🔥 120 | ✔️ | TODO |
- The time span of LiDAR frames accumulated for each occupancy annotation is 20 seconds.
- Flow: the annotation of motion direction and velocity for each occupancy grid.
- TODO: full semantic labels of the grids will be released in a future version.
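To make the annotation described above concrete, here is a toy sketch of a voxel grid carrying a semantic label and a flow vector per cell. The grid resolution, free-space label, class id, and array layout are illustrative assumptions, not the released file format:

```python
import numpy as np

# Illustrative resolution and free-space label; the released format may differ.
H, W, Z = 200, 200, 16
FREE = 255

semantics = np.full((H, W, Z), FREE, dtype=np.uint8)  # one class id per voxel
flow = np.zeros((H, W, Z, 2), dtype=np.float16)       # (vx, vy) in m/s per voxel

# Mark a toy "car" occupying a block of voxels, moving at 5 m/s along +x.
CAR = 4                                               # hypothetical class id
semantics[100:104, 80:88, 0:3] = CAR
flow[100:104, 80:88, 0:3] = (5.0, 0.0)

occupied = semantics != FREE
print(occupied.sum())  # 4 * 8 * 3 = 96 occupied voxels
```

Storing flow densely alongside semantics like this is what lets a model reason about both geometry and motion from a single grid.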
🔥 OpenScene: Empowering DriveAGI in the Era of Foundation Models
Which formulation is best suited to modeling autonomous driving scenarios? We posit that incorporating the motion information of occupancy flow can help bridge the gap between decision-making and scene representation. Besides, the OpenScene dataset provides a semantic label for each foreground grid, a crucial first step toward achieving DriveAGI.
Disclaimer: The following task (or title) is prone to change as we are shaping the 2024 edition of the Autonomous Driving Challenge.
Given massive amounts of images from multiple cameras in OpenScene, the goal is to predict the current occupancy state and semantics of each voxel grid in the scene. In this task, we use the mean intersection-over-union (mIoU) over all classes to evaluate model performance.
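As a reference for how such a per-voxel mIoU can be computed, here is a minimal sketch; the function name, `ignore_index` convention, and class handling are assumptions for illustration, not the official evaluation code:

```python
import numpy as np

def voxel_miou(pred, gt, num_classes, ignore_index=255):
    """Mean IoU over semantic classes for two voxel-label volumes.

    pred, gt: integer arrays of identical shape, one class id per voxel.
    Voxels labeled ignore_index in gt are excluded; classes absent from
    both prediction and ground truth are skipped in the mean.
    """
    pred, gt = pred.ravel(), gt.ravel()
    keep = gt != ignore_index
    pred, gt = pred[keep], gt[keep]
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```

For example, `voxel_miou(np.array([0, 1, 1, 1]), np.array([0, 0, 1, 1]), 2)` averages an IoU of 1/2 for class 0 and 2/3 for class 1, giving roughly 0.583.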
Here we provide a naive baseline for large-scale occupancy prediction on the OpenScene mini set, trained with 8 Tesla A100 GPUs.
| Backbone | mIoU (%) | IoU@Car (%) | Precision (%) | Recall (%) | Memory (MB/GPU) | Time (hr) |
|---|---|---|---|---|---|---|
| ResNet-50 | 7.5 (not fully trained) | 21.4 | 24.4 | 65.3 | 9260 | 43 |
| VoVNet-99 | 14.4 (not fully trained) | 35.9 | 46.7 | 76.1 | 14537 | 81 |

mIoU, IoU@Car, Precision, and Recall are evaluated on 20% of the OpenScene mini set. Memory and Time are recorded as a reference for resource consumption during training.
In this task, given arbitrary data and architectures, we aim to build a unified backbone (aka a foundation model) that effectively addresses multifaceted downstream tasks. The OpenScene metric (OSM) is adopted to evaluate the effectiveness of such a foundation model in all aspects. To train the large model, you can use OpenScene or any other solution at your discretion.
| Downstream Task | KITTI | nuScenes | Waymo | Scene Diversity | OSM |
|---|---|---|---|---|---|
| 3D Detection | | ✔️ | | downtown, crowded | NDS |
| Semantic Segmentation | | ✔️ | | downtown, crowded | mIoU |
| Scene Completion | | ✔️ | | downtown, crowded | mIoU |
| Map Construction | | ✔️ | | downtown, crowded | mAP |
| Object Tracking | | | ✔️ | suburb, nighttime, rainy | MOTA |
| Depth Estimation | ✔️ | | | countryside, highway | SILog |
| Visual Odometry | ✔️ | | | countryside, highway | Translation |
| Flow Estimation | ✔️ | | | countryside, highway | Fl-all |
| 3D Lane Detection | | | ✔️ | suburb, nighttime, rainy | F1-Score |
- We consolidate the above metrics into OSM by computing a weighted sum.
- The listed datasets and tasks are tentative. Please refer to the AD24 challenge (TBA) for details.
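A consolidation of this kind can be sketched as follows; the weights, the score normalization, and the handling of lower-is-better metrics (e.g. SILog) are placeholders, since the official OSM definition is yet to be announced:

```python
# Placeholder weights; the official OSM weighting is TBA.
WEIGHTS = {"NDS": 1.0, "mIoU": 1.0, "SILog": 1.0}

def osm(scores, weights=WEIGHTS):
    """Weighted sum of per-task scores, normalized by total weight.

    Assumes every score is already mapped to [0, 1] with higher = better;
    error metrics such as SILog would first need to be inverted/normalized.
    """
    total = sum(weights.values())
    return sum(weights[task] * scores[task] for task in weights) / total

print(osm({"NDS": 0.6, "mIoU": 0.3, "SILog": 0.9}))  # ≈ 0.6
```

With uniform weights this reduces to a plain average; the point of the weighted form is that the organizers can emphasize some downstream tasks over others.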
We plan to release a trailer version of the upcoming challenge. Please stay tuned for more details in late August.
- Challenge website: AD24Challenge
- Please submit your great work, as we regularly maintain this leaderboard!
- Challenge website: AD23Challenge
- OpenScene v1.0
- Full-stack annotation update: background label and camera-view mask
- Official Announcement for Autonomous Driving Challenge 2024
Our dataset is based on the nuPlan Dataset; we therefore distribute the data under the Creative Commons Attribution-NonCommercial-ShareAlike license and the nuPlan Dataset License Agreement for Non-Commercial Use. You are free to share and adapt the data, but you must give appropriate credit and may not use the work for commercial purposes. All code within this repository is under the Apache License 2.0.
Please consider citing our paper if the project helps your research, with the following BibTeX:
@misc{openscene2023,
title = {OpenScene: The Largest Up-to-Date 3D Occupancy Prediction Benchmark in Autonomous Driving},
author = {OpenScene Contributors},
howpublished={\url{https://github.com/OpenDriveLab/OpenScene}},
year = {2023}
}
@article{sima2023_occnet,
title={Scene as Occupancy},
author={Chonghao Sima and Wenwen Tong and Tai Wang and Li Chen and Silei Wu and Hanming Deng and Yi Gu and Lewei Lu and Ping Luo and Dahua Lin and Hongyang Li},
year={2023},
eprint={2306.02851},
archivePrefix={arXiv},
primaryClass={cs.CV}
}