
MagicDrive-t

MagicDrive video generation. We release this version mainly for reference; please be prepared to resolve issues on your own. Before getting started, you should set up and understand the code in the main branch.

Environment Setup

The environment should be compatible with MagicDrive (single frame). However, this codebase relies on a different version of bevfusion (in third_party) and some video-related Python packages.

The code is tested with PyTorch==1.10.2 and torchvision==0.11.3; install these before starting. Then install the additional packages:

cd ${ROOT}
pip install -r requirements.txt
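To double-check the pinned versions before proceeding, you can run:

python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__)"
# should print 1.10.2 and 0.11.3 (possibly with a CUDA suffix)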

We install the following packages from source, with cd ${FOLDER}; pip install -e .

# install third-party
third_party/
├── bevfusion -> based on db75150
├── diffusers -> based on v0.17.1 (afcca3916)
└── xformers -> (optional) slightly modified from 0.0.19 to install with PyTorch 1.10.2
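Concretely, that amounts to the following (the xformers step is optional):

cd ${ROOT}/third_party/bevfusion; pip install -e .
cd ${ROOT}/third_party/diffusers; pip install -e .
cd ${ROOT}/third_party/xformers; pip install -e .  # optional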

If you need our xformers, please find it here. Please read the FAQ if you encounter any issues.

Pretrained Weights

Our training is based on stable-diffusion-v1-5.

We assume you put them at ${ROOT}/../pretrained/ as follows:

${ROOT}/../pretrained/stable-diffusion-v1-5/
├── README.md
├── feature_extractor
├── model_index.json
├── safety_checker
├── scheduler
├── text_encoder
├── tokenizer
├── unet
├── v1-5-pruned-emaonly.ckpt
├── v1-5-pruned.ckpt
├── v1-inference.yaml
└── vae
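If you do not have the weights yet, one way to fetch them is through the Hugging Face git mirror (assuming the runwayml/stable-diffusion-v1-5 repository; adjust the source if you obtain the weights elsewhere):

cd ${ROOT}/../pretrained/
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5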

Pretrained weights of MagicDrive (image generation):

${ROOT}/../MagicDrive-pretrained/
└── SDv1.5mv-rawbox_2023-09-07_18-39_224x400

Datasets

Please prepare the nuScenes dataset following bevfusion's instructions. Note:

  1. Run with our forked version of mmdet3d.
  2. It is better to run the generation steps ONE-BY-ONE to avoid overwriting files.
  3. You have to move nuscenes_dbinfos_train.pkl and nuscenes_gt_database manually from the nuScenes root to the ann_file folder (e.g., nuscenes_mmdet3d), as shown below.
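For step 3, the move looks like this (run from ${ROOT}):

mv ../data/nuscenes/nuscenes_dbinfos_train.pkl ../data/nuscenes_mmdet3d/
mv ../data/nuscenes/nuscenes_gt_database ../data/nuscenes_mmdet3d/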

After preparation, you should have

${ROOT}/../data/
├── nuscenes
│   ├── ...
│   └── sweeps
└── nuscenes_mmdet3d

Generate the ann_files for video frames (with keyframes / sweeps). We use them to train the 7~16-frame video models.

# create `nuscenes_mmdet3d-t-keyframes`
python tools/create_data.py nuscenes \
	--root-path ../data/nuscenes --out-dir ../data/nuscenes_mmdet3d-t-keyframes/ \
	--extra-tag nuscenes --only_info

# create `nuscenes_mmdet3d-t-use-break`
USE_BREAK=True \
python tools/create_data.py nuscenes \
	--root-path ../data/nuscenes --out-dir ../data/nuscenes_mmdet3d-t-use-break/ \
	--extra-tag nuscenes --only_info --with_cam_sweeps

The data structure should look like:

${ROOT}/../data/
├── ...
├── nuscenes_mmdet3d-t-use-break
│   ├── nuscenes_dbinfos_train.pkl -> ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl
│   ├── nuscenes_gt_database -> ../nuscenes_mmdet3d/nuscenes_gt_database/
│   ├── nuscenes_infos_train_t6.pkl
│   └── nuscenes_infos_val_t6.pkl
└── nuscenes_mmdet3d-t-keyframes
    ├── nuscenes_dbinfos_train.pkl -> ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl
    ├── nuscenes_gt_database -> ../nuscenes_mmdet3d/nuscenes_gt_database
    ├── nuscenes_infos_train.pkl
    └── nuscenes_infos_val.pkl
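The arrows above are symlinks to the shared single-frame database. From ${ROOT}, they can be created with, for example:

ln -s ../nuscenes_mmdet3d/nuscenes_dbinfos_train.pkl ../data/nuscenes_mmdet3d-t-keyframes/
ln -s ../nuscenes_mmdet3d/nuscenes_gt_database ../data/nuscenes_mmdet3d-t-keyframes/
# likewise for nuscenes_mmdet3d-t-use-break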

Generate annotations for sweep frames and the ann_files for MagicDrive. We use them to train the 16-frame video models and for video generation with all 13~16-frame models.

  1. Please follow ASAP to generate interp annotations for nuScenes. In short, the following command should do the job:
    # in the ASAP root
    bash scripts/ann_generator.sh 12 --ann_strategy 'interp'
  2. (Optional) Generate advanced annotations for sweeps. (We do not observe a major difference between interp and advanced, so this step can be skipped.)
  3. Use the commands in scripts/prepare_dataset.sh to generate the ann_files and caches.

You should have

${ROOT}/../data/
├── ...
├── nuscenes
│   ├── advanced_12Hz_trainval
│   ├── interp_12Hz_trainval
│   ├── nuscenes_advanced_12Hz_gt_database
│   └── nuscenes_interp_12Hz_gt_database
└── nuscenes_mmdet3d-12Hz
    ├── nuscenes_advanced_12Hz_dbinfos_train.pkl
    ├── nuscenes_advanced_12Hz_infos_train.pkl
    ├── nuscenes_advanced_12Hz_infos_val.pkl
    ├── nuscenes_interp_12Hz_dbinfos_train.pkl
    ├── nuscenes_interp_12Hz_infos_train.pkl
    └── nuscenes_interp_12Hz_infos_val.pkl

(Optional but recommended) To accelerate data loading, we prepare cache files in HDF5 (.h5) format for the BEV maps. They can be generated through tools/prepare_map_aux.py with the config in configs/exp/map_cache_gen.yaml; you have to rename the cache files to the names below after generating them (see the sketch after the tree).

${ROOT}/../data/
├── ...
├── nuscenes_map_aux  # single frame cache, keyframes also use this.
│   ├── train_26x200x200_map_aux_full.h5
│   ├── train_26x400x400_map_aux_full.h5
│   ├── val_26x200x200_map_aux_full.h5
│   └── val_26x400x400_map_aux_full.h5
├── nuscenes_map_aux_12Hz_adv  # from advanced
│   ├── train_26x200x200_12Hz_advanced.h5
│   └── val_26x200x200_12Hz_advanced.h5
├── nuscenes_map_aux_12Hz_int  # from interp
│   ├── train_26x200x200_12Hz_interp.h5
│   └── val_26x200x200_12Hz_interp.h5
└── nuscenes_map_cache_t-use-break  # with sweeps, use break
    ├── train_8x200x200_map_use-break.h5
    └── val_8x200x200_map_use-break.h5
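A plausible invocation follows, assuming tools/prepare_map_aux.py accepts hydra-style overrides like the training entry points; the output file name is hypothetical, so check the script for its actual interface and rename whatever it produces to match the tree above:

python tools/prepare_map_aux.py +exp=map_cache_gen
# rename the output (name here is hypothetical) to the expected layout, e.g.:
mv ./map_cache_train.h5 ../data/nuscenes_map_aux/train_26x200x200_map_aux_full.h5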

Train MagicDrive-t

Run training for 224x400 with 7 frames.

scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.3.3

Run training for 224x400 with 16 frames.

scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.3.4

Run training for 224x400 with 16 frames, using sweeps and the generated annotations.

scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.3
# or
scripts/dist_train.sh 8 runner=8gpus_t +exp=rawbox_mv2.0t_0.4.4

Typically, training for ~80,000 steps is enough.

Video Generation

Our default log directory is ${ROOT}/magicdrive-t-log. Please make sure this location is available.

Run video generation with 12Hz annotations.

python tools/test.py resume_from_checkpoint=${RUN_LOG_DIR} task_id=${ANY} \
	runner.validation_times=4 runner.pipeline_param.init_noise=rand_all \
	++dataset.data.val.ann_file=${ROOT}/../data/nuscenes_mmdet3d-12Hz/nuscenes_interp_12Hz_infos_val.pkl
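Here ${RUN_LOG_DIR} should point to the log directory of a trained run (under ${ROOT}/magicdrive-t-log), and task_id is an arbitrary tag used to name this generation run.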

Cite Us

@inproceedings{gao2023magicdrive,
  title={{MagicDrive}: Street View Generation with Diverse 3D Geometry Control},
  author={Gao, Ruiyuan and Chen, Kai and Xie, Enze and Hong, Lanqing and Li, Zhenguo and Yeung, Dit-Yan and Xu, Qiang},
  booktitle = {International Conference on Learning Representations},
  year={2024}
}

Credit

We adopt the following open-sourced projects: bevfusion, diffusers, xformers, mmdet3d, and ASAP.
