Simultaneous Visual Odometry, Object Detection, and Instance Segmentation. SimVODIS extracts both semantic and physical attributes from a sequence of image frames: it estimates the relative pose between frames while detecting objects and segmenting object boundaries. During the process, depth can optionally be estimated.
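SimVODIS predicts the relative pose between adjacent frames; a full trajectory then follows by chaining these 4x4 homogeneous transforms. A minimal NumPy sketch of that chaining (the pose values below are illustrative, not SimVODIS outputs):

```python
import numpy as np

def se3(R, t):
    """Assemble a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def accumulate(relative_poses):
    """Chain frame-to-frame transforms into camera-to-world poses."""
    poses = [np.eye(4)]
    for T_rel in relative_poses:
        poses.append(poses[-1] @ T_rel)
    return poses

# Illustrative: two identical forward steps of 1 m along the camera z-axis
step = se3(np.eye(3), np.array([0.0, 0.0, 1.0]))
trajectory = accumulate([step, step])
print(trajectory[-1][:3, 3])  # → [0. 0. 2.]
```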
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
- Ubuntu 16.04+
- CUDA >= 9.0
- Python 3.6+
- PyTorch 1.0 (nightly release)
- MaskRCNN (included in this project)
- GCC >= 4.9
We tested the code in the following environments: 1) CUDA 9.0 on Ubuntu 16.04 and 2) CUDA 10.1 on Ubuntu 18.04. SimVODIS may work in other environments, but you might need to modify a part of the code. We recommend using Anaconda for environment setup.
conda create --name SimVODIS python=3.6.7
conda activate SimVODIS
conda install ipython
pip install ninja yacs cython matplotlib tqdm opencv-python
# for CUDA 9.0
conda install -c pytorch pytorch-nightly=1.0 torchvision cudatoolkit=9.0
# for CUDA 10.0, use instead:
# conda install -c pytorch pytorch-nightly=1.0 torchvision=0.2.2 cudatoolkit=10.0
# install SimVODIS
git clone https://github.com/Uehwan/SimVODIS.git
cd SimVODIS
# the following will install the lib with symbolic links,
# so that you can modify the files if you want and won't need to re-build it
python setup.py build develop
pip install tensorboardX
conda install -c anaconda path.py scipy=1.2
Download the following pretrained Mask-RCNN model and place it under the root directory.
For more detailed information on the Mask-RCNN models, refer to the Facebook maskrcnn-benchmark repository.
For KITTI, first download the dataset using this script provided on the official website of KITTI. Placing the dataset on an SSD will increase training speed.
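After downloading, it is worth verifying the dataset layout before training. A small sketch, assuming the standard KITTI raw structure (`<root>/<date>/<date>_drive_xxxx_sync/image_02/data/*.png`); the helper name and the demo directory are hypothetical:

```python
import os
import tempfile

def count_kitti_frames(root, cam="image_02"):
    """Count left-color frames per drive, assuming the standard KITTI raw
    layout: <root>/<date>/<date>_drive_xxxx_sync/<cam>/data/*.png"""
    counts = {}
    for date in sorted(os.listdir(root)):
        date_dir = os.path.join(root, date)
        if not os.path.isdir(date_dir):
            continue
        for drive in sorted(os.listdir(date_dir)):
            data_dir = os.path.join(date_dir, drive, cam, "data")
            if os.path.isdir(data_dir):
                counts[drive] = len(
                    [f for f in os.listdir(data_dir) if f.endswith(".png")])
    return counts

# Demo on a throwaway directory mimicking the KITTI layout
root = tempfile.mkdtemp()
data = os.path.join(root, "2011_09_26",
                    "2011_09_26_drive_0001_sync", "image_02", "data")
os.makedirs(data)
for i in range(3):
    open(os.path.join(data, f"{i:010d}.png"), "w").close()
print(count_kitti_frames(root))  # → {'2011_09_26_drive_0001_sync': 3}
```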
You can also download Malaga, ScanNet, NYU Depth, RGB-D SLAM, Make3D and 7 Scenes datasets.
- Malaga: Download from the official web
- ScanNet: Request access from the official repository
- NYU Depth: Download from the official web
- RGB-D SLAM: Download from the official web
- Make3D: Download from the official web
- 7 Scenes: Download from the official web
The following trains the SimVODIS_k model described in the paper.
python train.py \
--data_path PATH/TO/DATASET \
--split eigen_zhou \
--model_name simvodis_k \
--log_dir PATH/TO/LOG/DIR
To use other datasets for training, use the following.
python train.py \
--data_path PATH/TO/DATASET \
--split custom \
--model_name simvodis_a \
--dataset mixed \
--log_dir PATH/TO/LOG/DIR
After starting the training script, you can monitor the training progress with the following:
tensorboard --logdir=PATH/TO/LOG/DIR
First, you need to export the ground-truth depth maps. We follow the approach described in the Monodepth2 repository.
python export_gt_depth.py --data_path PATH/TO/KITTI/DATASET --split eigen
python export_gt_depth.py --data_path PATH/TO/KITTI/DATASET --split eigen_benchmark
The following evaluates the depth map prediction performance of trained models on the KITTI benchmark.
python evaluate_depth.py \
--data_path PATH/TO/DATASET \
--load_weights_folder PATH/TO/MODEL/WEIGHTS \
--post_process --save_pred_disp --eval_mono
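The depth evaluation reports the standard monocular-depth error metrics (abs rel, sq rel, RMSE, and the delta < 1.25 accuracies). A self-contained NumPy sketch of how such metrics are typically computed; the arrays here are toy values, not SimVODIS outputs:

```python
import numpy as np

def depth_metrics(gt, pred):
    """Standard depth metrics: abs_rel, sq_rel, rmse, and delta<1.25 accuracy."""
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()                      # fraction within 25% of gt
    rmse = np.sqrt(((gt - pred) ** 2).mean())        # root mean squared error
    abs_rel = (np.abs(gt - pred) / gt).mean()        # mean absolute relative error
    sq_rel = (((gt - pred) ** 2) / gt).mean()        # mean squared relative error
    return dict(abs_rel=abs_rel, sq_rel=sq_rel, rmse=rmse, a1=a1)

gt = np.array([2.0, 4.0, 8.0])
pred = np.array([2.0, 4.0, 8.0])  # perfect prediction: all errors zero, a1 = 1
m = depth_metrics(gt, pred)
print(m)
```

For monocular evaluation (the --eval_mono flag), predictions are typically median-scaled to the ground truth before computing these metrics, since scale is unobservable from a single camera.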
The following evaluates the pose estimation performance of trained models on the KITTI benchmark.
python evaluate_pose.py \
--eval_split odom_9 \
--data_path PATH/TO/KITTI/ODOM/DATASET \
--load_weights_folder PATH/TO/MODEL/WEIGHTS
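The odom_9 split evaluates on KITTI odometry sequence 09. Because a monocular trajectory is only defined up to scale, the translational error is typically computed after fitting a single scale factor, as in the Monodepth2 evaluation; a sketch under that assumption (the trajectories below are toy values):

```python
import numpy as np

def compute_ate(gt_xyz, pred_xyz):
    """Translational error after optimizing a single scale factor,
    since scale is unobservable for monocular methods."""
    scale = np.sum(gt_xyz * pred_xyz) / np.sum(pred_xyz ** 2)
    err = gt_xyz - scale * pred_xyz
    return np.sqrt((err ** 2).sum(axis=1).mean())

gt = np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 2.0]])
pred = gt * 0.5  # correct shape, wrong scale: error vanishes after alignment
ate = compute_ate(gt, pred)
print(ate)  # → 0.0
```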
The following is the pretrained model.
This project is licensed under the MIT License - see the LICENSE.md file for details.
Please consider citing this project in your publications if you find it helpful. The BibTeX entry is as follows.
@article{kim2019simvodis,
title={SimVODIS: Simultaneous Visual Odometry, Object Detection, and Instance Segmentation},
author={Kim, Ue-Hwan and Kim, Se-Ho and Kim, Jong-Hwan},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence, Under Review},
year={2019}
}
We base our project on the following repositories: