

DiverseDepth Project

This project aims to improve the generalization ability of monocular depth estimation methods on diverse scenes. To address this problem, we propose a learning method and a diverse dataset, termed DiverseDepth. The DiverseDepth content has been published as part of the TPAMI version of our "Virtual Normal" paper.

This repository contains the source code for the DiverseDepth part of our papers:

  1. Wei Yin, Yifan Liu, Chunhua Shen. Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction. IEEE TPAMI, 2021.
  2. Wei Yin, Xinlong Wang, Chunhua Shen, Yifan Liu, Zhi Tian, Songcen Xu, Changming Sun, Dou Renyin. DiverseDepth: Affine-invariant Depth Prediction Using Diverse Data.

The training code has been released!

Some Results

(Figure: depth predictions on arbitrary online images, with the corresponding reconstructed point clouds.)

Some Dataset Examples



Highlights

  • Generalization: Our method generalizes well to several datasets in a zero-shot setting. The predicted depth is affine-invariant.
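Because the predicted depth is affine-invariant, it agrees with ground-truth depth only up to an unknown scale and shift. A common way to compare such predictions with metric ground truth is to fit that scale and shift by least squares over the valid pixels. The following is a minimal NumPy sketch of that idea; the function name `align_scale_shift` and the convention that zero marks missing ground truth are assumptions for illustration, not this repository's evaluation code.

```python
import numpy as np


def align_scale_shift(pred, gt, mask=None):
    """Fit gt ~= s * pred + t by least squares and return (aligned_pred, s, t).

    Hypothetical helper. pred and gt are arrays of the same shape (e.g. H x W).
    mask optionally selects valid ground-truth pixels.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if mask is None:
        mask = gt > 0  # assumption: zero marks missing ground truth
    p = pred[mask]
    g = gt[mask]
    # Design matrix [p, 1] so that A @ [s, t] approximates g.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    # Apply the fitted scale and shift to every pixel, valid or not.
    return s * pred + t, float(s), float(t)
```

For example, if `gt` equals `2 * pred + 0.5` on all valid pixels, the fit recovers `s = 2` and `t = 0.5`, and the aligned prediction matches `gt` there.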

Installation

Datasets

We collect multi-source data to construct our DiverseDepth dataset. It consists of three parts:

  • Part-in (collected from Taskonomy): over 100K images.
  • Part-out (collected from DIML): over 120K images. We recomputed the DIML disparity with the GANet method instead of using the originally provided disparity maps.
  • Part-fore (collected from web stereo images and videos): 109,703 images.

We provide two ways to download the data.

  1. Download from Cloudstor with the following command:
    sh download_data.sh
  2. Download from Google Drive. See here for details.

Quick Start (Inference)

  1. Download the model weights

  2. Prepare data.

    • Move the downloaded weights to <project_dir>/
    • Put the test RGB images in <project_dir>/Minist_Test/test_images/. The predicted depths and reconstructed point clouds are saved under <project_dir>/Minist_Test/test_images/outputs.
  3. Test monocular depth prediction. Note that the predicted depths are affine-invariant.

export PYTHONPATH="<PATH to DiverseDepth>"
# run the ResNet-50
python ./Minist_Test/tools/test_depth.py --load_ckpt model.pth
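The test script saves reconstructed point clouds alongside the predicted depths. A standard way to lift a depth map to 3D is pinhole back-projection, sketched below with NumPy. The helper `depth_to_point_cloud` is hypothetical, and the intrinsics `fx`, `fy`, `cx`, `cy` are assumed inputs; `test_depth.py` may choose or estimate these values differently. Since the predictions are affine-invariant, an incorrect shift or focal length can distort the reconstructed shape.

```python
import numpy as np


def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project an H x W depth map into an (N, 3) array of camera-space points.

    Hypothetical helper using the pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth[v, u]
    Pixels with non-positive depth are dropped.
    """
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # u holds column indices and v holds row indices, both of shape (h, w).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # Flatten in row-major order: one (x, y, z) row per pixel.
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]
```

The resulting array can be written to a PLY file or passed to any point-cloud viewer.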
 

Training

  1. Download the ResNeXt pretrained weights and put them under Train/datasets/resnext_pretrain.
  2. Download the training data (refer to download_data.sh). All data are organized under Train/datasets with the following structure:
|--Train
|   |--data
|   |--tools
|   |--scripts
|   |--datasets
|   |   |--DiverseDepth
|   |   |   |--annotations
|   |   |   |--depths
|   |   |   |--rgbs
|   |   |--taskonomy
|   |   |   |--annotations
|   |   |   |--depths
|   |   |   |--rgbs
|   |   |   |--ins_planes
|   |   |--DIML_GANet
|   |   |   |--annotations
|   |   |   |--depth
|   |   |   |--rgb
|   |   |   |--sky_mask
|   |   |--resnext_pretrain
|   |   |   |--resnext50_32x4d.pth
  3. Train the network. The default setting uses 4 GPUs. To use more GPUs, set $CUDA_VISIBLE_DEVICES accordingly, e.g. export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7. The --batchsize argument is the number of samples per GPU, so the total batch size is --batchsize multiplied by the number of GPUs.

    cd Train/scripts
    sh train.sh

  4. Test the network on a benchmark. We provide sample code for testing on NYU. Please download the NYU test data (test.mat) for evaluation. To test on other benchmarks, you can adapt the sample code.

    cd Train/scripts
    sh test.sh
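When adapting the sample code to other benchmarks, two metrics commonly reported for monocular depth are the mean absolute relative error (AbsRel) and the fraction of pixels with max(pred/gt, gt/pred) < 1.25 (δ1). The sketch below is a hypothetical helper that assumes NumPy, predictions already aligned to the ground truth (for instance by a least-squares scale-and-shift fit, since raw outputs are affine-invariant), and zero marking invalid ground truth. The metrics reported by `test.sh` may differ.

```python
import numpy as np


def depth_metrics(pred, gt, mask=None):
    """Compute AbsRel and delta < 1.25 accuracy over valid pixels.

    Hypothetical helper. pred is assumed to be aligned to gt already.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    if mask is None:
        mask = gt > 0  # assumption: zero marks invalid ground truth
    mask = mask & (pred > 0)  # ratio-based metrics need positive predictions
    p, g = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(p - g) / g)
    ratio = np.maximum(p / g, g / p)
    delta1 = np.mean(ratio < 1.25)
    return {"abs_rel": float(abs_rel), "delta1": float(delta1)}
```

For instance, with ground truth `[1, 2, 4]` and aligned predictions `[1, 2.2, 6]`, AbsRel is (0 + 0.1 + 0.5) / 3 = 0.2 and two of the three pixels satisfy the δ1 threshold.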
    

Citation

@article{yin2021virtual,
  title={Virtual Normal: Enforcing Geometric Constraints for Accurate and Robust Depth Prediction},
  author={Yin, Wei and Liu, Yifan and Shen, Chunhua},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
  year={2021}
}

Contact

Wei Yin: [email protected]
