MVSFormer

Codes of MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth (TMLR2023)

arXiv paper

  • Releasing training and testing code
  • Adding dynamic point cloud fusion for T&T
  • Releasing pre-trained models

Installation

git clone https://github.com/ewrfcas/MVSFormer.git
cd MVSFormer
pip install -r requirements.txt
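
Before moving on, it can help to confirm that PyTorch (installed via requirements.txt) actually sees your GPUs; a minimal sanity check:

# Minimal sanity check: confirm PyTorch sees your CUDA devices
# before launching any training or testing scripts.
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())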

We also highly recommend installing fusibile (https://github.com/YoYo000/fusibile) for depth fusion.

git clone https://github.com/YoYo000/fusibile.git
cd fusibile
cmake .
make

Tips: You should revise CUDA_NVCC_FLAGS in CMakeLists.txt according to the GPU you use. We set -gencode arch=compute_70,code=sm_70 instead of -gencode arch=compute_60,code=sm_60 for V100 GPUs. For other GPU types, you can follow:

# 1080Ti
-gencode arch=compute_61,code=sm_61

# 2080Ti
-gencode arch=compute_75,code=sm_75

# 3090Ti
-gencode arch=compute_86,code=sm_86

# V100
-gencode arch=compute_70,code=sm_70
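
If you are unsure which architecture your GPU has, you can query its compute capability through PyTorch and derive the matching flags; a small helper sketch:

# Print the -gencode flags matching the local GPU's compute capability,
# e.g. (7, 0) on a V100 -> "-gencode arch=compute_70,code=sm_70".
import torch

major, minor = torch.cuda.get_device_capability(0)
print(f"-gencode arch=compute_{major}{minor},code=sm_{major}{minor}")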

Datasets

DTU

  1. Download the preprocessed camera poses from DTU training data and the depth maps from Depths_raw.
  2. We also need the original rectified images from the official website.
  3. The DTU testing set can be downloaded from MVSNet.
dtu_training
 ├── Cameras
 ├── Depths
 ├── Depths_raw
 └── DTU_origin/Rectified (downloaded from the official website, with the original image size)
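
Before training, a quick script can verify that the expected sub-folders are in place; check_dtu_layout below is a hypothetical helper that simply mirrors the tree above:

# Hypothetical helper: verify the DTU folder layout shown above.
from pathlib import Path

def check_dtu_layout(root: str) -> None:
    base = Path(root)
    for sub in ["Cameras", "Depths", "Depths_raw", "DTU_origin/Rectified"]:
        status = "ok" if (base / sub).is_dir() else "MISSING"
        print(f"{sub:24s} {status}")

check_dtu_layout("/path/to/dtu_training")  # adjust to your ${YOUR_DTU_PATH}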

BlendedMVS

Download high-resolution images from BlendedMVS

BlendedMVS_raw
 ├── 57f8d9bbe73f6760f10e916a
 .   └── 57f8d9bbe73f6760f10e916a
 .       └── 57f8d9bbe73f6760f10e916a
 .           ├── blended_images
             ├── cams
             └── rendered_depth_maps

Tank-and-Temples (T&T)

Download the T&T dataset pre-processed by MVSNet. Note that users should use the short depth range of the cameras and run the evaluation script to produce the point clouds. Remember to replace the cameras in the intermediate folder with those from short_range_caemeras_for_mvsnet.zip.

tankandtemples
 ├── advanced
 │  ├── Auditorium
 │  ├── Ballroom
 │  ├── ...
 │  └── Temple
 └── intermediate
        ├── Family
        ├── Francis
        ├── ...
        ├── Train
        └── short_range_cameras

Training

Pretrained weights

DINO-small (https://github.com/facebookresearch/dino): Weight Link

Twins-small (https://github.com/Meituan-AutoML/Twins): Weight Link
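
After downloading, you can peek inside a checkpoint to confirm it is intact; a short sketch (the file name here is hypothetical, use whatever you saved the weights as):

# Peek at a downloaded checkpoint (hypothetical file name).
import torch

state = torch.load("twins_small.pth", map_location="cpu")
if isinstance(state, dict):
    print(list(state.keys())[:5])  # first few entry names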

Training MVSFormer (Twins-based) on DTU with two 32GB V100 GPUs takes about 2 days. We set the max epoch to 15 on DTU, but the best results were achieved around epoch 10 in our implementation. You are free to adjust the max epoch, but note that the learning rate decay schedule may be affected.

CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer.json \
                                         --exp_name MVSFormer \
                                         --data_path ${YOUR_DTU_PATH} \
                                         --DDP

MVSFormer-P (frozen DINO-based):

CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer-p.json \
                                         --exp_name MVSFormer-p \
                                         --data_path ${YOUR_DTU_PATH} \
                                         --DDP

We should fine-tune the model on BlendedMVS before testing on T&T.

CUDA_VISIBLE_DEVICES=0,1 python train.py --config configs/config_mvsformer_blendmvs.json \
                                         --exp_name MVSFormer-blendedmvs \
                                         --data_path ${YOUR_BLENDEDMVS_PATH} \
                                         --dtu_model_path ${YOUR_DTU_MODEL_PATH} \
                                         --DDP

Test

Pretrained models: OneDrive

For testing on DTU:

CUDA_VISIBLE_DEVICES=0 python test.py --dataset dtu --batch_size 1 \
                                       --testpath ${dtu_test_path} \
                                       --testlist ./lists/dtu/test.txt \
                                       --resume ${MODEL_WEIGHT_PATH} \
                                       --outdir ${OUTPUT_DIR} \
                                       --fusibile_exe_path ./fusibile/fusibile \
                                       --interval_scale 1.06 --num_view 5 \
                                       --numdepth 192 --max_h 1152 --max_w 1536 --filter_method gipuma \
                                       --disp_threshold 0.1 --num_consistent 2 \
                                       --prob_threshold 0.5,0.5,0.5,0.5 \
                                       --combine_conf --tmps 5.0,5.0,5.0,1.0
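
The fused point clouds are written as PLY files under ${OUTPUT_DIR}; a quick way to inspect one, assuming open3d is installed (the exact file name depends on the filter method, so the path below is only a placeholder):

# Inspect a fused point cloud (placeholder path; adjust to your output).
import open3d as o3d

pcd = o3d.io.read_point_cloud("outputs/mvsformer/scan1.ply")
print(pcd)  # reports the number of points
o3d.visualization.draw_geometries([pcd])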

For testing on T&T: T&T uses dpcd (the dynamic point cloud fusion added for T&T), whose confidence is controlled by conf rather than prob_threshold; sorry for the confusing parameter names, a legacy of this project's history. Note that we recommend num_view=20 here, but you should then build a new pair.txt with 20 views, as in MVSNet (a parser sketch for the pair.txt format follows the command below).

CUDA_VISIBLE_DEVICES=0 python test.py --dataset tt --batch_size 1 \
                                      --testpath ${tt_test_path}/intermediate(or advanced) \
                                      --testlist ./lists/tanksandtemples/intermediate.txt(or advanced.txt) \
                                      --resume ${MODEL_WEIGHT_PATH} \
                                      --outdir ${OUTPUT_DIR} \
                                      --interval_scale 1.0 --num_view 10 --numdepth 256 \
                                      --max_h 1088 --max_w 1920 --filter_method dpcd \
                                      --prob_threshold 0.5,0.5,0.5,0.5 \
                                      --use_short_range --combine_conf --tmps 5.0,5.0,5.0,1.0
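
The pair.txt format follows MVSNet: the first line gives the total number of views; then, for each reference view, one line with its id and one line with the number of source views followed by alternating (view id, score) pairs. A minimal parser sketch for this format:

# Minimal parser for the MVSNet-style pair.txt format.
def read_pair_file(path: str):
    pairs = []
    with open(path) as f:
        num_views = int(f.readline())
        for _ in range(num_views):
            ref_view = int(f.readline())
            tokens = f.readline().split()
            n_src = int(tokens[0])
            src_views = [int(v) for v in tokens[1:1 + 2 * n_src:2]]
            pairs.append((ref_view, src_views))
    return pairs

# e.g. pairs = read_pair_file("tankandtemples/intermediate/Family/pair.txt")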

Cite

If you found our project helpful, please consider citing:

@article{caomvsformer,
  title={MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth},
  author={Cao, Chenjie and Ren, Xinlin and Fu, Yanwei},
  journal={Transactions of Machine Learning Research},
  year={2023}
}

Our code is partially based on CDS-MVSNet, DINO, and Twins.
