VGGSfM: Visual Geometry Grounded Deep Structure From Motion

Meta AI Research, GenAI; University of Oxford, VGG

Jianyuan Wang, Nikita Karaev, Christian Rupprecht, David Novotny

[Paper] [Project Page] [Version 1.1]

Updates:

  • [Apr 23, 2024] Released the code and model weights for VGGSfM v1.1.

Installation

We provide a simple installation script that, by default, sets up a conda environment with Python 3.10, PyTorch 2.1, and CUDA 12.1.

source install.sh

Testing on IMC

1. Download Dataset and Model

To get started, you'll need to download the IMC dataset. You can do this by running the following commands in your terminal:

wget https://www.cs.ubc.ca/research/kmyi_data/imc2021-public/imc-2021-test-gt-phototourism.tar.gz

tar -xzvf imc-2021-test-gt-phototourism.tar.gz

Once the dataset is downloaded and extracted, specify its path in the IMC_DIR field of the ./cfgs/test.yaml configuration file, or pass it on the command line, for example python test.py IMC_DIR=YOUR/PATH.

Next, download the model checkpoint: v1.1 for testing, or v1.2 for the demo.

After downloading the model checkpoint, specify its path in the resume_ckpt field in ./cfgs/test.yaml.
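As a minimal sketch of the key=value override style shown above, the helper below assembles a test.py command line from Python. The helper name build_test_command and the placeholder checkpoint path are hypothetical. Passing resume_ckpt and query_frame_num on the command line is an assumption: the README only shows IMC_DIR and use_bf16 used that way.

```python
import shlex


def build_test_command(imc_dir, resume_ckpt, use_bf16=False, query_frame_num=None):
    """Assemble an argv list for test.py using key=value overrides.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["python", "test.py", f"IMC_DIR={imc_dir}"]
    # Assumption: resume_ckpt can be overridden the same way as IMC_DIR.
    cmd.append(f"resume_ckpt={resume_ckpt}")
    if use_bf16:
        cmd.append("use_bf16=True")
    if query_frame_num is not None:
        # Assumption: query_frame_num is also accepted as an override.
        cmd.append(f"query_frame_num={query_frame_num}")
    return cmd


# Placeholder paths; replace them with your own dataset and checkpoint locations.
cmd = build_test_command("YOUR/PATH", "YOUR/CKPT.pth")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from the repository root if you prefer launching the test from a script.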

2. Run Testing

python test.py

When it finishes (testing on the whole IMC dataset takes several hours), you should see something like:

----------------------------------------------------------------------------------------------------
On the IMC dataset (query_frame_num=3)
Auc_3  (%): 64.74418604651163
Auc_5  (%): 72.20720930232558
Auc_10 (%): 80.98441860465115
----------------------------------------------------------------------------------------------------

If your machine supports torch.bfloat16, you are welcome to enable the use_bf16 option, either in the configuration file or by running python test.py use_bf16=True. Our model was trained using bf16, and the testing performance is nearly identical when using bf16.
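One way to check for bfloat16 support before enabling the option is sketched below. It relies on torch.cuda.is_bf16_supported() and only covers CUDA devices. The helper name bf16_supported is hypothetical and is not part of this repository.

```python
def bf16_supported():
    """Return True if PyTorch is installed and the CUDA device reports bf16 support."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed in this environment
    # Check CUDA availability first so the bf16 query is only made with a GPU present.
    return bool(torch.cuda.is_available() and torch.cuda.is_bf16_supported())


# Pick the override to pass to test.py based on the check.
flag = "use_bf16=True" if bf16_supported() else "use_bf16=False"
print("python test.py", flag)
```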

Typically, running our model on a 25-frame IMC scene takes approximately 40 seconds. To save time, you can set query_frame_num to 1. This reduces the inference time to roughly 15 seconds while maintaining comparable performance:

----------------------------------------------------------------------------------------------------
On the IMC dataset (query_frame_num=1)
Auc_3  (%): 61.99207579672695
Auc_5  (%): 69.78997416020671
Auc_10 (%): 78.88826873385013
----------------------------------------------------------------------------------------------------
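To make the trade-off concrete, the short calculation below uses only the numbers reported above: the approximate per-scene times and the two sets of AUC scores. Going from query_frame_num=3 to query_frame_num=1 is roughly 2.7 times faster and costs about 2.1 to 2.8 AUC points, depending on the threshold.

```python
# AUC values copied from the two result blocks above.
auc_q3 = {"Auc_3": 64.74418604651163, "Auc_5": 72.20720930232558, "Auc_10": 80.98441860465115}
auc_q1 = {"Auc_3": 61.99207579672695, "Auc_5": 69.78997416020671, "Auc_10": 78.88826873385013}

# Approximate per-scene inference times in seconds (25-frame IMC scene).
time_q3, time_q1 = 40.0, 15.0

speedup = time_q3 / time_q1
drops = {name: auc_q3[name] - auc_q1[name] for name in auc_q3}

print(f"speed-up: {speedup:.2f}x")
for name, drop in drops.items():
    print(f"{name}: {drop:.2f} points lower with query_frame_num=1")
```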

If you want to run the model on your own data, please check the run_one_scene function in test.py. We will also provide a demo file for it very soon. By default, run_one_scene returns cameras in the PyTorch3D convention. Set return_in_pt3d=False to have it return cameras in the COLMAP convention.
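For readers unfamiliar with the two conventions, the sketch below shows how the extrinsics differ. It assumes the usual definitions: PyTorch3D uses row vectors (X_cam = X_world R + T) with +X left and +Y up, while COLMAP, like OpenCV, uses column vectors (X_cam = R X_world + t) with +X right and +Y down. The function name is hypothetical, intrinsics are not handled, and this is not necessarily how run_one_scene performs the conversion internally.

```python
def pt3d_to_colmap_extrinsics(R_pt3d, T_pt3d):
    """Convert one camera's extrinsics from PyTorch3D to COLMAP/OpenCV convention.

    R_pt3d: 3x3 nested list, T_pt3d: length-3 list (PyTorch3D, row-vector form).
    Returns (R_cv, t_cv) for the column-vector form X_cam = R_cv @ X_world + t_cv.
    """
    flip = (-1.0, -1.0, 1.0)  # negate the camera X and Y axes, keep Z
    # Scale column c of R_pt3d by flip[c], then transpose to get column-vector form.
    R_cv = [[R_pt3d[r][c] * flip[c] for r in range(3)] for c in range(3)]
    t_cv = [T_pt3d[c] * flip[c] for c in range(3)]
    return R_cv, t_cv


# Illustrative input: identity rotation with a simple translation.
R_cv, t_cv = pt3d_to_colmap_extrinsics(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [1.0, 2.0, 3.0],
)
print(R_cv, t_cv)
```

In practice it is simpler to pass return_in_pt3d=False and let the repository do the conversion. The sketch is only meant to show what changes between the two outputs.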

Acknowledgement

We are highly inspired by colmap, pycolmap, posediffusion, cotracker, and kornia.

License

See the LICENSE file for details about the license under which this code is made available.

Citing VGGSfM

If you find our repository useful, please consider giving it a star ⭐ and citing our paper in your work:

@article{wang2023vggsfm,
  title={VGGSfM: Visual Geometry Grounded Deep Structure From Motion},
  author={Wang, Jianyuan and Karaev, Nikita and Rupprecht, Christian and Novotny, David},
  journal={arXiv preprint arXiv:2312.04563},
  year={2023}
}
