
FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow

Cameron Smith, Yilun Du, Ayush Tewari, Vincent Sitzmann

MIT

This is the official implementation of the paper "FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow".

High-level structure

The code is organized as follows:

  • models.py contains the model definition
  • run.py contains a generic argument parser which creates the model and dataloaders for both training and evaluation
  • train.py and eval.py contain the training and evaluation loops
  • mlp_modules.py and conv_modules.py contain common MLP and CNN blocks
  • vis_scripts.py contains plotting and wandb logging code
  • renderer.py implements volume rendering helper functions
  • geometry.py implements various geometric operations (projections, 3D lifting, rigid transforms, etc.)
  • data contains the dataset scripts
  • demo.py contains a script to run our model on any image directory for pose estimates. See the file header for an example of how to run it.

Reproducing experiments

See python run.py --help for a list of command-line arguments. An example training command for CO3D-Hydrants is python train.py --dataset hydrant --vid_len 8 --batch_size 2 --online --name hydrant_flowcam --n_skip 1 2. To train on RealEstate10K, KITTI, or CO3D-10Category instead, replace --dataset hydrant with realestate, kitti, or 10cat, respectively.

Example training commands for each dataset are listed below:
python train.py --dataset hydrant --vid_len 8 --batch_size 2 --online --name hydrant_flowcam --n_skip 1 2
python train.py --dataset 10cat --vid_len 8 --batch_size 2 --online --name 10cat_flowcam --n_skip 1
python train.py --dataset realestate --vid_len 8 --batch_size 2 --online --name realestate_flowcam --n_skip 9
python train.py --dataset kitti --vid_len 8 --batch_size 2 --online --name kitti_flowcam --n_skip 0

Use the --online flag to log summaries to your wandb account; omit it to disable wandb logging.

Environment variables

We use environment variables to set the dataset and logging paths, though you can easily hardcode the paths in each respective dataset script. Specifically, we use the environment variables CO3D_ROOT, RE10K_IMG_ROOT, RE10K_POSE_ROOT, KITTI_ROOT, and LOGDIR. For instance, you can add the line export CO3D_ROOT="/nobackup/projects/public/facebook-co3dv2" to your .bashrc.
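
As a minimal sketch (assuming the dataset scripts look these variables up via os.environ; the fallback paths below are placeholders, not real locations), resolving the roots looks like this:

import os

# Resolve dataset and logging roots from the environment.
# The fallback paths are placeholders -- replace them with your own copies,
# or set the corresponding environment variables in your .bashrc instead.
CO3D_ROOT = os.environ.get("CO3D_ROOT", "/path/to/co3dv2")
RE10K_IMG_ROOT = os.environ.get("RE10K_IMG_ROOT", "/path/to/realestate10k/frames")
RE10K_POSE_ROOT = os.environ.get("RE10K_POSE_ROOT", "/path/to/realestate10k/poses")
KITTI_ROOT = os.environ.get("KITTI_ROOT", "/path/to/kitti_raw")
LOGDIR = os.environ.get("LOGDIR", "/path/to/logs")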

Data

The KITTI dataset we use can be downloaded here: https://www.cvlibs.net/datasets/kitti/raw_data.php

Instructions for downloading the RealEstate10K dataset can be found here: https://github.com/yilundu/cross_attention_renderer/blob/master/data_download/README.md

We use the V2 version of the CO3D dataset, which can be downloaded here: https://github.com/facebookresearch/co3d

Using FlowCam to estimate poses for your own scenes

You can query FlowCam for any set of images using the script in demo.py by specifying the image directory (--demo_rgb), the intrinsics (fx,fy,cx,cy), the pretrained checkpoint, whether to render out the reconstructed images (slower, but illustrates how accurately the model estimates the geometry), and the image resolution to resize to in preprocessing (around 128 pixels wide to avoid memory issues).
For example: python demo.py --demo_rgb /nobackup/projects/public/facebook-co3dv2/hydrant/615_99120_197713/images --intrinsics 1.7671e+03,3.1427e+03,5.3550e+02,9.5150e+02 -c pretrained_models/co3d_hydrant.pt --render_imgs --low_res 144 128. The script writes the poses, a rendered pose plot, and the re-rendered RGB and depth (if requested) to the folder demo_output.
The RealEstate10K checkpoint (pretrained_models/re10k.pt) probably has the most general prior for your own scenes. We are planning to train and release a model on all datasets for an even more general prior, so stay tuned.

Coordinate and camera parameter conventions

This code uses an "OpenCV" style camera coordinate system, where the Y-axis points downwards (the up-vector points in the negative Y-direction), the X-axis points right, and the Z-axis points into the image plane.
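
Concretely, here is a minimal NumPy sketch of pinhole projection under this convention (illustration only, not code from the repo):

import numpy as np

def project(points_cam, fx, fy, cx, cy):
    # points_cam: (..., 3) points in OpenCV camera coordinates
    # (+X right, +Y down, +Z forward into the scene, so visible points have Z > 0).
    X, Y, Z = points_cam[..., 0], points_cam[..., 1], points_cam[..., 2]
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return np.stack([u, v], axis=-1)

# A point straight ahead of the camera projects to the principal point (cx, cy).
print(project(np.array([0.0, 0.0, 2.0]), fx=500.0, fy=500.0, cx=64.0, cy=64.0))  # -> [64. 64.]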

Citation

If you find our work useful in your research, please cite:

@misc{smith2023flowcam,
      title={FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow}, 
      author={Cameron Smith and Yilun Du and Ayush Tewari and Vincent Sitzmann},
      year={2023},
      eprint={2306.00180},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contact

If you have any questions, please email Cameron Smith at [email protected] or open an issue.
