
nips16_ptn's Introduction

Perspective Transformer Nets (PTN)

This is the code for the NIPS 2016 paper Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision by Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, and Honglak Lee.

Please follow the instructions to run the code.

Requirements

PTN requires or works with

  • Mac OS X or Linux
  • NVIDIA GPU

Installing Dependency

The following command installs the Perspective Transformer Layer:

./install_ptnbhwd.sh

Dataset Downloading

  • Please run the following command to download the pre-processed dataset (including rendered 2D views and 3D volumes):
./prepare_data.sh

Pre-trained Models Downloading (single-class experiment)

PTN-Proj: ptn_proj.t7

PTN-Comb: ptn_comb.t7

CNN-Vol: cnn_vol.t7

  • The following command downloads the pre-trained models:
./download_models.sh

Testing using Pre-trained Models (single-class experiment)

  • The following command evaluates the pre-trained models:
./eval_models.sh

Training (single-class experiment)

  • If you want to pre-train the view-point independent image encoder on a single class, please run the following command. Note that pre-training can take a few days on a single TITAN X GPU.
./demo_pretrain_singleclass.sh
  • If you want to train PTN-Proj (unsupervised) on a single class using the pre-trained encoder, please run the following command.
./demo_train_ptn_proj_singleclass.sh
  • If you want to train PTN-Comb (3D supervision) on a single class using the pre-trained encoder, please run the following command.
./demo_train_ptn_comb_singleclass.sh
  • If you want to train CNN-Vol (3D supervision) on a single class using the pre-trained encoder, please run the following command.
./demo_train_cnn_vol_singleclass.sh

Using your own camera

  • In many cases, you may want to implement your own camera matrix (e.g., intrinsic or extrinsic). Please feel free to modify this function.

  • Before starting your own implementation, we recommend reviewing some basic camera geometry in the computer vision textbook by Richard Szeliski (see Eq. 2.59 on page 53).

  • Note that in our voxel ray-tracing implementation, we use the inverse camera matrix; a small sketch follows below.
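The following is a minimal sketch, not the camera code shipped in this repository; the helper name makeExtrinsic and the angle conventions are assumptions made purely for illustration. It builds a 4x4 extrinsic matrix from azimuth, elevation, and distance in Torch and then inverts it, since the voxel ray tracing works with the inverse matrix.

-- Minimal sketch (illustrative only, not the repository's implementation).
require 'torch'

local function makeExtrinsic(azimuth, elevation, distance)
  local cosA, sinA = math.cos(azimuth), math.sin(azimuth)
  local cosE, sinE = math.cos(elevation), math.sin(elevation)
  -- rotate about the vertical axis by the azimuth, then tilt by the elevation
  local Raz = torch.Tensor({{cosA, -sinA, 0}, {sinA, cosA, 0}, {0, 0, 1}})
  local Rel = torch.Tensor({{cosE, 0, -sinE}, {0, 1, 0}, {sinE, 0, cosE}})
  local E = torch.eye(4)
  E[{{1, 3}, {1, 3}}] = Rel * Raz
  E[{3, 4}] = -distance   -- translate the scene along the camera's viewing axis
  return E
end

local E = makeExtrinsic(math.rad(30), math.rad(20), 2.0)
local Einv = torch.inverse(E)   -- the ray-tracing side consumes the inverse matrix
print(Einv)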

Third-party Implementation

Besides our Torch implementation, we also recommend the following third-party re-implementation:

  • TensorFlow Implementation: This re-implementation was developed during Xinchen's Google internship. If you find a bug, please file an issue and mention @xcyan.

Citation

If you find this useful, please cite our work as follows:

@incollection{NIPS2016_6206,
title = {Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision},
author = {Yan, Xinchen and Yang, Jimei and Yumer, Ersin and Guo, Yijie and Lee, Honglak},
booktitle = {Advances in Neural Information Processing Systems 29},
editor = {D. D. Lee and M. Sugiyama and U. V. Luxburg and I. Guyon and R. Garnett},
pages = {1696--1704},
year = {2016},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/6206-perspective-transformer-nets-learning-single-view-3d-object-reconstruction-without-3d-supervision.pdf}
}

nips16_ptn's People

Contributors

xcyan


nips16_ptn's Issues

What is "disf" in "PerspectiveGridGenerator.lua"

Hi, I'm trying to replicate your projection work with PyTorch. But I'm confused about the definition of "focal length" and "disf" in your code. That is:


focal_length = math.sqrt(3)/2 ?
dmin = 1/(focal_length + math.sqrt(3))
dmax = 1/(focal_length)
for k=1,depth do
disf = dmin + (k-1)/(depth-1) * (dmax-dmin)
baseGrid[k][i][j][1] = 1/disf


Please forgive me if this is a naive question; I have followed your advice and read the suggested material, but I still don't understand how the focal length is defined. Also, the paper says that "the minimum and maximum disparity in the camera frame are denoted as dmin and dmax", so which one is the disparity, disf or 1/disf? Your help means a lot to me.
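As a side note, here is a purely illustrative, self-contained rendition of the quoted loop (depth = 4 is assumed only to keep the printout short). Read literally, disf is interpolated linearly between dmin and dmax, and 1/disf is the value stored in the grid:

-- Illustration only; depth = 4 is an assumption for brevity.
local focal_length = math.sqrt(3) / 2            -- ~0.866
local dmin = 1 / (focal_length + math.sqrt(3))   -- ~0.385
local dmax = 1 / focal_length                    -- ~1.155
local depth = 4
for k = 1, depth do
  local disf = dmin + (k - 1) / (depth - 1) * (dmax - dmin)
  print(string.format('k=%d  disf=%.3f  1/disf=%.3f', k, disf, 1 / disf))
end
-- disf runs from ~0.385 up to ~1.155, while 1/disf runs from ~2.598 down to ~0.866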

Data corrupted

Hi Dr. @xcyan, I encountered the following problem while running "eval_models.sh":
torch/install/share/lua/5.1/torch/File.lua:351: read error: read 39866875 blocks instead of 46656000 at torch/pkg/torch/lib/TH/THDiskFile.c:356
I think this is due to data corruption. Could you please share the checksums of the pre-trained models so the downloads can be validated? Thank you.

Results don't match paper

Hi,
I downloaded the pretrained models, ran ./eval_models.sh, and got these numbers:
CNN-VOL IOU = 0.500177, PTN-COMB IOU = 0.509016, PTN-PROJ IOU = 0.503761
They are not far off, but they still don't match the numbers in the paper. Any idea why I am getting different results?

Only half of the training data is used

In https://github.com/xcyan/nips16_PTN/blob/master/scripts/train_rotatorRNN_base.lua#L274, in the expression math.min(data:size() * opt.nview / 2 , opt.ntrain), I don't understand where the division by two comes from. data:size() is the number of training scenes (4744 in this case) and there are 24 views per scene, so the multiplication makes sense, but why divide by two? Maybe I missed something, but it seems that only half of the views/data is used?
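To make the arithmetic in the question explicit: 4744 scenes × 24 views = 113,856 training examples, and 113,856 / 2 = 56,928, so the min() would cap each epoch at 56,928 examples unless opt.ntrain is smaller.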

Results of trained models don't match paper

Hello,
after training the encoder (CNN-Vol) and the perspective transformer (PTN-Proj), I tested the final models by changing these lines:
base_loader = torch.load(opt.checkpoint_dir .. 'arch_PTN_singleclass_nv24_adam1_bs6_nz512_wd0.001_lbg(0,1)_ks24_vs32/net-epoch-100.t7')
encoder = base_loader.encoder
base_voxel_dec = base_loader.voxel_dec

unsup_loader = torch.load(opt.checkpoint_dir .. 'arch_PTN_singleclass_nv24_adam1_bs6_nz512_wd0.001_lbg(1,0)_ks24_vs32/net-epoch-100.t7')
unsup_voxel_dec = unsup_loader.voxel_dec

sup_loader = torch.load(opt.checkpoint_dir .. 'ptn_comb.t7')
sup_voxel_dec = sup_loader.voxel_dec

The results on the test set are:
cat [chair]: CNN-VOL IOU = 0.459553 PTN-COMB IOU = 0.162989 PTN-PROJ IOU = 0.472389

These are 4 to 5 points lower than reported in the paper. What could be the reasons for this? I also noticed that the pretrained ptn-comb model performs poorly (0.16 IoU) when evaluated with my encoder (instead of the pretrained encoder). What is the reason for this?

TensorFlow implement

Sorry to bother you. I recently wanted to reproduce PTN with the TensorFlow implementation. I want to use my own data, but I am having problems creating the data in tfrecords format. Could you please show me the code to create tfrecords with the following features (float representations): image, mask, vox? Looking forward to your reply, thanks.

Out of memory

Hi,
I was evaluating the pretrained models and encountered the following issue regarding GPU memory:
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1355/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/sbasavaraju/torch/install/bin/luajit: ...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-1355/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'read'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: in function <...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:245>
[C]: in function 'read'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/sbasavaraju/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/sbasavaraju/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
scripts/eval_quant_test.lua:63: in main chunk
[C]: in function 'dofile'
...raju/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

I have Ubuntu 14.04 with an NVIDIA GeForce GTX 470 (1 GB). Is this GPU not sufficient to run the program?

Thanks

About focal_length and the translate matrix

Thank you for what you have done! I'm confused about the translation matrix. Here are my specific questions:
(1) Is the original focal length = 1 (i.e., if elevation = 0 deg)?
(2) Why is the translation matrix initially set as follows?
[translation matrix image from the original issue not shown]

Could you please explain it?
I would appreciate your reply!

Can't test the trained encoder

Hi,
I trained the single-class encoder with ./demo_pretrain_singleclass.sh. Now I want to evaluate the trained model. So in eval_quant_test.lua I just changed the name of the loaded file (cnn_vol.t7) to the last trained checkpoint:
base_loader = torch.load(opt.checkpoint_dir .. 'arch_rotatorRNN_singleclass_nv24_adam2_bs8_nz512_wd0.001_lbg10_ks16/net-epoch-20.t7')
encoder = base_loader.encoder
base_voxel_dec = base_loader.voxel_dec

When I run the test script eval_models.sh I get the following error:
/home/meeso/torch/install/bin/luajit: scripts/eval_quant_test.lua:90: attempt to index global 'base_voxel_dec' (a nil value)
stack traceback:
scripts/eval_quant_test.lua:90: in main chunk
[C]: in function 'dofile'
...eeso/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Any idea how to fix this?

How to get the RGB images?

Great work!
I want to ask: did you render the RGB images yourself? Do they contain color information?
Is there a dataset that includes RGB images and point clouds for each single object?
Looking forward to your reply!
Thank you!

typo?

There might be a typo in scripts/eval_quant_test.lua:
require 'stn' ------> require 'ptn'
