
nips16_ptn's Introduction

Perspective Transformer Nets (PTN)

This is the code for the NIPS 2016 paper Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision by Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, and Honglak Lee.

Please follow the instructions to run the code.

Requirements

PTN requires or works with

  • Mac OS X or Linux
  • NVIDIA GPU

Installing Dependency

The following command installs the Perspective Transformer Layer:

./install_ptnbhwd.sh

Dataset Downloading

  • Please run the following command to download the pre-processed dataset (including rendered 2D views and 3D volumes):
./prepare_data.sh

Pre-trained Models Downloading (single-class experiment)

PTN-Proj: ptn_proj.t7

PTN-Comb: ptn_comb.t7

CNN-Vol: cnn_vol.t7

  • The following command downloads the pre-trained models:
./download_models.sh

Testing using Pre-trained Models (single-class experiment)

  • The following command evaluates the pre-trained models:
./eval_models.sh

Training (single-class experiment)

  • If you want to pre-train the view-point independent image encoder on a single class, please run the following command. Note that pre-training can take a few days on a single TITAN X GPU.
./demo_pretrain_singleclass.sh
  • If you want to train PTN-Proj (unsupervised) on a single class using the pre-trained encoder, please run the following command.
./demo_train_ptn_proj_singleclass.sh
  • If you want to train PTN-Comb (3D supervision) on a single class using the pre-trained encoder, please run the following command.
./demo_train_ptn_comb_singleclass.sh
  • If you want to train CNN-Vol (3D supervision) on a single class using the pre-trained encoder, please run the following command.
./demo_train_cnn_vol_singleclass.sh

Using your own camera

  • In many cases, you may want to implement your own camera matrix (e.g., intrinsic or extrinsic). Please feel free to modify this function.

  • Before starting your own implementation, we recommend reviewing some basic camera geometry in the computer vision textbook by Richard Szeliski (see Eq. 2.59 on page 53).

  • Note that in our voxel ray-tracing implementation, we use the inverse camera matrix; a small sketch follows below.
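The following is a minimal sketch, not the camera code shipped in this repository; the helper name makeExtrinsic and the angle conventions are assumptions made purely for illustration. It builds a 4x4 extrinsic matrix from azimuth, elevation, and distance in Torch and then inverts it, since the voxel ray tracing works with the inverse matrix.

-- Minimal sketch (illustrative only, not the repository's implementation).
require 'torch'

local function makeExtrinsic(azimuth, elevation, distance)
  local cosA, sinA = math.cos(azimuth), math.sin(azimuth)
  local cosE, sinE = math.cos(elevation), math.sin(elevation)
  -- rotate about the vertical axis by the azimuth, then tilt by the elevation
  local Raz = torch.Tensor({{cosA, -sinA, 0}, {sinA, cosA, 0}, {0, 0, 1}})
  local Rel = torch.Tensor({{cosE, 0, -sinE}, {0, 1, 0}, {sinE, 0, cosE}})
  local E = torch.eye(4)
  E[{{1, 3}, {1, 3}}] = Rel * Raz
  E[{3, 4}] = -distance   -- translate the scene along the camera's viewing axis
  return E
end

local E = makeExtrinsic(math.rad(30), math.rad(20), 2.0)
local Einv = torch.inverse(E)   -- the ray-tracing side consumes the inverse matrix
print(Einv)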

Third-party Implementation

Besides our Torch implementation, we also recommend the following third-party re-implementation:

  • TensorFlow Implementation: This re-implementation was developed during Xinchen's Google internship. If you find a bug, please file an issue and mention @xcyan.

Citation

If you find this useful, please cite our work as follows:

@incollection{NIPS2016_6206,
title = {Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision},
author = {Yan, Xinchen and Yang, Jimei and Yumer, Ersin and Guo, Yijie and Lee, Honglak},
booktitle = {Advances in Neural Information Processing Systems 29},
editor = {D. D. Lee and M. Sugiyama and U. V. Luxburg and I. Guyon and R. Garnett},
pages = {1696--1704},
year = {2016},
publisher = {Curran Associates, Inc.},
url = {http://papers.nips.cc/paper/6206-perspective-transformer-nets-learning-single-view-3d-object-reconstruction-without-3d-supervision.pdf}
}

nips16_ptn's People

Contributors

xcyan


nips16_ptn's Issues

What is "disf" in "PerspectiveGridGenerator.lua"

Hi, I'm trying to replicate your projection work with PyTorch. But I'm confused about the definition of "focal length" and "disf" in your code. That is:


focal_length = math.sqrt(3)/2 ?
dmin = 1/(focal_length + math.sqrt(3))
dmax = 1/(focal_length)
for k=1,depth do
disf = dmin + (k-1)/(depth-1) * (dmax-dmin)
baseGrid[k][i][j][1] = 1/disf


Please forgive me if this is a naive question; I have followed your advice and read the suggested material, but I still don't understand how the focal length is defined. Also, the paper says that "the minimum and maximum disparity in the camera frame are denoted as dmin and dmax", so which one is the disparity, disf or 1/disf? Your help means a lot to me.
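As a side note, here is a purely illustrative, self-contained rendition of the quoted loop (depth = 4 is assumed only to keep the printout short). Read literally, disf is interpolated linearly between dmin and dmax, and 1/disf is the value stored in the grid:

-- Illustration only; depth = 4 is an assumption for brevity.
local focal_length = math.sqrt(3) / 2            -- ~0.866
local dmin = 1 / (focal_length + math.sqrt(3))   -- ~0.385
local dmax = 1 / focal_length                    -- ~1.155
local depth = 4
for k = 1, depth do
  local disf = dmin + (k - 1) / (depth - 1) * (dmax - dmin)
  print(string.format('k=%d  disf=%.3f  1/disf=%.3f', k, disf, 1 / disf))
end
-- disf runs from ~0.385 up to ~1.155, while 1/disf runs from ~2.598 down to ~0.866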

Data corrupted

Hi Dr. @xcyan, I encountered the following problem while running "eval_models.sh":
torch/install/share/lua/5.1/torch/File.lua:351: read error: read 39866875 blocks instead of 46656000 at torch/pkg/torch/lib/TH/THDiskFile.c:356
I think this is due to data corruption. Could you please share the checksums of the pre-trained models so the downloads can be validated? Thank you.

Results don't match paper

Hi,
I downloaded the pretrained models, ran ./eval_models.sh, and got these numbers:
CNN-VOL IOU = 0.500177, PTN-COMB IOU = 0.509016, PTN-PROJ IOU = 0.503761
They are not far off, but they still don't match the numbers in the paper. Any idea why I am getting different results?

Only half of the training data is used

In https://github.com/xcyan/nips16_PTN/blob/master/scripts/train_rotatorRNN_base.lua#L274, in the expression math.min(data:size() * opt.nview / 2 , opt.ntrain), I don't understand where the division by two comes from. data:size() is the number of training scenes (4744 in this case) and there are 24 views per scene, so the multiplication makes sense, but why divide by two? Maybe I missed something, but it seems that only half of the views/data is used?
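To make the arithmetic in the question explicit: 4744 scenes × 24 views = 113,856 training examples, and 113,856 / 2 = 56,928, so the min() would cap each epoch at 56,928 examples unless opt.ntrain is smaller.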

Results of trained models don't match paper

Hello,
after training the encoder (CNN-Vol) and the perspective transformer (PTN-Proj), I tested the final models by changing these lines:
base_loader = torch.load(opt.checkpoint_dir .. 'arch_PTN_singleclass_nv24_adam1_bs6_nz512_wd0.001_lbg(0,1)_ks24_vs32/net-epoch-100.t7')
encoder = base_loader.encoder
base_voxel_dec = base_loader.voxel_dec

unsup_loader = torch.load(opt.checkpoint_dir .. 'arch_PTN_singleclass_nv24_adam1_bs6_nz512_wd0.001_lbg(1,0)_ks24_vs32/net-epoch-100.t7')
unsup_voxel_dec = unsup_loader.voxel_dec

sup_loader = torch.load(opt.checkpoint_dir .. 'ptn_comb.t7')
sup_voxel_dec = sup_loader.voxel_dec

The results on the test set are:
cat [chair]: CNN-VOL IOU = 0.459553 PTN-COMB IOU = 0.162989 PTN-PROJ IOU = 0.472389

These are 4 to 5 points lower than reported in the paper. What could be the reasons for this? I also noticed that the pretrained ptn-comb model performs poorly (0.16 IoU) when evaluated with my encoder (instead of the pretrained encoder). What is the reason for this?

TensorFlow implement

Sorry to bother you. I recently wanted to reproduce PTN with the TensorFlow implementation. I want to use my own data, but I am having problems creating the data in tfrecords format. Could you please show me the code to create tfrecords with the following features (float representations): image, mask, vox? Looking forward to your reply, thanks.

Out of memory

Hi,
I was evaluating the pretrained models and encountered the following issue regarding GPU memory:
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-1355/cutorch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
/home/sbasavaraju/torch/install/bin/luajit: ...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: cuda runtime error (2) : out of memory at /tmp/luarocks_cutorch-scm-1-1355/cutorch/lib/THC/generic/THCStorage.cu:66
stack traceback:
[C]: in function 'read'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: in function <...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:245>
[C]: in function 'read'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/sbasavaraju/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
/home/sbasavaraju/torch/install/share/lua/5.1/nn/Module.lua:192: in function 'read'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:351: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:369: in function 'readObject'
...e/sbasavaraju/torch/install/share/lua/5.1/torch/File.lua:409: in function 'load'
scripts/eval_quant_test.lua:63: in main chunk
[C]: in function 'dofile'
...raju/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

I have Ubuntu 14.04 with an NVIDIA GeForce GTX 470 (1 GB). Is this GPU not sufficient to run the program?

Thanks

About focal_length and the translate matrix

Thank you for what you have done! I'm confused about the translation matrix. Here are my specific questions:
(1) Is the original focal length = 1 (i.e., if elevation = 0 deg)?
(2) Why is the translation matrix initially set as follows?
[translation matrix image from the original issue not shown]

Could you please explain it?
I would appreciate your reply!

Can't test the trained encoder

Hi,
I trained the single-class encoder with ./demo_pretrain_singleclass.sh. Now I want to evaluate the trained model. So in eval_quant_test.lua I just changed the name of the loaded file (cnn_vol.t7) to the last trained checkpoint:
base_loader = torch.load(opt.checkpoint_dir .. 'arch_rotatorRNN_singleclass_nv24_adam2_bs8_nz512_wd0.001_lbg10_ks16/net-epoch-20.t7')
encoder = base_loader.encoder
base_voxel_dec = base_loader.voxel_dec

When I run the test script eval_models.sh I get the following error:
/home/meeso/torch/install/bin/luajit: scripts/eval_quant_test.lua:90: attempt to index global 'base_voxel_dec' (a nil value)
stack traceback:
scripts/eval_quant_test.lua:90: in main chunk
[C]: in function 'dofile'
...eeso/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x00406670

Any idea how to fix this?

How to get the RGB images?

Great work!
I want to ask: did you render the RGB images yourself? Do they contain color information?
Is there a dataset that includes RGB images and point clouds for each single object?
Looking forward to your reply!
Thank you!

typo?

There might be a typo in scripts/eval_quant_test.lua:
require 'stn' ------> require 'ptn'
