Website: https://alexyu.net
Discord: @doriath0
PixelNeRF Official Repository
Home Page: https://alexyu.net/pixelnerf
License: BSD 2-Clause "Simplified" License
Hi, I can't get rid of this error no matter what I try. It may be related to train.py having two "-1" in it. It appears after about 10 minutes of training. I'm training on a custom dataset that currently has 18 items with 98 frames each in the "train" folder, and 3 items with 98 frames each in both the val and test folders. I'm using the srn format, and I have checked and rechecked that all folders contain correct, corresponding numbers of items, so I can't find any indexing errors. Here is my error readout. I'm on Windows 10 with one Nvidia 970. Thank you so much.
[ 2.16s/it] EXPERIMENT NAME: shoes2
CONTINUE? yes
index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
pixel-nerf/src/model/resnetfc.py
Line 55 in ddddb6b
It seems like you are doing relu -> conv -> relu -> conv + res for the residual block, which is different from the original one, but more like the identity mapping one.
Could you provide some insights as to why (does the original resblock not work well?) Also why no batchnorm?
Hello,
First, thanks for your fantastic work!
But I got some problems with large-scale scene rendering:
Looking forward to your reply! Thanks
Shengkun Tang
Hello, according to your paper, the proposed approach can learn a scene prior, which NeRF cannot. I have a question about this prior knowledge: what does it mean, and how is it learned from the data? Thank you for your answer.
Hello, Thanks for your source code! I've read your paper recently, and I have one question about this paper.
Hello everyone, thank you for releasing the code for this awesome research! I managed to run the evaluation code on the DTU dataset, and one thing seemed quite strange to me. When I ran it, I got a PSNR of 18.99 and an SSIM of 0.678, which is slightly different from the values reported in the paper (a PSNR of 19.33 and an SSIM of 0.695). Is this small difference due to randomness in the training of the NeRFs during the optimization? Thanks for your attention!
Hi Alex, thanks for sharing your impressive work!
I get strange video results though for ShapeNet Single-Category (SRN) like in issue #9 which isn't explicitly resolved there. Could you provide a hint what could cause the messed up video?
I run: python eval/gen_video.py -n srn_chair --gpu_id=0 --split test -P '64 104' -D <srn data dir>/chairs -S 1
Running the above command also throws the following message:
/home/f/pixel-nerf/src/model/models.py:291: UserWarning: WARNING: checkpoints/srn_chair/pixel_nerf_latest does not exist, not loaded!! Model will be re-initialized. If you are trying to load a pretrained model, STOP since it's not in the right place. If training, unless you are startin a new experiment, please remember to pass --resume.
Hi Alex,
First of all, congrats on the great work!
I was playing around with your code on DTU dataset a bit, and encountered one question, which might be a bit dumb:
In this line, you use cv2.decomposeProjectionMatrix(P) to obtain K, R and t from the projection matrix P. However, when I tried to compose these components back via K @ np.concatenate((R, t[:3]/t[3]), axis=1), I could not obtain P back. After playing around with it a bit, I realized that the translation part should not be t[:3]/t[3] but -R@(t[:3]/t[3]); with that, we get the projection matrix back. However, your implementation uses t[:3]/t[3] everywhere else. I am wondering how this does not lead to an issue, since the gen_rays function takes camera poses as input.
Many thanks for your answers in advance!
Best,
Songyou
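The relationship described in this question can be checked numerically without OpenCV. The sketch below assumes the standard pinhole convention P = K [R | -R C], where C is the camera centre in world coordinates, and reads t[:3]/t[3] as that centre. All numeric values are made up for illustration, and whether the repository's loader builds a camera-to-world pose from these parts is an assumption, not something verified here.

```python
import numpy as np

# Hypothetical intrinsics, rotation and camera centre, chosen only for illustration.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
angle = np.deg2rad(30.0)
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle), np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])        # world-to-camera rotation
C = np.array([[1.0], [2.0], [3.0]])   # camera centre in world coordinates

# Pinhole projection matrix: P = K [R | -R C]
P = K @ np.hstack([R, -R @ C])

# Using the centre directly as the translation column does not reproduce P ...
P_wrong = K @ np.hstack([R, C])
# ... while using -R @ C does.
P_right = K @ np.hstack([R, -R @ C])

# A camera-to-world pose is [R^T | C], so there the centre itself is the
# correct translation column; its inverse is the world-to-camera [R | -R C].
c2w = np.eye(4)
c2w[:3, :3] = R.T
c2w[:3, 3:] = C
w2c = np.eye(4)
w2c[:3, :3] = R
w2c[:3, 3:] = -R @ C
```

If the pose passed to the ray generator is camera-to-world, using t[:3]/t[3] unchanged as its translation would be consistent with the observation above, even though it is the wrong column for recomposing P.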
Hello, thank you for releasing this wonderful project! I have been trying to train pixel-nerf from scratch on the DTU dataset. However, it does not seem to be converging. Which parameters did you use in your training setup, and how many iterations did it take? I am running on only one GPU, and the only change I made to the default setup described in the README is using a batch size of 2 instead of 4.
Hello,
Great work and thanks for releasing the code!
I'm a bit confused about the training process. The paper says: "A single model is trained for each object class with 50 random views per object instance, randomly sampling either one or two of the training views to encode". I understand this as: you first use 50 views per object instance to train the general categorical representation, and for the actual training scene you use only 1-2 views? I don't think I fully understood this process, so some further elaboration would be very helpful.
Many thanks.
I have used pixel nerf to generate depth images of multiple views of an object. I am using the SRN dataset, so I have the poses of every view. Can I build a point cloud based on those generated depth images and poses? If yes, how?
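One common way to do this is to unproject every depth pixel into camera space and then move it to world space with the pose. The sketch below is a minimal version under stated assumptions: the depth map stores distance along the camera z-axis (if it stores distance along the ray, it must be converted first), the principal point is at the image centre, camera axes are OpenCV-style (x right, y down, z forward), and each pose is a 4x4 camera-to-world matrix. The function names are hypothetical and not part of the repository.

```python
import numpy as np

def depth_to_points(depth, focal, c2w):
    """Unproject an (H, W) z-depth map to (H*W, 3) world-space points.

    Assumes a pinhole camera with the principal point at the image centre,
    OpenCV-style axes, and a 4x4 camera-to-world matrix c2w.
    """
    h, w = depth.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Pixel centres relative to the principal point, scaled by depth / focal.
    x = (u + 0.5 - w / 2.0) / focal * depth
    y = (v + 0.5 - h / 2.0) / focal * depth
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Rotate into world space and add the camera position.
    return pts_cam @ c2w[:3, :3].T + c2w[:3, 3]

def merge_views(depths, focal, poses):
    """Concatenate the unprojected points of several views into one cloud."""
    return np.concatenate(
        [depth_to_points(d, focal, p) for d, p in zip(depths, poses)], axis=0
    )
```

Background pixels (for example those at the far plane) would normally be masked out before merging, otherwise they show up as a shell around the object.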
Hi @sxyu,
I was really amazed by this work and decided to implement my own version drawing inspiration from pixel nerf & keeping nerf as the base. After a lot of iterations & experiments, I've reached a level where the model is starting to show some reasonable results on training data. But the results look only good from the input poses that the model is trained on and doesn't learn the 3d geometry.
I'm training a model from scratch using 6 objects, with 8 views each -> [0,45,90,135,180,225,270,315] degrees as azimuth & 0 as elevation. After training for 50k iterations, the model is able to predict reasonable views from the 8 input poses but it is absolutely noisy when rendered from azimuths which are not part of training data. And when I pass in a test image as input, the model simply produces a noisy combination of all the 6 input images i.e. it is not at all generalizing. Can you kindly help me resolve these consistency & generalisability issues?
Thanks!
PS- Earlier I was stuck where the model used to produce all black results. I resolved this by reducing the learning rate & by modifying the MSE so that it gives less weight to background (black) pixels in the ground truth image. I hope this doesn't cause any issues with training & generalizing.
Hi @sxyu
How are the different datasets classified into types or formats? One difference is simply that they are all different datasets. Are there any criteria for assigning a dataset to a specific data_type or format?
I would like to use this code on a custom dataset. Could you help me set up this type of dataset? Do I have to convert the custom data into one of the data_types (srn, multi_obj, dvr)?
Thanks in Advance!
Best regards
Anil
In models.py, uv = -xyz[:, :, :2] / xyz[:, :, 2:], which puts the origin in the bottom-left corner of the image, since your R converts the world coordinate system to the camera coordinate system in OpenGL format.
Maybe the y-axis (v) needs to be flipped, i.e. uv = uv * torch.tensor([[[1, -1]]], device=uv.device).
Hi, I read the main manuscript and supplementary material, but can't find the training detail for DTU dataset.
How many epochs did you train on DTU dataset?
I trained quite a lot, but cannot reproduce the table 3 here.
ROW 50 in SRNDataset.py:
if is_chair:
    self.z_near = 1.25
    self.z_far = 2.75
else:
    self.z_near = 0.8
    self.z_far = 1.8
If I train PixelNerf on other datasets, how can I get the z_near and z_far?
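One rough heuristic, offered here as a sketch rather than as the authors' procedure, is to bound the depth range by the nearest and farthest camera distance plus the extent of the object. It assumes the object is centred at the world origin and fits inside a sphere of known radius; the function name, the radius and the margin are all hypothetical.

```python
import numpy as np

def estimate_depth_bounds(cam_positions, object_radius, margin=0.1):
    """Return (z_near, z_far) that enclose a sphere of object_radius at the origin.

    cam_positions: (N, 3) camera centres in world coordinates.
    """
    dists = np.linalg.norm(np.asarray(cam_positions, dtype=float), axis=1)
    # Nearest surface the closest camera can see, clamped to stay positive.
    z_near = max(dists.min() - object_radius - margin, 1e-3)
    # Farthest surface the most distant camera can see.
    z_far = dists.max() + object_radius + margin
    return float(z_near), float(z_far)
```

For example, cameras on a sphere of radius 2.0 around an object of radius 0.5 give bounds of about (1.4, 2.6) with the default margin.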
uv = - xyz[:, :, :2] / xyz[:, :, 2:]
in models.py
Why is it multiplied by -1?
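One plausible reading, consistent with the OpenGL-style camera convention mentioned elsewhere in this thread, is that the camera looks along -z, so points in front of it have negative z. Dividing x and y by a negative z flips both signs, and the leading minus undoes that. The sketch below illustrates only this arithmetic with a made-up point; it does not confirm what the rest of models.py does with uv.

```python
import numpy as np

# A point in an OpenGL-style camera frame (x right, y up, camera looks along -z):
# to the right of and above the optical axis, 2 units in front of the camera.
xyz = np.array([[[0.5, 0.25, -2.0]]])    # shape (batch, points, 3)

naive = xyz[:, :, :2] / xyz[:, :, 2:]    # dividing by negative z flips both signs
uv = -xyz[:, :, :2] / xyz[:, :, 2:]      # the leading minus restores them
```

With the minus, a point to the right and above still has positive u and v; v then points up, while image rows grow downward, which is the flip raised in the earlier comment on this same line.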
Hi, @sxyu, congrats on your great work in CVPR 2021!
When I am reading your code, I notice that in the SRNDataset and DVRDataset, there is a coordinate transform step.
Specifically, for DTU in DVRDataset, you do:
if sub_format == "dtu":
    self._coord_trans_world = torch.tensor(
        [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]],
        dtype=torch.float32,
    )
    self._coord_trans_cam = torch.tensor(
        [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]],
        dtype=torch.float32,
    )
pose = (
    self._coord_trans_world
    @ torch.tensor(pose, dtype=torch.float32)
    @ self._coord_trans_cam
)
for NMR in DVRDataset, you do:
else:
    self._coord_trans_world = torch.tensor(
        [[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]],
        dtype=torch.float32,
    )
    self._coord_trans_cam = torch.tensor(
        [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]],
        dtype=torch.float32,
    )
pose = (
    self._coord_trans_world
    @ torch.tensor(pose, dtype=torch.float32)
    @ self._coord_trans_cam
)
However, for SRN dataset in SRNDataset, you do:
self._coord_trans = torch.diag(
    torch.tensor([1, -1, -1, 1], dtype=torch.float32)
)
pose = pose @ self._coord_trans
First, I understand that your camera coordinate system is x right, y up, z out (to my knowledge, this is the OpenGL style coordinate system). Then for DVR, they use the OpenCV system, where x right, y down, z in. Also, I have checked SRN, they use the OpenCV style camera coordinate system as well.
Second, after seeing #2, I also understand why there is a coordinate transform step. Because you want to use the Rt matrix, which is expressed in the OpenCV coordinate system, to transform cam_unproj_map to world space, you first have to transform cam_unproj_map from OpenGL style into OpenCV style by multiplying by self._coord_trans_cam, then apply the original camera2world transform pose, and finally multiply by self._coord_trans_world to transform back to the OpenGL-style system. This is something that I can understand. (Also refer to stackoverflow link)
However, here are some questions:
pose = pose @ self._coord_trans in SRNDataset, while self._coord_trans_world @ torch.tensor(pose, dtype=torch.float32) @ self._coord_trans_cam in DVRDataset?
self._coord_trans_world and self._coord_trans_cam in DVRDataset?
Best,
Xingyi Li
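The difference between right-multiplying and left-multiplying a pose by such a matrix can be seen on a small example. The sketch below uses NumPy and a made-up camera-to-world pose; it shows only the linear algebra and is not a statement about why the two dataset classes differ.

```python
import numpy as np

# Hypothetical camera-to-world pose in OpenCV convention (x right, y down, z forward).
pose_cv = np.eye(4)
pose_cv[:3, 3] = [0.0, 0.0, -2.0]   # camera placed 2 units behind the origin

coord_trans = np.diag([1.0, -1.0, -1.0, 1.0])

# Right-multiplication acts on the camera side: it negates the camera's y and z
# axes (OpenCV -> OpenGL style) and leaves the camera position untouched.
pose_cam_side = pose_cv @ coord_trans

# Left-multiplication acts on the world side: it re-expresses the whole pose,
# including the camera position, in a flipped world frame.
pose_world_side = coord_trans @ pose_cv

# The viewing direction is +z in OpenCV style and -z in OpenGL style.
forward_cv = pose_cv[:3, 2]
forward_gl = -pose_cam_side[:3, 2]
```

Under this reading, a loader that only right-multiplies changes the camera-axis convention, while one that multiplies on both sides also changes the world frame the scene is expressed in.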
Hi,
May I ask you about the number of GPUs you used for training, and the training time needed for each dataset?
Best,
Hi, thanks for your impressive work.
However, when I read your paper, it mentions that your model only needs 'relative camera pose'. I can't understand what's the difference between 'absolute camera pose' and 'relative camera pose'.
I suppose it is about world coordinates versus camera coordinates? If so, I can't understand how you get the correct positions of sampled points in the scene without using camera poses in world coordinates, since the rendered scene is in world coordinates. I also read your gen_rays() code in src/util/util.py, and there seems to be no difference between your use of camera poses and NeRF's.
Could you please explain it to me?
Thank you for your excellent work. I want to apply pixel-nerf to our own scene, for which we have the extrinsics and intrinsics. Could you give me some hints on how to replace the input with our multi-view data and calibration data, and on which pre-trained model would suit scenes that are different from chairs and cars?
Thank you very much.
Hi,
This might be a trivial question. But I'm trying to figure out how to generate what looks like a camera matrix in pose.txt
Say I have 2 images like below from internet. I know what the camera angles are θ and Φ (Spherical coordinate system). From there how do I generate the matrix in pose folder for each ?
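A pose for a camera on a sphere around the object can be built with a look-at construction. The sketch below is one way to do it under explicit assumptions: θ is the azimuth around a z-up world axis, Φ is the elevation above the xy-plane, the camera looks at the origin, and the result is a camera-to-world matrix with OpenGL-style axes (x right, y up, looking along -z). Whether pose.txt expects exactly this convention is not confirmed here; for an OpenCV-style file the y and z columns would need their signs flipped. The function name is hypothetical.

```python
import numpy as np

def pose_from_spherical(theta_deg, phi_deg, radius=1.0):
    """Camera-to-world matrix for a camera on a sphere looking at the origin.

    theta_deg: azimuth around the world z-axis; phi_deg: elevation above the
    xy-plane. Not defined for a camera exactly above or below the origin.
    """
    theta, phi = np.deg2rad(theta_deg), np.deg2rad(phi_deg)
    eye = radius * np.array([np.cos(phi) * np.cos(theta),
                             np.cos(phi) * np.sin(theta),
                             np.sin(phi)])
    back = eye / np.linalg.norm(eye)               # camera +z points away from the target
    right = np.cross(np.array([0.0, 0.0, 1.0]), back)
    right /= np.linalg.norm(right)
    up = np.cross(back, right)
    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = up
    c2w[:3, 2] = back
    c2w[:3, 3] = eye
    return c2w
```

Calling pose_from_spherical(0, 0, 2.0) places the camera at (2, 0, 0) looking toward the origin; flattening the 4x4 matrix row by row gives 16 numbers that can be written to a text file.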
when run with:
python eval/gen_video.py --name sn64_unseen -D /data/private/pixel_nerf_data/NMR_Dataset --gpu_id 0 --split test -P 2 -S 0
Hi, thanks for the great work.
While reading the code I find the normalize_z setting a bit confusing.
In
pixel-nerf/src/model/models.py
Line 171 in a5a5142
when running the real example
Traceback (most recent call last):
  File "scripts/preproc.py", line 58, in <module>
    from detectron2 import model_zoo
  File "/usr/local/lib/python3.6/dist-packages/detectron2/model_zoo/__init__.py", line 7, in <module>
    from .model_zoo import get, get_config_file, get_checkpoint_url
  File "/usr/local/lib/python3.6/dist-packages/detectron2/model_zoo/model_zoo.py", line 6, in <module>
    from detectron2.checkpoint import DetectionCheckpointer
  File "/usr/local/lib/python3.6/dist-packages/detectron2/checkpoint/__init__.py", line 7, in <module>
    from .detection_checkpoint import DetectionCheckpointer
  File "/usr/local/lib/python3.6/dist-packages/detectron2/checkpoint/detection_checkpoint.py", line 5, in <module>
    import detectron2.utils.comm as comm
AttributeError: module 'detectron2' has no attribute 'utils'
Thank you for releasing the source code.
While reading the code, I wondered where the target view is, because I could not find an RGB loss between the predicted target view and the ground-truth target view.
Hi,
In my experiments, I found that the visualized images in ./visuals sometimes turn black during early epochs, for either the coarse or the fine output (the model seems to have collapsed?). I am not sure whether this is due to my modification of the code or whether the model itself is sensitive to the initial weights. I noticed that you save the initial weights of the model as 'backup'. Have you ever encountered this issue in your experiments?
The following comment is unclear to me:
Line 228 in ddddb6b
From my understanding of the code, I don't see the TV regularizer from the neural volumes paper around this comment. Is the regularizer used in pixelNeRF? Where is it implemented? Thanks!
Hi,
I encountered the bad image error when running the full evaluation on SRN cars, i.e., 876d92ce6a0e4bf399588eee976baae is composed of white images with no objects. I wonder how you processed it? Should I just skip it?
Thx
Nice work! and thanks for sharing the code.
Will you show any demo/ script to run the code for the scene shown in the paper later?
Thanks!
It takes about 8 minutes to render 20 frames and hours to render hundreds of frames at 192x144 on a 2080ti. Have you any idea how to accelerate the rendering?
Hello, I want to get 3D models of the reconstructed objects from the output of the network, as .obj, .ply or similar files. How do I get them? Thanks.
Hi guys, I have a quick question.
In the DTU dataloader, after the transformation step, the output pose is in camera or world space (cam-to-world or world-to-cam) ?
I am also a bit confused here. It seems that the sampled 3D points are transformed to the camera space of the input views. The xyz coordinates of these points are then fed to the MLP. I thought the 3D coordinates given to the MLP should be points in world space, as in the NeRF paper. Can you clarify this?
Hello,
I'm wondering how much GPU memory is needed. I'm using an RTX 2080 Ti with 11 GB of memory but keep getting a CUDA out of memory error, even if I add the "-R 100" flag (for the DTU experiments).
Many thanks.
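A general way to trade speed for memory at render time is to process rays in fixed-size chunks, so that peak memory depends on the chunk size rather than on the image size. The sketch below shows the pattern with NumPy; render_fn is a hypothetical stand-in for whatever maps a batch of rays to outputs, and nothing here describes what the "-R" flag itself does.

```python
import numpy as np

def render_in_chunks(render_fn, rays, chunk_size=4096):
    """Apply render_fn to rays in slices of chunk_size and concatenate the results.

    render_fn maps an (N, D) ray array to an (N, C) output array; it is a
    placeholder, not a function from the repository.
    """
    outputs = [render_fn(rays[i:i + chunk_size])
               for i in range(0, len(rays), chunk_size)]
    return np.concatenate(outputs, axis=0)
```

In a PyTorch evaluation loop the same pattern would normally run with gradient tracking disabled, since stored activations are often what exhausts memory.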
In the loader for dtu dataset there is a coordinate transform step.
Does it use the Blender coordinate system or OpenCV?
On running the gen_video.py file for the DTU dataset, the following error is there while loading the pre-trained model provided in the repository:
RuntimeError: Error(s) in loading state_dict for PixelNeRFNet:
Missing key(s) in state_dict: "poses", "image_shape", "focal", "c", "encoder.latent", "encoder.latent_scaling".
Hi,
I'm having issues running the shape-net single-category example code. I have downloaded the srn_cars.zip file from https://drive.google.com/drive/folders/1PsT3uKwqHHD2bEEHkIXB99AlIjtmrEiR as instructed. The file path looks like
├── pixel-nerf
│ ├── srn_cars
│ │ ├── cars_test
│ │ ├── cars_train
│ │ ├── cars_val
and the call is python eval/gen_video.py -n srn_cars --gpu_id=0 --split test -P '64 104' -D srn_cars/cars_test/ -S 1.
Following the source code, it seems you're trying to append these .lst labels, but I can't seem to find any files in the dataset with this postfix.
In encoder.py, line 163,
self.latent_scaling = self.latent_scaling / (self.latent_scaling - 1) * 2.0
makes self.latent_scaling a scale factor of around 2. This makes no sense to me, because in line 98,
scale = self.latent_scaling / image_size
self.latent_scaling should be an absolute resolution number. I think the two are contradictory.
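The two quoted lines are not necessarily contradictory if latent_scaling first holds the feature-map size and is then overwritten with a factor slightly above 2. The worked example below takes only the two quoted lines at face value; the sizes are made up, and the final step (uv * scale - 1) is an assumption about how the scale is used, not something quoted in this thread.

```python
import numpy as np

latent_size = np.array([64.0, 64.0])     # hypothetical feature-map width and height
image_size = np.array([128.0, 128.0])    # hypothetical input image width and height

# Line 163 as quoted: an absolute size becomes a factor slightly above 2.
latent_scaling = latent_size / (latent_size - 1) * 2.0

# Line 98 as quoted: a per-pixel scale.
scale = latent_scaling / image_size

# Assumption: pixel coordinates are then mapped to roughly [-1, 1].
uv_pixels = np.array([[0.0, 0.0], [64.0, 64.0], [128.0, 128.0]])
uv_norm = uv_pixels * scale - 1.0
```

With these numbers latent_scaling is 128/63 (about 2.03), pixel 0 maps to -1 and pixel 128 maps to 65/63 (about 1.03), which is the kind of normalized range a grid-sampling lookup expects.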
On running
python train/train.py -n dtu_exp -c conf/exp/dtu.conf -D data/rs_dtu_4 -V 3 --gpu_id=0 --resume
on DTU dataset
I get the following error.
EXPERIMENT NAME: dtu_exp
CONTINUE? yes
- Config file: conf/exp/dtu.conf
- Dataset format: dvr_dtu
- Dataset location: data/rs_dtu_4
Loading DVR dataset data/rs_dtu_4 stage train 0 objs type: dtu
Loading DVR dataset data/rs_dtu_4 stage val 0 objs type: dtu
Loading DVR dataset data/rs_dtu_4 stage test 0 objs type: dtu
dset z_near 0.1, z_far 5.0, lindisp False
Using torchvision resnet34 encoder
train dir data <data.data_util.ColorJitterDataset object at 0x1471c5566e80>
Traceback (most recent call last):
  File "train/train.py", line 344, in <module>
    trainer = PixelNeRFTrainer()
  File "train/train.py", line 83, in __init__
    super().__init__(net, dset, val_dset, args, conf["train"], device=device)
  File "/ssd_scratch/cvit/avani.gupta/pixel-nerf/train/trainlib/trainer.py", line 17, in __init__
    self.train_data_loader = torch.utils.data.DataLoader(
  File "/home/avani.gupta/anaconda3/envs/pixelnerf/lib/python3.8/site-packages/torch/utils/data/dataloader.py", line 224, in __init__
    sampler = RandomSampler(dataset, generator=generator)
  File "/home/avani.gupta/anaconda3/envs/pixelnerf/lib/python3.8/site-packages/torch/utils/data/sampler.py", line 95, in __init__
    raise ValueError("num_samples should be a positive integer "
ValueError: num_samples should be a positive integer value, but got num_samples=0
Can you clarify the structure of the rf_dtu_4 folder? Mine is rf_dtu_4/DTU/, which contains the .lst files and the scan folders.
If this is correct, can you point out what might be causing this error?
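The log above reports "0 objs" for every split, so the sampler error is a consequence of the loader finding no objects rather than a training problem. A quick way to check a layout is to count how many folders named in a list file actually exist. The sketch below assumes one sub-folder per category, each holding a text file with one object folder name per line; the exact list file name the loader looks for is not stated in this thread, so it is passed in as a parameter, and the function name is hypothetical.

```python
import os

def count_listed_objects(root, list_name):
    """Return (found, missing) object folders named in root/<category>/<list_name>."""
    found, missing = [], []
    for category in sorted(os.listdir(root)):
        list_path = os.path.join(root, category, list_name)
        if not os.path.isfile(list_path):
            continue  # no list file of that name in this category folder
        with open(list_path) as f:
            names = [line.strip() for line in f if line.strip()]
        for name in names:
            path = os.path.join(root, category, name)
            (found if os.path.isdir(path) else missing).append(path)
    return found, missing
```

If found comes back empty for the list name the loader uses, either the -D path points one level too high or too low, or the list file names do not match what the loader expects.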
I believe there may be a bug in the data loader for the SRN dataset when world_scale is not 1. Specifically, because focal is given in pixels (for example, 131.25), it should not be scaled by world_scale.
To fix this, just delete line 122 in src/data/SRNDataset.py:
if self.world_scale != 1.0:
    focal *= self.world_scale
    all_poses[:, :3, 3] *= self.world_scale
would become
if self.world_scale != 1.0:
    all_poses[:, :3, 3] *= self.world_scale
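The argument can be checked with the pinhole projection formula u = f * x / z + c: scaling the scene scales x and z by the same factor, which cancels, so a focal length in pixels should stay as it is. The sketch below uses the focal value from the report; the principal point, the point and the scale are made up for illustration.

```python
import numpy as np

def project(point_cam, focal, c):
    """Pinhole projection of a camera-space point (x, y, z) to pixel coordinates."""
    x, y, z = point_cam
    return np.array([focal * x / z + c, focal * y / z + c])

focal, c = 131.25, 64.0               # focal from the report; c is hypothetical
point = np.array([0.2, -0.1, 1.5])    # hypothetical camera-space point
world_scale = 2.0

original = project(point, focal, c)
scaled_scene = project(point * world_scale, focal, c)                # focal untouched
scaled_both = project(point * world_scale, focal * world_scale, c)   # focal also scaled
```

Here original and scaled_scene agree, while scaled_both lands on a different pixel, which is the behaviour the proposed change avoids.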
Thanks for your great work!
I am trying to test the pre-trained srn-chair model on custom data. I am working with the synthetic chair model from the NeRF paper, since the camera intrinsics and rotation matrix for each image are available.
Approach
Downsampled images to 128x128 and filled their transparent backgrounds with white.
Updated some of the parameters to match their usage in the NeRF code, specifically: elevation, z_near, z_far.
Visual results from gen_video.py have three main problems:
It seems the problems may occur because simply the chair model doesn't belong to the dataset. Still, do you have any suggestions?
Hi authors,
Great work! And thanks for sharing the very well crafted codebase. I have a question about images_0to1 in both the loss calculation and the visualization: images_0to1 = images * 0.5 + 0.5. Could you explain what it is used for?
Many thanks.
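A likely explanation, assuming the images are normalized to [-1, 1] with mean 0.5 and standard deviation 0.5 before being fed to the network, is that this expression simply undoes the normalization so that colours are back in [0, 1]. The sketch below shows the round trip on made-up pixel values; the normalization step is an assumption and is not quoted from the repository.

```python
import numpy as np

# Hypothetical 8-bit pixel values.
pixels = np.array([0, 128, 255], dtype=np.uint8)

# Assumed preprocessing: scale to [0, 1], then normalize with mean 0.5 and std 0.5.
images = (pixels / 255.0 - 0.5) / 0.5     # now in [-1, 1]

# The quoted expression inverts that normalization.
images_0to1 = images * 0.5 + 0.5          # back in [0, 1]
```

A [0, 1] range is what a renderer with sigmoid-activated colour outputs and most image-saving utilities expect, which would explain its use in both the loss and the visualization.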
I was wondering if you could please provide some more information concerning the evaluation protocol for the two object scene evaluation results. I see that there's a file named viewlist/2obj_eval_views.txt with some indices in it, but it doesn't appear to be in the format used for the other evaluations.
For these tests, are you using 1-3 specific input view indices to render a specific target viewpoint? Or are you measuring the results when rendering all non-input views?
I also noticed that the paper says that there are 50 testing views, but the test data split appears to only have 20 views, as in the training data (Sec. B.2.4). Is the data that's provided the same as what's being used for these evaluations?
Can you please add a license file? Thanks.
Dear @sxyu ,
Maybe it is not directly related to pixelnerf but I am attracted by the figure and video.
What tools did you use for making this?
Thanks,
How to render a 3D surface?