
chenhsuanlin / signed-distance-srn

124 stars, 4 watchers, 18 forks, 8.08 MB

SDF-SRN: Learning Signed Distance 3D Object Reconstruction from Static Images 🎯 (NeurIPS 2020)

License: MIT License

Python 99.06% Makefile 0.60% C++ 0.34%

signed-distance-srn's People

Contributors

chenhsuanlin


signed-distance-srn's Issues

ShapeNet multi-view data training does not work

Hi,
I use the following command to train on the ShapeNet multi-view data (single-view training works fine):
python3 train.py --model=sdf_srn --yaml=options/shapenet/sdf_srn.yaml --name=chair --data.shapenet.cat=chair --max_epoch=28
but the results are not good. I trained several times, and all experiments on multi-view data give similar results: the Chamfer distance does not change during the whole training process, the reconstructed image is similar to the ground truth, and the depth map is also okay, but the predicted normal map is bad and the .ply is empty.
(attached images: 0_image_recon, 0_depth, 0_normal, 0_normal_masked)

How to test

I want to test this network on a single picture and directly generate output (such as .ply files). Do you have a test script?
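For reference, the kind of output I'm after is roughly this sketch (a generic marching-cubes export with a placeholder sdf_fn standing in for the trained network; I don't know how your checkpoints should be loaded):

    import numpy as np
    import torch
    from skimage import measure
    import trimesh

    def sdf_fn(points):
        # Placeholder: unit-sphere SDF; replace with the trained implicit network.
        return points.norm(dim=-1) - 0.5

    res = 128
    axis = np.linspace(-1, 1, res)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)  # [res,res,res,3]
    with torch.no_grad():
        sdf = sdf_fn(torch.from_numpy(grid.reshape(-1, 3)).float())
    sdf = sdf.reshape(res, res, res).numpy()
    # Extract the zero level set and write a .ply mesh.
    verts, faces, normals, _ = measure.marching_cubes(sdf, level=0.0, spacing=(2/(res-1),)*3)
    verts -= 1.0  # map back to the [-1,1]^3 cube
    trimesh.Trimesh(vertices=verts, faces=faces, vertex_normals=normals).export("output.ply")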

Pascal Evaluation

Hi @chenhsuanlin, thanks for the great work! I have a few questions about your Pascal evaluation. I noticed you mentioned in the paper that you applied ICP for the Pascal point-cloud comparison but not for ShapeNet. I'm wondering what the specific reason is for using ICP on Pascal. Also, are the segmentation masks you are using the unoccluded versions from mesh projection?
Thanks!
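For reference, this is the kind of ICP alignment I assume happens before computing the Chamfer distance on Pascal (a rough sketch using Open3D; I don't know the exact settings you used):

    import numpy as np
    import open3d as o3d

    def icp_align(pred_points, gt_points, threshold=0.05):
        # Rigidly align the predicted point cloud to the ground truth with
        # point-to-point ICP; threshold is a hypothetical correspondence distance.
        src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pred_points))
        tgt = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(gt_points))
        reg = o3d.pipelines.registration.registration_icp(
            src, tgt, threshold, np.eye(4),
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        src.transform(reg.transformation)
        return np.asarray(src.points)  # aligned prediction, ready for Chamfer evaluation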

ShapeNet camera

Hello, for the ShapeNet data used in this repo, is the camera matrix the projection matrix for the centered and normalized point clouds, rather than for the original unnormalized .obj meshes in ShapeNet? Thanks!
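Just to make sure I'm asking the right thing, by "projection matrix for the centered and normalized point clouds" I mean something like the sketch below (a hypothetical illustration, with center and scale being the normalization applied to the ShapeNet model; not your code):

    import numpy as np

    def project(K, R, t, X_obj, center, scale):
        # Project the *normalized* shape rather than the raw .obj coordinates.
        X_norm = (X_obj - center) / scale          # centered and normalized shape, [N,3]
        x_cam = (R @ X_norm.T + t[:, None]).T      # camera coordinates
        x_img = (K @ x_cam.T).T
        return x_img[:, :2] / x_img[:, 2:3]        # perspective divide -> pixel coordinates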

Using the pre-trained implicit network to render novel views?

I can render novel views using the reconstructed mesh, but I was wondering whether it is possible to render new views of, or zoom into, the 3D model using the implicit network alone. I tried changing the camera distance to zoom, but that does not lead to any change in the reconstructed RGB view.
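For context, the kind of rendering I have in mind is plain sphere tracing of the learned SDF from an arbitrary camera, along the lines of this rough sketch (sdf_fn is a placeholder for the trained implicit network; this is not your actual renderer):

    import torch

    def sphere_trace(sdf_fn, ray_origins, ray_dirs, num_steps=50):
        # March each ray forward by the current SDF value; converged rays end on the surface.
        depth = torch.zeros(ray_origins.shape[0])
        for _ in range(num_steps):
            points = ray_origins + depth[:, None] * ray_dirs
            depth = depth + sdf_fn(points)
        return ray_origins + depth[:, None] * ray_dirs

    # Zooming in would then just mean moving ray_origins closer to the object
    # (and recomputing ray_dirs for the new camera) before tracing again.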

Where to specify GPUs to use?

Hi! I really appreciate your work. However, I encountered the error "RuntimeError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 10.92 GiB total capacity; 833.88 MiB already allocated; 58.00 MiB free; 938.00 MiB reserved in total by PyTorch)", and I found that GPU 0 is being used by others. I want to use another GPU, but it is not obvious where to specify which GPUs to use. May I know where to specify the GPUs?
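For anyone else hitting this: as a generic workaround, you can restrict the devices visible to PyTorch through the CUDA_VISIBLE_DEVICES environment variable when launching training (I have not checked whether the options files also expose a device setting), e.g.

    CUDA_VISIBLE_DEVICES=1 python3 train.py --model=sdf_srn --yaml=options/shapenet/sdf_srn.yaml --name=chair --data.shapenet.cat=chair --max_epoch=28

The chosen GPU then shows up inside the process as cuda:0.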

Rendering script for ShapeNet

Hello, would it be possible for you to share the rendering script for the ShapeNet inputs and the generation script for the point clouds? Thank you.

ShapeNet dataset questions

Hi @chenhsuanlin, thanks for releasing this great work! I am reading your code, and the camera rotation augmentation does not quite make sense to me. Why is the rotation about the X axis instead of the Z axis?
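For concreteness, this is what I mean by the two options; which one corresponds to an azimuth rotation depends on which axis the camera convention treats as "up" (just the generic rotation matrices, not your code):

    import numpy as np

    def rot_x(theta):  # rotation about the X axis
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[1, 0, 0],
                         [0, c, -s],
                         [0, s, c]])

    def rot_z(theta):  # rotation about the Z axis
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, -s, 0],
                         [s, c, 0],
                         [0, 0, 1]])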
Another quick question about the data: I remember there are 100,000 points for evaluation in works like DVR. It seems that you created a pointcloud3.npz containing 30,000 points by excluding interior points, but you still sample 100,000 points on the predicted mesh. Can you briefly elaborate on why you use different numbers of samples (yours vs. DVR, and your ground truth vs. your prediction)?

Thank you!

Input image resolution

Thanks for sharing the code, it's great work!
I was wondering whether a higher-resolution input RGB image would improve the quality of the output 3D model.

Further detail on ray_intersection_loss and ray_freespace_loss

Great work with the paper; it has been a pleasure to read and to test the code. While perusing the code, I wanted to better understand the loss functions you used.

Specifically these (Reference 7 in the paper, and ray_intersection_loss and ray_freespace_loss in the code).

(attached: screenshot of the referenced equations from the paper)

Copying the code below for further reference:

    def ray_intersection_loss(self,opt,var,level_eps=0.01):
        batch_size = len(var.idx)
        level_in = var.level_all[...,-1:] # level at the last ray-marching step, [B,HW,1]
        weight = 1/(var.dt_input+1e-8) if opt.impl.importance else None # the 1e-8 guards against division by zero
        if opt.impl.occup:
            loss = self.BCE_loss(level_in,var.mask_input,weight=weight)
        else:
            # inside the mask, push the last level below -level_eps; outside, push it above +level_eps
            loss = self.L1_loss((level_in+level_eps).relu_(),weight=weight,mask=var.mask_input.bool()) \
                  +self.L1_loss((-level_in+level_eps).relu_(),weight=weight,mask=~var.mask_input.bool())
        return loss

    def ray_freespace_loss(self,opt,var,level_eps=0.01):
        level_out = var.level_all[...,:-1] # levels at all but the last step, [B,HW,N-1]
        if opt.impl.occup:
            loss = self.BCE_loss(level_out,torch.tensor(0.,device=opt.device))
        else:
            # free space along the ray: push these levels above +level_eps
            loss = self.L1_loss((-level_out+level_eps).relu_())
        return loss

I am not able to fully map this code onto the formula in the paper:

  1. Why do you add 1e-8? Is it supposed to be the epsilon? I thought level_eps was the epsilon.
  2. How do level_in and level_eps correspond to the terms in the formula?

It would be great if you could expand on this topic.

PS: Why do some shapes have that kind of wavy pattern? Is it because of the eikonal term? Why did you use the MSE_loss there?
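For context, by "the eikonal" I mean the usual regularizer that pushes the SDF gradient norm toward 1; with an MSE penalty it would look something like this generic sketch (not necessarily your exact implementation; sdf_fn is a placeholder for the network):

    import torch

    def eikonal_loss(sdf_fn, points):
        # Generic eikonal regularizer: an MSE penalty on (|grad f(x)| - 1),
        # encouraging the network to behave like a valid signed distance function.
        points = points.clone().requires_grad_(True)
        sdf = sdf_fn(points)
        grad, = torch.autograd.grad(sdf.sum(), points, create_graph=True)
        return ((grad.norm(dim=-1) - 1) ** 2).mean()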

Thanks in advance for your help and availability.

Train with other datasets

Would it be possible to train SDF-SRN on another dataset?
If so, what should I prepare for the training process?
Thanks.

Rendering script for Pascal3D+

Thanks for sharing the code, it's great work!
Can you provide the script for the Pascal3D+ dataset to generate masks and ground-truth point clouds for other categories? Thank you very much!

Sphere-like structure

Hi, this is an excellent paper with a very neat implementation! In particular, it removes the pain of acquiring ground-truth signed distance fields for 3D supervision, as in IM-Net and OccNet, and it can be applied to real-world images. But some artifacts still exist. Here are my two questions:

  1. Sphere-like structures appear, especially for Pascal3D+ with orthographic projection, when a part is very thin, for example a chair's leg (Figure 6 in the paper). I guess the reason for these sphere-like structures is that you optimize the signed distance f(z_u) to match D(u) (formula (4) in the paper and formula (13) in the supplementary); these two formulas assume that the predicted 3D signed distance should look like its 2D counterpart, which is a circle in 2D and a sphere in 3D. But when the part is very thin, meaning there are only a few pixels in that region, the signed distance is dominated by those few pixels, and the SDF values are too small to be optimized well, leading to artifacts that look like tanghulu (糖葫芦, candied fruit on a skewer). My questions are: are there other reasons for the sphere-like artifacts? How could this artifact be eliminated (detect the thin part by estimating the width of the SDF field in the horizontal direction and assign 'focal weights' to the thin-part loss? I am not very sure...).

  2. 'Constrain z* to fall within the last two ray-marching steps by encouraging the N-th step to be negative and the first (N-1) steps to be positive.' This sentence confuses me because it seems to assume there is only one intersection with the surface. Actually there should be two intersections: the first with the front surface, s.t. z(N-1) > 0 and z(N) < 0, and the second with the back surface, s.t. z(N-1) < 0 and z(N) > 0. There should be two places where 'sign changes' happen. I believe you already consider these two cases, see line 144 (side = ((y0<0)^(y2>0))) in implicit.py, but the sentence in the paper is still confusing. In my opinion, it should be 'encourage z(N) > 0 and z(N-1) < 0 for the back-surface intersection, and z(N-?) < 0 and z(N-?-1) > 0 for the front-surface intersection, with ? >= 1', where ? also depends on the 'width of the SDF field' as mentioned before. Maybe I am wrong; I hope to see your reply.
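For clarity, here is what I mean by the two sign-change cases (a generic sketch of my understanding, not your exact code):

    import torch

    def surface_crossings(level_all):
        # level_all: [B, HW, N] SDF values at the N ray-marching steps (hypothetical layout).
        y_prev, y_next = level_all[..., :-1], level_all[..., 1:]
        front = (y_prev > 0) & (y_next < 0)   # entering the object: front-surface intersection
        back = (y_prev < 0) & (y_next > 0)    # exiting the object: back-surface intersection
        sign_change = front | back            # what a plain sign test (like the XOR check) flags
        return front, back, sign_change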
