
SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

Project | Paper | YouTube | Dataset

PyTorch implementation of SPIn-NeRF. SPIn-NeRF leverages 2D priors from image inpainters and enables view-consistent inpainting of NeRFs.


Quick Start

Dependencies

After installing PyTorch according to your CUDA version, install the rest of the dependencies:

pip install -r requirements.txt

Also, install LaMa dependencies with the following:

pip install -r lama/requirements.txt

You will also need COLMAP installed to compute poses if you want to run on your own data.

Dataset preparation

Download the zip files of the dataset from here and extract them under data. Here, we provide instructions for running the statue scene; other scenes can be handled similarly, potentially with a different factor.

Extract statue.zip under data. This can be done with unzip statue.zip -d data. You might need to install unzip first with sudo apt-get install unzip.

If you want to use your own data, make sure to put it in a folder under data with the following format (note that the labels under statue/images_2/label are 1 where inpainting is needed, and 0 otherwise):

statue
├── images
│   ├── IMG_2707.jpg
│   ├── IMG_2708.jpg
│   ├── ...
│   └── IMG_2736.jpg
└── images_2
    ├── IMG_2707.png
    ├── IMG_2708.png
    ├── ...
    ├── IMG_2736.png
    └── label
        ├── IMG_2707.png
        ├── IMG_2708.png
        ├── ...
        └── IMG_2736.png

where in this example we use --factor 2, so the fitting uses 2x downsized images; we have therefore put the 2x downsized images under images_2. If your original images are larger, put the originals under images and the Nx downsized images under images_N, with N chosen based on your available GPU memory (a sketch for generating these downsized copies and labels is given at the end of this section). Also, make sure to obtain the camera parameters using COLMAP. This can be done with the following command:

python imgs2poses.py <your_datadir>

For example, for the sample statue dataset, the camera parameters can be obtained with python imgs2poses.py data/statue. Note that for this specific dataset, we have already provided the camera parameters, so you can skip running COLMAP.
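
If you need to generate the downsized copies and binary labels for your own scene, the following minimal Python sketch shows one way to do it. It is only a sketch: it assumes Pillow and NumPy are installed, the SCENE and FACTOR values are illustrative, and raw_masks is a hypothetical folder holding your original (possibly 0-255) masks.

import os
import numpy as np
from PIL import Image

SCENE = "data/statue"  # illustrative scene directory
FACTOR = 2             # downsizing factor; outputs go to images_<FACTOR>

src_dir = os.path.join(SCENE, "images")
dst_dir = os.path.join(SCENE, f"images_{FACTOR}")
os.makedirs(os.path.join(dst_dir, "label"), exist_ok=True)

# Downsize every original image by FACTOR and save it as PNG.
for name in sorted(os.listdir(src_dir)):
    img = Image.open(os.path.join(src_dir, name))
    w, h = img.size
    small = img.resize((w // FACTOR, h // FACTOR), Image.LANCZOS)
    small.save(os.path.join(dst_dir, os.path.splitext(name)[0] + ".png"))

# Labels must be 1 where inpainting is needed and 0 elsewhere. If your raw
# masks are 0-255 images, binarize them (the 127 threshold is an assumption).
mask_dir = os.path.join(SCENE, "raw_masks")  # hypothetical raw-mask folder
for name in sorted(os.listdir(mask_dir)):
    m = np.array(Image.open(os.path.join(mask_dir, name)).convert("L"))
    m = (m > 127).astype(np.uint8)  # 1 = inpaint, 0 = keep
    small = Image.fromarray(m).resize(
        (m.shape[1] // FACTOR, m.shape[0] // FACTOR), Image.NEAREST)
    small.save(os.path.join(dst_dir, "label",
                            os.path.splitext(name)[0] + ".png"))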

Running an initial NeRF to get the depths

First, render disparities from the training views with the following commands:

rm -r LaMa_test_images/*
rm -r output/label/*
python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --render_factor 1 --prepare --i_weight 1000000000 --i_video 1000000000 --i_feat 4000 --N_iters 4001 --expname statue --datadir ./data/statue --factor 2 --N_gt 0

After this, the rendered disparities (inverse depths) are ready at lama/LaMa_test_images, with their corresponding labels at lama/LaMa_test_images/label.
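
Before moving on to LaMa, it can help to confirm that every rendered disparity has a matching label. The following is a small optional sanity check (not part of the pipeline), using only the paths stated above:

import os

disp_dir = "lama/LaMa_test_images"
label_dir = os.path.join(disp_dir, "label")

# Compare the PNG file names in the two folders; they should match one-to-one.
disps = {f for f in os.listdir(disp_dir) if f.endswith(".png")}
labels = {f for f in os.listdir(label_dir) if f.endswith(".png")}
missing = disps - labels
print(f"{len(disps)} disparities, {len(labels)} labels, {len(missing)} missing")
assert not missing, f"labels missing for: {sorted(missing)[:5]}"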

Running LaMa to generate geometry and appearance guidance

First, let's run LaMa to generate depth priors:

cd lama

Now, make sure to follow the LaMa instructions for downloading the big-lama model.

export TORCH_HOME=$(pwd) && export PYTHONPATH=$(pwd)
python bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output

Now, the inpainted disparities are ready at lama/output/label. Copy these images and put them under data/statue/images_2/depth. This can be done with the following:

dataset=statue
factor=2

rm -r ../data/$dataset/images_$factor/depth
mkdir ../data/$dataset/images_$factor/depth
cp ./output/label/*.png ../data/$dataset/images_$factor/depth

Now, let's generate the inpainted RGB images:

dataset=statue
factor=2

rm -r LaMa_test_images/*
rm -r output/label/*
cp ../data/$dataset/images_$factor/*.png LaMa_test_images
mkdir LaMa_test_images/label
cp ../data/$dataset/images_$factor/label/*.png LaMa_test_images/label
python bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output
rm -r ../data/$dataset/images_$factor/lama_images
mkdir ../data/$dataset/images_$factor/lama_images
cp ../data/$dataset/images_$factor/*.png ../data/$dataset/images_$factor/lama_images
cp ./output/label/*.png ../data/$dataset/images_$factor/lama_images

The inpainted RGB images are now ready under lama/output/label and have been copied to data/statue/images_2/lama_images. The dataset folder should now look like this:

statue
├── colmap_depth.npy
├── images
│   ├── IMG_2707.jpg
│   ├── ...
│   └── IMG_2736.jpg
├── images_2
│   ├── depth
│   │   ├── img000.png
│   │   ├── ...
│   │   └── img028.png
│   ├── IMG_2707.png
│   ├── IMG_2708.png
│   ├── ...
│   ├── IMG_2736.png
│   ├── label
│   │   ├── IMG_2707.png
│   │   ├── ... 
│   │   └── IMG_2736.png
│   └── lama_images
│       ├── IMG_2707.png
│       ├── ...
│       └── IMG_2736.png
└── sparse

Let's move back to the main directory with cd ..
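
Before launching the final optimization, you can quickly verify that the folder matches the tree above. Below is a minimal, optional check (scene name and factor as used so far):

import os

scene, factor = "data/statue", 2
base = f"{scene}/images_{factor}"

# Count the PNGs in the image folder and each expected subfolder.
for sub in ["", "depth", "label", "lama_images"]:
    path = os.path.join(base, sub)
    n = len([f for f in os.listdir(path) if f.endswith(".png")])
    print(f"{path}: {n} PNGs")

assert os.path.exists(f"{scene}/colmap_depth.npy"), "colmap_depth.npy missing"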

Running the multiview inpainter

Now, the following command starts the optimization of the final inpainted NeRF. A video of the inpainted NeRF is saved every i_video iterations, and the fitting runs for N_iters iterations. A sample rendering from a random viewpoint is saved to test_renders every i_feat iterations, which can be used for early sanity checks and hyperparameter tuning.

python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --i_feat 200 --lpips --i_weight 1000000000000 --i_video 1000 --N_iters 10001 --expname statue --datadir ./data/statue --N_gt 0 --factor $factor

Note that our experiments were done on NVIDIA A6000 GPUs. When running on GPUs with less memory, you might get out-of-memory errors. To prevent that, try increasing the --lpips_render_factor and --patch_len_factor arguments, or reducing --lpips_batch_size.

Notes on mask dilation

Please note that, as mentioned in the paper, the masks are dilated by default with a 5x5 kernel for 5 iterations, to ensure that the entire object is masked and that the effects of shadows cast by the unwanted object on the scene are reduced. If you wish to alter the dilation, first change the dilation that the LaMa model applies when generating the inpaintings, at the following line of lama/saicinpainting/evaluation/refinement.py:

tmp = cv2.dilate(tmp.cpu().numpy().astype('uint8'), np.ones((5, 5), np.uint8), iterations=5)

Then, you also need to change the LLFF loader in DS_NeRF/load_llff.py so that it loads the masks with the same dilation applied. In this file, the following line is responsible for the dilation:

msk = cv2.dilate(msk, np.ones((5, 5), np.uint8), iterations=5)
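
For reference, the sketch below previews how a given kernel size and iteration count grow a mask, so you can tune the two lines above consistently. It is a standalone illustration; the mask path is a placeholder.

import cv2
import numpy as np

# Placeholder mask path; labels are 1 where inpainting is needed, 0 elsewhere.
msk = cv2.imread("data/statue/images_2/label/IMG_2707.png", cv2.IMREAD_GRAYSCALE)

before = int((msk > 0).sum())
# Same operation as in the two files above: 5x5 kernel, 5 iterations.
dilated = cv2.dilate(msk, np.ones((5, 5), np.uint8), iterations=5)
after = int((dilated > 0).sum())

print(f"masked pixels: {before} -> {after} (+{after - before})")
cv2.imwrite("dilated_preview.png", dilated * 255)  # scaled for visibility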

BibTeX

If you find SPIn-NeRF useful in your work, please consider citing it:

@inproceedings{spinnerf,
      title={{SPIn-NeRF}: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields}, 
      author={Ashkan Mirzaei and Tristan Aumentado-Armstrong and Konstantinos G. Derpanis and Jonathan Kelly and Marcus A. Brubaker and Igor Gilitschenski and Alex Levinshtein},
      year={2023},
      booktitle={CVPR},
}


spin-nerf's Issues

Datasets on your project page

Hi,
Thank you for your great work! I'm wondering how to get the datasets shown on your project page (e.g., piano).

Output size is too small

How can I solve the following problem when running on a 3070 laptop?
RuntimeError: Given input size: (256x1x2). Calculated output size: (256x0x1). Output size is too small.

Error running run_nerf.py

Hi! I'm testing your solution on my computer and, after installing everything, when I run

python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --render_factor 1 --prepare --i_weight 1000000000 --i_video 1000000000 --i_feat 4000 --N_iters 4001 --expname statue --datadir ./data/statue --factor 2 --N_gt 0

I get this error:

File "/home/david/miniconda3/envs/spinnerf/lib/python3.8/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level)
File "/home/david/miniconda3/envs/spinnerf/lib/python3.8/site-packages/cv2/typing/__init__.py", line 169, in <module> LayerId = cv2.dnn.DictValue
AttributeError: module 'cv2.dnn' has no attribute 'DictValue'

Do you have any clue as to why? I found online that I should comment out that line in the __init__.py, but doing that gets me this new error:

File "/home/david/miniconda3/envs/spinnerf/lib/python3.8/site-packages/imageio/__init__.py", line 97, in imread return imread_v2(uri, format=format, **kwargs)
File "/home/david/miniconda3/envs/spinnerf/lib/python3.8/site-packages/imageio/v2.py", line 360, in imread result = file.read(index=0, **kwargs)
TypeError: read() got an unexpected keyword argument 'ignoregamma'

So I left it as it was.

Any suggestions?

Statue Dataset Label All Black

Hi,
Thank you for releasing this great work on 3D inpainting! However, I encountered an issue with the sample statue dataset: all the masks under statue/images_2/label are zero. I wonder if this is expected? Thanks in advance for your response.

Cannot reproduce the segmentation results

I have tried DS_NeRF/run_nerf.py and found that the mask generated by the semantic NeRF is worse than the provided mask. Are there any suggestions?

How many steps was the NeRF trained for to generate paper results?

In the paper, how many steps was the NeRF trained for to generate the results in Table 2 (training on your dataset)?

Should I directly run the code in the README to reproduce your paper results?
python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --i_feat 200 --lpips --i_weight 1000000000000 --i_video 1000 --N_iters 10001 --expname statue --datadir ./data/spinnerf-dataset/1 --N_gt 0 --factor 4

Thanks!

LaMa pretrained weights cannot be accessed

Hi,
Thanks for your awesome work. As mentioned in the README, we need the LaMa big-lama model to reproduce your results. However, the download link for the pretrained LaMa weights is currently invalid.

Can you provide the big-lama model weights? That would be very helpful for understanding your work.

Thanks in advance.

colmap_depth.npy

Can I ask how colmap_depth.npy is obtained? Is it just the result of running COLMAP to get depth for the images?

Output contains artifacts in the first stage

Hi, @ashmrz

Thank you for sharing your excellent work!
I'm trying to reproduce your results, and I can do so with the "statue" scene.

But with other scenes, the NeRF in the first stage contains artifacts. If I understand correctly, the first-stage NeRF aims to produce depth maps, so it should be trained using only the non-inpainted images with masks.
Currently, I have two problems, which I describe below.

1. When no implementation is changed.

This path loads all 100 images, including the GT images. For the GT images, all pixels are masked out, following this line (I'm not sure).
I then obtained the result below: the target object, a watering can, is semi-transparent. I think the GT images, which contain no target object, leak into the results.

10_vanilla_lpips_False_prepare_True_002000_rgb.mp4

2. Using only 60 images.

The provided camera parameters were optimized by COLMAP using all 100 images, as described in the COLMAP log at spinnerf-dataset/${scene}/colmap_output.txt. So simply switching to 60 images is not what the implementation anticipates; indeed, as shown in the video, the result looks very bad.

If I re-generate the camera parameters using only 60 images, your method works well, as in the statue scene. But this contradicts the published COLMAP information.

10_vanilla_lpips_False_prepare_True_001000_rgb.mp4

Question

Is the method of estimating camera parameters using COLMAP with only 60 images (problem 2) the correct way to reproduce your method?

I apologize in advance if my understanding is incorrect.

A question about processing data

Hello, thanks for your excellent work!

I want to ask how you process 360° unbounded scenes; I can only find the LLFF data-loading code.

Thanks a lot

Meaning of variables

Hello! Thanks for your excellent work!

Could you please tell me what these variables mean: rays_rgb, rays_inp, and rays_rgb_clf (I think the last one refers to the region of the image outside the mask)?

resize factor

Did you use a resize factor of 2 or 4 for your own dataset?

Colmap output of spin-nerf-data

Thanks for the great work.
I want to ask why both the images with the object removed (GT) and the images with the object present are sent to COLMAP together to generate the camera poses.
Shouldn't these two sets of images be sent to COLMAP separately?
So when we run on the SPIn-NeRF dataset, do we have to re-feed the images with the object into COLMAP for preprocessing before we can continue running SPIn-NeRF?
