
SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

Project | Paper | YouTube | Dataset

PyTorch implementation of SPIn-NeRF. SPIn-NeRF leverages 2D priors from image inpainters and enables view-consistent inpainting of NeRFs.


Quick Start

Dependencies

After installing PyTorch according to your CUDA version, install the rest of the dependencies:

pip install -r requirements.txt

Also, install LaMa dependencies with the following:

pip install -r lama/requirements.txt

You will also need COLMAP installed to compute poses if you want to run on your own data.

Dataset preparation

Download the zip files of the dataset from here and extract them under data. Here, we provide instructions for running the statue scene; other scenes can be handled similarly, potentially with a different factor.

Extract statue.zip under data. This can be done with unzip statue.zip -d data. You might need to install unzip first with sudo apt-get install unzip.

If you want to use your own data, make sure to put it in a folder under data with the following format (note that the labels under statue/images_2/label are 1 where inpainting is needed, and 0 otherwise):

statue
├── images
│   ├── IMG_2707.jpg
│   ├── IMG_2708.jpg
│   ├── ...
│   └── IMG_2736.jpg
└── images_2
    ├── IMG_2707.png
    ├── IMG_2708.png
    ├── ...
    ├── IMG_2736.png
    └── label
        ├── IMG_2707.png
        ├── IMG_2708.png
        ├── ...
        └── IMG_2736.png

where in this example we use --factor 2, so the fitting uses 2x downsized images; we have therefore put the 2x downsized images under images_2. If your original images are larger, put the originals under images and the Nx downsized images under images_N, with N chosen based on your available GPU memory (a sketch for generating these downsized copies and labels is given at the end of this section). Also, make sure to obtain the camera parameters using COLMAP. This can be done with the following command:

python imgs2poses.py <your_datadir>

For example, for the sample statue dataset, the camera parameters can be obtained with python imgs2poses.py data/statue. Note that for this specific dataset, we have already provided the camera parameters, so you can skip running COLMAP.
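
If you need to generate the downsized copies and binary labels for your own scene, the following minimal Python sketch shows one way to do it. It is only a sketch: it assumes Pillow and NumPy are installed, the SCENE and FACTOR values are illustrative, and raw_masks is a hypothetical folder holding your original (possibly 0-255) masks.

import os
import numpy as np
from PIL import Image

SCENE = "data/statue"  # illustrative scene directory
FACTOR = 2             # downsizing factor; outputs go to images_<FACTOR>

src_dir = os.path.join(SCENE, "images")
dst_dir = os.path.join(SCENE, f"images_{FACTOR}")
os.makedirs(os.path.join(dst_dir, "label"), exist_ok=True)

# Downsize every original image by FACTOR and save it as PNG.
for name in sorted(os.listdir(src_dir)):
    img = Image.open(os.path.join(src_dir, name))
    w, h = img.size
    small = img.resize((w // FACTOR, h // FACTOR), Image.LANCZOS)
    small.save(os.path.join(dst_dir, os.path.splitext(name)[0] + ".png"))

# Labels must be 1 where inpainting is needed and 0 elsewhere. If your raw
# masks are 0-255 images, binarize them (the 127 threshold is an assumption).
mask_dir = os.path.join(SCENE, "raw_masks")  # hypothetical raw-mask folder
for name in sorted(os.listdir(mask_dir)):
    m = np.array(Image.open(os.path.join(mask_dir, name)).convert("L"))
    m = (m > 127).astype(np.uint8)  # 1 = inpaint, 0 = keep
    small = Image.fromarray(m).resize(
        (m.shape[1] // FACTOR, m.shape[0] // FACTOR), Image.NEAREST)
    small.save(os.path.join(dst_dir, "label",
                            os.path.splitext(name)[0] + ".png"))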

Running an initial NeRF to get the depths

First, render disparities from the training views with the following commands:

rm -r LaMa_test_images/*
rm -r output/label/*
python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --render_factor 1 --prepare --i_weight 1000000000 --i_video 1000000000 --i_feat 4000 --N_iters 4001 --expname statue --datadir ./data/statue --factor 2 --N_gt 0

After this, the rendered disparities (inverse depths) are ready at lama/LaMa_test_images, with their corresponding labels at lama/LaMa_test_images/label.
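
Before moving on to LaMa, it can help to confirm that every rendered disparity has a matching label. The following is a small optional sanity check (not part of the pipeline), using only the paths stated above:

import os

disp_dir = "lama/LaMa_test_images"
label_dir = os.path.join(disp_dir, "label")

# Compare the PNG file names in the two folders; they should match one-to-one.
disps = {f for f in os.listdir(disp_dir) if f.endswith(".png")}
labels = {f for f in os.listdir(label_dir) if f.endswith(".png")}
missing = disps - labels
print(f"{len(disps)} disparities, {len(labels)} labels, {len(missing)} missing")
assert not missing, f"labels missing for: {sorted(missing)[:5]}"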

Running LaMa to generate geometry and appearance guidance

First, let's run LaMa to generate depth priors:

cd lama

Now, make sure to follow the LaMa instructions for downloading the big-lama model.

export TORCH_HOME=$(pwd) && export PYTHONPATH=$(pwd)
python bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output

Now, the inpainted disparities are ready at lama/output/label. Copy these images and put them under data/statue/images_2/depth. This can be done with the following:

dataset=statue
factor=2

rm -r ../data/$dataset/images_$factor/depth
mkdir ../data/$dataset/images_$factor/depth
cp ./output/label/*.png ../data/$dataset/images_$factor/depth

Now, let's generate the inpainted RGB images:

dataset=statue
factor=2

rm -r LaMa_test_images/*
rm -r output/label/*
cp ../data/$dataset/images_$factor/*.png LaMa_test_images
mkdir LaMa_test_images/label
cp ../data/$dataset/images_$factor/label/*.png LaMa_test_images/label
python bin/predict.py refine=True model.path=$(pwd)/big-lama indir=$(pwd)/LaMa_test_images outdir=$(pwd)/output
rm -r ../data/$dataset/images_$factor/lama_images
mkdir ../data/$dataset/images_$factor/lama_images
cp ../data/$dataset/images_$factor/*.png ../data/$dataset/images_$factor/lama_images
cp ./output/label/*.png ../data/$dataset/images_$factor/lama_images

The inpainted RGB images are now ready under lama/output/label and have been copied to data/statue/images_2/lama_images. The dataset folder should now look like this:

statue
├── colmap_depth.npy
├── images
│   ├── IMG_2707.jpg
│   ├── ...
│   └── IMG_2736.jpg
├── images_2
│   ├── depth
│   │   ├── img000.png
│   │   ├── ...
│   │   └── img028.png
│   ├── IMG_2707.png
│   ├── IMG_2708.png
│   ├── ...
│   ├── IMG_2736.png
│   ├── label
│   │   ├── IMG_2707.png
│   │   ├── ... 
│   │   └── IMG_2736.png
│   └── lama_images
│       ├── IMG_2707.png
│       ├── ...
│       └── IMG_2736.png
└── sparse

Let's move back to the main directory with cd ..
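
Before launching the final optimization, you can quickly verify that the folder matches the tree above. Below is a minimal, optional check (scene name and factor as used so far):

import os

scene, factor = "data/statue", 2
base = f"{scene}/images_{factor}"

# Count the PNGs in the image folder and each expected subfolder.
for sub in ["", "depth", "label", "lama_images"]:
    path = os.path.join(base, sub)
    n = len([f for f in os.listdir(path) if f.endswith(".png")])
    print(f"{path}: {n} PNGs")

assert os.path.exists(f"{scene}/colmap_depth.npy"), "colmap_depth.npy missing"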

Running the multiview inpainter

Now, the following command starts the optimization of the final inpainted NeRF. A video of the inpainted NeRF is saved every i_video iterations, and the fitting runs for N_iters iterations. A sample rendering from a random viewpoint is saved to test_renders every i_feat iterations, which can be used for early sanity checks and hyperparameter tuning.

python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --i_feat 200 --lpips --i_weight 1000000000000 --i_video 1000 --N_iters 10001 --expname statue --datadir ./data/statue --N_gt 0 --factor $factor

Note that our experiments were done on NVIDIA A6000 GPUs. When running on GPUs with less memory, you might get out-of-memory errors. To prevent that, try increasing the --lpips_render_factor and --patch_len_factor arguments, or reducing --lpips_batch_size.

Notes on mask dilation

Please note that, as mentioned in the paper, the masks are dilated by default with a 5x5 kernel for 5 iterations, to ensure that the entire object is masked and that the effects of shadows cast by the unwanted object on the scene are reduced. If you wish to alter the dilation, first change the dilation that the LaMa model applies when generating the inpaintings, at the following line of lama/saicinpainting/evaluation/refinement.py:

tmp = cv2.dilate(tmp.cpu().numpy().astype('uint8'), np.ones((5, 5), np.uint8), iterations=5)

Then, you also need to change the LLFF loader in DS_NeRF/load_llff.py so that it loads the masks with the same dilation applied. In this file, the following line is responsible for the dilation:

msk = cv2.dilate(msk, np.ones((5, 5), np.uint8), iterations=5)
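
For reference, the sketch below previews how a given kernel size and iteration count grow a mask, so you can tune the two lines above consistently. It is a standalone illustration; the mask path is a placeholder.

import cv2
import numpy as np

# Placeholder mask path; labels are 1 where inpainting is needed, 0 elsewhere.
msk = cv2.imread("data/statue/images_2/label/IMG_2707.png", cv2.IMREAD_GRAYSCALE)

before = int((msk > 0).sum())
# Same operation as in the two files above: 5x5 kernel, 5 iterations.
dilated = cv2.dilate(msk, np.ones((5, 5), np.uint8), iterations=5)
after = int((dilated > 0).sum())

print(f"masked pixels: {before} -> {after} (+{after - before})")
cv2.imwrite("dilated_preview.png", dilated * 255)  # scaled for visibility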

BibTeX

If you find SPIn-NeRF useful in your work, please consider citing it:

@inproceedings{spinnerf,
      title={{SPIn-NeRF}: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields}, 
      author={Ashkan Mirzaei and Tristan Aumentado-Armstrong and Konstantinos G. Derpanis and Jonathan Kelly and Marcus A. Brubaker and Igor Gilitschenski and Alex Levinshtein},
      year={2023},
      booktitle={CVPR},
}


spin-nerf's Issues

Datasets on your project page

Hi,
Thank you for your great work! I'm wondering how to get the datasets shown on your project page (e.g., piano).

Output size is too small

How can I solve the following problem when running on a 3070 laptop?
RuntimeError: Given input size: (256x1x2). Calculated output size: (256x0x1). Output size is too small.

Error running run_nerf.py

Hi! I'm testing your solution on my computer and, after installing everything, when I run

python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --render_factor 1 --prepare --i_weight 1000000000 --i_video 1000000000 --i_feat 4000 --N_iters 4001 --expname statue --datadir ./data/statue --factor 2 --N_gt 0

I get this error:

File "/home/david/miniconda3/envs/spinnerf/lib/python3.8/importlib/__init__.py", line 127, in import_module return _bootstrap._gcd_import(name[level:], package, level)
File "/home/david/miniconda3/envs/spinnerf/lib/python3.8/site-packages/cv2/typing/__init__.py", line 169, in <module> LayerId = cv2.dnn.DictValue
AttributeError: module 'cv2.dnn' has no attribute 'DictValue'

Do you have any clue as to why? I found online that I should comment out that line in the __init__.py, but doing that gets me this new error:

File "/home/david/miniconda3/envs/spinnerf/lib/python3.8/site-packages/imageio/__init__.py", line 97, in imread return imread_v2(uri, format=format, **kwargs)
File "/home/david/miniconda3/envs/spinnerf/lib/python3.8/site-packages/imageio/v2.py", line 360, in imread result = file.read(index=0, **kwargs)
TypeError: read() got an unexpected keyword argument 'ignoregamma'

So I left it as it was.

Any suggestions?

Statue Dataset Label All Black

Hi,
Thank you for releasing this great work on 3D inpainting! However, I encountered an issue with the sample statue dataset: all the masks under statue/images_2/label are zero. I wonder if this is expected? Thanks in advance for your response.

Cannot reproduce the segmentation results

I have tried DS_NeRF/run_nerf.py and found that the mask generated by the semantic NeRF is worse than the provided mask. Are there any suggestions?

How many steps was the NeRF trained for to generate paper results?

In the paper, how many steps was the NeRF trained for to generate the results in Table 2 (training on your dataset)?

Should I directly run the code in the README to reproduce your paper results?
python DS_NeRF/run_nerf.py --config DS_NeRF/configs/config.txt --i_feat 200 --lpips --i_weight 1000000000000 --i_video 1000 --N_iters 10001 --expname statue --datadir ./data/spinnerf-dataset/1 --N_gt 0 --factor 4

Thanks!

LaMa pretrained weights cannot be accessed

Hi,
Thanks for your awesome work. As mentioned in the README, we need the LaMa big-lama model to reproduce your results. However, the download link for the pretrained LaMa weights is currently invalid.

Can you provide the big-lama model weights? That would be very helpful for understanding your work.

Thanks in advance.

colmap_depth.npy

Can I ask how colmap_depth.npy is obtained? Is it just the result of running COLMAP to get depth for the images?

Output contains artifacts in the first stage

Hi, @ashmrz

Thank you for sharing your excellent work!
I'm trying to reproduce your results, and I can do so with the "statue" scene.

But with other scenes, the NeRF in the first stage contains artifacts. If I understand correctly, the first-stage NeRF aims to produce depth maps, so it should be trained using only the non-inpainted images with masks.
Currently, I have two problems, which I describe below.

1. When no implementation is changed.

This path loads all 100 images, including the GT images. For the GT images, all pixels are masked out, following this line (I'm not sure).
I then obtained the result below: the target object, a watering can, is semi-transparent. I think the GT images, which contain no target object, leak into the results.

10_vanilla_lpips_False_prepare_True_002000_rgb.mp4

2. Using only 60 images.

The provided camera parameters were optimized by COLMAP using all 100 images, as described in the COLMAP log at spinnerf-dataset/${scene}/colmap_output.txt. So simply switching to 60 images is not what the implementation anticipates; indeed, as shown in the video, the result looks very bad.

If I re-generate the camera parameters using only 60 images, your method works well, as in the statue scene. But this contradicts the published COLMAP information.

10_vanilla_lpips_False_prepare_True_001000_rgb.mp4

Question

Is the method of estimating camera parameters using COLMAP with only 60 images (problem 2) the correct way to reproduce your method?

I apologize in advance if my understanding is incorrect.

A question about processing data

Hello, thanks for your excellent work!

I want to ask how you process 360° unbounded scenes; I can only find the LLFF data-loading code.

Thanks a lot

Meaning of variables

Hello! Thanks for your excellent work!

Could you please tell me what these variables mean: rays_rgb, rays_inp, and rays_rgb_clf (I think the last one refers to the region of the image outside the mask)?

resize factor

Did you use a resize factor of 2 or 4 for your own dataset?

Colmap output of spin-nerf-data

Thanks for the great work.
I want to ask why both the images with the object removed (GT) and the images with the object present are sent to COLMAP together to generate the camera poses.
Shouldn't these two sets of images be sent to COLMAP separately?
So when we run on the SPIn-NeRF dataset, do we have to re-feed the images with the object into COLMAP for preprocessing before we can continue running SPIn-NeRF?
