switch-nerf's Introduction

Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields (ICLR 2023)

Demo

Updates

  • 2023-04-13: moved the checkpoints to OneDrive.
  • 2023-03-30: stable release.
  • 2023-03-28: released the checkpoints and code for three datasets.

Installation

The main dependencies are listed in requirements.txt. We use a specific version of Tutel for the MoE layers. Tutel has changed a lot since then, so make sure to install the correct commit. Please follow the instructions in the Tutel repository to install it; we also provide our own instructions for the Tutel installation.

Dataset

We have performed experiments on datasets from Mega-NeRF, Block-NeRF and Bungee-NeRF.

Mega-NeRF

Please follow the instructions in the code of Mega-NeRF to download and process the Mill 19 and UrbanScene 3D datasets.

Block-NeRF

Please follow the website of Block-NeRF to download the raw Mission Bay dataset.

Bungee-NeRF

Please follow the instructions of BungeeNeRF to download its two scenes.

Training

Mega-NeRF scenes

We provide example commands to train the model on the Building scene.

We first generate the data chunks. The dataset_path should be set to the scene folder processed above. The exp_name is the folder used for logging results; if it does not exist, the program creates it. The chunk_paths is where the generated data chunks are stored. The chunks are reused in later experiments.

Generate the chunks. Please edit the exp_name, dataset_path and chunk_paths.

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=1 -m \
switch_nerf.train \
--config=switch_nerf/configs/switch_nerf/building.yaml \
--use_moe \
--exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/building-pixsfm \
--chunk_paths=/your/absolute/chunk/path/building_chunk_factor_1_bg \
--generate_chunk
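
Before launching training, it can help to confirm that chunk generation finished. A traceback reported in this repository's issues shows the dataset loader asserting that metadata.pt exists in the chunk directory; the sketch below performs the same check up front. The function name chunks_ready is hypothetical and not part of the repository, and the assumption that metadata.pt marks a completed directory is inferred from that assertion only.

```python
from pathlib import Path


def chunks_ready(chunk_path: str) -> bool:
    """Return True if the chunk directory contains 'metadata.pt'.

    Assumption: a finished chunk directory contains this file, as
    suggested by the assertion in filesystem_dataset.py.
    """
    return (Path(chunk_path) / "metadata.pt").is_file()


if __name__ == "__main__":
    # Placeholder path taken from the example command; edit before use.
    path = "/your/absolute/chunk/path/building_chunk_factor_1_bg"
    if chunks_ready(path):
        print("metadata.pt found: chunks look complete")
    else:
        print("metadata.pt missing: chunk generation has not completed here")
```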

Train the model on the Building scene with the generated chunks. The same chunk_paths is reused after the chunks have been generated.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=8 -m \
switch_nerf.train \
--config=switch_nerf/configs/switch_nerf/building.yaml \
--use_moe \
--exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/building-pixsfm \
--chunk_paths=/your/absolute/chunk/path/building_chunk_factor_1_bg \
--use_balance_loss \
--i_print=1000 \
--batch_size=8192 \
--moe_expert_type=expertmlp \
--moe_train_batch \
--moe_test_batch \
--model_chunk_size=131072 \
--moe_capacity_factor=1.0 \
--batch_prioritized_routing \
--moe_l_aux_wt=0.0005 \
--amp_use_bfloat16 \
--use_moe_external_gate \
--use_gate_input_norm \
--use_sigma_noise \
--sigma_noise_std=1.0

Block-NeRF scenes

We adapt a data interface mainly based on UnboundedNeRFPytorch. We first generate data chunks from the raw tf_records in the Block-NeRF dataset.

Please edit the exp_name, dataset_path and chunk_paths.

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=1 -m \
switch_nerf.train \
--config=switch_nerf/configs/switch_nerf/mission_bay.yaml \
--use_moe \
--exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/Mission_Bay/v1.0 \
--block_train_list_path=switch_nerf/datasets/lists/block_nerf_train_val.txt \
--block_image_hash_id_map_path=switch_nerf/datasets/lists/block_nerf_id_map.json \
--chunk_paths=/your/absolute/chunk/path/mission_bay_chunk_radii_1 \
--no_bg_nerf --near=0.01 --far=10.0 --generate_chunk

Then we train the model on the Mission Bay scene with the generated chunks. The batch_size is set according to the memory of an RTX 3090.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=8 -m \
switch_nerf.train \
--config=switch_nerf/configs/switch_nerf/mission_bay.yaml \
--use_moe --exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/Mission_Bay/v1.0 \
--block_train_list_path=switch_nerf/datasets/lists/block_nerf_train_val.txt \
--block_image_hash_id_map_path=switch_nerf/datasets/lists/block_nerf_id_map.json \
--chunk_paths=/your/absolute/chunk/path/mission_bay_chunk_radii_1 \
--no_bg_nerf --near=0.01 --far=10.0 \
--use_balance_loss \
--i_print=1000 \
--batch_size=13312 \
--moe_expert_type=expertmlp \
--moe_train_batch \
--moe_test_batch \
--model_chunk_size=212992 \
--coarse_samples=257 \
--fine_samples=257 \
--moe_capacity_factor=1.0 \
--batch_prioritized_routing \
--moe_l_aux_wt=0.0005 \
--amp_use_bfloat16 \
--use_moe_external_gate \
--use_gate_input_norm \
--use_sigma_noise \
--sigma_noise_std=1.0

Bungee-NeRF scenes

Bungee-NeRF scenes do not require chunk generation. We provide example commands to train the model on the Transamerica scene.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=4 -m \
switch_nerf.train_nerf_moe \
--config=switch_nerf/configs/switch_nerf/bungee.yaml \
--use_moe --exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/multiscale_google_Transamerica \
--use_balance_loss \
--i_print=1000 \
--batch_size=4096 \
--moe_expert_type=expertmlp \
--moe_train_batch \
--moe_test_batch \
--model_chunk_size=65536 \
--moe_capacity_factor=1.0 \
--batch_prioritized_routing \
--moe_l_aux_wt=0.0005 \
--no_amp \
--use_moe_external_gate \
--use_gate_input_norm \
--use_sigma_noise \
--sigma_noise_std=1.0 \
--moe_expert_num=4

The two scenes in Bungee-NeRF use the same configuration file.
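
In the three example training commands above, model_chunk_size is exactly 16 times batch_size. Whether this ratio is required is not stated here, so the sketch below is only an observation that may help when scaling both values together to fit a different GPU memory budget. The helper scaled_chunk_size is hypothetical.

```python
# (batch_size, model_chunk_size) copied from the example training commands.
EXAMPLES = {
    "building": (8192, 131072),
    "mission_bay": (13312, 212992),
    "transamerica": (4096, 65536),
}


def scaled_chunk_size(batch_size: int, ratio: int = 16) -> int:
    """Hypothetical helper: keep model_chunk_size at the observed ratio."""
    return batch_size * ratio


for scene, (batch_size, chunk_size) in EXAMPLES.items():
    # The observed ratio holds for every example command.
    assert scaled_chunk_size(batch_size) == chunk_size
    print(f"{scene}: {chunk_size} / {batch_size} = {chunk_size // batch_size}")
```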

Testing

We provide checkpoints on OneDrive.

Test on the Building scene in the Mega-NeRF dataset.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=8 -m \
switch_nerf.eval_image \
--config=switch_nerf/configs/switch_nerf/building.yaml \
--use_moe --exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/building-pixsfm \
--i_print=1000 \
--moe_expert_type=seqexperts \
--model_chunk_size=131072 \
--ckpt_path=/your/absolute/ckpt/path/building.pt \
--expertmlp2seqexperts \
--use_moe_external_gate \
--use_gate_input_norm

Test on the Mission Bay scene in the Block-NeRF dataset.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=8 -m \
switch_nerf.eval_image_blocknerf \
--config=switch_nerf/configs/switch_nerf/mission_bay.yaml \
--use_moe \
--exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/Mission_Bay/v1.0 \
--block_val_list_path=switch_nerf/datasets/lists/block_nerf_val.txt \
--block_train_list_path=switch_nerf/datasets/lists/block_nerf_train_val.txt \
--block_image_hash_id_map_path=switch_nerf/datasets/lists/block_nerf_id_map.json \
--i_print=1000 \
--near=0.01 --far=10.0 \
--moe_expert_type=seqexperts \
--model_chunk_size=212992 \
--coarse_samples=513 \
--fine_samples=513 \
--ckpt_path=/your/absolute/ckpt/path/mission_bay.pt \
--expertmlp2seqexperts \
--use_moe_external_gate \
--use_gate_input_norm \
--set_timeout \
--image_pixel_batch_size=8192

You can also use fewer GPUs.
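
When changing the number of GPUs, CUDA_VISIBLE_DEVICES and --nproc_per_node in the commands above are always kept in agreement. The following minimal sketch builds the launcher part of the command for a given list of GPU ids; launch_prefix is a hypothetical helper, not something shipped with the repository.

```python
def launch_prefix(gpu_ids, master_port=12345):
    """Build the launcher prefix for the given GPU ids.

    Hypothetical helper: it only keeps CUDA_VISIBLE_DEVICES and
    --nproc_per_node consistent with each other.
    """
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    devices = ",".join(str(g) for g in gpu_ids)
    return (
        f"CUDA_VISIBLE_DEVICES={devices} python -m torch.distributed.launch "
        f"--use_env --master_port={master_port} "
        f"--nproc_per_node={len(gpu_ids)} -m"
    )


# Example: evaluate on two GPUs instead of eight.
print(launch_prefix([0, 1]))
```

The module name and its flags (for example switch_nerf.eval_image_blocknerf and the options shown above) are appended after this prefix unchanged.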

Test on the Transamerica scene in the Bungee-NeRF dataset.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=4 -m \
switch_nerf.eval_nerf_moe \
--config=switch_nerf/configs/switch_nerf/bungee.yaml \
--use_moe \
--exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/multiscale_google_Transamerica \
--i_print=1000 \
--batch_size=4096 \
--moe_expert_type=seqexperts \
--model_chunk_size=65536 \
--ckpt_path=/your/absolute/ckpt/path/transamerica.pt \
--expertmlp2seqexperts \
--no_amp \
--use_moe_external_gate \
--use_gate_input_norm \
--moe_expert_num=4

Visualization

We provide a simple point cloud visualizer in this repository. You can use the commands below to create point clouds and visualize them with transparency. You can use Meshlab to visualize the point clouds without transparency. Meshlab can also visualize the transparency with "Shading: Dot Decorator" selected but the visualization is not clear enough.

Generate point clouds:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
--use_env --master_port=12345 --nproc_per_node=8 -m \
switch_nerf.eval_points \
--config=switch_nerf/configs/switch_nerf/building.yaml \
--use_moe --exp_name=/your/absolute/experiment/path \
--dataset_path=/your/absolute/scene/path/building-pixsfm \
--i_print=1000 \
--moe_expert_type=seqexperts \
--model_chunk_size=131072 \
--ckpt_path=/your/absolute/ckpt/path/500000.pt \
--expertmlp2seqexperts \
--use_moe_external_gate \
--use_gate_input_norm \
--moe_return_gates \
--return_pts \
--return_pts_rgb \
--return_pts_alpha \
--render_test_points_sample_skip=4 \
--val_scale_factor=8 \
--render_test_points_image_num=20

Other scenes in Mega-NeRF use --render_test_points_image_num=21.

Merge point clouds from different validation images.

python -m switch_nerf.scripts.merge_points \
--data_path=/your/absolute/experiment/path/0/eval_points \
--merge_all \
--image_num=20 \
--model_type=switch \
-r=0.2

Other scenes in Mega-NeRF use --image_num=21. The -r option randomly downsamples the point clouds by the given ratio so that they can be visualized on a desktop machine.
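
To illustrate what random downsampling by a ratio means, here is a minimal standard-library sketch. It is not the implementation used by switch_nerf.scripts.merge_points; the function downsample and the synthetic points are assumptions made for illustration only.

```python
import random


def downsample(points, ratio, seed=0):
    """Keep a uniformly random subset of round(len(points) * ratio) points."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    if not points:
        return []
    keep = max(1, round(len(points) * ratio))
    # A seeded generator makes the selection reproducible.
    return random.Random(seed).sample(points, keep)


# Synthetic (x, y, z) points standing in for a merged point cloud.
points = [(float(i), 0.0, 0.0) for i in range(1000)]
subset = downsample(points, ratio=0.2)
print(len(subset))  # 200
```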

License

Our code is distributed under the MIT License. See LICENSE file for more information.

Citation

@inproceedings{mi2023switchnerf,
  title={Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields},
  author={Zhenxing Mi and Dan Xu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2023},
  url={https://openreview.net/forum?id=PQ2zoIZqvm}
}

Contact

If you have any questions, please raise an issue or email Zhenxing Mi ([email protected]).

Acknowledgments

Our code builds on several awesome repositories. We thank their authors for making their code publicly available.

switch-nerf's Issues

Can't run on Bungee-NeRF scenes

Hi, I followed the readme and tried this example code, but got an error: 'Error while finding module specification for 'mega_nerf.train_nerf_moe' (ModuleNotFoundError: No module named 'mega_nerf')'. I could not find the mega_nerf directory or the train_nerf_moe.py file. Are any files missing?

tutel installation error

I cannot build Tutel correctly. Did you ever run into the problem that 'tutel_custom_kernel' cannot be found?

About render rate

Hello author, thank you for open-sourcing the code for this work. When I tested the model, I found that each round of iteration took about 113 s, with 20 rounds in total. Is this 113 s the time used to render one image? My device is an RTX 3090.

The ckpt provided on the building dataset yielded weak results

I downloaded the provided ckpt of building, then ran the following command:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --use_env --master_port=12345 --nproc_per_node=1 -m switch_nerf.eval_image --config=switch_nerf/configs/switch_nerf/building.yaml --use_moe --exp_name=output/building-pixsfm/ --dataset_path=data/building-pixsfm/ --i_print=1000 --moe_expert_type=seqexperts --model_chunk_size=131072 --ckpt_path=output/building-pixsfm/ckpt/building.pt --expertmlp2seqexperts --use_moe_external_gate --use_gate_input_norm

The evaluation metrics obtained are as follows:
INFO:root:Average val/time: 44.57523021697998
2023-11-29 19:59:28,001 root INFO: Average val/time: 44.57523021697998
INFO:root:Average val/memory: 5856.643969726562
2023-11-29 19:59:28,001 root INFO: Average val/memory: 5856.643969726562
INFO:root:Average val/psnr: 21.543164134025574
2023-11-29 19:59:28,001 root INFO: Average val/psnr: 21.543164134025574
INFO:root:Average val/ssim: 0.5792561799287796
2023-11-29 19:59:28,001 root INFO: Average val/ssim: 0.5792561799287796
INFO:root:Average val/lpips/vgg: 0.47358196824789045
2023-11-29 19:59:28,001 root INFO: Average val/lpips/vgg: 0.47358196824789045
INFO:root:Average val/lpips/alex: 0.39700327515602113
2023-11-29 19:59:28,001 root INFO: Average val/lpips/alex: 0.39700327515602113
INFO:root:Average val/lpips/squeeze: 0.29448841512203217
2023-11-29 19:59:28,001 root INFO: Average val/lpips/squeeze: 0.29448841512203217

These results are consistent with the results presented in the paper on the building data. However, when I looked at the rendered images, I felt that they were not very good and there were some artifacts and blurring.
[Images: 0_gt, 0_pred, 14_gt, 14_pred]

Is this caused by parameter settings? Thank you for your help.

Why the visualization results are bad

Hi,
Thank you for the exceptional code you provided! However, after running 500,000 iterations on the Building dataset and attempting to visualize the merged ".ply" file, I encountered unsatisfactory results in Meshlab, as illustrated below.
[Image: 1689732112722]
Can you assist me in identifying the reason behind the occurrence of this problem, please? Thanks!

Experiments on forward view scenes

Thanks for your excellent work!
I noticed that the visualized results are all from experiments on drone-view scenes.
Are there any visualized results on scenes like the Block-NeRF data?

source code request

I was working on the research of NeRF. I've noticed your submission on 2023 ICLR and it is a great honour for me to read your article, Switch-NeRF: Learning Scene Decomposition with Mixture of Experts for Large-scale Neural Radiance Fields. It is impressive that you bring MoE into the application of NeRF. The work makes me refreshed and I want to induct it into other fields. I am very interested in your research and want to further explore the research. But unfortunately, I can not access the codes on Github URL you leave when I try to implement your algorithm. Could you please upload the source code of your project?

How to create the scene decomposes image in your paper?

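Hello author!
I would like to use your method to reconstruct my own dataset and to generate visualizations of the local scenes learned by each sub-module in the scene decomposition, similar to this figure in your paper. How can I do that?
[Image: segmentation]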

Runtime Error

[E ProcessGroupNCCL.cpp:587] [Rank 2] Watchdog caught collective operation timeout: WorkNCCL(OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801171 milliseconds before timing out.
[E ProcessGroupNCCL.cpp:587] [Rank 1] Watchdog caught collective operation timeout: WorkNCCL(OpType=ALLREDUCE, Timeout(ms)=1800000) ran for 1801172 milliseconds before timing out.

Why is this?

This seems to indicate that a timeout error occurred while performing some kind of parallel computing task. Specifically, when using NCCL (NVIDIA Collective Communications Library) for communication, a certain operation took longer than the set threshold (1800000 milliseconds, or 30 minutes) and was therefore interrupted.

Have you ever encountered such a problem? Can you give me some solutions?

Out of memory

I use 2 RTX 3090 GPUs for training on the Block-NeRF dataset, and I find that it runs out of memory in the MoE module. Is that normal?
[Image]
Although I decreased the batch_size, even to batch_size=2312, it still ran out of memory.
My script is as follows:

CUDA_VISIBLE_DEVICES=0,1 python -m torch.distributed.launch \
    --use_env --master_port=12345 --nproc_per_node=2 -m \
    switch_nerf.train \
    --config=switch_nerf/configs/switch_nerf/mission_bay.yaml \
    --use_moe --exp_name=data/outputs \
    --dataset_path=data/datasets/waymo-block-nerf/v1.0 \
    --block_train_list_path=switch_nerf/datasets/lists/block_nerf_train_val.txt \
    --block_image_hash_id_map_path=switch_nerf/datasets/lists/block_nerf_id_map.json \
    --chunk_paths=data/chunks/mission_bay_chunk_radii_1 \
    --no_bg_nerf --near=0.01 --far=10.0 \
    --use_balance_loss \
    --i_print=200 \
    --batch_size=12312 \
    --moe_expert_type=expertmlp \
    --moe_train_batch \
    --moe_test_batch \
    --model_chunk_size=212992 \
    --coarse_samples=257 \
    --fine_samples=257 \
    --moe_capacity_factor=1.0 \
    --batch_prioritized_routing \
    --moe_l_aux_wt=0.0005 \
    --amp_use_bfloat16 \
    --use_moe_external_gate \
    --use_gate_input_norm \
    --use_sigma_noise \
    --sigma_noise_std=1.0

Problems on example codes

Hello!
Thank you for sharing your code.
I am trying to reproduce your results on the Mega-NeRF Building dataset.
I am using one RTX A6000 GPU.
While generating chunks, I got an error.

Traceback (most recent call last):
  File "/home/user/anaconda3/envs/nerf/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/user/anaconda3/envs/nerf/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/user/switchnerf/Switch-NeRF/switch_nerf/train.py", line 28, in <module>
    main(_get_train_opts())
  File "/home/user/anaconda3/envs/nerf/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 345, in wrapper
    return f(*args, **kwargs)
  File "/home/user/switchnerf/Switch-NeRF/switch_nerf/train.py", line 24, in main
    Runner(hparams).train()
  File "/home/user/switchnerf/Switch-NeRF/switch_nerf/runner.py", line 532, in train
    dataset = FilesystemDataset(self.train_items, self.near, self.far, self.ray_altitude_range,
  File "/home/user/switchnerf/Switch-NeRF/switch_nerf/datasets/filesystem_dataset.py", line 55, in __init__
    append_arrays = self._check_existing_paths(chunk_paths, center_pixels, scale_factor,
  File "/home/user/switchnerf/Switch-NeRF/switch_nerf/datasets/filesystem_dataset.py", line 296, in _check_existing_paths
    assert (chunk_path / 'metadata.pt').exists(), \
AssertionError: Could not find metadata file (did previous writing to this directory not complete successfully?)

Where can I generate metadata.pt?

I thought that the example script generates the data chunks, but somehow it requires me to provide the metadata of the data chunks.

I am trying to run Mega-NeRF too, but it is hard to find.

Please let me know if you have any solution.

Also, the Bungee-NeRF training command reads:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --use_env --master_port=12345 --nproc_per_node=1 -m mega_nerf.train_nerf_moe --config=switch_nerf/configs/switch_nerf/bungee.yaml

This produces a "mega_nerf.train_nerf_moe FAILED" error.

I guess it should be changed to "switch_nerf.train_nerf_moe"; with that change it works well.

Thank you!

How to train custom dataset

Hi.
Thank you for your work.

What should I prepare for training my custom dataset?

I have converted my dataset to the mega_nerf format and used mega_nerf-style commands (building.yaml) to generate chunks, but the results are very bad, and I think I missed some important steps when preparing the custom dataset. Do you have any scripts or tips for preparing a custom dataset?