
Official implementation of "Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting"

Home Page: https://lizhiqi49.github.io/MVControl/

License: MIT License


MVControl-threestudio

ArXiv | Paper | Project Page

Official implementation of Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting

Zhiqi Li, Yiming Chen, Lingzhe Zhao, Peidong Liu

Abstract: While text-to-3D and image-to-3D generation tasks have received considerable attention, one important but under-explored field between them is controllable text-to-3D generation, which we mainly focus on in this work. To address this task, 1) we introduce Multi-view ControlNet (MVControl), a novel neural network architecture designed to enhance existing pre-trained multi-view diffusion models by integrating additional input conditions, such as edge, depth, normal, and scribble maps. Our innovation lies in the introduction of a conditioning module that controls the base diffusion model using both local and global embeddings, which are computed from the input condition images and camera poses. Once trained, MVControl is able to offer 3D diffusion guidance for optimization-based 3D generation. And, 2) we propose an efficient multi-stage 3D generation pipeline that leverages the benefits of recent large reconstruction models and score distillation algorithm. Building upon our MVControl architecture, we employ a unique hybrid diffusion guidance method to direct the optimization process. In pursuit of efficiency, we adopt 3D Gaussians as our representation instead of the commonly used implicit representations. We also pioneer the use of SuGaR, a hybrid representation that binds Gaussians to mesh triangle faces. This approach alleviates the issue of poor geometry in 3D Gaussians and enables the direct sculpting of fine-grained geometry on the mesh. Extensive experiments demonstrate that our method achieves robust generalization and enables the controllable generation of high-quality 3D content.

Method Overview

Installation

Install threestudio

This part is the same as in the original threestudio repository; skip it if you have already set up the environment.

!!! The requirements.txt we use differs slightly from the original threestudio repository (in the pinned versions of diffusers and gradio). If errors occur with an existing threestudio environment, please use our requirements file.
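A quick way to see which versions are active in your environment (a minimal sketch, assuming a Unix shell):

pip list | grep -E "diffusers|gradio"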

See installation.md for additional information, including installation via Docker.

The following steps have been tested on Ubuntu 20.04.

  • You must have an NVIDIA graphics card with at least 6GB VRAM and have CUDA installed.
  • Install Python >= 3.8.
  • (Optional, Recommended) Create a virtual environment:
python3 -m virtualenv venv
. venv/bin/activate

# Newer pip versions, e.g. pip-23.x, can be much faster than old versions, e.g. pip-20.x.
# For instance, it caches the wheels of git packages to avoid unnecessarily rebuilding them later.
python3 -m pip install --upgrade pip

  • Install PyTorch == 2.2.1, since xformers requires the newest torch version. (The original threestudio instructions allow PyTorch >= 1.12, tested on torch1.12.1+cu113 and torch2.0.0+cu118, but this repository needs 2.2.1 for its xformers dependency.)
# torch 2.2.1 built against CUDA 11.8
pip install torch==2.2.1 torchvision --index-url https://download.pytorch.org/whl/cu118
  • (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:
pip install ninja
  • Install dependencies:
pip install -r requirements.txt
  • (Optional) tiny-cuda-nn installation might require downgrading pip to 23.0.1
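If the tiny-cuda-nn build fails for this reason, the downgrade is one command (version taken from the note above):

python -m pip install pip==23.0.1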

  • (Optional, Recommended) The best-performing models in threestudio use the newly-released T2I model DeepFloyd IF, which currently requires signing a license agreement. If you would like to use these models, you need to accept the license on the model card of DeepFloyd IF, and log in to the Hugging Face Hub in the terminal via huggingface-cli login.
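The login step is the standard Hugging Face CLI command (it prompts for an access token from the account that accepted the license):

huggingface-cli login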

  • For contributors, see here.

Install 3D Gaussian dependencies

git clone --recursive https://github.com/ashawkey/diff-gaussian-rasterization
git clone https://github.com/DSaurus/simple-knn.git
pip install ./diff-gaussian-rasterization
pip install ./simple-knn
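A quick sanity check that the two CUDA extensions built and import correctly (a minimal sketch; in the upstream package, simple-knn exposes its compiled extension as simple_knn._C):

python -c "import diff_gaussian_rasterization, simple_knn._C"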

Install SuGaR dependencies

pip install open3d
# Install pytorch3d
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"

Install LGM dependencies

pip install -r requirements-lgm.txt

Download pre-trained models

  • For LGM, follow the instructions in their official repository.
mkdir pretrained && cd pretrained
wget https://huggingface.co/ashawkey/LGM/resolve/main/model_fp16.safetensors
cd ..
  • For MVDream, we use our diffusers implementation. The weights will be downloaded automatically via huggingface hub.

  • Our pre-trained multi-view ControlNets have been uploaded to huggingface hub, and they will also be automatically downloaded.

  • Alternatively, you can manually download the MVDream and MVControl checkpoints from here.
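If you prefer to pre-fetch the weights instead of relying on the automatic download, a sketch using the Hugging Face CLI (requires a recent huggingface_hub; the repository ids are the ones used elsewhere in this repo, with depth as an example condition type):

huggingface-cli download lzq49/mvdream-sd21-diffusers
huggingface-cli download lzq49/mvcontrol-4v-depth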

Quickstart

Stage 1. Generate coarse 3D Gaussians via MVControl + LGM

The following command will launch a GUI powered by gradio. Fill in the asset_name box with the name of the current experiment; the results will be saved in the directory workspace/mvcontrol_[condition_type]/[asset_name]. The input image can be either a condition image or an RGB image. When an RGB image is given, the option image need preprocess at the top left of the UI should be checked, so that the condition image, mask, and RGBA image are saved to the output directory.

condition_type=depth  # canny/depth/normal/scribble
python app_stage1.py big --resume path/to/LGM/model_fp16.safetensors --condition_type $condition_type
# The generated coarse Gaussians will be saved to workspace/mvcontrol_{condition_type}/{asset_name}/coarse_gs.ply

Stage 2. Gaussian Optimization

### Taking 'fatcat' as an example
asset_name=fatcat
exp_root_dir=workspace/mvcontrol_$condition_type/$asset_name
hint_path=load/conditions/fatcat_depth.png  # path/to/condition.png
mask_path=load/conditions/fatcat_mask.png   # path/to/mask.png
prompt="A fat cat, standing with hands in ponts pockets"  # prompt
coarse_gs_path=$exp_root_dir/coarse_gs.ply # path/to/saved/coarse_gs.ply

python launch.py --config custom/threestudio-3dgs/configs/mvcontrol-gaussian.yaml --train --gpu 0 \
system.stage=gaussian \
system.hint_image_path=$hint_path \
system.hint_mask_path=$mask_path \
system.control_condition_type=$condition_type \
system.geometry.geometry_convert_from=$coarse_gs_path \
system.prompt_processor.prompt="$prompt" \
system.guidance_control.pretrained_controlnet_name_or_path="lzq49/mvcontrol-4v-${condition_type}" \
name=$asset_name \
tag=gaussian_refine

# # If using only the coarse Gaussians' positions for initialization,
# # add the following two options to the command:
# system.geometry.load_ply_only_vertex=true \
# system.geometry.load_vertex_only_position=true


### Extract coarse SuGaR from refined Gaussians
refined_gs_path=$exp_root_dir/gaussian_refine@LAST/save/exported_gs_step3000.ply
coarse_sugar_output_dir=$exp_root_dir/coarse_sugar

python extern/sugar/extract_mesh.py -s extern/sugar/load/scene \
-c $refined_gs_path -o $coarse_sugar_output_dir --use_vanilla_3dgs

Stage 3. SuGaR refinement

sugar_mesh_path=$coarse_sugar_output_dir/sugarmesh_vanilla3dgs_level0.3_decim200000_pd6.ply

python launch.py --config custom/threestudio-3dgs/configs/mvcontrol-sugar-vsd.yaml --train --gpu 0 \
system.stage=sugar \
system.hint_image_path=$hint_path \
system.hint_mask_path=$mask_path \
system.control_condition_type=$condition_type \
system.geometry.surface_mesh_to_bind_path=$sugar_mesh_path \
system.prompt_processor.prompt="$prompt" \
system.guidance_control.pretrained_controlnet_name_or_path="lzq49/mvcontrol-4v-${condition_type}" \
name=$asset_name \
tag=sugar_refine

### Textured mesh extraction
sugar_out_dir=$exp_root_dir/sugar_refine@LAST
python launch.py --config $sugar_out_dir/configs/parsed.yaml --export --gpu 0 resume=$sugar_out_dir/ckpts/last.ckpt

Easy way

We also provide a script that runs stage 2 and stage 3 automatically from the generated coarse Gaussians:

python run_from_coarse_gs.py -n $asset_name -c $condition_type -p "$prompt" -cp $hint_path -mp $mask_path

Tips

  • Our method relies on the coarse Gaussian initialization, so in the first stage it is fine to try different random seeds until you get a good LGM output; the coarse Gaussian generation procedure is very fast (a few seconds).
  • For better Gaussian optimization in stage 2, more optimization steps can be used; we use 3000 steps in our paper for efficiency. A sketch of a longer run is shown below.
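A longer stage-2 run, assuming the shell variables from Stage 2 are still set and that the config exposes the standard threestudio trainer.max_steps key (the tag value here is just an example name):

python launch.py --config custom/threestudio-3dgs/configs/mvcontrol-gaussian.yaml --train --gpu 0 \
system.stage=gaussian \
system.hint_image_path=$hint_path \
system.hint_mask_path=$mask_path \
system.control_condition_type=$condition_type \
system.geometry.geometry_convert_from=$coarse_gs_path \
system.prompt_processor.prompt="$prompt" \
system.guidance_control.pretrained_controlnet_name_or_path="lzq49/mvcontrol-4v-${condition_type}" \
name=$asset_name \
tag=gaussian_refine_5k \
trainer.max_steps=5000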

Todo

  • Release the inference code.
  • Reorganize the code.
  • Improve the quality (texture & surface) of the SuGaR refinement stage.
  • Provide more examples for testing.

Credits

This project is built upon the awesome project threestudio, and thanks to the authors of these open-source works: LGM, MVDream, ControlNet and SuGaR.

BibTeX

@misc{li2024controllable,
      title={Controllable Text-to-3D Generation via Surface-Aligned Gaussian Splatting}, 
      author={Zhiqi Li and Yiming Chen and Lingzhe Zhao and Peidong Liu},
      year={2024},
      eprint={2403.09981},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}


mvcontrol-threestudio's Issues

runtime error: cannot fit 'int' into an index-sized integer

File "/opt/anaconda3/envs/v3d/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3682, in _pad
encoded_inputs["attention_mask"] = encoded_inputs["attention_mask"] + [0] * difference
OverflowError: cannot fit 'int' into an index-sized integer

Confusion about input image types.

My personal feeling:
Photos (RGB, RGBA): ~90%, and sketches (simple hand drawings): ~10%; these two are the more natural input forms.

Why use depth (would that require a 3D sensor?), normal, canny, or scribble maps as the input?

Thanks

Issues when running the code

Traceback (most recent call last):
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/gradio/queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/gradio/route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/gradio/blocks.py", line 1550, in process_api
result = await self.call_function(
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/gradio/blocks.py", line 1185, in call_function
prediction = await anyio.to_thread.run_sync(
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/anyio/to_thread.py", line 56, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 2144, in run_sync_in_worker_thread
return await future
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/anyio/_backends/_asyncio.py", line 851, in run
result = context.run(func, *args)
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/gradio/utils.py", line 661, in wrapper
response = f(*args, **kwargs)
File "/home/dubaiprince/Projects/MVControl-threestudio/app_stage1.py", line 237, in process
image = model.gs.render(gaussians, cam_view.unsqueeze(0), cam_view_proj.unsqueeze(0), cam_pos.unsqueeze(0), scale_modifier=1)['image']
File "/home/dubaiprince/Projects/MVControl-threestudio/extern/lgm/gs.py", line 76, in render
rendered_image, radii, rendered_depth, rendered_alpha = rasterizer(
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/diff_gaussian_rasterization/init.py", line 213, in forward
return rasterize_gaussians(
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/diff_gaussian_rasterization/init.py", line 32, in rasterize_gaussians
return _RasterizeGaussians.apply(
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/torch/autograd/function.py", line 598, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/diff_gaussian_rasterization/init.py", line 92, in forward
num_rendered, color, depth, alpha, radii, geomBuffer, binningBuffer, imgBuffer = _C.rasterize_gaussians(*args)
RuntimeError: means3D must have dimensions (num_points, 3)

I successfully launched the experiment, but I get this error. What is the issue?

Request for Assistance: ModuleNotFoundError in Running Python Script

Hi, there is a problem when running the code:

python app_stage1.py big --resume pretrained/model_fp16.safetensors --condition_type $condition_type
/home/dubaiprince/Projects/MVControl-threestudio/extern/lgm/attention.py:22: UserWarning: xFormers is available (Attention)
  warnings.warn("xFormers is available (Attention)")
[INFO] Loaded checkpoint from pretrained/model_fp16.safetensors
unet/diffusion_pytorch_model.safetensors not found
Loading pipeline components...:  17%|██          | 1/6 [00:00<00:00, 125.69it/s]
Traceback (most recent call last):
  File "/home/dubaiprince/Projects/MVControl-threestudio/app_stage1.py", line 63, in <module>
    pipe_mvcontrol = load_mvcontrol_pipeline(
  File "/home/dubaiprince/Projects/MVControl-threestudio/extern/mvcontrol/pipeline_mvcontrol.py", line 966, in load_mvcontrol_pipeline
    pipe = MVControlPipeline.from_pretrained(
  File "/home/dubaiprince/miniconda3/envs/threestudio/lib/python3.9/site-packages/diffusers/pipelines/pipeline_utils.py", line 1093, in from_pretrained
    loaded_sub_model = load_sub_model(
  File "/home/dubaiprince/miniconda3/envs/threestudio/lib/python3.9/site-packages/diffusers/pipelines/pipeline_utils.py", line 386, in load_sub_model
    class_obj, class_candidates = get_class_obj_and_candidates(
  File "/home/dubaiprince/miniconda3/envs/threestudio/lib/python3.9/site-packages/diffusers/pipelines/pipeline_utils.py", line 317, in get_class_obj_and_candidates
    library = importlib.import_module(library_name)
  File "/home/dubaiprince/miniconda3/envs/threestudio/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 984, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'camera_proj'

Can you suggest any way to fix it?

ValueError: Custom Code Execution Required and Deprecation Warnings in app_stage1.py

I'm encountering multiple issues when running the app_stage1.py script with the provided model. Here is a summary of the bugs:

  1. ValueError: Custom Code Execution Required

While trying to load the lzq49/mvdream-sd21-diffusers model pipeline using MVControlPipeline.from_pretrained, I receive the following error:

ValueError:
The repository for lzq49/mvdream-sd21-diffusers contains custom code in unet/unet.py, camera_proj/camera_proj which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/lzq49/mvdream-sd21-diffusers/unet/unet.py, https://hf.co/lzq49/mvdream-sd21-diffusers/camera_proj/camera_proj.py.
Please pass the argument trust_remote_code=True to allow custom code to be run.
Suggested Fix:
Adding the trust_remote_code=True argument to the pipeline loading functions resolves the issue. Here is an example modification in app_stage1.py:

from pipeline_mvcontrol import load_mvcontrol_pipeline

pipe_mvcontrol = load_mvcontrol_pipeline(
    "lzq49/mvdream-sd21-diffusers",
    revision="main",
    torch_dtype=torch.float16,
    trust_remote_code=True,  # Add this argument here
)
Or, directly with MVControlPipeline.from_pretrained:

from diffusers import MVControlPipeline

pipe_mvcontrol = MVControlPipeline.from_pretrained(
    "lzq49/mvdream-sd21-diffusers",
    torch_dtype=torch.float16,
    trust_remote_code=True,  # Add this argument here
)
2. Deprecation Warnings:

During execution, the following warnings are displayed:

/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
/home/dubaiprince/miniconda3/envs/mvcontrol/lib/python3.9/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: resume_download is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use force_download=True.
warnings.warn(
Suggested Fixes:

Update the usage of _register_pytree_node in diffusers/utils/outputs.py:

torch.utils._pytree.register_pytree_node(  # Replace _register_pytree_node

Update the resume_download usage in huggingface_hub/file_download.py:

force_download=True  # Instead of resume_download
3. xFormers Warning:

A warning related to xFormers is also raised:

/home/dubaiprince/Projects/MVControl-threestudio/extern/lgm/attention.py:22: UserWarning: xFormers is available (Attention)
warnings.warn("xFormers is available (Attention)")
Suggested Fix:
Ensure that xFormers is compatible with the current environment by aligning the versions of Python, PyTorch, and CUDA.

Reproduction Steps:
1. Run the command python app_stage1.py big --resume pretrained/model_fp16.safetensors --condition_type $condition_type
2. Observe the error and warnings.

Environment:
  • OS: Ubuntu 20.04
  • Python Version: 3.9.19
  • Torch Version: 2.3.0
  • Torchvision Version: 0.17.1+cu118
Let me know if you need more details, or if I can assist with testing any fixes.
