thu-ml / crm

[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.

Home Page: https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/

License: MIT License

Python 100.00%

3d aigc diffusion-models generative-model multiview reconstruction

crm's Issues

Hi, image as guide

Hi, I want to use an image as guidance to generate the texture. Do you have any ideas?

CUDA out of memory

I am running CRM on an RTX 2080 Ti GPU with 10.76 GiB of memory.
However, I am getting a CUDA out-of-memory error.
Are there any ways to reduce GPU memory usage while running CRM?

Is there any plan to release training script?

Hi, thanks for your wonderful paper and the released inference code!!! I am wondering if there is any plan to release the training script. Your reply will be highly appreciated~

Seeking Clarification on VAE and CRM Performance in CCM Diffusion

Outstanding Work!
I have a question regarding the performance of the VAE in your CCM Diffusion model. As far as I understand, VAEs typically struggle to reconstruct precise CCMs. Since the performance of the VAE sets the upper limit for the quality of the CCM Diffusion, it follows that the CCMs produced by the diffusion process might not be perfectly accurate.

However, I noticed that the CRM module manages to output accurate meshes. This raises a couple of questions:

Is there a specific trick or method you used during training to enable the CRM to refine and correct the inaccuracies in the CCM?
How does the CRM achieve such high accuracy in the final mesh outputs despite the initial limitations of the VAE?

I would greatly appreciate any insights or details you could provide on these points. Thank you for your time and for sharing your work with the community.

RuntimeError: Ninja is required to load C++ extensions

CRM.pth: 100%|███████████████████████████████████████████████████████████████████████| 476M/476M [13:35<00:00, 583kB/s]
Traceback (most recent call last):
  File "E:\project\CRM\app.py", line 129, in <module>
    model = CRM(specs).to(args.device)
  File "E:\project\CRM\model\crm\model.py", line 59, in __init__
    self.renderer = Renderer(tet_grid_size=self.tet_grid_size, camera_angle_num=self.camera_angle_num,
  File "E:\project\CRM\util\renderer.py", line 15, in __init__
    self.glctx = dr.RasterizeCudaContext()
  File "E:\project\CRM\python\lib\site-packages\nvdiffrast\torch\ops.py", line 177, in __init__
    self.cpp_wrapper = _get_plugin().RasterizeCRStateWrapper(cuda_device_idx)
  File "E:\project\CRM\python\lib\site-packages\nvdiffrast\torch\ops.py", line 118, in _get_plugin
    torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
  File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1306, in load
    return _jit_compile(
  File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1710, in _jit_compile
    _write_ninja_file_and_build_library(
  File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1793, in _write_ninja_file_and_build_library
    verify_ninja_availability()
  File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1842, in verify_ninja_availability
    raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions
E:\project\CRM>python\python.exe -m pip install Ninja
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: Ninja in e:\project\crm\python\lib\site-packages (1.11.1.1)
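
A possible explanation, offered as an assumption rather than a confirmed diagnosis: PyTorch checks for Ninja by running the `ninja --version` command, so the `ninja` executable has to be reachable through `PATH`. Having the pip package installed is not enough if the interpreter's scripts directory is not on `PATH`, which can happen with a portable Python such as the one under `E:\project\CRM\python`. The sketch below uses only the standard library; `find_ninja` and `add_to_path` are hypothetical helper names, and the scripts directory is whatever `sysconfig` reports for the running interpreter.

```python
import os
import shutil
import sysconfig


def find_ninja(extra_dirs=()):
    """Return the path of a `ninja` executable, or None if none is found.

    PATH is searched first, then each directory in `extra_dirs`.
    """
    found = shutil.which("ninja")
    if found:
        return found
    for directory in extra_dirs:
        found = shutil.which("ninja", path=directory)
        if found:
            return found
    return None


def add_to_path(directory):
    """Prepend a directory to PATH for this process and its children."""
    os.environ["PATH"] = directory + os.pathsep + os.environ.get("PATH", "")


if __name__ == "__main__":
    # Directory where pip places console scripts for this interpreter.
    scripts_dir = sysconfig.get_path("scripts")
    ninja = find_ninja(extra_dirs=[scripts_dir])
    if ninja is None:
        print("ninja executable not found; the pip package may be incomplete")
    else:
        print("ninja found at", ninja)
        add_to_path(os.path.dirname(ninja))
```

If the executable is found outside `PATH`, running the same `add_to_path` call at the top of `app.py`, before the model is constructed, may be enough to let the extension build start.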

finetune code?

Will the fine-tuning code be released?

About the lego-style in the demo

Thanks to the authors.
I would like to ask how the lego-style 3D in the demo was made. Could you share it?

Thanks.

Upscaled RGB and CCM, tile-based generation

Hi,
Suppose I take the 256x6 multi-view (MV) images and generate a higher-resolution MV sheet from them.
Can CRM then generate a better 3D model with more detail?

My test ideas are:
- Regular upscaling. (The CCMs probably won't upscale well, so I would only change their resolution without upscaling. I still can't figure out whether the CCMs are used for texturing, for generating the 3D mesh, or for both.)
or
- Run a tile-based algorithm:
first do a regular CRM generation of the RGB and CCM images at 256x6, then upscale them as follows.
The algorithm splits the input image into multiple tiles, generates RGB and CCM for each tile, then blends them all together into one high-resolution set of MV RGB and CCM images.

The tile code is ready and only needs some modifications; it showed some great results with depth-map blending.
I modified the code and the model config files and changed the size of the input tensors (image arrays). With the regular workflow at high resolutions, the generated RGB and CCM images are just garbage, so I can't really tell.
What I need to know:

1- Will the decoder work at resolutions higher than 256x6, for example 512x3,072? Or was the model trained only at that resolution, so it won't work?
2- I read the paper multiple times but can't understand CCMs. Can we skip generating them and just use RGB? Are CCMs essential for mesh generation, or are they used just for texturing?
3- Let's say we have extremely detailed depth maps, like ultra-sharp 4K maps in which even skin pores are visible. Can we in any way introduce those depth maps into the CRM workflow? (This one is very important.)

Do let me know, and many thanks in advance. Much love and respect for your work. Cheers.

gradio app.py error

Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
--- using zero snr---
/home/harshad/Downloads/CRM/imagedream/ldm/interface.py:117: RuntimeWarning: divide by zero encountered in divide
"sqrt_recip_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod))
/home/harshad/Downloads/CRM/imagedream/ldm/interface.py:120: RuntimeWarning: divide by zero encountered in divide
"sqrt_recipm1_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod - 1))
Killed

After some time, while running the code, I encountered a 'divide by zero' error. It seems to occur during a division operation, which is likely causing the program to terminate unexpectedly. I suspect there might be an issue with the calculations or data processing logic in the code. The error message indicates that a division by zero was encountered, which is mathematically undefined. Could you please provide guidance on how to resolve this issue?
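
A note on reading this log, with the hedges stated up front: the two "divide by zero" lines are NumPy `RuntimeWarning`s, not exceptions, so they do not stop the program. The log also prints "using zero snr"; with a zero terminal SNR schedule the last `alphas_cumprod` entry is 0 by definition, so `1.0 / alphas_cumprod` is infinite at that index and NumPy warns about it. The final `Killed` is what a shell prints when the process is terminated by a signal, which on Linux is often the out-of-memory killer; that cause is an assumption here. The plain-Python sketch below mirrors the two expressions from the warnings on a toy schedule with made-up values.

```python
import math


def recip_buffers(alphas_cumprod):
    """Mirror the two expressions from the warnings in plain Python.

    NumPy yields inf (plus a RuntimeWarning) when dividing by zero; plain
    Python raises ZeroDivisionError instead, so the zero case is handled
    explicitly to reproduce NumPy's result.
    """
    sqrt_recip = []
    sqrt_recipm1 = []
    for a in alphas_cumprod:
        recip = math.inf if a == 0.0 else 1.0 / a
        sqrt_recip.append(math.sqrt(recip))
        sqrt_recipm1.append(math.sqrt(recip - 1.0))
    return sqrt_recip, sqrt_recipm1


# Toy schedule (hypothetical values) whose last entry is exactly zero,
# as a zero-terminal-SNR schedule would produce.
toy_alphas_cumprod = [0.99, 0.5, 0.1, 0.0]
sqrt_recip, sqrt_recipm1 = recip_buffers(toy_alphas_cumprod)
print(sqrt_recip[-1], sqrt_recipm1[-1])
```

Only the last entry of each buffer is infinite. If the process is being killed for lack of memory, watching system RAM while the models load would confirm or rule that out.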

problems about the random seed in the code

Hello, thank you for your brilliant work on the fast feed-forward 3D generation model!
I have tried your HuggingFace demo and it works well. I noticed that users can set the random seed in the demo, but when I run your code there seem to be no command-line arguments for the random seed. How can I set the seed properly? np? torch? torch.cuda?

thank you :)
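
A common pattern, shown as a minimal sketch rather than as what the demo actually does: seed every generator the pipeline might draw from before sampling. `seed_everything` is a hypothetical helper name. NumPy and PyTorch are seeded only when they are installed, and whether CRM's sampling draws exclusively from these generators is an assumption.

```python
import random


def seed_everything(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # default generator
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    except ImportError:
        pass


# Reseeding with the same value reproduces the same random draws.
seed_everything(1234)
first = random.random()
seed_everything(1234)
second = random.random()
print(first == second)
```

Calling the helper once at the start of the inference script, before the pipeline is invoked, is the usual placement. Some GPU kernels are nondeterministic, so results can still differ slightly between runs.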

Why can't I successfully download the weights file?

Could you share the filter list of Objaverse?

Greetings, could you share the filter list, as LGM did? Thanks in advance.

Error with Xformers newest version

I'm trying to run this with xformers 0.0.25 because I have to run the latest Torch 2.2.1, which Google Colab just updated to. xformers 0.0.24 only works with Torch 2.2.0, so installing 0.0.24 takes ~5 minutes and a restart to downgrade to Torch 2.2.0. I got it working in my app at DiffusionDeluxe.com using the recommended 0.0.24 (although it keeps running out of RAM), but with 0.0.25 I'm getting this error:

Traceback (most recent call last):
  File "/content/sdd_colab.py", line 46547, in run_crm
    crm_model = CRM(specs).to(torch_device)
  File "/content/CRM/model/crm/model.py", line 46, in __init__
    self.unet2 = UNetPP(in_channels=self.dec.c_dim)
  File "/content/CRM/model/archs/unet.py", line 43, in __init__
    self.unet.enable_xformers_memory_efficient_attention()
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 295, in enable_xformers_memory_efficient_attention
    self.set_use_memory_efficient_attention_xformers(True, attention_op)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 259, in set_use_memory_efficient_attention_xformers
    fn_recursive_set_mem_eff(module)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
    fn_recursive_set_mem_eff(child)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 252, in fn_recursive_set_mem_eff
    module.set_use_memory_efficient_attention_xformers(valid, attention_op)
  File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 253, in set_use_memory_efficient_attention_xformers
    raise ModuleNotFoundError(
ModuleNotFoundError: Refer to https://github.com/facebookresearch/xformers for more information on how to install xformers

I'm hoping you can find a fix for the breaking change that stays compatible with both versions, since it's always nice to use the newest releases. Thanks. I tried to trace the problem down myself but didn't understand it well enough.

About CCM

Hi! I want to use CCMs on my own dataset. Can you tell me how to obtain the ground-truth CCMs, or can you provide the relevant scripts for rendering the six orthographically projected CCM images?
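
A minimal sketch of the idea, not the authors' script: it assumes a CCM (canonical coordinate map) stores, per pixel, the canonical-space coordinate of the visible surface point as its three channels. The coordinate range (`half_extent`), the mapping to [0, 1], the axis conventions and the use of point samples instead of a rasterized mesh are all assumptions made for illustration, and image flips between opposite views are ignored.

```python
def render_ccm(points, axis=2, sign=1, resolution=8, half_extent=0.5):
    """Orthographic coordinate map from surface point samples.

    points: iterable of (x, y, z) inside [-half_extent, half_extent]^3.
    axis:   index of the viewing axis (0, 1 or 2).
    sign:   +1 if the camera sits on the positive side of the axis, else -1.
    Returns a resolution x resolution grid; each cell is None (background)
    or an (r, g, b) triple in [0, 1] encoding the visible point's coordinates.
    """
    u_axis, v_axis = [a for a in range(3) if a != axis]
    image = [[None] * resolution for _ in range(resolution)]
    depth = [[float("-inf")] * resolution for _ in range(resolution)]
    scale = 2.0 * half_extent
    for p in points:
        # Normalise the two image-plane coordinates to [0, 1].
        u = (p[u_axis] + half_extent) / scale
        v = (p[v_axis] + half_extent) / scale
        col = min(int(u * resolution), resolution - 1)
        row = min(int(v * resolution), resolution - 1)
        closeness = sign * p[axis]  # larger means nearer to the camera
        if closeness > depth[row][col]:
            depth[row][col] = closeness
            image[row][col] = tuple((c + half_extent) / scale for c in p)
    return image


def six_views(points, **kwargs):
    """Coordinate maps for the six axis-aligned orthographic views."""
    return {(axis, sign): render_ccm(points, axis, sign, **kwargs)
            for axis in range(3) for sign in (1, -1)}


# Two points on the same line of sight: only the nearer one is kept.
pts = [(0.0, 0.0, 0.5), (0.0, 0.0, -0.5)]
front = render_ccm(pts, axis=2, sign=1, resolution=4)
back = render_ccm(pts, axis=2, sign=-1, resolution=4)
print(front[2][2], back[2][2])
```

For real meshes, the same result is usually obtained by rasterizing the mesh with vertex positions as the interpolated attribute, which gives dense maps instead of sparse point splats.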

Multi GPUs

Hi @thuwzy, does CRM support multiple GPUs for inference? I tried CUDA_VISIBLE_DEVICES=0,1, but it does not seem to work (2 x 2080 Ti).
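
Two general points that hold regardless of CRM: the variable must be spelled exactly `CUDA_VISIBLE_DEVICES` (a misspelled name is silently ignored), and it only controls which devices a process can see. It does not spread a single model across them. Whether CRM itself can use more than one GPU is not stated in this thread. The sketch below builds an environment for a child process; `env_with_visible_gpus` is a hypothetical helper name.

```python
import os


def env_with_visible_gpus(device_ids, base_env=None):
    """Return a copy of the environment that exposes only the given GPUs."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_ids)
    return env


env = env_with_visible_gpus([0, 1], base_env={})
print(env["CUDA_VISIBLE_DEVICES"])
```

The resulting dictionary can be passed as `env=` to `subprocess.run` when launching the inference script, so the variable is set before CUDA is initialized in the child process.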

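What is the minimum VRAM requirement? 8 GB fails with an out-of-memory error

What is the minimum VRAM requirement? Is there a configuration that can run with low VRAM?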

How do you render an image from flexicube geometry?

Great work! Could you please share more details, and ideally the code, of how you render an RGB image from the FlexiCubes geometry? I see the following code, but it only renders masks and depth. Thanks in advance!

    def render_mesh(self, mesh_v_nx3, mesh_f_fx3, camera_mv_bx4x4, resolution=256, hierarchical_mask=False):
        return_value = dict()
        if self.render_type == 'neural_render':
            tex_pos, mask, hard_mask, rast, v_pos_clip, mask_pyramid, depth = self.renderer.render_mesh(
                mesh_v_nx3.unsqueeze(dim=0),
                mesh_f_fx3.int(),
                camera_mv_bx4x4,
                mesh_v_nx3.unsqueeze(dim=0),
                resolution=resolution,
                device=self.device,
                hierarchical_mask=hierarchical_mask
            )

            return_value['tex_pos'] = tex_pos
            return_value['mask'] = mask
            return_value['hard_mask'] = hard_mask
            return_value['rast'] = rast
            return_value['v_pos_clip'] = v_pos_clip
            return_value['mask_pyramid'] = mask_pyramid
            return_value['depth'] = depth
        else:
            raise NotImplementedError

        return return_value

checkpoint loading size mismatch

Thanks for your awesome work and contribution!
I tried to run your code locally after downloading the model checkpoints from Hugging Face, but I encountered a size mismatch error when doing so:

Traceback (most recent call last):
  File "/CRM/local_inference.py", line 152, in <module>    
    pipeline = TwoStagePipeline(  
  File "/CRM/pipelines.py", line 31, in __init__
    self.stage1_model.load_state_dict(torch.load(stage1_model_config.resume, map_location="cpu"), strict=False)
  File "/envs/crm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusionInterface:
        size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).

Interestingly, when I swap the checkpoints for ccm_diffusion and pixel_diffusion, both loading and inference work pretty well. But the results are definitely not correct after swapping the checkpoints.

I have not changed anything in the code.
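
One way to read the error, stated as an assumption: the checkpoint's first convolution takes 8 input channels, while the model built from the config takes 4. Since loading succeeds after swapping the two files, each checkpoint may be paired with the other stage's config, for example through swapped file names or paths. The sketch below lists shape mismatches between a checkpoint and a model before `load_state_dict` is called; `shape_mismatches` is a hypothetical helper that works on plain name-to-shape mappings.

```python
def shape_mismatches(checkpoint_shapes, model_shapes):
    """List (name, checkpoint_shape, model_shape) for keys whose shapes differ."""
    return [
        (name, tuple(shape), tuple(model_shapes[name]))
        for name, shape in checkpoint_shapes.items()
        if name in model_shapes and tuple(shape) != tuple(model_shapes[name])
    ]


# With PyTorch installed, the inputs could be built like this (assumed usage):
#   state = torch.load(path, map_location="cpu")
#   checkpoint_shapes = {k: v.shape for k, v in state.items()}
#   model_shapes = {k: v.shape for k, v in model.state_dict().items()}
key = "model.diffusion_model.input_blocks.0.0.weight"
checkpoint_shapes = {key: (320, 8, 3, 3)}  # shape reported for the checkpoint
model_shapes = {key: (320, 4, 3, 3)}       # shape reported for the model
print(shape_mismatches(checkpoint_shapes, model_shapes))
```

Running the check for both checkpoint files against both stage models would show which pairing has no mismatches.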

environment problem

Hello, when I set up the environment, I ran into a problem with this error message:
"ERROR: Could not build wheels for xformers, which is required to install pyproject.toml-based projects."
How can I fix it?

version error

Which exact PyTorch version is required? With torch 2.0.1, I get an error saying that torch has no `compiler` attribute.

WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.2.0+cu121 with CUDA 1201 (you have 2.0.1+cu117)
    Python  3.10.11 (you have 3.10.13)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
Traceback (most recent call last):
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\utils\import_utils.py", line 684, in _get_module
    return importlib.import_module("." + module_name, self.__name__)
  File "E:\ProgramData\anaconda3\envs\CRM\lib\importlib\__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\unet_2d.py", line 24, in <module>
    from .unet_2d_blocks import UNetMidBlock2D, get_down_block, get_up_block
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 23, in <module>
    from .attention import AdaGroupNorm
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\attention.py", line 22, in <module>
    from .attention_processor import Attention
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\attention_processor.py", line 31, in <module>
    import xformers
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\__init__.py", line 12, in <module>
    from .checkpoint import (  # noqa: E402, F401
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\checkpoint.py", line 437, in <module>
    class SelectiveCheckpointWrapper(ActivationWrapper):
  File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\checkpoint.py", line 449, in SelectiveCheckpointWrapper
    @torch.compiler.disable
AttributeError: module 'torch' has no attribute 'compiler'
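
The warning at the top of this log already names the mismatch: the installed xformers was built for PyTorch 2.2.0+cu121, while the environment has 2.0.1+cu117. The `AttributeError` is consistent with that, since the `torch.compiler` namespace is absent in PyTorch 2.0.x (it appears to have been added in 2.1). The sketch below is a small standard-library check with hypothetical helper names. It compares version strings and reports what is installed, without importing either package.

```python
from importlib import metadata
from itertools import takewhile


def parse_version(text):
    """Turn '2.0.1+cu117' into (2, 0, 1); the local '+...' suffix is ignored."""
    core = text.split("+", 1)[0]
    parts = []
    for piece in core.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def installed_version(package):
    """Installed version string of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def torch_matches(installed, built_for):
    """True when major.minor.patch of both version strings agree."""
    return parse_version(installed)[:3] == parse_version(built_for)[:3]


# Versions quoted in the warning above.
print(torch_matches("2.0.1+cu117", "2.2.0+cu121"))
print(installed_version("torch"), installed_version("xformers"))
```

When the check fails, the usual options are to install the PyTorch build that xformers was compiled against or to reinstall an xformers release built for the installed PyTorch.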
