thu-ml / crm
[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.
Home Page: https://ml.cs.tsinghua.edu.cn/~zhengyi/CRM/
License: MIT License
Hi, I want to use an image as guidance to generate the texture. Do you have any ideas?
Could you provide the training code?
I am running CRM on an RTX 2080 Ti GPU with 10.76 GiB of memory.
However, I am getting a CUDA out-of-memory error.
Are there any ways to reduce GPU memory usage while running CRM?
Hi, thanks for your wonderful paper and the released inference code!!! I am wondering if there is any plan to release the training script. Your reply will be highly appreciated~
Outstanding Work!
I have a question regarding the performance of the VAE in your CCM diffusion model. As far as I understand, VAEs typically struggle to reconstruct precise CCMs. Since the VAE's reconstruction quality sets the upper limit for the quality of the CCM diffusion, the CCMs produced by the diffusion process might not be perfectly accurate.
However, I noticed that the CRM module manages to output accurate meshes. This raises a couple of questions:
I would greatly appreciate any insights or details you could provide on these points. Thank you for your time and for sharing your work with the community.
CRM.pth: 100%|███████████████████████████████████████████████████████████████████████| 476M/476M [13:35<00:00, 583kB/s]
Traceback (most recent call last):
File "E:\project\CRM\app.py", line 129, in <module>
model = CRM(specs).to(args.device)
File "E:\project\CRM\model\crm\model.py", line 59, in __init__
self.renderer = Renderer(tet_grid_size=self.tet_grid_size, camera_angle_num=self.camera_angle_num,
File "E:\project\CRM\util\renderer.py", line 15, in __init__
self.glctx = dr.RasterizeCudaContext()
File "E:\project\CRM\python\lib\site-packages\nvdiffrast\torch\ops.py", line 177, in __init__
self.cpp_wrapper = _get_plugin().RasterizeCRStateWrapper(cuda_device_idx)
File "E:\project\CRM\python\lib\site-packages\nvdiffrast\torch\ops.py", line 118, in _get_plugin
torch.utils.cpp_extension.load(name=plugin_name, sources=source_paths, extra_cflags=opts, extra_cuda_cflags=opts+['-lineinfo'], extra_ldflags=ldflags, with_cuda=True, verbose=False)
File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1306, in load
return _jit_compile(
File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1710, in _jit_compile
_write_ninja_file_and_build_library(
File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1793, in _write_ninja_file_and_build_library
verify_ninja_availability()
File "E:\project\CRM\python\lib\site-packages\torch\utils\cpp_extension.py", line 1842, in verify_ninja_availability
raise RuntimeError("Ninja is required to load C++ extensions")
RuntimeError: Ninja is required to load C++ extensions
E:\project\CRM>python\python.exe -m pip install Ninja
Looking in indexes: https://mirrors.aliyun.com/pypi/simple/
Requirement already satisfied: Ninja in e:\project\crm\python\lib\site-packages (1.11.1.1)
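One possible explanation, offered as an assumption rather than a confirmed diagnosis: PyTorch's extension loader checks for Ninja by running the `ninja` executable, so having the pip package installed is not enough if the executable's directory is not on PATH (which can happen with an embedded interpreter such as `python\python.exe`). The sketch below uses only the standard library; `find_ninja` and `scripts_dir` are hypothetical helper names.

```python
import os
import shutil
import sysconfig
from typing import Optional


def find_ninja() -> Optional[str]:
    """Return the path of the `ninja` executable visible on PATH, or None."""
    return shutil.which("ninja")


def scripts_dir() -> str:
    """Directory where pip places console scripts for this interpreter."""
    return sysconfig.get_path("scripts")


if __name__ == "__main__":
    path = find_ninja()
    if path is None:
        print("ninja is not on PATH; pip may have installed it in:", scripts_dir())
        # Prepending the scripts directory makes it visible to this process
        # and to any child processes it starts.
        os.environ["PATH"] = scripts_dir() + os.pathsep + os.environ.get("PATH", "")
        print("after updating PATH:", find_ninja())
    else:
        print("ninja found at:", path)
```

If `find_ninja()` returns None even though pip reports the package as installed, adding the printed scripts directory to PATH before launching `app.py` is worth trying.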
Will the fine-tuning code be released?
Thanks to the authors.
I would like to ask how the LEGO-style 3D model in the demo was made. Could you share the method?
Thank you.
Hi,
I wanted to ask: suppose I take the 256x6 MV images and generate a higher-resolution MV sheet from them.
Is it possible for CRM to generate a better 3D model with more detail?
My test ideas are:
- Regular upscale (the CCMs probably won't be that good, since this is just a change of resolution with no real upscaling; I still can't figure out whether the CCMs are used for texturing, for generating the 3D mesh, or both)
or
- Run a tile-based algorithm:
First do a regular CRM generation of RGB and CCM images at 256x6, then upscale them as follows.
The algorithm splits the input image into multiple tiles, generates RGB and CCM images for each tile, then blends them all together into one set of high-resolution MV RGB and CCM images.
The tile code is ready and only needs some modifications; it showed some great results with depth-map blending.
I made some modifications to the code and the model config files and changed the size of the input tensors (image arrays). The RGB and CCM images generated by the regular workflow at high resolutions are just garbage, so I can't really tell.
What I need to know:
1- Will the decoder work with resolutions higher than 256x6, for example 512x3,072? Or is the model only trained on that resolution, so it won't work?
2- I read the paper multiple times but can't understand CCMs. Can we skip generating them and just use RGB? Are CCMs essential for mesh generation, or are they used just for texturing?
3- Let's say we have extremely detailed depth maps, like ultra-sharp 4K maps where even skin pores are visible. Can we in any way introduce those depth maps into the CRM workflow? (This one is very important.)
Do let me know, and many thanks in advance. Much love and respect for your work, cheers.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is None and using 5 heads.
Setting up MemoryEfficientCrossAttention. Query dim is 320, context_dim is 1024 and using 5 heads.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla-xformers' with 512 in_channels
building MemoryEfficientAttnBlock with 512 in_channels...
--- using zero snr---
/home/harshad/Downloads/CRM/imagedream/ldm/interface.py:117: RuntimeWarning: divide by zero encountered in divide
"sqrt_recip_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod))
/home/harshad/Downloads/CRM/imagedream/ldm/interface.py:120: RuntimeWarning: divide by zero encountered in divide
"sqrt_recipm1_alphas_cumprod", to_torch(np.sqrt(1.0 / alphas_cumprod - 1))
Killed
After running for some time, the code reports 'divide by zero' (the RuntimeWarning lines above) and the process is then killed. I suspect the division by zero is what terminates the program. Could you please provide guidance on how to resolve this issue?
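A note on reading this log, hedged as a general observation rather than a confirmed diagnosis for this report: the two `divide by zero` lines are NumPy `RuntimeWarning`s, which do not stop the program, while a bare `Killed` is what a Linux shell prints when the process receives SIGKILL, commonly from the kernel's out-of-memory killer. One way to test that hypothesis is to log peak memory at a few points during model loading. The sketch below uses only the standard library's `resource` module (Unix only); `peak_rss_mib` is a hypothetical helper name.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    print(f"peak RSS before loading: {peak_rss_mib():.1f} MiB")
    buffer = bytearray(64 * 1024 * 1024)  # stand-in for loading a checkpoint
    print(f"peak RSS after loading:  {peak_rss_mib():.1f} MiB")
```

If the reported value approaches the machine's total RAM shortly before the process dies, memory (or swap) is the more likely culprit than the division itself.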
Hello, thank you for your brilliant work on the fast feed-forward 3D generation model!
I have tried your Hugging Face demo and it works well. I noticed that users can set the random seed in the demo, but when I run your code, there seem to be no command-line arguments for the random seed. How can I set the seed properly: np? torch? torch.cuda?
Thank you :)
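A common way to make runs repeatable, sketched here under the assumption that CRM's sampling draws from the usual Python, NumPy, and PyTorch generators (the source does not confirm which ones it uses). `seed_everything` is a hypothetical helper, not part of the CRM code; it seeds whichever of those libraries are installed.

```python
import random


def seed_everything(seed: int) -> None:
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)           # seeds the default generators
        torch.cuda.manual_seed_all(seed)  # explicit for all GPUs; ignored without CUDA
    except ImportError:
        pass


if __name__ == "__main__":
    seed_everything(1234)
    first = random.random()
    seed_everything(1234)
    assert random.random() == first  # same seed, same draw
```

Calling the helper once at the top of the inference script, before the pipeline is built, would be the natural place to try it. Some GPU kernels are non-deterministic, so identical seeds may still give slightly different outputs.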
Greetings, could you share the filter list, as LGM did? Thanks in advance.
I'm trying to run this with xformers 0.0.25 because I have to run the latest Torch 2.2.1, which Google Colab just updated to. xformers 0.0.24 only works with Torch 2.2.0, so installing 0.0.24 takes ~5 minutes and a restart to downgrade to Torch 2.2.0. I got it working in my app at DiffusionDeluxe.com using the recommended 0.0.24 (although it keeps running out of RAM), but with 0.0.25 I'm getting this error:
Traceback (most recent call last):
File "/content/sdd_colab.py", line 46547, in run_crm
crm_model = CRM(specs).to(torch_device)
File "/content/CRM/model/crm/model.py", line 46, in __init__
self.unet2 = UNetPP(in_channels=self.dec.c_dim)
File "/content/CRM/model/archs/unet.py", line 43, in __init__
self.unet.enable_xformers_memory_efficient_attention()
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 295, in enable_xformers_memory_efficient_attention
self.set_use_memory_efficient_attention_xformers(True, attention_op)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 259, in set_use_memory_efficient_attention_xformers
fn_recursive_set_mem_eff(module)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 255, in fn_recursive_set_mem_eff
fn_recursive_set_mem_eff(child)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/modeling_utils.py", line 252, in fn_recursive_set_mem_eff
module.set_use_memory_efficient_attention_xformers(valid, attention_op)
File "/usr/local/lib/python3.10/dist-packages/diffusers/models/attention_processor.py", line 253, in set_use_memory_efficient_attention_xformers
raise ModuleNotFoundError(
ModuleNotFoundError: Refer to https://github.com/facebookresearch/xformers for more information on how to install xformers
I'm hoping you can figure out a fix for the breaking change that makes it compatible with both versions; it's always nice to be using the newest versions. Thanks. I tried to trace the problem down myself but didn't understand it well enough.
Hi! I want to use CCMs on my own dataset. Can you tell me how to get the ground-truth CCMs, or can you provide the relevant scripts for rendering the six orthogonal CCM projections?
Hi @thuwzy, does CRM support multiple GPUs for inference? I tried CUDA_VISIBLE_DIVICES=0,1, but it does not seem to work (2x 2080 Ti).
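Two general points may be relevant here, offered as a sketch rather than a statement about CRM's code. The variable CUDA reads is spelled `CUDA_VISIBLE_DEVICES`, so a misspelled name is silently ignored, and it has to be set before CUDA is initialised in the process. Making two GPUs visible also does not by itself split a single-process model across them. The helpers below are hypothetical and use only the standard library.

```python
import os
from typing import List


def set_visible_gpus(ids: List[int]) -> str:
    """Restrict this process to the given GPU indices; call before importing torch."""
    value = ",".join(str(i) for i in ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


def visible_gpus() -> List[str]:
    """Return the entries of CUDA_VISIBLE_DEVICES, or [] if it is unset or empty."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [part.strip() for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    set_visible_gpus([0, 1])
    print(visible_gpus())
```

Whether CRM can actually use the second GPU depends on its inference code, which places the model on a single device in the tracebacks shown elsewhere on this page (`CRM(specs).to(args.device)`).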
What is the minimum VRAM requirement? Is there a configuration that can run on low-VRAM GPUs?
Great work! Can you please share more details, and ideally the code, of how you render an RGB image from the FlexiCubes geometry? I see the following code, but it only renders masks and depth. Thanks in advance!
def render_mesh(self, mesh_v_nx3, mesh_f_fx3, camera_mv_bx4x4, resolution=256, hierarchical_mask=False):
    return_value = dict()
    if self.render_type == 'neural_render':
        tex_pos, mask, hard_mask, rast, v_pos_clip, mask_pyramid, depth = self.renderer.render_mesh(
            mesh_v_nx3.unsqueeze(dim=0),
            mesh_f_fx3.int(),
            camera_mv_bx4x4,
            mesh_v_nx3.unsqueeze(dim=0),
            resolution=resolution,
            device=self.device,
            hierarchical_mask=hierarchical_mask
        )
        return_value['tex_pos'] = tex_pos
        return_value['mask'] = mask
        return_value['hard_mask'] = hard_mask
        return_value['rast'] = rast
        return_value['v_pos_clip'] = v_pos_clip
        return_value['mask_pyramid'] = mask_pyramid
        return_value['depth'] = depth
    else:
        raise NotImplementedError
    return return_value
Thanks for your awesome work and contribution!
I tried to run your code locally after downloading the model checkpoints from Hugging Face, but I encountered a size mismatch error:
Traceback (most recent call last):
File "/CRM/local_inference.py", line 152, in <module>
pipeline = TwoStagePipeline(
File "/CRM/pipelines.py", line 31, in __init__
self.stage1_model.load_state_dict(torch.load(stage1_model_config.resume, map_location="cpu"), strict=False)
File "/envs/crm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for LatentDiffusionInterface:
size mismatch for model.diffusion_model.input_blocks.0.0.weight: copying a param with shape torch.Size([320, 8, 3, 3]) from checkpoint, the shape in current model is torch.Size([320, 4, 3, 3]).
Interestingly, when I swap the checkpoints for ccm_diffusion and pixel_diffusion, both loading and inference run without errors, but the results after swapping are clearly incorrect.
I have not changed anything in the code.
Hello, when I set up the environment, I ran into a problem with this error message:
"ERROR: Could not build wheels for xformers, which is required to install pyproject.toml-based projects."
How can I fix it?
What exactly is the required PyTorch version? With torch 2.0.1, I get an error that torch is missing compiler.
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
PyTorch 2.2.0+cu121 with CUDA 1201 (you have 2.0.1+cu117)
Python 3.10.11 (you have 3.10.13)
Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
Memory-efficient attention, SwiGLU, sparse and more won't be available.
Set XFORMERS_MORE_DETAILS=1 for more details
Traceback (most recent call last):
File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\utils\import_utils.py", line 684, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "E:\ProgramData\anaconda3\envs\CRM\lib\importlib\__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\unet_2d.py", line 24, in <module>
from .unet_2d_blocks import UNetMidBlock2D, get_down_block, get_up_block
File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\unet_2d_blocks.py", line 23, in <module>
from .attention import AdaGroupNorm
File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\attention.py", line 22, in <module>
from .attention_processor import Attention
File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\diffusers\models\attention_processor.py", line 31, in <module>
import xformers
File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\__init__.py", line 12, in <module>
from .checkpoint import ( # noqa: E402, F401
File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\checkpoint.py", line 437, in <module>
class SelectiveCheckpointWrapper(ActivationWrapper):
File "E:\ProgramData\anaconda3\envs\CRM\lib\site-packages\xformers\checkpoint.py", line 449, in SelectiveCheckpointWrapper
@torch.compiler.disable
AttributeError: module 'torch' has no attribute 'compiler'
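The warning at the top of this log already names the mismatch: this xformers build targets PyTorch 2.2.0+cu121, while the environment has 2.0.1+cu117, and the final `AttributeError` shows that `torch.compiler` is missing from the installed torch. A quick way to see what is installed before reinstalling a matching pair is sketched below with the standard library's `importlib.metadata`; `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["torch", "xformers", "diffusers"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Either upgrading torch to the version the xformers wheel was built for, or installing an xformers release built for the existing torch, should make the two agree; which pairing CRM expects is not stated in this log.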