postech-ami / paint-it

[CVPR'24] Official PyTorch Implementation of "Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering"

Home Page: https://kim-youwang.github.io/paint-it

License: MIT License

Python 61.42% Cuda 19.00% C 5.13% C++ 14.45%

paint-it's Introduction

Paint-it (CVPR 2024)

This repository contains the official implementation of the CVPR 2024 paper, "🎨 Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering".

[Teaser figure: teaser_camready]

Highlights

Paint-it is a text-driven high-quality PBR texture map synthesis method.

🌟 Our texture maps are ready for practical use in popular graphics engines like Blender and Unity, thanks to our Physically-Based Rendering (PBR) parameterization, which includes diffuse, roughness, metalness, and normal information.

🎨 With our approach, the resulting texture maps are not only of superior quality but also offer the flexibility of relighting and material editing.

🔥 We've achieved impressive results without modifying the well-known Score-Distillation Sampling (SDS), instead focusing on optimizing variables through our texture map parameterization.

🔊 While many researchers are working on denoising the gradients from SDS, our work leverages the power of architectural bias, specifically the Deep Image Prior, to robustly learn from noisy SDS gradients, even when dealing with PBR representations.
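To make the Deep Image Prior idea concrete, here is a minimal, hypothetical sketch of the optimization loop: the texture maps are never optimized directly; instead, a randomly initialized CNN maps a fixed noise input to the PBR maps, and only the network weights receive the (noisy) SDS gradients. The placeholder loss stands in for SDS, and the toy CNN stands in for Paint-it's actual skip U-Net (quoted in the Issues section below).

import torch
import torch.nn as nn

# Toy stand-in for Paint-it's skip U-Net; 9 output channels as in the
# actual call `skip(input_depth, 9, ...)` quoted later on this page.
net = nn.Sequential(
    nn.Conv2d(32, 64, 3, padding=1), nn.LeakyReLU(),
    nn.Conv2d(64, 9, 3, padding=1), nn.Sigmoid(),
)
z = torch.randn(1, 32, 512, 512)   # fixed random input in UV space
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(100):
    pbr_maps = net(z)              # predicted PBR texture stack
    loss = pbr_maps.mean()         # placeholder for the SDS objective
    opt.zero_grad()
    loss.backward()
    opt.step()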

Getting started

This code was developed on Ubuntu 18.04 with Python 3.8, CUDA 11.3, and PyTorch 1.12.0, using an NVIDIA RTX A6000 (48GB) GPU. Later versions should work, but have not been tested.

Environment setup

conda create -n paint_it python=3.8
conda activate paint_it

# pytorch installation
pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113

# for pytorch3d installation
conda install -c fvcore -c iopath -c conda-forge fvcore iopath
# for python3.8, cuda 11.3, pytorch 1.12 (py38_cu113_pyt1120) -> need to install pytorch3d-0.7.2 
pip install --no-index --no-cache-dir pytorch3d -f https://dl.fbaipublicfiles.com/pytorch3d/packaging/wheels/py38_cu113_pyt1120/download.html

pip install git+https://github.com/NVlabs/nvdiffrast/
pip install diffusers==0.12.1 huggingface-hub==0.11.1 transformers==4.21.1 sentence-transformers==2.2.2
pip install PyOpenGL PyOpenGL_accelerate accelerate rich ninja scipy trimesh imageio matplotlib chumpy opencv-python
pip install numpy==1.23.1
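After installation, an optional sanity check can confirm that the key packages import cleanly and CUDA is visible (this is just a generic import test, not part of Paint-it):

# optional: verify the environment above
python -c "import torch, pytorch3d, diffusers, nvdiffrast.torch; print(torch.__version__, torch.cuda.is_available())"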

Preparing 3D mesh data

Currently, this repository contains a sample mesh from the Objaverse dataset. To download a subset of Objaverse, you can refer to the scripts provided here.
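As a hedged alternative, the objaverse Python package (an assumption; the official scripts linked above remain the reference) can fetch objects by ID. Note that Objaverse assets are distributed as .glb, so conversion to .obj may be needed for this repository.

# hypothetical: download Objaverse meshes by object ID
import objaverse

uids = ['9ce8ab24383c4c93b4c1c7c3848abc52']   # placeholder ID from this README
objects = objaverse.load_objects(uids=uids, download_processes=1)
print(objects)   # maps each UID to a local .glb path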

Generate PBR texture maps for 3D mesh

Given a 3D mesh in .obj format and a text prompt, you can run the command below to generate PBR texture maps.

# Generate PBR textures for .obj meshes
python paint_it.py

When generating PBR texture maps for a subset of Objaverse meshes, you can modify the dictionary below (paint_it.py, L294) to handle multiple mesh object IDs and their corresponding text prompts.

mesh_dicts = {
    '9ce8ab24383c4c93b4c1c7c3848abc52': 'a pretzel',
}
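For instance, a hypothetical multi-mesh configuration would simply add entries; the second ID and prompt below are placeholders, not assets shipped with the repository.

mesh_dicts = {
    '9ce8ab24383c4c93b4c1c7c3848abc52': 'a pretzel',
    # hypothetical extra entry: replace with a real Objaverse object ID
    '<objaverse_object_id>': 'a ceramic teapot',
}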

Coming soon

  • Code release for general 3D object meshes (Objaverse)
  • Code release for 3D human meshes
  • Improved diffusion guidance (VSD, Stable Diffusion XL, ControlNet-Depth, etc.)

Citation

If you find our code or paper helpful, please consider citing:

@inproceedings{youwang2024paintit,
    title = {Paint-it: Text-to-Texture Synthesis via Deep Convolutional Texture Map Optimization and Physically-Based Rendering},
    author = {Youwang, Kim and Oh, Tae-Hyun and Pons-Moll, Gerard},
    booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2024}
}

Contact

Kim Youwang ([email protected])

Acknowledgement

We thank the members of AMILab and RVH group for their helpful discussions and proofreading.

The implementation of Paint-it is largely inspired by and built upon seminal prior projects. We would like to express our sincere gratitude to the authors for making their code public.

The project was made possible by funding from the Carl Zeiss Foundation. This work is funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - 409792180 (Emmy Noether Programme, project: Real Virtual Humans), and the German Federal Ministry of Education and Research (BMBF): Tübingen AI Center, FKZ: 01IS18039A. Gerard Pons-Moll is a Professor at the University of Tübingen endowed by the Carl Zeiss Foundation, at the Department of Computer Science and a member of the Machine Learning Cluster of Excellence, EXC number 2064/1 – Project number 390727645. Kim Youwang and Tae-Hyun Oh were supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS-2023-00225630, Development of Artificial Intelligence for Text-based 3D Movie Generation; No. 2022-0-00290, Visual Intelligence for SpaceTime Understanding and Generation based on Multi-layered Visual Common Sense; No. 2021-0-02068, Artificial Intelligence Innovation Hub).

paint-it's People

Contributors

youwang-kim

paint-it's Issues

Replace the UV random noise

Hi, if I replace the random-noise UV input with a texture that is locally close to the 3D template model, what other parameters do I need to change to make the result more robust?

How to change the size if the UV resolution is 1024 or 2048?

The DC-PBR representation uses a U-Net as its architecture. Thus, if you increase the resulting texture-map resolution, you would have to modify some architectural hyperparameters, such as num_channels_down, num_channels_up, num_channels_skip, filter_size_up, and filter_size_down. You can adjust those parameters here (a hypothetical adjustment for higher resolutions is sketched after the quoted snippet).

Paint-it/paint_it.py

Lines 81 to 87 in 407c55b

net = skip(input_depth, 9,                        # 9 output channels (the PBR texture stack)
           num_channels_down=[128] * 5,           # encoder widths per level
           num_channels_up=[128] * 5,             # decoder widths per level
           num_channels_skip=[128] * 5,           # skip-connection widths
           filter_size_up=3, filter_size_down=3,
           upsample_mode='nearest', filter_skip_size=1,
           need_sigmoid=True, need_bias=True, pad='reflection',
           act_fun='LeakyReLU').type(torch.cuda.FloatTensor)

Please let us know if you have more questions.
Thanks.

Originally posted by @Youwang-Kim in #8 (comment)
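Building on the quoted snippet, a hypothetical (untested) adjustment for a 1024x1024 texture map might deepen the network by one level so its receptive field scales with resolution; the widths and filter sizes below are assumptions that would need tuning.

# untested sketch for a 1024x1024 texture map: one extra U-Net level
net = skip(input_depth, 9,
           num_channels_down=[128] * 6,
           num_channels_up=[128] * 6,
           num_channels_skip=[128] * 6,
           filter_size_up=3, filter_size_down=3,
           upsample_mode='nearest', filter_skip_size=1,
           need_sigmoid=True, need_bias=True, pad='reflection',
           act_fun='LeakyReLU').type(torch.cuda.FloatTensor)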

Will there be any issues increasing the resolution of the texture?

I haven't tried this with this technique yet, but the biggest issue I've run into with text-to-texture so far is not being able to generate fine enough textures (e.g., detailed 2048x2048 or larger). I expect I'll run into memory issues, but is there any fundamental limitation? Do you see any issues with this using Paint-it?

Code for human meshes?

Hello,
Great work!

When is the expected release date for the code related to 3D human meshes?

Thanks.

CODE?

Would love to try this, do you plan on releasing the code?

Some questions about the text-to-texture task

Very nice work!

Resolution: What's the resolution of the UV map? When I zoom in, I observe some blurry results. Is this due to a low-resolution UV map, the limited power of Stable Diffusion v1.5, or something else? (I cannot find which model is used in the paper.) I think SDXL could improve things a lot, so I'm looking forward to the code.

SDS vs. TEXTure: Is SDS a good choice for the text-to-texture task? Although the quality of the official TEXTure is not good enough, this text-to-image-to-texture approach has been optimized in industry, e.g., by Meshy, which can generate very high-quality textures in 2 minutes. In contrast, SDS needs 15-30 min on an A6000. In my experiments, Fantasia3D+SDXL can generate better results than Meshy in some cases, but it takes 36 min on an A6000.

How can I get the normal texture along with ks and kd?

When you import the mesh and the generated texture maps into Blender, you have to define the same shading pipeline as NVDiffrast (which we used to render and train our textures).

You can refer to this #5 (comment), where you can find a Blender Python script that imports your mesh and textures with shading similar to NVDiffrast's; a rough illustrative sketch also follows below.

Please let us know if you have more questions.
Thanks.

Originally posted by @Youwang-Kim in #7 (comment)
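As a rough illustration only (this is not the script from issue #5, the file names are placeholders, and the assumption that ks packs roughness and metalness in its green and blue channels follows nvdiffrast-style materials rather than a confirmed Paint-it convention), wiring the maps into a Principled BSDF could look like:

import bpy

# hypothetical material setup; adjust file names to the actual outputs
mat = bpy.data.materials.new("paint_it")
mat.use_nodes = True
nodes, links = mat.node_tree.nodes, mat.node_tree.links
bsdf = nodes["Principled BSDF"]

def tex(path, non_color=False):
    node = nodes.new("ShaderNodeTexImage")
    node.image = bpy.data.images.load(path)
    if non_color:                     # data maps must not be gamma-corrected
        node.image.colorspace_settings.name = "Non-Color"
    return node

kd = tex("texture_kd.png")            # diffuse/albedo
links.new(kd.outputs["Color"], bsdf.inputs["Base Color"])

ks = tex("texture_ks.png", non_color=True)
sep = nodes.new("ShaderNodeSeparateRGB")
links.new(ks.outputs["Color"], sep.inputs["Image"])
links.new(sep.outputs["G"], bsdf.inputs["Roughness"])   # assumed packing
links.new(sep.outputs["B"], bsdf.inputs["Metallic"])    # assumed packing

n = tex("texture_n.png", non_color=True)
nrm = nodes.new("ShaderNodeNormalMap")
links.new(n.outputs["Color"], nrm.inputs["Color"])
links.new(nrm.outputs["Normal"], bsdf.inputs["Normal"])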

How to get the depth image

Hi, I now want to use depth ControlNet as a guide. How can I get the depth from the rendered output? I want an image like the one below.

[Example depth render: iter_100_depth]
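Since Paint-it renders with nvdiffrast, one hedged way to obtain a depth map (the variable names here are assumptions, not Paint-it's actual code) is to interpolate the per-vertex clip-space depth over the rasterized pixels:

import torch
import nvdiffrast.torch as dr

# v_clip: (1, V, 4) clip-space vertices; faces: (F, 3) int32 triangles
glctx = dr.RasterizeGLContext()
rast, _ = dr.rasterize(glctx, v_clip, faces, resolution=[512, 512])
depth, _ = dr.interpolate(v_clip[..., 2:3].contiguous(), rast, faces)
mask = rast[..., 3:4] > 0            # pixels covered by a triangle
depth_img = torch.where(mask, depth, torch.zeros_like(depth))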

Running out of VRAM

I'm trying to run the code on a cloud machine with an NVIDIA A10 gpu (24gb vram) and it is getting the CUDA out of memory error. Do you have any suggestions for running this with less gpu memory usage?

Here is the full error:

Traceback (most recent call last):
  File "/app/paint-it/paint_it.py", line 320, in <module>
    main(args, guidance)
  File "/app/paint-it/paint_it.py", line 226, in main
    sd_loss = guidance.batch_train_step(text_embedding, obj_image,
  File "/app/paint-it/sd.py", line 135, in batch_train_step
    noise_pred = self.unet(latent_model_input, tt, encoder_hidden_states=text_embeddings).sample
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unets/unet_2d_condition.py", line 1121, in forward
    sample, res_samples = downsample_block(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unets/unet_2d_blocks.py", line 1199, in forward
    hidden_states = attn(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/transformers/transformer_2d.py", line 391, in forward
    hidden_states = block(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 400, in forward
    ff_output = self.ff(norm_hidden_states, scale=lora_scale)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/attention.py", line 672, in forward
    hidden_states = module(hidden_states, scale)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/activations.py", line 103, in forward
    return hidden_states * self.gelu(gate)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 160.00 MiB (GPU 0; 23.73 GiB total capacity; 20.89 GiB already allocated; 53.62 MiB free; 21.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
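As a first, hedged mitigation drawn directly from the error message above (not a guaranteed fix), the CUDA allocator can be tuned before running:

# suggested by the error message: reduce allocator fragmentation
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
python paint_it.py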

Viewing in Blender?

Although the results look great with the nvdiff_renderer, they don't look as good inside Blender: greens appear fluorescent, etc. I suspect this has to do with enabling the correct options inside Blender, or perhaps one of the textures isn't being used. Has anybody else seen this and found a fix?

Default fp16 inference and potentially missing arguments

Hi, thank you for releasing the code. I encountered a few issues during testing.
One is: could you please cast the model weights and inputs to fp16 by default, so the model will work on consumer GPUs with around 20 GB of memory? It would be friendlier for users without NVIDIA AI cards.
Another is that some arguments seem not to be set when running paint_it.py, such as objaverse_id (is it the same thing as obj_id?) and the learn_lights argument. May I ask whether you have tested the program with a fresh install from the git repo, especially since some files are excluded from the upload?
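On the fp16 point, a hedged sketch of the requested change using the standard diffusers API (whether Paint-it's sd.py accepts a half-precision UNet without further changes is untested, and the model ID is an assumption):

import torch
from diffusers import UNet2DConditionModel

# load the Stable Diffusion UNet in half precision to roughly halve VRAM
unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet",
    torch_dtype=torch.float16,
).to("cuda")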

About generated textures

Hi, could you please explain how the RGB channels of the Ks texture should be interpreted? If I understand the paper correctly, roughness and metallic are stored in that texture, but I am not sure which channel stores which property. In addition, is any processing needed before these textures can be used as the albedo, roughness, and metallic inputs for shaders in common software like Blender or Unity?
