
[CVPR 2024] Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting

License: Apache License 2.0


StrDiffusion

This repository contains the official code for the paper "Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting" by Haipeng Liu ([email protected]), Yang Wang (corresponding author: [email protected]), Biao Qian, Meng Wang, and Yong Rui, published at CVPR 2024, Seattle, USA.

Introduction

In this paper, we propose a novel structure-guided diffusion model for image inpainting (named StrDiffusion), which reformulates the conventional texture denoising process under the guidance of the structure to derive a simplified denoising objective (Eq. 11) for inpainting. This reformulation reveals that: 1) the semantically sparse structure is beneficial for tackling the semantic discrepancy in the early stage, while the dense texture generates reasonable semantics in the late stage; 2) the semantics from the unmasked regions essentially offer time-dependent guidance for the texture denoising process, benefiting from the time-dependent sparsity of the structure semantics. For the denoising process, a structure-guided neural network is trained to estimate the simplified denoising objective by exploiting the consistency of the denoised structure between masked and unmasked regions. Besides, we devise an adaptive resampling strategy as a formal criterion for whether the structure is competent to guide the texture denoising process, while regulating their semantic correlations.

Figure 1. Illustration of the proposed StrDiffusion pipeline.

Figure 2. Illustration of the adaptive resampling strategy.

In summary, our StrDiffusion reveals:

  • The semantically sparse structure encourages consistent semantics for the denoised results in the early stage, while the dense texture carries out the semantic generation in the late stage;
  • The semantics from the unmasked regions essentially offer time-dependent guidance for the texture denoising process, benefiting from the time-dependent sparsity of the structure semantics;
  • Whether the structure guides the texture well greatly depends on the semantic correlation between them. Inspired by this, an adaptive resampling strategy is proposed to monitor the semantic correlation and regulate it via the resampling iteration.
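The resampling loop described above can be sketched in a few lines. This is an illustrative sketch, not the released implementation: the function names, the threshold, and the iteration bound are assumptions, and a plain cosine similarity stands in for the trained discriminator network that actually scores the structure–texture correlation in StrDiffusion.

```python
def cosine_similarity(a, b):
    """Cosine similarity between two flat feature vectors (stand-in score)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def adaptive_resample(denoise_step, structure, texture, score,
                      threshold=0.5, max_resample=5):
    """Repeat a denoising step until structure and texture semantically agree.

    denoise_step: callable returning a new (structure, texture) pair
    score:        callable scoring their semantic correlation
    threshold, max_resample: hypothetical hyperparameters
    """
    for _ in range(max_resample):
        structure, texture = denoise_step(structure, texture)
        if score(structure, texture) >= threshold:
            break  # structure is deemed competent to guide the texture
    return structure, texture
```

In the paper's terms, each denoising step is repeated, up to a bounded number of resampling iterations, until the monitored correlation indicates the structure can guide the texture denoising.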

Dependencies

  • OS: Ubuntu 20.04.6
  • NVIDIA:
    • CUDA: 12.3
    • cuDNN: 8.5.0
  • Python 3
  • PyTorch >= 1.13.0
  • Python packages: pip install -r requirements.txt

Train-[Structure Denoising Model]

  1. Dataset Preparation:

    Download the mask and image datasets, then go to the StrDiffusion/train/structure directory and modify the dataset paths in the option file /config/inpainting/options/train/ir-sde.yml:

    • You can set the mask path here
    • You can set the image path here
  2. Run the following command:

python3 ./train/structure/config/inpainting/train.py
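For reference, the dataset-path entries in an IR-SDE-style option file typically look like the fragment below. The key names are illustrative assumptions, not copied from this repository, so verify them against the actual ir-sde.yml before editing.

```yaml
# Illustrative fragment of ir-sde.yml (key names are assumptions;
# check the actual option file before editing).
datasets:
  train:
    dataroot_GT: /path/to/image/dataset    # image path
    dataroot_mask: /path/to/mask/dataset   # mask path
```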

Train-[Texture Denoising Model]

  1. Dataset Preparation:

    Download the mask and image datasets, then go to the StrDiffusion/train/texture directory and modify the dataset paths in the option file /config/inpainting/options/train/ir-sde.yml:

    • You can set the mask path here
    • You can set the image path here
  2. Run the following command:

python3 ./train/texture/config/inpainting/train.py

Train-[Discriminator Network]

  1. Dataset Preparation:

    Download the mask and image datasets, then go to the StrDiffusion/train/discriminator directory and modify the dataset paths in the option file /config/inpainting/options/train/ir-sde.yml:

    • You can set the mask path here
    • You can set the image path here
  2. Run the following command:

python3 ./train/discriminator/config/inpainting/train.py

Test-[StrDiffusion]

  1. Dataset Preparation:

    Download the mask and image datasets, then go to the StrDiffusion/test/texture directory and modify the dataset paths in the option file /config/inpainting/options/test/ir-sde.yml:

    • You can set the mask path here
    • You can set the image path here
  2. Pre-trained models:

    Download the pre-trained models for Places2 (T=400) and Places2 (T=100), then go to the StrDiffusion/test/texture directory and modify the model paths in the option file /config/inpainting/options/test/ir-sde.yml:

    • You can set the path of Texture Denoising Model in here
    • You can set the path of Structure Denoising Model in here
    • You can set the path of Discriminator Network in here
  3. For different T, you can set the corresponding hyperparameters of the adaptive resampling strategy here

  4. Run the following command:

python3 ./test/texture/config/inpainting/test.py

Example Results

  • Visual comparison between our method and the competitors.

  • Visualization of the denoised results for IR-SDE and StrDiffusion during the denoising process.

Citation

If any part of our paper or repository is helpful to your work, please cite:

@InProceedings{Liu_2024_CVPR,
    author    = {Liu, Haipeng and Wang, Yang and Qian, Biao and Wang, Meng and Rui, Yong},
    title     = {Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {8038-8047}
}

This implementation is based on / inspired by:

strdiffusion's People

Contributors

eltociear, htyjers


strdiffusion's Issues

time

How long did it take you to train?

Training issue

Today is June 9. I found that the GPU was not being used during training; is this OK?

about test parameter and time

Hello, thank you for your contribution.
May I ask whether the parameter under sde in ir-sde.yml should keep the default T=100 or be changed to T=400 for testing?
When changing to T=400, I found it takes several minutes to test one image. Is this normal? (Maybe because I'm running on Windows.)

Runtime error

Hello, this error occurs when running train.py. What might be causing it?

dataset

Thanks to the authors for their work. Is there a link to the dataset used in this article?

train epoch

Hello, I would like to ask: how many epochs do you recommend training for?

CUDA VISIBLE DEVICES=0

I ran python3 ./train/structure/config/inpainting/train.py, but this error occurred. What should I do?

Inserting conditions

I noticed there are two classes in test/texture/config/inpainting/models/modules/DenoisingUNet_arch.py: ConditionalUNet and ConditionalUNets. What is the difference between them?

DATASET

Hello, I would like to know: do the three training parts, Train-[Structure Denoising Model], Train-[Texture Denoising Model], and Train-[Discriminator Network], use the same dataset? Is it unnecessary to convert it to grayscale images?

About the cuda version

Dear authors, thanks for your wonderful work! I'm curious whether I could run your code under other CUDA versions, for example CUDA 11.3 or CUDA 11.8?

Really looking forward to your reply!!!^^

About testing the structure branch

Hello, I'm very interested in your excellent work. One question: if I only train the structure branch, how do I test it?

Results

Thank you for such a good paper, and for your patience.
I would like to ask which of the outputs in the results directory is the final result. I set epoch=100, batchsize=2, and trained on my own dataset of 2700 images, but the results do not look good. Do you have any suggestions? Thank you for your patient answers.

About the mask dataset

Hello! I greatly admire your excellent work and would like to follow up on it. Could you release the mask dataset used for testing in your paper as a reference for my follow-up work? Thank you very much.

About mask usage during training

Hello! I would like to ask which strategy you used when training: is a single model trained with masks over the whole dataset and then tested at different hole ratios (10-20%, 20-30%, etc.), or are separate models trained directly on masks of each hole ratio and then tested individually?

train time

If I want to train on 27000 images, will it take many days? How many images are recommended for training?

condition add

If I want to add a conditional feature such as Canny edges or a mask, embedded together with the input, do I need to modify the input dimensions and weights?

Code running question

Hello, thanks for sharing! I'm very interested in your research and want to reproduce the results. Can the provided code be used directly for the test step, or do I need to complete the three training parts first?

Baidu cloud link

Hello, Thank you for your great work.

I would like to try your pretrained model, but unfortunately I live outside China, so I cannot download the model weights from Baidu Cloud. Would you mind uploading the weights to another cloud service, such as Google Drive?

Thank you.

val image

How many images need to be trained before val images appear? After training finished there were no val images. Do I need to add a val path somewhere?
