
windvchen / diff-harmonization


A novel zero-shot image harmonization method based on Diffusion Model Prior.

License: Apache License 2.0

Python 100.00%
diffusion-models image-harmonization image-to-image-translation optimization textual-inversion zero-shot-learning image-composition

diff-harmonization's Introduction

DiffHarmon's preface

Watch the video

Give us a ⭐ if this repo helps you

This is the official repository of Diff-Harmonization. If you encounter any questions, please feel free to contact us: create an issue or send an email to [email protected]. Ideas and discussion are also welcome.

BTW: you may also wish to check out another work of ours, 😊INR-Harmonization. It is the first dense pixel-to-pixel method applicable to high-resolution (~6K) images without any hand-crafted filter design, based on Implicit Neural Representation.

Updates

[03/10/2024] Released version 2 of our paper (access it from here; the previous version is still available here), together with the code! 🧐🧐 The main updates in this new version:

  • Prompts (object name and imaging conditions) can now be generated automatically by incorporating the Google Gemini VLM into our pipeline. (See Sec. 3.1)
  • Content structure is further preserved by leveraging edge maps. (See Sec. 3.2)
  • Imaging descriptions can now contain more than one word. (See Sec. 3.2)
  • An automatic performance evaluation and result selection process is incorporated. (See Sec. 3.3)
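
The edge-map idea mentioned above can be illustrated with a plain Sobel filter. Note this is only an illustrative stand-in, not the detector the repository actually uses (main.py exposes a --pidinet_model option, suggesting a learned edge detector):

```python
# Illustrative only: a Sobel edge map as a stand-in for the edge maps used
# for structure preservation. Input is a 2-D list of grayscale values in [0, 1].

def sobel_edge_map(img, threshold=0.5):
    """Return a binary edge map the same size as `img` (borders left at 0)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses.
            gx = (img[y-1][x+1] + 2*img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2*img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2*img[y-1][x] - img[y-1][x+1])
            edges[y][x] = 1 if (gx * gx + gy * gy) ** 0.5 > threshold else 0
    return edges

# A tiny image with a vertical step edge between the second and third columns.
img = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
edge = sobel_edge_map(img)
```

During harmonization, such a map can be used to penalize edits that move or blur the foreground's contours.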

[09/05/2023] Code is now publicly accessible. 👋👋 We are working 🏃🏃 on further improvements to the method (see Appendix D of the paper) to provide a better user experience, so stay tuned for more updates.

[07/18/2023] Repository init.

TODO

  • Code release
  • Support multi-word descriptions.
  • Automate the generation of the initial environmental text
  • Further improve the content preservation

Possible future work (See Limitation of the paper-v2):

  • Speed up


Abstract

DiffHarmon's framework

We propose a zero-shot approach to image harmonization, aiming to overcome the reliance on large amounts of synthetic composite images in existing methods. These methods, while showing promising results, involve significant training expenses and often struggle with generalization to unseen images. To this end, we introduce a fully modularized framework inspired by human behavior. Leveraging the reasoning capabilities of recent foundation models in language and vision, our approach comprises three main stages. Initially, we employ a pretrained vision-language model (VLM) to generate descriptions for the composite image. Subsequently, these descriptions guide the foreground harmonization direction of a text-to-image generative model (T2I). We refine text embeddings for enhanced representation of imaging conditions and employ self-attention and edge maps for structure preservation. Following each harmonization iteration, an evaluator determines whether to conclude or modify the harmonization direction. The resulting framework, mirroring human behavior, achieves harmonious results without the need for extensive training. We present compelling visual results across diverse scenes and objects, along with a user study validating the effectiveness of our approach.
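
The three-stage loop described above can be sketched schematically as follows. All function names here are hypothetical placeholders, not the repository's actual API:

```python
# Schematic of the framework in the abstract: describe -> harmonize -> evaluate.
# `describe`, `harmonize_step`, and `evaluate` are hypothetical stand-ins.

def harmonize(image, describe, harmonize_step, evaluate, max_iters=10):
    """Run the describe/harmonize/evaluate loop and return the final result."""
    prompt = describe(image)                  # Stage 1: VLM writes descriptions.
    results = []
    for _ in range(max_iters):
        image = harmonize_step(image, prompt)  # Stage 2: T2I-guided edit.
        results.append(image)
        verdict = evaluate(image)              # Stage 3: evaluator decides.
        if verdict == "stop":
            break
        elif verdict != "continue":
            prompt = verdict                   # Evaluator changed the direction.
    return results[-1]

# Toy run: the "image" is a number nudged toward a target by each iteration.
out = harmonize(
    0,
    describe=lambda img: "warm",
    harmonize_step=lambda img, p: img + 1,
    evaluate=lambda img: "stop" if img >= 3 else "continue",
)
```

The key design point is that the evaluator can either terminate the loop or redirect it, mirroring how a person would repeatedly adjust and re-inspect a composite.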

Requirements

  1. Hardware Requirements

    • GPU: 1x high-end NVIDIA GPU with at least 20GB memory
  2. Software Requirements

    • Python: 3.9 or above
    • CUDA: 11.3
    • cuDNN: 8.4.1

    To install other requirements, please check requirements.txt, or directly run the following command:

    pip install -r requirements.txt
    
  3. Data preparation

    • Demo data is provided in demo; you can directly run the code below to see the results.
    • If you want to test your own data, please follow the format of the demo data. Specifically, you need to prepare a composite image, a mask image, and a caption.
    • For automatic caption generation, run gemini_mini_vision.py. Remember to modify variables such as api_key, images_root, masks_root, etc. in advance.
  4. Pre-trained Models

    • We adopt Stable Diffusion 2.0 as our diffusion model. You can load the pretrained weights by setting pretrained_diffusion_path="stabilityai/stable-diffusion-2-base" in main.py.

Harmonizing

The code supports harmonizing either a single image or a batch of images. When the harmonization loop finishes, you can manually select the best of the harmonized results, or directly use the automatically selected result named final_output.

(Note: since Diff-Harmonization is a zero-shot method, the results are not always good. If the results are poor, we recommend trying different initial environmental text.)
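
The automatic selection of final_output can be thought of as scoring every saved iteration and keeping the best one. A minimal sketch, with a hypothetical scoring function standing in for the evaluator:

```python
# Hypothetical sketch of picking `final_output` from the saved iterations.
# `score` stands in for the evaluator (higher = more harmonious).

def select_final_output(results, score):
    """Return (best_index, best_result) over all harmonization iterations."""
    best_idx = max(range(len(results)), key=lambda i: score(results[i]))
    return best_idx, results[best_idx]

# Toy example: results are numbers, and the evaluator prefers values near 5.
results = [1, 4, 7, 5, 2]
idx, best = select_final_output(results, score=lambda r: -abs(r - 5))
```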

Harmonize a single image

python main.py --harmonize_iterations 10 --save_dir "./output" --is_single_image --image_path "./demo/girl_comp.jpg" --mask_path "./demo/girl_mask.jpg" --foreground_prompt "girl autumn" --background_prompt "girl winter" --pretrained_diffusion_path "stabilityai/stable-diffusion-2-base" --use_edge_map
  • --harmonize_iterations: the number of harmonization iterations; this also determines the number of results saved in the output directory.
  • --save_dir: the directory to save the harmonized image.
  • --is_single_image: harmonize a single image.
  • --image_path: the path of the composite image.
  • --mask_path: the path of the mask image.
  • --foreground_prompt: the prompt describing foreground environment.
  • --background_prompt: the prompt describing background environment.
  • --pretrained_diffusion_path: the path of the pretrained diffusion model.
  • --use_edge_map: whether to use edge maps to preserve structure.
  • (optional) --use_evaluator: whether to automatically select the best image.
  • ... (Please refer to main.py for more options.)
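
The flags listed above suggest an argparse setup along these lines. This is a reconstruction from the documented options only; main.py may define more flags with different defaults:

```python
import argparse

# Reconstructed from the documented flags; not the repository's exact parser.
parser = argparse.ArgumentParser(description="Diff-Harmonization (sketch)")
parser.add_argument("--harmonize_iterations", type=int, default=10)
parser.add_argument("--save_dir", default="./output")
parser.add_argument("--is_single_image", action="store_true")
parser.add_argument("--image_path")
parser.add_argument("--mask_path")
parser.add_argument("--foreground_prompt")
parser.add_argument("--background_prompt")
parser.add_argument("--pretrained_diffusion_path",
                    default="stabilityai/stable-diffusion-2-base")
parser.add_argument("--use_edge_map", action="store_true")
parser.add_argument("--use_evaluator", action="store_true")

# Parse the documented single-image invocation.
args = parser.parse_args([
    "--harmonize_iterations", "10", "--is_single_image",
    "--image_path", "./demo/girl_comp.jpg",
    "--mask_path", "./demo/girl_mask.jpg",
    "--foreground_prompt", "girl autumn",
    "--background_prompt", "girl winter",
    "--use_edge_map",
])
```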

Harmonize a bunch of images

python main.py --harmonize_iterations 10 --save_dir "./output" --images_root "./demo/composite" --mask_path "./demo/mask" --caption_txt "./demo/caption.txt" --pretrained_diffusion_path "stabilityai/stable-diffusion-2-base" --use_edge_map
  • --harmonize_iterations: the number of harmonization iterations; this also determines the number of results saved in the output directory.
  • --save_dir: the directory to save the harmonized image.
  • --images_root: the root directory of the composite images.
  • --mask_path: the path of the mask image.
  • --caption_txt: the path of the caption file.
  • --pretrained_diffusion_path: the path of the pretrained diffusion model.
  • --use_edge_map: whether to use edge maps to preserve structure.
  • (optional) --use_evaluator: whether to automatically select the best image.
  • ... (Please refer to main.py for more options.)
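
In batch mode, each composite image presumably gets paired with a same-named mask and a caption line. A hedged sketch of how such pairing might look; the file layout here is assumed from the demo description, not verified against main.py:

```python
import os

# Assumed layout: each composite in images_root has a same-named mask in
# mask_root, and the caption file maps "filename caption..." per line.
def pair_batch(images_root, mask_root, caption_txt):
    """Return (image_path, mask_path, caption) triples for every composite."""
    captions = {}
    with open(caption_txt) as f:
        for line in f:
            name, _, caption = line.strip().partition(" ")
            if name:
                captions[name] = caption
    pairs = []
    for name in sorted(os.listdir(images_root)):
        pairs.append((
            os.path.join(images_root, name),
            os.path.join(mask_root, name),
            captions.get(name, ""),
        ))
    return pairs
```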

Results

(Visual comparison figures)

Citation & Acknowledgments

If you find this paper useful in your research, please consider citing:

@article{chen2023zero,
  title={Zero-Shot Image Harmonization with Generative Model Prior},
  author={Chen, Jianqi and Zou, Zhengxia and Zhang, Yilan and Chen, Keyan and Shi, Zhenwei},
  journal={arXiv preprint arXiv:2307.08182},
  year={2023}
}

Thanks also to the open-source code of Prompt-to-Prompt; some of our code is based on it.

License

This project is licensed under the Apache-2.0 license. See LICENSE for details.

diff-harmonization's People

Contributors

windvchen


diff-harmonization's Issues

Stable Diffusion Version

Hi, I downloaded the Stable Diffusion pretrained models from here: https://huggingface.co/stabilityai/stable-diffusion-2-base/tree/main

However, I think that the version is slightly different, as I had to change the module lookup to the following, otherwise none of the forward functions were being swapped out:

    #if net_.__class__.__name__ == 'CrossAttention':
    if 'CrossAttn' in net_.__class__.__name__:
        net_.forward = ca_forward(net_, place_in_unet)
        return count + 1

This causes other errors in the code, so I was wondering if you could provide the link to the pre-trained stable diffusion model you are using.

Thanks!

Code Release

Hi,

Thanks for the great work and the results are impressive! Are you planning to release the code soon? If so, what's the expected timeline/date? Thanks in advance!

KeyError: 'up_cross'

I got an error KeyError: 'up_cross'.
Here is GitHub Copilot's suggestion. My knowledge about transformers is very limited; maybe something needs to be fixed?
Screenshot from 2024-05-04 13-31-25

unrecognized arguments error

Thanks for this awesome work. When I run the code, I get an unrecognized arguments error. What is the problem?
I manually downloaded the pretrained model to ./stabilityai/stable-diffusion-2-base/512-base-ema.safetensors

python main.py --harmonize_iterations 10 --save_dir "./output" --is_single_image --image_path "./demo/girl_comp.jpg" --mask_path "./demo/girl_mask.jpg" --foreground_prompt "girl autumn" --background_prompt "girl winter" --pretrained_diffusion_path "./stabilityai/stable-diffusion-2-base/512-base-ema.safetensors" --use_edge_map
usage: main.py [-h] [--pidinet_model PIDINET_MODEL] [--sa] [--dil] [--config CONFIG] [--evaluate EVALUATE] [--gpu GPU]
main.py: error: unrecognized arguments: --harmonize_iterations 10 --save_dir ./output --is_single_image --image_path ./demo/girl_comp.jpg --mask_path ./demo/girl_mask.jpg --foreground_prompt girl autumn --background_prompt girl winter --pretrained_diffusion_path ./stabilityai/stable-diffusion-2-base/512-base-ema.safetensors --use_edge_map

KeyError: 'up_cross'

Hi, when I run this with the pretrained model:
python main.py --harmonize_iterations 10 --save_dir "./output" --is_single_image --image_path "./demo/girl_comp.jpg" --mask_path "./demo/girl_mask.jpg" --foreground_prompt "girl autumn" --background_prompt "girl winter" --pretrained_diffusion_path "stabilityai/stable-diffusion-2-base"
it says there is no key 'up_cross' in attention_maps (utils.py, line 115) during the first iteration. It seems attention_maps is empty. Can you please help me figure it out? Many thanks~
