cv-synthesis / attention_regulation

This project forked from yangzhang-v5/attention_regulation

Attention Regulation on T2I Diffusion Models

Home Page: https://yangzhang-v5.github.io/attention_regulation/

License: MIT License

Python 100.00%

Attention Regulation

This repository contains the PyTorch inference code for the paper "Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models". Paper link: arXiv:2403.06381

Environment Setup

Clone this repository

git clone https://github.com/YaNgZhAnG-V5/attention_regulation.git
cd attention_regulation

Install the required dependencies with the supported versions

pip install -r requirements.txt

Usage

We provide a script (txt2img.py) for inference. You can use it to generate images from text using our Attention Regulation approach.

Example usage:

python txt2img.py --prompt "A painting of a bag and an apple" --target "bag apple"
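
The same invocation can be launched from Python, for example when looping over several prompts. The following is a minimal sketch, not part of the repository: build_command and run are hypothetical helper names, and the only flags used (--prompt, --target, --seed) are those documented in this README. It assumes the script is started from the repository root.

```python
import shlex
import subprocess
import sys


def build_command(prompt, target=None, seed=None, script="txt2img.py"):
    """Assemble the argument list for one txt2img.py run (hypothetical helper)."""
    cmd = [sys.executable, script, "--prompt", prompt]
    if target is not None:
        # --target takes the target words as a single space-separated string.
        cmd += ["--target", target]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd


def run(cmd):
    """Launch the script; raises CalledProcessError if it exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_command("A painting of a bag and an apple", target="bag apple", seed=0)
    # Print the command for inspection; call run(cmd) to actually generate images.
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces or quotes.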

The full list of options is as follows:

usage: txt2img.py [-h] --prompt PROMPT [--target TARGET] [--workdir WORKDIR] [--cuda-id CUDA_ID] [-n N] [-s STEPS] [--guidance-scale GUIDANCE_SCALE] [--seed SEED] [--edit-steps EDIT_STEPS] [--layers [LAYERS ...]]
                  [--pipeline-id PIPELINE_ID] [--scheduler SCHEDULER]

options:
  -h, --help            show this help message and exit
  --prompt PROMPT       Prompt
  --target TARGET       Target phrase for editing, separated by space
  --workdir WORKDIR     Working directory
  --cuda-id CUDA_ID     CUDA device id
  -n N                  Number of images to generate per prompt
  -s STEPS, --steps STEPS
                        Number of inference steps
  --guidance-scale GUIDANCE_SCALE
                        Guidance scale
  --seed SEED           Random seed
  --edit-steps EDIT_STEPS
                        Number of edit steps
  --layers [LAYERS ...]
                        Layers to edit. Select from: ['down_blocks.0','down_blocks.1','down_blocks.2', 'mid_block', 'up_blocks.1', 'up_blocks.2', 'up_blocks.3']
  --pipeline-id PIPELINE_ID
                        Pipeline ID from Diffusers. We support SD 1.4 SD 1.5 SD 2 and SD 2.5
  --scheduler SCHEDULER
                        Scheduler to use
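
To illustrate how a few of these options behave, here is a small argparse sketch that is consistent with the help text above. It is an assumption-based reconstruction, not the repository's actual parser: the integer types are guesses, no default values are filled in because the help text does not state them, and build_parser is a hypothetical name. It shows that --layers accepts several space-separated values restricted to the listed choices, and that --target is one string that can be split into words.

```python
import argparse

# Layer names copied from the help text above.
LAYER_CHOICES = [
    "down_blocks.0", "down_blocks.1", "down_blocks.2",
    "mid_block",
    "up_blocks.1", "up_blocks.2", "up_blocks.3",
]


def build_parser():
    """Build a parser covering a subset of the documented options (sketch only)."""
    parser = argparse.ArgumentParser(prog="txt2img.py")
    parser.add_argument("--prompt", required=True, help="Prompt")
    parser.add_argument("--target", help="Target phrase for editing, separated by space")
    # nargs="*" matches the "[LAYERS ...]" notation in the usage line.
    parser.add_argument("--layers", nargs="*", choices=LAYER_CHOICES, help="Layers to edit")
    # The int types below are assumptions; the help text does not state them.
    parser.add_argument("-s", "--steps", type=int, help="Number of inference steps")
    parser.add_argument("--seed", type=int, help="Random seed")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--prompt", "A painting of a bag and an apple",
         "--target", "bag apple",
         "--layers", "mid_block", "up_blocks.1"]
    )
    print(args.target.split())  # the individual target words
    print(args.layers)          # the selected layers as a list
```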

Acknowledgements

If you find our work useful for your research, please consider citing our paper:

@misc{zhang2024enhancing,
      title={Enhancing Semantic Fidelity in Text-to-Image Synthesis: Attention Regulation in Diffusion Models}, 
      author={Yang Zhang and Teoh Tze Tzun and Lim Wei Hern and Tiviatis Sim and Kenji Kawaguchi},
      year={2024},
      eprint={2403.06381},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Contributors

joseph31416, nrehiew, yangzhang-v5
