
robust-diffusion

Source code for the project report "On The Robustness of Diffusion-Based Text-to-Image Generation" (CV-2022-Fall).

Members: Liang Chen, Zhe Yang, Zheng Li

Model Training

cd ModelTraining

Data Preparation

Before running this experiment, please download the images of the MSCOCO 2017 dataset from https://cocodataset.org/#download

Environment Setup

We use the same environment as stable-diffusion (https://github.com/CompVis/stable-diffusion). A suitable conda environment named ldm can be created and activated with:

conda env create -f environment.yaml
conda activate ldm

Training

Specify which GPU (or GPUs) you want to use to train the model with:

accelerate config

Set the appropriate hyper-parameters in tune.sh. Remember to set train_data_dir to the directory of your training set! Then train the model with:

bash tune.sh

If you want to use text augmentation methods such as back-translation or crop-and-swap, add the corresponding argument to tune.sh:

--text_augment="bt" or --text_augment="crop_swap"
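As a rough illustration of what a crop-and-swap style caption augmentation does, here is a minimal sketch; the function name, crop ratio, and exact behaviour are assumptions and may differ from the implementation behind --text_augment="crop_swap":

import random

def crop_and_swap(caption: str, crop_ratio: float = 0.1) -> str:
    """Illustrative crop-and-swap augmentation: drop a small span of tokens,
    then swap two of the remaining ones. The real augmentation may differ."""
    words = caption.split()
    if len(words) < 4:
        return caption
    # Crop: remove a random contiguous span covering ~crop_ratio of the tokens.
    span = max(1, int(len(words) * crop_ratio))
    start = random.randrange(len(words) - span + 1)
    words = words[:start] + words[start + span:]
    # Swap: exchange two randomly chosen remaining tokens.
    i, j = random.sample(range(len(words)), 2)
    words[i], words[j] = words[j], words[i]
    return " ".join(words)

print(crop_and_swap("a photo of a dog playing with a red ball on the grass"))

Back-translation ("bt") instead round-trips the caption through another language; a sketch of that is given in the Interpolation section below.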

If you want to train the model with the text interpolation augmentation method, first run python encode_text.py to generate the interpolated text vectors, and then add --text_embed_dir="./text_embed_linear_p_beta1_n5.bin" to tune.sh

Inference

After training, you can generate images with the trained model, conditioned on the texts in the test set:

bash generate.sh

Before generating images, set the appropriate hyper-parameters in generate.sh: --model_name is the directory of your trained model, and --output_dir is the directory for the generated images.
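For reference, generation with a fine-tuned model saved in diffusers format typically looks like the following sketch; the paths and prompt are placeholders, and the actual script invoked by generate.sh may differ:

import os
import torch
from diffusers import StableDiffusionPipeline

# Placeholder values; reuse whatever you set for --model_name and --output_dir.
model_name = "./sd-finetuned"          # directory of the trained model (assumption)
output_dir = "./generated_images"
os.makedirs(output_dir, exist_ok=True)

pipe = StableDiffusionPipeline.from_pretrained(model_name, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "a person riding a bike down a city street"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save(os.path.join(output_dir, "sample.png"))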

Interpolation

cd Interpolation

Text Data Augmentation Method Implementation

Hidden States Interpolation

Note that this method requires the hidden states of the text after the CLIP encoder.

python HiddenStatesInterpolation.py
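A minimal sketch of the idea, assuming the Hugging Face transformers CLIP text encoder; the mixing distribution (Beta below) and which hidden states are interpolated are assumptions, and HiddenStatesInterpolation.py may differ in its details:

import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Hypothetical model id; the repo may instead use the text encoder bundled with Stable Diffusion.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

def encode(caption: str) -> torch.Tensor:
    """Return the per-token hidden states after the CLIP text encoder."""
    tokens = tokenizer(caption, padding="max_length", max_length=77,
                       truncation=True, return_tensors="pt")
    with torch.no_grad():
        return text_encoder(**tokens).last_hidden_state  # shape (1, 77, hidden_dim)

# Linearly interpolate the hidden states of two captions.
h1 = encode("a dog playing with a ball")
h2 = encode("a puppy chasing a red ball in a park")
lam = torch.distributions.Beta(1.0, 1.0).sample()  # mixing coefficient (assumption)
h_mix = lam * h1 + (1 - lam) * h2                  # interpolated conditioning

The interpolated hidden states can then be used as text conditioning during training, e.g. saved with torch.save and passed to tune.sh via --text_embed_dir as described above.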

Other Augmentation Methods (Random Deletion and Back-Translation)

We provide a Jupyter notebook; please run Interpolation.ipynb
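For orientation, a minimal sketch of these two augmentations; the deletion probability and the MarianMT checkpoints below are illustrative assumptions, not necessarily what the notebook uses:

import random
from transformers import MarianMTModel, MarianTokenizer

def random_delete(caption: str, p: float = 0.1) -> str:
    """Drop each word independently with probability p."""
    words = [w for w in caption.split() if random.random() > p]
    return " ".join(words) if words else caption

# Back-translation: English -> German -> English (model choice is an assumption).
en_de, de_en = "Helsinki-NLP/opus-mt-en-de", "Helsinki-NLP/opus-mt-de-en"
tok_fwd, mt_fwd = MarianTokenizer.from_pretrained(en_de), MarianMTModel.from_pretrained(en_de)
tok_bwd, mt_bwd = MarianTokenizer.from_pretrained(de_en), MarianMTModel.from_pretrained(de_en)

def translate(text: str, tokenizer, model) -> str:
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

def back_translate(caption: str) -> str:
    return translate(translate(caption, tok_fwd, mt_fwd), tok_bwd, mt_bwd)

print(random_delete("a photo of a dog playing with a red ball"))
print(back_translate("a photo of a dog playing with a red ball"))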

Robustness Analysis

cd RobustnessAnalysis
  1. The first similarity, "Image similarity among random seeds", is computed in Similarity_with_seed.ipynb; it measures the similarity between images generated from the same text but with different random seeds.
  2. The second similarity, "Similarity within similar texts", is computed in similarity.ipynb; it measures the intra-group similarity, i.e. the similarity among images generated from a group of similar texts.
  3. The third similarity, "Faithfulness: between image and text", is computed in text_img_similarity.ipynb; it measures the similarity between the generated images and their texts. A sketch of these similarity computations is shown below.
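All three metrics boil down to cosine similarity over CLIP features. A minimal sketch, assuming the Hugging Face transformers CLIP model (the notebooks may use a different CLIP variant or preprocessing), with hypothetical file names:

import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def image_features(paths):
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

def text_features(texts):
    inputs = processor(text=texts, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        feats = model.get_text_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Similarities 1 and 2: cosine similarity between generated images
# (same text with different seeds, or images from a group of similar texts).
img = image_features(["gen_seed0.png", "gen_seed1.png"])   # hypothetical file names
image_image_sim = (img[0] @ img[1]).item()

# Similarity 3 (faithfulness): cosine similarity between an image and its prompt.
txt = text_features(["a photo of a dog playing with a red ball"])
text_image_sim = (img[0] @ txt[0]).item()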
