This project is forked from shape-guided-diffusion/shape-guided-diffusion.


Home Page: https://shape-guided-diffusion.github.io


Shape-Guided Diffusion with Inside-Outside Attention

This repository contains the code and data accompanying the paper Shape-Guided Diffusion with Inside-Outside Attention. The code implements Shape-Guided Diffusion, a training-free method that produces shape-faithful, text-aligned, realistic objects by using a novel Inside-Outside Attention mechanism to align the generated content with the target silhouette.

teaser
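
As a rough illustration of the Inside-Outside Attention idea (a simplified sketch, not the paper's actual implementation): given cross-attention weights between pixels and prompt tokens, tokens describing the object are restricted to pixels inside the mask, and the remaining tokens to pixels outside it:

```python
import numpy as np

def inside_outside_attention(attn, pixel_mask, token_is_inside):
    """Constrain cross-attention by a silhouette mask (illustrative sketch).

    attn:            (P, T) attention weights over P pixels and T prompt tokens
    pixel_mask:      (P,) bool, True for pixels inside the object silhouette
    token_is_inside: (T,) bool, True for tokens describing the object
    """
    # A pixel may only attend to tokens on its own side of the mask.
    allowed = pixel_mask[:, None] == token_is_inside[None, :]
    attn = np.where(allowed, attn, 0.0)
    # Renormalize each pixel's attention distribution.
    row_sums = attn.sum(axis=1, keepdims=True)
    return attn / np.clip(row_sums, 1e-8, None)
```

Here `inside_outside_attention`, `pixel_mask`, and `token_is_inside` are hypothetical names for illustration; the actual mechanism operates inside the Stable Diffusion U-Net's attention layers.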

Setup

This code was tested with Python 3.8 and PyTorch 1.12, using a pretrained Stable Diffusion model from Hugging Face Diffusers. To install the necessary packages, please run:

conda env create -f environment.yaml
conda activate shape-guided-diffusion
pip install git+https://github.com/facebookresearch/detectron2.git@d1e04565d3bec8719335b88be9e9b961bf3ec464

Getting Started

To get started, we recommend running the notebook: shape-guided-diffusion. The notebook contains examples of using our method for diverse applications, including mask-based inside editing, outside editing, and both.

Evaluating Results

To compute mIoU, FID, and CLIP scores, set the following environment variables and run ./evaluate.sh:

export SRC={path to folder with synthetic images}
export REF={path to folder with real images}
export META={path to mscoco_shape_prompts/val.json OR mscoco_shape_prompts/test.json}
./evaluate.sh
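
For reference, the IoU between a predicted and a ground-truth binary mask is intersection over union; a minimal sketch (not necessarily the exact computation in evaluate.sh):

```python
import numpy as np

def mask_iou(pred, gt):
    """IoU between two binary masks of the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Two empty masks agree perfectly; define their IoU as 1.
    return inter / union if union > 0 else 1.0
```

mIoU is then the mean of `mask_iou` over all evaluated image pairs.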

MS-COCO ShapePrompts

We also provide our MS-COCO ShapePrompts benchmark in the same json format as MS-COCO 2017 instance segmentations (i.e., object masks). Each json file contains a subset of the MS-COCO source file where the object area is between 2% and 50% of the image area. For overall statistics, please refer to the table below:

  File Name                       MS-COCO Source File       Number of Annotations
  mscoco_shape_prompts/val.json   instances_train2017.json  1,000
  mscoco_shape_prompts/test.json  instances_val2017.json    1,149
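
The 2%-50% area filter can be reproduced with a simple check; the annotation and image dicts below follow the MS-COCO fields listed later in this README:

```python
def in_area_range(ann, image_meta, lo=0.02, hi=0.50):
    """True if the object covers between lo and hi of the image area.

    ann:        MS-COCO annotation dict with an "area" field (in pixels)
    image_meta: MS-COCO image dict with "height" and "width" fields
    """
    frac = ann["area"] / (image_meta["height"] * image_meta["width"])
    return lo <= frac <= hi
```

`in_area_range` is a hypothetical helper for illustration, not a function from this repository.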

The json files are structured as follows:

  • images
  • categories
  • annotations
    • segmentation: Segmentation / object mask in RLE format.
    • area: Area of the object.
    • iscrowd: Binary indicator of whether the instance is a crowd.
    • image_id: ID of the source image that corresponds to an id in images.
    • bbox: Coordinates of the bounding box of the object.
    • category_id: ID of the object class that corresponds to an id in categories, where the string name of the object class corresponds to the source prompt.
    • id: ID of the annotation / object.
    • text: A prompt describing an edit to the object, corresponding to the edit prompt.
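
Putting these fields together, the source prompt comes from the category name and the edit prompt from the added text field; a minimal illustration using made-up values in the json layout described above:

```python
def get_prompts(instances, ann_idx):
    """Return (source_prompt, edit_prompt) for one annotation."""
    ann = instances["annotations"][ann_idx]
    cat_by_id = {c["id"]: c for c in instances["categories"]}
    source_prompt = cat_by_id[ann["category_id"]]["name"]  # object class name
    edit_prompt = ann["text"]                              # edit description
    return source_prompt, edit_prompt

# Made-up example data; real files contain full MS-COCO fields.
instances = {
    "categories": [{"id": 17, "name": "cat"}],
    "annotations": [{"category_id": 17, "text": "a sleeping tiger"}],
}
```

`get_prompts` is a hypothetical helper for illustration, not part of this repository's API.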

The keys contained in the json files are identical to those in the source files, except for the added text field in the annotations list. As such, these files can be used with standard MS-COCO dataloaders. To convert a segmentation in RLE format into a standard mask, you can use the following code:

from pycocotools import mask
from PIL import Image

def get_segm(instances, idx):
    """Decode the segmentation of the idx-th annotation into a PIL image."""
    ann = instances["annotations"][idx]
    segm = ann["segmentation"]

    # Look up the source image to get the mask's height and width.
    image_id_mapping = {image["id"]: image for image in instances["images"]}
    image_meta = image_id_mapping[ann["image_id"]]
    h, w = image_meta["height"], image_meta["width"]

    # Convert polygons or uncompressed RLE to compressed RLE, then decode.
    rles = mask.frPyObjects(segm, h, w)
    if isinstance(rles, dict):
        rles = [rles]
    rle = mask.merge(rles)
    segm = mask.decode(rle)  # (h, w) uint8 array of {0, 1}
    return Image.fromarray(segm * 255)
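
For intuition, COCO's uncompressed RLE stores alternating run lengths of 0s and 1s over the mask flattened in column-major order, starting with a run of 0s; a minimal decoder sketch (pycocotools handles this, plus the compressed variant, for you):

```python
import numpy as np

def decode_uncompressed_rle(counts, h, w):
    """Decode COCO uncompressed RLE counts into an (h, w) binary mask.

    Illustrative sketch only; use pycocotools in practice.
    """
    flat = np.zeros(h * w, dtype=np.uint8)
    idx, val = 0, 0
    for run in counts:
        flat[idx:idx + run] = val
        idx += run
        val = 1 - val  # runs alternate between 0 and 1, starting at 0
    # COCO flattens masks in column-major (Fortran) order.
    return flat.reshape((w, h)).T
```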

Citing

@article{park2022shape,
  author    = {Park, Dong Huk and Luo, Grace and Toste, Clayton and Azadi, Samaneh and Liu, Xihui and Karalashvili, Maka and Rohrbach, Anna and Darrell, Trevor},
  title     = {Shape-Guided Diffusion with Inside-Outside Attention},
  journal   = {arXiv},
  year      = {2022},
}
