This repository contains the code and data accompanying the paper *Shape-Guided Diffusion with Inside-Outside Attention*. The code implements Shape-Guided Diffusion, a training-free method that produces shape-faithful, text-aligned, realistic objects by using a novel Inside-Outside Attention mechanism to align the generated content with the target silhouette.
This code was tested with Python 3.8 and PyTorch 1.12, using a pretrained Stable Diffusion model from Hugging Face / Diffusers. To install the necessary packages, please run:
```
conda env create -f environment.yaml
conda activate shape-guided-diffusion
pip install git+https://github.com/facebookresearch/detectron2.git@d1e04565d3bec8719335b88be9e9b961bf3ec464
```
To get started, we recommend running the notebook `shape-guided-diffusion`. The notebook contains examples of using our method for diverse applications, including mask-based inside editing, outside editing, or both.
To compute mIoU, FID, and CLIP scores, run `./evaluate.sh`:

```
export SRC={path to folder with synthetic images}
export REF={path to folder with real images}
export META={path to mscoco_shape_prompts/val.json OR mscoco_shape_prompts/test.json}
./evaluate.sh
```
We also provide our MS-COCO ShapePrompts benchmark in the same json format as the MS-COCO 2017 instance segmentations (i.e., object masks). Each json file contains the subset of its MS-COCO source file where the object area is between [2%, 50%] of the image area. For overall statistics, please refer to the table below:
| File Name | MS-COCO Source File | Number of Annotations |
|---|---|---|
| mscoco_shape_prompts/val.json | instances_train2017.json | 1,000 |
| mscoco_shape_prompts/test.json | instances_val2017.json | 1,149 |
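The [2%, 50%] area filter described above can be sketched as follows. Note this is an illustrative reconstruction, not code from the repository; the `in_area_range` helper and the toy dicts are hypothetical.

```python
def in_area_range(ann, image_meta, lo=0.02, hi=0.50):
    """Return True if the object covers between lo and hi of the image area.

    ann        : an annotation dict with an "area" field (in pixels)
    image_meta : an image dict with "height" and "width" fields
    """
    image_area = image_meta["height"] * image_meta["width"]
    ratio = ann["area"] / image_area
    return lo <= ratio <= hi

# Toy example: a 640x480 image has 307,200 pixels.
image_meta = {"height": 480, "width": 640}
print(in_area_range({"area": 30_000}, image_meta))  # ~9.8% of the image: kept
print(in_area_range({"area": 1_000}, image_meta))   # ~0.3%: filtered out
```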
The json files are structured as follows:
- `images`
- `categories`
- `annotations`
  - `segmentation`: Segmentation / object mask in RLE format.
  - `area`: Area of the object.
  - `iscrowd`: Binary indicator of whether the instance is a crowd.
  - `image_id`: ID of the source image, corresponding to an `id` in `images`.
  - `bbox`: Coordinates of the bounding box of the object.
  - `category_id`: ID of the object class, corresponding to an `id` in `categories`, where the string name of the object class corresponds to the source prompt.
  - `id`: ID of the annotation / object.
  - `text`: A prompt describing an edit to the object, corresponding to the edit prompt.
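As a concrete illustration of how these fields fit together, the snippet below builds a toy `instances` dict in the same shape and looks up the source and edit prompts for one annotation. All values here are made up for demonstration; in practice `instances` would be loaded from one of the benchmark json files.

```python
# Illustrative toy data mirroring the benchmark structure; in practice:
#   instances = json.load(open("mscoco_shape_prompts/val.json"))
instances = {
    "images": [{"id": 42, "height": 480, "width": 640}],
    "categories": [{"id": 3, "name": "car"}],
    "annotations": [{
        "id": 7, "image_id": 42, "category_id": 3,
        "area": 30_000, "iscrowd": 0,
        "bbox": [10, 20, 200, 150],
        "segmentation": {"size": [480, 640], "counts": "..."},  # RLE (truncated)
        "text": "a red sports car",
    }],
}

ann = instances["annotations"][0]
# The category name is the source prompt; the "text" field is the edit prompt.
categories = {c["id"]: c["name"] for c in instances["categories"]}
source_prompt = categories[ann["category_id"]]
edit_prompt = ann["text"]
print(source_prompt, "->", edit_prompt)
```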
The keys contained in the json files are identical to those in the source files, except for the added `text` field in each entry of the `annotations` list. As such, these files can be used with standard MS-COCO dataloaders. To convert a `segmentation` in RLE format into a standard mask, you can use the following code:
```python
from pycocotools import mask
from PIL import Image

def get_segm(instances, idx):
    # Look up the annotation and the metadata of its source image.
    ann = instances["annotations"][idx]
    segm = ann["segmentation"]
    image_id_mapping = {image["id"]: image for image in instances["images"]}
    image_meta = image_id_mapping[ann["image_id"]]
    h, w = image_meta["height"], image_meta["width"]
    # Convert the segmentation into compressed RLE, then decode it.
    rles = mask.frPyObjects(segm, h, w)
    if type(rles) is dict:
        rles = [rles]
    rle = mask.merge(rles)
    segm = mask.decode(rle)
    # Scale the binary mask to 0/255 and return it as a PIL image.
    segm = segm * 255
    segm = Image.fromarray(segm)
    return segm
```
```
@article{park2022shape,
  author  = {Park, Dong Huk and Luo, Grace and Toste, Clayton and Azadi, Samaneh and Liu, Xihui and Karalashvili, Maka and Rohrbach, Anna and Darrell, Trevor},
  title   = {Shape-Guided Diffusion with Inside-Outside Attention},
  journal = {arXiv},
  year    = {2022},
}
```