
BannerGen - A Library for Multi-Modality Banner Generation

Chia-Chih Chen*, Ning Yu*, Zeyuan Chen, Shu Zhang, Ran Xu
*Equal contribution
Salesforce Research

Introduction

The Salesforce BannerGen library aims to help graphic designers

  • generate ad banners given a background image and multiple types of foreground texts
  • simplify their workflow
  • scale productivity
  • bring forward creative ideas

These goals are achieved by leveraging advanced generative AI technologies. Specifically, BannerGen is composed of three parallel, proprietary multi-modal banner generation methods, namely LayoutDETR, LayoutInstructPix2Pix, and Framed Template RetrieveAdapter.


Library Design (blog)

Getting Started

Environment Installation

This library has been tested on Ubuntu 20.04 with Python 3.8 and PyTorch 2.1.0. Banner generation was run on a single A100 GPU. Since peak GPU memory usage is about 18GB, any NVIDIA GPU with more memory than that should suffice.
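If you are unsure whether your card clears the 18GB peak mentioned above, a quick pre-flight check can help. The sketch below is not part of BannerGen: the helper names are hypothetical, it treats 18GB as 18 GiB, and it assumes a generation run uses a single device, so memory is compared per GPU rather than summed across GPUs.

```python
"""Hypothetical pre-flight check, not part of BannerGen."""

# Peak usage quoted in this README; interpreted here as GiB (assumption).
PEAK_USAGE_GB = 18
GIB = 1024 ** 3


def gpus_with_enough_memory(total_bytes_per_gpu, peak_gb=PEAK_USAGE_GB):
    """Return indices of GPUs whose total memory exceeds the peak usage.

    Memory is compared per device because, by assumption, a single
    banner-generation run does not split one model across GPUs.
    """
    required = peak_gb * GIB
    return [i for i, total in enumerate(total_bytes_per_gpu) if total > required]


def query_total_memory():
    """List the total memory (bytes) of each visible CUDA device, or [] if none."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_properties(i).total_memory
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    totals = query_total_memory()
    suitable = gpus_with_enough_memory(totals)
    for i, total in enumerate(totals):
        status = "ok" if i in suitable else "likely too small"
        print(f"GPU {i}: {total / GIB:.2f} GiB ({status})")
    if not totals:
        print("No CUDA device detected.")
```

Because the check is per device, four 16GB cards would all be reported as likely too small even though their combined memory exceeds 18GB.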

To install the environment, use the following command lines:

git clone https://github.com/salesforce/BannerGen.git
cd BannerGen
conda env create -f environment.yaml
conda activate bannergen
chmod +x setup.sh
./setup.sh

Model Weights Download

Log in to your Google account to download the BannerGen models here. Then point banner_gen.py to the local directory containing the downloaded models with --model_path, e.g., ./weights/. The purpose of each model file is listed in the BANNER_GEN_MODEL_MAPPER dictionary in banner_gen.py.
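Before running the demo, it can help to confirm that every model file actually landed in the directory you pass as --model_path. A minimal sketch, assuming you copy the expected file names out of BANNER_GEN_MODEL_MAPPER into a list yourself (the mapper's exact structure is not described here); missing_weight_files and the example file names are hypothetical placeholders.

```python
import os


def missing_weight_files(model_path, expected_files):
    """Return the expected files that are not present under model_path.

    expected_files is a hypothetical list of file names, for example
    copied by hand from BANNER_GEN_MODEL_MAPPER in banner_gen.py.
    """
    return [
        name for name in expected_files
        if not os.path.isfile(os.path.join(model_path, name))
    ]


if __name__ == "__main__":
    # Placeholder names: replace them with the real entries from the mapper.
    expected = ["example_model_a.pkl", "example_model_b.ckpt"]
    missing = missing_weight_files("./weights/", expected)
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("All expected model files found.")
```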

Usage

BannerGen generates ad banners from a background image and multiple types of foreground texts. banner_gen.py is a demo file that illustrates how to initialize a headless browser for rendering, and how to import, configure, and call the two essential functions of each of the three banner generation methods: load_model and generate_banners. To test a specific method, set --model_name and point --model_path to the directory where you downloaded the model files. The remaining arguments default to the values and data stored in the repo's ./test/ directory.
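The pattern described above, where each method exposes load_model and generate_banners and the demo picks one by --model_name, can be sketched as a small registry. This is an illustration only: the real signatures of load_model and generate_banners are not documented in this README, so BannerMethod, run_method, and the stub functions below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class BannerMethod:
    """Hypothetical pair of entry points mirroring one generation method."""
    load_model: Callable[[str], Any]
    generate_banners: Callable[..., List[str]]


def run_method(registry: Dict[str, BannerMethod], model_name: str,
               model_path: str, **inputs: Any) -> List[str]:
    """Look up a method by name, load its model, then generate banners."""
    if model_name not in registry:
        raise ValueError(f"Unknown model_name {model_name!r}; "
                         f"choose from {sorted(registry)}")
    method = registry[model_name]
    model = method.load_model(model_path)
    return method.generate_banners(model, **inputs)


# Stubs standing in for a real method; they only echo their inputs.
def _stub_load(model_path: str) -> Dict[str, str]:
    return {"path": model_path}


def _stub_generate(model: Dict[str, str], header_text: str = "",
                   num_result: int = 1) -> List[str]:
    return [f"banner_{i}.html" for i in range(num_result)]


REGISTRY = {"LayoutDETR": BannerMethod(_stub_load, _stub_generate)}

if __name__ == "__main__":
    # prints ['banner_0.html', 'banner_1.html']
    print(run_method(REGISTRY, "LayoutDETR", "./weights/",
                     header_text="The problem with burning", num_result=2))
```

Separating loading from generation means a model can be loaded once and reused for many generate calls, which matters when loading dominates run time.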

  • Test LayoutDETR
    python banner_gen.py --model_name=LayoutDETR --model_path=./weights/
    
  • Test LayoutInstructPix2Pix
    python banner_gen.py --model_name=InstructPix2Pix --model_path=./weights/
    
  • Test RetrieveAdapter
    python banner_gen.py --model_name=RetrieveAdapter --model_path=./weights/
    
  • Check the resulting banner HTML files and PNG images in ./result/. The rendered banner is provided in HTML format to facilitate further manual layout editing. We also screenshot the HTML banner and save it as a PNG image, which is the final output.
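To review the outputs programmatically, you can pair each HTML banner with its PNG screenshot. A minimal sketch, assuming each PNG shares its base name with the corresponding HTML file in ./result/ (the actual naming scheme is not specified in this README); collect_results is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, Optional


def collect_results(result_dir: str = "./result/") -> Dict[str, Dict[str, Optional[Path]]]:
    """Group result files by base name into {'html': path, 'png': path}.

    Assumes an HTML banner and its PNG screenshot share the same stem;
    a missing counterpart is reported as None.
    """
    pairs: Dict[str, Dict[str, Optional[Path]]] = {}
    for path in sorted(Path(result_dir).glob("*")):
        suffix = path.suffix.lower().lstrip(".")
        if suffix in ("html", "png"):
            entry = pairs.setdefault(path.stem, {"html": None, "png": None})
            entry[suffix] = path
    return pairs


if __name__ == "__main__":
    for stem, files in collect_results().items():
        html = files["html"].name if files["html"] else "-"
        png = files["png"].name if files["png"] else "-"
        print(f"{stem}: {html} | {png}")
```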

To test with your own background image and/or different foreground texts, set the image path with --image_path and pass the corresponding texts. Header text is given with --header_text, body text with --body_text, and button text with --button_text. You can set the number of output banners with --num_result and the output directory with --output_path.

  • For example,
    python banner_gen.py --model_name=LayoutDETR --model_path=./weights/ \
    --image_path=test/data/example1/burning.jpg \
    --header_text='The problem with burning' \
    --body_text='Exploring the science behind combustion.' \
    --button_text='LEARN ALL ABOUT IT' \
    --num_result=6 \
    --output_path=./result/
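If you prefer to drive the demo from Python, for example to loop over several background images, you can build the same command line and run it with subprocess. A minimal sketch that uses only the flags documented above; build_banner_command is a hypothetical helper, and the command is only executed when the script is launched from the repository root (where banner_gen.py is assumed to live).

```python
import subprocess
import sys
from pathlib import Path
from typing import List, Optional


def build_banner_command(model_name: str, model_path: str = "./weights/",
                         image_path: Optional[str] = None,
                         header_text: Optional[str] = None,
                         body_text: Optional[str] = None,
                         button_text: Optional[str] = None,
                         num_result: Optional[int] = None,
                         output_path: Optional[str] = None) -> List[str]:
    """Assemble a banner_gen.py invocation; unset options keep their defaults."""
    cmd = [sys.executable, "banner_gen.py",
           f"--model_name={model_name}", f"--model_path={model_path}"]
    optional = {
        "image_path": image_path,
        "header_text": header_text,
        "body_text": body_text,
        "button_text": button_text,
        "num_result": num_result,
        "output_path": output_path,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd.append(f"--{flag}={value}")
    return cmd


if __name__ == "__main__":
    cmd = build_banner_command(
        "LayoutDETR",
        image_path="test/data/example1/burning.jpg",
        header_text="The problem with burning",
        body_text="Exploring the science behind combustion.",
        button_text="LEARN ALL ABOUT IT",
        num_result=6,
        output_path="./result/",
    )
    print(cmd)
    # Passing a list (no shell) means texts with spaces need no extra quoting.
    if Path("banner_gen.py").exists():
        subprocess.run(cmd, check=True)
```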
    

Quick Result

    
Layout generation and text rendering results. Top left: LayoutDETR. Top right: LayoutInstructPix2Pix. Bottom two: Framed Template Retrieve Adapter.

License

This work is released under the Apache License 2.0. For LayoutDETR, refer to its license here. For LayoutInstructPix2Pix, refer to InstructPix2Pix's license here. We do NOT own the licenses to the fonts stored in RetrieveAdapter/templates/css/fonts. To use these fonts in your own work, please acquire the font licenses from their respective owners.

Citation

@article{yu2023layoutdetr,
    title={LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer},
    author={Yu, Ning and Chen, Chia-Chih and Chen, Zeyuan and Meng, Rui and Wu, Gang and Josel, Paul and Niebles, Juan Carlos and Xiong, Caiming and Xu, Ran},
    journal={arXiv preprint arXiv:2212.09877},
    year={2023}
}

Contact Us

If you have any questions, comments or suggestions, please do not hesitate to contact Ning Yu at [email protected] and Ran Xu at [email protected].


bannergen's Issues

Is it possible to distribute the InstructPix2Pix workload across GPUs?

Describe the bug
I am using 4 GPUs with 16GB of memory each, but the InstructPix2Pix model runs out of memory.

To Reproduce
Running the script on a 16GB GPU gives the following error:

InstructPix2Pix model loaded.
Loading background image from "test/data/example1/burning.jpg"...
InstructPix2Pix bbox generation...
Instructions:  Add diverse header texts saying \"The problem with burning\" in 24 characters covering 30% area. Add diverse body texts saying \"Exploring the science behind combustion.\" in 40 characters covering 30% area. Add diverse button texts saying \"LEARN ALL ABOUT IT\" in 18 characters covering 10% area.
  0%|          | 0/25 [00:00<?, ?it/s]
Traceback (most recent call last):

...
...
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.58 GiB. GPU 0 has a total capacty of 14.75 GiB of which 1.18 GiB is free. Including non-PyTorch memory, this process has 13.57 GiB memory in use. Of the allocated memory 13.21 GiB is allocated by PyTorch, and 228.72 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Expected behavior
Normal inference

Note:
Running on AWS. The other models work fine; only this one runs out of memory (OOM).

Meaning of RetrieveAdapter

What is the purpose of passing the background image when using RetrieveAdapter? Looking at the code, it does not seem to be used, and the function below does not seem to make sense either:

def extract_ftemp(model_supres, model_saliency, model_text, model_face, ftemp_meta_retr, img):
    img_framed = np.array(img.copy())
    saliencies = smart_crop_fast.boxes_of_saliencies_onetime(img_framed, model_saliency)
    ftemp_extr = []

    for ftemp_meta in ftemp_meta_retr:
        bg_path = os.path.join(PATH_FTEMP_BG, ftemp_meta['background'])
        bg_ftemp = cv2.imread(bg_path)
        with open(bg_path.replace('.png', '.json'), 'r') as fp:
            ftemp = json.load(fp)
        x1, y1, x2, y2 = ftemp[0]['xyxy']
        w = int(max(0, x2 - x1))
        h = int(max(0, y2 - y1))
        img_framed = img_framed[:, :, :3]
        img_crop = smart_crop_fast.smart_crop(img_framed, saliencies, w, h, True, False, False, model_supres, model_saliency)
        bg_ftemp[int(y1): int(y1+h), int(x1): int(x1+w), :3]
        # smart crop
        ftemp[0]['image'] = bg_ftemp
        ftemp_extr.append(ftemp)
    return ftemp_extr
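As quoted, the line bg_ftemp[int(y1): int(y1+h), int(x1): int(x1+w), :3] is an expression with no assignment, so img_crop is computed but never placed into the template, which would explain why the background image appears unused. One plausible reading, an assumption not confirmed by the authors, is that the crop was meant to be written into the frame region. The sketch below shows only that paste step with NumPy; paste_into_frame is a hypothetical helper that assumes the crop already has the frame's size and the same channel order as the template (note that cv2.imread returns BGR).

```python
import numpy as np


def paste_into_frame(bg_ftemp: np.ndarray, img_crop: np.ndarray,
                     xyxy) -> np.ndarray:
    """Return a copy of bg_ftemp with img_crop written into the frame box.

    Hypothetical helper: assumes img_crop has shape (h, w, >=3) matching
    the box (x1, y1, x2, y2) and the same channel order as bg_ftemp.
    """
    x1, y1, x2, y2 = (int(v) for v in xyxy)
    w, h = max(0, x2 - x1), max(0, y2 - y1)
    if img_crop.shape[:2] != (h, w):
        raise ValueError(f"crop is {img_crop.shape[:2]}, frame needs {(h, w)}")
    out = bg_ftemp.copy()
    # The assignment the quoted line appears to be missing.
    out[y1:y1 + h, x1:x1 + w, :3] = img_crop[:, :, :3]
    return out


if __name__ == "__main__":
    bg = np.zeros((100, 200, 3), dtype=np.uint8)
    crop = np.full((40, 60, 3), 255, dtype=np.uint8)
    result = paste_into_frame(bg, crop, (10, 20, 70, 60))
    print(result[20:60, 10:70].min(), result[:20].max())
```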
