TIGER: A Unified Generative Model Framework for Multimodal Dialogue Response Generation

📢 Latest Updates

  • 2024 May-18 : Our paper is available here.
  • 2024 Feb-20 : This work has been accepted to LREC-COLING 2024.
  • 2023 Oct-13 : Updated the demo interface.
  • 2023 Aug-02 : Released the demo video. [YouTube]

TIGER Framework 💡

[Figure: overview of the TIGER framework]

Figure 1: Overview of TIGER. Given the dialogue context, the response modal predictor $\mathcal{M}$ decides whether the next response should be text or an image. If the predicted response modal is text, the textual dialogue response generator $\mathcal{G}$ generates the text response. Otherwise, $\mathcal{G}$ produces an image description, and the Text-to-Image translator $\mathcal{F}$ uses this description to generate an image as the visual response.
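The routing described in the caption can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the `Response` class, the `respond` function, and the three toy stand-ins for $\mathcal{M}$, $\mathcal{G}$, and $\mathcal{F}$ are all hypothetical names, and passing the predicted modality to the generator as an argument is an assumption about how $\mathcal{G}$ is told what to produce.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Response:
    modality: str                   # "text" or "image"
    text: str                       # the text reply, or the image description
    image: Optional[object] = None  # whatever the translator returns


def respond(
    context: List[str],
    predict_modality: Callable[[List[str]], str],    # stands in for M
    generate_text: Callable[[List[str], str], str],  # stands in for G
    text_to_image: Callable[[str], object],          # stands in for F
) -> Response:
    """Route one dialogue turn through the three components."""
    modality = predict_modality(context)
    # G runs in both branches: it writes either the reply itself
    # or a description of the image to be generated.
    generated = generate_text(context, modality)
    if modality == "text":
        return Response("text", generated)
    return Response("image", generated, text_to_image(generated))


# Toy stand-ins (hypothetical) so the sketch runs without model weights.
def toy_predictor(context: List[str]) -> str:
    return "image" if "show me" in context[-1].lower() else "text"


def toy_generator(context: List[str], modality: str) -> str:
    if modality == "image":
        return "a cat sleeping on a sofa"
    return "That sounds fun!"


def toy_translator(description: str) -> str:
    return f"<image of {description}>"
```

Calling `respond(["Show me your cat"], toy_predictor, toy_generator, toy_translator)` takes the image branch, while a context without a request for a picture takes the text branch.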

Contributions 🏆

  • We propose mulTImodal GEnerator for dialogue Response (TIGER), a unified generative model framework designed for multimodal dialogue response generation. Notably, this framework is capable of handling conversations involving any combination of modalities.
  • We implement a system for multimodal dialogue response generation, incorporating both text and images, based on TIGER.
  • Extensive experiments show that TIGER achieves new state-of-the-art results on both automatic and human evaluations, which validate the effectiveness of our system in providing a superior multimodal conversational experience.

Demo 🌐

[Figure: demo interface]

☝ We implemented a multimodal dialogue system based on TIGER, as depicted in the figure above.

Our system offers various modifiable components:

  • For the textual dialogue response generator, users can choose decoding strategies and adjust related parameters.
  • For the Text-to-Image translator, users can freely modify the prompt templates and negative prompts to suit different requirements. Default prompt templates and negative prompts are provided to enhance the realism of generated images.
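As a rough illustration of how a prompt template and a negative prompt combine with the image description, here is a minimal sketch. The template text, the negative prompt, and the `build_t2i_inputs` helper are hypothetical and are not the defaults shipped with the demo.

```python
from typing import Dict

# Hypothetical defaults for illustration only; the demo provides its own.
DEFAULT_TEMPLATE = "{description}, photo, realistic, high quality"
DEFAULT_NEGATIVE = "cartoon, painting, blurry, low quality"


def build_t2i_inputs(
    description: str,
    template: str = DEFAULT_TEMPLATE,
    negative_prompt: str = DEFAULT_NEGATIVE,
) -> Dict[str, str]:
    """Fill the template with the image description from the generator."""
    if "{description}" not in template:
        raise ValueError("template must contain a {description} placeholder")
    # str.replace is used instead of str.format so that stray braces
    # elsewhere in a user-edited template do not raise an error.
    prompt = template.replace("{description}", description.strip())
    return {"prompt": prompt, "negative_prompt": negative_prompt}
```

The returned keys are chosen to match the `prompt` and `negative_prompt` keyword arguments of a Stable Diffusion pipeline in the `diffusers` library, so the dictionary could be unpacked into such a call.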

โ—Note: It's worth mentioning that our research focuses on open-domain multimodal dialogue response generation. However, the system may not possess perfect instruction-following capabilities. Users can treat it as a companion or listener, but using it as a QA system or AI painting generator is not recommended.

Examples 🪧

[Figures: four example conversations (conv1, conv2, conv3, conv4)]

Supplementary Instructions 🔍

Because of the page limit, the paper gives only a concise introduction to our method. More implementation details, experimental results, and discussion can be found in the supplement.

Getting Started ⏳

Hardware ⚙️

⭐ A GPU with 24 GB of memory is enough for the demo (18 GB is used at runtime).
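To check a machine against this requirement before installing, one option is to ask `nvidia-smi` for the total memory of each GPU. The sketch below assumes `nvidia-smi` is on the PATH; the helper names are hypothetical, and treating the 18 GB runtime figure as 18 × 1024 MiB is an assumption about the units.

```python
import subprocess
from typing import List

REQUIRED_MIB = 18 * 1024  # runtime footprint quoted above, read as MiB (assumption)


def parse_total_memory(nvidia_smi_output: str) -> List[int]:
    """Parse one integer (MiB) per non-empty line, one line per GPU."""
    return [int(line.strip()) for line in nvidia_smi_output.splitlines() if line.strip()]


def gpus_with_enough_memory(totals_mib: List[int], required_mib: int = REQUIRED_MIB) -> List[int]:
    """Return the indices of GPUs whose total memory meets the requirement."""
    return [i for i, total in enumerate(totals_mib) if total >= required_mib]


def query_total_memory() -> List[int]:
    """Run nvidia-smi and return the total memory of each GPU in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_total_memory(result.stdout)
```

`gpus_with_enough_memory(query_total_memory())` then lists the usable GPU indices. The check looks at total memory only, so a GPU already busy with other jobs may still run out.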

Installation 🔧

1. Prepare the code and the environment

cd TIGER/
conda env create -f environment.yml
conda activate tiger

2. Prepare the model weights

✨ Please download our model weights from here (Google Drive). The Text-to-Image Translator's weights are already hosted on Hugging Face, so you do not need to download them locally. For more details, see friedrichor/stable-diffusion-2-1-realistic.

After downloading, the weights should sit in a single folder with a structure similar to the following:

TIGER
├── demo
│   └── ...
├── model_weights
│   ├── tiger_response_modal_predictor.pth
│   ├── tiger_textual_dialogue_response_generator.pth
│   └── tiger_text2image_translator
│       ├── feature_extractor
│       │   └── preprocessor_config.json
│       ├── scheduler
│       │   └── scheduler_config.json
│       ├── text_encoder
│       │   ├── config.json
│       │   └── pytorch_model.bin
│       ├── tokenizer
│       │   ├── merges.txt
│       │   ├── special_tokens_map.json
│       │   ├── tokenizer_config.json
│       │   └── vocab.json
│       ├── unet
│       │   ├── config.json
│       │   └── diffusion_pytorch_model.bin
│       ├── vae
│       │   ├── config.json
│       │   └── diffusion_pytorch_model.bin
│       └── model_index.json
├── tiger
│   └── ...
├── utils
│   └── ...
├── demo.py
...
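Before launching the demo, it can help to confirm that the weights landed in the expected places. The following is a minimal sketch, with file paths taken from the tree above. The `missing_files` helper is hypothetical, and checking the Text-to-Image translator files only when that folder exists locally is an assumption based on the note that those weights can be fetched from Hugging Face.

```python
from pathlib import Path
from typing import List

# The two checkpoints that must be downloaded manually.
REQUIRED = [
    "model_weights/tiger_response_modal_predictor.pth",
    "model_weights/tiger_textual_dialogue_response_generator.pth",
]

# Checked only if a local copy of the translator folder is present.
T2I_DIR = "model_weights/tiger_text2image_translator"
T2I_FILES = [
    "model_index.json",
    "scheduler/scheduler_config.json",
    "text_encoder/config.json",
    "tokenizer/vocab.json",
    "unet/config.json",
    "vae/config.json",
]


def missing_files(repo_root: str) -> List[str]:
    """Return the expected paths (relative to repo_root) that are absent."""
    root = Path(repo_root)
    expected = list(REQUIRED)
    if (root / T2I_DIR).is_dir():
        expected += [f"{T2I_DIR}/{name}" for name in T2I_FILES]
    return [rel for rel in expected if not (root / rel).is_file()]
```

Running `missing_files(".")` from inside `TIGER/` returns an empty list when everything is in place, and otherwise lists the paths that still need to be downloaded.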

Launching Demo Locally 💻

python demo.py --config demo/demo_config.yaml

Citation

If you find our work useful in your research, please consider citing us:

@inproceedings{kong-etal-2024-tiger-unified,
    title = "{TIGER}: A Unified Generative Model Framework for Multimodal Dialogue Response Generation",
    author = "Kong, Fanheng  and
      Wang, Peidong  and
      Feng, Shi  and
      Wang, Daling  and
      Zhang, Yifei",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italy",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1403",
    pages = "16135--16141",
}
