This project is forked from vinairesearch/misca.

MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention (EMNLP 2023 - Findings)

License: GNU Affero General Public License v3.0


MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention

We propose a joint model named MISCA for multi-intent detection and slot filling. Our MISCA introduces an intent-slot co-attention mechanism and an underlying layer of label attention mechanism. These mechanisms enable MISCA to effectively capture correlations between intents and slot labels, eliminating the need for graph construction. They also facilitate the transfer of correlation information in both directions: from intents to slots and from slots to intents, through multiple levels of label-specific representations, without relying on token-level intent information. Experimental results show that MISCA outperforms previous models, achieving new state-of-the-art overall accuracies of 59.1% on MixATIS and 86.2% on MixSNIPS.
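The co-attention idea described above can be illustrated with a toy sketch. This is not the paper's exact formulation, just a minimal bidirectional attention between hypothetical intent-label and slot-label representations, written in plain Python:

```python
import math

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def co_attention(intent_reprs, slot_reprs):
    """Toy bidirectional attention: scores[i][j] = <intent_i, slot_j>.
    Intent representations attend over slot labels and vice versa, so
    correlation information flows in both directions."""
    scores = matmul(intent_reprs, transpose(slot_reprs))
    intent_ctx = matmul([softmax(r) for r in scores], slot_reprs)
    slot_ctx = matmul([softmax(r) for r in transpose(scores)], intent_reprs)
    return intent_ctx, slot_ctx
```

Each intent representation is enriched with a slot-aware context vector and each slot representation with an intent-aware one; MISCA applies this kind of exchange across multiple levels of label-specific representations.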

(Figure: overview of the MISCA model architecture)

Please CITE our paper whenever our MISCA implementation is used to help produce published results or incorporated into other software.

@inproceedings{MISCA,
    title     = {{MISCA: A Joint Model for Multiple Intent Detection and Slot Filling with Intent-Slot Co-Attention}},
    author    = {Thinh Pham and Chi Tran and Dat Quoc Nguyen},
    booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2023},
    year      = {2023},
    pages     = {12641--12650}
}

Model installation, training and evaluation

Installation

  • Python version >= 3.8
  • PyTorch version >= 1.8.0

    git clone https://github.com/VinAIResearch/MISCA.git
    cd MISCA/
    pip3 install -r requirements.txt

Training and evaluation

In our experiments, we first train a base model without co-attention and then initialize MISCA from it. To train the base model, run the following command:

python main.py --token_level word-level \
            --model_type roberta \
            --model_dir dir_base \
            --task <mixatis or mixsnips> \
            --data_dir data \
            --attention_mode label \
            --do_train \
            --do_eval \
            --num_train_epochs 100 \
            --intent_loss_coef <lambda> \
            --learning_rate 1e-5 \
            --train_batch_size 32 \
            --num_intent_detection \
            --use_crf
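The `--intent_loss_coef <lambda>` flag weights the intent-detection loss against the slot-filling loss in the joint objective. A minimal sketch of the common convention for such a coefficient (the exact formula used in main.py may differ):

```python
def joint_loss(intent_loss: float, slot_loss: float, coef: float) -> float:
    """Weighted sum of the two task losses; coef plays the role of
    --intent_loss_coef. This is an assumed convention for illustration,
    not the verified implementation in this repository."""
    return coef * intent_loss + (1.0 - coef) * slot_loss

print(round(joint_loss(0.8, 0.4, 0.5), 6))  # 0.6 -> equal weighting of both tasks
```

Larger lambda values push training to favor multi-intent detection; smaller values favor slot filling.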

Then, once we have the pre-trained base model, we can train MISCA with the following command:

python main.py --token_level word-level \
            --model_type roberta \
            --model_dir misca \
            --task <mixatis or mixsnips> \
            --data_dir data \
            --attention_mode label \
            --do_train \
            --do_eval \
            --num_train_epochs 100 \
            --intent_loss_coef <lambda> \
            --learning_rate 1e-5 \
            --num_intent_detection \
            --use_crf \
            --base_model dir_base \
            --intent_slot_attn_type coattention
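The `--base_model dir_base` flag initializes MISCA from the pre-trained base checkpoint. Since the co-attention parameters do not exist in the base model, such initialization is typically non-strict, in the spirit of PyTorch's `load_state_dict(strict=False)`. A hypothetical dictionary-level sketch of that idea (not the actual loading code in this repository):

```python
def load_partial(target: dict, source: dict):
    """Copy values for matching keys from source into target, reporting
    keys the source lacks (e.g. newly added co-attention weights) and
    keys it has that the target does not. Mirrors the behavior of
    non-strict state_dict loading at the level of plain dicts."""
    missing = sorted(k for k in target if k not in source)
    unexpected = sorted(k for k in source if k not in target)
    for key, value in source.items():
        if key in target:
            target[key] = value
    return missing, unexpected
```

For example, loading base weights `{"encoder.w": 1}` into a target that also has a `"coattn.w"` key copies the encoder weight and reports `"coattn.w"` as missing, which is expected when the co-attention layer is new.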

We also provide MISCA model checkpoints for MixATIS and MixSNIPS. Please download a checkpoint if you want to run inference without training from scratch.

Prediction

We provide a script to predict intents and slots from utterances. To run it, please prepare a raw text file with one utterance per line, and run the following command:

python predict.py --token_level word-level \
            --model_type roberta \
            --model_dir misca \
            --task <mixatis or mixsnips> \
            --data_dir data \
            --attention_mode label \
            --num_intent_detection \
            --use_crf \
            --intent_slot_attn_type coattention \
            --input_file input_file.txt \
            --output_file output_file.txt
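The `--input_file` is plain text with one utterance per line. A small helper to create it (the utterances below are made-up examples; the file name matches the command above):

```python
# Hypothetical utterances; replace with your own, one per line.
utterances = [
    "list flights from boston to denver and the cheapest fare",
    "play some jazz and add this song to my workout playlist",
]

# predict.py reads this file line by line via --input_file.
with open("input_file.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(utterances) + "\n")
```

The predicted intents and slots are then written to the path given by `--output_file`.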

If you have any questions, please open an issue on the project or email us ([email protected] or [email protected]) and we will reply soon.

Acknowledgement

Our code is based on the implementation of the JointIDSF paper from https://github.com/VinAIResearch/JointIDSF
