

Cross-Lingual Domain Classification of Task-Oriented Dialog (EN-FR)

Accurately identifying the domain of a user command is a crucial first step in digital voice assistant systems. Domain identification can be framed as a multi-class classification problem in which the inputs are transcribed utterances. Supervised learning is limited by the availability of annotated datasets, an issue that is exacerbated for non-English languages.

In this project, I tackle the problem of domain classification for French, a lower-resource language than English. With access to a parallel annotated dataset in French, I set out to measure the performance gap between fully supervised training in the target language (French) and cross-lingual zero-shot transfer from the source language (English), using massively pre-trained Transformer-based masked language models (PLMs).
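The two settings compared above differ only in which language supplies the training data; both are evaluated on French. A minimal sketch of that setup follows. The names `Setting`, `SETTINGS`, and `select_splits`, as well as the nested `data[language][split]` layout, are hypothetical and chosen for illustration; they are not taken from the project code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Setting:
    """One experimental setting: which language to train on and which to evaluate on."""
    name: str
    train_lang: str
    eval_lang: str


# Hypothetical names for the two settings described in the text.
SETTINGS = [
    Setting(name="supervised", train_lang="fr", eval_lang="fr"),  # fine-tune on French
    Setting(name="zero_shot", train_lang="en", eval_lang="fr"),   # fine-tune on English only
]


def select_splits(setting, data):
    """Return (training examples, evaluation examples) for a setting.

    `data` is assumed to be a mapping of language -> split name -> examples.
    """
    return data[setting.train_lang]["train"], data[setting.eval_lang]["test"]


# Toy data standing in for the real parallel corpus.
toy_data = {
    "en": {"train": ["set an alarm"], "test": ["play some music"]},
    "fr": {"train": ["mets une alarme"], "test": ["joue de la musique"]},
}

for setting in SETTINGS:
    train, test = select_splits(setting, toy_data)
    print(setting.name, len(train), len(test))
```

In the zero-shot setting the model never sees French training examples, so any French accuracy comes from the cross-lingual representations of the pre-trained model.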

The results reinforce that, given recent advances in PLMs, domain identification is a relatively easy task: fully supervised fine-tuning in the target language achieves near-perfect results (~0.98 F1), and zero-shot transfer from the source language does not lag far behind (~0.95 F1).
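For readers unfamiliar with the metric, the sketch below shows one way an F1 score over domain labels can be computed. It assumes macro averaging (the unweighted mean of per-class F1); the text above does not say which averaging was used, so treat that choice, the function names, and the toy labels as assumptions.

```python
def f1_per_class(y_true, y_pred):
    """Compute F1 for every label that appears in the gold or predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy example with made-up domain labels.
gold = ["alarm", "alarm", "music", "weather"]
pred = ["alarm", "music", "music", "weather"]
print(round(macro_f1(gold, pred), 4))
```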

See the Technical Report for more details on methodology and results.

Dataset

MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark (Li et al., EACL 2021) is a multilingual, multi-domain task-oriented dialog dataset consisting of synthetic utterances in 6 languages across 11 domains with fine-grained intent labels. Unlike the authors' work, this project focuses on the coarser-grained task of domain prediction. The dataset with domain labels is available on HuggingFace as mteb/mtop_domain.
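As a rough sketch of how the domain-labelled data might be loaded and inspected, the snippet below wraps the HuggingFace `datasets` loader and counts examples per domain. The configuration names (`"en"`, `"fr"`) and the `label_text` field are assumptions about how mteb/mtop_domain is laid out and should be checked against the dataset card; the helper names are hypothetical.

```python
from collections import Counter


def load_mtop_domain(language):
    """Load one language configuration of mteb/mtop_domain.

    Requires the `datasets` package and network access. The configuration
    name (e.g. "fr") is an assumption about the dataset layout.
    """
    from datasets import load_dataset  # imported lazily so the helper below works without it

    return load_dataset("mteb/mtop_domain", language)


def domain_distribution(rows, label_field="label_text"):
    """Count examples per domain; `label_field` is an assumed column name."""
    return Counter(row[label_field] for row in rows)


# Toy rows standing in for real dataset records.
toy_rows = [
    {"text": "mets une alarme à 7 heures", "label_text": "alarm"},
    {"text": "annule mon alarme", "label_text": "alarm"},
    {"text": "joue du jazz", "label_text": "music"},
]
print(domain_distribution(toy_rows))
```

With network access, something like `domain_distribution(load_mtop_domain("fr")["train"])` would give the per-domain counts for the French training split, assuming the split and field names match.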

Contributors

anusha-pathuri, anupath-ds
