Awesome Disfluency Detection

A curated list of awesome disfluency detection publications along with their released code (if available) and bibliography. A chronologically ordered list of the published papers is available here.

Contributing

Please feel free to send me pull requests or email me to add a new resource.

Table of Contents

Papers

Studies on disfluency detection are categorized as follows (some papers belong to more than one category):

Noisy Channel Models

A noisy channel model of speech disfluency assumes that there is a fluent source utterance x to which some noise has been added, resulting in a disfluent utterance y. Given y, the goal is to find the fluent source utterance x that maximizes p(x|y).
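
By Bayes' rule, maximizing p(x|y) is equivalent to maximizing the product of a channel model p(y|x) and a fluent language model p(x), because p(y) does not depend on x. The decomposition below is the generic noisy channel formulation rather than the model of any particular paper listed here; the symbol \hat{x} for the selected fluent utterance is notation introduced for this sketch.

```latex
\hat{x} = \arg\max_{x} p(x \mid y)
        = \arg\max_{x} \frac{p(y \mid x)\, p(x)}{p(y)}
        = \arg\max_{x} p(y \mid x)\, p(x)
```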

Sequence Tagging Models

Disfluency detection is framed as a word token classification problem: each word token is labeled either as disfluent/fluent or with a begin-inside-outside (BIO) tagging scheme.
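
The two labeling schemes can be illustrated with a minimal sketch. The function names, the span convention (start inclusive, end exclusive), the label strings and the example utterance are all assumptions made for illustration; they are not taken from any paper or corpus in this list.

```python
from typing import List, Tuple


def spans_to_bio(n_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Convert disfluent spans (start inclusive, end exclusive) to BIO tags.

    "B" marks the first token of a disfluent span, "I" the remaining
    tokens of that span, and "O" every fluent token.
    """
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = "B"
        for i in range(start + 1, end):
            tags[i] = "I"
    return tags


def bio_to_binary(tags: List[str]) -> List[str]:
    """Collapse BIO tags into the simpler disfluent/fluent labeling."""
    return ["fluent" if tag == "O" else "disfluent" for tag in tags]


def remove_disfluencies(tokens: List[str], tags: List[str]) -> List[str]:
    """Keep only the tokens tagged as fluent ("O")."""
    return [tok for tok, tag in zip(tokens, tags) if tag == "O"]


if __name__ == "__main__":
    # Hypothetical utterance; the span (4, 9) covers "to boston uh i mean".
    tokens = "i want a flight to boston uh i mean to denver".split()
    tags = spans_to_bio(len(tokens), [(4, 9)])
    for tok, tag, label in zip(tokens, tags, bio_to_binary(tags)):
        print(f"{tok}\t{tag}\t{label}")
    print(" ".join(remove_disfluencies(tokens, tags)))
```

Dropping every token whose tag is not "O" recovers the fluent version of the utterance, which is how a tagger's output is typically turned into cleaned text.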

Translation Based Models

Translation-based approaches to disfluency detection are commonly formulated as encoder-decoder systems, where the encoder learns a representation of the input sentence containing disfluencies and the decoder learns to generate the underlying fluent version of the input.

Parsing Based Models

Parsing-based approaches detect disfluencies while simultaneously identifying the syntactic or semantic structure of the sentence. Training a parsing-based model requires large annotated treebanks that contain both disfluencies and syntactic/semantic structures.

Using Acoustic/Prosodic Cues

The speech signal carries information beyond the words themselves, which can provide useful cues for disfluency detection models. Some studies have explored combining acoustic/prosodic cues with lexical features to detect disfluencies.

Data Augmentation Techniques

Disfluency detection models are usually trained and evaluated on the Switchboard corpus. Switchboard is the largest disfluency-annotated dataset; however, only about 6% of its words are disfluent. Some studies have proposed new data augmentation techniques to mitigate the scarcity of gold disfluency-labeled data.

Incremental Disfluency Detection

Most disfluency detection models are developed on the assumption that the full sequence context and rich transcriptions, including pre-segmentation information, are available. These assumptions do not hold in real-time scenarios, where the input to the disfluency detector is a live transcript generated by a streaming ASR model. In such cases, the disfluency detector is expected to label the transcript incrementally, token by token, as it arrives. Some studies have proposed new incremental disfluency detectors.
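
The incremental setting can be sketched with a toy tagger that receives one token at a time and returns labels for the prefix seen so far. The class name, the label strings and the exact-repetition rule are assumptions made purely for illustration; published incremental detectors use learned models, not this rule. The sketch only shows that an earlier label may have to be revised once the repair word arrives.

```python
from typing import List


class IncrementalRepetitionTagger:
    """Toy incremental tagger (hypothetical, for illustration only).

    It flags a token as disfluent when the very next token repeats it.
    """

    def __init__(self) -> None:
        self.tokens: List[str] = []
        self.labels: List[str] = []

    def push(self, token: str) -> List[str]:
        """Consume one token and return the labels for the whole prefix."""
        # A repetition can only be recognized once the repeated word
        # arrives, so the label of the previous token is revised here.
        if self.tokens and self.tokens[-1] == token:
            self.labels[-1] = "disfluent"
        self.tokens.append(token)
        self.labels.append("fluent")
        return list(self.labels)


if __name__ == "__main__":
    tagger = IncrementalRepetitionTagger()
    # Simulates a stream of tokens arriving from a streaming recognizer.
    for word in "i i want to to go".split():
        print(word, tagger.push(word))
```

The labels returned after each call cover only the tokens received so far, with no look-ahead and no sentence segmentation, which is the constraint that distinguishes the incremental setting from full-sequence tagging.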

E2E Speech Recognition and Disfluency Removal

Most disfluency detectors are applied as an intermediate step between a speech recognition system and a downstream task. Unlike these conventional pipeline models, some studies have explored end-to-end speech recognition and disfluency removal.

E2E Speech Translation and Disfluency Removal

While most end-to-end speech translation studies have focused on translating read speech, a few examine end-to-end conversational speech translation, where the task is to translate disfluent source speech directly into fluent target text.

Others

Theses

Contact

Paria Jamshid Lou [email protected]
