EEND_PyTorch

A PyTorch implementation of End-to-End Neural Diarization.

This repo is largely based on the original Chainer implementation, EEND, by Hitachi Ltd., which holds the copyright.

This repo only includes the training and inference parts. For data preparation, please refer to the original authors' repo.

Note

Only the Transformer model with PIT (permutation-invariant training) loss is implemented here, and I can only vouch for the main pipeline. Some side features (such as save_attn_weight, the BLSTM model, and the deep clustering loss) are either implemented incorrectly or not implemented at all.

The original Chainer code actually reserves a PyTorch interface, so I may consider opening a merge request once the code is well polished.
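For readers unfamiliar with PIT loss, here is a minimal sketch of the idea in plain Python. It is not the code from this repo: the function names `bce` and `pit_loss` are hypothetical, and a real training loop would compute the same quantity on batched PyTorch tensors. The model outputs one speech-activity probability per speaker per frame, and since the speaker order in the labels is arbitrary, the loss is taken as the minimum binary cross-entropy over all permutations of the label columns.

```python
import math
from itertools import permutations


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for one probability p and one 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def pit_loss(probs, labels):
    """Permutation-invariant BCE (illustrative sketch, not the repo's code).

    probs, labels: lists of T frames, each a list of S per-speaker values.
    Returns (minimum mean BCE, permutation of label columns that achieves it).
    """
    n_frames = len(probs)
    n_spk = len(probs[0])
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(n_spk)):
        total = 0.0
        for t in range(n_frames):
            for s in range(n_spk):
                # Output s is scored against label column perm[s].
                total += bce(probs[t][s], labels[t][perm[s]])
        loss = total / (n_frames * n_spk)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the labels list the two speakers in the opposite order.
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
labels = [[0, 1], [0, 1], [1, 0]]
loss, perm = pit_loss(probs, labels)
print(loss, perm)
```

Enumerating permutations costs S! evaluations, which is fine for the two-speaker setting but grows quickly with more speakers.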

Run

  1. Prepare your Kaldi-style data and modify run.sh according to your own directories.
  2. Check the configuration file. The default conf/large/train.yaml configuration uses a 4-layer Transformer with 100k warmup steps, which differs from their ASRU 2019 paper. This configuration comes from their paper submitted to TASLP, as the larger model yields better performance.
  3. ./run.sh
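To see what the warmup steps setting in step 2 controls, here is a small sketch of the Noam learning-rate schedule from the Transformer paper. The function name `noam_lr` and the defaults `d_model=256` and `scale=1.0` are assumptions for illustration only; check the actual values in conf/large/train.yaml and in the optimizer code of this repo.

```python
def noam_lr(step, warmup_steps=100_000, d_model=256, scale=1.0):
    """Noam schedule: linear warmup, then inverse-square-root decay.

    d_model and scale are illustrative assumptions, not values read
    from this repo's configuration.
    """
    step = max(step, 1)  # avoid 0 ** -0.5 at the very first step
    return scale * d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)


# Compare the two warmup settings that appear in the results table.
for warmup in (25_000, 100_000):
    peak = noam_lr(warmup, warmup_steps=warmup)
    print(f"warmup={warmup}: peak learning rate {peak:.2e}")
```

The learning rate peaks exactly at `step == warmup_steps`, so a longer warmup both delays the peak and lowers it.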

Pretrained Models

Pretrained models are offered here.

model_simu.th is trained on simulated data (beta=2), and model_callhome.th is adapted on CALLHOME data. Both are 4-layer Transformer models trained with conf/large/train.yaml.

Results

We lack Switchboard Phase 1 in our training data, so our results may be slightly worse.

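Type                       | Transformer Layers | Noam Warmup Steps | DER on simu (%) | DER on callhome (%)
---------------------------|--------------------|-------------------|-----------------|--------------------
Chainer (ASRU2019)         | 2                  | 25k               | 7.36            | 12.50
Chainer (TASLP)            | 4                  | 100k              | 4.56            | 9.54
Chainer (run on our data)  | 2                  | 25k               | 9.78            | 14.85
PyTorch (epoch 50 on simu) | 2                  | 25k               | 10.14           | 15.72
PyTorch                    | 4                  | 100k              | 6.76            | 11.21
PyTorch*                   | 4                  | 100k              | -               | 9.35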

(* run on full training data, credit to my great colleague!)
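For reference, DER (diarization error rate) is the sum of missed speech, false alarm, and speaker confusion divided by the total reference speech. The sketch below is a simplified frame-level version in plain Python; the function name `frame_der` is hypothetical, and it ignores the collar and other details of standard scoring tools, so it will not reproduce the numbers in the table exactly.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Simplified frame-level DER (illustrative sketch only).

    ref, hyp: lists of T frames, each a list of S binary speaker activities.
    The hypothesis speakers are permuted to best match the reference.
    """
    n_spk = len(ref[0])
    total_ref = sum(sum(frame) for frame in ref)
    if total_ref == 0:
        raise ValueError("reference contains no speech")
    best_errors = None
    for perm in permutations(range(n_spk)):
        miss = false_alarm = confusion = 0
        for r, h in zip(ref, hyp):
            h_perm = [h[perm[s]] for s in range(n_spk)]
            n_ref, n_sys = sum(r), sum(h_perm)
            n_correct = sum(1 for a, b in zip(r, h_perm) if a == 1 and b == 1)
            miss += max(0, n_ref - n_sys)          # reference speech not detected
            false_alarm += max(0, n_sys - n_ref)   # detected speech with no reference
            confusion += min(n_ref, n_sys) - n_correct  # speech assigned to wrong speaker
        errors = miss + false_alarm + confusion
        if best_errors is None or errors < best_errors:
            best_errors = errors
    return best_errors / total_ref


ref = [[1, 0], [1, 0], [0, 1], [0, 0]]
hyp = [[1, 0], [0, 0], [0, 1], [0, 1]]
print(f"DER = {100 * frame_der(ref, hyp):.2f}%")
```

In this toy example one frame is missed and one frame is a false alarm out of three reference speech frames.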

Citation

Cite their great papers!

@inproceedings{fujita2019endtoend2,
    title={End-to-End Neural Speaker Diarization with Permutation-Free Objectives},
    author={Fujita, Yusuke and Kanda, Naoyuki and Horiguchi, Shota and Nagamatsu, Kenji and Watanabe, Shinji},
    booktitle={INTERSPEECH},
    year={2019},
    pages={4300--4304},
}
@inproceedings{fujita2019endtoend,
    title={End-to-End Neural Speaker Diarization with Self-Attention},
    author={Fujita, Yusuke and Kanda, Naoyuki and Horiguchi, Shota and Xue, Yawen and Nagamatsu, Kenji and Watanabe, Shinji},
    booktitle={IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
    pages={296--303},
    year={2019},
}
@article{fujita2020endtoend,
    title={End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification},
    author={Fujita, Yusuke and Watanabe, Shinji and Horiguchi, Shota and Xue, Yawen and Nagamatsu, Kenji},
    journal={arXiv:2003.02966},
    year={2020},
}