Bi-directional Lattice Recurrent Neural Networks for Confidence Estimation

This repository contains the code used in our paper:

Bi-directional Lattice Recurrent Neural Networks for Confidence Estimation

Qiujia Li*, Preben Ness*, Anton Ragni, Mark Gales (* indicates equal contribution)

ICASSP 2019

Model

In short, this model extends the classical LSTM network, which runs on linear sequences, to BiCNRNN, which runs on confusion networks (a.k.a. sausages), and to BiLatRNN, which runs on lattices.

This allows the intermediate information produced by the speech recogniser to be fully utilised to predict confidence scores not only on one-best sequences but also on all alternative word hypotheses, which is useful in many confidence-related tasks.

The model gives a significant improvement in confidence estimation. Precision-recall curves are shown below.

[Figures: precision-recall curves for one-best paths and for lattices]

For more details, please refer to the paper or the thesis.

Usage

Dependencies

  • python 3.6.3
  • pytorch 0.3.1
  • numpy 1.14.0
  • matplotlib 2.1.2
  • scikit-learn 0.19.1
  • tqdm 4.19.5

Commands

To train the model,

OMP_NUM_THREADS=1 python3 main.py

For detailed options,

python3 main.py --help

Note that setting the environment variable OMP_NUM_THREADS=1, as in the training command above, is essential for CPU parallelisation.

Data Pre-processing

This repository assumes a root directory with pre-processed data organised as follows:

root/
  |
  |-- data/
  |     |
  |     |-- lattices/
  |     |-- target/
  |     |   train.txt
  |     |   train_debug.txt (if in debug mode)
  |     |   cv.txt
  |     |   test.txt
  |     |   stats.npz
  |
  |-- exp/
  |     |-- (saved models)
  |     |-- ...
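
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name missing_entries and the REQUIRED list are assumptions written for illustration from the tree above. train_debug.txt is left out because it is only needed in debug mode, and exp/ is left out because the tree does not say whether it must exist beforehand.

```python
from pathlib import Path

# Entries expected under root/, copied from the directory tree above.
REQUIRED = [
    "data/lattices",
    "data/target",
    "data/train.txt",
    "data/cv.txt",
    "data/test.txt",
    "data/stats.npz",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under `root`.

    Hypothetical helper for illustration; it is not provided by main.py.
    """
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]
```

Calling missing_entries("root") returns an empty list when every expected entry is present.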

In the data/ directory:

  • lattices/ contains the pre-processed data structures for one-best sequences, confusion networks, or lattices. Each one is stored as a zipped numpy archive with the suffix .npz and has the following attributes:

    • topo_order - a list of node indices that follows a topological order;
    • child_2_parent - a dictionary that maps a node index to a dictionary whose keys are the indices of its parent nodes and whose values are the index of the connecting edge (for lattices) or a list of indices of the connecting edges (for confusion networks). This is used for the forward recursion;
    • parent_2_child - a dictionary that maps a node index to a dictionary whose keys are the indices of its child nodes and whose values are the index of the connecting edge (for lattices) or a list of indices of the connecting edges (for confusion networks). This is used for the backward recursion;
    • edge_data - a numpy 2d array (matrix) containing all relevant information from the source file, where the row index is the edge index. For an arc in a lattice, the information could include the word, the start and end times, and the LM and AM scores. For an arc in a confusion network, the arc posterior probability and the start and end times should be available;
    • ignore - a list of indices of edges whose word is one of <s>, </s>, !NULL, <hes>; these edges are skipped when training the network.
  • target/ contains the pre-processed training targets which correspond to the ones in lattices/. They are also stored in .npz format. Each one has the following attributes:

    • target - a list of target confidence scores for each arc in the corresponding lattice, with each element being either 0 (incorrect) or 1 (correct);
    • indices - a list of arc indices on one-best sequences in the structure;
    • ref - a list of target confidence scores on one-best sequences.
  • Each *.txt file stores the absolute paths of the data, where each line corresponds to one sample in the lattices/ directory.

  • stats.npz stores the statistics of the input features, which are used for data normalisation upon loading. It has the following attributes:

    • mean - the mean of input feature vectors across the dataset;
    • std - the standard deviation of input feature vectors across the dataset.
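
To make the lattices/ format described above concrete, here is a minimal sketch that builds a toy four-node lattice, saves it with numpy, loads it back, and walks the incoming edges of each node as the forward recursion would. It is an illustration under stated assumptions, not the repository's own loader: the helper names (load_lattice, incoming_edges, is_topological), the three feature columns in edge_data, and every numeric value are hypothetical. The normalisation formula (x - mean) / std is also an assumption, and in practice mean and std would be read from stats.npz rather than computed from a single sample.

```python
import os
import tempfile

import numpy as np

# Toy lattice, values made up for illustration.
# Edges: 0: 0->1, 1: 1->2, 2: 1->3, 3: 2->3
toy = {
    "topo_order": [0, 1, 2, 3],
    "child_2_parent": {1: {0: 0}, 2: {1: 1}, 3: {1: 2, 2: 3}},
    "parent_2_child": {0: {1: 0}, 1: {2: 1, 3: 2}, 2: {3: 3}},
    # One row per edge; the three columns are hypothetical features.
    "edge_data": np.array([[0.0, 0.1, 1.0],
                           [0.1, 0.5, 0.7],
                           [0.1, 0.9, 0.3],
                           [0.5, 0.9, 0.6]]),
    "ignore": [0],  # e.g. edge 0 carries <s>
}


def load_lattice(path):
    """Load an .npz file; dictionaries come back as 0-d object arrays,
    so .item() is needed to unwrap them."""
    with np.load(path, allow_pickle=True) as f:
        return {
            "topo_order": f["topo_order"].tolist(),
            "child_2_parent": f["child_2_parent"].item(),
            "parent_2_child": f["parent_2_child"].item(),
            "edge_data": f["edge_data"],
            "ignore": f["ignore"].tolist(),
        }


def incoming_edges(lat, node):
    """Return (parent, edge_index) pairs for `node`. Values may be a single
    edge index (lattices) or a list of indices (confusion networks)."""
    pairs = []
    for parent, edges in lat["child_2_parent"].get(node, {}).items():
        for e in np.atleast_1d(edges):
            pairs.append((parent, int(e)))
    return pairs


def is_topological(lat):
    """Check that every parent precedes its child in topo_order."""
    position = {n: i for i, n in enumerate(lat["topo_order"])}
    return all(position[p] < position[n]
               for n in lat["topo_order"]
               for p, _ in incoming_edges(lat, n))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.npz")
    np.savez(path, **toy)
    lattice = load_lattice(path)

# Assumed normalisation; real statistics would come from stats.npz.
mean = lattice["edge_data"].mean(axis=0)
std = lattice["edge_data"].std(axis=0)
normalised = (lattice["edge_data"] - mean) / std
```

Passing allow_pickle=True is needed on recent numpy versions because the dictionaries are stored as pickled objects; the same traversal with parent_2_child and a reversed topo_order would give the backward recursion.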

References

@inproceedings{Li2019BiLatRNN,
  title={Bi-directional Lattice Recurrent Neural Networks for Confidence Estimation},
  author={Li, Qiujia and Ness, Preben M. and Ragni, Anton and Gales, Mark J. F.},
  booktitle={ICASSP},
  year={2019},
  address={Brighton}
}

@inproceedings{Ragni2018Confidence,
  title={Confidence Estimation and Deletion Prediction Using Bidirectional Recurrent Neural Networks},
  author={Ragni, Anton and Li, Qiujia and Gales, Mark J. F. and Wang, Yu},
  booktitle={SLT},
  year={2018},
  address={Athens}
}
