
csikasote / slt.kit


This project forked from isl-mt/slt.kit


Spoken Language Translation System

License: MIT License

Python 5.14% Shell 20.03% Perl 16.15% Makefile 0.84% C++ 34.83% M4 22.75% Emacs Lisp 0.02% JavaScript 0.23%


SLT.KIT

This repository contains a toolkit for speech translation. It provides a Docker container with a ready-to-use pipeline containing the following components:

  • a neural speech recognition system
  • a sentence segmentation system
  • an attention-based translation system

The speech recognition system processes the audio files and creates the transcription in the source language. Afterwards, the sentence segmentation system adds punctuation and recases the output. Finally, the output is translated by the machine translation system. We provide pipelines to train these models as well as pre-trained models for all components for the task of translating English lectures to German.
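
The cascade described above can be pictured as three functions applied in sequence. The sketch below is only an illustration of that data flow: `transcribe`, `segment_and_recase` and `translate` are hypothetical stand-ins with toy bodies, not the toolkit's actual interfaces.

```python
from typing import List


# Hypothetical stand-ins for the three components; real models would replace these bodies.
def transcribe(audio_path: str) -> str:
    """ASR stage: audio in, lowercase unpunctuated source-language text out (toy output)."""
    return "hello everyone welcome to the lecture"


def segment_and_recase(text: str) -> List[str]:
    """Segmentation stage: add punctuation and restore casing (toy: a single sentence)."""
    return [text[0].upper() + text[1:] + "."]


def translate(sentence: str) -> str:
    """MT stage: translate one sentence (toy: tags the input instead of translating it)."""
    return f"<de> {sentence}"


def run_cascade(audio_path: str) -> List[str]:
    """Feed the output of each stage into the next, as in a cascaded system."""
    transcript = transcribe(audio_path)
    sentences = segment_and_recase(transcript)
    return [translate(s) for s in sentences]


print(run_cascade("lecture.wav"))
```

Because each stage only sees the text produced by the previous one, recognition errors propagate into segmentation and translation, which is why the ASR quality matters for the final scores.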

The system uses the following software:

Requirements:

Updates

Installation

    git clone https://github.com/isl-mt/SLT.KIT.git
    cd SLT.KIT
    # CUDAVERSION must be set to 8.0, 9.0 or 9.1
    docker build --build-arg CUDA=$CUDAVERSION -t slt.kit -f Dockerfile.ST-Baseline .

Run

  • Start the Docker container and set the source and target languages (e.g. source language English (en) and target language German (de)):
    docker run -ti --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=$gpuid slt.kit
    export sl=en
    export tl=de

File Structure

  • The general file structure used by all models and systems is described in File structure

System

  • This repository contains different systems that can be used to do speech translation
    • Cascaded systems: Systems that combine an ASR, a sentence segmentation/punctuation and an MT component

      • ctc-tedlium2.smallTED: Combination of the ctc-tedlium2 ASR system and the smallTED system for sentence segmentation and MT
      • ctc-tedlium2.midSize: Combination of the ctc-tedlium2 ASR system and the midSize system for sentence segmentation and MT
    • ASR systems: Systems to transcribe the audio

      • ctc-tedlium2: Simple LSTM network trained with the CTC loss that outputs BPE units
      • las-tedlium2: Attention-based ASR system
    • Sentence segmentation/MT

      • ted: System trained on the TED corpus
      • midSize: System trained on TED and EPPS corpus
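
To illustrate what "trained with the CTC loss that outputs BPE units" means at decoding time, here is a minimal sketch of the standard CTC collapsing rule (merge repeated frame labels, then drop blanks) followed by joining subword units. The blank symbol name and the `@@` continuation marker are assumptions chosen for the example; the toolkit's own symbols may differ.

```python
from itertools import groupby
from typing import List

BLANK = "<blank>"  # assumed name for the CTC blank symbol


def ctc_greedy_collapse(frame_labels: List[str]) -> List[str]:
    """Merge consecutive repeats, then drop blanks (the standard CTC collapsing rule)."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]


def join_bpe(units: List[str], marker: str = "@@") -> str:
    """Join subword units; a trailing marker means the word continues (assumed convention)."""
    return " ".join(units).replace(marker + " ", "")


# One label per audio frame, as a frame-wise argmax over the network output might give.
frames = ["<blank>", "trans@@", "trans@@", "<blank>", "la@@", "la@@",
          "tion", "<blank>", "system", "system"]
units = ctc_greedy_collapse(frames)
print(units)
print(join_bpe(units))
```

The blank between two identical labels is what allows a genuinely repeated unit to survive the merge step.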

Test sets

  • English to German
    • dev2010
    • tst2010
    • tst2013
    • tst2014
    • tst2015

Results

The results reported here were obtained by combining the outputs of the three ASR systems (CTC 300, CTC 10k and the attention-based ASR system) with ROVER and translating the result with the MT system trained on the TED corpus.
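
The idea behind this kind of system combination is voting: where the recognizers disagree on a word, the word chosen by most systems wins. The sketch below is a deliberately simplified illustration, not the ROVER tool itself. It assumes the hypotheses are already aligned slot by slot (real ROVER builds that alignment with dynamic programming and can also use confidence scores), and the `@` gap marker and tie-breaking rule are choices made for this example.

```python
from collections import Counter
from typing import List

GAP = "@"  # assumed placeholder for "no word" in an aligned slot


def vote(aligned: List[List[str]]) -> List[str]:
    """Majority vote per slot over pre-aligned hypotheses; ties go to the earliest system."""
    if len({len(h) for h in aligned}) != 1:
        raise ValueError("hypotheses must be aligned to the same length")
    output = []
    for slot in zip(*aligned):
        counts = Counter(slot)
        # max() returns the first of several equally frequent candidates
        best = max(slot, key=lambda w: counts[w])
        if best != GAP:
            output.append(best)
    return output


hyps = [
    ["the", "speech", "system", "@", "works"],
    ["the", "speech", "systems", "@", "works"],
    ["a", "speech", "system", "now", "works"],
]
print(" ".join(vote(hyps)))
```

Each system's individual error ("systems", "a", the inserted "now") is outvoted by the other two, which is the effect the combination relies on.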

English to German

SET      BLEU   TER    BEER   CharacTER  BLEU(ci)  TER(ci)
dev2010  13.98  71.78  45.88  78.50      15.05     69.68
tst2010  14.08  71.66  44.40  77.66      15.12     69.36
tst2013  13.73  72.81  44.02  71.45      14.61     70.78
tst2014  13.28  74.34  42.43  78.38      14.01     72.62
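
One way to read the table is to compare each score with its (ci) counterpart. Assuming (ci) denotes case-insensitive scoring, which the table does not state explicitly, the difference gives a rough idea of how much of the error comes from casing alone. The short script below only does that arithmetic on the numbers copied from the table.

```python
# Scores copied from the English-to-German table: (BLEU, TER, BLEU(ci), TER(ci))
results = {
    "dev2010": (13.98, 71.78, 15.05, 69.68),
    "tst2010": (14.08, 71.66, 15.12, 69.36),
    "tst2013": (13.73, 72.81, 14.61, 70.78),
    "tst2014": (13.28, 74.34, 14.01, 72.62),
}


def casing_gap(scores):
    """Return (BLEU gained, TER reduced) when moving to the (ci) columns."""
    bleu, ter, bleu_ci, ter_ci = scores
    return round(bleu_ci - bleu, 2), round(ter - ter_ci, 2)


for name, scores in results.items():
    bleu_gap, ter_gap = casing_gap(scores)
    print(f"{name}: BLEU +{bleu_gap:.2f}, TER -{ter_gap:.2f}")
```

On these numbers the gap is roughly one BLEU point and about two TER points on every test set.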

Furthermore, results for the MT system can be found here.
