tran-khoa / joint-training-cascaded-st

Code for the paper "Does Joint Training Really Help Cascaded Speech Translation?" (EMNLP 2022)

License: MIT License

Python 82.77% C++ 0.45% Cuda 0.82% Cython 0.29% Shell 15.58% Lua 0.09%

Does Joint Training Really Help Cascaded Speech Translation?

This repository contains the code for the paper "Does Joint Training Really Help Cascaded Speech Translation?" (arXiv), published at EMNLP 2022. The code is based on fairseq.

Cite This Work

To cite this work, please use the following BibTeX entry:

@InProceedings{tran22:joint_training_cascaded_speech_translation,
  author={Tran, Viet Anh Khoa and Thulke, David and Gao, Yingbo and Herold, Christian and Ney, Hermann},
  title={Does Joint Training Really Help Cascaded Speech Translation?},
  booktitle={Conference on Empirical Methods in Natural Language Processing},
  year=2022,
  address={Abu Dhabi, United Arab Emirates},
  month=nov,
  booktitlelink={https://2022.emnlp.org/},
}

Requirements and Installation (adapted from fairseq)

PyTorch version 1.7.1
torchaudio 0.7.2
Python version >= 3.7

To install fairseq and develop locally:

git clone https://github.com/tran-khoa/joint-training-cascaded-st
cd joint-training-cascaded-st
pip install --editable ./
cd projects/speech_translation
pip install -r requirements.txt

# on macOS:
# CFLAGS="-stdlib=libc++" pip install --editable ./
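After installation, it can be useful to confirm that the environment matches the versions listed above. The following is a minimal sketch, not part of this repository: the helper names (check_environment, installed_versions, base_version) are hypothetical, and it assumes that local build suffixes such as "+cu110" can be ignored when comparing versions.

```python
import sys
from typing import Dict, List, Tuple

# Versions taken from the requirements list above.
# All helper names in this sketch are illustrative, not part of the repository.
REQUIRED_EXACT = {"torch": "1.7.1", "torchaudio": "0.7.2"}
MIN_PYTHON = (3, 7)


def base_version(version: str) -> str:
    """Drop a local build suffix such as '+cu110' (assumed to be irrelevant here)."""
    return version.split("+", 1)[0]


def check_environment(python_version: Tuple[int, int],
                      installed: Dict[str, str]) -> List[str]:
    """Return a list of human-readable mismatches; an empty list means all checks passed."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append(
            "Python %d.%d found, >= %d.%d expected" % (tuple(python_version) + MIN_PYTHON)
        )
    for name, wanted in REQUIRED_EXACT.items():
        found = installed.get(name)
        if found is None:
            problems.append("%s is not installed" % name)
        elif base_version(found) != wanted:
            problems.append("%s %s found, %s expected" % (name, found, wanted))
    return problems


def installed_versions() -> Dict[str, str]:
    """Collect __version__ for each required package that can be imported."""
    versions = {}
    for name in REQUIRED_EXACT:
        try:
            module = __import__(name)
        except (ImportError, OSError):
            continue  # missing or broken install: reported as "not installed"
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    issues = check_environment(sys.version_info[:2], installed_versions())
    print("\n".join(issues) if issues else "Environment matches the listed requirements.")
```

Running the script prints one line per mismatch, or a confirmation message if nothing differs from the listed versions.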

For faster training, install NVIDIA's apex library:

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" \
  --global-option="--deprecated_fused_adam" --global-option="--xentropy" \
  --global-option="--fast_multihead_attn" ./
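Since apex is optional, a quick check of whether the build above succeeded can save a debugging round later. This is a small sketch using only the Python standard library; the helper name has_module is hypothetical, and the check only confirms that a top-level apex package can be found, not that its CUDA extensions were compiled.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("apex"):
        print("apex found.")
    else:
        print("apex not found; the optional speed-ups from apex are unavailable.")
```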

Running experiments

The implementation is located in projects/speech_translation. To run experiments, please refer to the scripts in projects/speech_translation/experiments. The term joint-seq refers to Top-K-Train in the paper, and tight refers to "Tight-Integration" as introduced in "Tight integrated end-to-end training for cascaded speech translation".