
sockeye-recipes

Training scripts and recipes for the Sockeye Neural Machine Translation (NMT) toolkit

  • The original Sockeye codebase is at AWS Labs
  • This version is based on a stable fork. The sockeye version that sockeye-recipes is currently built on is 1.18.57.

This package contains scripts that make it easy to run NMT experiments. To use it, specify your settings in a file like "hyperparams.txt", then run the following scripts:

  • scripts/preprocess-bpe.sh: Preprocess bitext via BPE segmentation
  • scripts/train.sh: Train the NMT model given bitext
  • scripts/translate.sh: Translates a tokenized input file using an existing model
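Putting these together, an end-to-end run might look like the sketch below. All paths, the hyperparams file name, and the exact flags are illustrative (the translate.sh flags follow its usage message shown later in this document); check each script's usage message for the authoritative interface.

```shell
# Illustrative end-to-end run; paths and flags are examples, not authoritative.
cd path/to/sockeye-recipes

# 1. BPE-segment the tokenized bitext according to hyperparams.txt
bash scripts/preprocess-bpe.sh hyperparams.txt

# 2. Train the NMT model (pass the GPU environment instead if available)
bash scripts/train.sh -p hyperparams.txt -e sockeye_cpu

# 3. Translate a tokenized input file with the trained model
bash scripts/translate.sh -p hyperparams.txt -i input.tok -o output.trans -e sockeye_cpu
```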

Installation

First, clone this package:

git clone https://github.com/kevinduh/sockeye-recipes.git sockeye-recipes

We assume that Anaconda for Python virtual environments is available on the system. Run the following to install Sockeye in two Anaconda environments, sockeye_cpu and sockeye_gpu:

cd path/to/sockeye-recipes
bash ./install/install_sockeye_cpu.sh
bash ./install/install_sockeye_gpu.sh
bash ./install/install_tools.sh

The training scripts and recipes activate either the sockeye_cpu or sockeye_gpu environment, depending on whether CPU or GPU is specified. GPU mode currently assumes CUDA 9.0 is available; this can be changed if needed. The third script, install_tools.sh, simply installs some helper tools, such as the BPE preprocessor.

Re-Install

When the sockeye version is updated, it is recommended to re-run the installation scripts in a clean conda environment:

conda remove --name sockeye_cpu --all
conda remove --name sockeye_gpu --all
bash ./install/install_sockeye_cpu.sh
bash ./install/install_sockeye_gpu.sh

If you want to back up an existing version of a conda environment, run this before re-installing:

conda create --name sockeye_gpu_bkup --clone sockeye_gpu
conda create --name sockeye_cpu_bkup --clone sockeye_cpu

Environment Setup

Depending on your computer setup, you may want to add the following configurations to your ~/.bashrc file.

Configure CUDA and CuDNN for the GPU version of Sockeye:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/NVIDIA/cuda-9.0/lib64

Set up a clean UTF-8 environment to avoid encoding errors:

export LANG=en_US.UTF-8

Recipes

The egs subdirectory contains recipes for various datasets.

  • egs/quickstart: For first-time users, this recipe explains how sockeye-recipes works.

  • egs/ted: Recipes for training various NMT models, using a TED Talks dataset consisting of 20 different languages.

  • egs/wmt14-en-de: Recipe for training a baseline that compares with the Luong EMNLP2015 paper.

  • egs/curriculum: Recipe for curriculum learning. Also explains how to use sockeye-recipes in conjunction with a custom sockeye installation.

  • egs/optimizers: Example of training with different optimizers (e.g. ADAM, EVE, Nesterov ADAM, SGD, ...)

  • egs/iarpa-material: Recipe for low-resource NMT using data from the IARPA MATERIAL program

The hpm subdirectory contains hyperparameter (hpm) file templates. Besides the NMT hyperparameters, the most important variables to set in this file are:

  • rootdir: location of your sockeye-recipes installation, used for finding the relevant scripts (i.e. the current directory, where this README is located)

  • modeldir: directory for storing a single Sockeye model training process

  • workdir: directory for placing various modeldirs (i.e. a suite of experiments with different hyperparameters) corresponding to the same dataset

  • train_tok and valid_tok: prefixes of the tokenized training and validation bitext file paths

  • train_bpe_{src,trg} and valid_bpe_{src,trg}: alternatively, prefixes of the above training and validation files already processed by BPE
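Putting these variables together, a minimal hpm fragment might look like the following. hpm files are plain shell variable assignments that the scripts source; every path, name, and the .de/.en suffix convention below is an illustrative assumption, not a prescribed layout.

```shell
# Illustrative hpm fragment; all paths and values are examples.
rootdir=$HOME/sockeye-recipes            # where this README lives
workdir=$HOME/exp/wmt17-de-en            # one suite of experiments on one dataset
modeldir=$workdir/model-bpe30k-adam      # one training run
datadir=$workdir/data-bpe                # shared BPE'ed bitext

# Tokenized bitext (prefixes; the language suffixes are assumed here)
train_tok=$workdir/data/train.tok
valid_tok=$workdir/data/valid.tok

# Alternatively, point directly at already-BPE'ed files
train_bpe_src=$datadir/train.bpe-30000.de
train_bpe_trg=$datadir/train.bpe-30000.en
valid_bpe_src=$datadir/valid.bpe-30000.de
valid_bpe_trg=$datadir/valid.bpe-30000.en
```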

Auto-Tuning

This package also provides tools for auto-tuning: you specify the hyperparameters to search over, and a meta-optimizer automatically tries different configurations that it believes will be promising. For more information, see the auto-tuning folder.
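The idea can be sketched as a loop that samples hyperparameter configurations and keeps the best-scoring one. This is a toy random-search illustration of the concept, not the actual meta-optimizer shipped in the auto-tuning folder; the search space and evaluation function are hypothetical.

```python
import random

def random_search(space, evaluate, trials=10, seed=0):
    """Toy random search: sample configs from `space`, keep the best score."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(trials):
        # Draw one value per hyperparameter
        cfg = {name: rng.choice(values) for name, values in space.items()}
        # `evaluate` stands in for e.g. validation BLEU of a finished train.sh run
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical search space over hpm variables
space = {"num_embed": [256, 512], "bpe_symbols_src": [10000, 30000]}
```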

Design Principles and Suggested Usage

Building NMT systems can be a tedious process involving lengthy experimentation with hyperparameters. The goal of sockeye-recipes is to make it easy to try many different configurations and to record best practices as example recipes. The suggested usage is as follows:

  • Prepare your training and validation bitext beforehand with the necessary preprocessing (e.g. data consolidation, tokenization, lower/true-casing). Sockeye-recipes simply assumes pairs of train_tok and valid_tok files.
  • Set the working directory to correspond to a single suite of experiments on the same dataset (e.g. WMT17-German-English)
  • The only preprocessing handled here is BPE. Run preprocess-bpe.sh with different BPE vocabulary sizes (bpe_symbols_src, bpe_symbols_trg). The results can all be saved to the same datadir.
  • train.sh is the main training script. Specify a new modeldir for each train.sh run. The hyperparams.txt file used in training is saved in modeldir for future reference.
  • At the end, your workdir will have a single datadir containing multiple BPE'ed versions of the bitext, and multiple modeldirs. You can run tensorboard on all these modeldirs concurrently to compare learning curves.
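The workflow above can be sketched as a shell session: one workdir, one shared datadir, and one modeldir per hpm file. Directory names, hpm file names, and flags are illustrative; each hpm file is assumed to name its own modeldir.

```shell
# Illustrative sweep over three hypothetical hpm files in one workdir.
cd path/to/workdir
for hpm in bpe10k-small.hpm bpe30k-small.hpm bpe30k-big.hpm; do
    bash path/to/sockeye-recipes/scripts/train.sh -p $hpm -e sockeye_gpu
done

# Compare learning curves across all modeldirs at once
tensorboard --logdir=path/to/workdir
```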

There are many options in Sockeye. Currently not all of them are used in sockeye-recipes; more will be added. See sockeye/arguments.py for detailed explanations.

Alternatively, call sockeye directly with the help option, as below. Note that sockeye-recipes hyperparameters have the same names as sockeye hyperparameters, except that sockeye-recipes replaces hyphens with underscores (e.g. --num-embed in sockeye becomes $num_embed in sockeye-recipes):

source activate sockeye_cpu
python -m sockeye.train --help
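The renaming is mechanical. A small helper illustrating the convention (this function is hypothetical, not part of sockeye-recipes):

```python
def sockeye_flag_to_hpm_var(flag: str) -> str:
    """Map a sockeye CLI flag to the corresponding hpm variable name:
    strip leading hyphens, then replace remaining hyphens with underscores."""
    return flag.lstrip("-").replace("-", "_")

print(sockeye_flag_to_hpm_var("--num-embed"))  # num_embed
```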

Contributors

arendu, aryamccarthy, este1le, kevinduh, mitchellgordon95, noisychannel


sockeye-recipes's Issues

Some problems

I am using the command: ../../../scripts/train-curriculum.sh -p rs1.hpm -e sockeye-curriculum

There are two problems:
1.
[ERROR:root] Uncaught exception
Traceback (most recent call last):
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/prepare_data.py", line 93, in <module>
    main()
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/prepare_data.py", line 32, in main
    prepare_data(args)
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/prepare_data.py", line 90, in prepare_data
    curriculum_score_file=args.curriculum_score_file)
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/data_io.py", line 669, in prepare_data
    sample_scores = [int(line.strip()) for line in curriculum_score_file if line.strip()]
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/data_io.py", line 669, in <listcomp>
    sample_scores = [int(line.strip()) for line in curriculum_score_file if line.strip()]
ValueError: invalid literal for int() with base 10: ''
2.
[ERROR:root] Uncaught exception
Traceback (most recent call last):
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/train.py", line 969, in <module>
    main()
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/train.py", line 835, in main
    train(args)
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/train.py", line 889, in train
    output_folder=output_folder)
  File "/opt/anaconda3/envs/sockeye-curriculum/lib/python3.6/site-packages/sockeye/train.py", line 260, in create_data_iters_and_vocabs
    fill_up=args.fill_up,
AttributeError: 'Namespace' object has no attribute 'fill_up'

Can someone help me fix these two errors?
Thanks.

Questions about the version

If CUDA is 11.8, how can I change the versions of the other packages to make the code run properly? I tried changing the version of another package, but when running the ../../../scripts/train-curriculum.sh -p rs1.hpm -e sockeye-curriculum command and importing the mxnet package, a CUDA-related error occurs.

No module named sockeye.train

What is causing the error below? Can someone advise?

python -m sockeye.train --help
/home/user/anaconda2/envs/sockeye_gpu/bin/python: No module named sockeye.train

Unable to locate a modulefile for 'cuda100/toolkit'

I installed the CUDA 10.0 toolkit for a Tesla K20 GPU card, and accordingly changed install_sockeye_custom.sh to install the requirements for it. The content of requirement_cu100.txt is:

pyyaml==3.12
mxnet-cu100mkl==1.3.1
numpy>=1.8.2
typing
portalocker

The installation process worked fine, but during training an error is thrown: ModuleCmd_Load.c(213):ERROR:105: Unable to locate a modulefile for 'cuda100/toolkit'.

The CUDA toolkit is properly installed at /usr/local/cuda-10.0.

Please help: what could be going wrong here?

Make LR explicit

The default learning rate varies from sockeye version to version. It should be fixed explicitly in the config.
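For instance, the learning rate can be pinned in the hpm file instead of relying on the version-dependent default. The variable name below follows the hyphen-to-underscore convention for sockeye's --initial-learning-rate flag; the value is purely illustrative.

```shell
# Illustrative hpm fragment: make the learning rate explicit
optimizer=adam
initial_learning_rate=0.0003
```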

What is the right way to use translate.sh?

I am currently calling translate.sh with the following arguments:
bash ~/sockeye-recipes/scripts/translate.sh -p anon_glove_zhanglapata.hpm -i data/newsela_Zhang_Lapata_splits/V0V4_V1V4_V2V4_V3V4_V0V3_V0V2_V1V3.aner.ori.test.src -o output/bpe.1best_greedy -e sockeye_gpu -b 1

and I get the message below:

translate.py: error: argument --output-type: expected one argument
Traceback (most recent call last):
  File "/srv/disk01/ggarbace/TSGen/sockeye-recipes/tools/subword-nmt//apply_bpe.py", line 308, in <module>
BrokenPipeError: [Errno 32] Broken pipe
Exception ignored in: <_io.TextIOWrapper name='<stdout>' encoding='utf-8'>
BrokenPipeError: [Errno 32] Broken pipe
End translating: 2020-03-06 15:10:32 on zen

After adding --output-type as argument, I get:
Usage: translate.sh -p hyperparams.txt -i input -o output -e ENV_NAME [-d DEVICE] [-s]
Input is a source text file to be translated
Output is filename for target translations
ENV_NAME is the sockeye conda environment name
Device is optional and inferred from ENV
-s is optional and skips BPE processing on input source

I don't think I am missing any of the mandatory parameters. What is going wrong here? Thanks!

MXBoard warning

Hi,

I'm getting the following warning (see below). Does anyone know if this is serious? Should I use a compatible version? Which one?

[WARNING:root] The currently installed MXNet version 1.1.0 is less than 1.2.0. Some functionality of MXBoard may not work.

Error message in the egs/pretrained_embeddings Tutorial

Dear Kevin, thanks for your great work. :)

I am on sockeye version 1.18.72.
When executing...

python3 -m sockeye.init_embedding \
  -w sample-de-en/emb/small.cln.de.vec.npy sample-de-en/emb/small.cln.en.vec.npy \
  -i sample-de-en/emb/small.cln.de.vec.vocab sample-de-en/emb/small.cln.en.vec.vocab \
  -o model/vocab.src.0.json model/vocab.tgt.0.json \
  -n source_embed_weight target_embed_weight \
  -f params.init

...it returns the error message:
sockeye.utils.SockeyeError: Vocabulary sample-de-en/emb/small.cln.de.vec.vocab not valid.

Can you imagine why?

I appreciate your help.
Dominik
