sincerass / powernorm

[ICML 2020] code for "PowerNorm: Rethinking Batch Normalization in Transformers" https://arxiv.org/abs/2003.07845

License: GNU General Public License v3.0

Python 96.36% C++ 0.89% Cuda 1.75% Shell 1.00%

Introduction

Here we present instructions for reproducing the machine translation results from our ICML 2020 paper, PowerNorm: Rethinking Batch Normalization in Transformers (a video is also available). PowerNorm is implemented here.

The figure illustrates batch/power normalization (left) and layer normalization (right). The entries colored in blue show the components used to compute the statistics.
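As a rough illustration of the difference described above, the following sketch computes statistics over the two sets of entries on a toy tensor. It is not the PowerNorm implementation from this repository: the tensor values, the function names `batch_style_stats` and `layer_style_stats`, and the use of plain mean and variance are all assumptions made for illustration. It only shows which entries are pooled in each case.

```python
from statistics import fmean, pvariance

# Toy activations with shape (batch=2, tokens=2, features=3); the values are made up.
x = [
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    [[7.0, 8.0, 9.0], [10.0, 11.0, 12.0]],
]


def batch_style_stats(x):
    """One (mean, variance) pair per feature, pooled over all sentences and tokens."""
    n_features = len(x[0][0])
    stats = []
    for f in range(n_features):
        column = [token[f] for sentence in x for token in sentence]
        stats.append((fmean(column), pvariance(column)))
    return stats


def layer_style_stats(x):
    """One (mean, variance) pair per token, pooled over that token's own features."""
    return [[(fmean(token), pvariance(token)) for token in sentence] for sentence in x]


print(batch_style_stats(x))  # 3 pairs, one per feature
print(layer_style_stats(x))  # 2 x 2 pairs, one per token
```

Batch-style statistics therefore depend on the other sentences in the batch, while layer-style statistics depend only on the token being normalized.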

The code is based on the open-source fairseq (v0.8.0). Follow this link for detailed documentation of the original code base and this link for examples of training baseline Transformer models for machine translation with fairseq.

We also provide pre-trained models for several benchmark translation datasets.

Requirements and Installation

The fairseq library we use requires PyTorch version >= 1.2.0. Please follow the instructions here.

After PyTorch is installed, you can install fairseq with:

conda env create --file env.yml
python setup.py build develop
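Before building, it can help to confirm that the installed PyTorch meets the stated minimum. The helper below is a minimal sketch and not part of this repository; `parse_version` and `meets_minimum` are hypothetical names, and only the leading numeric part of the version string is compared.

```python
import re


def parse_version(version):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def meets_minimum(version, minimum="1.2.0"):
    """True if `version` is at least `minimum`, comparing components numerically."""
    return parse_version(version) >= parse_version(minimum)


print(meets_minimum("1.2.0"))
print(meets_minimum("1.10.0+cu113"))
print(meets_minimum("1.1.0"))
```

In an environment where PyTorch is installed, the check would be `meets_minimum(torch.__version__)` after `import torch`. Comparing integer tuples rather than raw strings avoids ranking "1.10.0" below "1.2.0".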

Reproduction

The scripts for training and testing PowerNorm are located in the trans-scripts folder. Please refer to this page to preprocess the data and obtain binarized data, or use the data we provide in the next section. To reproduce the results in Table 1 yourself:

# IWSLT14 De-En
## To train the model
./trans-scripts/train/train-iwslt14.sh encoder_norm_self_attn encoder_norm_ffn decoder_norm_self_attn decoder_norm_ffn
Example:
$ CUDA_VISIBLE_DEVICES=0 ./trans-scripts/train/train-iwslt14.sh power power layer layer
$ CUDA_VISIBLE_DEVICES=0 ./trans-scripts/train/train-iwslt14.sh batch batch layer layer
$ CUDA_VISIBLE_DEVICES=0 ./trans-scripts/train/train-iwslt14.sh layer layer layer layer

## To test a checkpoint
$ CUDA_VISIBLE_DEVICES=0 ./trans-scripts/test/test-iwslt14.sh output_directory checkpoint_best.pt

# WMT14 En-De big
## To train the model (we used 128 GPUs for our experiments)
./trans-scripts/train/train-wmt-big.sh encoder_norm_self_attn encoder_norm_ffn decoder_norm_self_attn decoder_norm_ffn
Example:
$ CUDA_VISIBLE_DEVICES=0,1,2,3 ./trans-scripts/train/train-wmt14-big.sh power power layer layer

## To test a checkpoint
$ CUDA_VISIBLE_DEVICES=0 ./trans-scripts/test/test-wmt14.sh output_directory checkpoint_best.pt
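The training scripts take four positional arguments, one normalization type for each of the encoder self-attention, encoder FFN, decoder self-attention, and decoder FFN blocks. When sweeping over several configurations, a small helper can assemble the command lines. This is a sketch and not part of the repository: `build_train_command` is a hypothetical name, and `KNOWN_NORMS` lists only the values that appear in the examples above, so the scripts may accept others.

```python
import shlex

# Normalization names seen in the example commands; an assumption, not an exhaustive list.
KNOWN_NORMS = ("power", "batch", "layer")


def build_train_command(script, encoder_self_attn, encoder_ffn,
                        decoder_self_attn, decoder_ffn, gpus="0"):
    """Assemble a training command string in the same form as the examples."""
    norms = [encoder_self_attn, encoder_ffn, decoder_self_attn, decoder_ffn]
    for norm in norms:
        if norm not in KNOWN_NORMS:
            raise ValueError(f"unexpected normalization type: {norm!r}")
    parts = " ".join(shlex.quote(part) for part in [script, *norms])
    return f"CUDA_VISIBLE_DEVICES={gpus} {parts}"


print(build_train_command("./trans-scripts/train/train-iwslt14.sh",
                          "power", "power", "layer", "layer"))
```

The resulting string can be pasted into a shell or written to a job file; the helper does not launch anything itself.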

Pre-trained Models

We provide the following pre-trained models and pre-processed, binarized datasets for reproduction:

Description | Dataset | Model | Test set(s)
Transformer-PN small | IWSLT14 German-English | download (.tbz2) | IWSLT14 test set (shared vocab): download (.tbz2)

Example usage:

# IWSLT14 De-En
## at trans-net/translation/, after downloading the tbz2 file
$ tar xf powernorm_pretrain_iwslt.tbz2
$ OUTPUT_DIR=iwslt14_de_en/powernorm_pretrain_iwslt
$ CKPT=averaged_model.pt
$ CUDA_VISIBLE_DEVICES=0 ./trans-scripts/test/test-iwslt14.sh $OUTPUT_DIR $CKPT
...
| Generate test with beam=5: BLEU4 = 35.87, 69.5/44.2/30.1/20.9 (BP=0.961, ratio=0.962, syslen=126196, reflen=131156)
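To collect scores from several runs, the summary line above can be parsed with a regular expression. The sketch below also recomputes the brevity penalty and the overall score from the printed fields as a sanity check, using the standard BLEU definition (brevity penalty times the geometric mean of the n-gram precisions). The names `parse_bleu_line` and `recompute` are hypothetical, and the pattern assumes the line format shown in the example output.

```python
import math
import re

LINE = ("| Generate test with beam=5: BLEU4 = 35.87, 69.5/44.2/30.1/20.9 "
        "(BP=0.961, ratio=0.962, syslen=126196, reflen=131156)")

PATTERN = re.compile(
    r"BLEU4 = (?P<bleu>[\d.]+), (?P<precisions>[\d./]+) "
    r"\(BP=(?P<bp>[\d.]+), ratio=(?P<ratio>[\d.]+), "
    r"syslen=(?P<syslen>\d+), reflen=(?P<reflen>\d+)\)"
)


def parse_bleu_line(line):
    """Extract the numeric fields from a summary line in the format shown above."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not match the expected BLEU summary format")
    return {
        "bleu": float(m["bleu"]),
        "precisions": [float(p) for p in m["precisions"].split("/")],
        "bp": float(m["bp"]),
        "ratio": float(m["ratio"]),
        "syslen": int(m["syslen"]),
        "reflen": int(m["reflen"]),
    }


def recompute(scores):
    """Recompute (brevity penalty, BLEU) from the parsed lengths and precisions."""
    syslen, reflen = scores["syslen"], scores["reflen"]
    bp = 1.0 if syslen >= reflen else math.exp(1 - reflen / syslen)
    logs = [math.log(p / 100) for p in scores["precisions"]]
    return bp, 100 * bp * math.exp(sum(logs) / len(logs))


scores = parse_bleu_line(LINE)
print(scores)
print(recompute(scores))
```

Because the printed precisions are rounded to one decimal place, the recomputed score matches the reported one only approximately.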

Citation

PowerNorm was developed as part of the following paper. If you find the library useful for your work, we would appreciate a citation:

@inproceedings{shen2020powernorm,
  title={PowerNorm: Rethinking Batch Normalization in Transformers},
  author={Shen, Sheng and Yao, Zhewei and Gholami, Amir and Mahoney, Michael and Keutzer, Kurt},
  booktitle={ICML},
  year={2020}
}
