
nmt's Introduction

nmt-adequacy

Attention-based NMT with Coverage, Context Gate, and Reconstruction

This is a Theano-based RNNSearch that integrates the following techniques to alleviate the problem of fluent but inadequate translations from which NMT suffers.

Coverage, to indicate whether a source word has been translated or not; this has been shown to alleviate both over-translation and under-translation.

Context Gate, to dynamically control the ratio at which the source and target contexts contribute to the generation of target words; this enhances the adequacy of NMT while keeping the fluency unchanged.

Reconstruction, to reconstruct the input source sentence from the hidden states of the output target sentence; this encourages as much source-side information as possible to be transferred to the target side. (A sketch of how the first two mechanisms enter a decoding step follows below.)
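For orientation, the sketch below shows how the first two mechanisms fit into a single decoding step. It is a minimal NumPy illustration, not the repository's Theano code: all parameter names (Wa, Ua, Va, va, uf, N, Wz, Uz, Cz, W, U, C) are assumptions, coverage is the linguistic (fertility-based) variant, the context gate is the "both" variant, and the GRU update is collapsed into a single tanh. Reconstruction is omitted, since it runs over the whole decoder-state sequence rather than a single step.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_step(h, s, y, cov, p):
    """One decoding step with coverage-aware attention and a context gate.

    h: (Tx, 2d) encoder annotations    s: (d,) previous decoder state
    y: (e,) previous target embedding  cov: (Tx,) accumulated coverage
    """
    # Attention energies also condition on coverage:
    #   e_j = va . tanh(Wa s + Ua h_j + Va cov_j)
    e = np.tanh(s @ p['Wa'] + h @ p['Ua'] + np.outer(cov, p['Va'])) @ p['va']
    a = np.exp(e - e.max())
    a /= a.sum()                                   # alignment weights alpha_j
    c = a @ h                                      # source context c_i

    # Linguistic coverage update: cov_j += alpha_j / Phi_j, where the
    # fertility Phi_j = N * sigmoid(uf . h_j) caps the total attention a
    # source word should receive over the whole translation.
    cov = cov + a / (p['N'] * sigmoid(h @ p['uf']))

    # Context gate ("both" variant): z scales the source context while
    # (1 - z) scales the target-side input to the state update.
    z = sigmoid(y @ p['Wz'] + s @ p['Uz'] + c @ p['Cz'])
    pre = (1.0 - z) * (y @ p['W'] + s @ p['U']) + z * (c @ p['C'])
    return np.tanh(pre), a, cov        # stand-in for the GRU transition

# Tiny smoke test with random parameters (shapes only).
rng = np.random.RandomState(0)
Tx, d, e_dim, da = 7, 4, 3, 5
p = {'Wa': rng.randn(d, da), 'Ua': rng.randn(2 * d, da),
     'Va': rng.randn(da), 'va': rng.randn(da),
     'uf': rng.randn(2 * d), 'N': 2.0,
     'Wz': rng.randn(e_dim, d), 'Uz': rng.randn(d, d), 'Cz': rng.randn(2 * d, d),
     'W': rng.randn(e_dim, d), 'U': rng.randn(d, d), 'C': rng.randn(2 * d, d)}
s_new, align, cov = decode_step(rng.randn(Tx, 2 * d), rng.randn(d),
                                rng.randn(e_dim), np.zeros(Tx), p)
```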

If you use the code, please cite our papers:


@InProceedings{Tu:2016:ACL,
      author    = {Tu, Zhaopeng and Lu, Zhengdong and Liu, Yang and Liu, Xiaohua and Li, Hang},
      title     = {Modeling Coverage for Neural Machine Translation},
      booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics},
      year      = {2016},
}
@Article{Tu:2017:TACL,
      author    = {Tu, Zhaopeng and Liu, Yang and Lu, Zhengdong and Liu, Xiaohua and Li, Hang},
      title     = {Context Gates for Neural Machine Translation},
      journal   = {Transactions of the Association for Computational Linguistics},
      year      = {2017},
}
@InProceedings{Tu:2017:AAAI,
      author    = {Tu, Zhaopeng and Liu, Yang and Shang, Lifeng and Liu, Xiaohua and Li, Hang},
      title     = {Neural Machine Translation with Reconstruction},
      booktitle = {Proceedings of the 31st AAAI Conference on Artificial Intelligence},
      year      = {2017},
}

For any comments or questions, please email the first author.

nmt's People

Contributors

tuzhaopeng


nmt's Issues

numpy.dtype has the wrong size

I've got the following error after running experiment/nmt/train.sh

2017-01-22 22:15:55,327: experiments.nmt.encdec: DEBUG: Create input variables
2017-01-22 22:15:55,335: experiments.nmt.encdec: DEBUG: Create encoder
2017-01-22 22:15:55,335: experiments.nmt.encdec: DEBUG: _create_embedding_layers
2017-01-22 22:15:59,077: experiments.nmt.encdec: DEBUG: _create_transition_layers
2017-01-22 22:16:05,818: experiments.nmt.encdec: DEBUG: _create_inter_level_layers
2017-01-22 22:16:05,821: experiments.nmt.encdec: DEBUG: _create_representation_layers
2017-01-22 22:16:06,360: experiments.nmt.encdec: DEBUG: Build encoding computation graph
2017-01-22 22:16:06,603: experiments.nmt.encdec: DEBUG: Create backward encoder
2017-01-22 22:16:06,603: experiments.nmt.encdec: DEBUG: _create_embedding_layers
2017-01-22 22:16:09,537: experiments.nmt.encdec: DEBUG: _create_transition_layers
2017-01-22 22:16:15,764: experiments.nmt.encdec: DEBUG: _create_inter_level_layers
2017-01-22 22:16:15,764: experiments.nmt.encdec: DEBUG: _create_representation_layers
2017-01-22 22:16:16,270: experiments.nmt.encdec: DEBUG: Build backward encoding computation graph
2017-01-22 22:16:16,293: experiments.nmt.encdec: DEBUG: Create decoder
2017-01-22 22:16:16,293: experiments.nmt.encdec: DEBUG: _create_embedding_layers
2017-01-22 22:16:19,108: experiments.nmt.encdec: DEBUG: _create_transition_layers
2017-01-22 22:16:19,108: experiments.nmt.encdec: DEBUG: RecurrentLayerWithSearch is used
2017-01-22 22:16:26,440: experiments.nmt.encdec: DEBUG: _create_inter_level_layers
2017-01-22 22:16:26,440: experiments.nmt.encdec: DEBUG: _create_initialization_layers
2017-01-22 22:16:26,685: experiments.nmt.encdec: DEBUG: _create_decoding_layers
2017-01-22 22:16:28,107: experiments.nmt.encdec: DEBUG: _create_readout_layers
2017-01-22 22:16:31,290: experiments.nmt.encdec: DEBUG: Build log-likelihood computation graph
2017-01-22 22:16:31,458: groundhog.layers.cost_layers: DEBUG: Get grads
2017-01-22 22:16:34,369: groundhog.layers.cost_layers: DEBUG: Got grads
2017-01-22 22:16:34,369: experiments.nmt.encdec: DEBUG: Build sampling computation graph
2017-01-22 22:16:35,146: experiments.nmt.encdec: DEBUG: Create auxiliary variables
2017-01-22 22:16:35,147: experiments.nmt.encdec: DEBUG: Compile sampler
Traceback (most recent call last):
File "/nethome/yzhang3151/project/NMT/experiments/nmt/train.py", line 100, in
main()
File "/nethome/yzhang3151/project/NMT/experiments/nmt/train.py", line 82, in main
lm_model = enc_dec.create_lm_model()
File "/home/yzhang3151/project/NMT/experiments/nmt/encdec.py", line 1981, in create_lm_model
sample_fn=self.create_sampler(),
File "/home/yzhang3151/project/NMT/experiments/nmt/encdec.py", line 2034, in create_sampler
name="sample_fn")
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function.py", line 320, in function
output_keys=output_keys)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/pfunc.py", line 479, in pfunc
output_keys=output_keys)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 1777, in orig_function
defaults)
File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 1641, in create
input_storage=input_storage_lists, storage_map=storage_map)
File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 690, in make_thunk
storage_map=storage_map)[:3]
File "/usr/local/lib/python2.7/dist-packages/theano/gof/vm.py", line 1003, in make_all
no_recycling))
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_op.py", line 913, in make_thunk
from . import scan_perform_ext
File "/usr/local/lib/python2.7/dist-packages/theano/scan_module/scan_perform_ext.py", line 141, in
from scan_perform.scan_perform import *
File "init.pxd", line 155, in init theano.scan_module.scan_perform (/home/yzhang3151/.theano/compiledir_Linux-3.13--generic-x86_64-with-Ubuntu-14.04-trusty-x86_64-2.7.6-64/scan_perform/mod.cpp:9984)
ValueError: ('The following error happened while compiling the node', forall_inplace,cpu,layer_enc_transition_0}(Shape_i{0}.0, Subtensor{int64:int64:int8}.0, Subtensor{int64:int64:int8}.0, Subtensor{int64:int64:int8}.0, IncSubtensor{InplaceSet;:int64:}.0, G_enc_transition_0, R_enc_transition_0, W_enc_transition_0), '\n', 'numpy.dtype has the wrong size, try recompiling')
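A hedged note, since the thread does not include a resolution: the message "numpy.dtype has the wrong size, try recompiling" usually means that Theano's cached Cython extensions (here `scan_perform`) were compiled against a different NumPy version than the one currently installed. Purging the compilation cache, e.g. with `theano-cache purge` or by deleting the `~/.theano/compiledir_*` directory named in the traceback, forces the modules to rebuild against the current NumPy and commonly resolves this error.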

SyntaxError: invalid syntax

File "/home/jimmy/Desktop/Attention-Coverage/NMT/models.py", line 496
def run_pipeline(self, state_below, mask_below, init_context=None, c=None, c_mask=None)

SyntaxError: invalid syntax

You forgot the ":" after the function definition.
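For reference, the corrected signature (only the trailing colon changes):

```python
def run_pipeline(self, state_below, mask_below, init_context=None, c=None, c_mask=None):
```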

data_state.py

Hello @tuzhaopeng ,

In the data_state.py file there is a variable named dim, set to 1000. Is this the dimension of the word vectors or the size of the hidden layer?

Could you also tell me the meaning and usage of the following variables? I could not work them out:

null_sym_source=16000,
null_sym_target=16000,
n_sym_source=16001,
n_sym_target=16001,
use_linguistic_coverage=False,

B.R.
Kadir
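A hedged reading of these entries, based on GroundHog-style state conventions rather than anything documented in this repository:

```python
# Assumed meanings (GroundHog conventions), not confirmed by the authors:
state = dict(
    dim=1000,                # recurrent hidden-layer size; the word-embedding
                             #   dimension is set separately (rank_n_approx)
    null_sym_source=16000,   # index of the end-of-sequence (null) symbol
    null_sym_target=16000,   #   on the source/target side
    n_sym_source=16001,      # vocabulary size = null-symbol index + 1
    n_sym_target=16001,
    use_linguistic_coverage=False,  # fertility-based (linguistic) coverage
                                    #   instead of the neural-network variant
)
```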

TypeError when running experiment/nmt/train.py

Using gpu device 0: GeForce GT 740M (CNMeM is disabled, cuDNN not available)
2018-03-07 14:59:15,233: main: DEBUG: State:
{'activ': 'lambda x: TT.tanh(x)',
'adaeps': 1e-06,
'adarho': 0.95,
'algo': 'SGD_adadelta',
'backward': False,
'bias': 0.0,
'bias_code': True,
'bigram': True,
'bs': 64,
'check_first_word': True,
'cutoff': 1.0,
'cutoff_rescale_length': 0.0,
'dec_rec_gater': 'lambda x: TT.nnet.sigmoid(x)',
'dec_rec_gating': True,
'dec_rec_layer': 'RecurrentLayer',
'dec_rec_reseter': 'lambda x: TT.nnet.sigmoid(x)',
'dec_rec_reseting': True,
'decoder_stack': 1,
'decoding_inputs': True,
'deep_out': True,
'dim': 1000,
'divide_lr': 2,
'dropout': 1.0,
'dropout_rec': 1.0,
'enc_rec_gater': 'lambda x: TT.nnet.sigmoid(x)',
'enc_rec_gating': True,
'enc_rec_layer': 'RecurrentLayer',
'enc_rec_reseter': 'lambda x: TT.nnet.sigmoid(x)',
'enc_rec_reseting': True,
'encoder_stack': 1,
'eps': 1e-10,
'forward': False,
'hookFreq': 20,
'indx_word': None,
'indx_word_target': None,
'last_backward': False,
'last_forward': True,
'level': 'DEBUG',
'loopIters': 3000000,
'lr': 1.0,
'maintain_coverage': False,
'maxout_part': 2.0,
'minerr': -1,
'minlr': 0,
'n_examples': 3,
'n_samples': 3,
'n_sym_source': None,
'n_sym_target': None,
'null_sym_source': None,
'null_sym_target': None,
'on_nan': 'raise',
'oov': 'UNK',
'overwrite': 1,
'patience': 1,
'prefix': 'phrase_',
'rank_n_activ': 'lambda x: x',
'rank_n_approx': 100,
'rec_gater': 'lambda x: TT.nnet.sigmoid(x)',
'rec_gating': True,
'rec_layer': 'RecurrentLayer',
'rec_reseter': 'lambda x: TT.nnet.sigmoid(x)',
'rec_reseting': True,
'rec_weight_init_fn': 'sample_weights_orth',
'rec_weight_scale': 1.0,
'reload': True,
'reset': -1,
'saveFreq': 10,
'search': False,
'seed': 1234,
'seqlen': 30,
'shuffle': False,
'sort_k_batches': 10,
'source': [None],
'take_top': True,
'target': [None],
'timeStop': 44640,
'trainFreq': 1,
'trim_batches': True,
'unary_activ': 'Maxout(2)',
'unk_sym_source': 1,
'unk_sym_target': 1,
'use_context_gate': False,
'use_current_context_for_context_gate': True,
'use_decoding_state_for_context_gate': True,
'use_infinite_loop': True,
'use_nce': False,
'use_previous_target_word_for_context_gate': True,
'validFreq': 500,
'weight_init_fn': 'sample_weights_classic',
'weight_noise': False,
'weight_noise_amount': 0.01,
'weight_noise_rec': False,
'weight_scale': 0.01,
'word_indx': None,
'word_indx_trgt': None}
2018-03-07 14:59:15,233: experiments.nmt.encdec: DEBUG: Create input variables
2018-03-07 14:59:15,233: experiments.nmt.encdec: DEBUG: Create encoder
2018-03-07 14:59:15,234: experiments.nmt.encdec: DEBUG: _create_embedding_layers
Traceback (most recent call last):
File "train.py", line 100, in
main()
File "train.py", line 81, in main
enc_dec.build()
File "/home/yanmengqi/下载/beifen/NMT-master/experiments/nmt/encdec.py", line 1882, in build
self.encoder.create_layers()
File "/home/yanmengqi/下载/beifen/NMT-master/experiments/nmt/encdec.py", line 1124, in create_layers
self._create_embedding_layers()
File "/home/yanmengqi/下载/beifen/NMT-master/experiments/nmt/encdec.py", line 1016, in _create_embedding_layers
**self.default_kwargs)
File "/home/yanmengqi/下载/beifen/NMT-master/groundhog/layers/ff_layers.py", line 174, in init
self._init_params()
File "/home/yanmengqi/下载/beifen/NMT-master/groundhog/layers/ff_layers.py", line 205, in _init_params
self.rng)
File "/home/yanmengqi/下载/beifen/NMT-master/groundhog/utils/utils.py", line 108, in sample_weights_classic
sizeX = int(sizeX)
TypeError: int() argument must be a string or a number, not 'NoneType'
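A hedged diagnosis, since the thread does not include a resolution: in the state dump above, 'n_sym_source', 'n_sym_target', 'null_sym_source', 'source', 'target', 'indx_word', and 'word_indx' are all None, so the embedding matrix ends up being created with a None size, which sample_weights_classic cannot cast to int. The state passed to train.py needs the dataset paths and vocabulary sizes filled in; a sketch of the required overrides (all paths and sizes below are hypothetical):

```python
# Hypothetical state overrides; adapt the paths and sizes to your data.
state.update({
    'source': ['./data/train_src.h5'],
    'target': ['./data/train_trg.h5'],
    'indx_word': './data/ivocab_src.pkl',
    'indx_word_target': './data/ivocab_trg.pkl',
    'word_indx': './data/vocab_src.pkl',
    'word_indx_trgt': './data/vocab_trg.pkl',
    'n_sym_source': 30001, 'null_sym_source': 30000,
    'n_sym_target': 30001, 'null_sym_target': 30000,
})
```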

Transliteration in preprocessing

Did you use transliteration in preprocessing? You did not mention it in the paper, so I wonder whether you used it or not.

B.R.
Kadir

ERROR:stream:file [./data/train_src.shuffle] do not exist

I have already run prepare.py and copied the folder named data from prepare_data/ to NMT/. Here is my configurations.py:

```python
#configurations for rnnsearch with coverage
def get_config_search_coverage():

config = {}

# added by Zhaopeng Tu, 2016-11-08
# coverage penalty in GNMT
# alpha in the paper
config['length_penalty_factor'] = 0.0
# beta in the paper
config['coverage_penalty_factor'] = 0.0
# pruning
config['with_decoding_pruning'] = False
# beamsize in the paper
config['decoding_pruning_beam'] = 3

# added by Zhaopeng Tu, 2016-06-09
config['with_attention'] = True
# added by Zhaopeng Tu, 2016-07-29
config['output_kbest'] = False

# added by Zhaopeng Tu, 2016-05-02
# ACL 2016: Modeling Coverage for Neural Machine Translation
# configurations for coverage model
config['with_coverage'] = False
# the coverage_dim for linguistic coverage is always 1
config['coverage_dim'] = 100
# coverage type: 'linguistic' or 'neural'
config['coverage_type'] = 'neural' 
# max value of fertility, the value of N in the paper
config['max_fertility'] = 2

# added by Zhaopeng Tu, 2016-05-30
# TACL 2017: Context Gates for Neural Machine Translation
# configurations for context gate
config['with_context_gate'] = False

# added by Zhaopeng Tu, 2016-07-11
# AAAI 2017: Neural Machine Translation with Reconstruction
# the reconstruction work
config['with_reconstruction'] = False
config['reconstruction_weight'] = 1.

# added by Zhaopeng Tu, 2017-04-18
config['with_cache'] = False
config['train_cache_parameters_only'] = False
# added by Zhaopeng Tu, 2017-05-11
# use cache only when the number of its values is greater than ``cache_fine''
config['cache_fine'] = 1
config['cache_size'] = 1000
config['cache_nhids'] = 1000
# query: s_i, c_i, y_i-1, readout
config['cache_query'] = ['c_i']
config['cache_query_dim'] = 2000

# Sequences longer than this will be deleted
config['seq_len_src'] = 80
config['seq_len_trg'] = 80

# Number of hidden units in GRU/LSTM
config['nhids_src'] = 1000
config['nhids_trg'] = 1000

# Dimension of the word embedding matrix
config['nembed_src'] = 620
config['nembed_trg'] = 620

# Batch size of train data
config['batch_size'] = 80

# This many batches will be read ahead and sorted
config['sort_k_batches'] = 20

# BeamSize
config['beam_size'] = 10

# Where to save model
config['saveto'] = './model.npz'
config['saveto_best'] = './model_best.npz'

# Dropout ratio, applied only after readout maxout
config['dropout'] = 0.5

# Maxout, set maxout_part=1 to turn off
config['maxout_part'] = 1

# vocabulary size, including '</S>'
config['src_vocab_size'] = 30001
config['trg_vocab_size'] = 30001

# Special tokens and indexes
config['unk_id'] = 1
config['unk_token'] = '<UNK>'
config['bos_token'] = '<S>'
config['eos_token'] = '</S>'

# Root directory for dataset
datadir = './data/'

# added by Zhaopeng Tu, 2016-07-21
config['replace_unk'] = False
config['unk_dict'] = datadir + 'unk_dict'

# Vocabularies
config['vocab_src'] = datadir + 'vocab_src.pkl'
config['vocab_trg'] = datadir + 'vocab_trg.pkl'

# Datasets
config['train_src'] = datadir + 'train_src'
config['train_trg'] = datadir + 'train_trg'
config['valid_src'] = datadir + 'valid_src'
config['valid_trg'] = datadir + 'valid_trg'
config['valid_out'] = datadir + 'valid_out'

# Bleu script that will be used
config['bleu_script'] = datadir + 'mteval-v11b.pl'
config['res_to_sgm'] = datadir + 'plain2sgm.py'

# Maximum number of epochs
config['finish_after'] = 20

# Reload model from files if they exist
config['reload'] = True

# Save model after this many updates
config['save_freq'] = 5000

# Sampling frequency
config['sample_freq'] = 50
# Hook samples
config['hook_samples'] = 3

# Validation frequency
config['valid_freq'] = 10000
config['valid_freq_fine'] = 5000

# Start bleu validation after this many updates
config['val_burn_in'] = 100000
config['val_burn_in_fine'] = 150000

# GRU, LSTM
config['method'] = 'GRU'

# Gradient clipping
config['clip_c'] = 1.

return config

```

I ran train.sh and got the following errors:
ERROR:stream:file [./data/train_src.shuffle] do not exist
ERROR:stream:file [./data/train_trg.shuffle] do not exist
INFO:main:final result: epoch -1 updates -1 valid_bleu_best -1.0000

How do I generate these two files?
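If the training pipeline does not create them itself, here is a minimal sketch for generating the two files, under the assumption that they are simply line-shuffled copies of train_src/train_trg, shuffled with a shared permutation so the two sides stay sentence-aligned:

```python
# Shuffle a parallel corpus in unison, writing <path>.shuffle files.
import random

def shuffle_parallel(src_path, trg_path, seed=1234):
    with open(src_path) as f:
        src = f.readlines()
    with open(trg_path) as f:
        trg = f.readlines()
    assert len(src) == len(trg), "corpora must be sentence-aligned"
    order = list(range(len(src)))
    random.Random(seed).shuffle(order)
    with open(src_path + '.shuffle', 'w') as f:
        f.writelines(src[i] for i in order)
    with open(trg_path + '.shuffle', 'w') as f:
        f.writelines(trg[i] for i in order)

shuffle_parallel('./data/train_src', './data/train_trg')
```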
