
transformer's Introduction

Transformer

This is a PyTorch implementation of the Transformer model. If you'd like to better understand the model or any of the code, please refer to my tutorial.

Using the Europarl dataset plus the dataset in the data folder, I was able to achieve a BLEU score of 0.39 on the test set (current SOTA is around 0.42) after 4-5 days of training on a single 8GB GPU. For more results, see the tutorial again.

Train the model immediately on FloydHub

Run on FloydHub

Launch a FloydHub Workspace to start training this model with one click. A Workspace is a GPU-enabled cloud IDE for machine learning. It provides a fully configured environment so you can start hacking right away, without worrying about dependencies, datasets, etc.

Once you've started the workspace, run the 'start_here' notebook or type 'floyd run' into the workspace terminal. This will begin to train the model on the sample dataset.

Usage

Two text files containing parallel sentences (separated by '\n' characters) in two languages are required to train the model. See an example of this in the data/ folder (french.txt and english.txt).
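
As a quick sanity check (this snippet is not part of the repo; the file names are just the example ones above), you can verify that the two files are line-aligned before training:

with open('data/english.txt', encoding='utf-8') as f_src, open('data/french.txt', encoding='utf-8') as f_trg:
    src_lines = f_src.read().split('\n')
    trg_lines = f_trg.read().split('\n')
assert len(src_lines) == len(trg_lines), "source and target must have the same number of sentences"
print(src_lines[0], '->', trg_lines[0])  # first parallel pair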

To begin training, run this code:

python train.py -src_data path/lang1.txt -trg_data path/lang2.txt -src_lang lang1 -trg_lang lang2

The spaCy tokenizer is used to tokenize the text, hence only languages supported by spaCy are supported by this program. The supported languages and their codes are:

English : 'en'
French : 'fr'
Portuguese : 'pt'
Italian : 'it'
Dutch : 'nl'
Spanish : 'es'
German : 'de'

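Note (not part of the original README): spaCy needs the corresponding language model installed before the tokenizer can load it. With spaCy 2.x the shortcut names above can be downloaded directly; with spaCy 3.x you would instead install a full pipeline name such as en_core_web_sm. For example:

python -m spacy download en
python -m spacy download fr
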
For example, to train an English->French translator on the datasets provided in the data folder, you would run the following:

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr

Additional parameters:
-epochs : how many epochs to train for (default=2)
-batch_size : measured as the number of tokens fed to the model in each iteration (default=1500)
-n_layers : how many layers to use in the Transformer model (default=6)
-heads : how many heads to split into for multi-headed attention (default=8)
-no_cuda : adding this will disable CUDA and run the model on the CPU
-SGDR : adding this will implement stochastic gradient descent with restarts, using cosine annealing
-d_model : dimension of the embedding vector and layers (default=512)
-dropout : how much dropout to apply (default=0.1)
-printevery : how many iterations to run before printing (default=100)
-lr : learning rate (default=0.0001)
-load_weights : if loading pretrained weights, the path to the folder where previous weights and pickles were saved
-max_strlen : sentences with more words than this will not be included in the dataset (default=80)
-checkpoint : enter a number of minutes; the model's weights will then be saved at that interval to the folder 'weights/'
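
For illustration, a training run combining several of these flags (the values here are arbitrary examples, not recommendations) might look like:

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr -epochs 10 -batch_size 3000 -SGDR -checkpoint 30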

Training and Translating

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr -epochs 10

This code gave the following results on a K100 GPU with 8GB RAM:

(screenshot: training progress output, 2018-09-18)

After saving the results to folder 'weights', the model can then be tested:

python translate.py -load_weights weights
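
If the language flags are not picked up from the saved weights, they can also be passed explicitly (as in the FloydHub example reported further down this page):

python translate.py -load_weights weights -src_lang en -trg_lang fr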

(screenshot: example translations, 2018-09-18)

So with a small dataset of 150,000 sentences and 1 hour of training, we already get some quite good results...

Features still to add

  • create validation set and get validation scores each epoch
  • function to show translations of sentences from training and validation sets

transformer's People

Contributors

oscarberonius · rcasero · samlynnevans


transformer's Issues

AttributeError: 'int' object has no attribute 'dim'

I was following your post on Medium; thanks for the great walkthrough. While training the model I ran into an error; the following is the traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-28-2c3e6665094d> in <module>
----> 1 train_model(1)

<ipython-input-22-ed997c68c1f0> in train_model(epochs, print_every)
     22             # create function to make masks using mask code above
     23             src_mask, trg_mask = create_masks(src, trg_input)
---> 24             preds = model(src, trg_input, src_mask, trg_mask)
     25 
     26             optim.zero_grad()

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-19-c19adc4fb35e> in forward(self, src, trg, src_mask, trg_mask)
      7 
      8     def forward(self, src, trg, src_mask, trg_mask):
----> 9         e_outputs = self.encoder(src, src_mask)
     10         d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)
     11         output = self.out(d_output)

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-18-75ae85de9dd0> in forward(self, src, mask)
     12         x = self.pe(x)
     13         for i in range(N):
---> 14             x = self.layers[i](x, mask)
     15         return self.norm(x)
     16 

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-17-1d977cf7013c> in forward(self, x, mask)
     13     def forward(self, x, mask):
     14         x2 = self.norm_1(x)
---> 15         x = x + self.dropout_1(self.attn(x2, x2, 2, mask))
     16         x2 = self.norm_2(x)
     17         x = x + self.dropout_2(self.ff(x2))

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-14-ead1732683eb> in forward(self, q, k, v, mask)
     34         k = self.k_linear(k).view(bs, -1, self.h, self.d_k)
     35         q = self.q_linear(q).view(bs, -1, self.h, self.d_k)
---> 36         v = self.v_linear(v).view(bs, -1, self.h, self.d_k)
     37 
     38         # transpose to get dimensions bs * h * sl * d_model

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     85 
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88 
     89     def extra_repr(self):

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1606         if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
   1607             return handle_torch_function(linear, tens_ops, input, weight, bias=bias)
-> 1608     if input.dim() == 2 and bias is not None:
   1609         # fused op is marginally faster
   1610         ret = torch.addmm(bias, input, weight.t())

AttributeError: 'int' object has no attribute 'dim'

I am unable to solve this as I am new to PyTorch. Please help me solve the issue.
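
My reading of the traceback (not a confirmed fix from the author): in the EncoderLayer forward shown above, the third argument passed to self.attn is the literal integer 2 rather than the tensor x2, so v reaches the linear layer as an int and F.linear fails with 'int' object has no attribute 'dim'. The corrected call would be:

def forward(self, x, mask):
    x2 = self.norm_1(x)
    x = x + self.dropout_1(self.attn(x2, x2, x2, mask))  # v must be the tensor x2, not the int 2
    x2 = self.norm_2(x)
    x = x + self.dropout_2(self.ff(x2))
    return x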

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Thanks for releasing the code. I have a small question about how to set the GPU to train the model; when I train the model, this error shows up. Thanks.

"""
The device argument should be set by using torch.device or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
training model...
Traceback (most recent call last): ] 0% loss = ...
  File "train.py", line 183, in <module>
    main()
  File "train.py", line 111, in main
    train_model(model, opt)
  File "train.py", line 34, in train_model
    src_mask, trg_mask = create_masks(src, trg_input, opt)
  File "/home/lin/program/Transformer-master/Batch.py", line 26, in create_masks
    trg_mask = trg_mask & np_mask
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
"""

runtime error

Traceback (most recent call last): ] 0% loss = ...
  File "train.py", line 183, in <module>
    main()
  File "train.py", line 111, in main
    train_model(model, opt)
  File "train.py", line 34, in train_model
    src_mask, trg_mask = create_masks(src, trg_input, opt)
  File "/dockerdata/bert_seq2seq/Transformer-master/Batch.py", line 25, in create_masks
    trg_mask = trg_mask & np_mask
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_and

How do I solve this problem?

ModuleNotFoundError: No module named '_regex'

Hi,

I have been using this transformer implementation for a long time and everything has worked well.
I have some trained models (saved as checkpoints), but now, after some time, when I try to load the checkpoints for generation, I get this error, which originates from the dill package:

loading spacy tokenizers...
Traceback (most recent call last):
  File "translate_file.py", line 148, in <module>
    main()
  File "translate_file.py", line 118, in main
    fields = create_fields(opt)
  File "/home/composition_func/Process.py", line 75, in create_fields
    SRC = pickle.load(open(f'{opt.load_weights}/SRC.pkl', 'rb'))
  File "/home/anaconda3/envs/py36/lib/python3.7/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/home/anaconda3/envs/py36/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/home/anaconda3/envs/py36/lib/python3.7/site-packages/dill/_dill.py", line 826, in _import_module
    return __import__(import_name)
ModuleNotFoundError: No module named '_regex'

I have regex installed and my dill version is 0.3.1.1.
I guess my previous dill version may have been something else, but I cannot solve this error and I cannot load my trained model at all.
Do you have any idea?
Thanks.

error: file not found

How do I run this on my local PC? It always shows me that error: file not found,
even though the files are in that folder. I tried all the slash variants (/, \, //, \\) and nothing worked to find that folder or .txt file.

CUDA assert error or out-of-memory error

Hi,
I am having the issue described in the title.
I am just testing the code; the training loop works well.
However, during evaluation, the following line causes the error:
x = x * math.sqrt(self.d_model)
RuntimeError: CUDA error: device-side assert triggered.

Any prompt help in this regard will be very helpful.
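
A general debugging tip, not specific to this repo: device-side asserts are reported asynchronously, so the line shown is often not the real culprit. Re-running with CUDA launch blocking enabled, or on the CPU via the -no_cuda flag, usually surfaces the actual failing operation (commonly an out-of-range index into an embedding):

CUDA_LAUNCH_BLOCKING=1 python translate.py -load_weights weights
python translate.py -load_weights weights -no_cuda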


IndexError when testing with python translate.py

Hello,
Many thanks for sharing the project. Unfortunately, I'm getting IndexError: index 0 is out of bounds for dimension 0 with size 0 when running python translate.py -load_weights weights -src_lang en -trg_lang fr -floyd -no_cuda on FloydHub and entering text for translation. Does anyone know where the problem could be?


RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'other'

Hello,

Thank you for sharing your code.

I tried running the code (train.py) on a dataset (both source and target are English) and I am getting the following error in Batch.py.

loading spacy tokenizers...
creating dataset and iterator...
The `device` argument should be set by using `torch.device` or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
training model...
cudam: epoch 1 [                    ]  0%  loss = ...
Traceback (most recent call last):
  File "train.py", line 183, in <module>
    main()
  File "train.py", line 111, in main
    train_model(model, opt)
  File "train.py", line 34, in train_model
    src_mask, trg_mask = create_masks(src, trg_input, opt)
  File "/home/hannahbrahman/ROCstories/Transformer/Batch.py", line 26, in create_masks
    trg_mask = trg_mask & np_mask
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'other'

I tried adding device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') and np_mask.to(device), but I am still getting the same error.
I cloned your repo onto my own machine and tried running train.py.
My PyTorch version is:
1.0.0.dev20181105

create_valset argument in train.py

While trying out the repo, I came across the argument create_valset, which is parsed from the terminal, but I am not sure if it is being used anywhere else.

Can someone please point me in the right direction as to whether the train/validation split has been implemented or not?

(screenshot, 2019-08-05)

Thanks in advance!

what's the shape of the padding mask?

Thanks for sharing such a great repo. One question: what is the shape of the padding mask, and how do I construct a padding mask for a custom dataset? For example, my padding mask is currently
[ [1,1,1,1,0,0,0],
[1,1,0,0,0,0,0],
[1,1,1,1,1,1,0],
[1,1,1,1,1,1,1]
]
but when I pass it to encoder.forward, it raises a runtime error. Can anyone help me?
@SamLynnEvans
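
For what it's worth, in the accompanying tutorial the source padding mask is built from the token ids with a broadcastable shape of (batch_size, 1, seq_len), rather than (batch_size, seq_len) as in the matrix above; roughly (a sketch, with pad_token an assumed name for the '<pad>' index):

src_mask = (src != pad_token).unsqueeze(-2)  # shape (batch_size, 1, src_len); broadcasts over the attention scores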

Code bug in Beam.py with line 76-78

i = vec[0]
if sentence_lengths[i] == 0:  # First end symbol has not been found yet
    sentence_lengths[i] = vec[1]  # Position of first end symbol

The index i above should be replaced by a different index, j.

Computational Error on PositionalEncoder()

I read your blog post in TowardsDataScience on this model, and I think there may be a computational error in line 27 of Transformer/Embed.py. In the paper and in other implementations, like this one, we should have PE_(pos, 2i+1) = math.cos(pos / (10000 ** ((2 * i)/d_model))), not math.cos(pos / (10000 ** ((2 * (i + 1))/d_model))), as the code currently stands.
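
For reference, the corrected pair as proposed in this issue would read (following the tutorial's variable names, where i steps over the even embedding dimensions):

for pos in range(max_seq_len):
    for i in range(0, d_model, 2):
        pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
        pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * i) / d_model)))  # same denominator as the sin term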

AttributeError: module 'torchtext.data' has no attribute 'Iterator'

Traceback (most recent call last): File "train.py", line 5, in <module> from Process import * File "/Users/pycharm_pro/PyTorch_Learning/Transformer/Process.py", line 5, in <module> from Batch import MyIterator, batch_size_fn File "/Users/pycharm_pro/PyTorch_Learning/Transformer/Batch.py", line 35, in <module> class MyIterator(data.Iterator): AttributeError: module 'torchtext.data' has no attribute 'Iterator'

  • torch 1.8.0
  • torch-cluster 1.5.4
  • torch-geometric 1.5.0
  • torch-scatter 2.0.4
  • torch-sparse 0.6.1
  • torch-spline-conv 1.2.0
  • torchtext 0.9.0

https://pytorch.org/text/stable/data_functional.html


I guess the torchtext.data API has changed, but I can't find the right API.
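
For anyone hitting this (not an official fix): torchtext 0.9 moved the old Field/Iterator classes into torchtext.legacy, and torchtext 0.12 removed them entirely. On torchtext 0.9-0.11 one workaround is to change the import in Batch.py and Process.py; on 0.12+ the only option is to pin an older torchtext (see the next issue):

# in Batch.py / Process.py, for torchtext 0.9-0.11
from torchtext.legacy import data  # instead of: from torchtext import data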

Run Time Error and Transfer Learning?

I got the following error while running:
python train.py -src_data data/europarl-v7_de.txt -trg_data data/europarl-v7_en.txt -src_lang de -trg_lang en -SGDR -epochs 10 -checkpoint 10 -batchsize 128 -load_weights weights
loading spacy tokenizers...
loading presaved fields...
creating dataset and iterator...
The device argument should be set by using torch.device or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
Traceback (most recent call last):
  File "train.py", line 185, in <module>
    main()
  File "train.py", line 97, in main
    opt.train = create_dataset(opt, SRC, TRG)
  File "Documents\transformers\Process.py", line 89, in create_dataset
    opt.train_len = get_len(train_iter)
  File "Documents\transformers\Process.py", line 95, in get_len
    for i, b in enumerate(train):
  File "envs\alexandria\lib\site-packages\torchtext\data\iterator.py", line 157, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "Anaconda3\envs\alexandria\lib\site-packages\torchtext\data\batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "Anaconda3\envs\alexandria\lib\site-packages\torchtext\data\field.py", line 201, in process
    tensor = self.numericalize(padded, device=device)
  File "Anaconda3\envs\alexandria\lib\site-packages\torchtext\data\field.py", line 323, in numericalize
    var = torch.tensor(arr, dtype=self.dtype, device=device)
RuntimeError: sizes must be non-negative
I am not sure why this is occurring, but I had changed my source and target parallel corpus to a larger Europarl dataset. Is such transfer learning supported? If not, how would I go about doing that?
EDIT 1: I have subsequently trained a model from scratch with a batch size of 128 (I am running on a GTX 960M) and encounter the same problem.

A little error in Positional Encoder

pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))

In paper, the cos value should be

pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * i) / d_model)))

The version

Can the author share the versions of PyTorch and torchtext? The code cannot run in my environment.

pytorch 1.13
cuda 11.7
torchtext: 0.14.0

The error message is "ModuleNotFoundError: No module named 'torchtext.legacy' "

When we replace torchtext.legacy with torchtext,
the error message is "AttributeError: module 'torchtext.data' has no attribute 'Iterator'"
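
For reference, and as an assumption from torchtext's release history rather than versions tested by the author: the repo uses the pre-0.9 torchtext.data API, so it needs torchtext below 0.9 (for example 0.8.1 together with torch 1.7.1), or the torchtext.legacy workaround from the previous issue on 0.9-0.11; torchtext 0.12 and later removed the legacy module entirely.

pip install torch==1.7.1 torchtext==0.8.1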

The train command doesn't seem to work for me

OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I get this error if I try to run the following after cloning the repo

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr

Is there something I am missing?

Misinterpreted multi head attention

Hi, I think you misinterpreted the multi head attention in Vaswani's Attention is all you need paper.

What you do is (assuming only one query) project the query, keys and values once, separate them into sections (heads), and apply attention to the heads separately.

However, IMO the paper says that you have nr_heads * 3 separate projections (so 3 sets of weights per head); you do the projections and apply the attention (again nr_heads times). Then you concatenate the results and project them back to the appropriate size.

Let me know what you think. In any case, your post on Towards Data Science has been very helpful for me in learning PyTorch.
Best regards,
Zoltán
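
A small sketch of my own (assumed sizes d_model=512, h=8) of why the two readings coincide: projecting once with a single d_model x d_model weight and then splitting into heads gives the same result as h separate d_model -> d_k projections whose weight matrices are row-blocks of that single weight, so the paper's concatenate-and-project-back matches the split-then-merge in this code:

import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, h = 512, 8
d_k = d_model // h
x = torch.randn(2, 10, d_model)  # (batch, seq_len, d_model)

proj = nn.Linear(d_model, d_model, bias=False)  # one big projection, as in this repo
split_heads = proj(x).view(2, 10, h, d_k)  # project once, then split into h heads

# h "separate" per-head projections whose weights are row-blocks of the same matrix
per_head = [F.linear(x, proj.weight[i * d_k:(i + 1) * d_k]) for i in range(h)]
print(torch.allclose(split_heads, torch.stack(per_head, dim=2), atol=1e-6))  # True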

Error

F:\Anaconda\envs\Transformer-master\python.exe E:/Transformer-master/train.py -src_data english.txt -trg_data french.txt -src_lang en -trg_lang fr -epochs 10
loading spacy tokenizers...
Traceback (most recent call last):
  File "E:/Transformer-master/train.py", line 184, in <module>
    main()
  File "E:/Transformer-master/train.py", line 96, in main
    SRC, TRG = create_fields(opt)
  File "E:\Transformer-master\Process.py", line 35, in create_fields
    t_src = tokenize(opt.src_lang)
  File "E:\Transformer-master\Tokenize.py", line 7, in __init__
    self.nlp = spacy.load(lang)
  File "F:\Anaconda\envs\Transformer-master\lib\site-packages\spacy\__init__.py", line 15, in load
    return util.load_model(name, **overrides)
  File "F:\Anaconda\envs\Transformer-master\lib\site-packages\spacy\util.py", line 119, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Adding a new layer to this model

Hi. Please, I would like to know how to add a new layer to your Transformer model between the encoder and decoder layers, so that the outputs coming from the encoder are given to that new layer before going to the decoder. I am new to language translation and I am trying to play with models that I see. I am interested in yours and would like to add a new module to it just for fun. Please can you guide me, since the new module should have:

  • nn.Dropout
  • nn.Embedding
  • nn.LSTM
  • nn.Linear
  • nn.Dropout

Please, I want the dimensions of each layer, considering the output size of your encoder.

Cheers.
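
A possible starting point, purely a sketch of mine rather than anything from this repo (the name Bridge and the sizes are made up): a module that keeps the encoder output shape (batch, seq_len, d_model=512) so the decoder's attention over e_outputs still works. nn.Embedding is left out because the encoder output is already a dense tensor rather than token ids:

import torch.nn as nn

class Bridge(nn.Module):
    # Hypothetical layer inserted between encoder and decoder.
    # Input and output are (batch, seq_len, d_model), so e_outputs can be passed straight through it.
    def __init__(self, d_model=512, hidden=512, dropout=0.1):
        super().__init__()
        self.drop_in = nn.Dropout(dropout)
        self.lstm = nn.LSTM(d_model, hidden, batch_first=True)
        self.linear = nn.Linear(hidden, d_model)  # project back to d_model for the decoder
        self.drop_out = nn.Dropout(dropout)

    def forward(self, e_outputs):
        x, _ = self.lstm(self.drop_in(e_outputs))
        return self.drop_out(self.linear(x))

# In the Transformer's forward, roughly:
#   e_outputs = self.encoder(src, src_mask)
#   e_outputs = self.bridge(e_outputs)
#   d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)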

runtime error

The whole error is as follows; x = x + pe got a different size?

creating dataset and iterator...
model weights will be saved every 20 minutes and at end of epoch to directory weights/
training model...
Traceback (most recent call last):
  File "train.py", line 192, in <module>
    main()
  File "train.py", line 120, in main
    train_model(model, opt)
  File "train.py", line 37, in train_model
    preds = model(src, trg_input, src_mask, trg_mask)
  File "/home/tensorflow/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tensorflow/reaction/Transformer/Models.py", line 50, in forward
    d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)
  File "/home/tensorflow/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tensorflow/reaction/Transformer/Models.py", line 36, in forward
    x = self.pe(x)
  File "/home/tensorflow/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tensorflow/reaction/Transformer/Embed.py", line 40, in forward
    x = x + pe
RuntimeError: The size of tensor a (230) must match the size of tensor b (200) at non-singleton dimension 1
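
My reading of the error, not a confirmed answer: the positional-encoding buffer pe is built for a fixed maximum length (200 here), and a batch containing a 230-token sequence exceeds it. Either filter long sentences with the -max_strlen option from the README, or enlarge the buffer when the PositionalEncoder is constructed (assuming it exposes a max_seq_len argument as in the tutorial):

self.pe = PositionalEncoder(d_model, max_seq_len=512)  # default was 200 in this report; 512 is an arbitrary larger value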

Process.py missing?

Hi, I was wondering whether Process.py is missing from your code. It seems to be imported, and a couple of functions rely on it (read_data perhaps?), but I can't seem to find it.

I hope this is what is going on and I haven't missed something blatantly obvious!

Thanks for sharing the code!

Regards,
Theodore.

file not found

(screenshot, 2023-12-27) I have tried many methods, but none of them work. Can you help me?
