
transformer's Introduction

Transformer

This is a PyTorch implementation of the Transformer model. If you'd like to better understand the model or any of the code, please refer to my tutorial.

Using the Europarl dataset plus the dataset in the data folder, I was able to achieve a BLEU score of 0.39 on the test set (current SOTA is around 0.42) after 4-5 days of training on a single 8GB GPU. For more results, see the tutorial again.

Train the model immediately on FloydHub

Run on FloydHub

Launch a FloydHub Workspace to start training this model with one click. A Workspace is a GPU-enabled cloud IDE for machine learning. It provides a fully configured environment so you can start hacking right away, without worrying about dependencies, datasets, etc.

Once you've started the workspace, run the 'start_here' notebook or type 'floyd run' into the workspace terminal. This will begin to train the model on the sample dataset.

Usage

Two text files containing parallel sentences (separated by '\n' characters) in two languages are required to train the model. See an example of this in the data/ folder (french.txt and english.txt).
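
As a quick sanity check (this snippet is not part of the repo; the file names are just the example ones above), you can verify that the two files are line-aligned before training:

with open('data/english.txt', encoding='utf-8') as f_src, open('data/french.txt', encoding='utf-8') as f_trg:
    src_lines = f_src.read().split('\n')
    trg_lines = f_trg.read().split('\n')
assert len(src_lines) == len(trg_lines), "source and target must have the same number of sentences"
print(src_lines[0], '->', trg_lines[0])  # first parallel pair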

To begin training, run this code:

python train.py -src_data path/lang1.txt -trg_data path/lang2.txt -src_lang lang1 -trg_lang lang2

The spaCy tokenizer is used to tokenize the text, hence only languages supported by spaCy are supported by this program. The supported languages and their codes are:

English : 'en'
French : 'fr'
Portuguese : 'pt'
Italian : 'it'
Dutch : 'nl'
Spanish : 'es'
German : 'de'

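Note (not part of the original README): spaCy needs the corresponding language model installed before the tokenizer can load it. With spaCy 2.x the shortcut names above can be downloaded directly; with spaCy 3.x you would instead install a full pipeline name such as en_core_web_sm. For example:

python -m spacy download en
python -m spacy download fr
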
For example, to train an English->French translator on the datasets provided in the data folder, you would run the following:

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr

Additional parameters:
-epochs : how many epochs to train for (default=2)
-batch_size : measured as the number of tokens fed to the model in each iteration (default=1500)
-n_layers : how many layers to use in the Transformer model (default=6)
-heads : how many heads to split into for multi-headed attention (default=8)
-no_cuda : adding this will disable CUDA and run the model on the CPU
-SGDR : adding this will implement stochastic gradient descent with restarts, using cosine annealing
-d_model : dimension of the embedding vector and layers (default=512)
-dropout : how much dropout to apply (default=0.1)
-printevery : how many iterations to run before printing (default=100)
-lr : learning rate (default=0.0001)
-load_weights : if loading pretrained weights, the path to the folder where previous weights and pickles were saved
-max_strlen : sentences with more words than this will not be included in the dataset (default=80)
-checkpoint : enter a number of minutes; the model's weights will then be saved at that interval to the folder 'weights/'
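
For illustration, a training run combining several of these flags (the values here are arbitrary examples, not recommendations) might look like:

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr -epochs 10 -batch_size 3000 -SGDR -checkpoint 30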

Training and Translating

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr -epochs 10

This code gave the following results on a K100 GPU with 8GB RAM:

(screenshot: training progress output, 2018-09-18)

After saving the results to folder 'weights', the model can then be tested:

python translate.py -load_weights weights
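
If the language flags are not picked up from the saved weights, they can also be passed explicitly (as in the FloydHub example reported further down this page):

python translate.py -load_weights weights -src_lang en -trg_lang fr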

(screenshot: example translations, 2018-09-18)

So with a small dataset of 150,000 sentences and 1 hour of training, we already get some quite good results...

Features still to add

  • create validation set and get validation scores each epoch
  • function to show translations of sentences from training and validation sets

transformer's People

Contributors

oscarberonius · rcasero · samlynnevans


transformer's Issues

AttributeError: 'int' object has no attribute 'dim'

I was following your post on Medium; thanks for the great walkthrough. While training the model I ran into an error; the following is the traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-28-2c3e6665094d> in <module>
----> 1 train_model(1)

<ipython-input-22-ed997c68c1f0> in train_model(epochs, print_every)
     22             # create function to make masks using mask code above
     23             src_mask, trg_mask = create_masks(src, trg_input)
---> 24             preds = model(src, trg_input, src_mask, trg_mask)
     25 
     26             optim.zero_grad()

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-19-c19adc4fb35e> in forward(self, src, trg, src_mask, trg_mask)
      7 
      8     def forward(self, src, trg, src_mask, trg_mask):
----> 9         e_outputs = self.encoder(src, src_mask)
     10         d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)
     11         output = self.out(d_output)

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-18-75ae85de9dd0> in forward(self, src, mask)
     12         x = self.pe(x)
     13         for i in range(N):
---> 14             x = self.layers[i](x, mask)
     15         return self.norm(x)
     16 

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-17-1d977cf7013c> in forward(self, x, mask)
     13     def forward(self, x, mask):
     14         x2 = self.norm_1(x)
---> 15         x = x + self.dropout_1(self.attn(x2, x2, 2, mask))
     16         x2 = self.norm_2(x)
     17         x = x + self.dropout_2(self.ff(x2))

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

<ipython-input-14-ead1732683eb> in forward(self, q, k, v, mask)
     34         k = self.k_linear(k).view(bs, -1, self.h, self.d_k)
     35         q = self.q_linear(q).view(bs, -1, self.h, self.d_k)
---> 36         v = self.v_linear(v).view(bs, -1, self.h, self.d_k)
     37 
     38         # transpose to get dimensions bs * h * sl * d_model

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
    548             result = self._slow_forward(*input, **kwargs)
    549         else:
--> 550             result = self.forward(*input, **kwargs)
    551         for hook in self._forward_hooks.values():
    552             hook_result = hook(self, input, result)

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/modules/linear.py in forward(self, input)
     85 
     86     def forward(self, input):
---> 87         return F.linear(input, self.weight, self.bias)
     88 
     89     def extra_repr(self):

~/anaconda3/envs/tf-chatbot/lib/python3.6/site-packages/torch/nn/functional.py in linear(input, weight, bias)
   1606         if any([type(t) is not Tensor for t in tens_ops]) and has_torch_function(tens_ops):
   1607             return handle_torch_function(linear, tens_ops, input, weight, bias=bias)
-> 1608     if input.dim() == 2 and bias is not None:
   1609         # fused op is marginally faster
   1610         ret = torch.addmm(bias, input, weight.t())

AttributeError: 'int' object has no attribute 'dim'

I am unable to solve this as I am new to PyTorch. Please help me solve the issue.
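
My reading of the traceback (not a confirmed fix from the author): in the EncoderLayer forward shown above, the third argument passed to self.attn is the literal integer 2 rather than the tensor x2, so v reaches the linear layer as an int and F.linear fails with 'int' object has no attribute 'dim'. The corrected call would be:

def forward(self, x, mask):
    x2 = self.norm_1(x)
    x = x + self.dropout_1(self.attn(x2, x2, x2, mask))  # v must be the tensor x2, not the int 2
    x2 = self.norm_2(x)
    x = x + self.dropout_2(self.ff(x2))
    return x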

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Thanks for releasing the code. I have a small question about how to set the GPU to train the model; when I train the model, this error shows up. Thanks.

"""
The device argument should be set by using torch.device or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
training model...
Traceback (most recent call last): ] 0% loss = ...
  File "train.py", line 183, in <module>
    main()
  File "train.py", line 111, in main
    train_model(model, opt)
  File "train.py", line 34, in train_model
    src_mask, trg_mask = create_masks(src, trg_input, opt)
  File "/home/lin/program/Transformer-master/Batch.py", line 26, in create_masks
    trg_mask = trg_mask & np_mask
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
"""

runtime error

Traceback (most recent call last): ] 0% loss = ...
  File "train.py", line 183, in <module>
    main()
  File "train.py", line 111, in main
    train_model(model, opt)
  File "train.py", line 34, in train_model
    src_mask, trg_mask = create_masks(src, trg_input, opt)
  File "/dockerdata/bert_seq2seq/Transformer-master/Batch.py", line 25, in create_masks
    trg_mask = trg_mask & np_mask
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_and

How do I solve this problem?

ModuleNotFoundError: No module named '_regex'

Hi,

I have been using this transformer implementation for a long time and everything has worked well.
I have some trained models (saved as checkpoints), but now, after some time, when I try to load the checkpoints for generation, I get this error, which originates from the dill package:

loading spacy tokenizers...
Traceback (most recent call last):
  File "translate_file.py", line 148, in <module>
    main()
  File "translate_file.py", line 118, in main
    fields = create_fields(opt)
  File "/home/composition_func/Process.py", line 75, in create_fields
    SRC = pickle.load(open(f'{opt.load_weights}/SRC.pkl', 'rb'))
  File "/home/anaconda3/envs/py36/lib/python3.7/site-packages/dill/_dill.py", line 270, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File "/home/anaconda3/envs/py36/lib/python3.7/site-packages/dill/_dill.py", line 472, in load
    obj = StockUnpickler.load(self)
  File "/home/anaconda3/envs/py36/lib/python3.7/site-packages/dill/_dill.py", line 826, in _import_module
    return __import__(import_name)
ModuleNotFoundError: No module named '_regex'

I have regex installed and my dill version is 0.3.1.1.
I guess my previous dill version may have been something else, but I cannot solve this error and I cannot load my trained model at all.
Do you have any idea?
Thanks.

error: file not found

How do I run this on my local PC? It always shows me that error: file not found,
even though the files are in that folder. I tried all the slash variants (/, \, //, \\) and nothing worked to find that folder or .txt file.

CUDA assert error or out-of-memory error

Hi,
I am having the issue described in the title.
I am just testing the code; the training loop works well.
However, during evaluation, the following line causes the error:
x = x * math.sqrt(self.d_model)
RuntimeError: CUDA error: device-side assert triggered.

Any prompt help in this regard will be very helpful.
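
A general debugging tip, not specific to this repo: device-side asserts are reported asynchronously, so the line shown is often not the real culprit. Re-running with CUDA launch blocking enabled, or on the CPU via the -no_cuda flag, usually surfaces the actual failing operation (commonly an out-of-range index into an embedding):

CUDA_LAUNCH_BLOCKING=1 python translate.py -load_weights weights
python translate.py -load_weights weights -no_cuda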


IndexError when testing with python translate.py

Hello,
Many thanks for sharing the project. Unfortunately, I'm getting IndexError: index 0 is out of bounds for dimension 0 with size 0 when running python translate.py -load_weights weights -src_lang en -trg_lang fr -floyd -no_cuda on FloydHub and entering text for translation. Does anyone know where the problem could be?


RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'other'

Hello,

Thank you for sharing your code.

I tried running the code (train.py) on a dataset (both source and target are English) and I am getting the following error in Batch.py.

loading spacy tokenizers...
creating dataset and iterator...
The `device` argument should be set by using `torch.device` or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
training model...
cudam: epoch 1 [                    ]  0%  loss = ...
Traceback (most recent call last):
  File "train.py", line 183, in <module>
    main()
  File "train.py", line 111, in main
    train_model(model, opt)
  File "train.py", line 34, in train_model
    src_mask, trg_mask = create_masks(src, trg_input, opt)
  File "/home/hannahbrahman/ROCstories/Transformer/Batch.py", line 26, in create_masks
    trg_mask = trg_mask & np_mask
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'other'

I tried adding device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') and np_mask.to(device), but I am still getting the same error.
I cloned your repo onto my own machine and tried running train.py.
My PyTorch version is:
1.0.0.dev20181105

create_valset argument in train.py

While trying out the repo, I came across the argument create_valset, which is parsed from the terminal, but I am not sure if it is being used anywhere else.

Can someone please point me in the right direction as to whether the train/validation split has been implemented or not?

(screenshot, 2019-08-05)

Thanks in advance!

what's the shape of the padding mask?

Thanks for sharing such a great repo. One question: what is the shape of the padding mask, and how do I construct a padding mask for a custom dataset? For example, my padding mask is currently
[ [1,1,1,1,0,0,0],
[1,1,0,0,0,0,0],
[1,1,1,1,1,1,0],
[1,1,1,1,1,1,1]
]
but when I pass it to encoder.forward, it raises a runtime error. Can anyone help me?
@SamLynnEvans
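
For what it's worth, in the accompanying tutorial the source padding mask is built from the token ids with a broadcastable shape of (batch_size, 1, seq_len), rather than (batch_size, seq_len) as in the matrix above; roughly (a sketch, with pad_token an assumed name for the '<pad>' index):

src_mask = (src != pad_token).unsqueeze(-2)  # shape (batch_size, 1, src_len); broadcasts over the attention scores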

Code bug in Beam.py with line 76-78

i = vec[0]
if sentence_lengths[i] == 0:  # First end symbol has not been found yet
    sentence_lengths[i] = vec[1]  # Position of first end symbol

The index i above should be replaced by a different index, j.

Computational Error on PositionalEncoder()

I read your blog post in TowardsDataScience on this model, and I think there may be a computational error in line 27 of Transformer/Embed.py. In the paper and in other implementations, like this one, we should have PE_(pos, 2i+1) = math.cos(pos / (10000 ** ((2 * i)/d_model))), not math.cos(pos / (10000 ** ((2 * (i + 1))/d_model))), as the code currently stands.
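
For reference, the corrected pair as proposed in this issue would read (following the tutorial's variable names, where i steps over the even embedding dimensions):

for pos in range(max_seq_len):
    for i in range(0, d_model, 2):
        pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
        pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * i) / d_model)))  # same denominator as the sin term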

AttributeError: module 'torchtext.data' has no attribute 'Iterator'

Traceback (most recent call last): File "train.py", line 5, in <module> from Process import * File "/Users/pycharm_pro/PyTorch_Learning/Transformer/Process.py", line 5, in <module> from Batch import MyIterator, batch_size_fn File "/Users/pycharm_pro/PyTorch_Learning/Transformer/Batch.py", line 35, in <module> class MyIterator(data.Iterator): AttributeError: module 'torchtext.data' has no attribute 'Iterator'

  • torch 1.8.0
  • torch-cluster 1.5.4
  • torch-geometric 1.5.0
  • torch-scatter 2.0.4
  • torch-sparse 0.6.1
  • torch-spline-conv 1.2.0
  • torchtext 0.9.0

https://pytorch.org/text/stable/data_functional.html


I guess the torchtext.data API has changed, but I can't find the right API.
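
For anyone hitting this (not an official fix): torchtext 0.9 moved the old Field/Iterator classes into torchtext.legacy, and torchtext 0.12 removed them entirely. On torchtext 0.9-0.11 one workaround is to change the import in Batch.py and Process.py; on 0.12+ the only option is to pin an older torchtext (see the next issue):

# in Batch.py / Process.py, for torchtext 0.9-0.11
from torchtext.legacy import data  # instead of: from torchtext import data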

Run Time Error and Transfer Learning?

I got the following error while running:
python train.py -src_data data/europarl-v7_de.txt -trg_data data/europarl-v7_en.txt -src_lang de -trg_lang en -SGDR -epochs 10 -checkpoint 10 -batchsize 128 -load_weights weights
loading spacy tokenizers...
loading presaved fields...
creating dataset and iterator...
The device argument should be set by using torch.device or passing a string as an argument. This behavior will be deprecated soon and currently defaults to cpu.
Traceback (most recent call last):
  File "train.py", line 185, in <module>
    main()
  File "train.py", line 97, in main
    opt.train = create_dataset(opt, SRC, TRG)
  File "Documents\transformers\Process.py", line 89, in create_dataset
    opt.train_len = get_len(train_iter)
  File "Documents\transformers\Process.py", line 95, in get_len
    for i, b in enumerate(train):
  File "envs\alexandria\lib\site-packages\torchtext\data\iterator.py", line 157, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "Anaconda3\envs\alexandria\lib\site-packages\torchtext\data\batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "Anaconda3\envs\alexandria\lib\site-packages\torchtext\data\field.py", line 201, in process
    tensor = self.numericalize(padded, device=device)
  File "Anaconda3\envs\alexandria\lib\site-packages\torchtext\data\field.py", line 323, in numericalize
    var = torch.tensor(arr, dtype=self.dtype, device=device)
RuntimeError: sizes must be non-negative
I am not sure why this is occurring, but I had changed my source and target parallel corpus to a larger Europarl dataset. Is such transfer learning supported? If not, how would I go about doing that?
EDIT 1: I have subsequently trained a model from scratch with a batch size of 128 (I am running on a GTX 960M) and encounter the same problem.

A little error in Positional Encoder

pe[pos, i] = math.sin(pos / (10000 ** ((2 * i) / d_model)))
pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * (i + 1)) / d_model)))

In paper, the cos value should be

pe[pos, i + 1] = math.cos(pos / (10000 ** ((2 * i) / d_model)))

The version

Can the author share the versions of PyTorch and torchtext? The code cannot run in my environment.

pytorch 1.13
cuda 11.7
torchtext: 0.14.0

The error message is "ModuleNotFoundError: No module named 'torchtext.legacy' "

When we replace torchtext.legacy with torchtext,
the error message is "AttributeError: module 'torchtext.data' has no attribute 'Iterator'"
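
For reference, and as an assumption from torchtext's release history rather than versions tested by the author: the repo uses the pre-0.9 torchtext.data API, so it needs torchtext below 0.9 (for example 0.8.1 together with torch 1.7.1), or the torchtext.legacy workaround from the previous issue on 0.9-0.11; torchtext 0.12 and later removed the legacy module entirely.

pip install torch==1.7.1 torchtext==0.8.1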

The train command doesn't seem to work for me

OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

I get this error if I try to run the following after cloning the repo

python train.py -src_data data/english.txt -trg_data data/french.txt -src_lang en -trg_lang fr

Is there something I am missing?

Misinterpreted multi head attention

Hi, I think you misinterpreted the multi head attention in Vaswani's Attention is all you need paper.

What you do is (assuming only one query) project the query, keys and values once, separate them into sections (heads), and apply attention to the heads separately.

However, IMO the paper says that you have nr_heads * 3 separate projections (so 3 sets of weights per head); you do the projections and apply the attention (again nr_heads times). Then you concatenate the results and project them back to the appropriate size.

Let me know what you think. In any case, your post on Towards Data Science has been very helpful for me in learning PyTorch.
Best regards,
Zoltán
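
A small sketch of my own (assumed sizes d_model=512, h=8) of why the two readings coincide: projecting once with a single d_model x d_model weight and then splitting into heads gives the same result as h separate d_model -> d_k projections whose weight matrices are row-blocks of that single weight, so the paper's concatenate-and-project-back matches the split-then-merge in this code:

import torch
import torch.nn as nn
import torch.nn.functional as F

d_model, h = 512, 8
d_k = d_model // h
x = torch.randn(2, 10, d_model)  # (batch, seq_len, d_model)

proj = nn.Linear(d_model, d_model, bias=False)  # one big projection, as in this repo
split_heads = proj(x).view(2, 10, h, d_k)  # project once, then split into h heads

# h "separate" per-head projections whose weights are row-blocks of the same matrix
per_head = [F.linear(x, proj.weight[i * d_k:(i + 1) * d_k]) for i in range(h)]
print(torch.allclose(split_heads, torch.stack(per_head, dim=2), atol=1e-6))  # True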

Error

F:\Anaconda\envs\Transformer-master\python.exe E:/Transformer-master/train.py -src_data english.txt -trg_data french.txt -src_lang en -trg_lang fr -epochs 10
loading spacy tokenizers...
Traceback (most recent call last):
  File "E:/Transformer-master/train.py", line 184, in <module>
    main()
  File "E:/Transformer-master/train.py", line 96, in main
    SRC, TRG = create_fields(opt)
  File "E:\Transformer-master\Process.py", line 35, in create_fields
    t_src = tokenize(opt.src_lang)
  File "E:\Transformer-master\Tokenize.py", line 7, in __init__
    self.nlp = spacy.load(lang)
  File "F:\Anaconda\envs\Transformer-master\lib\site-packages\spacy\__init__.py", line 15, in load
    return util.load_model(name, **overrides)
  File "F:\Anaconda\envs\Transformer-master\lib\site-packages\spacy\util.py", line 119, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

Adding a new layer to this model

Hi. Please, I would like to know how to add a new layer to your Transformer model between the encoder and decoder layers, so that the outputs coming from the encoder are given to that new layer before going to the decoder. I am new to language translation and I am trying to play with models that I see. I am interested in yours and would like to add a new module to it just for fun. Please can you guide me, since the new module should have:

  • nn.Dropout
  • nn.Embedding
  • nn.LSTM
  • nn.Linear
  • nn.Dropout

Please, I want the dimensions of each layer, considering the output size of your encoder.

Cheers.
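
A possible starting point, purely a sketch of mine rather than anything from this repo (the name Bridge and the sizes are made up): a module that keeps the encoder output shape (batch, seq_len, d_model=512) so the decoder's attention over e_outputs still works. nn.Embedding is left out because the encoder output is already a dense tensor rather than token ids:

import torch.nn as nn

class Bridge(nn.Module):
    # Hypothetical layer inserted between encoder and decoder.
    # Input and output are (batch, seq_len, d_model), so e_outputs can be passed straight through it.
    def __init__(self, d_model=512, hidden=512, dropout=0.1):
        super().__init__()
        self.drop_in = nn.Dropout(dropout)
        self.lstm = nn.LSTM(d_model, hidden, batch_first=True)
        self.linear = nn.Linear(hidden, d_model)  # project back to d_model for the decoder
        self.drop_out = nn.Dropout(dropout)

    def forward(self, e_outputs):
        x, _ = self.lstm(self.drop_in(e_outputs))
        return self.drop_out(self.linear(x))

# In the Transformer's forward, roughly:
#   e_outputs = self.encoder(src, src_mask)
#   e_outputs = self.bridge(e_outputs)
#   d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)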

runtime error

The whole error is as follows; x = x + pe got a different size?

creating dataset and iterator...
model weights will be saved every 20 minutes and at end of epoch to directory weights/
training model...
Traceback (most recent call last):
  File "train.py", line 192, in <module>
    main()
  File "train.py", line 120, in main
    train_model(model, opt)
  File "train.py", line 37, in train_model
    preds = model(src, trg_input, src_mask, trg_mask)
  File "/home/tensorflow/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tensorflow/reaction/Transformer/Models.py", line 50, in forward
    d_output = self.decoder(trg, e_outputs, src_mask, trg_mask)
  File "/home/tensorflow/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tensorflow/reaction/Transformer/Models.py", line 36, in forward
    x = self.pe(x)
  File "/home/tensorflow/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/tensorflow/reaction/Transformer/Embed.py", line 40, in forward
    x = x + pe
RuntimeError: The size of tensor a (230) must match the size of tensor b (200) at non-singleton dimension 1
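
My reading of the error, not a confirmed answer: the positional-encoding buffer pe is built for a fixed maximum length (200 here), and a batch containing a 230-token sequence exceeds it. Either filter long sentences with the -max_strlen option from the README, or enlarge the buffer when the PositionalEncoder is constructed (assuming it exposes a max_seq_len argument as in the tutorial):

self.pe = PositionalEncoder(d_model, max_seq_len=512)  # default was 200 in this report; 512 is an arbitrary larger value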

Process.py missing?

Hi, I was wondering whether Process.py is missing from your code. It seems to be imported, and a couple of functions rely on it (read_data perhaps?), but I can't seem to find it.

I hope this is what is going on and I haven't missed something blatantly obvious!

Thanks for sharing the code!

Regards,
Theodore.

file not found

(screenshot, 2023-12-27) I have tried many methods, but none of them work. Can you help me?
