
Comments (8)

spro commented on May 24, 2024

I put a first version of the batched model at https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation-batched.ipynb via 31fdb61

The biggest changes are using pack_padded_sequence before the encoder RNN and pad_packed_sequence after it, plus the masked cross entropy loss from @jihunchoi after decoding. For the decoder itself, the changes are minor, because it runs only one time step at a time.
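
For readers who want the shape of that change without opening the notebook, here is a minimal sketch of the pack/pad pattern (the class name and sizes are illustrative, not the notebook's own; the original code used the old Variable API):

import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class BatchedEncoder(nn.Module):
    def __init__(self, vocab_size, hidden_size, n_layers=1):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers)

    def forward(self, input_seqs, input_lengths, hidden=None):
        # input_seqs: (max_len, batch) word indices, batch sorted by length descending
        embedded = self.embedding(input_seqs)
        # pack so the GRU skips the padded positions
        packed = pack_padded_sequence(embedded, input_lengths)
        outputs, hidden = self.gru(packed, hidden)
        # unpack back to a padded (max_len, batch, hidden) tensor
        outputs, _ = pad_packed_sequence(outputs)
        return outputs, hidden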


howardyclo commented on May 24, 2024

Hi guys, I implemented more features based on this tutorial (e.g. batched computation for attention) and added some notes.
Check out my repo here: https://github.com/howardyclo/pytorch-seq2seq-example/blob/master/seq2seq.ipynb
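
The core of batched attention is replacing the tutorial's per-batch, per-timestep Python loop with a single batched matrix multiply. A rough sketch of 'general'-style scoring with torch.bmm (function and variable names are mine, not taken from the linked notebook):

import torch
import torch.nn as nn
import torch.nn.functional as F

def batched_general_attention(attn_linear, rnn_output, encoder_outputs):
    # rnn_output:      (1, batch, hidden)   current decoder state
    # encoder_outputs: (max_len, batch, hidden)
    energy = attn_linear(encoder_outputs)        # (max_len, batch, hidden)
    energy = energy.permute(1, 2, 0)             # (batch, hidden, max_len)
    query = rnn_output.transpose(0, 1)           # (batch, 1, hidden)
    scores = torch.bmm(query, energy)            # (batch, 1, max_len)
    return F.softmax(scores, dim=2)              # weights over source positions

# usage: weights = batched_general_attention(nn.Linear(8, 8),
#            torch.randn(1, 3, 8), torch.randn(5, 3, 8))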


ehsanasgari commented on May 24, 2024

Hi,
Thank you for the great work! Would you please add batching to the tutorial as well?


vijendra-rana commented on May 24, 2024

Hello @spro, I have been working on extending this with batching. My code is here: https://github.com/vijendra-rana/Random/blob/master/translation_with_batch.py. I created some fake data for it, but I am getting an error from the loss that says:


RuntimeError: Trying to backward through the graph second time, but the buffers have already been freed. Please specify retain_variables=True when calling backward for the first time.

I understand we cannot backward through the loss twice, but I don't see anywhere that I am doing it twice.
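
Without seeing the full run it is hard to be sure, but one common cause of this error is reusing a tensor from a previous iteration's graph, for example a hidden state carried across batches. A tiny self-contained illustration of that failure mode and the detach() fix, in current PyTorch style (all sizes are arbitrary; the original code used Variable and retain_variables):

import torch
import torch.nn as nn

rnn = nn.GRU(4, 8)                        # input_size=4, hidden_size=8
optimizer = torch.optim.SGD(rnn.parameters(), lr=0.1)
hidden = torch.zeros(1, 2, 8)             # (n_layers, batch, hidden)

for step in range(3):
    x = torch.randn(5, 2, 4)              # (seq_len, batch, input)
    hidden = hidden.detach()              # without this line, the second
                                          # backward() walks into the already
                                          # freed graph of the previous step
    output, hidden = rnn(x, hidden)
    loss = output.pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
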
I also have a question about masking: how would you mask the loss at the encoder? I am not sure how to implement it, since the encoder output has size (seq_len, batch, hidden_size) while the mask has size (batch_size, seq_len).
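
On the shape question specifically, the usual trick is to transpose the (batch, seq_len) mask and add a trailing axis so it broadcasts over the hidden dimension. A minimal sketch with made-up sizes:

import torch

seq_len, batch, hidden = 5, 3, 8
encoder_outputs = torch.randn(seq_len, batch, hidden)
mask = torch.tensor([[1, 1, 1, 0, 0],
                     [1, 1, 1, 1, 1],
                     [1, 1, 0, 0, 0]], dtype=torch.float)  # (batch, seq_len)

# transpose to (seq_len, batch), then unsqueeze so the mask
# broadcasts over the hidden dimension
masked = encoder_outputs * mask.t().unsqueeze(-1)          # (seq_len, batch, hidden)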

Thanks in advance for help :)


vijendra-rana commented on May 24, 2024

Thanks, @spro for your effort in putting these together. Your tutorials are really nice.


physicsman commented on May 24, 2024

I noticed some implementations of batched seq2seq with attention allow for an embedding size that is different from the hidden size. Is there a reason to match the two sizes?
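
For what it's worth, nothing in PyTorch requires the two to match: nn.GRU takes input_size and hidden_size as independent arguments, so matching them in the tutorial is a simplification rather than a constraint. A sketch with distinct (arbitrary) sizes:

import torch.nn as nn

embed_size, hidden_size, vocab_size = 128, 256, 10000
embedding = nn.Embedding(vocab_size, embed_size)
# the GRU's input width follows the embedding, not the hidden state
gru = nn.GRU(input_size=embed_size, hidden_size=hidden_size)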


suwangcompling commented on May 24, 2024

@spro Thanks for the nice code sample.
I had some trouble and am looking for help: I tried to run it out of the box and hit an error in this block:

max_target_length = max(target_lengths)


decoder_input = Variable(torch.LongTensor([SOS_token] * small_batch_size))
decoder_hidden = encoder_hidden[:decoder_test.n_layers] # Use last (forward) hidden state from encoder
all_decoder_outputs = Variable(torch.zeros(max_target_length, small_batch_size, decoder_test.output_size))

if USE_CUDA:
    all_decoder_outputs = all_decoder_outputs.cuda()
    decoder_input = decoder_input.cuda()

# Run through decoder one time step at a time
for t in range(max_target_length):
    decoder_output, decoder_hidden, decoder_attn = decoder_test(
        decoder_input, decoder_hidden, encoder_outputs
    )
    all_decoder_outputs[t] = decoder_output # Store this step's outputs
    decoder_input = target_batches[t] # Next input is current target

# Test masked cross entropy loss
loss = masked_cross_entropy(
    all_decoder_outputs.transpose(0, 1).contiguous(),
    target_batches.transpose(0, 1).contiguous(),
    target_lengths
)
print('loss', loss.data[0])

The error reads as follows:


---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-28-babf231e41ef> in <module>()
     13 for t in range(max_target_length):
     14     decoder_output, decoder_hidden, decoder_attn = decoder_test(
---> 15         decoder_input, decoder_hidden, encoder_outputs
     16     )
     17     all_decoder_outputs[t] = decoder_output # Store this step's outputs

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

<ipython-input-24-43d7954b3ba4> in forward(self, input_seq, last_hidden, encoder_outputs)
     35         # Calculate attention from current RNN state and all encoder outputs;
     36         # apply to encoder outputs to get weighted average
---> 37         attn_weights = self.attn(rnn_output, encoder_outputs)
     38         context = attn_weights.bmm(encoder_outputs.transpose(0, 1)) # B x S=1 x N
     39 

/usr/local/lib/python2.7/dist-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
    489             result = self._slow_forward(*input, **kwargs)
    490         else:
--> 491             result = self.forward(*input, **kwargs)
    492         for hook in self._forward_hooks.values():
    493             hook_result = hook(self, input, result)

<ipython-input-22-61485b548d0f> in forward(self, hidden, encoder_outputs)
     27             # Calculate energy for each encoder output
     28             for i in range(max_len):
---> 29                 attn_energies[b, i] = self.score(hidden[:, b], encoder_outputs[i, b].unsqueeze(0))
     30 
     31         # Normalize energies to weights in range 0 to 1, resize to 1 x B x S

<ipython-input-22-61485b548d0f> in score(self, hidden, encoder_output)
     40         elif self.method == 'general':
     41             energy = self.attn(encoder_output)
---> 42             energy = hidden.dot(energy)
     43             return energy
     44 

RuntimeError: Expected argument self to have 1 dimension, but has 2


NLPScott commented on May 24, 2024

@suwangcompling Try hidden = hidden.squeeze() and encoder_output = encoder_output.squeeze().
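
Concretely, newer PyTorch versions require both arguments of dot() to be 1-D, while hidden[:, b] and the projected encoder output are both 1 x N here, hence the "Expected argument self to have 1 dimension" error. A self-contained illustration of the squeeze fix (sizes arbitrary):

import torch
import torch.nn as nn

hidden_size = 8
attn = nn.Linear(hidden_size, hidden_size)
hidden = torch.randn(1, hidden_size)           # shaped like hidden[:, b]
encoder_output = torch.randn(1, hidden_size)   # like encoder_outputs[i, b].unsqueeze(0)

energy = attn(encoder_output)                  # still 1 x hidden_size
# dot() needs 1-D tensors, so squeeze away the leading dimension first
score = hidden.squeeze(0).dot(energy.squeeze(0))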

