
Comments (13)

spro commented on May 28, 2024

Yes, you should see a greater improvement by using larger batches and/or a larger model.


spro commented on May 28, 2024

The tutorial does not use CUDA yet - usually you have to do something like

tensor = tensor.cuda()

for every tensor and model, to move it over to GPU. I will be updating it soon to include this.
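
For illustration, a minimal sketch of that pattern (with a placeholder nn.Linear model, not code from the tutorial):

import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Linear(10, 10)
model.cuda()                              # moves all of the model's parameters onto the GPU

x = Variable(torch.randn(4, 10)).cuda()   # every input tensor needs .cuda() as well
y = model(x)                              # the forward pass now runs on the GPU
print(y.data.cpu())                       # copy results back to the CPU to inspect them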


spro commented on May 28, 2024

I just updated it to include a USE_CUDA variable
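
Roughly, the pattern is a flag that is checked before each .cuda() call; the maybe_cuda helper below is my own shorthand, not the tutorial's exact code:

import torch

USE_CUDA = torch.cuda.is_available()   # or a hard-coded flag, as in the tutorial

def maybe_cuda(x):
    # move a tensor, Variable, or module to the GPU only when CUDA is enabled
    return x.cuda() if USE_CUDA else x

hidden = maybe_cuda(torch.zeros(2, 1, 256))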


czs0x55aa commented on May 28, 2024

Thank you for your help :)
I tried adding .cuda() to the code, but it only gives a slight speedup. Since GPU utilization is low, I guess it may be related to the batch_size.


michaelklachko commented on May 28, 2024

Was this change reverted? I don't see any CUDA commands in the code...


spro commented on May 28, 2024

Currently only the seq2seq tutorial uses CUDA.


michaelklachko commented on May 28, 2024

To run char-rnn-generation on 4 GPUs, I need to create input batches (characters from multiple chunks) and change the leading dimension of the inputs and outputs in the forward function to batch_size, correct?


spro commented on May 28, 2024

That sounds right... char-rnn.pytorch might help as it has batching & CUDA support, but I haven't tried it on multiple GPUs.
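
For reference, the usual way to split a batch across GPUs is nn.DataParallel, which scatters dim 0 of the input across replicas and gathers the outputs again. A standalone sketch with a placeholder nn.Linear (made-up sizes, assuming a multi-GPU machine); the extra wrinkle with an RNN is the hidden state, which the rest of this thread is about:

import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.DataParallel(nn.Linear(128, 128)).cuda()  # one replica per visible GPU

x = Variable(torch.randn(64, 128)).cuda()  # dim 0 (the batch) is what gets split
y = model(x)                               # with 4 GPUs, each replica sees 16 rows
print(y.size())                            # outputs are gathered back to (64, 128)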


michaelklachko commented on May 28, 2024

I implemented batching, but can't get it to work on multiple GPUs. Here's my code.

Using batch_size=64, I want to give each of my four GPUs 16 training samples, but instead I get this error:

Traceback (most recent call last):
  File "char-rnn.py", line 185, in <module>
    output, hidden = net(c, hidden, batch_size)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
    raise output
RuntimeError: size '[64 x 1]' is invalid for input with 16 elements at /py/conda-bld/pytorch_1490981920203/work/torch/lib/TH/THStorage.c:59

So it seems like each GPU (or just one GPU, I'm not sure) gets the full batch of 64 samples. Any ideas?


spro commented on May 28, 2024

It looks like one caveat of DataParallel is that the first dimension of every input needs to be the batch size (see https://github.com/pytorch/examples/pull/80/files), so maybe try transposing within the RNN module. Currently, due to .t(), the input and target data are seq-first, and the hidden state of the RNN will be seq-first by default. You could also try the batch_first argument of GRU. Beyond that I can't be too helpful because I only have one GPU 😞
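
As a rough illustration of the two options (made-up sizes, not code from the tutorial):

import torch
import torch.nn as nn
from torch.autograd import Variable

# Option 1: transpose between batch-first (what DataParallel scatters on dim 0)
# and seq-first (what a default GRU expects).
batch_first_input = Variable(torch.randn(64, 50, 256))   # (batch, seq, hidden)
seq_first_input = batch_first_input.transpose(0, 1)      # (seq, batch, hidden) for a default GRU

# Option 2: construct the GRU with batch_first=True so no transpose is needed.
gru = nn.GRU(input_size=256, hidden_size=256, num_layers=2, batch_first=True)
output, hidden = gru(batch_first_input)

print(output.size())   # (64, 50, 256)
print(hidden.size())   # (2, 64, 256) -- hidden stays (layers, batch, hidden) in both cases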


michaelklachko commented on May 28, 2024

OK, I used batch_first=True and have my inputs and outputs in the correct shape, but now it complains about hidden, which, as you said, is seq-first by default.

import torch
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers

        self.encoder = nn.Embedding(input_size, hidden_size)  # first arg is the dictionary size
        self.GRU = nn.GRU(hidden_size, hidden_size, n_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden, batch_size):
        input = self.encoder(input)               # (batch, seq) -> (batch, seq, hidden)
        output, hidden = self.GRU(input, hidden)  # output is (batch, seq, hidden) with batch_first=True
        output = self.decoder(output)
        return output, hidden

    def init_hidden(self, batch_size):
        # hidden is (n_layers, batch, hidden) -- still seq-first even with batch_first=True
        return Variable(torch.randn(self.n_layers, batch_size, self.hidden_size).cuda())

Traceback (most recent call last):
  File "char-rnn.py", line 232, in <module>
    output = net(c, batch_size)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
    raise output
RuntimeError: Expected hidden size (2, 16L, 256), got (2L, 64L, 256L)

Any ideas on what to do about this hidden tensor?


spro commented on May 28, 2024

It could be related to the init_hidden method; it might be easier to switch from hidden = net.module.init_hidden(batch_size) to hidden = None there and let the GRU handle it.
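
In other words, roughly (a standalone sketch, not the tutorial's training loop):

import torch
import torch.nn as nn
from torch.autograd import Variable

gru = nn.GRU(input_size=256, hidden_size=256, num_layers=2, batch_first=True)

inp = Variable(torch.randn(64, 50, 256))   # (batch, seq, hidden)
output, hidden = gru(inp, None)            # hidden=None makes the GRU start from zeros
print(hidden.size())                       # (2, 64, 256)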


michaelklachko commented on May 28, 2024

I thought so too, so I modified your code like this: https://gist.github.com/michaelklachko/540428fc112f5a6b06e842bb6a3f5e1e

However, I'm getting the same error; it looks like the hidden tensor inside the forward function must have batch_size as its first dim as well. I don't quite understand how they managed to get it working in the OpenNMT code...
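
One workaround I can sketch (my own guess, not necessarily what OpenNMT does) is to build the hidden state inside forward() from the size of the scattered input, so each DataParallel replica creates a hidden state matching its own slice of the batch:

import torch
import torch.nn as nn
from torch.autograd import Variable

class BatchedRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers):
        super(BatchedRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.encoder = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input):
        local_batch = input.size(0)  # the batch slice this replica actually received
        hidden = Variable(torch.zeros(self.n_layers, local_batch, self.hidden_size))
        if input.data.is_cuda:
            hidden = hidden.cuda(input.data.get_device())  # same GPU as this replica's input
        embedded = self.encoder(input)                     # (batch, seq, hidden)
        output, hidden = self.gru(embedded, hidden)
        batch, seq, hid = output.size()
        output = self.decoder(output.contiguous().view(batch * seq, hid))
        return output.view(batch, seq, -1)

net = nn.DataParallel(BatchedRNN(100, 256, 100, 2)).cuda()
inp = Variable(torch.LongTensor(64, 50).random_(0, 100)).cuda()
out = net(inp)             # each of 4 replicas builds its own (2, 16, 256) hidden state
print(out.size())          # (64, 50, 100)

The trade-off is that the hidden state is re-initialized for every chunk instead of being carried between calls.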

