
Comments (13)

spro commented on May 28, 2024

Yes, you should see a greater improvement by using larger batches and/or a larger model.


spro commented on May 28, 2024

The tutorial does not use CUDA yet - usually you have to do something like

tensor = tensor.cuda()

for every tensor and model, to move it over to GPU. I will be updating it soon to include this.
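
For illustration, a minimal sketch of that pattern (with a placeholder nn.Linear model, not code from the tutorial):

import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.Linear(10, 10)
model.cuda()                              # moves all of the model's parameters onto the GPU

x = Variable(torch.randn(4, 10)).cuda()   # every input tensor needs .cuda() as well
y = model(x)                              # the forward pass now runs on the GPU
print(y.data.cpu())                       # copy results back to the CPU to inspect them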


spro commented on May 28, 2024

I just updated it to include a USE_CUDA variable
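
Roughly, the pattern is a flag that is checked before each .cuda() call; the maybe_cuda helper below is my own shorthand, not the tutorial's exact code:

import torch

USE_CUDA = torch.cuda.is_available()   # or a hard-coded flag, as in the tutorial

def maybe_cuda(x):
    # move a tensor, Variable, or module to the GPU only when CUDA is enabled
    return x.cuda() if USE_CUDA else x

hidden = maybe_cuda(torch.zeros(2, 1, 256))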


czs0x55aa commented on May 28, 2024

Thank you for your help :)
I tried adding .cuda() to the code, but it only gives a slight speedup. Since GPU utilization is low, I guess it may be related to the batch_size.


michaelklachko commented on May 28, 2024

Was this change reverted? I don't see any CUDA commands in the code...


spro commented on May 28, 2024

Currently only the seq2seq tutorial uses CUDA.


michaelklachko commented on May 28, 2024

To run char-rnn-generation on 4 GPUs, I need to create input batches (characters from multiple chunks) and change the leading dimension of the inputs and outputs in the forward function to batch_size, correct?


spro commented on May 28, 2024

That sounds right... char-rnn.pytorch might help as it has batching & CUDA support, but I haven't tried it on multiple GPUs.
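
For reference, the usual way to split a batch across GPUs is nn.DataParallel, which scatters dim 0 of the input across replicas and gathers the outputs again. A standalone sketch with a placeholder nn.Linear (made-up sizes, assuming a multi-GPU machine); the extra wrinkle with an RNN is the hidden state, which the rest of this thread is about:

import torch
import torch.nn as nn
from torch.autograd import Variable

model = nn.DataParallel(nn.Linear(128, 128)).cuda()  # one replica per visible GPU

x = Variable(torch.randn(64, 128)).cuda()  # dim 0 (the batch) is what gets split
y = model(x)                               # with 4 GPUs, each replica sees 16 rows
print(y.size())                            # outputs are gathered back to (64, 128)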


michaelklachko commented on May 28, 2024

I implemented batching, but can't get it to work on multiple GPUs. Here's my code.

Using batch_size=64, I want to give each of my four GPUs 16 training samples, but instead I get this error:

Traceback (most recent call last):
  File "char-rnn.py", line 185, in <module>
    output, hidden = net(c, hidden, batch_size)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
    raise output
RuntimeError: size '[64 x 1]' is invalid for input with 16 elements at /py/conda-bld/pytorch_1490981920203/work/torch/lib/TH/THStorage.c:59

So it seems like each GPU (or just one GPU, I'm not sure) gets the full batch of 64 samples. Any ideas?


spro commented on May 28, 2024

It looks like one caveat of DataParallel is that the first dimension of every input needs to be the batch size (see https://github.com/pytorch/examples/pull/80/files), so maybe try transposing within the RNN module. Currently, due to .t(), the input and target data are seq-first, and the hidden state of the RNN will be seq-first by default. You could also try the batch_first argument of GRU. Beyond that I can't be too helpful because I only have one GPU 😞
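
As a rough illustration of the two options (made-up sizes, not code from the tutorial):

import torch
import torch.nn as nn
from torch.autograd import Variable

# Option 1: transpose between batch-first (what DataParallel scatters on dim 0)
# and seq-first (what a default GRU expects).
batch_first_input = Variable(torch.randn(64, 50, 256))   # (batch, seq, hidden)
seq_first_input = batch_first_input.transpose(0, 1)      # (seq, batch, hidden) for a default GRU

# Option 2: construct the GRU with batch_first=True so no transpose is needed.
gru = nn.GRU(input_size=256, hidden_size=256, num_layers=2, batch_first=True)
output, hidden = gru(batch_first_input)

print(output.size())   # (64, 50, 256)
print(hidden.size())   # (2, 64, 256) -- hidden stays (layers, batch, hidden) in both cases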


michaelklachko commented on May 28, 2024

OK, I used batch_first=True and have my inputs and outputs in the correct shape, but now it complains about hidden, which, as you said, is seq-first by default.

import torch
import torch.nn as nn
from torch.autograd import Variable

class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers):
        super(RNN, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.n_layers = n_layers

        self.encoder = nn.Embedding(input_size, hidden_size)  # first arg is the dictionary size
        self.GRU = nn.GRU(hidden_size, hidden_size, n_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input, hidden, batch_size):
        input = self.encoder(input)               # (batch, seq) -> (batch, seq, hidden)
        output, hidden = self.GRU(input, hidden)  # output is (batch, seq, hidden) with batch_first=True
        output = self.decoder(output)
        return output, hidden

    def init_hidden(self, batch_size):
        # hidden is (n_layers, batch, hidden) -- still seq-first even with batch_first=True
        return Variable(torch.randn(self.n_layers, batch_size, self.hidden_size).cuda())

Traceback (most recent call last):
  File "char-rnn.py", line 232, in <module>
    output = net(c, batch_size)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 206, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda2/lib/python2.7/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
    raise output
RuntimeError: Expected hidden size (2, 16L, 256), got (2L, 64L, 256L)

Any ideas on what to do about this hidden tensor?


spro commented on May 28, 2024

It could be related to the init_hidden method; it might be easier to switch from hidden = net.module.init_hidden(batch_size) to hidden = None there and let the GRU handle it.
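
In other words, roughly (a standalone sketch, not the tutorial's training loop):

import torch
import torch.nn as nn
from torch.autograd import Variable

gru = nn.GRU(input_size=256, hidden_size=256, num_layers=2, batch_first=True)

inp = Variable(torch.randn(64, 50, 256))   # (batch, seq, hidden)
output, hidden = gru(inp, None)            # hidden=None makes the GRU start from zeros
print(hidden.size())                       # (2, 64, 256)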


michaelklachko commented on May 28, 2024

I thought so too, so I modified your code like this: https://gist.github.com/michaelklachko/540428fc112f5a6b06e842bb6a3f5e1e

However, I'm getting the same error; it looks like the hidden tensor inside the forward function must have batch_size as its first dim as well. I don't quite understand how they managed to get it working in the OpenNMT code...
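
One workaround I can sketch (my own guess, not necessarily what OpenNMT does) is to build the hidden state inside forward() from the size of the scattered input, so each DataParallel replica creates a hidden state matching its own slice of the batch:

import torch
import torch.nn as nn
from torch.autograd import Variable

class BatchedRNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size, n_layers):
        super(BatchedRNN, self).__init__()
        self.n_layers = n_layers
        self.hidden_size = hidden_size
        self.encoder = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size, n_layers, batch_first=True)
        self.decoder = nn.Linear(hidden_size, output_size)

    def forward(self, input):
        local_batch = input.size(0)  # the batch slice this replica actually received
        hidden = Variable(torch.zeros(self.n_layers, local_batch, self.hidden_size))
        if input.data.is_cuda:
            hidden = hidden.cuda(input.data.get_device())  # same GPU as this replica's input
        embedded = self.encoder(input)                     # (batch, seq, hidden)
        output, hidden = self.gru(embedded, hidden)
        batch, seq, hid = output.size()
        output = self.decoder(output.contiguous().view(batch * seq, hid))
        return output.view(batch, seq, -1)

net = nn.DataParallel(BatchedRNN(100, 256, 100, 2)).cuda()
inp = Variable(torch.LongTensor(64, 50).random_(0, 100)).cuda()
out = net(inp)             # each of 4 replicas builds its own (2, 16, 256) hidden state
print(out.size())          # (64, 50, 100)

The trade-off is that the hidden state is re-initialized for every chunk instead of being carried between calls.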

