pytorch / examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
Home Page: https://pytorch.org/examples
License: BSD 3-Clause "New" or "Revised" License
Hi, in your ImageNet main.py code, you do not scale the training images to [0, 1], yet you go on to normalize them with means and stds expressed in that scale:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
but you do scale the validation images before normalization:
transforms.Scale(256),
Any reason why?
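For reference, here is roughly what the two pipelines look like (a paraphrase of main.py, assuming the torchvision API of that era; note that transforms.ToTensor() is what maps pixel values into [0, 1], while transforms.Scale(256) only resizes the image spatially):
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Training pipeline (paraphrased): ToTensor() converts pixels to [0, 1],
# so normalization operates in that scale for training images too.
train_transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

# Validation pipeline (paraphrased): Scale(256) resizes spatially;
# it does not rescale pixel values.
val_transform = transforms.Compose([
    transforms.Scale(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])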
Running the word_language_model example to test GPUs, with the command:
nvprof -o profile.out python main.py --epochs 2 --cuda
nvprof won't terminate after the training finishes, and the output file profile.out keeps growing.
When I press CTRL+C, it prints the following log:
Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
Perhaps some code in pytorch's CUDA backend should be modified.
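A possible workaround sketch (an assumption on my part, not a confirmed fix): flush the profiler explicitly before the script exits by calling the CUDA runtime's cudaProfilerStop through ctypes.
import ctypes

# Flush nvprof's buffered profiling data before the interpreter exits.
# Assumes libcudart.so is on the loader path.
_cudart = ctypes.CDLL('libcudart.so')

def cuda_profiler_stop():
    ret = _cudart.cudaProfilerStop()
    if ret != 0:
        print('cudaProfilerStop returned error code', ret)

# ... training loop runs here ...
cuda_profiler_stop()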
Hi, I ran ImageNet training successfully for 1 epoch, but then it got stuck at testing with no error message. Did this happen to you?
Hello, I was wondering whether it would be possible to have a small code example where the same network is cloned on different GPUs, with all clones sharing the same parameters.
For instance, I would like something where different subprocesses can train the model separately (like 8 subprocesses, each responsible for training a model on one GPU). The updates could then be accumulated to a common network, and all GPU network clones could synchronize their parameters to the ones of the common network periodically, or something like this.
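A minimal sketch of the pattern I mean (modeled loosely on the mnist_hogwild example; the tiny nn.Linear model and the training body are placeholders, not code from this repo):
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(rank, shared_model):
    # Each subprocess trains a local clone on its own GPU.
    local_model = nn.Linear(10, 2).cuda(rank)
    # Periodic sync: pull the shared parameters into the local clone.
    local_model.load_state_dict(shared_model.state_dict())
    # ... train local_model here, then accumulate its updates back into
    # shared_model (e.g. by copying gradients and stepping an optimizer) ...

if __name__ == '__main__':
    shared_model = nn.Linear(10, 2)   # placeholder for the real network
    shared_model.share_memory()       # parameters move into shared memory
    processes = []
    for rank in range(torch.cuda.device_count()):
        p = mp.Process(target=worker, args=(rank, shared_model))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()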
I'm trying to use this code as a starting point for building GANs from my own image data: 512x512 grayscale images. If I change any of the default arguments (e.g. --imageSize 512), I get the following error:
Traceback (most recent call last):
File "main.py", line 209, in <module>
errD_real = criterion(output, label)
File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 36, in forward
return backend_fn(self.size_average, weight=self.weight)(input, target)
File "/opt/python/lib/python3.6/site-packages/torch/nn/_functions/thnn/loss.py", line 22, in forward
assert input.nelement() == target.nelement()
AssertionError
Still learning my way around PyTorch, so the network architectures printed before the above message don't yet give me much intuition. I appreciate any pointers you can give!
Are there any examples using multiprocessing? It should be faster to use a separate thread or process to generate batches and manipulate the data.
Actually a pytorch issue rather than an issue with the examples. It is noted there, so this issue can be ignored: pytorch/pytorch#467
Line 177 of the word language model should divide the learning rate by 4.0, not 4 (float vs. integer division), for proper learning-rate decay.
https://github.com/pytorch/examples/blob/master/word_language_model/main.py#L177
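The proposed change, as a one-line sketch (relevant under Python 2 semantics, where / between two integers truncates):
lr = lr / 4.0   # float literal: avoids Python 2 integer truncation (lr / 4)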
I see you using convolutions to reduce the size of the image instead of pooling.
Is pooling's performance bad?
memoryEfficientLoss does not split along the batch dimension, which is dimension 1.
https://github.com/pytorch/examples/blob/master/OpenNMT/train.py#L138
The loop that is supposed to break gradient sharing in mnist_hogwild doesn't seem to be doing anything: param.grad is not None evaluates to false, since param.grad is allocated lazily, in the subprocesses. There, every process allocates gradient tensors separately (I think?), so there might be no need to break gradient sharing manually at all.
Hi,
I just realized that train.py is printing speed in target tokens per second (cf. train.py#184).
It turns out that, for the same process, OpenNMT (Lua) prints source tokens/s.
This is quite misleading for people benchmarking both solutions.
Note that in the case of text summarization, target tokens/s and source tokens/s are very different (by 5 to 10 times). At first, for the exact same task (same number of parameters, same batch size), it looked like PyONMT was 9x "slower" than Lua ONMT.
I'm not sure why you chose tgt tokens/s, but I guess it would be easier for users to have the same metric.
Thanks for PyTorch, thanks for PyONMT, very nice work here :)
pltrdy
Unlike in the GAN paper, there is no
for p in netD.parameters():
p.requires_grad = False # to avoid computation
when updating the generator. Is this on purpose or a mistake?
In @soumith's torch reference implementation, D is fixed when updating G:
local df_do = criterion:backward(output, label)
local df_dg = netD:updateGradInput(input, df_do)
netG:backward(noise, df_dg)
I have installed both torch and gym with
apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
What could be wrong when running
/examples/reinforcement_learning$ python reinforce.py
The same occurs for the torch module, after it has been installed, when I run
reinforcement_q_learning.ipynb
I am experimenting with Soumith's ImageNet example, but it is crashing or deadlocking in three different ways. I have added a bunch of "print" statements to figure out where it is crashing; here is a gist of the full script (as you can see, there are almost no significant modifications to the original code). All code is running on 2x NVIDIA Titan X 12 GB cards with 96 GB RAM.
https://gist.github.com/FuriouslyCurious/81742b8126f07f919522a588147e6086
How to reproduce:
python train.py -a resnet18 -j 1 -b 2 /home/FC/data/P/
Output:
=> Parsing complete...
=> creating model 'resnet18'
=> Using CUDA DataParallel
=> Starting training images loading...
=> Starting validation images loading...
=> Loss criterion and optimizer setup
=> Starting training...
=> Training Epoch 0
Traceback (most recent call last):
File "train.py", line 299, in <module>
main()
File "train.py", line 140, in main
train(train_loader, model, criterion, optimizer, epoch)
File "train.py", line 177, in train
output = model(input_var)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 202, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 92, in forward
outputs = self.parallel_apply(replicas, scattered, gpu_dicts)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 102, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 50, in parallel_apply
raise output
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 30, in _worker
output = module(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 202, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torchvision-0.1.6-py3.5.egg/torchvision/models/resnet.py", line 150, in forward
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 202, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 54, in forward
return self._backend.Linear()(input, self.weight, self.bias)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/_functions/linear.py", line 10, in forward
output.addmm_(0, 1, input, weight.t())
RuntimeError: size mismatch at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.10_1488757768560/work/torch/lib/THC/generic/THCTensorMathBlas.cu:241
How to reproduce:
python train.py -a resnet18 /home/FC/data/P
=> Parsing complete...
=> creating model 'resnet18'
=> Using CUDA DataParallel
=> Starting training images loading...
=> Starting validation images loading...
=> Loss criterion and optimizer setup
=> Starting training...
=> Training Epoch 0
^CProcess Process-4:
Process Process-3:
Traceback (most recent call last):
Traceback (most recent call last):
File "train.py", line 299, in <module>
main()
File "train.py", line 140, in main
train(train_loader, model, criterion, optimizer, epoch)
File "train.py", line 168, in train
for i, (input, target) in enumerate(train_loader):
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 168, in __next__
idx, batch = self.data_queue.get()
File "/conda3/envs/idp/lib/python3.5/queue.py", line 164, in get
self.not_empty.wait()
File "/conda3/envs/idp/lib/python3.5/threading.py", line 293, in wait
waiter.acquire()
Traceback (most recent call last):
File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 26, in _worker_loop
r = index_queue.get()
File "/conda3/envs/idp/lib/python3.5/multiprocessing/queues.py", line 342, in get
with self._rlock:
File "/conda3/envs/idp/lib/python3.5/multiprocessing/synchronize.py", line 96, in __enter__
return self._semlock.__enter__()
File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 26, in _worker_loop
r = index_queue.get()
How to reproduce:
python train.py -a resnet152 -j 1 -b 1 /home/FC/data/P/
=> Parsing complete...
=> creating model 'resnet152'
=> Using CUDA DataParallel
=> Starting training images loading...
=> Starting validation images loading...
=> Loss criterion and optimizer setup
=> Starting training...
=> Training Epoch 0
^CTraceback (most recent call last):
File "train.py", line 298, in <module>
main()
File "train.py", line 139, in main
train(train_loader, model, criterion, optimizer, epoch)
File "train.py", line 167, in train
for i, (input, target) in enumerate(train_loader):
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 168, in __next__
idx, batch = self.data_queue.get()
File "/conda3/envs/idp/lib/python3.5/queue.py", line 164, in get
self.not_empty.wait()
File "/conda3/envs/idp/lib/python3.5/threading.py", line 293, in wait
waiter.acquire()
KeyboardInterrupt
I didn't see the optimizer used in the training code. Is this a bug?
translate.py can't be run with the same batch size as train.py. The reason is that, without backprop, intermediate variables are not destroyed, which causes memory overflow. Marking the input variable as volatile solves this problem.
https://github.com/pytorch/examples/blob/master/OpenNMT/onmt/Dataset.py#L31
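The fix being described, as a sketch (pre-0.4 autograd API; the placeholder tensor stands in for the real source batch):
import torch
from torch.autograd import Variable

# Placeholder batch; in translate.py this would be the source batch.
src_tensor = torch.LongTensor(64, 50).zero_()

# volatile=True tells autograd not to build a graph, so intermediate
# buffers are freed immediately during inference.
src = Variable(src_tensor, volatile=True)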
Starting work on a basic OpenAI gym RL example in a fork.
Based on torch-twrl and several basic tensorflow RL examples.
Details to come.
Would it be possible to add an MIT/BSD license to these examples? It's hard to use them as a starting point without clear license guidance.
Hi!
First, thanks for the great work on the provided examples. I enjoyed playing around with both the mnist and the dcgan examples!
On the vae example, I ran into the following issue.
It works fine on the CPU, but when I run it on a GPU device with CUDA installed, I get the following stack trace:
Traceback (most recent call last):
File "main.py", line 130, in <module>
train(epoch)
File "var.py", line 102, in train
recon_batch, mu, logvar = model(data)
File "/gpfs/workdir/hassony/virtual-python/lib/python3.5/site-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "var.py", line 67, in forward
mu, logvar = self.encode(x.view(-1, 784))
File "var.py", line 54, in encode
h1 = self.relu(self.fc1(x))
File "/gpfs/workdir/hassony/virtual-python/lib/python3.5/site-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/gpfs/workdir/hassony/virtual-python/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 53, in forward
return self._backend.Linear()(input, self.weight, self.bias)
File "/gpfs/workdir/hassony/virtual-python/lib/python3.5/site-packages/torch/nn/_functions/linear.py", line 10, in forward
output.addmm_(0, 1, input, weight.t())
TypeError: addmm_ received an invalid combination of arguments - got (int, int, torch.FloatTensor, torch.cuda.FloatTensor), but expected one of:
* (torch.FloatTensor mat1, torch.FloatTensor mat2)
* (torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
* (float beta, torch.FloatTensor mat1, torch.FloatTensor mat2)
* (float alpha, torch.FloatTensor mat1, torch.FloatTensor mat2)
* (float beta, torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
* (float alpha, torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
* (float beta, float alpha, torch.FloatTensor mat1, torch.FloatTensor mat2)
* (float beta, float alpha, torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
The weights being a cuda tensor instead of a regular one seems to be the problem, and I haven't found a way around it yet.
I would greatly appreciate any hint if you have an idea on how to fix this.
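For reference, the kind of fix I would try (a sketch, assuming the root cause is that the input batch stays on the CPU while the model is on the GPU; args, model, and data are the example's own names):
# Inside the example's training loop (pre-0.4 API), move the batch to
# the GPU before wrapping it, mirroring what is done for the model:
if args.cuda:
    data = data.cuda()
data = Variable(data)
recon_batch, mu, logvar = model(data)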
All the best,
Hi! I'm about to start working on siamese/triplet architectures. Are there any considerations to take into account? Regarding the loss functions, is there any example using the autograd stuff?
When I ran the language model example, generate.py blew up GPU memory as it generated sentences (starting from ~500MB and growing to ~4GB). In the end I got an out-of-memory error: RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.5_1479441063232/work/torch/lib/THC/generic/THCStorage.cu:65
Some info:
cc: @adamlerer
Hi Soumith,
Do you have an example of single-image classification code? I am trying to load a checkpointed model and classify a single image with the code below, but I get a "3D tensor expected" error.
Code:
# Bunch of imports go here
# Convert image to Variable
def Torchify(aImage):
    ptLoader = transforms.Compose([transforms.ToTensor()])
    aImage = ptLoader(aImage).float()
    aImage = Variable(aImage, volatile=True)
    return aImage.cuda()
# Load model from Checkpoint
print("=> Loading Network")
ptModelAxial = densenet.__dict__['densenet161'](pretrained=False, num_classes=5)
ptModelAxial.classifier = nn.Linear(8832, 5)
ptModelAxial = torch.nn.DataParallel(ptModelAxial).cuda()
dTemp = torch.load("best.pth.tar")
ptModelAxial.load_state_dict(dTemp['state_dict'])
for p in ptModelAxial.parameters():
    p.requires_grad = False
ptModelAxial.eval()
InputImg = skimage.img_as_float(skimage.io.imread(sFileName))
ptModelPreds = ptModelAxial( Torchify(InputImg) )
print( ptModelPreds )
Error message:
Traceback (most recent call last):
File "extract.py", line 298
ptModelPreds = ptModelAxial( Torchify(InputImg) )
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
raise output
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 25, in _worker
output = module(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/keyur/kaggle/densenet.py", line 153, in forward
features = self.features(x)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/container.py", line 64, in forward
input = module(input)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 237, in forward
self.padding, self.dilation, self.groups)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/functional.py", line 39, in conv2d
return f(input, weight, bias)
RuntimeError: expected 3D tensor
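One thing worth checking (my assumption, not a confirmed diagnosis): a grayscale image loaded with skimage is a 2D array, so the tensor reaching the convolution may be missing a channel and/or batch dimension. A sketch of the shape fix inside Torchify:
def Torchify(aImage):
    ptLoader = transforms.Compose([transforms.ToTensor()])
    aImage = ptLoader(aImage).float()
    if aImage.dim() == 2:          # grayscale: add a channel dimension
        aImage = aImage.unsqueeze(0)
    aImage = aImage.unsqueeze(0)   # add a batch dimension: [1, C, H, W]
    aImage = Variable(aImage, volatile=True)
    return aImage.cuda()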
@ludc and @korymath are interested in building out some RL algorithms and doing OpenAI Gym integration.
Kory, from his repo (https://github.com/korymath/examples/tree/master/rl), hasn't yet started on anything concrete.
If each of you declares here what you are doing before you start developing it, then I think the other person can avoid overlap.
It seems that as training progresses, the per-batch time gets slower and slower.
For example, I run CUDA_VISIBLE_DEVICES=0 python main.py -a alexnet --lr 0.01 --workers 22 /ssd/cv_datasets/ILSVRC2015/Data/CLS-LOC
Initially I get an average per-batch time of about 0.25s.
After several batches, I get 0.5s.
I run top and find that most of the memory (128GB) is occupied.
How to fix this?
Hi, I am wondering why detach is necessary in this line:
Line 230 in a60bd4e
I understand that we want to update the gradients of netD without changing the ones of netG. But if the optimizer is only using the parameters of netD, then only its weights will be updated. Am I missing something here?
Thanks in advance!
I was not able to get adjust_learning_rate working unless I changed the code at line 266 in main.py from
for param_group in optimizer.state_dict()['param_groups']:
to
for param_group in optimizer.param_groups:
Sometimes, the training process simply gets stuck at testing.
Epoch: [0][5000/5005] Time 0.100 (0.335) Data 0.000 (0.244) Loss 5.9800 (6.5614) Prec@1 1.953 (0.735) Prec@5 7.812 (2.896)
Test: [0/196] Time 7.905 (7.905) Loss 4.1344 (4.1344) Prec@1 16.016 (16.016) Prec@5 51.562 (51.562)
Or, more frequently, the line Test: [0/196] won't appear and the whole process gets stuck at the line Epoch: [0][5000/5005].
It has been like this for several hours, and looking at top, no processes are using the CPU.
I called CUDA_VISIBLE_DEVICES=1 PYTHONUNBUFFERED=1 python main.py -a alexnet --print-freq 20 --lr 0.01 --workers 20 --batch-size 256 /ssd/cv_datasets/ILSVRC2015/Data/CLS-LOC 2>&1 | tee alexnet_train.log
to train the network.
This appears both on a CentOS 6 machine as well as a Ubuntu 14.04 machine.
Is the implementation of Bottle in examples/snli/model.py unfinished?
It would be cool to have the same functionality as Torch's nn.Bottle(), which allows input of varying dimensionality. :-)
Hi,
I was trying the OpenNMT example.
It seems that the hidden state of the decoder is not updated at each step. Models.py L118
I tried changing
output, h = self.rnn(emb_t, hidden)
to
output, h = self.rnn(emb_t, h)
and added h = hidden before the loop.
Both training and validation perplexities improved after the change.
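For context, the decoder loop after the proposed change looks roughly like this (my paraphrase, not the exact OpenNMT code; sizes are placeholders):
import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.LSTM(8, 16)
emb = Variable(torch.randn(5, 2, 8))        # (seq_len, batch, dim)
hidden = (Variable(torch.zeros(1, 2, 16)),  # initial decoder state
          Variable(torch.zeros(1, 2, 16)))

h = hidden                                  # added before the loop
outputs = []
for emb_t in emb.split(1):
    output, h = rnn(emb_t, h)               # was: rnn(emb_t, hidden)
    outputs.append(output)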
In https://github.com/pytorch/examples/blob/master/imagenet/main.py#L68-L72, it seems that special care has to be taken when wrapping the module with DataParallel. Why is this the case? Also, I don't understand why, for AlexNet and VGG, features is wrapped yet classifier is not.
I train the SNLI model with the example training code, and then I try to load a sample of the test dataset (~100 instances) and classify it using the code below. The accuracy I get is 35%. That's too low, since the best trained model gets around 78% on the validation set, so something must be wrong. I tried classifying a sample of the training set as well, and the model performed poorly there too, which confirmed that something is not working. Is it something in my code?
Another weird thing: len(answers.vocab) returns 4, although it should be 3, since the labels are neutral, entail, contradict. This doesn't affect much, since none of the predicted labels refer to unk, which is the extra label in answers.vocab.
inputs = data.Field(lower=args.lower)
answers = data.Field(sequential=False)
test = data.TabularDataset(
    path=[path/to/test/data], format='json',
    fields={'sentence1': ('premise', inputs),
            'sentence2': ('hypothesis', inputs),
            'gold_label': ('label', answers)},
    filter_pred=lambda ex: ex.label != '-')
inputs.build_vocab(test)
inputs.vocab.vectors = torch.load([path/to/vector/cached])
answers.build_vocab(test)
# test iterator has batch size equal to length --> full test set
# test_iter = data.BucketIterator(test, batch_size=len(test), device=args.gpu, sort_key=lambda ex: len(ex.premise) + len(ex.hypothesis))
test_iter = data.Iterator(test, batch_size=len(test), device=args.gpu)
model = torch.load([path/to/model/snapshot], map_location=lambda storage, location: storage.cuda(args.gpu))
test_full_batch = next(iter(test_iter)) # there will be only 1 batch
predicted_scores = model(test_full_batch)
predicted_labels = torch.max(predicted_scores, 1)[1].view(test_full_batch.label.size()).data
n_correct = (predicted_labels == test_full_batch.label.data).sum()
n_total = test_full_batch.batch_size
train_acc = 100. * n_correct/n_total
print("train accuracy - %f" % train_acc)
During training with examples/imagenet/main.py, I used the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python main.py [options] path/to/imagenetdir 1>a.log 2>a.err &
Then it starts 5 processes in the system; 1 main process appears in nvidia-smi.
Most of the time (90%), after I kill the main process, GPU usage drops to 0%, so I can kill the other 4 to release GPU memory and start a new training task. Sometimes (10% of the time), after I kill these 5 processes, the main process remains as "python [defunct]" and cannot be killed even by sudo kill -s 9. The GPU usage and GPU memory are not released.
Multi-GPU training happens where I use the following line in my code:
model = torch.nn.DataParallel(model).cuda()
Please give some hints on how to correctly kill multi-GPU training PyTorch processes.
Thanks.
In the VAE example, although the model is converted to CUDA if CUDA support is present, the data and the loss functions are not, resulting in an error when training on a GPU.
A barebones port of https://github.com/Mostafa-Samir/DNC-tensorflow showing just the read/write system in PyTorch would be fantastic.
Thanks!
The test perplexity didn't reach the documented ppl of 113.
GTX 1070
Driver Version: 367.57
cuDNN: 5
CUDA: 8.0
Intel i7 3770
| epoch 1 | 200/ 2323 batches | lr 20.00 | ms/batch 15.86 | loss 6.78 | ppl 883.54
| epoch 1 | 400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 6.11 | ppl 451.70
| epoch 1 | 600/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 5.81 | ppl 332.98
| epoch 1 | 800/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 5.65 | ppl 283.32
| epoch 1 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.53 | ppl 252.06
| epoch 1 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.47 | loss 5.45 | ppl 232.68
| epoch 1 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.29 | ppl 197.84
| epoch 1 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.40 | loss 5.27 | ppl 193.50
| epoch 1 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 5.26 | ppl 192.84
| epoch 1 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.52 | loss 5.11 | ppl 165.52
| epoch 1 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.00 | ppl 149.01
-----------------------------------------------------------------------------------------
| end of epoch 1 | time: 24.19s | valid loss 5.15 | valid ppl 172.34
-----------------------------------------------------------------------------------------
| epoch 2 | 200/ 2323 batches | lr 20.00 | ms/batch 9.50 | loss 5.01 | ppl 150.18
| epoch 2 | 400/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 5.07 | ppl 159.75
| epoch 2 | 600/ 2323 batches | lr 20.00 | ms/batch 9.48 | loss 4.97 | ppl 143.50
| epoch 2 | 800/ 2323 batches | lr 20.00 | ms/batch 9.71 | loss 4.92 | ppl 137.16
| epoch 2 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.92 | ppl 136.96
| epoch 2 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.89 | ppl 133.62
| epoch 2 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.78 | ppl 118.79
| epoch 2 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.83 | ppl 125.03
| epoch 2 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.87 | ppl 130.80
| epoch 2 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.69 | ppl 109.35
| epoch 2 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.64 | ppl 103.29
-----------------------------------------------------------------------------------------
| end of epoch 2 | time: 22.96s | valid loss 4.96 | valid ppl 142.18
-----------------------------------------------------------------------------------------
| epoch 3 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.67 | ppl 106.62
| epoch 3 | 400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.79 | ppl 120.30
| epoch 3 | 600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.68 | ppl 107.72
| epoch 3 | 800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.65 | ppl 104.60
| epoch 3 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.67 | ppl 106.95
| epoch 3 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.66 | ppl 105.12
| epoch 3 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.55 | ppl 94.70
| epoch 3 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.62 | ppl 101.98
| epoch 3 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.68 | ppl 108.26
| epoch 3 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.48 | ppl 88.55
| epoch 3 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.45 | ppl 85.87
-----------------------------------------------------------------------------------------
| end of epoch 3 | time: 22.89s | valid loss 4.90 | valid ppl 133.71
-----------------------------------------------------------------------------------------
| epoch 4 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.48 | ppl 88.58
| epoch 4 | 400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.63 | ppl 102.72
| epoch 4 | 600/ 2323 batches | lr 20.00 | ms/batch 9.48 | loss 4.52 | ppl 91.82
| epoch 4 | 800/ 2323 batches | lr 20.00 | ms/batch 9.58 | loss 4.50 | ppl 89.90
| epoch 4 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.57 | loss 4.53 | ppl 92.52
| epoch 4 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.59 | loss 4.52 | ppl 91.63
| epoch 4 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.42 | ppl 82.96
| epoch 4 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.50 | ppl 90.31
| epoch 4 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.57 | ppl 96.44
| epoch 4 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.37 | ppl 78.93
| epoch 4 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.34 | ppl 77.00
-----------------------------------------------------------------------------------------
| end of epoch 4 | time: 23.00s | valid loss 4.89 | valid ppl 133.30
-----------------------------------------------------------------------------------------
| epoch 5 | 200/ 2323 batches | lr 20.00 | ms/batch 9.47 | loss 4.38 | ppl 79.91
| epoch 5 | 400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.53 | ppl 92.42
| epoch 5 | 600/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.42 | ppl 83.08
| epoch 5 | 800/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.40 | ppl 81.46
| epoch 5 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.44 | ppl 84.81
| epoch 5 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.44 | ppl 84.47
| epoch 5 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.34 | ppl 76.87
| epoch 5 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.42 | ppl 83.43
| epoch 5 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.49 | ppl 89.41
| epoch 5 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.30 | ppl 73.41
| epoch 5 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 4.28 | ppl 71.96
-----------------------------------------------------------------------------------------
| end of epoch 5 | time: 22.90s | valid loss 4.89 | valid ppl 132.54
-----------------------------------------------------------------------------------------
| epoch 6 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.32 | ppl 74.99
| epoch 6 | 400/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 4.47 | ppl 87.01
| epoch 6 | 600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.36 | ppl 77.89
| epoch 6 | 800/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.34 | ppl 76.46
| epoch 6 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.38 | ppl 79.95
| epoch 6 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.53 | loss 4.37 | ppl 79.05
| epoch 6 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.29 | ppl 72.78
| epoch 6 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.37 | ppl 79.35
| epoch 6 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.44 | ppl 84.42
| epoch 6 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.24 | ppl 69.63
| epoch 6 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.23 | ppl 68.58
-----------------------------------------------------------------------------------------
| end of epoch 6 | time: 22.92s | valid loss 4.89 | valid ppl 132.85
-----------------------------------------------------------------------------------------
=========================================================================================
| End of training | test loss 4.86 | test ppl 128.44
In the word_language_model example, the suggested RNN_TANH or RNN_RELU model types are not valid arguments. It looks like these haven't been valid members of the API since October 2016 (pytorch/pytorch@b5d1329); instead, the API determines the type of RNN based on the mode argument to the RNN class.
An easy fix would be to check for either RNN_TANH or RNN_RELU and pass that as the mode for the class, or use the LSTM or GRU models directly, as is done currently in the code (by accessing the classes via getattr(nn, 'LSTM'/'GRU')).
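A sketch of that easy fix (my own paraphrase, with placeholder hyperparameters; nn.RNN's nonlinearity argument is the real API knob for this):
import torch.nn as nn

# Placeholder sizes, not the example's defaults.
ninp, nhid, nlayers = 200, 200, 2
model_type = 'RNN_TANH'  # e.g. the value of --model

if model_type in ('LSTM', 'GRU'):
    rnn = getattr(nn, model_type)(ninp, nhid, nlayers)
else:
    # Map the legacy names onto nn.RNN's nonlinearity argument.
    nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[model_type]
    rnn = nn.RNN(ninp, nhid, nlayers, nonlinearity=nonlinearity)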
It's bad form to need to set magic environment variables. I know in Torch there was an issue with other packages going funny, but it seems to matter less in PyTorch. If setting it is standard practice, we should do omp_set_num_threads(1) in the code unless the user overrides it.
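The in-code equivalent, as a sketch (torch.set_num_threads is PyTorch's wrapper for this; checking the environment variable is my assumption about how to respect a user override):
import os
import torch

# Default to one thread unless the user explicitly asked for more.
if 'OMP_NUM_THREADS' not in os.environ:
    torch.set_num_threads(1)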
I'm running the examples/mnist/ code at https://github.com/pytorch/examples/tree/master/mnist.
When I follow the README's instructions I get (running from bash):
(py35) ~/pytorch/examples/mnist$ CUDA_VISIBLE_DEVICES=2 python main.py
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=109 error=38 : no CUDA-capable device is detected
Files already downloaded
Train Epoch: 1 [0/60000 (0%)] Loss: 2.297210
Train Epoch: 1 [640/60000 (1%)] Loss: 2.318286
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.298914
Train Epoch: 1 [1920/60000 (3%)] Loss: 2.317417
Train Epoch: 1 [2560/60000 (4%)] Loss: 2.295015
^C
But my cards are enabled...
~/pytorch/examples/mnist$ nvidia-smi
Sun Mar 19 15:50:14 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39 Driver Version: 375.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 0000:01:00.0 On | N/A |
| 23% 27C P8 10W / 250W | 63MiB / 12186MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN X (Pascal) Off | 0000:02:00.0 Off | N/A |
| 23% 27C P8 8W / 250W | 1MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1094 G /usr/lib/xorg/Xorg 60MiB |
+-----------------------------------------------------------------------------+
...and I've been running other PyTorch code that correctly detects my devices, for example,
(py35) ~/pytorch/examples/mnist$ python
Python 3.5.2 |Anaconda 4.3.1 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> if torch.cuda.is_available():
... print("Using CUDA, number of devices = ",torch.cuda.device_count())
...
Using CUDA, number of devices = 2
>>>
It seems that using the flag CUDA_VISIBLE_DEVICES=2 actually disables CUDA, as shown:
(py35) ~/pytorch/examples/mnist$ CUDA_VISIBLE_DEVICES=2 python
Python 3.5.2 |Anaconda 4.3.1 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> if torch.cuda.is_available():
... print("Using CUDA, number of devices = ",torch.cuda.device_count())
... else:
... print("Can't find CUDA")
...
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=109 error=38 : no CUDA-capable device is detected
Can't find CUDA
Perhaps the README for this example should have that env flag removed, or...?
Hello,
Question 1: In my understanding of the example, when we are predicting the first word in a sentence, the hidden state contains information from the previous sentence. This leads to an unfair comparison (in fact, many LM examples in well-known toolkits don't handle this either; maybe their developers are not LM people). For standard N-gram evaluation, for example, words from the previous sentence are not used.
Question 2: In my understanding of the example, if sentence1 is [x1 x2 x3 x4 x5] and sentence2 is [y1 y2 y3 y4 y5], then the first minibatch could be [x1 x2 x3], the second [x4 x5 y1], etc.
However, according to the paper "Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch", the better way is to set the minibatch like [(x1 y1) (x2 y2) (x3 y3)], so that more parallelism is possible.
Please correct me if my understanding is wrong. Thanks!
After switching to batch first, .size(1) is no longer the batch size.
https://github.com/pytorch/examples/blob/master/OpenNMT/train.py#L123
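A tiny illustration of the point (generic, not the OpenNMT code):
import torch

x = torch.zeros(4, 7, 16)  # batch_first layout: (batch, seq_len, features)
batch_size = x.size(0)     # with batch_first, the batch dim is 0, not 1
assert batch_size == 4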
There is this code in reinforce.py:
for action, r in zip(self.saved_actions, rewards):
    action.reinforce(r)
And this code in actor-critic.py:
for (action, value), r in zip(saved_actions, rewards):
    reward = r - value.data[0, 0]
    action.reinforce(reward)
    value_loss += F.smooth_l1_loss(value, Variable(torch.Tensor([r])))
So I consider this to be Asynchronous Advantage Actor-Critic (A3C), not plain Actor-Critic.
Hi,
I am trying to train a variational auto-encoder as in the given example code. The training works fine, but when I tried out the decoder by feeding it a random sample with the same dimensionality and prior as the original latent space (z in the paper), I ran into an error. I was testing the following method from the model: model.decode()
z = torch.randn(150, 16)
z.cuda()
v = Variable(z)
out = model(v)
TypeError Traceback (most recent call last)
in ()
1 #test the model
2
----> 3 out = model.decode(v)
in decode(self, z)
38 def decode(self, z):
39 #pdb.set_trace()
---> 40 h1 = self.relu(self.fd1(z))
41 h2 = self.relu(self.fd2(h1))
42 h3 = self.fd3d( self.relu( self.fd3( h2 ) ) )
/home/songbird/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
208
209 def __call__(self, *input, **kwargs):
--> 210 result = self.forward(*input, **kwargs)
211 for hook in self._forward_hooks.values():
212 hook_result = hook(self, input, result)
/home/songbird/anaconda2/lib/python2.7/site-packages/torch/nn/modules/linear.pyc in forward(self, input)
52 return self._backend.Linear()(input, self.weight)
53 else:
---> 54 return self._backend.Linear()(input, self.weight, self.bias)
55
56 def __repr__(self):
/home/songbird/anaconda2/lib/python2.7/site-packages/torch/nn/functions/linear.pyc in forward(self, input, weight, bias)
8 self.save_for_backward(input, weight, bias)
9 output = input.new(input.size(0), weight.size(0))
---> 10 output.addmm_(0, 1, input, weight.t())
11 if bias is not None:
12 # cuBLAS doesn't support 0 strides in sger, so we can't use expand
TypeError: addmm_ received an invalid combination of arguments - got (int, int, torch.FloatTensor, torch.cuda.FloatTensor), but expected one of:
/home/songbird/anaconda2/lib/python2.7/site-packages/torch/nn/functions/linear.py(10)forward()
8 self.save_for_backward(input, weight, bias)
9 output = input.new(input.size(0), weight.size(0))
---> 10 output.addmm_(0, 1, input, weight.t())
11 if bias is not None:
12 # cuBLAS doesn't support 0 strides in sger, so we can't use expand
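If it helps others hitting this: a likely culprit (my guess, not confirmed) is that Tensor.cuda() is not in-place; it returns a copy that must be reassigned, so the snippet above still feeds a CPU tensor to a CUDA model. A corrected sketch, using the same names as the snippet:
z = torch.randn(150, 16).cuda()  # .cuda() returns a new tensor; reassign it
v = Variable(z)
out = model.decode(v)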
There are several bugs in these lines: https://github.com/pytorch/examples/blob/master/imagenet/transforms.py#L48
(I only tested this class; I was doing some visualizations.)
I am a regular Caffe user and all of my datasets are prepared in lmdb format. I would like an example of loading data in lmdb/hdf5 format. Does PyTorch support it?
Thanks
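Not an official answer, but a minimal sketch of a custom Dataset wrapping lmdb (it assumes one record per integer key and a user-supplied deserialize function; both are assumptions, not part of the examples repo):
import lmdb
import torch.utils.data as data

class LMDBDataset(data.Dataset):
    """Reads records from an lmdb environment, one per integer key."""

    def __init__(self, path, deserialize):
        self.env = lmdb.open(path, readonly=True, lock=False)
        with self.env.begin() as txn:
            self.length = txn.stat()['entries']
        self.deserialize = deserialize  # bytes -> (sample, label)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        with self.env.begin() as txn:
            raw = txn.get(str(index).encode())
        return self.deserialize(raw)
A torch.utils.data.DataLoader can then batch such a dataset as usual.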
I tried to train AlexNet from scratch, and compare the training time using PyTorch and using Caffe. I'm on a Pascal Titan X, using PyTorch 0.1.11 from pip.
It looks to me like the time taken to compute each batch varies, from 0.1s to 2s (worst case). I used batch size 256 and 22 workers. The data is fetched from a PCIe NVMe SSD, so IO should not be an issue (I think?).
Is this expected, or is this something that can be addressed? Thanks.
Epoch: [0][905/5005] Time 1.317 (0.256) Data 1.264 (0.168) Loss 6.8791 (6.9048) Prec@1 0.000 (0.097) Prec@5 0.391 (0.498)
Epoch: [0][906/5005] Time 0.099 (0.255) Data 0.001 (0.168) Loss 6.9001 (6.9048) Prec@1 0.000 (0.096) Prec@5 0.000 (0.498)
Epoch: [0][907/5005] Time 0.103 (0.255) Data 0.000 (0.168) Loss 6.8702 (6.9047) Prec@1 0.781 (0.097) Prec@5 1.562 (0.499)
Epoch: [0][908/5005] Time 0.102 (0.255) Data 0.001 (0.167) Loss 6.8882 (6.9047) Prec@1 0.000 (0.097) Prec@5 0.000 (0.498)
Epoch: [0][909/5005] Time 0.257 (0.255) Data 0.206 (0.167) Loss 6.8977 (6.9047) Prec@1 0.391 (0.097) Prec@5 0.391 (0.498)
Epoch: [0][910/5005] Time 0.102 (0.255) Data 0.001 (0.167) Loss 6.8973 (6.9047) Prec@1 0.000 (0.097) Prec@5 0.000 (0.498)
Epoch: [0][911/5005] Time 0.603 (0.255) Data 0.552 (0.168) Loss 6.8929 (6.9047) Prec@1 0.000 (0.097) Prec@5 0.391 (0.498)
Epoch: [0][912/5005] Time 0.101 (0.255) Data 0.001 (0.168) Loss 6.8911 (6.9047) Prec@1 0.000 (0.097) Prec@5 0.781 (0.498)
Epoch: [0][913/5005] Time 0.497 (0.255) Data 0.445 (0.168) Loss 6.8757 (6.9046) Prec@1 0.000 (0.097) Prec@5 1.172 (0.499)
Epoch: [0][914/5005] Time 0.111 (0.255) Data 0.001 (0.168) Loss 6.8713 (6.9046) Prec@1 0.000 (0.097) Prec@5 1.172 (0.499)
Epoch: [0][915/5005] Time 0.100 (0.255) Data 0.001 (0.167) Loss 6.8851 (6.9046) Prec@1 0.000 (0.097) Prec@5 0.000 (0.499)
Epoch: [0][916/5005] Time 0.106 (0.255) Data 0.001 (0.167) Loss 6.8716 (6.9045) Prec@1 0.000 (0.097) Prec@5 1.172 (0.500)
Epoch: [0][917/5005] Time 0.156 (0.255) Data 0.105 (0.167) Loss 6.9136 (6.9046) Prec@1 0.000 (0.097) Prec@5 0.391 (0.500)
Epoch: [0][918/5005] Time 0.102 (0.255) Data 0.001 (0.167) Loss 6.8948 (6.9045) Prec@1 0.000 (0.096) Prec@5 0.000 (0.499)
Epoch: [0][919/5005] Time 0.101 (0.255) Data 0.001 (0.167) Loss 6.8860 (6.9045) Prec@1 0.000 (0.096) Prec@5 1.172 (0.500)
Epoch: [0][920/5005] Time 0.101 (0.254) Data 0.001 (0.167) Loss 6.8774 (6.9045) Prec@1 0.000 (0.096) Prec@5 0.391 (0.500)
Epoch: [0][921/5005] Time 0.108 (0.254) Data 0.001 (0.167) Loss 6.8833 (6.9045) Prec@1 0.391 (0.097) Prec@5 0.391 (0.500)
Epoch: [0][922/5005] Time 0.163 (0.254) Data 0.106 (0.166) Loss 6.8969 (6.9045) Prec@1 0.000 (0.096) Prec@5 0.000 (0.499)
Epoch: [0][923/5005] Time 0.251 (0.254) Data 0.194 (0.166) Loss 6.8844 (6.9044) Prec@1 0.000 (0.096) Prec@5 0.391 (0.499)
I noticed this because when I print timing info every 10 epochs, it seems that the average value is far from the value of the current batch.
I saw this code: https://github.com/pytorch/examples/blob/master/mnist/main.py#L31-L33
torch.manual_seed(args.seed)
if args.cuda:
torch.cuda.manual_seed(args.seed)
Why are there two random generators? Is there a reason for this in the design of PyTorch?
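The behavior behind the question, as a small sketch (the CPU and CUDA RNG states are separate generators, which is why the example seeds both):
import torch

torch.manual_seed(42)            # seeds the CPU generator
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)   # seeds the separate CUDA generator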
Attention can be sped up by pre-computing the linear transformation of context, instead of feeding it to the attention at every step.
https://github.com/pytorch/examples/blob/master/OpenNMT/onmt/Models.py#L119
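A sketch of the idea (generic shapes and names, not the actual OpenNMT code): project the encoder context once, before the decoding loop, instead of re-applying the linear layer at every step.
import torch
import torch.nn as nn
from torch.autograd import Variable

batch, src_len, dim = 32, 20, 512
context = Variable(torch.randn(batch, src_len, dim))
linear_context = nn.Linear(dim, dim, bias=False)

# Precompute the projection once, outside the decoding loop...
context_proj = linear_context(context.view(-1, dim)).view(batch, src_len, dim)

# ...so each decoding step only projects the query and scores it
# against the cached projection.
for t in range(10):
    pass  # attend using context_proj instead of re-projecting context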