pytorch / examples
A set of examples around PyTorch in Vision, Text, Reinforcement Learning, etc.
Home Page: https://pytorch.org/examples
License: BSD 3-Clause "New" or "Revised" License
Hi, in your ImageNet main.py code, you do not scale the training images to [0, 1], yet you go on to normalize them with means and stds expressed in that scale:
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
but you do scale the validation images before normalization:
transforms.Scale(256),
Any reason why?
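For reference, here is roughly what the two pipelines look like (a paraphrase of main.py, assuming the torchvision API of that era; note that transforms.ToTensor() is what maps pixel values into [0, 1], while transforms.Scale(256) only resizes the image spatially):
import torchvision.transforms as transforms

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Training pipeline (paraphrased): ToTensor() converts pixels to [0, 1],
# so normalization operates in that scale for training images too.
train_transform = transforms.Compose([
    transforms.RandomSizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

# Validation pipeline (paraphrased): Scale(256) resizes spatially;
# it does not rescale pixel values.
val_transform = transforms.Compose([
    transforms.Scale(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])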
Running the word_language_model example to test GPUs, with the command:
nvprof -o profile.out python main.py --epochs 2 --cuda
nvprof won't terminate after the training finishes, and the output file profile.out keeps growing.
When I press CTRL+C, it prints the following log:
Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
Perhaps some code in pytorch's CUDA backend should be modified.
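A possible workaround sketch (an assumption on my part, not a confirmed fix): flush the profiler explicitly before the script exits by calling the CUDA runtime's cudaProfilerStop through ctypes.
import ctypes

# Flush nvprof's buffered profiling data before the interpreter exits.
# Assumes libcudart.so is on the loader path.
_cudart = ctypes.CDLL('libcudart.so')

def cuda_profiler_stop():
    ret = _cudart.cudaProfilerStop()
    if ret != 0:
        print('cudaProfilerStop returned error code', ret)

# ... training loop runs here ...
cuda_profiler_stop()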
Hi, I ran ImageNet training successfully for 1 epoch, but then it got stuck at testing with no error message. Did this happen to you?
Hello, I was wondering whether it would be possible to have a small code example where the same network is cloned on different GPUs, with all clones sharing the same parameters.
For instance, I would like something where different subprocesses can train the model separately (like 8 subprocesses, each responsible for training a model on one GPU). The updates could then be accumulated to a common network, and all GPU network clones could synchronize their parameters to the ones of the common network periodically, or something like this.
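A minimal sketch of the pattern I mean (modeled loosely on the mnist_hogwild example; the tiny nn.Linear model and the training body are placeholders, not code from this repo):
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(rank, shared_model):
    # Each subprocess trains a local clone on its own GPU.
    local_model = nn.Linear(10, 2).cuda(rank)
    # Periodic sync: pull the shared parameters into the local clone.
    local_model.load_state_dict(shared_model.state_dict())
    # ... train local_model here, then accumulate its updates back into
    # shared_model (e.g. by copying gradients and stepping an optimizer) ...

if __name__ == '__main__':
    shared_model = nn.Linear(10, 2)   # placeholder for the real network
    shared_model.share_memory()       # parameters move into shared memory
    processes = []
    for rank in range(torch.cuda.device_count()):
        p = mp.Process(target=worker, args=(rank, shared_model))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()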
I'm trying to use this code as a starting point for building GANs from my own image data: 512x512 grayscale images. If I change any of the default arguments (e.g. --imageSize 512), I get the following error:
Traceback (most recent call last):
File "main.py", line 209, in <module>
errD_real = criterion(output, label)
File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/opt/python/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 36, in forward
return backend_fn(self.size_average, weight=self.weight)(input, target)
File "/opt/python/lib/python3.6/site-packages/torch/nn/_functions/thnn/loss.py", line 22, in forward
assert input.nelement() == target.nelement()
AssertionError
Still learning my way around PyTorch, so the network architectures printed before the above message don't yet give me much intuition. I appreciate any pointers you can give!
Are there any examples using multiprocessing? It should be faster to use a separate thread or process to generate batches and manipulate the data.
Actually a pytorch issue rather than an issue with the examples. It is noted there, so this issue can be ignored: pytorch/pytorch#467
Line 177 of the word language model should divide the learning rate by 4.0, not 4 (float vs. integer division), for proper learning-rate decay.
https://github.com/pytorch/examples/blob/master/word_language_model/main.py#L177
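The proposed change, as a one-line sketch (relevant under Python 2 semantics, where / between two integers truncates):
lr = lr / 4.0   # float literal: avoids Python 2 integer truncation (lr / 4)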
I see you using convolutions to reduce the size of the image instead of pooling.
Is pooling's performance bad?
memoryEfficientLoss does not split along the batch dimension, which is dimension 1.
https://github.com/pytorch/examples/blob/master/OpenNMT/train.py#L138
The loop that is supposed to break gradient sharing in mnist_hogwild doesn't seem to be doing anything: param.grad is not None evaluates to false, since param.grad is allocated lazily, in the subprocesses. There, every process allocates gradient tensors separately (I think?), so there might be no need to break gradient sharing manually at all.
Hi,
I just realized that train.py is printing speed in target tokens per second (cf. train.py#184).
It turns out that, for the same process, OpenNMT (Lua) prints source tokens/s.
This is quite misleading for people benchmarking both solutions.
Note that in the case of text summarization, target tokens/s and source tokens/s are very different (by 5 to 10 times). At first, for the exact same task (same number of parameters, same batch size), it looked like PyONMT was 9x "slower" than Lua ONMT.
I'm not sure why you chose tgt tokens/s, but I guess it would be easier for users to have the same metric.
Thanks for PyTorch, thanks for PyONMT, very nice work here :)
pltrdy
Unlike in the GAN paper, there is no
for p in netD.parameters():
p.requires_grad = False # to avoid computation
when updating the generator. Is this on purpose or a mistake?
In @soumith's torch reference implementation, D is fixed when updating G:
local df_do = criterion:backward(output, label)
local df_dg = netD:updateGradInput(input, df_do)
netG:backward(noise, df_dg)
I have installed both torch and gym with
apt-get install -y python-numpy python-dev cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
What could be wrong when running
/examples/reinforcement_learning$ python reinforce.py
The same occurs for the torch module, after it has been installed, when I run
reinforcement_q_learning.ipynb
I am experimenting with Soumith's ImageNet example, but it is crashing or deadlocking in three different ways. I have added a bunch of "print" statements to figure out where it is crashing; here is a gist of the full script (as you can see, there are almost no significant modifications to the original code). All code is running on 2x NVIDIA Titan X 12 GB cards with 96 GB RAM.
https://gist.github.com/FuriouslyCurious/81742b8126f07f919522a588147e6086
How to reproduce:
python train.py -a resnet18 -j 1 -b 2 /home/FC/data/P/
Output:
=> Parsing complete...
=> creating model 'resnet18'
=> Using CUDA DataParallel
=> Starting training images loading...
=> Starting validation images loading...
=> Loss criterion and optimizer setup
=> Starting training...
=> Training Epoch 0
Traceback (most recent call last):
File "train.py", line 299, in <module>
main()
File "train.py", line 140, in main
train(train_loader, model, criterion, optimizer, epoch)
File "train.py", line 177, in train
output = model(input_var)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 202, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 92, in forward
outputs = self.parallel_apply(replicas, scattered, gpu_dicts)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 102, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 50, in parallel_apply
raise output
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 30, in _worker
output = module(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 202, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torchvision-0.1.6-py3.5.egg/torchvision/models/resnet.py", line 150, in forward
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 202, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 54, in forward
return self._backend.Linear()(input, self.weight, self.bias)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/_functions/linear.py", line 10, in forward
output.addmm_(0, 1, input, weight.t())
RuntimeError: size mismatch at /data/users/soumith/miniconda2/conda-bld/pytorch-cuda80-0.1.10_1488757768560/work/torch/lib/THC/generic/THCTensorMathBlas.cu:241
How to reproduce:
python train.py -a resnet18 /home/FC/data/P
=> Parsing complete...
=> creating model 'resnet18'
=> Using CUDA DataParallel
=> Starting training images loading...
=> Starting validation images loading...
=> Loss criterion and optimizer setup
=> Starting training...
=> Training Epoch 0
^CProcess Process-4:
Process Process-3:
Traceback (most recent call last):
Traceback (most recent call last):
File "train.py", line 299, in <module>
main()
File "train.py", line 140, in main
train(train_loader, model, criterion, optimizer, epoch)
File "train.py", line 168, in train
for i, (input, target) in enumerate(train_loader):
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 168, in __next__
idx, batch = self.data_queue.get()
File "/conda3/envs/idp/lib/python3.5/queue.py", line 164, in get
self.not_empty.wait()
File "/conda3/envs/idp/lib/python3.5/threading.py", line 293, in wait
waiter.acquire()
Traceback (most recent call last):
File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 26, in _worker_loop
r = index_queue.get()
File "/conda3/envs/idp/lib/python3.5/multiprocessing/queues.py", line 342, in get
with self._rlock:
File "/conda3/envs/idp/lib/python3.5/multiprocessing/synchronize.py", line 96, in __enter__
return self._semlock.__enter__()
File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 249, in _bootstrap
self.run()
File "/conda3/envs/idp/lib/python3.5/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 26, in _worker_loop
r = index_queue.get()
How to reproduce:
python train.py -a resnet152 -j 1 -b 1 /home/FC/data/P/
=> Parsing complete...
=> creating model 'resnet152'
=> Using CUDA DataParallel
=> Starting training images loading...
=> Starting validation images loading...
=> Loss criterion and optimizer setup
=> Starting training...
=> Training Epoch 0
^CTraceback (most recent call last):
File "train.py", line 298, in <module>
main()
File "train.py", line 139, in main
train(train_loader, model, criterion, optimizer, epoch)
File "train.py", line 167, in train
for i, (input, target) in enumerate(train_loader):
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/utils/data/dataloader.py", line 168, in __next__
idx, batch = self.data_queue.get()
File "/conda3/envs/idp/lib/python3.5/queue.py", line 164, in get
self.not_empty.wait()
File "/conda3/envs/idp/lib/python3.5/threading.py", line 293, in wait
waiter.acquire()
KeyboardInterrupt
I didn't see the optimizer used in the training code. Is this a bug?
translate.py can't be run with the same batch size as train.py. The reason is that, without backprop, intermediate variables are not destroyed, which causes memory overflow. Marking the input variable as volatile solves this problem.
https://github.com/pytorch/examples/blob/master/OpenNMT/onmt/Dataset.py#L31
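The fix being described, as a sketch (pre-0.4 autograd API; the placeholder tensor stands in for the real source batch):
import torch
from torch.autograd import Variable

# Placeholder batch; in translate.py this would be the source batch.
src_tensor = torch.LongTensor(64, 50).zero_()

# volatile=True tells autograd not to build a graph, so intermediate
# buffers are freed immediately during inference.
src = Variable(src_tensor, volatile=True)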
Starting work on a basic OpenAI gym RL example in a fork.
Based on torch-twrl and several basic tensorflow RL examples.
Details to come.
Would it be possible to add an MIT/BSD license to these examples? It's hard to use them as a starting point without clear license guidance.
Hi!
First, thanks for the great work on the provided examples. I enjoyed playing around with both the mnist and the dcgan examples!
On the vae example, I ran into the following issue.
It works fine on the CPU, but when I run it on a GPU device with CUDA installed, I get the following stack trace:
Traceback (most recent call last):
File "main.py", line 130, in <module>
train(epoch)
File "var.py", line 102, in train
recon_batch, mu, logvar = model(data)
File "/gpfs/workdir/hassony/virtual-python/lib/python3.5/site-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "var.py", line 67, in forward
mu, logvar = self.encode(x.view(-1, 784))
File "var.py", line 54, in encode
h1 = self.relu(self.fc1(x))
File "/gpfs/workdir/hassony/virtual-python/lib/python3.5/site-packages/torch/nn/modules/module.py", line 210, in __call__
result = self.forward(*input, **kwargs)
File "/gpfs/workdir/hassony/virtual-python/lib/python3.5/site-packages/torch/nn/modules/linear.py", line 53, in forward
return self._backend.Linear()(input, self.weight, self.bias)
File "/gpfs/workdir/hassony/virtual-python/lib/python3.5/site-packages/torch/nn/_functions/linear.py", line 10, in forward
output.addmm_(0, 1, input, weight.t())
TypeError: addmm_ received an invalid combination of arguments - got (int, int, torch.FloatTensor, torch.cuda.FloatTensor), but expected one of:
* (torch.FloatTensor mat1, torch.FloatTensor mat2)
* (torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
* (float beta, torch.FloatTensor mat1, torch.FloatTensor mat2)
* (float alpha, torch.FloatTensor mat1, torch.FloatTensor mat2)
* (float beta, torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
* (float alpha, torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
* (float beta, float alpha, torch.FloatTensor mat1, torch.FloatTensor mat2)
* (float beta, float alpha, torch.SparseFloatTensor mat1, torch.FloatTensor mat2)
The weights being a cuda tensor instead of a regular one seems to be the problem, and I haven't found a way around it yet.
I would greatly appreciate any hint if you have an idea on how to fix this.
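For reference, the kind of fix I would try (a sketch, assuming the root cause is that the input batch stays on the CPU while the model is on the GPU; args, model, and data are the example's own names):
# Inside the example's training loop (pre-0.4 API), move the batch to
# the GPU before wrapping it, mirroring what is done for the model:
if args.cuda:
    data = data.cuda()
data = Variable(data)
recon_batch, mu, logvar = model(data)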
All the best,
Hi! I'm about to start working on siamese/triplet architectures. Are there any considerations to take into account? Regarding the loss functions, is there any example using the autograd stuff?
When I ran the language model example, generate.py blew up GPU memory as it generated sentences (starting from ~500MB and growing to ~4GB). In the end I got an out-of-memory error: RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.5_1479441063232/work/torch/lib/THC/generic/THCStorage.cu:65
Some info:
cc: @adamlerer
Hi Soumith,
Do you have an example of single-image classification code? I am trying to load a checkpointed model and classify a single image with the code below, but I get a "3D tensor expected" error.
Code:
# Bunch of imports go here
# Convert image to Variable
def Torchify(aImage):
    ptLoader = transforms.Compose([transforms.ToTensor()])
    aImage = ptLoader(aImage).float()
    aImage = Variable(aImage, volatile=True)
    return aImage.cuda()
# Load model from Checkpoint
print("=> Loading Network")
ptModelAxial = densenet.__dict__['densenet161'](pretrained=False, num_classes=5)
ptModelAxial.classifier = nn.Linear(8832, 5)
ptModelAxial = torch.nn.DataParallel(ptModelAxial).cuda()
dTemp = torch.load("best.pth.tar")
ptModelAxial.load_state_dict(dTemp['state_dict'])
for p in ptModelAxial.parameters():
    p.requires_grad = False
ptModelAxial.eval()
InputImg = skimage.img_as_float(skimage.io.imread(sFileName))
ptModelPreds = ptModelAxial( Torchify(InputImg) )
print( ptModelPreds )
Error message:
Traceback (most recent call last):
File "extract.py", line 298
ptModelPreds = ptModelAxial( Torchify(InputImg) )
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 61, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 71, in parallel_apply
return parallel_apply(replicas, inputs, kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 45, in parallel_apply
raise output
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 25, in _worker
output = module(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/home/keyur/kaggle/densenet.py", line 153, in forward
features = self.features(x)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/container.py", line 64, in forward
input = module(input)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/module.py", line 206, in __call__
result = self.forward(*input, **kwargs)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/modules/conv.py", line 237, in forward
self.padding, self.dilation, self.groups)
File "/conda3/envs/idp/lib/python3.5/site-packages/torch/nn/functional.py", line 39, in conv2d
return f(input, weight, bias)
RuntimeError: expected 3D tensor
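One thing worth checking (my assumption, not a confirmed diagnosis): a grayscale image loaded with skimage is a 2D array, so the tensor reaching the convolution may be missing a channel and/or batch dimension. A sketch of the shape fix inside Torchify:
def Torchify(aImage):
    ptLoader = transforms.Compose([transforms.ToTensor()])
    aImage = ptLoader(aImage).float()
    if aImage.dim() == 2:          # grayscale: add a channel dimension
        aImage = aImage.unsqueeze(0)
    aImage = aImage.unsqueeze(0)   # add a batch dimension: [1, C, H, W]
    aImage = Variable(aImage, volatile=True)
    return aImage.cuda()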
@ludc and @korymath are interested in building out some RL algorithms and doing OpenAI Gym integration.
Kory, from his repo (https://github.com/korymath/examples/tree/master/rl), hasn't yet started on anything concrete.
If each of you declares here what you are doing before you start developing it, then I think the other person can avoid overlap.
It seems that as training progresses, the per-batch time gets slower and slower.
For example, I run CUDA_VISIBLE_DEVICES=0 python main.py -a alexnet --lr 0.01 --workers 22 /ssd/cv_datasets/ILSVRC2015/Data/CLS-LOC
Initially I get an average per-batch time of about 0.25s.
After several batches, I get 0.5s.
I run top and find that most of the memory (128GB) is occupied.
How to fix this?
Hi, I am wondering why detach is necessary in this line:
Line 230 in a60bd4e
I understand that we want to update the gradients of netD without changing the ones of netG. But if the optimizer is only using the parameters of netD, then only its weights will be updated. Am I missing something here?
Thanks in advance!
I was not able to get adjust_learning_rate working unless I changed the code at line 266 in main.py from
for param_group in optimizer.state_dict()['param_groups']:
to
for param_group in optimizer.param_groups:
Sometimes, the training process simply gets stuck at testing.
Epoch: [0][5000/5005] Time 0.100 (0.335) Data 0.000 (0.244) Loss 5.9800 (6.5614) Prec@1 1.953 (0.735) Prec@5 7.812 (2.896)
Test: [0/196] Time 7.905 (7.905) Loss 4.1344 (4.1344) Prec@1 16.016 (16.016) Prec@5 51.562 (51.562)
Or, more frequently, the line Test: [0/196] won't appear and the whole process gets stuck at the line Epoch: [0][5000/5005].
It has been like this for several hours, and looking at top, no processes are using the CPU.
I called CUDA_VISIBLE_DEVICES=1 PYTHONUNBUFFERED=1 python main.py -a alexnet --print-freq 20 --lr 0.01 --workers 20 --batch-size 256 /ssd/cv_datasets/ILSVRC2015/Data/CLS-LOC 2>&1 | tee alexnet_train.log
to train the network.
This appears both on a CentOS 6 machine as well as a Ubuntu 14.04 machine.
Is the implementation of Bottle in examples/snli/model.py unfinished?
It would be cool to have the same functionality as Torch's nn.Bottle(), which allows input of varying dimensionality. :-)
Hi,
I was trying the OpenNMT example.
It seems that the hidden state of the decoder is not updated at each step. Models.py L118
I tried changing
output, h = self.rnn(emb_t, hidden)
to
output, h = self.rnn(emb_t, h)
and added h = hidden before the loop.
Both training and validation perplexities improved after the change.
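For context, the decoder loop after the proposed change looks roughly like this (my paraphrase, not the exact OpenNMT code; sizes are placeholders):
import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.LSTM(8, 16)
emb = Variable(torch.randn(5, 2, 8))        # (seq_len, batch, dim)
hidden = (Variable(torch.zeros(1, 2, 16)),  # initial decoder state
          Variable(torch.zeros(1, 2, 16)))

h = hidden                                  # added before the loop
outputs = []
for emb_t in emb.split(1):
    output, h = rnn(emb_t, h)               # was: rnn(emb_t, hidden)
    outputs.append(output)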
In https://github.com/pytorch/examples/blob/master/imagenet/main.py#L68-L72, it seems that special care has to be taken when wrapping the module with DataParallel. Why is this the case? Also, I don't understand why, for AlexNet and VGG, features is wrapped yet classifier is not.
I train the SNLI model with the example training code, and then I try to load a sample of the test dataset (~100 instances) and classify it using the code below. The accuracy I get is 35%. That's too low, since the best trained model gets around 78% on the validation set, so something must be wrong. I tried classifying a sample of the training set as well, and the model performed poorly there too, which confirmed that something is not working. Is it something in my code?
Another weird thing: len(answers.vocab) returns 4, although it should be 3, since the labels are neutral, entail, contradict. This doesn't affect much, since none of the predicted labels refer to unk, which is the extra label in answers.vocab.
inputs = data.Field(lower=args.lower)
answers = data.Field(sequential=False)
test = data.TabularDataset(
    path=[path/to/test/data], format='json',
    fields={'sentence1': ('premise', inputs),
            'sentence2': ('hypothesis', inputs),
            'gold_label': ('label', answers)},
    filter_pred=lambda ex: ex.label != '-')
inputs.build_vocab(test)
inputs.vocab.vectors = torch.load([path/to/vector/cached])
answers.build_vocab(test)
# test iterator has batch size equal to length --> full test set
# test_iter = data.BucketIterator(test, batch_size=len(test), device=args.gpu, sort_key=lambda ex: len(ex.premise) + len(ex.hypothesis))
test_iter = data.Iterator(test, batch_size=len(test), device=args.gpu)
model = torch.load([path/to/model/snapshot], map_location=lambda storage, location: storage.cuda(args.gpu))
test_full_batch = next(iter(test_iter)) # there will be only 1 batch
predicted_scores = model(test_full_batch)
predicted_labels = torch.max(predicted_scores, 1)[1].view(test_full_batch.label.size()).data
n_correct = (predicted_labels == test_full_batch.label.data).sum()
n_total = test_full_batch.batch_size
train_acc = 100. * n_correct/n_total
print("train accuracy - %f" % train_acc)
During training with examples/imagenet/main.py, I used the following command:
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python main.py [options] path/to/imagenetdir 1>a.log 2>a.err &
Then it starts 5 processes in the system; 1 main process appears in nvidia-smi.
Most of the time (90%), after I kill the main process, GPU usage drops to 0%, so I can kill the other 4 to release GPU memory and start a new training task. Sometimes (10% of the time), after I kill these 5 processes, the main process remains as "python [defunct]" and cannot be killed even by sudo kill -s 9. The GPU usage and GPU memory are not released.
Multi-GPU training happens where I use the following line in my code:
model = torch.nn.DataParallel(model).cuda()
Please give some hints on how to correctly kill multi-GPU training PyTorch processes.
Thanks.
In the VAE example, although the model is converted to CUDA if CUDA support is present, the data and the loss functions are not, resulting in an error when training on a GPU.
A barebones port of https://github.com/Mostafa-Samir/DNC-tensorflow showing just the read/write system in PyTorch would be fantastic.
Thanks!
The test perplexity didn't reach the documented ppl of 113.
GTX 1070
Driver Version: 367.57
cuDNN: 5
CUDA: 8.0
Intel i7 3770
| epoch 1 | 200/ 2323 batches | lr 20.00 | ms/batch 15.86 | loss 6.78 | ppl 883.54
| epoch 1 | 400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 6.11 | ppl 451.70
| epoch 1 | 600/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 5.81 | ppl 332.98
| epoch 1 | 800/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 5.65 | ppl 283.32
| epoch 1 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.53 | ppl 252.06
| epoch 1 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.47 | loss 5.45 | ppl 232.68
| epoch 1 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.29 | ppl 197.84
| epoch 1 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.40 | loss 5.27 | ppl 193.50
| epoch 1 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 5.26 | ppl 192.84
| epoch 1 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.52 | loss 5.11 | ppl 165.52
| epoch 1 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 5.00 | ppl 149.01
-----------------------------------------------------------------------------------------
| end of epoch 1 | time: 24.19s | valid loss 5.15 | valid ppl 172.34
-----------------------------------------------------------------------------------------
| epoch 2 | 200/ 2323 batches | lr 20.00 | ms/batch 9.50 | loss 5.01 | ppl 150.18
| epoch 2 | 400/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 5.07 | ppl 159.75
| epoch 2 | 600/ 2323 batches | lr 20.00 | ms/batch 9.48 | loss 4.97 | ppl 143.50
| epoch 2 | 800/ 2323 batches | lr 20.00 | ms/batch 9.71 | loss 4.92 | ppl 137.16
| epoch 2 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.92 | ppl 136.96
| epoch 2 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.89 | ppl 133.62
| epoch 2 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.78 | ppl 118.79
| epoch 2 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.83 | ppl 125.03
| epoch 2 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.87 | ppl 130.80
| epoch 2 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.69 | ppl 109.35
| epoch 2 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.64 | ppl 103.29
-----------------------------------------------------------------------------------------
| end of epoch 2 | time: 22.96s | valid loss 4.96 | valid ppl 142.18
-----------------------------------------------------------------------------------------
| epoch 3 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.67 | ppl 106.62
| epoch 3 | 400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.79 | ppl 120.30
| epoch 3 | 600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.68 | ppl 107.72
| epoch 3 | 800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.65 | ppl 104.60
| epoch 3 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.67 | ppl 106.95
| epoch 3 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.66 | ppl 105.12
| epoch 3 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.55 | ppl 94.70
| epoch 3 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.62 | ppl 101.98
| epoch 3 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.68 | ppl 108.26
| epoch 3 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.48 | ppl 88.55
| epoch 3 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.45 | ppl 85.87
-----------------------------------------------------------------------------------------
| end of epoch 3 | time: 22.89s | valid loss 4.90 | valid ppl 133.71
-----------------------------------------------------------------------------------------
| epoch 4 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.48 | ppl 88.58
| epoch 4 | 400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.63 | ppl 102.72
| epoch 4 | 600/ 2323 batches | lr 20.00 | ms/batch 9.48 | loss 4.52 | ppl 91.82
| epoch 4 | 800/ 2323 batches | lr 20.00 | ms/batch 9.58 | loss 4.50 | ppl 89.90
| epoch 4 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.57 | loss 4.53 | ppl 92.52
| epoch 4 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.59 | loss 4.52 | ppl 91.63
| epoch 4 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.42 | ppl 82.96
| epoch 4 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.50 | ppl 90.31
| epoch 4 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.57 | ppl 96.44
| epoch 4 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.37 | ppl 78.93
| epoch 4 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.34 | ppl 77.00
-----------------------------------------------------------------------------------------
| end of epoch 4 | time: 23.00s | valid loss 4.89 | valid ppl 133.30
-----------------------------------------------------------------------------------------
| epoch 5 | 200/ 2323 batches | lr 20.00 | ms/batch 9.47 | loss 4.38 | ppl 79.91
| epoch 5 | 400/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.53 | ppl 92.42
| epoch 5 | 600/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.42 | ppl 83.08
| epoch 5 | 800/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.40 | ppl 81.46
| epoch 5 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.44 | ppl 84.81
| epoch 5 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.44 | ppl 84.47
| epoch 5 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.34 | ppl 76.87
| epoch 5 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.42 | ppl 83.43
| epoch 5 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.49 | ppl 89.41
| epoch 5 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.30 | ppl 73.41
| epoch 5 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 4.28 | ppl 71.96
-----------------------------------------------------------------------------------------
| end of epoch 5 | time: 22.90s | valid loss 4.89 | valid ppl 132.54
-----------------------------------------------------------------------------------------
| epoch 6 | 200/ 2323 batches | lr 20.00 | ms/batch 9.49 | loss 4.32 | ppl 74.99
| epoch 6 | 400/ 2323 batches | lr 20.00 | ms/batch 9.46 | loss 4.47 | ppl 87.01
| epoch 6 | 600/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.36 | ppl 77.89
| epoch 6 | 800/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.34 | ppl 76.46
| epoch 6 | 1000/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.38 | ppl 79.95
| epoch 6 | 1200/ 2323 batches | lr 20.00 | ms/batch 9.53 | loss 4.37 | ppl 79.05
| epoch 6 | 1400/ 2323 batches | lr 20.00 | ms/batch 9.44 | loss 4.29 | ppl 72.78
| epoch 6 | 1600/ 2323 batches | lr 20.00 | ms/batch 9.43 | loss 4.37 | ppl 79.35
| epoch 6 | 1800/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.44 | ppl 84.42
| epoch 6 | 2000/ 2323 batches | lr 20.00 | ms/batch 9.45 | loss 4.24 | ppl 69.63
| epoch 6 | 2200/ 2323 batches | lr 20.00 | ms/batch 9.42 | loss 4.23 | ppl 68.58
-----------------------------------------------------------------------------------------
| end of epoch 6 | time: 22.92s | valid loss 4.89 | valid ppl 132.85
-----------------------------------------------------------------------------------------
=========================================================================================
| End of training | test loss 4.86 | test ppl 128.44
In the word_language_model example, the suggested RNN_TANH or RNN_RELU model types are not valid arguments. It looks like these haven't been valid members of the API since October 2016 (pytorch/pytorch@b5d1329); instead, the API determines the type of RNN based on the mode argument to the RNN class.
An easy fix would be to check for either RNN_TANH or RNN_RELU and pass that as the mode for the class, or use the LSTM or GRU models directly, as is done currently in the code (by accessing the classes via getattr(nn, 'LSTM'/'GRU')).
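A sketch of that easy fix (my own paraphrase, with placeholder hyperparameters; nn.RNN's nonlinearity argument is the real API knob for this):
import torch.nn as nn

# Placeholder sizes, not the example's defaults.
ninp, nhid, nlayers = 200, 200, 2
model_type = 'RNN_TANH'  # e.g. the value of --model

if model_type in ('LSTM', 'GRU'):
    rnn = getattr(nn, model_type)(ninp, nhid, nlayers)
else:
    # Map the legacy names onto nn.RNN's nonlinearity argument.
    nonlinearity = {'RNN_TANH': 'tanh', 'RNN_RELU': 'relu'}[model_type]
    rnn = nn.RNN(ninp, nhid, nlayers, nonlinearity=nonlinearity)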
It's bad form to need to set magic environment variables. I know in Torch there was an issue with other packages going funny, but it seems to matter less in PyTorch. If setting it is standard practice, we should do omp_set_num_threads(1) in the code unless the user overrides it.
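The in-code equivalent, as a sketch (torch.set_num_threads is PyTorch's wrapper for this; checking the environment variable is my assumption about how to respect a user override):
import os
import torch

# Default to one thread unless the user explicitly asked for more.
if 'OMP_NUM_THREADS' not in os.environ:
    torch.set_num_threads(1)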
I'm running the examples/mnist/ code at https://github.com/pytorch/examples/tree/master/mnist.
When I follow the README's instructions I get (running from bash):
(py35) ~/pytorch/examples/mnist$ CUDA_VISIBLE_DEVICES=2 python main.py
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=109 error=38 : no CUDA-capable device is detected
Files already downloaded
Train Epoch: 1 [0/60000 (0%)] Loss: 2.297210
Train Epoch: 1 [640/60000 (1%)] Loss: 2.318286
Train Epoch: 1 [1280/60000 (2%)] Loss: 2.298914
Train Epoch: 1 [1920/60000 (3%)] Loss: 2.317417
Train Epoch: 1 [2560/60000 (4%)] Loss: 2.295015
^C
But my cards are enabled...
~/pytorch/examples/mnist$ nvidia-smi
Sun Mar 19 15:50:14 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.39 Driver Version: 375.39 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN X (Pascal) Off | 0000:01:00.0 On | N/A |
| 23% 27C P8 10W / 250W | 63MiB / 12186MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 TITAN X (Pascal) Off | 0000:02:00.0 Off | N/A |
| 23% 27C P8 8W / 250W | 1MiB / 12189MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1094 G /usr/lib/xorg/Xorg 60MiB |
+-----------------------------------------------------------------------------+
...and I've been running other PyTorch code that correctly detects my devices, for example,
(py35) ~/pytorch/examples/mnist$ python
Python 3.5.2 |Anaconda 4.3.1 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> if torch.cuda.is_available():
... print("Using CUDA, number of devices = ",torch.cuda.device_count())
...
Using CUDA, number of devices = 2
>>>
It seems that using the flag CUDA_VISIBLE_DEVICES=2 actually disables CUDA, as shown:
(py35) ~/pytorch/examples/mnist$ CUDA_VISIBLE_DEVICES=2 python
Python 3.5.2 |Anaconda 4.3.1 (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> if torch.cuda.is_available():
... print("Using CUDA, number of devices = ",torch.cuda.device_count())
... else:
... print("Can't find CUDA")
...
THCudaCheck FAIL file=torch/csrc/cuda/Module.cpp line=109 error=38 : no CUDA-capable device is detected
Can't find CUDA
Perhaps the README for this example should have that env flag removed, or...?
Hello,
Question 1: In my understanding of the example, when we are predicting the first word in a sentence, the hidden state contains information from the previous sentence. This leads to an unfair comparison (in fact, many LM examples in well-known toolkits don't handle this either; maybe their developers are not LM people). For standard N-gram evaluation, for example, words from the previous sentence are not used.
Question 2: In my understanding of the example, if sentence1 is [x1 x2 x3 x4 x5] and sentence2 is [y1 y2 y3 y4 y5], then the first minibatch could be [x1 x2 x3], the second [x4 x5 y1], etc.
However, according to the paper "Efficient GPU-based Training of Recurrent Neural Network Language Models Using Spliced Sentence Bunch", the better way is to set the minibatch like [(x1 y1) (x2 y2) (x3 y3)], so that more parallelism is possible.
Please correct me if my understanding is wrong. Thanks!
After switching to batch first, .size(1) is no longer the batch size.
https://github.com/pytorch/examples/blob/master/OpenNMT/train.py#L123
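A tiny illustration of the point (generic, not the OpenNMT code):
import torch

x = torch.zeros(4, 7, 16)  # batch_first layout: (batch, seq_len, features)
batch_size = x.size(0)     # with batch_first, the batch dim is 0, not 1
assert batch_size == 4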
There is this code in reinforce.py:
for action, r in zip(self.saved_actions, rewards):
    action.reinforce(r)
And this code in actor-critic.py:
for (action, value), r in zip(saved_actions, rewards):
    reward = r - value.data[0, 0]
    action.reinforce(reward)
    value_loss += F.smooth_l1_loss(value, Variable(torch.Tensor([r])))
So I consider this to be Asynchronous Advantage Actor-Critic (A3C), not plain Actor-Critic.
Hi,
I am trying to train a variational auto-encoder as in the given example code. The training works fine, but when I tried out the decoder by feeding it a random sample with the same dimensionality and prior as the original latent space (z in the paper), I ran into an error. I was testing the following method from the model: model.decode()
z = torch.randn(150, 16)
z.cuda()
v = Variable(z)
out = model(v)
TypeError Traceback (most recent call last)
in ()
1 #test the model
2
----> 3 out = model.decode(v)
in decode(self, z)
38 def decode(self, z):
39 #pdb.set_trace()
---> 40 h1 = self.relu(self.fd1(z))
41 h2 = self.relu(self.fd2(h1))
42 h3 = self.fd3d( self.relu( self.fd3( h2 ) ) )
/home/songbird/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.pyc in __call__(self, *input, **kwargs)
208
209 def __call__(self, *input, **kwargs):
--> 210 result = self.forward(*input, **kwargs)
211 for hook in self._forward_hooks.values():
212 hook_result = hook(self, input, result)
/home/songbird/anaconda2/lib/python2.7/site-packages/torch/nn/modules/linear.pyc in forward(self, input)
52 return self._backend.Linear()(input, self.weight)
53 else:
---> 54 return self._backend.Linear()(input, self.weight, self.bias)
55
56 def __repr__(self):
/home/songbird/anaconda2/lib/python2.7/site-packages/torch/nn/functions/linear.pyc in forward(self, input, weight, bias)
8 self.save_for_backward(input, weight, bias)
9 output = input.new(input.size(0), weight.size(0))
---> 10 output.addmm_(0, 1, input, weight.t())
11 if bias is not None:
12 # cuBLAS doesn't support 0 strides in sger, so we can't use expand
TypeError: addmm_ received an invalid combination of arguments - got (int, int, torch.FloatTensor, torch.cuda.FloatTensor), but expected one of:
/home/songbird/anaconda2/lib/python2.7/site-packages/torch/nn/functions/linear.py(10)forward()
8 self.save_for_backward(input, weight, bias)
9 output = input.new(input.size(0), weight.size(0))
---> 10 output.addmm_(0, 1, input, weight.t())
11 if bias is not None:
12 # cuBLAS doesn't support 0 strides in sger, so we can't use expand
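If it helps others hitting this: a likely culprit (my guess, not confirmed) is that Tensor.cuda() is not in-place; it returns a copy that must be reassigned, so the snippet above still feeds a CPU tensor to a CUDA model. A corrected sketch, using the same names as the snippet:
z = torch.randn(150, 16).cuda()  # .cuda() returns a new tensor; reassign it
v = Variable(z)
out = model.decode(v)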
There are several bugs in these lines: https://github.com/pytorch/examples/blob/master/imagenet/transforms.py#L48
(I only tested this class; I was doing some visualizations.)
I am a regular Caffe user and all of my datasets are prepared in lmdb format. I would like an example of loading data in lmdb/hdf5 format. Does PyTorch support it?
Thanks
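Not an official answer, but a minimal sketch of a custom Dataset wrapping lmdb (it assumes one record per integer key and a user-supplied deserialize function; both are assumptions, not part of the examples repo):
import lmdb
import torch.utils.data as data

class LMDBDataset(data.Dataset):
    """Reads records from an lmdb environment, one per integer key."""

    def __init__(self, path, deserialize):
        self.env = lmdb.open(path, readonly=True, lock=False)
        with self.env.begin() as txn:
            self.length = txn.stat()['entries']
        self.deserialize = deserialize  # bytes -> (sample, label)

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        with self.env.begin() as txn:
            raw = txn.get(str(index).encode())
        return self.deserialize(raw)
A torch.utils.data.DataLoader can then batch such a dataset as usual.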
I tried to train AlexNet from scratch, and compare the training time using PyTorch and using Caffe. I'm on a Pascal Titan X, using PyTorch 0.1.11 from pip.
It looks to me like the time taken to compute each batch varies, from 0.1s to 2s (worst case). I used batch size 256 and 22 workers. The data is fetched from a PCIe NVMe SSD, so IO should not be an issue (I think?).
Is this expected, or is this something that can be addressed? Thanks.
Epoch: [0][905/5005] Time 1.317 (0.256) Data 1.264 (0.168) Loss 6.8791 (6.9048) Prec@1 0.000 (0.097) Prec@5 0.391 (0.498)
Epoch: [0][906/5005] Time 0.099 (0.255) Data 0.001 (0.168) Loss 6.9001 (6.9048) Prec@1 0.000 (0.096) Prec@5 0.000 (0.498)
Epoch: [0][907/5005] Time 0.103 (0.255) Data 0.000 (0.168) Loss 6.8702 (6.9047) Prec@1 0.781 (0.097) Prec@5 1.562 (0.499)
Epoch: [0][908/5005] Time 0.102 (0.255) Data 0.001 (0.167) Loss 6.8882 (6.9047) Prec@1 0.000 (0.097) Prec@5 0.000 (0.498)
Epoch: [0][909/5005] Time 0.257 (0.255) Data 0.206 (0.167) Loss 6.8977 (6.9047) Prec@1 0.391 (0.097) Prec@5 0.391 (0.498)
Epoch: [0][910/5005] Time 0.102 (0.255) Data 0.001 (0.167) Loss 6.8973 (6.9047) Prec@1 0.000 (0.097) Prec@5 0.000 (0.498)
Epoch: [0][911/5005] Time 0.603 (0.255) Data 0.552 (0.168) Loss 6.8929 (6.9047) Prec@1 0.000 (0.097) Prec@5 0.391 (0.498)
Epoch: [0][912/5005] Time 0.101 (0.255) Data 0.001 (0.168) Loss 6.8911 (6.9047) Prec@1 0.000 (0.097) Prec@5 0.781 (0.498)
Epoch: [0][913/5005] Time 0.497 (0.255) Data 0.445 (0.168) Loss 6.8757 (6.9046) Prec@1 0.000 (0.097) Prec@5 1.172 (0.499)
Epoch: [0][914/5005] Time 0.111 (0.255) Data 0.001 (0.168) Loss 6.8713 (6.9046) Prec@1 0.000 (0.097) Prec@5 1.172 (0.499)
Epoch: [0][915/5005] Time 0.100 (0.255) Data 0.001 (0.167) Loss 6.8851 (6.9046) Prec@1 0.000 (0.097) Prec@5 0.000 (0.499)
Epoch: [0][916/5005] Time 0.106 (0.255) Data 0.001 (0.167) Loss 6.8716 (6.9045) Prec@1 0.000 (0.097) Prec@5 1.172 (0.500)
Epoch: [0][917/5005] Time 0.156 (0.255) Data 0.105 (0.167) Loss 6.9136 (6.9046) Prec@1 0.000 (0.097) Prec@5 0.391 (0.500)
Epoch: [0][918/5005] Time 0.102 (0.255) Data 0.001 (0.167) Loss 6.8948 (6.9045) Prec@1 0.000 (0.096) Prec@5 0.000 (0.499)
Epoch: [0][919/5005] Time 0.101 (0.255) Data 0.001 (0.167) Loss 6.8860 (6.9045) Prec@1 0.000 (0.096) Prec@5 1.172 (0.500)
Epoch: [0][920/5005] Time 0.101 (0.254) Data 0.001 (0.167) Loss 6.8774 (6.9045) Prec@1 0.000 (0.096) Prec@5 0.391 (0.500)
Epoch: [0][921/5005] Time 0.108 (0.254) Data 0.001 (0.167) Loss 6.8833 (6.9045) Prec@1 0.391 (0.097) Prec@5 0.391 (0.500)
Epoch: [0][922/5005] Time 0.163 (0.254) Data 0.106 (0.166) Loss 6.8969 (6.9045) Prec@1 0.000 (0.096) Prec@5 0.000 (0.499)
Epoch: [0][923/5005] Time 0.251 (0.254) Data 0.194 (0.166) Loss 6.8844 (6.9044) Prec@1 0.000 (0.096) Prec@5 0.391 (0.499)
I noticed this because when I print timing info every 10 epochs, it seems that the average value is far from the value of the current batch.
I saw this code: https://github.com/pytorch/examples/blob/master/mnist/main.py#L31-L33
torch.manual_seed(args.seed)
if args.cuda:
torch.cuda.manual_seed(args.seed)
Why are there two random generators? Is there a reason for this in the design of PyTorch?
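The behavior behind the question, as a small sketch (the CPU and CUDA RNG states are separate generators, which is why the example seeds both):
import torch

torch.manual_seed(42)            # seeds the CPU generator
if torch.cuda.is_available():
    torch.cuda.manual_seed(42)   # seeds the separate CUDA generator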
Attention can be sped up by pre-computing the linear transformation of context, instead of feeding it to the attention at every step.
https://github.com/pytorch/examples/blob/master/OpenNMT/onmt/Models.py#L119
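A sketch of the idea (generic shapes and names, not the actual OpenNMT code): project the encoder context once, before the decoding loop, instead of re-applying the linear layer at every step.
import torch
import torch.nn as nn
from torch.autograd import Variable

batch, src_len, dim = 32, 20, 512
context = Variable(torch.randn(batch, src_len, dim))
linear_context = nn.Linear(dim, dim, bias=False)

# Precompute the projection once, outside the decoding loop...
context_proj = linear_context(context.view(-1, dim)).view(batch, src_len, dim)

# ...so each decoding step only projects the query and scores it
# against the cached projection.
for t in range(10):
    pass  # attend using context_proj instead of re-projecting context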