
pytorch-classification's Introduction

pytorch-classification

Classification on CIFAR-10/100 and ImageNet with PyTorch.

Features

  • Unified interface for different network architectures
  • Multi-GPU support
  • Training progress bar with rich info
  • Training log and training curve visualization code (see ./utils/logger.py)

Install

  • Install PyTorch
  • Clone recursively
    git clone --recursive https://github.com/bearpaw/pytorch-classification.git
    

Training

Please see the Training recipes for how to train the models.

Results

CIFAR

Top-1 error rates on the CIFAR-10/100 benchmarks are reported. You may get different results when training your models with a different random seed. Note that the number of parameters is computed on the CIFAR-10 dataset.

| Model | Params (M) | CIFAR-10 (%) | CIFAR-100 (%) |
|---|---|---|---|
| alexnet | 2.47 | 22.78 | 56.13 |
| vgg19_bn | 20.04 | 6.66 | 28.05 |
| ResNet-110 | 1.70 | 6.11 | 28.86 |
| PreResNet-110 | 1.70 | 4.94 | 23.65 |
| WRN-28-10 (drop 0.3) | 36.48 | 3.79 | 18.14 |
| ResNeXt-29, 8x64 | 34.43 | 3.69 | 17.38 |
| ResNeXt-29, 16x64 | 68.16 | 3.53 | 17.30 |
| DenseNet-BC (L=100, k=12) | 0.77 | 4.54 | 22.88 |
| DenseNet-BC (L=190, k=40) | 25.62 | 3.32 | 17.17 |

[Figure: CIFAR training curves]

ImageNet

Single-crop (224x224) validation error rate is reported.

| Model | Params (M) | Top-1 Error (%) | Top-5 Error (%) |
|---|---|---|---|
| ResNet-18 | 11.69 | 30.09 | 10.78 |
| ResNeXt-50 (32x4d) | 25.03 | 22.6 | 6.29 |

Validation curve

Pretrained models

Our trained models and training logs are downloadable at OneDrive.

Supported Architectures

CIFAR-10 / CIFAR-100

Since images in the CIFAR datasets are 32x32, popular ImageNet network architectures need some modifications to adapt to this input size. The modified models (AlexNet, VGG, ResNet, PreResNet, Wide ResNet, ResNeXt, DenseNet, per the results table above) live in the package models.cifar.
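
For illustration, a minimal sketch of the usual stem change, assuming a ResNet-style model (the actual models.cifar implementations may differ in detail):

    import torch.nn as nn

    # ImageNet-style stem: a 7x7/stride-2 conv plus max-pool shrinks 224x224 to 56x56.
    imagenet_stem = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    # CIFAR-style stem: a single 3x3/stride-1 conv keeps the 32x32 resolution,
    # leaving all downsampling to the later stages.
    cifar_stem = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(16),
        nn.ReLU(inplace=True),
    )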

ImageNet

Contribute

Feel free to create a pull request if you find any bugs or want to contribute (e.g., more datasets or more network architectures).

pytorch-classification's People

Contributors

bearpaw, hongyi-zhang, lzx1413


pytorch-classification's Issues

ResNeXt-50 on ImageNet: loading the pretrained model fails

I tried to modify the resnext50 function to:

def resnext50(baseWidth, cardinality, pretrained=True):
    """
    Construct ResNeXt-50.
    """
    model = ResNeXt(baseWidth, cardinality, [3, 4, 6, 3], 1000)
    if pretrained:
        model.load_state_dict(torch.load('model_best.pth.tar')['state_dict'])
        print('loaded model')

    return model

The model_best.pth.tar file was downloaded from the link in README.md.

The following error is raised when I run it:

Traceback (most recent call last):
  File "imagenet.py", line 348, in <module>
    main()
  File "imagenet.py", line 152, in main
    cardinality=args.cardinality,
  File "/media/amax/xgessd/yao/pytorch-classification/models/imagenet/resnext.py", line 160, in resnext50
    model.load_state_dict(torch.load('ResNext50_checkpoint_best.pth.tar')['state_dict'])
  File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNeXt:
	Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "lay........

accuracy rate of resnet-110 on cifar-100

Hi, I have trained resnet-110 on cifar-100 and got 77% test accuracy, which outperforms the result you reported by about 5 percentage points (77% mine vs. 71.2% reported). I have not modified any code or configs, so I'm really interested in how you obtained your reported result.

running with a newer pytorch version

Hi, I hit the following error when I run cifar.py with pytorch 1.0.1.post2.

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

This happens when a PyTorch version newer than 0.5 is used, as reported here. I've posted a fix here in case other people encounter the same issue.
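
For reference, a sketch of the usual fix (PyTorch >= 0.4), applied to the meter-update lines in cifar.py/imagenet.py:

    import torch

    loss = torch.nn.functional.mse_loss(torch.zeros(4), torch.ones(4))  # a 0-dim tensor
    # Old (PyTorch <= 0.3): loss.data[0] -- raises IndexError on newer versions
    value = loss.item()  # extracts the Python number from a 0-dim tensor

    # In cifar.py / imagenet.py the meter updates become, e.g.:
    # losses.update(loss.item(), inputs.size(0))
    # top1.update(prec1.item(), inputs.size(0))
    # top5.update(prec5.item(), inputs.size(0))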

about imagenet training

Hi, @bearpaw ,

Regarding ImageNet training, you mention that the --data option should be ~/dataset/ILSVRC2012/. However, I downloaded the ILSVRC17 dataset ILSVRC2017_CLS-LOC.tgz from the current year's challenge.

The *.JPEG files are located inside train/val/test with the following folder structure:

~/dataset/ILSVRC/
    Data/
        CLS-LOC/
            train/
                n01440764/*.JPEG
                n01443537/*.JPEG
                ...
            val/
            test/

Any suggestions for how to set the --data option for ImageNet training?

THX!
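
One possible approach (a sketch, assuming imagenet.py builds its loaders with torchvision's ImageFolder on <data>/train and <data>/val, as the standard PyTorch ImageNet example does):

    import os
    import torchvision.datasets as datasets
    import torchvision.transforms as transforms

    # With the ILSVRC17 layout, point --data at the CLS-LOC directory:
    #   python imagenet.py --data ~/dataset/ILSVRC/Data/CLS-LOC ...
    data_root = os.path.expanduser('~/dataset/ILSVRC/Data/CLS-LOC')
    train_set = datasets.ImageFolder(
        os.path.join(data_root, 'train'),
        transforms.Compose([transforms.RandomResizedCrop(224),
                            transforms.ToTensor()]))
    # Caveat: CLS-LOC's val/ folder is flat (no per-class subfolders), so the
    # validation images must first be sorted into <wnid>/ directories.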

How do I run other networks (e.g. alexnet, vgg, inception_v3)?

torchvision includes many networks (alexnet, vgg11, etc.).

I could run your sample according to TRAINING.md, but I couldn't run other networks.

My status:

$ python imagenet.py -a vgg11 --data /data1/mirero/TESTBOARD/DLUTIL/DLUTIL_V5/classification/keras/cifar10_256 --epochs 10 --schedule 31 61 --gamma 0.1 -c checkpoint/imagenet/resnet18 --train-batch 16 --test-batch 16 --gpu-id 1

Epoch: [1 | 10] LR: 0.100000
Processingimagenet.py:249: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
losses.update(loss.data[0], inputs.size(0))
imagenet.py:250: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top1.update(prec1[0], inputs.size(0))
imagenet.py:251: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top5.update(prec5[0], inputs.size(0))
Processing | | (33/3125) Data: 0.078s | Batch: 0.083s | Total: 0:00:03 | ETA: 0:04:16 | Loss: nan | top1: 8.9015 | top5: 45.2652

================================> The loss is very large... (I guess it diverged)

$ python imagenet.py -a inception_v3 --data /data1/mirero/TESTBOARD/DLUTIL/DLUTIL_V5/classification/keras/cifar10_256 --epochs 10 --schedule 31 61 --gamma 0.1 -c checkpoint/imagenet/resnet18 --train-batch 16 --test-batch 16 --gpu-id 1

=>
RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'; but 4 does not equal 2 (while checking arguments for cudnn_convolution)

Thanks,
Edward Cho.

Mean and Std in Normalization

I have a quick question regarding the normalization transform. The mean and std used here — are they computed from the CIFAR dataset or the ImageNet dataset?
I saw a repo using the same values for the ImageNet dataset.
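
One way to check is to compute the per-channel statistics of the dataset directly; a minimal sketch for CIFAR-10 (the printed values are the commonly cited CIFAR-10 statistics):

    import torch
    import torchvision
    import torchvision.transforms as transforms

    # Load the whole CIFAR-10 training set as one tensor and compute channel stats.
    train = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                         transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(train, batch_size=len(train))
    images, _ = next(iter(loader))              # shape: (50000, 3, 32, 32)
    print(images.mean(dim=(0, 2, 3)))           # approx. (0.4914, 0.4822, 0.4465)
    print(images.std(dim=(0, 2, 3)))            # approx. (0.2470, 0.2435, 0.2616)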

Error(s) in loading state_dict for DataParallel:

When I download the pre-trained model and resume from it, there is an error at
model.load_state_dict(checkpoint['state_dict'])
It seems the names do not match (e.g. "module.features.0.weight" vs. "features.module.0.weight").
How can I solve this if I wish to use the pre-trained model on CIFAR-10?
Thank you!

Traceback (most recent call last):
File "test_0.py", line 130, in
model = load_model()
File "test_0.py", line 104, in load_model
model.load_state_dict(checkpoint['state_dict'])
File "/home/cosmo/anaconda3/envs/tf8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.features.0.weight", "module.features.0.bias", "module.features.3.weight", "module.features.3.bias", "module.features.6.weight", "module.features.6.bias", "module.features.8.weight", "module.features.8.bias", "module.features.10.weight", "module.features.10.bias", "module.classifier.weight", "module.classifier.bias".
Unexpected key(s) in state_dict: "features.module.0.weight", "features.module.0.bias", "features.module.3.weight", "features.module.3.bias", "features.module.6.weight", "features.module.6.bias", "features.module.8.weight", "features.module.8.bias", "features.module.10.weight", "features.module.10.bias", "classifier.weight", "classifier.bias".
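
A common workaround (a sketch; the key names are taken from the traceback above, and model is assumed to be the DataParallel-wrapped CIFAR AlexNet) is to remap the checkpoint keys before loading:

    import torch

    ckpt = torch.load('model_best.pth.tar', map_location='cpu')
    state = ckpt['state_dict']

    # The checkpoint wrapped only model.features in DataParallel; the resumed
    # model wraps the whole network. Move the 'module.' prefix accordingly:
    #   "features.module.0.weight" -> "module.features.0.weight"
    #   "classifier.weight"        -> "module.classifier.weight"
    remapped = {}
    for k, v in state.items():
        k = k.replace('features.module.', 'features.')
        remapped['module.' + k] = v
    model.load_state_dict(remapped)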

Why did you use Basic Block and Bottleneck Block?

Hi Bearpaw,

Nice implementation!
However, in the paper, for CIFAR-10 the authors just zero-pad the input to match the dimensions before the summation when there is downsampling, every 2n layers.

Why did you use the Basic Block up to depth 44 and the Bottleneck Block beyond that?

Regards,
Pablo

ImportError: No module named progress.bar

In the file /utils/__init__.py, it seems there is a spare dot in the code:
from progress.bar import Bar as Bar

Maybe it should be changed to "from progressbar import Bar as Bar"?

cifar 100 feature extraction using WRN

Thank you for sharing this code.
I want to use your WRN model to extract features from CIFAR-100 and feed them into a recurrent neural network for a zero-shot image tagging problem.
How can I do this? I need the feature vectors before the last layer of the model.
Can you help me, please?
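
One possible approach is a forward hook on the classifier, which captures its input (the penultimate feature vector); a minimal sketch, assuming the WRN's final linear layer is named fc:

    import torch

    features = {}

    def save_penultimate(module, inputs, output):
        # the input of the final linear layer is the pooled feature vector
        features['penultimate'] = inputs[0].detach()

    handle = model.fc.register_forward_hook(save_penultimate)

    model.eval()
    with torch.no_grad():
        logits = model(images)            # images: normalized (N, 3, 32, 32) batch
    feats = features['penultimate']       # (N, feature_dim) -> feed to the RNN

    handle.remove()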

The pretrained cifar10 resnet110 is actually resnet164 (Bottleneck)

I found the pretrained cifar10 model resnet110 is not resnet110 but resnet164. The model is:
model = resnet(depth=164, block_name='bottleNeck')
This model loads the state_dict successfully, but I haven't checked the accuracy.
By the way, the state_dict keys contain 'module', so we can load the state_dict like this:

import torch

def load_parallel_weight(model, weight):
    """Load a checkpoint saved from a DataParallel model by stripping 'module.'."""
    state_dict = torch.load(weight)['state_dict']
    new_dict = {}
    for w in state_dict:
        # drop every 'module' component from the dotted key name
        new_dict['.'.join(filter(lambda x: x != "module", w.split('.')))] = state_dict[w]
    model.load_state_dict(new_dict)

How about shortcut A?

Hi, thank you for your code. I'm confused why you use shortcut B (a 1x1 conv) when the size of the feature map is halved, because as far as I know, shortcut A (zero-padding) is used for CIFAR-10 in the original paper. I tried training a ResNet with shortcut A on CIFAR-10 in PyTorch, but I couldn't get results as good as yours. I don't think the shortcut is the problem, since I couldn't match the original paper either. Have you tried shortcut A, and how were the results? Thank you.

Accuracy of training

Hi, I am trying to use your modified models on cifar100.
But my alexnet accuracy only reaches around 42%, and vgg19_bn only about 62% (63.46% at best, around epoch 130).
Maybe some parameters I set are wrong?

Learning Rate and Number of Epochs to run

Hello,

I was looking to use imagenet.py for training a ResNet-50 model from scratch. I was confused about how many epochs I should train my network for and how the learning rate should change with the epochs. I see in your code you change it after 120 and 225 epochs.

I see the ResNet paper authors use a different scheme, where the learning rate decays every 30 epochs:

    function Trainer:learningRate(epoch)
       -- Training schedule
       local decay = 0
       if self.opt.dataset == 'imagenet' then
          decay = math.floor((epoch - 1) / 30)
       elseif self.opt.dataset == 'cifar10' then
          decay = epoch >= 122 and 2 or epoch >= 81 and 1 or 0
       elseif self.opt.dataset == 'cifar100' then
          decay = epoch >= 122 and 2 or epoch >= 81 and 1 or 0
       end
       return self.opt.LR * math.pow(0.1, decay)
    end

    return M.Trainer

I was just confused about at what epoch I should change the learning rate and how many epochs I should train for. Any pointers would be really appreciated.

Regards,
Nitin
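
For reference, the same step-decay policy is what this repo's --schedule/--gamma flags implement; an equivalent in plain PyTorch (a sketch; model, epochs, and train_one_epoch are placeholders):

    import torch
    from torch.optim.lr_scheduler import MultiStepLR

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                                weight_decay=1e-4)
    # Multiply the LR by 0.1 at the milestone epochs (ImageNet-style: ~every 30).
    scheduler = MultiStepLR(optimizer, milestones=[31, 61, 91], gamma=0.1)

    for epoch in range(epochs):
        train_one_epoch(model, optimizer)   # placeholder training step
        scheduler.step()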

Checkpoints are inaccessible

I was trying to unpack your "model_best.pth.tar" and "checkpoint.pth.tar", unsuccessfully.
I tried "tar -xvf checkpoint.pth.tar" on both my Mac and on Linux, but I got these errors:
on mac:
tar: Error opening archive: Unrecognized archive format

on linux:
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

Can you please upload "checkpoint.pth" and "model_best.pth" in their unpacked format? I need the .pth files for:

  • alexnet
  • Resnet101
  • DenseNet-BC (L=190, k=40)
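
For what it's worth, the .pth.tar files are torch.save outputs rather than real tar archives (the extension is just a naming convention), so they can be loaded directly; a sketch, assuming model matches the checkpoint's architecture:

    import torch

    # No tar -xvf needed: load the checkpoint dict directly.
    ckpt = torch.load('model_best.pth.tar', map_location='cpu')
    print(ckpt.keys())                          # e.g. 'state_dict', 'acc', 'epoch', ...
    model.load_state_dict(ckpt['state_dict'])   # the weights live under 'state_dict'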

Memory requirements for DenseNet-BC?

If I want to run DenseNet-BC (L=190, k=40) on cifar100 with batch_size=64, how much memory is needed? How about ResNeXt-29, 16x64?

About the depth of resnet56 with bottleneck as building block

Hi~ Thanks a lot for your excellent work!

I guess after executing the code below with depth = 56, we get n = 9 and block = Bottleneck. But in fact, a Bottleneck contains 3 convs, so the number of weighted layers is 9*9+2 = 83 instead of 6*9+2 = 56.

        assert (depth - 2) % 6 == 0, 'depth should be 6n+2'
        n = (depth - 2) // 6

        block = Bottleneck if depth >=44 else BasicBlock

Thus, I suggest changing it to make sure the number of layers stays consistent with the specified depth:

        if depth >= 44:
            assert (depth - 2) % 9 == 0
            n = (depth - 2) // 9
            block = Bottleneck

        else:
            assert (depth - 2) % 6 == 0, 'depth should be 6n+2'
            n = (depth - 2) // 6
            block = BasicBlock

Problems with the checkpoints of the PreResNet110

We used your checkpoints, but I get the best results at step 1. Are these the best model parameters or just pretrained initializations? I'm not sure about it and hope you can help me clarify. Thank you.

Error loading pretrained model weights

When I try resuming from the pretrained model weights, I get an error.
This is what I am running:
python cifar.py -a preresnet --depth 110 --epochs 3 --schedule 81 122 --gamma 0.1 --wd 1e-4 --checkpoint checkpoints/cifar10/preresnet-110 --resume 'checkpoint.pth.tar'

and this is the error:

RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.bn.weight", "module.bn.bias", "module.bn.running_mean", "module.bn.running_var".
Unexpected key(s) in state_dict: "module.bn1.weight", "module.bn1.bias", "module.bn1.running_mean", "module.bn1.running_var", "module.layer1.0.conv3.weight", "module.layer1.0.bn3.weight", "module.layer1.0.bn3.bias", "module.layer1.0.bn3.running_mean", "module.layer1.0.bn3.running_var", "module.layer1.0.downsample.0.weight", "module.layer1.0.downsample.1.weight", "module.layer1.0.downsample.1.bias", "module.layer1.0.downsample.1.running_mean", "module.layer1.0.downsample.1.running_var", "module.layer1.1.conv3.weight", "module.layer1.1.bn3.weight", "module.layer1.1.bn3.bias", "module.layer1.1.bn3.running_mean", "module.layer1.1.bn3.running_var", "module.layer1.2.conv3.weight", "module.layer1.2.bn3.weight", "module.layer1.2.bn3.bias", "module.layer1.2.bn3.running_mean", "module.layer1.2.bn3.running_var", "module.layer1.3.conv3.weight", "module.layer1.3.bn3.weight", "module.layer1.3.bn3.bias", "module.layer1.3.bn3.running_mean", "module.layer1.3.bn3.running_var", "module.layer1.4.conv3.weight", "module.layer1.4.bn3.weight", "module.layer1.4.bn3.bias", "module.layer1.4.bn3.running_mean", "module.layer1.4.bn3.running_var", "module.layer1.5.conv3.weight", "module.layer1.5.bn3.weight", "module.layer1.5.bn3.bias", "module.layer1.5.bn3.running_mean", "module.layer1.5.bn3.running_var", "module.layer1.6.conv3.weight", "module.layer1.6.bn3.weight", "module.layer1.6.bn3.bias", "module.layer1.6.bn3.running_mean", "module.layer1.6.bn3.running_var", "module.layer1.7.conv3.weight", "module.layer1.7.bn3.weight", "module.layer1.7.bn3.bias", "module.layer1.7.bn3.running_mean", "module.layer1.7.bn3.running_var", "module.layer1.8.conv3.weight", "module.layer1.8.bn3.weight", "module.layer1.8.bn3.bias", "module.layer1.8.bn3.running_mean", "module.layer1.8.bn3.running_var", "module.layer1.9.conv3.weight", "module.layer1.9.bn3.weight", "module.layer1.9.bn3.bias", "module.layer1.9.bn3.running_mean", "module.layer1.9.bn3.running_var", "module.layer1.10.conv3.weight", "module.layer1.10.bn3.weight", "module.layer1.10.bn3.bias", "module.layer1.10.bn3.running_mean", "module.layer1.10.bn3.running_var", "module.layer1.11.conv3.weight", "module.layer1.11.bn3.weight", "module.layer1.11.bn3.bias", "module.layer1.11.bn3.running_mean", "module.layer1.11.bn3.running_var", "module.layer1.12.conv3.weight", "module.layer1.12.bn3.weight", "module.layer1.12.bn3.bias", "module.layer1.12.bn3.running_mean", "module.layer1.12.bn3.running_var", "module.layer1.13.conv3.weight", "module.layer1.13.bn3.weight", "module.layer1.13.bn3.bias", "module.layer1.13.bn3.running_mean", "module.layer1.13.bn3.running_var", "module.layer1.14.conv3.weight", "module.layer1.14.bn3.weight", "module.layer1.14.bn3.bias", "module.layer1.14.bn3.running_mean", "module.layer1.14.bn3.running_var", "module.layer1.15.conv3.weight", "module.layer1.15.bn3.weight", "module.layer1.15.bn3.bias", "module.layer1.15.bn3.running_mean", "module.layer1.15.bn3.running_var", "module.layer1.16.conv3.weight", "module.layer1.16.bn3.weight", "module.layer1.16.bn3.bias", "module.layer1.16.bn3.running_mean", "module.layer1.16.bn3.running_var", "module.layer1.17.conv3.weight", "module.layer1.17.bn3.weight", "module.layer1.17.bn3.bias", "module.layer1.17.bn3.running_mean", "module.layer1.17.bn3.running_var", "module.layer2.0.conv3.weight", "module.layer2.0.bn3.weight", "module.layer2.0.bn3.bias", "module.layer2.0.bn3.running_mean", "module.layer2.0.bn3.running_var", "module.layer2.0.downsample.1.weight", "module.layer2.0.downsample.1.bias", "module.layer2.0.downsample.1.running_mean", 
"module.layer2.0.downsample.1.running_var", "module.layer2.1.conv3.weight", "module.layer2.1.bn3.weight", "module.layer2.1.bn3.bias", "module.layer2.1.bn3.running_mean", "module.layer2.1.bn3.running_var", "module.layer2.2.conv3.weight", "module.layer2.2.bn3.weight", "module.layer2.2.bn3.bias", "module.layer2.2.bn3.running_mean", "module.layer2.2.bn3.running_var", "module.layer2.3.conv3.weight", "module.layer2.3.bn3.weight", "module.layer2.3.bn3.bias", "module.layer2.3.bn3.running_mean", "module.layer2.3.bn3.running_var", "module.layer2.4.conv3.weight", "module.layer2.4.bn3.weight", "module.layer2.4.bn3.bias", "module.layer2.4.bn3.running_mean", "module.layer2.4.bn3.running_var", "module.layer2.5.conv3.weight", "module.layer2.5.bn3.weight", "module.layer2.5.bn3.bias", "module.layer2.5.bn3.running_mean", "module.layer2.5.bn3.running_var", "module.layer2.6.conv3.weight", "module.layer2.6.bn3.weight", "module.layer2.6.bn3.bias", "module.layer2.6.bn3.running_mean", "module.layer2.6.bn3.running_var", "module.layer2.7.conv3.weight", "module.layer2.7.bn3.weight", "module.layer2.7.bn3.bias", "module.layer2.7.bn3.running_mean", "module.layer2.7.bn3.running_var", "module.layer2.8.conv3.weight", "module.layer2.8.bn3.weight", "module.layer2.8.bn3.bias", "module.layer2.8.bn3.running_mean", "module.layer2.8.bn3.running_var", "module.layer2.9.conv3.weight", "module.layer2.9.bn3.weight", "module.layer2.9.bn3.bias", "module.layer2.9.bn3.running_mean", "module.layer2.9.bn3.running_var", "module.layer2.10.conv3.weight", "module.layer2.10.bn3.weight", "module.layer2.10.bn3.bias", "module.layer2.10.bn3.running_mean", "module.layer2.10.bn3.running_var", "module.layer2.11.conv3.weight", "module.layer2.11.bn3.weight", "module.layer2.11.bn3.bias", "module.layer2.11.bn3.running_mean", "module.layer2.11.bn3.running_var", "module.layer2.12.conv3.weight", "module.layer2.12.bn3.weight", "module.layer2.12.bn3.bias", "module.layer2.12.bn3.running_mean", "module.layer2.12.bn3.running_var", "module.layer2.13.conv3.weight", "module.layer2.13.bn3.weight", "module.layer2.13.bn3.bias", "module.layer2.13.bn3.running_mean", "module.layer2.13.bn3.running_var", "module.layer2.14.conv3.weight", "module.layer2.14.bn3.weight", "module.layer2.14.bn3.bias", "module.layer2.14.bn3.running_mean", "module.layer2.14.bn3.running_var", "module.layer2.15.conv3.weight", "module.layer2.15.bn3.weight", "module.layer2.15.bn3.bias", "module.layer2.15.bn3.running_mean", "module.layer2.15.bn3.running_var", "module.layer2.16.conv3.weight", "module.layer2.16.bn3.weight", "module.layer2.16.bn3.bias", "module.layer2.16.bn3.running_mean", "module.layer2.16.bn3.running_var", "module.layer2.17.conv3.weight", "module.layer2.17.bn3.weight", "module.layer2.17.bn3.bias", "module.layer2.17.bn3.running_mean", "module.layer2.17.bn3.running_var", "module.layer3.0.conv3.weight", "module.layer3.0.bn3.weight", "module.layer3.0.bn3.bias", "module.layer3.0.bn3.running_mean", "module.layer3.0.bn3.running_var", "module.layer3.0.downsample.1.weight", "module.layer3.0.downsample.1.bias", "module.layer3.0.downsample.1.running_mean", "module.layer3.0.downsample.1.running_var", "module.layer3.1.conv3.weight", "module.layer3.1.bn3.weight", "module.layer3.1.bn3.bias", "module.layer3.1.bn3.running_mean", "module.layer3.1.bn3.running_var", "module.layer3.2.conv3.weight", "module.layer3.2.bn3.weight", "module.layer3.2.bn3.bias", "module.layer3.2.bn3.running_mean", "module.layer3.2.bn3.running_var", "module.layer3.3.conv3.weight", "module.layer3.3.bn3.weight", 
"module.layer3.3.bn3.bias", "module.layer3.3.bn3.running_mean", "module.layer3.3.bn3.running_var", "module.layer3.4.conv3.weight", "module.layer3.4.bn3.weight", "module.layer3.4.bn3.bias", "module.layer3.4.bn3.running_mean", "module.layer3.4.bn3.running_var", "module.layer3.5.conv3.weight", "module.layer3.5.bn3.weight", "module.layer3.5.bn3.bias", "module.layer3.5.bn3.running_mean", "module.layer3.5.bn3.running_var", "module.layer3.6.conv3.weight", "module.layer3.6.bn3.weight", "module.layer3.6.bn3.bias", "module.layer3.6.bn3.running_mean", "module.layer3.6.bn3.running_var", "module.layer3.7.conv3.weight", "module.layer3.7.bn3.weight", "module.layer3.7.bn3.bias", "module.layer3.7.bn3.running_mean", "module.layer3.7.bn3.running_var", "module.layer3.8.conv3.weight", "module.layer3.8.bn3.weight", "module.layer3.8.bn3.bias", "module.layer3.8.bn3.running_mean", "module.layer3.8.bn3.running_var", "module.layer3.9.conv3.weight", "module.layer3.9.bn3.weight", "module.layer3.9.bn3.bias", "module.layer3.9.bn3.running_mean", "module.layer3.9.bn3.running_var", "module.layer3.10.conv3.weight", "module.layer3.10.bn3.weight", "module.layer3.10.bn3.bias", "module.layer3.10.bn3.running_mean", "module.layer3.10.bn3.running_var", "module.layer3.11.conv3.weight", "module.layer3.11.bn3.weight", "module.layer3.11.bn3.bias", "module.layer3.11.bn3.running_mean", "module.layer3.11.bn3.running_var", "module.layer3.12.conv3.weight", "module.layer3.12.bn3.weight", "module.layer3.12.bn3.bias", "module.layer3.12.bn3.running_mean", "module.layer3.12.bn3.running_var", "module.layer3.13.conv3.weight", "module.layer3.13.bn3.weight", "module.layer3.13.bn3.bias", "module.layer3.13.bn3.running_mean", "module.layer3.13.bn3.running_var", "module.layer3.14.conv3.weight", "module.layer3.14.bn3.weight", "module.layer3.14.bn3.bias", "module.layer3.14.bn3.running_mean", "module.layer3.14.bn3.running_var", "module.layer3.15.conv3.weight", "module.layer3.15.bn3.weight", "module.layer3.15.bn3.bias", "module.layer3.15.bn3.running_mean", "module.layer3.15.bn3.running_var", "module.layer3.16.conv3.weight", "module.layer3.16.bn3.weight", "module.layer3.16.bn3.bias", "module.layer3.16.bn3.running_mean", "module.layer3.16.bn3.running_var", "module.layer3.17.conv3.weight", "module.layer3.17.bn3.weight", "module.layer3.17.bn3.bias", "module.layer3.17.bn3.running_mean", "module.layer3.17.bn3.running_var".
size mismatch for module.layer1.0.conv1.weight: copying a param with shape torch.Size([16, 16, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.1.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.2.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.3.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.4.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.5.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.6.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.7.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.8.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.9.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.10.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.11.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.12.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.13.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.14.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.15.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.16.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.17.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer2.0.bn1.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for module.layer2.0.bn1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for module.layer2.0.bn1.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for module.layer2.0.bn1.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for module.layer2.0.conv1.weight: copying a param with shape torch.Size([32, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 16, 3, 3]).
size mismatch for module.layer2.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 16, 1, 1]).
size mismatch for module.layer2.1.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.2.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.3.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.4.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.5.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.6.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.7.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.8.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.9.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.10.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.11.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.12.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.13.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.14.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.15.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.16.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.17.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer3.0.bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for module.layer3.0.bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for module.layer3.0.bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for module.layer3.0.bn1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for module.layer3.0.conv1.weight: copying a param with shape torch.Size([64, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 32, 3, 3]).
size mismatch for module.layer3.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 32, 1, 1]).
size mismatch for module.layer3.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.2.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.3.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.4.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.5.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.6.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.7.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.8.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.9.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.10.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.11.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.12.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.13.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.14.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.15.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.16.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.17.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.fc.weight: copying a param with shape torch.Size([100, 256]) from checkpoint, the shape in current model is torch.Size([10, 64]).
size mismatch for module.fc.bias: copying a param with shape torch.Size([100]) from checkpoint, the shape in current model is torch.Size([10]).

MNIST Dataset

Similar benchmarks on MNIST dataset would be great.

For vgg16, there are three classifier layers in the provided checkpoint but only one in the model

When I load the checkpoint of vgg16, downloaded from https://download.pytorch.org/models/vgg16-397923af.pth, there is an error:

RuntimeError: Error(s) in loading state_dict for VGG:
Missing key(s) in state_dict: "classifier.weight", "classifier.bias".
Unexpected key(s) in state_dict: "classifier.0.weight", "classifier.0.bias", "classifier.3.weight", "classifier.3.bias", "classifier.6.weight", "classifier.6.bias".

So I found that there are three classifier layers (classifier.0/3/6) in the provided checkpoint but only one classifier layer (classifier) in the model. Can anyone tell me why there is this mismatch?
Thank you!
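
The torchvision vgg16 checkpoint has a three-layer Sequential classifier (hence classifier.0/3/6), while the CIFAR variant here uses a single Linear classifier; one workaround (a sketch) is to load only the convolutional features and leave the classifier freshly initialized:

    import torch

    state = torch.load('vgg16-397923af.pth', map_location='cpu')
    # Keep only the feature-extractor weights; strict=False tolerates the
    # missing classifier.weight/classifier.bias, which stay randomly initialized.
    feature_state = {k: v for k, v in state.items() if k.startswith('features.')}
    model.load_state_dict(feature_state, strict=False)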

Accuracy of the training

Hi,

I ran the ResNeXt-29, 8x64 model on cifar100, but I cannot reach the 17.38% error rate; what I get is around 18.5% (I ran it multiple times with different random seeds). Can anyone tell me what's wrong? I didn't change the code.

about how to run inference

Hi there, thanks for your work. I used your code to train on my data and it works well, but there doesn't seem to be an inference function. Could you provide one or give some pointers? Thanks!
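
There is no dedicated inference script, but a plain forward pass works; a minimal sketch, assuming a trained CIFAR-10 model and the usual CIFAR-10 normalization statistics:

    import torch
    from PIL import Image
    import torchvision.transforms as transforms

    preprocess = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),   # CIFAR-10 mean
                             (0.2023, 0.1994, 0.2010)),  # CIFAR-10 std
    ])

    img = preprocess(Image.open('test.png').convert('RGB')).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(img), dim=1)
    print(probs.argmax(dim=1).item())                    # predicted class index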

zero GPU usage when training imagenet

Hi, I am using ResNet-18 to train ImageNet, but I see zero GPU usage during training. Do you see the same when you train it?

Alexnet CIFAR10 checkpoints don't load properly

Hi,
I've downloaded the AlexNet CIFAR-10 checkpoints from your OneDrive. Bash recognizes them as data rather than tar archives, and in fact you can load them directly in PyTorch and it will display all the weight tensors.
Maybe there was a mistake during compression?

Anyway, when I load the state_dict (for both checkpoints) this error shows up:

 model = alexnet() 
 model.load_state_dict(torch.load('model_best.pth'))
 
 KeyError: 'unexpected key "acc" in state_dict'

Update: sorry for closing/reopening.

About tiny imagenet training problem

I have modified imagenet.py by adding the following lines after the "create model" code:

    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 200)

This changes the net to train on Tiny ImageNet. But there is a problem: training top-1 accuracy is about 79.6110 and top-5 accuracy is about 91.6950, while the validation accuracies are only 0.7 (top-1) and 1.47 (top-5). Do you have any ideas about this weird problem?

The results of ResNeXt-50 (32x4d) on ImageNet

Many thanks for your valuable work!

I have trained ResNeXt-50 (32x4d) on ImageNet following the training recipes, and I got 75% validation accuracy (the reported result is 77%).

I have not modified any code or configs, except that I trained on 8 GPUs with torch.nn.DataParallel. Do you have any hints about the performance drop?

Thanks again for your work, and I hope to hear some insights from you!
