
pytorch-classification's Introduction

pytorch-classification

Classification on CIFAR-10/100 and ImageNet with PyTorch.

Features

  • Unified interface for different network architectures
  • Multi-GPU support
  • Training progress bar with rich info
  • Training log and training curve visualization code (see ./utils/logger.py)

Install

  • Install PyTorch
  • Clone recursively
    git clone --recursive https://github.com/bearpaw/pytorch-classification.git
    

Training

Please see the Training recipes for how to train the models.

Results

CIFAR

Top-1 error rates on the CIFAR-10/100 benchmarks are reported. You may get different results when training your models with a different random seed. Note that the number of parameters is computed on the CIFAR-10 dataset.

| Model | Params (M) | CIFAR-10 (%) | CIFAR-100 (%) |
|---|---|---|---|
| alexnet | 2.47 | 22.78 | 56.13 |
| vgg19_bn | 20.04 | 6.66 | 28.05 |
| ResNet-110 | 1.70 | 6.11 | 28.86 |
| PreResNet-110 | 1.70 | 4.94 | 23.65 |
| WRN-28-10 (drop 0.3) | 36.48 | 3.79 | 18.14 |
| ResNeXt-29, 8x64 | 34.43 | 3.69 | 17.38 |
| ResNeXt-29, 16x64 | 68.16 | 3.53 | 17.30 |
| DenseNet-BC (L=100, k=12) | 0.77 | 4.54 | 22.88 |
| DenseNet-BC (L=190, k=40) | 25.62 | 3.32 | 17.17 |

[Figure: CIFAR training curves]

ImageNet

Single-crop (224x224) validation error rate is reported.

| Model | Params (M) | Top-1 Error (%) | Top-5 Error (%) |
|---|---|---|---|
| ResNet-18 | 11.69 | 30.09 | 10.78 |
| ResNeXt-50 (32x4d) | 25.03 | 22.6 | 6.29 |

Validation curve

Pretrained models

Our trained models and training logs are downloadable at OneDrive.

Supported Architectures

CIFAR-10 / CIFAR-100

Since images in the CIFAR datasets are 32x32, popular ImageNet network architectures need some modifications to adapt to this input size. The modified models (AlexNet, VGG, ResNet, PreResNet, Wide ResNet, ResNeXt, DenseNet, per the results table above) live in the package models.cifar.
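
For illustration, a minimal sketch of the usual stem change, assuming a ResNet-style model (the actual models.cifar implementations may differ in detail):

    import torch.nn as nn

    # ImageNet-style stem: a 7x7/stride-2 conv plus max-pool shrinks 224x224 to 56x56.
    imagenet_stem = nn.Sequential(
        nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False),
        nn.BatchNorm2d(64),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
    )

    # CIFAR-style stem: a single 3x3/stride-1 conv keeps the 32x32 resolution,
    # leaving all downsampling to the later stages.
    cifar_stem = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(16),
        nn.ReLU(inplace=True),
    )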

ImageNet

Contribute

Feel free to create a pull request if you find any bugs or want to contribute (e.g., more datasets or more network architectures).

pytorch-classification's People

Contributors

bearpaw, hongyi-zhang, lzx1413


pytorch-classification's Issues

ResNeXt-50 on ImageNet: loading the pretrained model fails

I tried to modify the resnext50 function to:

def resnext50(baseWidth, cardinality, pretrained=True):
    """
    Construct ResNeXt-50.
    """
    model = ResNeXt(baseWidth, cardinality, [3, 4, 6, 3], 1000)
    if pretrained:
        model.load_state_dict(torch.load('model_best.pth.tar')['state_dict'])
        print('loaded model')

    return model

The model_best.pth.tar file was downloaded from the link in README.md.

The following error is raised when I run it:

Traceback (most recent call last):
  File "imagenet.py", line 348, in <module>
    main()
  File "imagenet.py", line 152, in main
    cardinality=args.cardinality,
  File "/media/amax/xgessd/yao/pytorch-classification/models/imagenet/resnext.py", line 160, in resnext50
    model.load_state_dict(torch.load('ResNext50_checkpoint_best.pth.tar')['state_dict'])
  File "/home/amax/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 721, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ResNeXt:
	Missing key(s) in state_dict: "conv1.weight", "bn1.weight", "bn1.bias", "bn1.running_mean", "bn1.running_var", "lay........

accuracy rate of resnet-110 on cifar-100

Hi, I have trained resnet-110 on cifar-100 and got 77% test accuracy, which outperforms the result you reported by about 5 percentage points (77% mine vs. 71.2% reported). I have not modified any code or configs, so I'm really interested in how you obtained your reported result.

running with a newer pytorch version

Hi, I hit the following error when I run cifar.py with pytorch 1.0.1.post2.

IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number

This happens when a PyTorch version newer than 0.5 is used, as reported here. I've posted a fix here in case other people encounter the same issue.
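
For reference, a sketch of the usual fix (PyTorch >= 0.4), applied to the meter-update lines in cifar.py/imagenet.py:

    import torch

    loss = torch.nn.functional.mse_loss(torch.zeros(4), torch.ones(4))  # a 0-dim tensor
    # Old (PyTorch <= 0.3): loss.data[0] -- raises IndexError on newer versions
    value = loss.item()  # extracts the Python number from a 0-dim tensor

    # In cifar.py / imagenet.py the meter updates become, e.g.:
    # losses.update(loss.item(), inputs.size(0))
    # top1.update(prec1.item(), inputs.size(0))
    # top5.update(prec5.item(), inputs.size(0))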

about imagenet training

Hi, @bearpaw ,

Regarding ImageNet training, you mention that the --data option should be ~/dataset/ILSVRC2012/. However, I downloaded the ILSVRC17 dataset ILSVRC2017_CLS-LOC.tgz from the current year's challenge.

The *.JPEG files are located inside train/val/test with the following folder structure:

~/dataset/ILSVRC/
    Data/
        CLS-LOC/
            train/
                n01440764/*.JPEG
                n01443537/*.JPEG
                ...
            val/
            test/

Any suggestions for how to set the --data option for ImageNet training?

THX!
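
One possible approach (a sketch, assuming imagenet.py builds its loaders with torchvision's ImageFolder on <data>/train and <data>/val, as the standard PyTorch ImageNet example does):

    import os
    import torchvision.datasets as datasets
    import torchvision.transforms as transforms

    # With the ILSVRC17 layout, point --data at the CLS-LOC directory:
    #   python imagenet.py --data ~/dataset/ILSVRC/Data/CLS-LOC ...
    data_root = os.path.expanduser('~/dataset/ILSVRC/Data/CLS-LOC')
    train_set = datasets.ImageFolder(
        os.path.join(data_root, 'train'),
        transforms.Compose([transforms.RandomResizedCrop(224),
                            transforms.ToTensor()]))
    # Caveat: CLS-LOC's val/ folder is flat (no per-class subfolders), so the
    # validation images must first be sorted into <wnid>/ directories.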

How do I run other networks (e.g. alexnet, vgg, inception_v3)?

torchvision includes many networks (alexnet, vgg11, etc.).

I could run your sample according to TRAINING.md, but I couldn't run other networks.

My status:

$ python imagenet.py -a vgg11 --data /data1/mirero/TESTBOARD/DLUTIL/DLUTIL_V5/classification/keras/cifar10_256 --epochs 10 --schedule 31 61 --gamma 0.1 -c checkpoint/imagenet/resnet18 --train-batch 16 --test-batch 16 --gpu-id 1

Epoch: [1 | 10] LR: 0.100000
Processingimagenet.py:249: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
losses.update(loss.data[0], inputs.size(0))
imagenet.py:250: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top1.update(prec1[0], inputs.size(0))
imagenet.py:251: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
top5.update(prec5[0], inputs.size(0))
Processing | | (33/3125) Data: 0.078s | Batch: 0.083s | Total: 0:00:03 | ETA: 0:04:16 | Loss: nan | top1: 8.9015 | top5: 45.2652

================================> The loss is very large... (I guess it diverged)

$ python imagenet.py -a inception_v3 --data /data1/mirero/TESTBOARD/DLUTIL/DLUTIL_V5/classification/keras/cifar10_256 --epochs 10 --schedule 31 61 --gamma 0.1 -c checkpoint/imagenet/resnet18 --train-batch 16 --test-batch 16 --gpu-id 1

=>
RuntimeError: Expected tensor for argument #1 'input' to have the same dimension as tensor for 'result'; but 4 does not equal 2 (while checking arguments for cudnn_convolution)

Thanks,
Edward Cho.

Mean and Std in Normalization

I have a quick question regarding the normalization transform. The mean and std used here — are they computed from the CIFAR dataset or the ImageNet dataset?
I saw a repo using the same values for the ImageNet dataset.
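
One way to check is to compute the per-channel statistics of the dataset directly; a minimal sketch for CIFAR-10 (the printed values are the commonly cited CIFAR-10 statistics):

    import torch
    import torchvision
    import torchvision.transforms as transforms

    # Load the whole CIFAR-10 training set as one tensor and compute channel stats.
    train = torchvision.datasets.CIFAR10(root='./data', train=True, download=True,
                                         transform=transforms.ToTensor())
    loader = torch.utils.data.DataLoader(train, batch_size=len(train))
    images, _ = next(iter(loader))              # shape: (50000, 3, 32, 32)
    print(images.mean(dim=(0, 2, 3)))           # approx. (0.4914, 0.4822, 0.4465)
    print(images.std(dim=(0, 2, 3)))            # approx. (0.2470, 0.2435, 0.2616)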

Error(s) in loading state_dict for DataParallel:

When I download the pre-trained model and resume from it, there is an error at
model.load_state_dict(checkpoint['state_dict'])
It seems the names do not match (e.g. "module.features.0.weight" vs. "features.module.0.weight").
How can I solve this if I wish to use the pre-trained model on CIFAR-10?
Thank you!

Traceback (most recent call last):
File "test_0.py", line 130, in
model = load_model()
File "test_0.py", line 104, in load_model
model.load_state_dict(checkpoint['state_dict'])
File "/home/cosmo/anaconda3/envs/tf8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.features.0.weight", "module.features.0.bias", "module.features.3.weight", "module.features.3.bias", "module.features.6.weight", "module.features.6.bias", "module.features.8.weight", "module.features.8.bias", "module.features.10.weight", "module.features.10.bias", "module.classifier.weight", "module.classifier.bias".
Unexpected key(s) in state_dict: "features.module.0.weight", "features.module.0.bias", "features.module.3.weight", "features.module.3.bias", "features.module.6.weight", "features.module.6.bias", "features.module.8.weight", "features.module.8.bias", "features.module.10.weight", "features.module.10.bias", "classifier.weight", "classifier.bias".
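
A common workaround (a sketch; the key names are taken from the traceback above, and model is assumed to be the DataParallel-wrapped CIFAR AlexNet) is to remap the checkpoint keys before loading:

    import torch

    ckpt = torch.load('model_best.pth.tar', map_location='cpu')
    state = ckpt['state_dict']

    # The checkpoint wrapped only model.features in DataParallel; the resumed
    # model wraps the whole network. Move the 'module.' prefix accordingly:
    #   "features.module.0.weight" -> "module.features.0.weight"
    #   "classifier.weight"        -> "module.classifier.weight"
    remapped = {}
    for k, v in state.items():
        k = k.replace('features.module.', 'features.')
        remapped['module.' + k] = v
    model.load_state_dict(remapped)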

Why did you use Basic Block and Bottleneck Block?

Hi Bearpaw,

Nice implementation!
However, in the paper, for CIFAR-10 the authors just zero-pad the input to match the dimensions before the summation when there is downsampling, every 2n layers.

Why did you use the Basic Block up to depth 44 and the Bottleneck Block beyond that?

Regards,
Pablo

ImportError: No module named progress.bar

In the file /utils/__init__.py, it seems there is a spare dot in the code:
from progress.bar import Bar as Bar

Maybe it should be changed to "from progressbar import Bar as Bar"?

cifar 100 feature extraction using WRN

Thank you for sharing this code.
I want to use your WRN model to extract features from CIFAR-100 and feed them into a recurrent neural network for a zero-shot image tagging problem.
How can I do this? I need the feature vectors before the last layer of the model.
Can you help me, please?
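
One possible approach is a forward hook on the classifier, which captures its input (the penultimate feature vector); a minimal sketch, assuming the WRN's final linear layer is named fc:

    import torch

    features = {}

    def save_penultimate(module, inputs, output):
        # the input of the final linear layer is the pooled feature vector
        features['penultimate'] = inputs[0].detach()

    handle = model.fc.register_forward_hook(save_penultimate)

    model.eval()
    with torch.no_grad():
        logits = model(images)            # images: normalized (N, 3, 32, 32) batch
    feats = features['penultimate']       # (N, feature_dim) -> feed to the RNN

    handle.remove()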

The pretrained cifar10 resnet110 is actually resnet164 (Bottleneck)

I found the pretrained cifar10 model resnet110 is not resnet110 but resnet164. The model is:
model = resnet(depth=164, block_name='bottleNeck')
This model loads the state_dict successfully, but I haven't checked the accuracy.
By the way, the state_dict keys contain 'module', so we can load the state_dict like this:

import torch

def load_parallel_weight(model, weight):
    """Load a checkpoint saved from a DataParallel model by stripping 'module.'."""
    state_dict = torch.load(weight)['state_dict']
    new_dict = {}
    for w in state_dict:
        # drop every 'module' component from the dotted key name
        new_dict['.'.join(filter(lambda x: x != "module", w.split('.')))] = state_dict[w]
    model.load_state_dict(new_dict)

How about shortcut A?

Hi, thank you for your code. I'm confused why you use shortcut B (a 1x1 conv) when the size of the feature map is halved, because as far as I know, shortcut A (zero-padding) is used for CIFAR-10 in the original paper. I tried training a ResNet with shortcut A on CIFAR-10 in PyTorch, but I couldn't get results as good as yours. I don't think the shortcut is the problem, since I couldn't match the original paper either. Have you tried shortcut A, and how were the results? Thank you.

Accuracy of training

Hi, I am trying to use your modified models on cifar100.
But my alexnet accuracy only reaches around 42%, and vgg19_bn only about 62% (63.46% at best, around epoch 130).
Maybe some parameters I set are wrong?

Learning Rate and Number of Epochs to run

Hello,

I was looking to use imagenet.py for training a ResNet-50 model from scratch. I was confused about how many epochs I should train my network for and how the learning rate should change with the epochs. I see in your code you change it after 120 and 225 epochs.

I see the ResNet paper authors use a different scheme, where the learning rate decays every 30 epochs:

    function Trainer:learningRate(epoch)
       -- Training schedule
       local decay = 0
       if self.opt.dataset == 'imagenet' then
          decay = math.floor((epoch - 1) / 30)
       elseif self.opt.dataset == 'cifar10' then
          decay = epoch >= 122 and 2 or epoch >= 81 and 1 or 0
       elseif self.opt.dataset == 'cifar100' then
          decay = epoch >= 122 and 2 or epoch >= 81 and 1 or 0
       end
       return self.opt.LR * math.pow(0.1, decay)
    end

    return M.Trainer

I was just confused about at what epoch I should change the learning rate and how many epochs I should train for. Any pointers would be really appreciated.

Regards,
Nitin
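
For reference, the same step-decay policy is what this repo's --schedule/--gamma flags implement; an equivalent in plain PyTorch (a sketch; model, epochs, and train_one_epoch are placeholders):

    import torch
    from torch.optim.lr_scheduler import MultiStepLR

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                                weight_decay=1e-4)
    # Multiply the LR by 0.1 at the milestone epochs (ImageNet-style: ~every 30).
    scheduler = MultiStepLR(optimizer, milestones=[31, 61, 91], gamma=0.1)

    for epoch in range(epochs):
        train_one_epoch(model, optimizer)   # placeholder training step
        scheduler.step()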

Checkpoints are inaccessible

I was trying to unpack your "model_best.pth.tar" and "checkpoint.pth.tar", unsuccessfully.
I tried "tar -xvf checkpoint.pth.tar" on both my Mac and on Linux, but I got these errors:
on mac:
tar: Error opening archive: Unrecognized archive format

on linux:
tar: This does not look like a tar archive
tar: Skipping to next header
tar: Exiting with failure status due to previous errors

Can you please upload "checkpoint.pth" and "model_best.pth" in their unpacked format? I need the .pth files for:

  • alexnet
  • Resnet101
  • DenseNet-BC (L=190, k=40)
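
For what it's worth, the .pth.tar files are torch.save outputs rather than real tar archives (the extension is just a naming convention), so they can be loaded directly; a sketch, assuming model matches the checkpoint's architecture:

    import torch

    # No tar -xvf needed: load the checkpoint dict directly.
    ckpt = torch.load('model_best.pth.tar', map_location='cpu')
    print(ckpt.keys())                          # e.g. 'state_dict', 'acc', 'epoch', ...
    model.load_state_dict(ckpt['state_dict'])   # the weights live under 'state_dict'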

Memory requirements for DenseNet-BC?

If I want to run DenseNet-BC (L=190, k=40) on cifar100 with batch_size=64, how much memory is needed? How about ResNeXt-29, 16x64?

About the depth of resnet56 with bottleneck as building block

Hi~ Thanks a lot for your excellent work!

I guess after executing the code below with depth = 56, we get n = 9 and block = Bottleneck. But in fact, a Bottleneck contains 3 convs, so the number of weighted layers is 9*9+2 = 83 instead of 6*9+2 = 56.

        assert (depth - 2) % 6 == 0, 'depth should be 6n+2'
        n = (depth - 2) // 6

        block = Bottleneck if depth >=44 else BasicBlock

Thus, I suggest changing it to make sure the number of layers stays consistent with the specified depth:

        if depth >= 44:
            assert (depth - 2) % 9 == 0
            n = (depth - 2) // 9
            block = Bottleneck

        else:
            assert (depth - 2) % 6 == 0, 'depth should be 6n+2'
            n = (depth - 2) // 6
            block = BasicBlock

Problems with the checkpoints of the PreResNet110

We used your checkpoints, but I get the best results at step 1. Are these the best model parameters or just pretrained initializations? I'm not sure about it and hope you can help me clarify. Thank you.

Error loading pretrained model weights

When I try resuming from the pretrained model weights, I get an error.
This is what I am running:
python cifar.py -a preresnet --depth 110 --epochs 3 --schedule 81 122 --gamma 0.1 --wd 1e-4 --checkpoint checkpoints/cifar10/preresnet-110 --resume 'checkpoint.pth.tar'

and this is the error:

RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.bn.weight", "module.bn.bias", "module.bn.running_mean", "module.bn.running_var".
Unexpected key(s) in state_dict: "module.bn1.weight", "module.bn1.bias", "module.bn1.running_mean", "module.bn1.running_var", "module.layer1.0.conv3.weight", "module.layer1.0.bn3.weight", "module.layer1.0.bn3.bias", "module.layer1.0.bn3.running_mean", "module.layer1.0.bn3.running_var", "module.layer1.0.downsample.0.weight", "module.layer1.0.downsample.1.weight", "module.layer1.0.downsample.1.bias", "module.layer1.0.downsample.1.running_mean", "module.layer1.0.downsample.1.running_var", "module.layer1.1.conv3.weight", "module.layer1.1.bn3.weight", "module.layer1.1.bn3.bias", "module.layer1.1.bn3.running_mean", "module.layer1.1.bn3.running_var", "module.layer1.2.conv3.weight", "module.layer1.2.bn3.weight", "module.layer1.2.bn3.bias", "module.layer1.2.bn3.running_mean", "module.layer1.2.bn3.running_var", "module.layer1.3.conv3.weight", "module.layer1.3.bn3.weight", "module.layer1.3.bn3.bias", "module.layer1.3.bn3.running_mean", "module.layer1.3.bn3.running_var", "module.layer1.4.conv3.weight", "module.layer1.4.bn3.weight", "module.layer1.4.bn3.bias", "module.layer1.4.bn3.running_mean", "module.layer1.4.bn3.running_var", "module.layer1.5.conv3.weight", "module.layer1.5.bn3.weight", "module.layer1.5.bn3.bias", "module.layer1.5.bn3.running_mean", "module.layer1.5.bn3.running_var", "module.layer1.6.conv3.weight", "module.layer1.6.bn3.weight", "module.layer1.6.bn3.bias", "module.layer1.6.bn3.running_mean", "module.layer1.6.bn3.running_var", "module.layer1.7.conv3.weight", "module.layer1.7.bn3.weight", "module.layer1.7.bn3.bias", "module.layer1.7.bn3.running_mean", "module.layer1.7.bn3.running_var", "module.layer1.8.conv3.weight", "module.layer1.8.bn3.weight", "module.layer1.8.bn3.bias", "module.layer1.8.bn3.running_mean", "module.layer1.8.bn3.running_var", "module.layer1.9.conv3.weight", "module.layer1.9.bn3.weight", "module.layer1.9.bn3.bias", "module.layer1.9.bn3.running_mean", "module.layer1.9.bn3.running_var", "module.layer1.10.conv3.weight", "module.layer1.10.bn3.weight", "module.layer1.10.bn3.bias", "module.layer1.10.bn3.running_mean", "module.layer1.10.bn3.running_var", "module.layer1.11.conv3.weight", "module.layer1.11.bn3.weight", "module.layer1.11.bn3.bias", "module.layer1.11.bn3.running_mean", "module.layer1.11.bn3.running_var", "module.layer1.12.conv3.weight", "module.layer1.12.bn3.weight", "module.layer1.12.bn3.bias", "module.layer1.12.bn3.running_mean", "module.layer1.12.bn3.running_var", "module.layer1.13.conv3.weight", "module.layer1.13.bn3.weight", "module.layer1.13.bn3.bias", "module.layer1.13.bn3.running_mean", "module.layer1.13.bn3.running_var", "module.layer1.14.conv3.weight", "module.layer1.14.bn3.weight", "module.layer1.14.bn3.bias", "module.layer1.14.bn3.running_mean", "module.layer1.14.bn3.running_var", "module.layer1.15.conv3.weight", "module.layer1.15.bn3.weight", "module.layer1.15.bn3.bias", "module.layer1.15.bn3.running_mean", "module.layer1.15.bn3.running_var", "module.layer1.16.conv3.weight", "module.layer1.16.bn3.weight", "module.layer1.16.bn3.bias", "module.layer1.16.bn3.running_mean", "module.layer1.16.bn3.running_var", "module.layer1.17.conv3.weight", "module.layer1.17.bn3.weight", "module.layer1.17.bn3.bias", "module.layer1.17.bn3.running_mean", "module.layer1.17.bn3.running_var", "module.layer2.0.conv3.weight", "module.layer2.0.bn3.weight", "module.layer2.0.bn3.bias", "module.layer2.0.bn3.running_mean", "module.layer2.0.bn3.running_var", "module.layer2.0.downsample.1.weight", "module.layer2.0.downsample.1.bias", "module.layer2.0.downsample.1.running_mean", 
"module.layer2.0.downsample.1.running_var", "module.layer2.1.conv3.weight", "module.layer2.1.bn3.weight", "module.layer2.1.bn3.bias", "module.layer2.1.bn3.running_mean", "module.layer2.1.bn3.running_var", "module.layer2.2.conv3.weight", "module.layer2.2.bn3.weight", "module.layer2.2.bn3.bias", "module.layer2.2.bn3.running_mean", "module.layer2.2.bn3.running_var", "module.layer2.3.conv3.weight", "module.layer2.3.bn3.weight", "module.layer2.3.bn3.bias", "module.layer2.3.bn3.running_mean", "module.layer2.3.bn3.running_var", "module.layer2.4.conv3.weight", "module.layer2.4.bn3.weight", "module.layer2.4.bn3.bias", "module.layer2.4.bn3.running_mean", "module.layer2.4.bn3.running_var", "module.layer2.5.conv3.weight", "module.layer2.5.bn3.weight", "module.layer2.5.bn3.bias", "module.layer2.5.bn3.running_mean", "module.layer2.5.bn3.running_var", "module.layer2.6.conv3.weight", "module.layer2.6.bn3.weight", "module.layer2.6.bn3.bias", "module.layer2.6.bn3.running_mean", "module.layer2.6.bn3.running_var", "module.layer2.7.conv3.weight", "module.layer2.7.bn3.weight", "module.layer2.7.bn3.bias", "module.layer2.7.bn3.running_mean", "module.layer2.7.bn3.running_var", "module.layer2.8.conv3.weight", "module.layer2.8.bn3.weight", "module.layer2.8.bn3.bias", "module.layer2.8.bn3.running_mean", "module.layer2.8.bn3.running_var", "module.layer2.9.conv3.weight", "module.layer2.9.bn3.weight", "module.layer2.9.bn3.bias", "module.layer2.9.bn3.running_mean", "module.layer2.9.bn3.running_var", "module.layer2.10.conv3.weight", "module.layer2.10.bn3.weight", "module.layer2.10.bn3.bias", "module.layer2.10.bn3.running_mean", "module.layer2.10.bn3.running_var", "module.layer2.11.conv3.weight", "module.layer2.11.bn3.weight", "module.layer2.11.bn3.bias", "module.layer2.11.bn3.running_mean", "module.layer2.11.bn3.running_var", "module.layer2.12.conv3.weight", "module.layer2.12.bn3.weight", "module.layer2.12.bn3.bias", "module.layer2.12.bn3.running_mean", "module.layer2.12.bn3.running_var", "module.layer2.13.conv3.weight", "module.layer2.13.bn3.weight", "module.layer2.13.bn3.bias", "module.layer2.13.bn3.running_mean", "module.layer2.13.bn3.running_var", "module.layer2.14.conv3.weight", "module.layer2.14.bn3.weight", "module.layer2.14.bn3.bias", "module.layer2.14.bn3.running_mean", "module.layer2.14.bn3.running_var", "module.layer2.15.conv3.weight", "module.layer2.15.bn3.weight", "module.layer2.15.bn3.bias", "module.layer2.15.bn3.running_mean", "module.layer2.15.bn3.running_var", "module.layer2.16.conv3.weight", "module.layer2.16.bn3.weight", "module.layer2.16.bn3.bias", "module.layer2.16.bn3.running_mean", "module.layer2.16.bn3.running_var", "module.layer2.17.conv3.weight", "module.layer2.17.bn3.weight", "module.layer2.17.bn3.bias", "module.layer2.17.bn3.running_mean", "module.layer2.17.bn3.running_var", "module.layer3.0.conv3.weight", "module.layer3.0.bn3.weight", "module.layer3.0.bn3.bias", "module.layer3.0.bn3.running_mean", "module.layer3.0.bn3.running_var", "module.layer3.0.downsample.1.weight", "module.layer3.0.downsample.1.bias", "module.layer3.0.downsample.1.running_mean", "module.layer3.0.downsample.1.running_var", "module.layer3.1.conv3.weight", "module.layer3.1.bn3.weight", "module.layer3.1.bn3.bias", "module.layer3.1.bn3.running_mean", "module.layer3.1.bn3.running_var", "module.layer3.2.conv3.weight", "module.layer3.2.bn3.weight", "module.layer3.2.bn3.bias", "module.layer3.2.bn3.running_mean", "module.layer3.2.bn3.running_var", "module.layer3.3.conv3.weight", "module.layer3.3.bn3.weight", 
"module.layer3.3.bn3.bias", "module.layer3.3.bn3.running_mean", "module.layer3.3.bn3.running_var", "module.layer3.4.conv3.weight", "module.layer3.4.bn3.weight", "module.layer3.4.bn3.bias", "module.layer3.4.bn3.running_mean", "module.layer3.4.bn3.running_var", "module.layer3.5.conv3.weight", "module.layer3.5.bn3.weight", "module.layer3.5.bn3.bias", "module.layer3.5.bn3.running_mean", "module.layer3.5.bn3.running_var", "module.layer3.6.conv3.weight", "module.layer3.6.bn3.weight", "module.layer3.6.bn3.bias", "module.layer3.6.bn3.running_mean", "module.layer3.6.bn3.running_var", "module.layer3.7.conv3.weight", "module.layer3.7.bn3.weight", "module.layer3.7.bn3.bias", "module.layer3.7.bn3.running_mean", "module.layer3.7.bn3.running_var", "module.layer3.8.conv3.weight", "module.layer3.8.bn3.weight", "module.layer3.8.bn3.bias", "module.layer3.8.bn3.running_mean", "module.layer3.8.bn3.running_var", "module.layer3.9.conv3.weight", "module.layer3.9.bn3.weight", "module.layer3.9.bn3.bias", "module.layer3.9.bn3.running_mean", "module.layer3.9.bn3.running_var", "module.layer3.10.conv3.weight", "module.layer3.10.bn3.weight", "module.layer3.10.bn3.bias", "module.layer3.10.bn3.running_mean", "module.layer3.10.bn3.running_var", "module.layer3.11.conv3.weight", "module.layer3.11.bn3.weight", "module.layer3.11.bn3.bias", "module.layer3.11.bn3.running_mean", "module.layer3.11.bn3.running_var", "module.layer3.12.conv3.weight", "module.layer3.12.bn3.weight", "module.layer3.12.bn3.bias", "module.layer3.12.bn3.running_mean", "module.layer3.12.bn3.running_var", "module.layer3.13.conv3.weight", "module.layer3.13.bn3.weight", "module.layer3.13.bn3.bias", "module.layer3.13.bn3.running_mean", "module.layer3.13.bn3.running_var", "module.layer3.14.conv3.weight", "module.layer3.14.bn3.weight", "module.layer3.14.bn3.bias", "module.layer3.14.bn3.running_mean", "module.layer3.14.bn3.running_var", "module.layer3.15.conv3.weight", "module.layer3.15.bn3.weight", "module.layer3.15.bn3.bias", "module.layer3.15.bn3.running_mean", "module.layer3.15.bn3.running_var", "module.layer3.16.conv3.weight", "module.layer3.16.bn3.weight", "module.layer3.16.bn3.bias", "module.layer3.16.bn3.running_mean", "module.layer3.16.bn3.running_var", "module.layer3.17.conv3.weight", "module.layer3.17.bn3.weight", "module.layer3.17.bn3.bias", "module.layer3.17.bn3.running_mean", "module.layer3.17.bn3.running_var".
size mismatch for module.layer1.0.conv1.weight: copying a param with shape torch.Size([16, 16, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.1.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.2.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.3.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.4.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.5.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.6.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.7.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.8.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.9.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.10.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.11.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.12.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.13.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.14.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.15.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.16.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer1.17.conv1.weight: copying a param with shape torch.Size([16, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([16, 16, 3, 3]).
size mismatch for module.layer2.0.bn1.weight: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for module.layer2.0.bn1.bias: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for module.layer2.0.bn1.running_mean: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for module.layer2.0.bn1.running_var: copying a param with shape torch.Size([32]) from checkpoint, the shape in current model is torch.Size([16]).
size mismatch for module.layer2.0.conv1.weight: copying a param with shape torch.Size([32, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 16, 3, 3]).
size mismatch for module.layer2.0.downsample.0.weight: copying a param with shape torch.Size([128, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 16, 1, 1]).
size mismatch for module.layer2.1.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.2.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.3.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.4.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.5.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.6.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.7.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.8.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.9.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.10.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.11.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.12.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.13.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.14.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.15.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.16.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer2.17.conv1.weight: copying a param with shape torch.Size([32, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 32, 3, 3]).
size mismatch for module.layer3.0.bn1.weight: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for module.layer3.0.bn1.bias: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for module.layer3.0.bn1.running_mean: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for module.layer3.0.bn1.running_var: copying a param with shape torch.Size([64]) from checkpoint, the shape in current model is torch.Size([32]).
size mismatch for module.layer3.0.conv1.weight: copying a param with shape torch.Size([64, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 32, 3, 3]).
size mismatch for module.layer3.0.downsample.0.weight: copying a param with shape torch.Size([256, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 32, 1, 1]).
size mismatch for module.layer3.1.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.2.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.3.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.4.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.5.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.6.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.7.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.8.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.9.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.10.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.11.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.12.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.13.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.14.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.15.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.16.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.layer3.17.conv1.weight: copying a param with shape torch.Size([64, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([64, 64, 3, 3]).
size mismatch for module.fc.weight: copying a param with shape torch.Size([100, 256]) from checkpoint, the shape in current model is torch.Size([10, 64]).
size mismatch for module.fc.bias: copying a param with shape torch.Size([100]) from checkpoint, the shape in current model is torch.Size([10]).

MNIST Dataset

Similar benchmarks on MNIST dataset would be great.

For vgg16, there are three classifier layers in the provided checkpoint but only one in the model

When I load the checkpoint of vgg16, downloaded from https://download.pytorch.org/models/vgg16-397923af.pth, there is an error:

RuntimeError: Error(s) in loading state_dict for VGG:
Missing key(s) in state_dict: "classifier.weight", "classifier.bias".
Unexpected key(s) in state_dict: "classifier.0.weight", "classifier.0.bias", "classifier.3.weight", "classifier.3.bias", "classifier.6.weight", "classifier.6.bias".

So I found that there are three classifier layers (classifier.0/3/6) in the provided checkpoint but only one classifier layer (classifier) in the model. Can anyone tell me why there is this mismatch?
Thank you!
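
The torchvision vgg16 checkpoint has a three-layer Sequential classifier (hence classifier.0/3/6), while the CIFAR variant here uses a single Linear classifier; one workaround (a sketch) is to load only the convolutional features and leave the classifier freshly initialized:

    import torch

    state = torch.load('vgg16-397923af.pth', map_location='cpu')
    # Keep only the feature-extractor weights; strict=False tolerates the
    # missing classifier.weight/classifier.bias, which stay randomly initialized.
    feature_state = {k: v for k, v in state.items() if k.startswith('features.')}
    model.load_state_dict(feature_state, strict=False)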

Accuracy of the training

Hi,

I ran the ResNeXt-29, 8x64 model on cifar100, but I cannot reach the 17.38% error rate; what I get is around 18.5% (I ran it multiple times with different random seeds). Can anyone tell me what's wrong? I didn't change the code.

about how to run inference

Hi there, thanks for your work. I used your code to train on my data and it works well, but there doesn't seem to be an inference function. Could you provide one or give some pointers? Thanks!
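
There is no dedicated inference script, but a plain forward pass works; a minimal sketch, assuming a trained CIFAR-10 model and the usual CIFAR-10 normalization statistics:

    import torch
    from PIL import Image
    import torchvision.transforms as transforms

    preprocess = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465),   # CIFAR-10 mean
                             (0.2023, 0.1994, 0.2010)),  # CIFAR-10 std
    ])

    img = preprocess(Image.open('test.png').convert('RGB')).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(img), dim=1)
    print(probs.argmax(dim=1).item())                    # predicted class index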

zero GPU usage when training imagenet

Hi, I am using ResNet-18 to train ImageNet, but I see zero GPU usage during training. Do you see the same when you train it?

Alexnet CIFAR10 checkpoints don't load properly

Hi,
I've downloaded the AlexNet CIFAR-10 checkpoints from your OneDrive. Bash recognizes them as data rather than tar archives, and in fact you can load them directly in PyTorch and it will display all the weight tensors.
Maybe there was a mistake during compression?

Anyway, when I load the state_dict (for both checkpoints) this error shows up:

 model = alexnet() 
 model.load_state_dict(torch.load('model_best.pth'))
 
 KeyError: 'unexpected key "acc" in state_dict'

Update: sorry for closing/reopening.

About tiny imagenet training problem

I have modified imagenet.py by adding the following lines after the "create model" code:

    num_ftrs = model.fc.in_features
    model.fc = nn.Linear(num_ftrs, 200)

This changes the net to train on Tiny ImageNet. But there is a problem: training top-1 accuracy is about 79.6110 and top-5 accuracy is about 91.6950, while the validation accuracies are only 0.7 (top-1) and 1.47 (top-5). Do you have any ideas about this weird problem?

The results of ResNeXt-50 (32x4d) on ImageNet

Many thanks for your valuable work!

I have trained ResNeXt-50 (32x4d) on ImageNet following the training recipes, and I got 75% validation accuracy (the reported result is 77%).

I have not modified any code or configs, except that I trained on 8 GPUs with torch.nn.DataParallel. Do you have any hints about the performance drop?

Thanks again for your work, and I hope to hear some insights from you!
