hysts / pytorch_image_classification

PyTorch implementation of image classification models for CIFAR-10/CIFAR-100/MNIST/FashionMNIST/Kuzushiji-MNIST/ImageNet

License: MIT License

Python 72.03% Jupyter Notebook 27.97%

cifar10 computer-vision fashion-mnist imagenet pytorch

pytorch_image_classification's Issues

What is WRN20-4 in the results?

This repo is very detailed and informative, thanks!
However, I am a little confused about WRN20-4 in the results. The depth of a WRN should have the form (6*n)+4, and a depth of 20 does not satisfy that. How was the WRN-20-4 result obtained?

https://github.com/hysts/pytorch_image_classification#results

Best!
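
The depth rule quoted in this question can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name is made up for illustration and is not part of the repository.

```python
def wrn_blocks_per_stage(depth: int) -> int:
    """Return n for a depth of the form 6*n + 4, or raise if it does not fit."""
    n, remainder = divmod(depth - 4, 6)
    if remainder != 0 or n < 1:
        raise ValueError(f"depth {depth} is not of the form 6*n + 4")
    return n


for depth in (16, 20, 22, 28):
    try:
        print(depth, "->", wrn_blocks_per_stage(depth), "blocks per stage")
    except ValueError as err:
        print(depth, "->", err)
```

Depths 16, 22 and 28 pass, while 20 is rejected. Note that 20 = 6*3 + 2 does fit the 6*n + 2 rule commonly used for CIFAR ResNets; whether that is what the results table means is not confirmed here.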

how to train with my own dataset?

I have a dataset made up of many folders; each folder is one class and contains the images of that class. I want to train with my own dataset, but I don't know how to set up my data structure. Thank you so much!
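
For a one-folder-per-class layout like the one described, the labels can be derived from the folder names. The sketch below uses only the standard library; `index_image_folder` and `IMAGE_SUFFIXES` are hypothetical names, and how the resulting list would be wired into this repository's data loader is not shown in the thread.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to the files actually present.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def index_image_folder(root):
    """Map each class sub-folder to an integer label and list (path, label) pairs."""
    root = Path(root)
    # Sorting the folder names makes the label assignment reproducible.
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(class_names)}
    samples = []
    for name in class_names:
        for path in sorted((root / name).rglob("*")):
            if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
                samples.append((path, class_to_idx[name]))
    return class_to_idx, samples
```

torchvision.datasets.ImageFolder expects the same directory layout and also assigns labels from the sorted class-folder names, so it may be a ready-made alternative.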

Very good job, the experiment is very detailed

this

How to add more models for ImageNet from pytorch-image-models?

Thank you for your excellent work and for sharing it!

The results for PyramidNet and ResNeXt are very good.
However, I want to try some more models, such as EfficientNet. The pytorch-image-models project you mentioned before has many models, but it is built to train and test on open datasets and is not well suited to running our own dataset. Even after I changed its code to run my own dataset, the results were worse than with your project using the same models.
So, could you give me some advice on how to add EfficientNet from pytorch-image-models to your project for ImageNet?
Thanks a lot!

how to create the label?

how to train on custom datasets?

Do you have training logs?

Are there training logs available?

Slight inaccuracy in VGG model definition?

First of all, thanks for the great repository!

I think there might be a slight inaccuracy in the way VGG is defined here: https://github.com/hysts/pytorch_image_classification/blob/master/pytorch_image_classification/models/cifar/vgg.py

Specifically, in the make_stage function:

    def _make_stage(self, in_channels, out_channels, n_blocks):
        stage = nn.Sequential()
        for index in range(n_blocks):
            if index == 0:
                conv = nn.Conv2d(
                    in_channels,
                    out_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                )
            else:
                conv = nn.Conv2d(
                    out_channels,
                    out_channels,
                    kernel_size=3,
                    stride=1,
                    padding=1,
                )
            stage.add_module(f'conv{index}', conv)
            if self.use_bn:
                stage.add_module(f'bn{index}', nn.BatchNorm2d(out_channels))
            stage.add_module('relu', nn.ReLU(inplace=True))
        stage.add_module('pool', nn.MaxPool2d(kernel_size=2, stride=2))
        return stage

I think currently, only one relu module is being added, i.e. not one for each value of index. Whereas in the VGG paper, there should be a relu for each value of index. It seems like the fix for this should just be modifying the line:
stage.add_module('relu', nn.ReLU(inplace=True)), to make it
stage.add_module(f'relu{index}', nn.ReLU(inplace=True)).
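
The effect described here can be illustrated without PyTorch. The sketch below assumes that nn.Sequential keeps its children in a name-keyed ordered mapping, so registering a second module under an existing name replaces the first one in place; a plain dict stands in for that mapping, and `stage_layout` is a hypothetical helper.

```python
def stage_layout(n_blocks, relu_name):
    """Simulate name-keyed module registration; relu_name is a format string."""
    # Python dicts keep insertion order, and re-assigning an existing key
    # keeps that key in its original position.
    modules = {}
    for index in range(n_blocks):
        modules[f"conv{index}"] = "Conv2d"
        modules[f"bn{index}"] = "BatchNorm2d"
        modules[relu_name.format(index=index)] = "ReLU"
    modules["pool"] = "MaxPool2d"
    return list(modules)


print(stage_layout(2, "relu"))
# ['conv0', 'bn0', 'relu', 'conv1', 'bn1', 'pool']
print(stage_layout(2, "relu{index}"))
# ['conv0', 'bn0', 'relu0', 'conv1', 'bn1', 'relu1', 'pool']
```

With the shared name, only the first convolution is followed by a ReLU; with a per-index name, every convolution is.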

problem when running the command in the README

I'm having the following problem when running the command from the README and would really appreciate your help:

Traceback (most recent call last):
  File "train.py", line 445, in <module>
    main()
  File "train.py", line 371, in main
    model, optimizer, opt_level=config.train.precision)
  File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/py/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/frontend.py", line 358, in initialize
    return _initialize(models, optimizers, _amp_state.opt_properties, num_losses, cast_model_outputs)
  File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/py/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_initialize.py", line 171, in _initialize
    check_params_fp32(models)
  File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/py/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_initialize.py", line 116, in check_params_fp32
    name, buf.type()))
  File "/mnt/cephfs/training/users/lilujun/miniconda3/envs/py/lib/python3.7/site-packages/apex-0.1-py3.7.egg/apex/amp/_amp_state.py", line 32, in warn_or_err
    raise RuntimeError(msg)
RuntimeError: Found buffer total_ops with type torch.DoubleTensor, expected torch.cuda.FloatTensor.
When using amp.initialize, you need to provide a model with buffers
located on a CUDA device before passing it no matter what optimization level
you chose. Use model.to('cuda') to use the default device.

Minor typos

Two minor typos in se_resnet_preact :

se_resnet_preact.yaml: model.resnet_preact should be model.se_resnet_preact
se_resnet_preact.py: in initialize_weights(), change all module.biasd.* to module.bias.* (without d)

normalization

the normalization step might be wrong.

https://github.com/hysts/pytorch_image_classification/blob/master/dataloader.py#L134

the correct order is
https://github.com/kuangliu/pytorch-cifar/blob/master/main.py#L35

Any idea?

Own DataSet

Hi,

I am new to this area. Could you please give me some suggestions on how I can train the model using my own dataset? Thanks in advance.

Kind regards,

Sean

How do I set up distributed training?

Hello, could you tell me how to set up distributed training? Thank you.

Test command

Hello, what is the test command?

Can you tell me the content of the config.yaml (in #16) for testing my own test dataset with any model?

The question is: if I want to use the trained model to test my own test dataset, what should I do?
Thanks!

How could I test my own dataset?

I have a folder containing the images that need to be evaluated. The images are stored as follows.
Train dataset folder:
/path/female_bag/female_bag*.jpg,
/path/makeup_bag/makeup_bag*.jpg,

Test dataset folder:
/path/female_bag/female_bag*.jpg,
/path/makeup_bag/makeup_bag*.jpg,

How should I write the code to test those images?

Which files do I need to change if I want to test with only one image?

Hi @hysts, I've trained and evaluated successfully with my own dataset. Now I want to test with a single image, but I'm not sure which files I need to change. Can you help me with this problem? I've tried changing some files and functions, but it doesn't work.
Thank you so much!

Do you use a val/test split?

Hi, in your reported results, are these (1) best performing test accuracies without a validation set, (2) final test accuracies on the last epoch of training, or (3) test accuracies on the model checkpoint with the best performing validation accuracy?

Is there any pre-trained model, e.g. one pretrained on ImageNet? I have trained on CIFAR100 from scratch, but the accuracy only reaches 60%.

evaluate.py CheckPointer not found

https://github.com/hysts/pytorch_image_classification/blob/master/evaluate.py#L22
When I use evaluate.py, CheckPointer cannot be found.

How could I change the code?

When I try to use my modified evaluate.py to evaluate my own dataset, this error always comes up first. Do you know how I could change the code? I have already added
transforms.ToTensor() in evaluate.py and changed the code in dataset.py to return self.transform(self.x[index]), self.transform(self.y[index])
Is there any other way to eliminate the error?

pretrained weights?

Hi,
Any chance to save the weights of those trainings?

if n_channels: 1 then RuntimeError

Thank you for your excellent work and for sharing it!
I have my own dataset of single-channel 64x64 grayscale images.
For every network (vgg16, resnet18, ...), if I set n_channels: 1 in the yaml file, the following error is shown:

Traceback (most recent call last):
  File "train.py", line 436, in <module>
    main()
  File "train.py", line 404, in main
    validate(0, config, model, val_loss, val_loader, logger,
  File "train.py", line 259, in validate
    outputs = model(data)
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/apex/amp/_initialize.py", line 196, in new_fwd
    output = old_fwd(*applier(args, input_caster),
  File "/media/zzks/xi/2020PJ/lab/pytorch_image_classification/pytorch_image_classification/models/imagenet/vgg.py", line 80, in forward
    x = self._forward_conv(x)
  File "/media/zzks/xi/2020PJ/lab/pytorch_image_classification/pytorch_image_classification/models/imagenet/vgg.py", line 72, in _forward_conv
    x = self.stage1(x)
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/container.py", line 100, in forward
    input = module(input)
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/zzks/anaconda3/envs/dbnet/lib/python3.8/site-packages/torch/nn/modules/conv.py", line 341, in conv2d_forward
    return F.conv2d(input, weight, self.bias, self.stride,
RuntimeError: Given groups=1, weight of size 64 1 3 3, expected input[100, 3, 64, 64] to have 1 channels, but got 3 channels instead
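
The last line of this traceback says the first convolution was built for 1 input channel but the batch still has 3, which suggests the data pipeline is delivering RGB images. One possible remedy is to collapse the images to a single channel before they reach the model. The sketch below shows the arithmetic on plain nested lists using the ITU-R BT.601 luma weights; `rgb_to_gray` is a hypothetical helper, and where such a step belongs in this repository's transforms is not shown in the thread.

```python
def rgb_to_gray(image):
    """Collapse a [3][H][W] nested list to [1][H][W] with BT.601 luma weights."""
    red, green, blue = image
    gray = [
        [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in zip(r_row, g_row, b_row)]
        for r_row, g_row, b_row in zip(red, green, blue)
    ]
    # Keep an explicit channel axis of length 1 so the layout stays [C][H][W].
    return [gray]
```

In a torchvision pipeline, transforms.Grayscale(num_output_channels=1) performs this kind of conversion on images.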

How to resume from an earlier epoch's weights?

I can't find a way to resume from an earlier epoch.

pytorch_image_classification/train.py

Line 573 in 1d76092

for epoch, seed in zip(range(1, optim_config['epochs'] + 1), epoch_seeds):

With this code, training always begins at epoch 1.
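
One way to make a loop like the quoted one resumable is to start the range at the epoch after the last finished one and skip the seeds already consumed. This is a minimal sketch, assuming the last finished epoch has been read from a checkpoint and that model and optimizer state are restored separately; `remaining_epochs` and `start_epoch` are hypothetical names, not part of train.py.

```python
def remaining_epochs(total_epochs, epoch_seeds, start_epoch=0):
    """Return (epoch, seed) pairs for the epochs after start_epoch.

    start_epoch is the number of epochs already completed (0 for a fresh run).
    epoch_seeds is assumed to hold one seed per epoch, in order.
    """
    first = start_epoch + 1
    return list(zip(range(first, total_epochs + 1), epoch_seeds[start_epoch:]))


# Resuming after epoch 2 of 5 continues with epochs 3, 4 and 5.
print(remaining_epochs(5, [10, 11, 12, 13, 14], start_epoch=2))
# [(3, 12), (4, 13), (5, 14)]
```

Slicing the seeds keeps each epoch paired with the seed it would have received in an uninterrupted run.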

no_weight_decay_on_bn removes weight decay on FC

I noticed that if no_weight_decay_on_bn is set to True, weight decay is applied only to conv.weight. It seems that weight decay on the FC layers is removed as well. Is there any reason for this?
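
For comparison, here is a sketch of a grouping that exempts only BatchNorm parameters and keeps weight decay on everything else, including FC weights. It is not the repository's implementation: `split_weight_decay` is a hypothetical helper, and each parameter is assumed to be described by a (name, layer type, parameter) tuple. The returned list of dicts follows the per-parameter-group format that torch.optim optimizers accept.

```python
def split_weight_decay(params, weight_decay):
    """Build two parameter groups: decayed, and BatchNorm without decay.

    params is an iterable of (name, layer_type, param) tuples.
    """
    decay, no_decay = [], []
    for name, layer_type, param in params:
        # Only BatchNorm layers are exempt; conv and linear params keep decay.
        if layer_type.startswith("BatchNorm"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]
```

Deciding by layer type rather than by parameter name keeps fc.weight in the decayed group. Biases of conv and FC layers also stay decayed in this sketch, which may or may not be what is wanted.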

How to train in distributed mode?

Assertion Error

Hi,

I was trying to run the code from terminal directly using

pip install -r requirements.txt
python train.py --config configs/cifar/resnet.yaml

However, I kept getting this assertion error:

AssertionError: Key env_info.cuda_version with value <class 'NoneType'> is not a valid type; 
valid types: {<class 'bool'>, <class 'float'>, <class 'tuple'>, <class 'int'>, <class 'list'>, <class 'str'>}

Below is the traceback message:

Traceback (most recent call last):
  File "train.py", line 436, in <module>
    main()
  File "train.py", line 340, in main
    save_config(get_env_info(config), output_dir / 'env.yaml')
  File "/Users/charlotte/Desktop/classification/pytorch_image_classification/utils/env_info.py", line 19, in get_env_info
    return ConfigNode({'env_info': info})
  File "/Users/charlotte/Desktop/classification/pytorch_image_classification/config/config_node.py", line 6, in __init__
    super().__init__(init_dict, key_list, new_allowed)
  File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 86, in __init__
    init_dict = self._create_config_tree_from_dict(init_dict, key_list)
  File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 126, in _create_config_tree_from_dict
    dic[k] = cls(v, key_list=key_list + [k])
  File "/Users/charlotte/Desktop/classification/pytorch_image_classification/config/config_node.py", line 6, in __init__
    super().__init__(init_dict, key_list, new_allowed)
  File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 86, in __init__
    init_dict = self._create_config_tree_from_dict(init_dict, key_list)
  File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 132, in _create_config_tree_from_dict
    ".".join(key_list + [str(k)]), type(v), _VALID_TYPES
  File "/Users/charlotte/opt/miniconda3/lib/python3.7/site-packages/yacs/config.py", line 525, in _assert_with_logging
    assert cond, msg

Can you please give me some hints about how to fix this? Thanks!

can I run the script without GPU&apex?

I'm on my Ubuntu server (without a GPU), trying to run your script.
pip install -r requirements.txt succeeded (I skipped the apex requirement, since I don't have a GPU right now),
but when I ran the following command, I hit the issue below,
and every other .yaml gave me the same issue.

~/code/pytorch_image_classification# python train.py --config configs/cifar/resnet.yaml
Traceback (most recent call last):
  File "train.py", line 449, in <module>
    main()
  File "train.py", line 353, in main
    save_config(get_env_info(config), output_dir / 'env.yaml')
  File "/root/code/pytorch_image_classification/pytorch_image_classification/utils/env_info.py", line 19, in get_env_info
    return ConfigNode({'env_info': info})
  File "/root/code/pytorch_image_classification/pytorch_image_classification/config/config_node.py", line 6, in __init__
    super().__init__(init_dict, key_list, new_allowed)
  File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 86, in __init__
    init_dict = self._create_config_tree_from_dict(init_dict, key_list)
  File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 126, in _create_config_tree_from_dict
    dic[k] = cls(v, key_list=key_list + [k])
  File "/root/code/pytorch_image_classification/pytorch_image_classification/config/config_node.py", line 6, in __init__
    super().__init__(init_dict, key_list, new_allowed)
  File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 86, in __init__
    init_dict = self._create_config_tree_from_dict(init_dict, key_list)
  File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 129, in _create_config_tree_from_dict
    _assert_with_logging(
  File "/root/archiconda3/envs/python38/lib/python3.8/site-packages/yacs/config.py", line 545, in _assert_with_logging
    assert cond, msg
AssertionError: Key env_info.pytorch_version with value <class 'torch.torch_version.TorchVersion'> is not a valid type; valid types: {<class 'NoneType'>, <class 'list'>, <class 'bool'>, <class 'int'>, <class 'tuple'>, <class 'float'>, <class 'str'>}
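
Both assertion errors above (the None cuda_version and the TorchVersion pytorch_version) come from yacs rejecting a value whose type is not in its list of valid types. A possible workaround is to coerce such values to plain strings before the dict is wrapped in a config node. The sketch below is an assumption about how that could look, not the fix applied in the repository; `sanitize_env_info` and `VALID_TYPES` are hypothetical names, and the type set is taken from the first error message.

```python
# Types listed as valid in the first AssertionError message above.
VALID_TYPES = (bool, int, float, str, list, tuple)


def sanitize_env_info(info):
    """Return a copy of info whose values all have exactly one of VALID_TYPES."""
    clean = {}
    for key, value in info.items():
        # The errors above suggest an exact type check, so subclasses
        # (for example a str subclass) and None are converted with str().
        clean[key] = value if type(value) in VALID_TYPES else str(value)
    return clean
```

With this in place, a missing CUDA version would be stored as the string 'None' and a version object as its plain string form.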

where to find each epoch's error and draw the graph

Thanks, hysts, for sharing the package.
One question: where can I find the error rate of each epoch?