
condensenet's People

Contributors

gaohuang, ironfist2, lvdmaaten, shichenliu


condensenet's Issues

Accuracy in comparison with DenseNet?

Thanks for sharing your work. I am working with DenseNet in non-mobile environments (e.g., supercomputers, self-driving systems) on classification tasks, and I am wondering about the accuracy of CondenseNet versus DenseNet. Which network would you recommend in terms of accuracy, DenseNet or CondenseNet? Thanks for your advice.

Testing on ARM

How did you test speed on ARM? Using Caffe2, or PyTorch compiled for ARM?

_dropping

A question about the _dropping function in the LearnedGroupConv class.
On line 88: self.mask[i::self.groups, d, :, :].fill_(0)
Why is the first dimension of the mask indexed with "i::self.groups"?
I thought it should be "i * d_out:(i + 1) * d_out".
Thank you!
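
(A toy illustration of what the strided index selects, offered purely as a sketch: if the output channels of the groups are stored interleaved rather than in contiguous blocks of d_out rows, then group i owns rows i, i+groups, i+2*groups, and so on. The sizes below are made up.)

    import torch

    out_channels, groups = 8, 4   # toy sizes; d_out = out_channels // groups = 2
    mask = torch.ones(out_channels, 6, 1, 1)

    i, d = 1, 3                   # group index and a column to drop
    # Interleaved layout: group i owns rows i, i+groups, i+2*groups, ...
    print(list(range(out_channels))[i::groups])   # -> [1, 5]
    mask[i::groups, d, :, :].fill_(0)             # zero input channel d for group i

    # The contiguous-block indexing the question expects would instead be:
    d_out = out_channels // groups
    # mask[i * d_out:(i + 1) * d_out, d, :, :]    # rows [2, 3] for i = 1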

Pretrained models

Would you please share the pretrained models on Baidu disk or Google Drive? I cannot access Dropbox.

How to run Condensenet without GPU?

@ShichenLiu and @gaohuang: I want to run this model on my machine without a GPU. How exactly do I do this? What lines of code should I modify, if any?

Command: python main.py --model condensenet -b 64 -j 12 cifar10 --stages 14-14-14 --growth 8-16-32 --gpu 0 --resume requires my machine to have a GPU, which it doesn't.

My goal is to run this model on CPU (machine without GPU). Thank you!
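
(A minimal sketch of the usual CPU-only pattern in PyTorch; the exact lines to change depend on main.py, and the checkpoint file name here is hypothetical:)

    import torch

    device = torch.device('cpu')
    model = model.to(device)          # instead of model.cuda()

    # map GPU-saved tensors onto the CPU when resuming:
    checkpoint = torch.load('checkpoint.pth.tar', map_location=device)
    model.load_state_dict(checkpoint['state_dict'])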

Running pretrained networks

Hi,
Cool ideas in regard to CondenseNet!

I seem to have an issue running the pretrained network 'converted_condensenet_8.pth.tar'. Running the command:

python main.py --model condensenet_converted -b 32 -j 20 ~/imagenet/ --stages 4-6-8-10-8 --growth 8-16-32-64-128 --gpu 1 --resume --evaluate-from ../converted_condensenet_8.pth.tar

Results in an error:

Traceback (most recent call last):

  File "main.py", line 479, in <module>
    main()
  File "main.py", line 168, in main
    model.load_state_dict(state_dict)
  File "/usr/local/lib/python3.5/dist-packages/torch/nn/modules/module.py", line 487, in load_state_dict
    .format(name, own_state[name].size(), param.size()))
RuntimeError: While copying the parameter named module.features.denseblock_1.denselayer_1.conv_1.conv.weight, whose dimensions in the model are torch.Size([32, 4, 1, 1]) and whose dimensions in the checkpoint are torch.Size([32, 2, 1, 1]).

It appears that the parameters for 'stages' or 'growth' may not be set correctly. Please let me know if you have any insights!
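
(One observation, offered only as a guess: the checkpoint's conv_1 weights have half as many input channels as the model expects (2 vs. 4), which is exactly what would happen if the model were built with condense factor 4 while converted_condensenet_8.pth.tar was condensed with C=G=8. Following the flag pattern from the "INDICES element is out of DATA bounds" issue below, the command would gain:

    --group-1x1 8 --group-3x3 8 --condense-factor 8
)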

dropout before convolution layer

Hi, I noticed that the dropout is placed before the convolution layer.
In the original densenet-torch implementation, the order within each block is
BN --> ReLU --> conv --> dropout.
Is there a particular reason for doing so?

    def forward(self, x):
        self._check_drop()
        x = self.norm(x)
        x = self.relu(x)
        if self.dropout_rate > 0:
            x = self.drop(x)  # dropout applied before the convolution
        ### Masked output: pruned kernel entries are zeroed out
        weight = self.conv.weight * self.mask
        # groups=1 here: the learned grouping is realized through the mask,
        # not through the groups argument of conv2d
        return F.conv2d(x, weight, None, self.conv.stride,
                        self.conv.padding, self.conv.dilation, 1)

is_best defining problem

Hi, I noticed a minor problem in the code:
https://github.com/ShichenLiu/CondenseNet/blob/master/main.py#L246

is_best = val_prec1 < best_prec1
best_prec1 = max(val_prec1, best_prec1)

should be changed to

is_best = val_prec1 < best_prec1
best_prec1 = min(val_prec1, best_prec1)

Since the val_prec1 returned from validation is the classification error rate,
the trainer would otherwise save a checkpoint at every epoch.

The initialization at the beginning should also be changed to best_prec1 = 100.

about the inference time

Hi, I saw the benchmark in the README and have some questions.
What platform was used for the inference-time testing?
Is there any NEON acceleration of the depthwise convolutions in MobileNet?
There is a large gap between theoretical and actual speed-up: although MobileNet has roughly twice the computation of CondenseNet, I would still like to know the speed difference after platform-specific optimization.

Inference time on ARM platform

Model                    FLOPs     Top-1 err. (%)   Time (s)
VGG-16                   15,300M   28.5             354
ResNet-18                1,818M    30.2             8.14
1.0 MobileNet-224        569M      29.4             1.96
CondenseNet-74 (C=G=4)   529M      26.2             1.89
CondenseNet-74 (C=G=8)   274M      29.0             0.99

Transfer-learning with CondenseNet

Hello,
I am getting good accuracy with CondenseNet on my dataset when training from scratch, but I feel I could boost the results by training from a checkpoint pretrained on ImageNet.
You offer the converted ImageNet checkpoints, but my understanding is that I cannot use those for transfer learning because they are missing some dicts, such as the optimizer state.
Since I am new to PyTorch, I feel like I am missing something. Is it possible to train from a converted checkpoint? If not, would it be possible to upload the unconverted models?
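
(For what it's worth, a sketch of the usual transfer-learning pattern, assuming the unconverted checkpoint asked for above were available; the file name is hypothetical. The optimizer state from the checkpoint is not strictly needed, since a fresh optimizer can be created:)

    import torch

    checkpoint = torch.load('condensenet_imagenet.pth.tar', map_location='cpu')
    model.load_state_dict(checkpoint['state_dict'])   # weights only

    # fine-tune with a fresh optimizer; no saved optimizer dict required
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.9, weight_decay=1e-4)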

Testing on ARM without CUDA and GPU

I am aware that you have tested the CondenseNet model with PyTorch on the CPU (an ARM processor) of a Jetson TX2, which has Nvidia CUDA support.

However, can this model be tested with PyTorch on an ARM CPU/system without CUDA support, i.e., using only CPU resources? We have an NXP BlueBox 2.0, and it does not support CUDA.

At the moment, I am getting this error on my non-CUDA system:
RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.

After I add map_location=torch.device('cpu') to torch.load, I get this error:
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

When I run the base script on a GPU machine with Nvidia CUDA, model testing runs without any issue.
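
(A sketch of a possible workaround, assuming the second error comes from wrapping the model in nn.DataParallel, whose saved keys carry a 'module.' prefix: skip the wrapper on a CPU-only machine and strip the prefix before loading. The checkpoint file name is hypothetical.)

    import torch
    from collections import OrderedDict

    state = torch.load('model.pth.tar', map_location=torch.device('cpu'))
    # strip the 'module.' prefix left by nn.DataParallel
    stripped = OrderedDict((k.replace('module.', '', 1), v)
                           for k, v in state['state_dict'].items())
    model.load_state_dict(stripped)   # plain model, no DataParallel wrapper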

RuntimeError: INDICES element is out of DATA bounds

Whenever I run the following command:

python main.py --model condensenet_converted -b 64 -j 4 C:\CondenseNet-master --stages 4-6-8-10-8 --growth 8-16-32-64-128 --group-1x1 4 --group-3x3 4 --condense-factor 4 --evaluate-from C:\CondenseNet-master/converted_condensenet_4.pth.tar --gpu 0

I get the following error:

RuntimeError: INDICES element is out of DATA bounds, id=53888868763566084 axis_dim=2064

Any idea how to solve this issue?

Thank you in advance.

condensenet-86 parameters number different from torchsummary

Hi, I noticed that condensenet-86 is reported as 0.52M parameters on cifar10.
However, the total parameter count computed with the torchsummary package differs.
The parameters are calculated as:

    from torchsummary import summary
    summary(model, (3, 32, 32), device="cpu")
    exit(0)

Do you know why there is a difference?
Thanks in advance
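
(One possible explanation, offered only as a guess and not verified against the repo: the 0.52M figure refers to the converted, pruned model, while summarizing the training-time model counts the full LearnedGroupConv weights, which are masked rather than removed. A direct count for comparison:)

    n_params = sum(p.numel() for p in model.parameters())
    print('{:.2f}M parameters'.format(n_params / 1e6))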

Lambda value for group lasso

Hi, this question might be trivial, but what is the exact value of group_lasso_lambda that you are using? Is it 1e-5, as in the paper? Thanks!

Question on CondensingLinear

Hi, I notice the aim of the CondensingLinear function is mentioned in this answer: #6 (comment).
Is CondensingLinear in layers.py only used in the convert_model phase, not in the training phase?
If so, the evaluation error of the converted model will be higher than that of the model in the
training phase, right? It is like directly pruning 50% of the last fully connected layer
without finetuning.

Group lasso regularization effect for ImageNet

Hi, the paper states that the group lasso term is added to the total cost function with a coefficient of 1e-5 on the ImageNet dataset. Have you compared against a model without the group lasso term on the same dataset? In other words, by how many percentage points does the group lasso term improve the final validation accuracy?

Thanks in advance

Error message: variables needed for gradient computation has been modified by an inplace operation

Hello,
I am running condensenet and densenet_LGC in the PyCharm IDE with Python 3.6.5 :: Anaconda, Inc., with the following configuration:
dataset - cifar10
epochs - 200
bottleneck - 3
growth - 12-12-12
C=G=4
batch size = 64

I get this error at exactly the 34th epoch, every time. I have tried removing the in-place addition operations and inplace=True from ReLU, but nothing has worked. Can you please help with how it can be fixed? Regards

Epoch - 31
* Accuracy@1 89.150 Accuracy@5 99.520
Epoch - 32
* Accuracy@1 88.120 Accuracy@5 99.610
Epoch - 33
* Accuracy@1 88.640 Accuracy@5 99.630
Epoch - 34
Traceback (most recent call last):
  File "/home/supernet/PycharmProjects/untitled/nets/CondenseNet-Nauman/main.py", line 497, in <module>
    main()
  File "/home/supernet/PycharmProjects/untitled/nets/CondenseNet-Nauman/main.py", line 257, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "/home/supernet/PycharmProjects/untitled/nets/CondenseNet-Nauman/main.py", line 340, in train
    loss.backward()
  File "/home/supernet/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/supernet/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

DenseNet-121 is faster than CondenseNet-74 (C=G=4) on GTX 1080 Ti

I compared the forward-pass speed of the larger ImageNet model against DenseNet-121, and the latter is actually faster. After benchmarking, my guess is that the CondenseConv layer causes the slowdown, due to the memory transfers in ShuffleLayer and torch.index_select.
@ShichenLiu, can you comment on this? Did you get better performance than DenseNet-121 in your experiments?
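
(For reference, a minimal GPU timing sketch; explicit synchronization is required for meaningful numbers, since CUDA calls are asynchronous. Batch size and input shape here are arbitrary.)

    import time
    import torch

    model.eval().cuda()
    x = torch.randn(32, 3, 224, 224, device='cuda')
    with torch.no_grad():
        for _ in range(10):                # warm-up iterations
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(50):
            model(x)
        torch.cuda.synchronize()
        print((time.time() - start) / 50)  # seconds per forward pass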

Questions on implementation of dropping

Hi,

I have some questions about the consistency between the implementation of dropping and the paper.

  1. When you take the sum, you do not use absolute values, as written in the paper:

    di = wi.sum(0).sort()[1][self.count:self.count + delta]

  2. You drop during the stage, not when the stage finishes, as written in the paper:

    if not self._reach_stage(stage):

Am I wrong, or could you explain this? Thank you.
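
(For comparison, the paper's L1-norm criterion would presumably read as the following one-line sketch, assuming wi holds the group's weight columns:)

    di = wi.abs().sum(0).sort()[1][self.count:self.count + delta]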

CondenseNet-182* on Cifar100: validation top-1 error rate is 19.73%, whereas the paper reports 18.47%

Hi, I ran CondenseNet-182* on Cifar100 using the command provided in issue 11:

python main.py --model condensenet -b 64 -j 2 cifar100 --epochs 600 --stages 30-30-30 --growth 12-24-48

The result of the first run is 19.73%; the second run gives 19.86%.
The result in the paper is 18.47% (Table).
I just used all the default arguments in the provided code; do we need to make other changes?

Thanks in advance

Cuda runtime error ClassNLLCriterion assertion

Hi,

I am facing the following problem when I attempt to train the network with:

python main.py --model condensenet -b 256 -j 26 person-reid/market1501 --stages 4-6-8-10-8 --growth 8-16-32-64-128 --gpu 0,1,2,3 --savedir person-reid/results_market1501-24kgen --resume
/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THCUNN/ClassNLLCriterion.cu:57: void cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
[the same assertion failure is repeated for many other threads]
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.c line=32 error=59 : device-side assert triggered
Traceback (most recent call last):
  File "main.py", line 480, in <module>
    main()
  File "main.py", line 239, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 314, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
  File "main.py", line 474, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/tensor.py", line 43, in float
    return self.type(type(self).__module__ + '.FloatTensor')
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/cuda/__init__.py", line 278, in type
    return super(_CudaBase, self).type(*args, **kwargs)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/_utils.py", line 35, in _type
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/THCTensorCopy.cu:204
terminate called without an active exception
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/THCTensorCopy.cu line=204 error=59 : device-side assert triggered

[...]

The assertion error also occurs at line 362:

Traceback (most recent call last):
  File "main.py", line 480, in <module>
    main()
  File "main.py", line 242, in main
    val_prec1, val_prec5 = validate(val_loader, model, criterion)
  File "main.py", line 362, in validate
    losses.update(loss.data[0], input.size(0))
RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.c:32
THCudaCheckWarn FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/THCStream.cpp line=50 error=29 : driver shutting down
THCudaCheckWarn FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/THCStream.cpp line=50 error=29 : driver shutting down

Training and testing images are 64x128, and I also tried resizing only the training images to 256x256.

It seems to be caused by an inconsistency in the number of classes, which I am trying to figure out. In evaluation it may happen that there are no samples for some of the test classes, which I noticed can be problematic for your network if the class directories do not match properly. A solution that worked for me on another dataset I am working on is to create the same set of class directories, with exactly the same names, in both the training (train) and testing (val) partitions, leaving empty those class directories with no test samples (see the sketch below). On this dataset, however, I get the error above, which is a bit confusing to me.
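
(A minimal sketch of that workaround; the dataset paths are hypothetical:)

    import os

    train_dir, val_dir = 'dataset/train', 'dataset/val'
    for c in sorted(os.listdir(train_dir)):
        # mirror every training class in val/, empty if it has no test samples
        os.makedirs(os.path.join(val_dir, c), exist_ok=True)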

Another question I have is whether your network can be used to classify images of classes that exist in the test partition but not in the train partition; for instance, a dataset where half of the classes are used for training and the other half for testing.

I would appreciate any comment if you have a clue about what might be causing that error, and about the question on classes.

Thanks a lot

Question on dropping function

Hi, I have a question about the dropping function in layers.py.
I don't understand why learned group convolution still needs the shuffling operation:

        weight = weight.view(d_out, self.groups, self.in_channels)
        weight = weight.transpose(0, 1).contiguous()
        weight = weight.view(self.out_channels, self.in_channels)

https://github.com/ShichenLiu/CondenseNet/blob/master/layers.py#L78

I notice there is a shuffle operation mentioned in the first paragraph of Section 4.1:
"we permute the output channels of the first 1x1_conv learned group convolution layer,
such that the features generated by each of its groups are evenly used by all the groups of
the subsequent 3x3 group convolutional layer"
However, that operation shuffles feature maps, not convolutional kernels.

Can you explain a little bit?
Thanks in advance
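
(A small demonstration of why the two views coincide, using toy shapes rather than the repo's code: permuting the rows, i.e., output channels, of a 1x1 kernel permutes the output feature maps in exactly the same way, so shuffling kernels is equivalent to shuffling features.)

    import torch
    import torch.nn.functional as F

    x = torch.randn(1, 6, 8, 8)
    w = torch.randn(4, 6, 1, 1)         # 4 output channels
    perm = torch.tensor([0, 2, 1, 3])   # any permutation of output channels

    y = F.conv2d(x, w)
    y_perm = F.conv2d(x, w[perm])
    # permuting kernel rows permutes output feature maps identically:
    assert torch.allclose(y[:, perm], y_perm)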

Out of memory issue when training a new dataset

Hi,

I am attempting to use your CondenseNet code to train a dataset of 7 classes, with approximately 100K-150K training images split (unevenly) across those classes. My images consist of bounding boxes of different sizes. First, I am using a setting similar to the one you use to train on ImageNet, pointing to my dataset and preparing the class folders so the paths are found properly. I resized all images to 256x256, as you did in your paper. This is the command line I use to train on the new dataset:

python main.py --model condensenet -b 256 -j 28 lima_train --stages 4-6-8-10-8 --growth 8-16-32-64-128 --gpu 0 --resume

where lima_train is a link file pointing to the folder containing all training data, split into class subfolders as required.

I am using a datacenter whose GPU nodes have NVIDIA Tesla P100s with 16 GB each, and CUDA 8 with cuDNN. I presume training should therefore not be a problem; I understand that a 16 GB, or even 8 GB, GPU should be enough to train this network, shouldn't it? However, I am getting the out-of-memory error shown below. I reduced the batch size to 64 and adjusted the number of workers to the machine. Probably I am missing some step, or I should modify the command line according to the settings of my data.

I would appreciate any feedback.

Thanks in advance and congratulations for this work.

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "main.py", line 479, in <module>
    main()
  File "main.py", line 239, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 303, in train
    output = model(input_var, progress)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 58, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/storage/home/vp17941/CondenseNet/models/condensenet.py", line 127, in forward
    features = self.features(x)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/container.py", line 67, in forward
    input = module(input)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/storage/home/vp17941/CondenseNet/models/condensenet.py", line 33, in forward
    x = self.conv_1(x)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/storage/home/vp17941/CondenseNet/layers.py", line 42, in forward
    x = self.norm(x)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/modules/batchnorm.py", line 37, in forward
    self.training, self.momentum, self.eps)
  File "/mnt/storage/home/vp17941/.conda/envs/condensenet/lib/python3.6/site-packages/torch/nn/functional.py", line 639, in batch_norm
    return f(input, weight, bias)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503970438496/work/torch/lib/THC/generic/THCStorage.cu:66
srun: error: gpu09: task 0: Exited with exit code 1

Full Dense Connectivity

Hi, this is not an issue. I am trying to reproduce CondenseNet in TensorFlow, but where in the code is the full dense connectivity implemented, i.e., "we connect input layers to all subsequent layers in the network, even if these layers are located in different dense blocks", as mentioned in the paper? I can see only one place inside the dense layers where the torch.cat function is used, and it connects the inputs within one dense block. Thanks in advance.
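
(As a hedged sketch of the idea, not the repo's actual code: connecting layers across blocks requires matching spatial resolutions, which the paper does by average-pooling earlier feature maps down before concatenation. In PyTorch-style pseudocode:)

    import torch
    import torch.nn.functional as F

    def gather_features(feature_list, target_hw):
        # Concatenate features from all earlier layers; average-pool those
        # from earlier, higher-resolution blocks down to target_hw first.
        pooled = [f if tuple(f.shape[-2:]) == tuple(target_hw)
                  else F.adaptive_avg_pool2d(f, target_hw)
                  for f in feature_list]
        return torch.cat(pooled, dim=1)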

What are the Training Arguments for ImageNet Pre-Trained Model?

Getting the following error message when running the trained ImageNet model for image classification on my machine; I downloaded the model from the authors' Dropbox link posted in this repo's README:

model.load_state_dict(torch.load(PATH, map_location=torch.device("cpu"))['state_dict'])
  File "C:\Program Files\Python36\lib\site-packages\torch\nn\modules\module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for DataParallel:
    Missing key(s) in state_dict: "module.features.denseblock_1.denselayer_1.conv_1._count", "module.features.denseblock_1.denselayer_1.conv_1._stage", "module.features.denseblock_1.denselayer_1.conv_1._mask", [... the same _count/_stage/_mask triple for every denselayer in denseblocks 1-5 ...], "module.classifier.weight", "module.classifier.bias".
    Unexpected key(s) in state_dict: "module.features.denseblock_1.denselayer_1.conv_1.index", [... one .index buffer per denselayer ...], "module.classifier.index", "module.classifier.linear.weight", "module.classifier.linear.bias".
    size mismatch for module.features.denseblock_1.denselayer_1.conv_1.conv.weight: copying a param with shape torch.Size([32, 2, 1, 1]) from checkpoint, the shape in current model is torch.Size([32, 16, 1, 1]).
    size mismatch for module.features.denseblock_1.denselayer_1.conv_2.conv.weight: copying a param with shape torch.Size([8, 4, 3, 3]) from checkpoint, the shape in current model is torch.Size([8, 8, 3, 3]).
    [... analogous size mismatches for the conv_1 and conv_2 weights of every denselayer, through denseblock_5.denselayer_8 ...]

These are the training arguments I used in my image-classification prediction script:
args = parser.parse_args(["--model", "condensenet_converted", "-b", "64", "-j", "20", "imagenet", "--stages", "4-6-8-10-8", "--growth", "8-16-32-64-128", "--gpu", "0"]). I have tried both the (C=G=4) and (C=G=8) pre-trained models from this repo. Thank you.

Issues with PyTorch 1.9.0

Getting the following error message:

Traceback (most recent call last):
  File "main.py", line 479, in <module>
    main()
  File "main.py", line 239, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 314, in train
    prec1, prec5 = accuracy(output.data, target, topk=(1, 5))
  File "main.py", line 473, in accuracy
    correct_k = correct[:k].view(-1).float().sum(0)
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

@lvdmaaten @ShichenLiu @gaohuang @ironfist2 Can CondenseNet be updated to be compatible with the latest PyTorch version, 1.9.0? Or could you tell us what changes need to be made?

Thank you!

EDIT: I just replaced the view call with reshape, as suggested by the error message, and it works, though I am still not sure of the difference between the two functions in this context.
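
(Regarding the EDIT, a short illustration of the difference, for what it's worth: view never copies, so it requires a contiguous memory layout, while reshape falls back to a copy when needed.)

    import torch

    t = torch.arange(6).view(2, 3).t()   # transpose -> non-contiguous
    # t.view(-1) raises a RuntimeError here: view cannot flatten a
    # non-contiguous tensor without copying.
    flat = t.reshape(-1)                  # copies when necessary
    same = t.contiguous().view(-1)        # equivalent explicit form
    assert torch.equal(flat, same)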

Some concerns about the approach to pruning the network

Hi @gaohuang and @ShichenLiu ,

Thank you for the great work. I have the following concerns from running your code and reading your paper:

  • Condensation criterion: in the paper, you use the L1-norm of the weights within the same group to find the column indices of small-magnitude weights to prune. I saw that these indices are also applied to the other groups in the code: self.mask[i::self.groups, d, :, :].fill_(0).
  • Have you tested learned group convolution with larger kernels (3x3)? If yes, how is the efficiency?
  • In the code, why do you shuffle the weights for the group lasso loss?
  • Why do you drop 50% of the input channels via CondensingLinear(child, 0.5) when converting models?

Thanks,
Hai

the version of pytorch

There are several problems with the train loader from torchvision under my torch version (1.0.1). Could you provide a requirements.txt, or an updated version of the code that works with torch 1.0.1?
