
densenet's Introduction

Densely Connected Convolutional Networks (DenseNets)

This repository contains the code for DenseNet introduced in the following paper

Densely Connected Convolutional Networks (CVPR 2017, Best Paper Award)

Gao Huang*, Zhuang Liu*, Laurens van der Maaten and Kilian Weinberger (* Authors contributed equally).

Now with a much more memory-efficient implementation! Please check the technical report and code for more information.

The code is built on fb.resnet.torch.

Citation

If you find DenseNet useful in your research, please consider citing:

@inproceedings{DenseNet2017,
  title={Densely connected convolutional networks},
  author={Huang, Gao and Liu, Zhuang and van der Maaten, Laurens and Weinberger, Kilian Q.},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year={2017}
}

Other Implementations

  • Our [Caffe]
  • Our memory-efficient [Caffe]
  • Our memory-efficient [PyTorch]
  • [PyTorch] by Andreas Veit
  • [PyTorch] by Brandon Amos
  • [PyTorch] by Federico Baldassarre
  • [MXNet] by Nicatio
  • [MXNet] by Xiong Lin
  • [MXNet] by miraclewkf
  • [Tensorflow] by Yixuan Li
  • [Tensorflow] by Laurent Mazare
  • [Tensorflow] by Illarion Khlestov
  • [Lasagne] by Jan Schlüter
  • [Keras] by tdeboissiere
  • [Keras] by Roberto de Moura Estevão Filho
  • [Keras] by Somshubra Majumdar
  • [Chainer] by Toshinori Hanya
  • [Chainer] by Yasunori Kudo
  • [Torch 3D-DenseNet] by Barry Kui
  • [Keras] by Christopher Masch
  • [Tensorflow2] by Gaston Rios and Ulises Jeremias Cornejo Fandos

Note that we only listed some early implementations here. If you would like to add yours, please submit a pull request.

Some Follow-up Projects

  1. Multi-Scale Dense Convolutional Networks for Efficient Prediction
  2. DSOD: Learning Deeply Supervised Object Detectors from Scratch
  3. CondenseNet: An Efficient DenseNet using Learned Group Convolutions
  4. Fully Convolutional DenseNets for Semantic Segmentation
  5. Pelee: A Real-Time Object Detection System on Mobile Devices

Contents

  1. Introduction
  2. Usage
  3. Results on CIFAR
  4. Results on ImageNet and Pretrained Models
  5. Updates

Introduction

DenseNet is a network architecture in which each layer is directly connected to every other layer within the same dense block, in a feed-forward fashion. For each layer, the feature maps of all preceding layers are treated as separate inputs, while its own feature maps are passed on as inputs to all subsequent layers. This connectivity pattern yields state-of-the-art accuracies on CIFAR-10/100 (with or without data augmentation) and SVHN. On the large-scale ILSVRC 2012 (ImageNet) dataset, DenseNet achieves accuracy similar to ResNet's while using fewer than half the parameters and roughly half the FLOPs.

Figure 1: A dense block with 5 layers and growth rate 4.

Figure 2: A deep DenseNet with three dense blocks.
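
As a minimal sketch of this connectivity (in Torch's nn; this is an illustration we provide rather than the repository's densenet.lua, and the names denseLayer, block and k are ours), each layer produces k new feature maps and concatenates them with its input along the channel dimension, so every subsequent layer receives the feature maps of all preceding layers:

require 'nn'

-- One dense layer: BN-ReLU-Conv(3x3) producing `growthRate` new feature maps,
-- concatenated with the layer's input along the channel dimension
-- (dimension 2 for batched NxCxHxW input).
local function denseLayer(nChannels, growthRate)
   local branch = nn.Sequential()
      :add(nn.SpatialBatchNormalization(nChannels))
      :add(nn.ReLU(true))
      :add(nn.SpatialConvolution(nChannels, growthRate, 3, 3, 1, 1, 1, 1))
   return nn.Concat(2):add(nn.Identity()):add(branch)
end

-- Inside a dense block, the channel count grows by k per layer.
local block, nChannels, k = nn.Sequential(), 24, 12
for i = 1, 12 do
   block:add(denseLayer(nChannels, k))
   nChannels = nChannels + k
end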

Usage

  1. Install Torch and required dependencies like cuDNN. See the instructions here for a step-by-step guide.
  2. Clone this repo: git clone https://github.com/liuzhuang13/DenseNet.git

As an example, the following command trains a DenseNet-BC with depth L=100 and growth rate k=12 on CIFAR-10:

th main.lua -netType densenet -dataset cifar10 -batchSize 64 -nEpochs 300 -depth 100 -growthRate 12

As another example, the following command trains a DenseNet-BC with depth L=121 and growth rate k=32 on ImageNet:

th main.lua -netType densenet -dataset imagenet -data [dataFolder] -batchSize 256 -nEpochs 90 -depth 121 -growthRate 32 -nGPU 4 -nThreads 16 -optMemory 3

Please refer to fb.resnet.torch for data preparation.

DenseNet and DenseNet-BC

By default, the code runs with the DenseNet-BC architecture, which has 1x1 convolutional bottleneck layers and compresses the number of channels at each transition layer by 0.5. To run the original DenseNet instead, simply use the options -bottleneck false and -reduction 1.
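
As a rough illustration (again a sketch in Torch's nn rather than the repository's densenet.lua; the function names are ours), the bottleneck layer first reduces its input to 4*k channels with a 1x1 convolution, and the transition layer compresses the channel count by the reduction factor and halves the spatial resolution:

require 'nn'

-- Bottleneck (the "B" in DenseNet-BC): BN-ReLU-Conv(1x1) down to 4*k channels,
-- then BN-ReLU-Conv(3x3) producing the k new feature maps.
local function bottleneckLayer(nChannels, growthRate)
   local branch = nn.Sequential()
      :add(nn.SpatialBatchNormalization(nChannels))
      :add(nn.ReLU(true))
      :add(nn.SpatialConvolution(nChannels, 4 * growthRate, 1, 1, 1, 1, 0, 0))
      :add(nn.SpatialBatchNormalization(4 * growthRate))
      :add(nn.ReLU(true))
      :add(nn.SpatialConvolution(4 * growthRate, growthRate, 3, 3, 1, 1, 1, 1))
   return nn.Concat(2):add(nn.Identity()):add(branch)
end

-- Transition (the "C" in DenseNet-BC): compress channels by `reduction`
-- (0.5 by default, 1 for no compression) and halve the spatial resolution.
local function transition(nChannels, reduction)
   local nOut = math.floor(nChannels * reduction)
   return nn.Sequential()
      :add(nn.SpatialBatchNormalization(nChannels))
      :add(nn.ReLU(true))
      :add(nn.SpatialConvolution(nChannels, nOut, 1, 1, 1, 1, 0, 0))
      :add(nn.SpatialAveragePooling(2, 2, 2, 2))
end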

Memory efficient implementation (newly added feature on June 6, 2017)

There is an option -optMemory which is very useful for reducing the GPU memory footprint when training a DenseNet. By default, the value is set to 2, which activates the shareGradInput function (with small modifications from here). There are two extremely memory-efficient modes (-optMemory 3 or -optMemory 4) which use a customized densely connected layer. With -optMemory 4, the largest 190-layer DenseNet-BC on CIFAR can be trained on a single NVIDIA TITAN X GPU (using 8.3G of 12G) instead of fully occupying four GPUs with the standard (recursive concatenation) implementation.

More details about the memory efficient implementation are discussed here.
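
For example, the following command (our example, reusing the flags documented above; adjust the batch size to your GPU) trains the largest DenseNet-BC (L=190, k=40) on CIFAR-10 in the most memory-efficient mode:

th main.lua -netType densenet -dataset cifar10 -batchSize 64 -nEpochs 300 -depth 190 -growthRate 40 -optMemory 4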

Results on CIFAR

The table below shows the results of DenseNets on the CIFAR datasets. The "+" mark denotes standard data augmentation (random crop after zero-padding, and horizontal flip). For a DenseNet model, L denotes its depth and k its growth rate. On CIFAR-10 and CIFAR-100 without data augmentation, a Dropout layer with drop rate 0.2 is introduced after each convolutional layer except the very first one.

Model                       Parameters  CIFAR-10  CIFAR-10+  CIFAR-100  CIFAR-100+
DenseNet (L=40, k=12)       1.0M        7.00      5.24       27.55      24.42
DenseNet (L=100, k=12)      7.0M        5.77      4.10       23.79      20.20
DenseNet (L=100, k=24)      27.2M       5.83      3.74       23.42      19.25
DenseNet-BC (L=100, k=12)   0.8M        5.92      4.51       24.15      22.27
DenseNet-BC (L=250, k=24)   15.3M       5.19      3.62       19.64      17.60
DenseNet-BC (L=190, k=40)   25.6M       -         3.46       -          17.18
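
For reference, the "+" augmentation described above can be sketched as follows (a minimal illustration using the Torch image package; the 4-pixel zero padding is the usual CIFAR recipe and an assumption on our part, not code taken from this repository):

require 'image'

-- Standard "+" augmentation for a 3x32x32 CIFAR image:
-- random horizontal flip, zero-pad by 4 pixels per side, random 32x32 crop.
local function augment(img)
   if torch.uniform() < 0.5 then
      img = image.hflip(img)
   end
   local padded = torch.zeros(3, 40, 40):typeAs(img)
   padded:narrow(2, 5, 32):narrow(3, 5, 32):copy(img)
   local y, x = torch.random(1, 9), torch.random(1, 9)
   return padded:narrow(2, y, 32):narrow(3, x, 32):clone()
end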

Results on ImageNet and Pretrained Models

Torch

Note: the pre-trained models in Torch are deprecated and no longer maintained. Please use PyTorch's pre-trained DenseNet models instead.

Models in the original paper

The Torch models are trained under the same setting as in fb.resnet.torch. The error rates shown are 224x224 1-crop test errors.

Network                      Top-1 error  Torch Model
DenseNet-121 (k=32)          25.0         [Download (64.5MB)]
DenseNet-169 (k=32)          23.6         [Download (114.4MB)]
DenseNet-201 (k=32)          22.5         [Download (161.8MB)]
DenseNet-161 (k=48)          22.2         [Download (230.8MB)]
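
As a hedged sketch of single-crop evaluation with one of the models above (not code from this repository; the file name is illustrative, and the input must first be preprocessed exactly as in fb.resnet.torch, i.e. scaling, a 224x224 center crop, and per-channel mean/std normalization):

require 'cunn'
require 'cudnn'

local model = torch.load('densenet-121.t7'):cuda()  -- illustrative file name
model:evaluate()

local input = torch.CudaTensor(1, 3, 224, 224)      -- placeholder for one preprocessed crop
local output = model:forward(input)                 -- 1x1000 class scores
local scores, indices = output:float():sort(2, true)
print(indices:narrow(2, 1, 5))                      -- top-5 class indices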

Models in the tech report

These are more accurate models trained with the memory-efficient implementation described in the technical report.

Network                      Top-1 error  Torch Model
DenseNet-264 (k=32)          22.1         [Download (256MB)]
DenseNet-232 (k=48)          21.2         [Download (426MB)]
DenseNet-cosine-264 (k=32)   21.6         [Download (256MB)]
DenseNet-cosine-264 (k=48)   20.4         [Download (557MB)]

Caffe

https://github.com/shicai/DenseNet-Caffe.

PyTorch

PyTorch documentation on models. We would like to thank @gpleiss for this nice work in PyTorch.

Keras, Tensorflow and Theano

https://github.com/flyyufelix/DenseNet-Keras.

MXNet

https://github.com/miraclewkf/DenseNet.

Wide-DenseNet for better Time/Accuracy and Memory/Accuracy Tradeoff

If you use DenseNet as a model in your learning task, we recommend using a wide and shallow DenseNet to reduce memory and time consumption, following the strategy of wide residual networks. To obtain a wide DenseNet, we set the depth to be smaller (e.g., L=40) and the growth rate to be larger (e.g., k=48).
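
For example, a Wide-DenseNet-BC (L=40, k=48) can be trained on CIFAR-10 with a command analogous to the ones in Usage (our example, using only flags documented above):

th main.lua -netType densenet -dataset cifar10 -batchSize 64 -nEpochs 300 -depth 40 -growthRate 48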

We tested a set of Wide-DenseNet-BCs and compared their memory and time consumption with the DenseNet-BC (L=100, k=12) shown above. The statistics were obtained using a single TITAN X card, with batch size 64, and without any memory optimization.

Model                          Parameters  CIFAR-10+  CIFAR-100+  Time per Iteration  Memory
DenseNet-BC (L=100, k=12)      0.8M        4.51       22.27       0.156s              5452MB
Wide-DenseNet-BC (L=40, k=36)  1.5M        4.58       22.30       0.130s              4008MB
Wide-DenseNet-BC (L=40, k=48)  2.7M        3.99       20.29       0.165s              5245MB
Wide-DenseNet-BC (L=40, k=60)  4.3M        4.01       19.99       0.223s              6508MB

Observations:

  1. Wide-DenseNet-BC (L=40, k=36) uses less memory/time while achieving about the same accuracy as DenseNet-BC (L=100, k=12).
  2. Wide-DenseNet-BC (L=40, k=48) uses about the same memory/time as DenseNet-BC (L=100, k=12), while being much more accurate.

Thus, for practical use, we suggest picking one model from those Wide-DenseNet-BCs.

Updates

08/23/2017:

  1. Add supporting code, so one can simply git clone and run.

06/06/2017:

  1. Support ultra memory efficient training of DenseNet with customized densely connected layer.

  2. Support memory efficient training of DenseNet with standard densely connected layer (recursive concatenation) by fixing the shareGradInput function.

05/17/2017:

  1. Add Wide-DenseNet.
  2. Add Keras, TensorFlow and Theano links for pretrained models.

04/20/2017:

  1. Add usage of models in PyTorch.

03/29/2017:

  1. Add the code for ImageNet training.

12/03/2016:

  1. Add ImageNet results and pretrained models.
  2. Add DenseNet-BC structures.

Contact

liuzhuangthu at gmail.com
Any discussions, suggestions and questions are welcome!

densenet's People

Contributors

ajschumacher, cmasch, gaohuang, liuzhuang13, nikhil-kasukurthi, okason97, taineleau


densenet's Issues

Why did you use MomentumOptimizer? and dropout...

Hello
When I saw DenseNet, I implemented it with Tensorflow. (Using MNIST data)

The Questions are :

  1. When I experimented, AdamOptimizer performed better than MomentumOptimizer.
    Is this specific to MNIST? I have not yet run the experiment on CIFAR.

  2. In the case of dropout, I apply it only to the bottleneck layer, not to the transition layer. Is this right?

  3. Does Batch Normalization apply only during training? Or does it apply to both testing and training?

  4. I wonder what global average pooling is,
    and how to do it in TensorFlow.

Please advise if there is any special reason for these choices.
Also, if you can look at the TensorFlow code, I'd like you to check whether I implemented it correctly:
https://github.com/taki0112/Densenet-Tensorflow

Thank you

DenseNet on ImageNet

I've just read your paper, which is really interesting.
I was wondering whether you have tried training a DenseNet version on ImageNet?
Thank you

Convolution before entering the first dense block for the ImageNet dataset

Hi, there

For the ImageNet dataset, DenseNet uses a 7x7 conv before entering the first dense block.
I also read the following paper, CondenseNet, which uses a 3x3 conv before entering the first block.
I wonder if I can change the 7x7 conv to 3x3 and keep the pooling unchanged (since this makes DenseNet more parameter-efficient). Does it hurt DenseNet's performance on ImageNet?

error using CAddTable and ConcatTable

There are a lot of residual blocks in your DenseNet. I tried to build a few in my own network, but Torch gives me errors when using CAddTable and ConcatTable. Could you please give some advice? The code I use is here:

a = image.load('test.png')
input = torch.Tensor(1, 3, 60, 60)
input[1] = a
input = input:cuda()

local conv_block_1 = nn.Sequential()
conv_block_1:add(cudnn.SpatialConvolution(3, 16, 5, 5, 1, 1, 2, 2))   -- (60+2*2-5)/1+1 = 60
conv_block_1:add(cudnn.SpatialBatchNormalization(16))
conv_block_1:add(cudnn.ReLU(true))

local conv_block_2 = nn.Sequential()
conv_block_2:add(cudnn.SpatialConvolution(16, 32, 5, 5, 1, 1, 2, 2))  -- (60+2*2-5)/1+1 = 60
conv_block_2:add(cudnn.SpatialBatchNormalization(32))
conv_block_2:add(cudnn.ReLU(true))

local conv_block_3 = nn.Sequential()
conv_block_3:add(cudnn.SpatialConvolution(32, 16, 5, 5, 1, 1, 2, 2))  -- (60+2*2-5)/1+1 = 60
conv_block_3:add(cudnn.SpatialBatchNormalization(16))
conv_block_3:add(cudnn.ReLU(true))

local concat_block_1 = nn.ConcatTable()
concat_block_1:add(conv_block_1)
concat_block_1:add(conv_block_3)

local add_block_1 = nn.Sequential()
add_block_1:add(concat_block_1)
add_block_1:add(nn.CAddTable(true))
add_block_1:add(cudnn.ReLU(true))

local model = nn.Sequential()
model:add(conv_block_1)
model:add(conv_block_2)
model:add(conv_block_3)
model:add(add_block_1)
model:cuda()
model:forward(input)

and the error reads like this:
In 4 module of nn.Sequential:
In 1 module of nn.Sequential:
In 1 module of nn.ConcatTable:
In 1 module of nn.Sequential:
...torch/install/share/lua/5.1/cudnn/SpatialConvolution.lua:102: input has to contain: 3 feature maps, but received input of size: 1 x 16 x 60 x 60
stack traceback:

validation top1 error is odd

Hi
I trained a DenseNet on CIFAR-10 according to the paper. The top-1 error on the test set looks odd. The result is saved in this file:
net-train.pdf
How can I make the top-1 error curve smooth?
Thanks

about nninit package

I ran into a bug saying 'could not find nninit package', and I am not sure whether this package is what I need.
It seems the package is not used at all; should we just remove it?
Also, since the nninit package is not installed by default, I think the README file should mention it.

Memory efficient implementation of Caffe

Hi,
I saw this Caffe implementation, which is memory efficient:
https://github.com/Tongcheng/DN_CaffeScript

And I also noticed this in the wiki:

Memory efficient implementation (newly added feature on June 6, 2017)

There is an option -optMemory which is very useful for reducing GPU memory footprint when training a DenseNet. By default, the value is set to 2, which activates the shareGradInput function 
....

Does that Caffe implementation use the memory-efficient approach described above?

Thanks.

Pretrained weights for the 0.8M parameters config

Hi,
could you please upload ImageNet weights for DenseNet-BC (L=100, k=12), which has only 0.8M parameters? It is compact, and when we expand the network for the task of semantic segmentation, this really helps to control the number of parameters.

If you know any other resources for finding weights for this config, I would be grateful if you let me know.

DenseNet on Pascal VOC

Hi, I think DenseNet is a promising model, and I tried to use it as the backbone of Faster R-CNN for object detection. I chose DenseNet-169 pretrained on ImageNet to replace the ResNet-50 backbone of Faster R-CNN and used the same hyperparameter configuration as the ResNet-50 version. However, the training result of the DenseNet version is worse than the ResNet-50 version (roughly 3% lower on the VOC2012 test set). Can you help me analyse the reason or give me any advice? @liuzhuang13 Many thanks!

Why 3 dense blocks, instead of downsampling

Hey there,

First of all, let me congratulate the authors. This is a very solid architecture that resembles cortical computation.

I have a question regarding the choice of dense blocks.
Due to the spatial size of the feature maps, dense connections are partitioned into blocks, creating iso-resolution maps in each block and transition layers that downsample between blocks.

Another option would be getting rid of blocks, connecting every layer with every other layer regardless of spatial size by using downsampling when there is a resolution mismatch.

Is there an experimental (i.e. worse performance, overfitting) or computational (i.e. more parameters) reason for not reporting this?

Thanks,
Ozgur

./checkpoints.lua:52: attempt to call method 'clearState' (a nil value)

  1. Great work, a milestone! Never seen top5 go down so seamlessly.

  2. A little bug:

th main.lua -netType densenet -depth 40 -dataset cifar10 -batchSize 64 -nEpochs 300 -optnet true
...
 | Test: [1][157/157]    Time 0.021  Data 0.000  top1  87.500 ( 83.420)  top5  50.000 ( 55.870) 
 * Finished epoch # 1     top1:  83.420  top5:  55.870
 * Best model   83.42   55.87   

./checkpoints.lua:52: attempt to call method 'clearState' (a nil value)

The amount of parameters

I used the following settings, as suggested on GitHub:
L=40, k=12, no bottleneck
However, the parameter count is not 1.0M; it is 0.6M.
The problem also happens when I turn bottleneck on: I get a different parameter count than the reported one.
Please tell me what I am missing. Thank you.

Calling the model:

dn_opt = {}
dn_opt.depth = 40
dn_opt.dataset = 'cifar10'
model = paths.dofile('densenet.lua')(dn_opt)
model:cuda()
print(model:getParameters():size())

In densenet.lua

local growthRate = 12

-- dropout rate: set it to 0 to disable dropout, or a non-zero number to enable dropout with that drop rate
local dropRate = 0

-- #channels before entering the first dense block
local nChannels = 2 * growthRate

-- compression rate at transition layers
local reduction = 0.5

-- whether to use bottleneck structures
local bottleneck = false

Output of the parameter size

599050
[torch.LongStorage of size 1]

TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Traceback (most recent call last):
File "densenet.py", line 162, in
run()
File "densenet.py", line 160, in run
run_model(data, image_dim, label_count, 40)
File "densenet.py", line 94, in run_model
current, features = block(current, layers, 16, 12, is_training, keep_prob)
File "densenet.py", line 72, in block
current = tf.concat(3, (current, tmp))
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/ops/array_ops.py", line 1061, in concat
dtype=dtypes.int32).get_shape(
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 611, in convert_to_tensor
as_ref=False)
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 676, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 121, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/constant_op.py", line 102, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 376, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/fp/anaconda2/lib/python2.7/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).name))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.

Can you tell me what the problem is?

Great results! CIFAR-100 top1 accuracy ~ 100%

Not a bug, just a praise: Within a few hours the CIFAR-100 accuracy goes up to 100% !

 | Epoch: [190][423/782]    Time 0.166  Data 0.000  Err 0.0518  top1   0.000  top5   0.000
 | Epoch: [190][424/782]    Time 0.166  Data 0.000  Err 0.1035  top1   3.125  top5   0.000
 | Epoch: [190][425/782]    Time 0.166  Data 0.000  Err 0.0389  top1   1.562  top5   0.000

Legendary! Time to increase the test set.

Why not share the first BN and ReLU?

Hi,

The features go through BN-ReLU-Conv-BN-ReLU-Conv, and then the features from different layers are concatenated. Since BN is applied per channel and ReLU is applied element-wise, why not share the first BN-ReLU? That is, the features would go through Conv-BN-ReLU-Conv-BN-ReLU, and then the ReLU outputs would be concatenated. Is there any difference?

Thanks.

What is the proper way of counting parameters?

Hi, as you claimed in both the repo and the paper, the number of parameters of densenet-100-12 is 7.0M and of densenet-100-24 is 27.72M. However, when I examine the parameters in the following way:

-- main.lua, line 32
-- Create model
local model, criterion = models.setup(opt, checkpoint)

params = model:getParameters()
print(#params)

I got 4.06M for densenet-100-12 and 16.11M for densenet-100-24. Did I count them in the wrong way?

error when loading pretrained model??

When I load the 201-layer model pretrained on ImageNet, it outputs the error message below:

torch/install/share/lua/5.1/nn/SpatialMaxPooling.lua:47: attempt to index field 'THNN' (a nil value)

It seems that the version of the pretrained model is not compatible with the latest Torch packages (I am using the latest versions). Can you provide a pretrained model saved with the latest version of Torch?

Convolution after ReLU in Dense Layer Question

I've seen that you use:

BN -> ReLU -> Conv3x3 -> Dropout

in the normal case, or

BN -> ReLU -> Conv1x1 -> Dropout -> BN -> ReLU -> Conv3x3 -> Dropout

when using bottleneck. The question is why? Most networks use e.g.

Conv3x3 -> BN -> ReLU -> Dropout

Why did you invert the order? Did you get better results this way?

Thanks in advance!

CIFAR validation loss decreases, then increases after learning rate change

Hello, I have one question about training DenseNet: the validation loss shows a sharp decrease and then an increase after the learning rate is changed from 0.1 to 0.01.
I trained the DenseNet (depth 40, k=12) on CIFAR-100 with the TensorFlow implementation
https://github.com/YixuanLi/densenet-tensorflow
I just modified the code to follow your data augmentation steps (subtract the channel mean, then divide by the std).
However, the validation loss seems weird (see the figure below). I have the following two questions:

cifar100_d_40_k_12
(1) Did you meet the same problem when training on the CIFAR-100 dataset (or might it be a TensorFlow implementation error)?
(2) Did your validation loss include the L2 loss part?
The validation error itself seems fine (25.53%, 1.1% higher than in the paper).
Thanks in advance

Nice figures !

Hey,
I am sorry to ask this, but your figures are really nice. I have no experience drawing neural network figures, and I would like to follow your style if you let me :)
Could you please tell me what you used to make such simple and nice-looking figures?
Thanks !

results on cifar100

Hi,

Thanks for the great work and the released code. I have tried running several times on CIFAR-100+ with DenseNet-BC (L=190, k=40), but it is hard to reproduce the 17.18 result. My training script looks like this (simply replacing the dataset with CIFAR-100, without the efficient setting):
python demo.py --depth 190 --growth_rate 40 --save ckpts --batch_size 64 --valid_size 0
The best result I got is about 17.3x. Do you think this result is acceptable, or did I miss anything? Thanks a lot.

About a tensorflow implementation

I've followed one of the TensorFlow implementations of DenseNet (https://github.com/ikhlestov/vision_networks) to reproduce DenseNet-BC-100-12.
It seemed to me that the TensorFlow implementation is nearly equivalent to the one from this repo,
but I couldn't reach ~4.5% error (the best I got was about ~4.8%, by the way).
Could you give me any reason why that is? I have already compared the two codebases very carefully but couldn't find the cause.

Deep-Narrow DenseNet

I was wondering if you ever tried the extreme case of growth_rate = 1 with a very deep network. Just as an exercise, I implemented a fully-connected dense block with growth_rate = 1 and depth = 50 on a 2D dataset so I could visualize what each neuron was learning; the results were very nice.

Purpose of the first convolution

In your network architectures for the CIFAR and ImageNet datasets, what is the purpose of the first convolution (before pooling and dense block 1)? On ImageNet you use two convolution blocks before entering the dense block, while on CIFAR just one. Is there any reason? Thanks

DenseNet architecture question

I may be misunderstanding the architecture, but why does DenseNet decide to concatenate feature maps from the current layer to pass backward instead of using "true" residual connections?

The layers within the second and third dense block don't assign the least weight to the outputs of the transition layer in my trained model

I am not sure if it's appropriate to open this issue in the GitHub project; this is a question about the heatmap in your paper.

I trained a DenseNet on C10+ with L = 40 and k = 12, which is the same as yours, and then I verified the weights on a trained model with 94.6% accuracy, but I did not get the same result as your observation 3. In my test, the layers within the second and third dense blocks assign considerable weight to the outputs of the transition layer.

For example, the first conv layer in the second dense block has 0.013281956 average weight on the 1st transition layer's output (168 channels, i.e. all the input channels), while the second conv layer has 0.011933382 average weight on the 1st transition layer's output (the first 168 channels) and 0.024417713 average weight on the 12 channels output by the first conv layer. This is reasonable because closer channels are more important. The remaining layers have similar weight distributions over the old and new channels, and a similar situation holds in dense block 3.

My densenet and training code is aligned to yours, including augmentation and input norm, see https://github.com/seasonyc/densenet/blob/master/densenet.py and https://github.com/seasonyc/densenet/blob/master/cifar10-test.py. The model file is in https://github.com/seasonyc/densenet/blob/master/dense_augmodel-ep0300-loss0.112-acc0.999-val_loss0.332-val_acc0.946.h5, and my code to count the weights is in https://github.com/seasonyc/densenet/blob/master/weights-verify.py.

I know that models trained at different times are different, and even the features of the conv filters are different, but I believe the weight distributions are statistically similar. So although we have different models, we should get similar results.

I did this verification because I feel observation 3 is a little unreasonable. The 1st conv layer uses the information from the previous dense block very much, and then the 2nd conv layer ignores the information from hundreds of channels and only uses the information from 12 channels; can the 1st conv layer really concentrate hundreds of channels into 12 channels through training?

Do you want to double-check this?

Thanks
YC

Add layer bugs

Hi,

check out the code at

https://github.com/liuzhuang13/DenseNet/blob/master/densenet.lua#L54

The parameter requires nOutChannels, but you pass the growth rate.

Also, the add layer only adds a direct connection to the next layer, so you form n-1 connections; as far as I understood from your paper, it should also have n-2, n-3, etc. for every layer in the block.
I'm guessing this is not the version you used to train the models.

Thanks for sharing.

A question about network structure

Hi, I want to design a DenseNet with a small number of layers (e.g., L=30). How should I set each dense block and k (L denotes the network depth and k its growth rate)? Are there any rules?

Parameters and computation

Hi there, and great work! I actually figured out the very same concept myself before finding out that you had already tested and published it. ✍(◔◡◔) Some of the design decisions I made were different, so I'd like to compare.

Where you report results on the CIFARs, if you could also add the number of parameters you are using and, possibly, an estimated amount of computation, that would be highly beneficial. It's really necessary for serious comparisons and for the ability to improve even this very architecture. Also, if you could add your training logs, that would provide great insight.

As for how to measure the amount of computation, that's quite a tough thing to do, so I'd recommend at least measuring training time, which is a very inexact measure but provides at least some insight.

I got 19.5% on CIFAR-100+ with mean and std not adjusted (the whole dataset just scaled to [0..1] values), with 24M params and forward+backward running for 220 sec/epoch on a GTX Titan X, using the best dense-type architecture I had designed earlier (I could only experiment on a single GTX Titan X; I don't really have a lot of computational resources). It didn't have preactivation. It would most likely at least match the results you've published for DenseNet (L=100, k=24) on CIFAR-100+ if I used the right dataset (with std and mean adjusted). My code: https://github.com/ibmua/Breaking-Cifar/blob/master/models/hoard-2-x.lua (uses 4-spaced tabs; to achieve that result I used depth=2, sequences=2; here's a log of the end of training: https://github.com/ibmua/Breaking-Cifar/blob/master/logs/load_59251794/log.txt). Mind that I used groups, which are only accessible via Soumith's "cudnn", so if you want to try this you probably want to clone the whole thing. Also, note that I didn't use any Dropout (haven't even tried).

Wide-DenseNet

Congrats on the Best Paper Award at CVPR 2017!
I'm troubled by the memory problem with DenseNet. Would you share your Wide-DenseNet implementation and pre-trained models publicly?

Best!

DenseNet structure on imagenet

Hi author,

I notice that in your experiments on ImageNet, instead of repeating the same number of layers in each block, you set a different number of layers for each block. Did you design it following some pattern, or just by trying different combinations?

question about standardization

I find that in the training and testing phases, the dataset is standardized as a whole, including computing the mean and variance from the entire CIFAR-10 dataset.
However, when the model is deployed, images are fed individually; how should we preprocess an image?
What mean and variance should we use when the input image is of the same category but not included in the CIFAR-10 dataset?

ImageNet test

I'm trying to train a DenseNet on the ImageNet dataset, but it doesn't converge well.
Have you ever tried DenseNet on the ImageNet dataset?
Please share it if you have any successful DenseNet network for ImageNet.

CUDA out of memory when using the memory-efficient DenseNet for deployment

WARNING: Logging before InitGoogleLogging() is written to STDERR
F0129 15:24:34.494936 153936 DenseBlock_layer.cu:203] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0129 18:17:47.501026 32543 syncedmem.cpp:71] Check failed: error == cudaSuccess (2 vs. 0) out of memory
*** Check failure stack trace: ***

When I deploy the trained memory-efficient DenseNet model with MATLAB, I always encounter this problem. I think it would be solved by resetting the CUDA memory after each calculation; can you give me some advice on how to solve this problem?

Median of best test error or test error after training?

Hi guys. In your paper, you compared your results with some other methods. Did you report the median of the best test error during training, or the median of the test error after training? What is the common way to report results?
Besides, why did you not report results using both data augmentation and dropout?

I tried to reproduce the Wide-DenseNet-BC results on CIFAR-10, but got 0.5% higher error than yours

I tested Wide-DenseNet-BC (L=40, k=48) on CIFAR-10 with augmentation, see https://github.com/seasonyc/densenet/blob/bf99d7f459ca7754c37ff58c6610eb76e93f7990/cifar10-test.py#L217 in https://github.com/seasonyc/densenet,
but could only get a 4.5% error rate.

I tried to tune some hyperparameters, e.g. dropout, weight decay and learning rate, but could never get a better result. Now I am testing the learning rate schedule of Wide ResNet training, i.e. an initial rate of 0.1 decayed by a factor of 0.2 every 60 epochs, but I very much doubt it will help.

Would you give me any suggestions?

Thanks
YC
