cvjena / cnn-models

ImageNet pre-trained models with batch normalization for the Caffe framework

Home Page: https://arxiv.org/abs/1612.01452

License: BSD 2-Clause "Simplified" License

Python 100.00%
cnn-model resnet imagenet alexnet batch-normalization caffe-framework vgg16 vgg19 vggnet vgg

cnn-models's Introduction

CNN Models by CVGJ

Intro

This repository contains convolutional neural network (CNN) models trained on ImageNet by Marcel Simon at the Computer Vision Group Jena (CVGJ) using the Caffe framework, as published in the accompanying technical report. Each model is in a separate subfolder and contains everything needed to reproduce the results. The repository currently contains the batch-normalization variants of AlexNet and VGG19 as well as the training code for Residual Networks (ResNet).

How to use

No mean subtraction is required for the pre-trained models! Each network starts with a batch-normalization layer on the input data, which serves the same purpose.
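
For illustration, this input normalization corresponds to a BatchNorm/Scale pair applied directly to the data blob, roughly as in the minimal sketch below (the data_bn/data_scale names match those appearing in the training logs quoted further down; see each model's train.prototxt for the exact definition):

layer {
  name: "data_bn"
  type: "BatchNorm"
  bottom: "data"
  top: "data_bn"
  # the running mean and variance are learned from the training data,
  # so no separately computed mean image is needed
}
layer {
  name: "data_scale"
  type: "Scale"
  bottom: "data_bn"
  top: "data_bn"
  scale_param { bias_term: true }
}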

The pre-trained models can be obtained via the download link given in model_download_link.txt.

If you want to train on your own dataset, simply execute caffe train --solver train.solver --gpu 0 2> train.log to start the training and write the output to the log file train.log. Please note that we pre-scaled all images of the dataset such that the smaller side has a length of 256 pixels while keeping the aspect ratio. You can use convert input.jpg -resize 256x256^ output.jpg to convert the images on the command line; a batch version is sketched below.
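
A minimal shell sketch of that pre-scaling step, assuming ImageMagick is installed; the paths are placeholders, and the real ImageNet training set is organized into per-class subfolders, so adapt the loop accordingly:

# Resize every JPEG so that its smaller side is 256 px while keeping the aspect ratio.
mkdir -p ilsvrc12-scaled/train
for f in ilsvrc12/train/*.jpg; do
  convert "$f" -resize 256x256^ "ilsvrc12-scaled/train/$(basename "$f")"
done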

To evaluate the final model, execute caffe train --solver test.solver --gpu 0 2> test.log.

Accuracy on ImageNet

Single-crop error rates on the validation set of the ILSVRC 2012-16 classification task.

Model           Top-1 error (vs. original)   Top-5 error (vs. original)
AlexNet_cvgj    39.9% (vs. 42.6%)            18.1% (vs. 19.6%)
VGG19_cvgj      26.9% (vs. 28.7%)             8.8% (vs. 9.9%)
ResNet10_cvgj   36.1%                        14.8%
ResNet50_cvgj   24.6% (vs. 24.7%)             7.6% (vs. 7.8%)

Citation

Please cite the following technical report if our models helped your research:

@article{simon2016cnnmodels,
  Author = {Simon, Marcel and Rodner, Erik and Denzler, Joachim},
  Journal = {arXiv preprint arXiv:1612.01452},
  Title = {ImageNet pre-trained models with batch normalization},
  Year = {2016}
}

The report also contains an overview and analysis of the models shown here.

Appendix: convergence plots

AlexNet_cvgj

Convergence plot of AlexNet with batch normalization

VGG19_cvgj

Convergence plot of VGG19 with batch normalization

ResNet10_cvgj

Convergence plot of ResNet10 with batch normalization

Further details

Please see the accompanying technical report, cited above, for further information about the models and the training procedure.


License and support

The models are released under the BSD 2-Clause license, allowing both academic and commercial use. I would appreciate it if you give credit to this work by citing our paper in academic works and by referencing this GitHub repository in commercial works. If you need any support, please open an issue or contact Marcel Simon.

cnn-models's People

Contributors

hiroki11x, marcelsimon


cnn-models's Issues

Difference between ResNet and VGG train.prototxt?

In VGG, the ReLU immediately follows the BatchNorm layer, but in ResNet a Scale layer is inserted between the BatchNorm and ReLU layers. Why the inconsistency? What is the difference between these two usages?
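
For context, the pattern in question looks roughly like the following generic sketch (layer and blob names are placeholders, not the exact ones from the released prototxts); the VGG variant simply omits the Scale layer in the middle:

layer { name: "convN_bn"    type: "BatchNorm" bottom: "convN" top: "convN" }
layer { name: "convN_scale" type: "Scale"     bottom: "convN" top: "convN" scale_param { bias_term: true } }
layer { name: "convN_relu"  type: "ReLU"      bottom: "convN" top: "convN" }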

Clarification on "validation set of ILSVRC 2012 - 2016 classification task"

Hello
Very nice work. I am trying to reproduce the error rates for Alexnet.
Before I get started, I thought I would get a clarification on the training and validation datasets you used. The "Accuracy on ImageNet" section mentions the ILSVRC 2012-2016 classification task; does that mean:

  1. The accuracy mentioned is the cumulative accuracy across the validation datasets from 2012 to 2016, i.e. four different validation sets? If yes, was the training also performed on four different datasets?

  2. The train.log seems to span 5 days, which I think is the training time for AlexNet on the 1.2 million-image dataset. So was the training performed on the 2012 training set and the accuracy measured across the 2012-2016 validation sets?

Thanks
Vamsi

BatchNorm: Before or after activation?

I see you are placing your BatchNorm before the activation layer. While ResNet does this, it's been demonstrated that placing the BN after the non-linearity actually increases prediction accuracy. Other deep learning experts recommend this as well.

I'm just curious why, for AlexNet and VGGNet, the BN comes before the non-linearity. Was this an implementation choice, or was it determined by experiments?

Unable to download model file from uni-jena.de

Using the link provided here I was unable to download the pre-trained weights for the ResNet10 model:

$ wget https://upload.uni-jena.de/data/58493041de6f79.63214979/resnet10_cvgj_iter_320000.caffemodel

--2019-09-03 18:36:32--  https://upload.uni-jena.de/data/58493041de6f79.63214979/resnet10_cvgj_iter_320000.caffemodel

Resolving upload.uni-jena.de (upload.uni-jena.de)... 141.35.105.30, 2001:638:1558:2369:1:5ee:bad:c0de

Connecting to upload.uni-jena.de (upload.uni-jena.de)|141.35.105.30|:443... connected.

HTTP request sent, awaiting response... 404 Not Found
2019-09-03 18:36:32 ERROR 404: Not Found.

I was able to get the file from the Google Drive (thanks!) so that link is still good.

Questions of source and root_folder in train.prototxt

I have two questions when I review train.prototxt.
1.

image_data_param {
   source: "/home/atlas2_ssd/simon/ilsvrc12/train.txt"
   batch_size: 128
   shuffle: true
   root_folder: "/home/atlas1_ssd/simon/ilsvrc12-scaled/train/"
   }

Can I replace the source with an LMDB database instead? (An illustrative LMDB data layer is sketched after this list.)
2. How do I create the scaled root_folder for my own data?
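
For reference, a minimal sketch of what an LMDB-backed data layer looks like in Caffe (the source path is a placeholder; whether it is a drop-in replacement for the ImageData layer above is exactly what is being asked):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "/path/to/ilsvrc12_train_lmdb"
    batch_size: 128
    backend: LMDB
  }
}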

Thanks a lot !

ResNet50: bottom blob of expand layer

Hi there,

Thanks for sharing the pre-trained models.
I am learning ResNet-50 and have a question about the architecture. It seems that quite a few places differ from the original ResNets.

  1. The data preprocessing is changed from mean subtraction to batch normalization, which has been noted.

However, I noticed another main difference in the expanding convolution layer. For example, the first one:

layer {
name: "layer_64_1_conv_expand"
type: "Convolution"
bottom: "layer_64_1_conv1"
top: "layer_64_1_conv_expand"
.......

It shows that the bottom blob comes from "layer_64_1_conv1", whereas it was "conv1_pool" in the original architecture. Is this a modification? Your results show that you consistently improve the accuracy over the original implementation; is this the reason?

Worse performance than the reported one in Res50

Hi Developers,

I have trained the Res50 model using the provided scripts, but I get worse performance, as shown below:

Test net output #0: acc/top-1 = 0.739239
Test net output #1: acc/top-5 = 0.919044

In comparison, the reported performance is top-1/top-5 = 24.6%/7.6%. The only difference between my training and yours is that I use 4 GPUs in parallel and change iter_size in train.solver from 8 to 4 so that the overall batch size remains the same. Could you please tell me where the gap comes from? Thanks in advance!

Question about the license

Hi,

Since the pre-trained model is trained on ImageNet, is it OK to release it as a commercial model without permission from ImageNet?

train on multi gpus

Hi,

I tried to fine-tune with your model. It works well on a single GPU but not on multiple GPUs. Is this an inherent limitation, or is there something I need to work around? Thanks.
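
For reference, multi-GPU training in stock BVLC Caffe is requested by listing the device IDs on the command line, roughly as below (device IDs and the weights file are placeholders; whether the fine-tuning problem described above persists with this invocation is what the question asks):

# Fine-tune from pre-trained weights on four GPUs.
caffe train --solver train.solver --weights model.caffemodel --gpu 0,1,2,3 2> train.log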

Why is there no scale layer after each batch norm layer?

I found that only the BN layer after the data layer is followed by a scale layer; the other BN layers are not paired with scale layers. Is that because VGG uses ReLU instead of Sigmoid as its activation function? Thanks.

BN0 mean and variance

I extracted the parameters of the pre-trained model and noticed that the computed mean and variance for the input BN layer are between 1000-1200 and 40k-50k respectively. How is this possible, given that the image RGB channels should have values between 0 and 255?
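
For context, a minimal pycaffe sketch of the kind of parameter inspection described above (file names are placeholders; the input BN layer is named data_bn in the ResNet prototxts and data/bn in VGG19). In BVLC Caffe a BatchNorm layer stores three blobs: the running mean, the running variance, and a moving-average scale factor that both statistics must be divided by:

import caffe

# Placeholders: use the deploy.prototxt and .caffemodel of the model in question.
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

# BVLC Caffe BatchNorm blobs: [0] running mean, [1] running variance, [2] scale factor.
mean_blob, var_blob, factor_blob = (b.data for b in net.params['data_bn'])
factor = factor_blob[0] if factor_blob[0] != 0 else 1.0
print('mean:', mean_blob / factor)
print('variance:', var_blob / factor)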

VGG19BN: weird values of 'data/bn' parameters.

I found that the data/bn parameters in vgg19bn have a mean value of 1127.633, which seems very odd.

I know the mean of ImageNet images is around 112, so why is the data/bn mean 1127, ten times 112?

I also noticed that your data layer reads from /home/atlas1_ssd/simon/ilsvrc12-scaled/train/ (https://github.com/cvjena/cnn-models/blob/master/VGG19_cvgj/train.prototxt#L18); there is "scaled" in the folder name.

So I guess you scaled the raw images by a factor of 10, right?


No scale layers after batchnorm

Hello! Thanks for these pre-trained models. I was wondering why you did not include a Scale layer after the BN layer in your AlexNet model. None of the convolutional layers have a bias term, so, with no scale layer added, there are no bias terms anywhere in the network. Was that intentional?

Check failed: target_blobs.size() == source_layer.blobs_size() (5 vs. 3) Incompatible number of blobs for layer data_bn

When I load the ResNet10 model and the associated weights (from here) into DIGITS for training on a custom image dataset I get the following output:

ERROR: Check failed: target_blobs.size() == source_layer.blobs_size() (5 vs. 3) Incompatible number of blobs for layer data_bn

layer_64_1_relu2 does not need backward computation.
layer_64_1_scale2 does not need backward computation.
layer_64_1_bn2 does not need backward computation.
layer_64_1_conv1 does not need backward computation.
conv1_pool_conv1_pool_0_split does not need backward computation.
conv1_pool does not need backward computation.
conv1_relu does not need backward computation.
conv1_scale does not need backward computation.
conv1_bn does not need backward computation.
conv1 does not need backward computation.
data_scale does not need backward computation.
data_bn does not need backward computation.
label does not need backward computation.
data does not need backward computation.
This network produces output label
This network produces output prob
Network initialization done.
Solver scaffolding done.
Finetuning from /resnet10_cvgj/resnet10/resnet10_cvgj_iter_320000.caffemodel
Check failed: target_blobs.size() == source_layer.blobs_size() (5 vs. 3) Incompatible number of blobs for layer data_bn   

I have seen similar issues with comments describing this sort of error as indicative of an incompatibility between the model architecture (deploy.prototxt) and the pre-trained model weights.

Can anyone suggest how to resolve or work around this issue? Thanks in advance for any suggestions or insight.

Regarding Image pre-processing.

Hello Simon
I was able to replicate the published result of an 18.1% top-5 error rate on the ILSVRC 2012 classification task using the uploaded alexnet_cvgj_iter_320000.caffemodel.
But when I tried recreating my own .caffemodel by training from scratch (2 GPUs, batch size 64 per GPU), I was only able to achieve a top-5 error rate of 20.6%. One of the differences, I thought, could be in the image pre-processing. (A batch size of 128 per GPU on 2 GPUs gave a worse result, a 21.4% top-5 error rate.)
I have been using https://github.com/BVLC/caffe/blob/master/examples/imagenet/create_imagenet.sh to create the database for training and validation, which I believe only resizes the ImageNet data to 256x256 images.

What kind of image pre-processing was done on the training and validation sets when training the alexnet_cvgj model?

Thanks
Vamsi
