cvjena / cnn-models

ImageNet pre-trained models with batch normalization for the Caffe framework

Home Page: https://arxiv.org/abs/1612.01452

License: BSD 2-Clause "Simplified" License

Python 100.00%
cnn-model resnet imagenet alexnet batch-normalization caffe-framework vgg16 vgg19 vggnet vgg

cnn-models's Introduction

CNN Models by CVGJ

Intro

This repository contains convolutional neural network (CNN) models trained on ImageNet by Marcel Simon at the Computer Vision Group Jena (CVGJ) using the Caffe framework, as published in the accompanying technical report. Each model is in a separate subfolder and contains everything needed to reproduce the results. The repository currently contains the batch-normalization variants of AlexNet and VGG19 as well as the training code for Residual Networks (ResNet).

How to use

No mean subtraction is required for the pre-trained models! Each network starts with a batch-normalization layer on the input data, which serves the same purpose.
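
For illustration, this input normalization corresponds to a BatchNorm/Scale pair applied directly to the data blob, roughly as in the minimal sketch below (the data_bn/data_scale names match those appearing in the training logs quoted further down; see each model's train.prototxt for the exact definition):

layer {
  name: "data_bn"
  type: "BatchNorm"
  bottom: "data"
  top: "data_bn"
  # the running mean and variance are learned from the training data,
  # so no separately computed mean image is needed
}
layer {
  name: "data_scale"
  type: "Scale"
  bottom: "data_bn"
  top: "data_bn"
  scale_param { bias_term: true }
}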

The pre-trained models can be obtained via the download link given in model_download_link.txt.

If you want to train on your own dataset, simply execute caffe train --solver train.solver --gpu 0 2> train.log to start the training and write the output to the log file train.log. Please note that we pre-scaled all images of the dataset such that the smaller side has a length of 256 pixels while keeping the aspect ratio. You can use convert input.jpg -resize 256x256^ output.jpg to convert the images on the command line; a batch version is sketched below.
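
A minimal shell sketch of that pre-scaling step, assuming ImageMagick is installed; the paths are placeholders, and the real ImageNet training set is organized into per-class subfolders, so adapt the loop accordingly:

# Resize every JPEG so that its smaller side is 256 px while keeping the aspect ratio.
mkdir -p ilsvrc12-scaled/train
for f in ilsvrc12/train/*.jpg; do
  convert "$f" -resize 256x256^ "ilsvrc12-scaled/train/$(basename "$f")"
done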

To evaluate the final model, execute caffe train --solver test.solver --gpu 0 2> test.log.

Accuracy on ImageNet

Single-crop error rates on the validation set of the ILSVRC 2012-16 classification task.

Model           Top-1 error (vs. original)   Top-5 error (vs. original)
AlexNet_cvgj    39.9% (vs. 42.6%)            18.1% (vs. 19.6%)
VGG19_cvgj      26.9% (vs. 28.7%)             8.8% (vs. 9.9%)
ResNet10_cvgj   36.1%                        14.8%
ResNet50_cvgj   24.6% (vs. 24.7%)             7.6% (vs. 7.8%)

Citation

Please cite the following technical report if our models helped your research:

@article{simon2016cnnmodels,
  Author = {Simon, Marcel and Rodner, Erik and Denzler, Joachim},
  Journal = {arXiv preprint arXiv:1612.01452},
  Title = {ImageNet pre-trained models with batch normalization},
  Year = {2016}
}

The report also contains an overview and analysis of the models shown here.

Appendix: convergence plots

AlexNet_cvgj

Convergence plot of AlexNet with batch normalization

VGG19_cvgj

Convergence plot of VGG19 with batch normalization

ResNet10_cvgj

Convergence plot of ResNet10 with batch normalization

Further details

Please see the accompanying technical report, cited above, for further information about the models and the training procedure.


License and support

The models are released under the BSD 2-Clause license, allowing both academic and commercial use. I would appreciate it if you give credit to this work by citing our paper in academic works and by referencing this GitHub repository in commercial works. If you need any support, please open an issue or contact Marcel Simon.

cnn-models's People

Contributors

hiroki11x, marcelsimon


cnn-models's Issues

Difference between ResNet and VGG train.prototxt?

In VGG, the ReLU immediately follows the BatchNorm layer, but in ResNet a Scale layer is inserted between the BatchNorm and ReLU layers. Why the inconsistency? What is the difference between these two usages?
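
For context, the pattern in question looks roughly like the following generic sketch (layer and blob names are placeholders, not the exact ones from the released prototxts); the VGG variant simply omits the Scale layer in the middle:

layer { name: "convN_bn"    type: "BatchNorm" bottom: "convN" top: "convN" }
layer { name: "convN_scale" type: "Scale"     bottom: "convN" top: "convN" scale_param { bias_term: true } }
layer { name: "convN_relu"  type: "ReLU"      bottom: "convN" top: "convN" }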

Clarification on "validation set of ILSVRC 2012 - 2016 classification task"

Hello
Very nice work. I am trying to reproduce the error rates for Alexnet.
Before I get started, I thought I would get a clarification on the training and validation datasets you used. The "Accuracy on ImageNet" section mentions the ILSVRC 2012-2016 classification task; does that mean:

  1. The accuracy mentioned is the cumulative accuracy across the validation datasets from 2012 to 2016, i.e. four different validation sets? If yes, was the training also performed on four different datasets?

  2. The train.log seems to span 5 days, which I think is the training time for AlexNet on the 1.2 million-image dataset. So was the training performed on the 2012 training set and the accuracy measured across the 2012-2016 validation sets?

Thanks
Vamsi

BatchNorm: Before or after activation?

I see you are placing your BatchNorm before the activation layer. While ResNet does this, it's been demonstrated that placing the BN after the non-linearity actually increases prediction accuracy. Other deep learning experts recommend this as well.

I'm just curious why, for AlexNet and VGGNet, the BN comes before the non-linearity. Was this an implementation choice, or was it determined by experiments?

Unable to download model file from uni-jena.de

Using the link provided here I was unable to download the pre-trained weights for the ResNet10 model:

$ wget https://upload.uni-jena.de/data/58493041de6f79.63214979/resnet10_cvgj_iter_320000.caffemodel

--2019-09-03 18:36:32--  https://upload.uni-jena.de/data/58493041de6f79.63214979/resnet10_cvgj_iter_320000.caffemodel

Resolving upload.uni-jena.de (upload.uni-jena.de)... 141.35.105.30, 2001:638:1558:2369:1:5ee:bad:c0de

Connecting to upload.uni-jena.de (upload.uni-jena.de)|141.35.105.30|:443... connected.

HTTP request sent, awaiting response... 404 Not Found
2019-09-03 18:36:32 ERROR 404: Not Found.

I was able to get the file from the Google Drive (thanks!) so that link is still good.

Questions of source and root_folder in train.prototxt

I have two questions when I review train.prototxt.
1.

image_data_param {
   source: "/home/atlas2_ssd/simon/ilsvrc12/train.txt"
   batch_size: 128
   shuffle: true
   root_folder: "/home/atlas1_ssd/simon/ilsvrc12-scaled/train/"
   }

Can I replace the source with an LMDB database instead? (An illustrative LMDB data layer is sketched after this list.)
2. How do I create the scaled root_folder for my own data?
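
For reference, a minimal sketch of what an LMDB-backed data layer looks like in Caffe (the source path is a placeholder; whether it is a drop-in replacement for the ImageData layer above is exactly what is being asked):

layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  data_param {
    source: "/path/to/ilsvrc12_train_lmdb"
    batch_size: 128
    backend: LMDB
  }
}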

Thanks a lot !

ResNet50: bottom blob of expand layer

Hi there,

Thanks for sharing the pre-trained models.
I am learning ResNet-50 and have a question about the architecture. It seems that quite a few places differ from the original ResNets.

  1. The data preprocessing is changed from mean subtraction to batch normalization, which has been noted.

However, I noticed another main difference in the expanding convolution layer. For example, the first one:

layer {
name: "layer_64_1_conv_expand"
type: "Convolution"
bottom: "layer_64_1_conv1"
top: "layer_64_1_conv_expand"
.......

It shows that the bottom blob comes from "layer_64_1_conv1", whereas it was "conv1_pool" in the original architecture. Is this a modification? Your results show that you consistently improve the accuracy over the original implementation; is this the reason?

Worse performance than the reported one in Res50

Hi Developers,

I have trained the Res50 model using the provided scripts, but I get worse performance, as shown below:

Test net output #0: acc/top-1 = 0.739239
Test net output #1: acc/top-5 = 0.919044

In comparison, the reported performance is top-1/top-5 = 24.6%/7.6%. The only difference between my training and yours is that I use 4 GPUs in parallel and change iter_size in train.solver from 8 to 4 so that the overall batch size remains the same. Could you please tell me where the gap comes from? Thanks in advance!

Question about the license

Hi,

Since the pre-trained model is trained on ImageNet, is it OK to release it as a commercial model without permission from ImageNet?

train on multi gpus

Hi,

I tried to fine-tune with your model. It works well on a single GPU but not on multiple GPUs. Is this an inherent limitation, or is there something I need to work around? Thanks.
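
For reference, multi-GPU training in stock BVLC Caffe is requested by listing the device IDs on the command line, roughly as below (device IDs and the weights file are placeholders; whether the fine-tuning problem described above persists with this invocation is what the question asks):

# Fine-tune from pre-trained weights on four GPUs.
caffe train --solver train.solver --weights model.caffemodel --gpu 0,1,2,3 2> train.log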

Why is there no scale layer after each batch norm layer?

I found that only the BN layer after the data layer is followed by a scale layer; the other BN layers are not paired with scale layers. Is that because VGG uses ReLU instead of Sigmoid as its activation function? Thanks.

BN0 mean and variance

I extracted the parameters of the pre-trained model and noticed that the computed mean and variance for the input BN layer are between 1000-1200 and 40k-50k respectively. How is this possible, given that the image RGB channels should have values between 0 and 255?
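
For context, a minimal pycaffe sketch of the kind of parameter inspection described above (file names are placeholders; the input BN layer is named data_bn in the ResNet prototxts and data/bn in VGG19). In BVLC Caffe a BatchNorm layer stores three blobs: the running mean, the running variance, and a moving-average scale factor that both statistics must be divided by:

import caffe

# Placeholders: use the deploy.prototxt and .caffemodel of the model in question.
net = caffe.Net('deploy.prototxt', 'model.caffemodel', caffe.TEST)

# BVLC Caffe BatchNorm blobs: [0] running mean, [1] running variance, [2] scale factor.
mean_blob, var_blob, factor_blob = (b.data for b in net.params['data_bn'])
factor = factor_blob[0] if factor_blob[0] != 0 else 1.0
print('mean:', mean_blob / factor)
print('variance:', var_blob / factor)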

VGG19BN: weird values of 'data/bn' parameters.

I found that the data/bn parameters in vgg19bn have a mean value of 1127.633, which seems very odd.

I know the mean of ImageNet images is around 112, so why is the data/bn mean 1127, ten times 112?

I also noticed that your data layer reads from /home/atlas1_ssd/simon/ilsvrc12-scaled/train/ (https://github.com/cvjena/cnn-models/blob/master/VGG19_cvgj/train.prototxt#L18); there is "scaled" in the folder name.

So I guess you scaled the raw images by a factor of 10, right?


No scale layers after batchnorm

Hello! Thanks for these pre-trained models. I was wondering why you did not include a Scale layer after the BN layer in your AlexNet model. None of the convolutional layers have a bias term, so, with no scale layer added, there are no bias terms anywhere in the network. Was that intentional?

Check failed: target_blobs.size() == source_layer.blobs_size() (5 vs. 3) Incompatible number of blobs for layer data_bn

When I load the ResNet10 model and the associated weights (from here) into DIGITS for training on a custom image dataset I get the following output:

ERROR: Check failed: target_blobs.size() == source_layer.blobs_size() (5 vs. 3) Incompatible number of blobs for layer data_bn

layer_64_1_relu2 does not need backward computation.
layer_64_1_scale2 does not need backward computation.
layer_64_1_bn2 does not need backward computation.
layer_64_1_conv1 does not need backward computation.
conv1_pool_conv1_pool_0_split does not need backward computation.
conv1_pool does not need backward computation.
conv1_relu does not need backward computation.
conv1_scale does not need backward computation.
conv1_bn does not need backward computation.
conv1 does not need backward computation.
data_scale does not need backward computation.
data_bn does not need backward computation.
label does not need backward computation.
data does not need backward computation.
This network produces output label
This network produces output prob
Network initialization done.
Solver scaffolding done.
Finetuning from /resnet10_cvgj/resnet10/resnet10_cvgj_iter_320000.caffemodel
Check failed: target_blobs.size() == source_layer.blobs_size() (5 vs. 3) Incompatible number of blobs for layer data_bn   

I have seen similar issues with comments describing this sort of error as indicative of an incompatibility between the model architecture (deploy.prototxt) and the pre-trained model weights.

Can anyone suggest how to resolve or work around this issue? Thanks in advance for any suggestions or insight.

Regarding Image pre-processing.

Hello Simon
I was able to replicate the published result of an 18.1% top-5 error rate on the ILSVRC 2012 classification task using the uploaded alexnet_cvgj_iter_320000.caffemodel.
But when I tried recreating my own .caffemodel by training from scratch (2 GPUs, batch size 64 per GPU), I was only able to achieve a top-5 error rate of 20.6%. One of the differences, I thought, could be in the image pre-processing. (A batch size of 128 per GPU on 2 GPUs gave a worse result, a 21.4% top-5 error rate.)
I have been using https://github.com/BVLC/caffe/blob/master/examples/imagenet/create_imagenet.sh to create the database for training and validation, which I believe only resizes the ImageNet data to 256x256 images.

What kind of image pre-processing was done on the training and validation sets when training the alexnet_cvgj model?

Thanks
Vamsi
