B-CNN: Bilinear CNNs for fine-grained visual recognition

Created by Tsung-Yu Lin, Aruni RoyChowdhury and Subhransu Maji at UMass Amherst

Introduction

This repository contains the code for reproducing the results in the ICCV 2015 paper:

@inproceedings{lin2015bilinear,
    Author = {Tsung-Yu Lin and Aruni RoyChowdhury and Subhransu Maji},
    Title = {Bilinear CNNs for Fine-grained Visual Recognition},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2015}
}

The code was tested on Ubuntu 14.04 with an NVIDIA K40 GPU and MATLAB R2014b.

Link to the project page.

Fine-grained classification results

| Method | Birds | Birds + box | Aircraft | Cars |
| --- | --- | --- | --- | --- |
| B-CNN [M,M] | 78.1% | 80.4% | 77.9% | 86.5% |
| B-CNN [D,M] | 84.1% | 85.1% | 83.9% | 91.3% |
| B-CNN [D,D] | 84.0% | 84.8% | 84.1% | 90.6% |

Dataset details:
- Birds: CUB-200-2011 dataset. Birds + box uses bounding boxes at training and test time.
- Aircraft: FGVC Aircraft dataset.
- Cars: Stanford Cars dataset.

These results are obtained with domain-specific fine-tuning. For more details, see the updated B-CNN tech report.
The pre-trained models are available (see below).

Installation

This code depends on VLFEAT and MatConvNet. Follow the instructions on their project pages to install them first. Our code is built on MatConvNet version 1.0-beta8. To check out that version of MatConvNet with git, run the following in a shell from the MatConvNet source directory:

$ git fetch --tags
$ git checkout tags/v1.0-beta8

Once these are installed, edit setup.m so that it runs the corresponding setup scripts.
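
As a rough illustration of what such a setup.m might contain, the sketch below calls the setup scripts that ship with the two libraries (vl_setup for VLFEAT and vl_setupnn for MatConvNet). The two root paths are placeholders introduced here for illustration, not values from this repository, and the repository's actual setup.m may do more than this.

```matlab
% setup.m (sketch): the two root paths are hypothetical placeholders;
% replace them with the locations of your own installations.
VLFEAT_ROOT     = '/path/to/vlfeat';
MATCONVNET_ROOT = '/path/to/matconvnet';

% vl_setup adds the VLFEAT toolbox to the MATLAB path.
run(fullfile(VLFEAT_ROOT, 'toolbox', 'vl_setup'));

% vl_setupnn adds the MatConvNet functions to the MATLAB path.
run(fullfile(MATCONVNET_ROOT, 'matlab', 'vl_setupnn'));
```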

The code implements the bilinear combination layer in symmetric and asymmetric CNNs and contains scripts to fine-tune models and run experiments on several fine-grained recognition datasets. We also provide pre-trained models.

Pre-trained models

ImageNet LSVRC 2012 pre-trained models: Since we don't support the latest MatConvNet implementation, the pre-trained models downloaded from the MatConvNet page don't work properly here. We provide links to download vgg-m and vgg-verydeep-16 in the old format.

Fine-tuned models: We provide three fine-tuned B-CNN models ([M,M], [D,M], and [D,D]) and SVM models trained on the respective B-CNN features for each of the CUB-200-2011, FGVC Aircraft, and Cars datasets. Note that for [M,M] and [D,D] we run the symmetric model, so you can simply use the same network for both streams. These can be downloaded individually here.

You can also download all the model files as a tar.gz here.

Fine-grained datasets

To run experiments, download the datasets from their respective sources and edit the model_setup.m file to point to the location of each dataset. For instance, you can point to the birds dataset directory by setting opts.cubDir = 'data/cub'.

Classification demo

The script bird_demo takes an image and runs our pre-trained fine-grained bird classifier to predict the top five species, then shows some example images of the class with the highest score. If you haven't already done so, download our pre-trained B-CNN [D,M] and SVM models for this demo and place them in data/models. Also download the CUB-200-2011 dataset to data/cub. You can follow our default setting or edit opts in the script to point it to the models and dataset. If your machine has a GPU, set opts.useGpu=true to speed up the computation. You should see the following output when you run bird_demo():

>> bird_demo();
0.09s to load imdb.
1.63s to load models into memory.
Top 5 prediction for test_image.jpg:
064.Ring_billed_Gull
059.California_Gull
147.Least_Tern
062.Herring_Gull
060.Glaucous_winged_Gull
3.80s to make predictions [GPU=0]

To run it on your own images, run bird_demo('imgPath', 'favorite-bird.jpg'). Classification takes roughly 4s per image on my laptop on a CPU. On an NVIDIA K40 GPU with bigger batch sizes, you should get a throughput of roughly 8 images/second with the B-CNN [D,M] model.
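
To make the "top five" step concrete, here is a minimal sketch of ranking classes by linear SVM scores. It is not the code in bird_demo: the feature vector, weights, biases, and class names are random or made-up stand-ins, and the variable names are assumptions chosen for illustration.

```matlab
% Toy stand-ins (hypothetical): a 4-D feature and 6 classes.
rng(0);
feat = randn(4, 1);                      % feature vector of one image
W    = randn(4, 6);                      % one column of SVM weights per class
b    = randn(1, 6);                      % one bias per class
classNames = {'a', 'b', 'c', 'd', 'e', 'f'};

% Linear SVM scores: one score per class.
scores = W' * feat + b';

% Sort classes by decreasing score and keep the first five.
[~, order] = sort(scores, 'descend');
top5 = order(1:5);

for k = 1:5
    fprintf('%s\n', classNames{top5(k)});
end
```

In the real demo the feature would be the B-CNN descriptor of the input image and the class names would come from the CUB-200-2011 dataset.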

Fine-tuning B-CNN models

See run_experiments_bcnn_train.m for fine-tuning a B-CNN model. Note that this code caches all the intermediate results during fine-tuning, which takes about 200GB of disk space.

Here are the steps to fine-tune a B-CNN [M,M] model on the CUB dataset:

1. Download the CUB-200-2011 dataset (see link above).
2. Set opts.cubDir=CUBROOT in model_setup.m, where CUBROOT is the location of the CUB dataset.
3. Download the imagenet-vgg-m model (see link above).
4. Set the path of the model in run_experiments_bcnn_train.m. For example, set PRETRAINMODEL='data/model/imagenet-vgg-m.mat' to use Oxford's VGG-M model trained on the ImageNet LSVRC 2012 dataset. You also have to set bcnnmm.opts to:
```
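 bcnnmm.opts = {...
    'type', 'bcnn', ...
    'modela', PRETRAINMODEL, ...   % network for the first stream
    'layera', 14,...               % layer of modela whose output is used
    'modelb', PRETRAINMODEL, ...   % network for the second stream
    'layerb', 14,...               % layer of modelb whose output is used
    'shareWeight', true,...        % both streams share one CNN (symmetric)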
 } ;
```
The option shareWeight=true means that the bilinear model uses the same CNN to extract both features, resulting in a symmetric model. For asymmetric models, set shareWeight=false. Note that the asymmetric setting roughly doubles the GPU memory requirement.
Once the fine-tuning is complete, you can train a linear SVM on the extracted features to evaluate the model. See run_experiments.m for training and testing with SVMs. Point MODELPATH to the fine-tuned model, for example MODELPATH='data/ft-models/bcnn-cub-mm.mat', and set bcnnmm.opts to:
```
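 bcnnmm.opts = {...
    'type', 'bcnn', ...
    'modela', MODELPATH, ...   % fine-tuned network for the first stream
    'layera', 14,...           % layer of modela whose output is used
    'modelb', MODELPATH, ...   % fine-tuned network for the second stream
    'layerb', 14,...           % layer of modelb whose output is used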
 } ;
```
Then type >> run_experiments() on the MATLAB command line. The results will be saved in opts.resultPath.

Implementation details

The asymmetric B-CNN model is implemented using two networks whose feature outputs are bilinearly combined, followed by a shallow network for normalization and computing the softmax loss. This implementation runs the forward and backward passes through the two networks separately. You can find the details in bcnn_train().

When the same network is used to extract both features, the symmetric B-CNN model is implemented as a single network architecture consisting of bilinearpool, sqrt, and l2norm layers on top of the convolutional layers. This implementation is about twice as fast and memory-efficient as the asymmetric implementation.
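
The three layers named above can be illustrated with plain MATLAB on a toy feature map. This is only a sketch of the arithmetic, assuming sum pooling over spatial locations as in the paper's formulation; it does not call the repository's layer functions, which may differ in scaling and operate on batches. The sizes and the small eps guard in the last step are choices made here for illustration.

```matlab
% Toy convolutional feature map: H x W spatial grid, C channels.
H = 3; W = 3; C = 4;
rng(0);
x = single(rand(H, W, C));

% Flatten the spatial grid: each row of X is the C-dim feature
% at one spatial location.
X = reshape(x, H*W, C);

% Bilinear pooling: sum over locations of the outer product of each
% feature with itself, giving a C x C matrix.
B = X' * X;

% Vectorize, then apply the signed square root.
phi = B(:);
phi = sign(phi) .* sqrt(abs(phi));

% L2 normalization (eps guards against division by zero).
phi = phi / (norm(phi) + eps('single'));
```

The resulting descriptor has C*C entries (16 here), which is why bilinear features grow quadratically with the number of channels.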

The code for B-CNN is implemented in the following MATLAB functions:

- vl_bilinearnn(): Extends vl_simplenn() of the MatConvNet library to include the bilinear layers.
- vl_nnbilinearpool(): Bilinear feature pooling using the outer product of a feature with itself.
- vl_nnbilinearclpool(): Bilinear feature pooling using the outer product of two different features. The current version only supports two feature outputs of the same resolution.
- vl_nnsqrt(): Signed square-root normalization.
- vl_nnl2norm(): L2 normalization.
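
The same-resolution requirement for the two-feature case follows from the arithmetic: the outer product is taken location by location, so both feature maps need the same spatial grid, while the channel counts may differ. The sketch below shows this in plain MATLAB with made-up sizes; it is an illustration under the sum-pooling assumption, not the code of vl_nnbilinearclpool().

```matlab
% Two toy feature maps with the same H x W grid but different channels.
H = 3; W = 3; Ca = 4; Cb = 5;
rng(0);
xa = rand(H, W, Ca);
xb = rand(H, W, Cb);

% The location-wise outer product needs matching spatial sizes.
assert(size(xa, 1) == size(xb, 1) && size(xa, 2) == size(xb, 2), ...
    'Feature maps must have the same spatial resolution.');

% One row per spatial location.
Xa = reshape(xa, H*W, Ca);
Xb = reshape(xb, H*W, Cb);

% Sum over locations of the outer products: a Ca x Cb matrix.
B = Xa' * Xb;

% The same result computed with an explicit loop over locations.
Bloop = zeros(Ca, Cb);
for l = 1:H*W
    Bloop = Bloop + Xa(l, :)' * Xb(l, :);
end
```

The matrix product and the explicit loop agree up to floating-point rounding; the matrix form is simply the faster way to write the sum.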

Running B-CNN on other datasets

The code can be used for other classification datasets as well. You have to implement the corresponding imdb = <dataset-name>_get_database() function that returns the imdb structure in the right format. Take a look at the cub_get_database.m file as an example.

Acknowledgements

We thank the MatConvNet and VLFEAT teams for creating and maintaining these excellent packages.
