B-CNN: Bilinear CNNs for fine-grained visual recognition

Created by Tsung-Yu Lin, Aruni RoyChowdhury and Subhransu Maji at UMass Amherst

Introduction

This repository contains the code for reproducing the results in the ICCV 2015 paper:

@inproceedings{lin2015bilinear,
    Author = {Tsung-Yu Lin and Aruni RoyChowdhury and Subhransu Maji},
    Title = {Bilinear CNNs for Fine-grained Visual Recognition},
    Booktitle = {International Conference on Computer Vision (ICCV)},
    Year = {2015}
}

The code is tested on Ubuntu 14.04 using an NVIDIA K40 GPU and MATLAB R2014b.

Link to the project page.

Fine-grained classification results

Method        Birds   Birds + box   Aircraft   Cars
B-CNN [M,M]   78.1%   80.4%         77.9%      86.5%
B-CNN [D,M]   84.1%   85.1%         83.9%      91.3%
B-CNN [D,D]   84.0%   84.8%         84.1%      90.6%

Installation

This code depends on VLFEAT and MatConvNet. Follow the instructions on their project pages to install them first. Our code is built on MatConvNet version 1.0-beta8. To retrieve a particular version of MatConvNet using git, type:

>> git fetch --tags
>> git checkout tags/v1.0-beta8

Once these are installed, edit setup.m to run the corresponding setup scripts.
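
As a rough sketch, setup.m might look like the following once both libraries are unpacked. The directory names vlfeat and matconvnet are assumptions about where you placed them, not paths required by this repository; vl_setup and vl_setupnn are the setup scripts shipped with VLFEAT and MatConvNet respectively.

```matlab
% setup.m (sketch): the two root paths are assumptions, adjust to your layout.
vlfeatRoot     = fullfile(pwd, 'vlfeat');      % hypothetical location
matconvnetRoot = fullfile(pwd, 'matconvnet');  % hypothetical location

% VLFEAT ships toolbox/vl_setup.m, which adds its functions to the path.
run(fullfile(vlfeatRoot, 'toolbox', 'vl_setup'));

% MatConvNet ships matlab/vl_setupnn.m, which adds its layers to the path.
run(fullfile(matconvnetRoot, 'matlab', 'vl_setupnn'));
```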

The code implements the bilinear combination layer in symmetric and asymmetric CNNs and contains scripts to fine-tune models and run experiments on several fine-grained recognition datasets. We also provide pre-trained models.

Pre-trained models

ImageNet LSVRC 2012 pre-trained models: Since we don't support the latest MatConvNet implementation, the pre-trained models downloaded from the MatConvNet page don't work properly here. We provide links to download vgg-m and vgg-verydeep-16 in the old format.

Fine-tuned models: We provide three fine-tuned B-CNN models ([M,M], [D,M], and [D,D]) and SVM models trained on the respective bcnn features for each of the CUB-200-2011, FGVC Aircraft, and Cars datasets. Note that for [M,M] and [D,D] we run the symmetric model, so you can simply use the same network for both streams. These can be downloaded individually here.

You can also download all the model files as a tar.gz here.

Fine-grained datasets

To run experiments, download the datasets from their respective sources and edit model_setup.m to point to the location of each dataset. For instance, you can point to the birds dataset directory by setting opts.cubDir = 'data/cub'.

Classification demo

The script bird_demo takes an image, runs our pre-trained fine-grained bird classifier to predict the top five species, and shows some example images of the class with the highest score. If you haven't already done so, download our pre-trained B-CNN [D,M] and SVM models for this demo and place them in data/models. In addition, download the CUB-200-2011 dataset to data/cub. You can follow our default setting or edit opts in the script to point it to the models and dataset. If you have a GPU installed on your machine, set opts.useGpu=true to speed up the computation. You should see the following output when you run bird_demo():

>> bird_demo();
0.09s to load imdb.
1.63s to load models into memory.
Top 5 prediction for test_image.jpg:
064.Ring_billed_Gull
059.California_Gull
147.Least_Tern
062.Herring_Gull
060.Glaucous_winged_Gull
3.80s to make predictions [GPU=0]

To run it on your own images, call bird_demo('imgPath', 'favorite-bird.jpg');. Classification takes roughly 4s per image on my laptop on a CPU. On an NVIDIA K40 GPU with bigger batch sizes, you should get a throughput of roughly 8 images/second with the B-CNN [D,M] model.

Fine-tuning B-CNN models

See run_experiments_bcnn_train.m for fine-tuning a B-CNN model. Note that this code caches all the intermediate results during fine-tuning, which takes about 200GB of disk space.

Here are the steps to fine-tune a B-CNN [M,M] model on the CUB dataset:

  1. Download the CUB-200-2011 dataset (see link above).

  2. Set opts.cubDir=CUBROOT in model_setup.m, where CUBROOT is the location of the CUB dataset.

  3. Download the imagenet-vgg-m model (see link above).

  4. Set the path of the model in run_experiments_bcnn_train.m. For example, set PRETRAINMODEL='data/model/imagenet-vgg-m.mat' to use Oxford's VGG-M model trained on the ImageNet LSVRC 2012 dataset. You also have to set bcnnmm.opts to:

     bcnnmm.opts = {...
        'type', 'bcnn', ...
        'modela', PRETRAINMODEL, ...
        'layera', 14,...
        'modelb', PRETRAINMODEL, ...
        'layerb', 14,...
        'shareWeight', true,...
     } ;
    

    The option shareWeight=true implies that the bilinear model uses the same CNN to extract both features, resulting in a symmetric model. For asymmetric models, set shareWeight=false. Note that this roughly doubles the GPU memory requirement.

  5. Once the fine-tuning is complete, you can train a linear SVM on the extracted features to evaluate the model. See run_experiments.m for training/testing using SVMs. Set MODELPATH to the location of the fine-tuned model, for example MODELPATH='data/ft-models/bcnn-cub-mm.mat', and set bcnnmm.opts to:

     bcnnmm.opts = {...
        'type', 'bcnn', ...
        'modela', MODELPATH, ...
        'layera', 14,...
        'modelb', MODELPATH, ...
        'layerb', 14,...
     } ;
    
  6. Type >> run_experiments() on the MATLAB command line. The results will be saved in opts.resultPath.

Implementation details

The asymmetric B-CNN model is implemented using two networks whose feature outputs are bilinearly combined, followed by a shallow network for normalization and computing the softmax loss. This implementation runs forward and backward passes through the two networks separately. You can find the details in bcnn_train().

When the same network is used to extract both features, the symmetric B-CNN model is implemented as a single network architecture consisting of bilinearpool, sqrt, and l2norm layers on top of the convolutional layers. This implementation is about twice as fast and memory-efficient as the asymmetric implementation.

The code for B-CNN is implemented in the following MATLAB functions:

  1. vl_bilinearnn(): This extends vl_simplenn() of the MatConvNet library to include the bilinear layers.
  2. vl_nnbilinearpool(): Bilinear feature pooling using the outer product of a feature with itself.
  3. vl_nnbilinearclpool(): Bilinear feature pooling using the outer product of two different features. The current version only supports two feature outputs of the same resolution.
  4. vl_nnsqrt(): Signed square-root normalization.
  5. vl_nnl2norm(): L2 normalization.
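
To make the three operations concrete, here is a minimal sketch in plain MATLAB of bilinear pooling followed by signed square-root and L2 normalization for a single image. It illustrates the math only and is not the code of the functions listed above: the feature maps are synthetic stand-ins, all variable names and sizes are assumptions, and the repository's layers may scale or batch the result differently.

```matlab
% Sketch: symmetric bilinear pooling + signed sqrt + L2 normalization.
h = 4; w = 4; c = 3;                 % spatial size and channels (illustrative)
x = reshape(1:h*w*c, h, w, c) / 10;  % stand-in for a conv feature map

% Flatten spatial locations: each row of X is the c-dim feature at one location.
X = reshape(x, h*w, c);

% Summing the outer products over all locations gives a c-by-c matrix.
B = X' * X;

% Vectorize, then apply signed square-root and L2 normalization.
y = B(:);
y = sign(y) .* sqrt(abs(y));
y = y / (norm(y) + eps);

% Asymmetric variant: outer product of two different feature maps that
% share the same spatial resolution (xb is a second stand-in feature map).
cb = 2;
xb = reshape(cos(1:h*w*cb), h, w, cb);
Bab = X' * reshape(xb, h*w, cb);     % c-by-cb matrix

disp(size(y));    % descriptor has c^2 entries
disp(size(Bab));  % c-by-cb
```

The symmetric case produces a symmetric matrix B, which is one reason a single network pass suffices when both streams share weights.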

Running B-CNN on other datasets

The code can be used for other classification datasets as well. You have to implement the corresponding imdb = <dataset-name>_get_database() function, which returns the imdb structure in the right format. Take a look at the cub_get_database.m file as an example.
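
A loader for a new dataset could be sketched along the following lines. Everything here is hypothetical: the function name mydata_get_database, the one-folder-per-class layout, and the field names and split coding (modeled on conventions commonly seen in MatConvNet examples) are assumptions. The fields actually required are whatever cub_get_database.m returns, so check against that file before relying on this.

```matlab
function imdb = mydata_get_database(dataDir)
% MYDATA_GET_DATABASE  Hypothetical loader: one sub-folder per class.
% Field names are assumptions; compare with cub_get_database.m.
imdb.imageDir = dataDir;

% Each sub-directory of dataDir is treated as one class.
d = dir(dataDir);
d = d([d.isdir] & ~ismember({d.name}, {'.', '..'}));
imdb.classes.name = {d.name};

imdb.images.name = {};
imdb.images.label = [];
for k = 1:numel(d)
    f = dir(fullfile(dataDir, d(k).name, '*.jpg'));
    % Store image paths relative to imageDir.
    names = cellfun(@(n) fullfile(d(k).name, n), {f.name}, ...
                    'UniformOutput', false);
    imdb.images.name = [imdb.images.name, names];
    imdb.images.label = [imdb.images.label, k * ones(1, numel(names))];
end

n = numel(imdb.images.name);
imdb.images.id = 1:n;

% Assumed split coding: 1 = train, 3 = test. Every 5th image goes to test.
imdb.images.set = ones(1, n);
imdb.images.set(5:5:n) = 3;
end
```

For a real dataset you would replace the every-fifth-image rule with the official train/test split.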

Acknowledgements

We thank the MatConvNet and VLFEAT teams for creating and maintaining these excellent packages.
