Giter Site home page Giter Site logo

chainer-faster-rcnn's Introduction

Faster R-CNN

This is an experimental implementation of Faster R-CNN using Chainer based on Ross Girshick's py-faster-rcnn codes.

Requirement

Using anaconda is strongly recommended.

  • Python 2.7.6+, 3.4.3+, 3.5.1+

    • Chainer 1.9.1+
    • NumPy 1.9, 1.10, 1.11
    • Cython 0.23+
    • OpenCV 2.9+, 3.1+

Installation of dependencies

pip install numpy
pip install cython
pip install chainer
# for python3
conda install -c https://conda.binstar.org/menpo opencv3
# for python2
conda install opencv

For Windows users

There's a known problem in cpu_nms.pyx. But a workaround has been posted here (and see also the issue posted to the original py-faster-rcnn).

Setup

1. Build extensions

cd lib
python setup.py build_ext -i
cd ..

Inference

1. Download pre-trained model

if [ ! -d data ]; then mkdir data; fi; cd data
wget https://dl.dropboxusercontent.com/u/2498135/faster-rcnn/VGG16_faster_rcnn_final.model
cd ..

NOTE: The model definition in faster_rcnn.py has been changed, so if you already have the older pre-trained model file, please download it again to replace the older one with the new one.

2. Use forward.py

wget http://vision.cs.utexas.edu/voc/VOC2007_test/JPEGImages/004545.jpg

python forward.py --img_fn 004545.jpg --gpu 0

--gpu 0 turns on GPU. When you turn off GPU, use --gpu -1 or remove --gpu option.

Layers

Summarization of Faster R-CNN layers used during inference

RPN

The region proposal layer (RPN) is consisted of AnchorTargetLayer and ProposalLayer. RPN takes feature maps from trunk network like VGG-16, and performs 3x3 convolution to it. Then, it applies two independent 1x1 convolutions to the output of the first 3x3 convolution. Resulting outputs are rpn_cls_score and rpn_bbox_pred.

  • The shape of rpn_cls_score is (N, 2 * n_anchors, 14, 14) because each pixel on the feature map has n_anchors bboxes and each bbox should have 2 values that mean object/background.
  • The shape of rpn_bbox_pred is (N, 4 * n_anchors, 14, 14) because each pixel on the feature map has n_anchors bboxes, and each bbox is represented with 4 values that mean left top x & y, width & height.

Training

1. Download dataset

if [ ! -d data ]; then mkdir data; fi; cd data
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar
tar xvf VOCtrainval_06-Nov-2007.tar
tar xvf VOCtest_06-Nov-2007.tar
tar xvf VOCdevkit_08-Jun-2007.tar
rm -rf *.tar; cd ../

2. Prepare ImageNet pre-trained model

First, if you don't have docker and nvidia-docker, install them:

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates
sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
echo "deb https://apt.dockerproject.org/repo ubuntu-trusty main" | sudo tee /etc/apt/sources.list.d/docker.list
sudo apt-get install -y linux-image-extra-$(uname -r) linux-image-extra-virtual
sudo apt-get update
sudo apt-get install -y docker-engine
sudo service docker start

and then build caffe docker image and run the converter to make a chainer model from the pre-trained caffe model.

cd docker
bash install_caffe_docker.sh
bash create_image.sh
bash run_caffe_docker.sh
cd ..

It creates data/VGG16.model that is converted from pre-trained model in Caffe format. The pre-trained model is the one distributed in the official Model Zoo of Caffe wiki.

3. Start training

python train.py

Workflow

Note that it is a visualization of the workflow DURING INFERENCE

chainer-faster-rcnn's People

Contributors

mitmul avatar shiba24 avatar soralab avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.