xychen9459 / anchor-boxes-with-kmeans

This project forked from joydeepmedhi/anchor-boxes-with-kmeans

How to initialize Anchors in Faster RCNN for custom dataset?

Initial Anchor Box Estimation using K-Means Clustering for Faster-RCNN

Introduction

Faster-RCNN is one of the state-of-the-art object detection algorithms around.

If you are not familiar with Faster-RCNN, please go through this blog.

Here is the link to the original paper: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks.

When we train Faster RCNN on a custom dataset, it is often unclear how to choose hyperparameters for the network. Anchor boxes (one of the hyperparameters) are very important for detecting objects with different scales and aspect ratios. Detection results improve if we get the anchors right.

The training setup and hyperparameters follow the TensorFlow Object Detection API.

Faster-RCNN config file

faster_rcnn {
    # other hyperparameters

    first_stage_anchor_generator {
      grid_anchor_generator {
        height: 256
        width: 256
        height_stride: 16
        width_stride: 16
        scales: 0.9
        scales: 1.14
        scales: 1.53
        aspect_ratios: .8
        aspect_ratios: 1.15
        aspect_ratios: 2.77
      }
    }    
}

height & width

This is the size of the base anchor (i.e. for scale 1 and aspect ratio 1, the base anchor is 256 x 256).

height_stride & width_stride

This is the stride of the anchor centers. Generally, we want to visit each point of the feature map (final convolutional layer) and create a set of anchors there. Hence, the stride is the subsampling ratio of the network. In the case of VGG16 this ratio is 16. Different network architectures have different subsampling ratios. Choose the stride according to the base model or use case.
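As a rough illustration of how the stride places anchors, the sketch below maps every feature-map cell back to a pixel position in the input image. The function name `anchor_centers` is hypothetical, and the sketch assumes a zero anchor offset; it is not taken from the repository.

```python
import numpy as np

def anchor_centers(feature_h, feature_w, height_stride=16, width_stride=16):
    """Return an (feature_h * feature_w, 2) array of (y, x) anchor centers in pixels.

    Assumes a zero offset, so cell (i, j) maps to (i * height_stride, j * width_stride).
    """
    ys = np.arange(feature_h) * height_stride
    xs = np.arange(feature_w) * width_stride
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.stack([yy.ravel(), xx.ravel()], axis=1)

# A 2 x 3 feature map with stride 16 yields 6 anchor centers.
centers = anchor_centers(2, 3)
print(centers)
```

Each center then receives one anchor per (scale, aspect ratio) combination.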

scales & aspect_ratios

The aspect ratio of an anchor box is width/height. The scale is how many times bigger the anchor box is than the base box (e.g. a 512 x 512 box is twice as big as a 256 x 256 base box, so its scale is 2).

Let aspect_ratio = ar,
    base_anchor = 256 x 256, and
    "width_b x height_b" be the dimensions of an anchor box. Then:

width_b = scale * sqrt(ar) * base_anchor[0]
height_b = scale * base_anchor[1] / sqrt(ar)
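The two formulas above can be checked with a few lines of Python. This is a minimal sketch; `anchor_size` is a hypothetical helper name, and the scales and aspect ratios are the ones from the config file shown earlier.

```python
import math

def anchor_size(scale, aspect_ratio, base_anchor=(256, 256)):
    """Return (width, height) of an anchor box; aspect_ratio is width / height."""
    root = math.sqrt(aspect_ratio)
    width = scale * root * base_anchor[0]
    height = scale * base_anchor[1] / root
    return width, height

# Enumerate the nine anchors produced by the config above.
for scale in (0.9, 1.14, 1.53):
    for ar in (0.8, 1.15, 2.77):
        w, h = anchor_size(scale, ar)
        print(f"scale={scale} ar={ar}: {w:.1f} x {h:.1f}")
```

Note that width / height recovers the aspect ratio, and width * height equals (scale * 256)^2, so the scale controls the area while the aspect ratio controls the shape.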

Analysis of bounding boxes (Training data)

  1. Convert the XML files to a CSV file.

    xml_to_csv.py (modify this file as per your XML format)

  2. Open the EDA_of_bbox.ipynb Jupyter notebook for analysis.

    Here, we compute the resized image dimensions with the _compute_new_static_size() function. Then we rescale the bounding box height and width to the new image dimensions.
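The resizing step can be sketched as follows. This is an approximation of a keep-aspect-ratio resize, not the notebook's code: the function names are hypothetical, and the limits `min_dim=600` and `max_dim=1024` are assumed values that should be replaced with those from your own pipeline config.

```python
def compute_new_size(height, width, min_dim=600, max_dim=1024):
    """Resize so the short side equals min_dim, unless the long side would exceed max_dim."""
    short_side, long_side = min(height, width), max(height, width)
    new_h = round(height * min_dim / short_side)
    new_w = round(width * min_dim / short_side)
    if max(new_h, new_w) > max_dim:
        # Fall back to fitting the long side to max_dim instead.
        new_h = round(height * max_dim / long_side)
        new_w = round(width * max_dim / long_side)
    return new_h, new_w

def rescale_box(box_w, box_h, height, width, min_dim=600, max_dim=1024):
    """Express a bounding box (width, height) in the pixel units of the resized image."""
    new_h, new_w = compute_new_size(height, width, min_dim, max_dim)
    return box_w * new_w / width, box_h * new_h / height

print(compute_new_size(480, 640))
print(rescale_box(64, 48, 480, 640))
```

Rescaling matters because the anchors are defined in the coordinates of the resized image, so box statistics taken from the raw images would be off by the resize factor.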

Then we find the optimal number of clusters and the cluster centers using K-Means. This is inspired by YOLO.

Distribution of Bounding Boxes!

[Figure: bounding box distribution]

Experiments

1

Cluster bbox (width, height) using the Euclidean distance metric
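A minimal NumPy sketch of this experiment is shown below. It is a plain Lloyd's-algorithm K-Means on (width, height) pairs, not the notebook's implementation. The function name, the random initialisation, and the sample boxes are all assumptions for illustration.

```python
import numpy as np

def kmeans_euclidean(boxes, k, n_iter=100, seed=0):
    """Cluster (width, height) pairs with Euclidean distance; returns (centers, labels)."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct boxes picked at random.
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]

    def nearest(c):
        # Index of the closest center for every box.
        dists = np.linalg.norm(boxes[:, None, :] - c[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(n_iter):
        labels = nearest(centers)
        # Move each center to the mean of its boxes; keep it if the cluster is empty.
        new_centers = np.array([
            boxes[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, nearest(centers)

# Made-up boxes: three small and three large.
sample = [[10, 10], [12, 11], [11, 12], [100, 200], [102, 198], [98, 202]]
centers, labels = kmeans_euclidean(sample, k=2)
print(centers[np.argsort(centers[:, 0])])
```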

[Figure: clusters]

Blue Line - Base Model (cards dataset)

Red Line - Cluster Model (cards dataset)

[Figures: precision_eu, f1_eu]

2

Cluster bbox (width, height) using the IoU metric. (This is preferred, because the Euclidean distance metric gives priority to bigger boxes and mainly minimizes their loss.)
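The IoU variant can be sketched in the same way, using 1 - IoU as the distance as in the YOLO approach referenced in this README. This is not the notebook's code. It assumes all boxes are aligned at a common corner, so only width and height matter, and it updates centers with the mean; the function names and sample boxes are hypothetical.

```python
import numpy as np

def iou_wh(boxes, centers):
    """Pairwise IoU of (w, h) boxes against (w, h) centers, assuming a shared corner."""
    inter_w = np.minimum(boxes[:, None, 0], centers[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centers[None, :, 1])
    inter = inter_w * inter_h
    area_boxes = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_centers = (centers[:, 0] * centers[:, 1])[None, :]
    return inter / (area_boxes + area_centers - inter)

def kmeans_iou(boxes, k, n_iter=100, seed=0):
    """K-Means on (width, height) pairs with distance d = 1 - IoU; returns the centers."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centers = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(n_iter):
        labels = (1.0 - iou_wh(boxes, centers)).argmin(axis=1)
        new_centers = np.array([
            boxes[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

sample = [[10, 10], [12, 11], [11, 12], [100, 200], [102, 198], [98, 202]]
iou_centers = kmeans_iou(sample, k=2)
print(iou_centers[np.argsort(iou_centers[:, 0])])
```

Because IoU is a ratio of areas, a 2-pixel error on a 10-pixel box costs far more than the same error on a 200-pixel box, which is why this metric does not favour large boxes.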

Blue Line - Base Model (cards dataset)

Pink Line - IOU Cluster Model (cards dataset)

[Figures: Precision_iou, F1_iou]

3

Cluster the aspect ratios (AR) and scales of the bboxes separately, using the Euclidean distance metric.
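One way to do this is to invert the anchor formulas: since width/height = ar and width * height = (scale * base)^2, each box gives ar = w / h and scale = sqrt(w * h) / base. The sketch below then runs a 1-D K-Means on each quantity. The function names, the quantile initialisation, and the square 256-pixel base anchor are assumptions, not the notebook's code.

```python
import numpy as np

def kmeans_1d(values, k, n_iter=100):
    """1-D K-Means (absolute difference = Euclidean distance); returns sorted centers."""
    values = np.asarray(values, dtype=float)
    # Deterministic initialisation: spread the centers over the quantiles of the data.
    centers = np.quantile(values, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        labels = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array([
            values[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return np.sort(centers)

def cluster_scales_and_ratios(boxes, k_scales, k_ratios, base=256.0):
    """Derive a scale and an aspect ratio per box, then cluster each independently."""
    boxes = np.asarray(boxes, dtype=float)
    w, h = boxes[:, 0], boxes[:, 1]
    ratios = w / h                  # ar = width / height
    scales = np.sqrt(w * h) / base  # from w * h = (scale * base)^2
    return kmeans_1d(scales, k_scales), kmeans_1d(ratios, k_ratios)

# Made-up boxes given as (width, height) in resized-image pixels.
scales, ratios = cluster_scales_and_ratios(
    [[256, 256], [512, 512], [512, 128]], k_scales=2, k_ratios=2
)
print("scales:", scales, "aspect_ratios:", ratios)
```

The resulting centers can be written directly into the `scales` and `aspect_ratios` fields of the config.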

Blue Line - Base Model (cards dataset)

Green Line - Cluster Model (cards dataset)

[Figures: Precision_a_s, F1_a_s]

*************** More to be added *****************


References

  1. KMeans in YOLO
  2. Cards Dataset (Reference)
  3. Advantage & Disadvantage of KMeans
  4. Different Clustering Algorithms
