This Python program computes the HOG (Histograms of Oriented Gradients) feature of an input
image and then classifies the HOG feature vector as human or no-human by
using a 3-nearest neighbor (NN) classifier. In the 3-NN classifier, the distance between the input
image and a training image is computed by taking the histogram intersection of their HOG feature
vectors:
where I is the HOG feature of the input image and M is the HOG feature of the training image;
the subscript j indicates the jth component of the feature vector and n is the dimension of the
HOG feature vector. The distance between the input image and each of the training images is
computed and the classification of the input image is taken to be the majority classification of the
three nearest neighbors.
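The classification step can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes the intersection takes the common normalized form sum_j min(I_j, M_j) / sum_j M_j, and, because histogram intersection is a similarity measure, it assumes the "nearest" neighbors are the training images with the largest intersection values. The names histogram_intersection and classify_3nn are hypothetical.

```python
from collections import Counter


def histogram_intersection(input_hog, train_hog):
    """Assumed normalized form: sum(min(I_j, M_j)) / sum(M_j)."""
    overlap = sum(min(i, m) for i, m in zip(input_hog, train_hog))
    total = sum(train_hog)
    return overlap / total if total > 0 else 0.0


def classify_3nn(input_hog, training_set, k=3):
    """training_set is a list of (hog_vector, label) pairs."""
    scored = [
        (histogram_intersection(input_hog, hog), label)
        for hog, label in training_set
    ]
    # Assumption: a larger intersection means a closer neighbor,
    # so the scores are sorted in descending order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    # With k = 3 and two classes, a majority always exists.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 3-dimensional vectors stand in for real 7,524-dimensional features.
    training = [
        ([1.0, 0.0, 0.0], "human"),
        ([0.9, 0.1, 0.0], "human"),
        ([0.8, 0.2, 0.0], "human"),
        ([0.0, 1.0, 0.0], "no-human"),
        ([0.0, 0.9, 0.1], "no-human"),
    ]
    print(classify_3nn([1.0, 0.0, 0.0], training))
```

If the project instead treats the intersection as a true distance (for example 1 minus the value above), only the sort direction changes.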
The inputs to your program are color images cut out from a larger image. First, convert the color images into grayscale using the formula I = Round(0.299R + 0.587G + 0.114B), where R, G and B are the pixel values from the red, green and blue channels of the color image, respectively, and Round is the round off operator.
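The grayscale conversion can be sketched as below. This is an illustration only: it assumes the image has already been decoded into rows of (R, G, B) tuples, and it assumes "round off" means rounding halves up. Python's built-in round() rounds halves to the nearest even integer, so a small helper is used instead. The function names are hypothetical.

```python
import math


def round_half_up(value):
    """Round a non-negative number to the nearest integer, halves up."""
    return int(math.floor(value + 0.5))


def to_grayscale(rgb_image):
    """rgb_image: list of rows, each row a list of (R, G, B) tuples in 0..255."""
    return [
        [round_half_up(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


if __name__ == "__main__":
    tiny = [[(255, 0, 0), (0, 255, 0), (0, 0, 255)]]
    print(to_grayscale(tiny))  # [[76, 150, 29]]
```

Plain lists keep the sketch free of dependencies; an array library would be the usual choice for full-size images.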
Prewitt's operator is used to compute the horizontal and vertical gradients, from which the gradient magnitude is computed. Normalize and round off the gradient magnitude to integers within the range [0, 255]. Next, compute the gradient angle. For image locations where the templates go outside of the borders of the image, assign a value of 0 to both the gradient magnitude and gradient angle. Also, if both Gx and Gy are 0, assign a value of 0 to both gradient magnitude and gradient angle.
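A minimal sketch of the gradient step follows. Two details are assumptions, since the text does not fix them: the sign convention of the vertical mask, and the normalization constant 3 * sqrt(2), chosen because the largest possible magnitude for 8-bit input is 765 * sqrt(2). The function name prewitt_gradients is hypothetical.

```python
import math

# Prewitt masks. The sign convention of PREWITT_Y is an assumption.
PREWITT_X = ((-1, 0, 1), (-1, 0, 1), (-1, 0, 1))
PREWITT_Y = ((1, 1, 1), (0, 0, 0), (-1, -1, -1))


def prewitt_gradients(gray):
    """Return (magnitude, angle) for a grayscale image given as a list of rows.

    Magnitudes are integers in [0, 255]; angles are degrees in [0, 360).
    Border pixels, where the 3 x 3 mask leaves the image, stay at 0.
    """
    height, width = len(gray), len(gray[0])
    magnitude = [[0] * width for _ in range(height)]
    angle = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += PREWITT_X[dy][dx] * pixel
                    gy += PREWITT_Y[dy][dx] * pixel
            if gx == 0 and gy == 0:
                continue  # magnitude and angle both remain 0
            raw = math.hypot(gx, gy)
            # Assumed normalization: divide by 3 * sqrt(2), then round half up.
            magnitude[y][x] = int(math.floor(raw / (3 * math.sqrt(2)) + 0.5))
            angle[y][x] = math.degrees(math.atan2(gy, gx)) % 360.0
    return magnitude, angle


if __name__ == "__main__":
    vertical_edge = [[0, 0, 255]] * 3
    mag, ang = prewitt_gradients(vertical_edge)
    print(mag[1][1], ang[1][1])
```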
The unsigned representation is used: quantize the gradient angle into one of the 9 bins as shown in the
table below. If the gradient angle is within the range [180, 360), simply subtract 180 from the angle
first. Use the following parameter values in your implementation: cell size = 8 x 8 pixels, block
size = 16 x 16 pixels (or 2 x 2 cells), block overlap or step size = 8 pixels (or 1 cell). Use the L2
norm for block normalization. Leave the histogram and final feature values as floating point
numbers.
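The cell histograms and block normalization can be sketched as below. The bin layout is an assumption here: the sketch uses nine equal 20-degree bins, [0, 20), [20, 40), ..., [160, 180), and lets each pixel vote its full magnitude into a single bin. Splitting each vote between the two nearest bin centers is a common alternative and may be what the project does. The function names are hypothetical.

```python
import math

CELL = 8          # cell size in pixels
BLOCK_CELLS = 2   # block size in cells (2 x 2), step of 1 cell
BINS = 9


def cell_histograms(magnitude, angle):
    """Build one 9-bin histogram per 8 x 8 cell (hard assignment, assumed)."""
    rows, cols = len(magnitude) // CELL, len(magnitude[0]) // CELL
    hists = [[[0.0] * BINS for _ in range(cols)] for _ in range(rows)]
    for y in range(rows * CELL):
        for x in range(cols * CELL):
            theta = angle[y][x] % 180.0  # fold [180, 360) onto [0, 180)
            bin_index = min(int(theta // 20), BINS - 1)
            hists[y // CELL][x // CELL][bin_index] += magnitude[y][x]
    return hists


def hog_descriptor(magnitude, angle):
    """Concatenate L2-normalized 2 x 2-cell blocks, stepping one cell at a time."""
    hists = cell_histograms(magnitude, angle)
    rows, cols = len(hists), len(hists[0])
    feature = []
    for r in range(rows - BLOCK_CELLS + 1):
        for c in range(cols - BLOCK_CELLS + 1):
            block = []
            for dr in range(BLOCK_CELLS):
                for dc in range(BLOCK_CELLS):
                    block.extend(hists[r + dr][c + dc])
            norm = math.sqrt(sum(v * v for v in block))
            # An all-zero block is left as zeros to avoid dividing by 0.
            feature.extend(v / norm if norm > 0 else 0.0 for v in block)
    return feature


if __name__ == "__main__":
    mag = [[1] * 96 for _ in range(160)]
    ang = [[0.0] * 96 for _ in range(160)]
    print(len(hog_descriptor(mag, ang)))  # 7524
```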
A set of 20 training images and a set of 10 test images in .bmp format will be provided. The training set contains 10 positive (human) and 10 negative (no human) samples and the test set contains 5 positive and 5 negative samples. All images are of size 160 (height) x 96 (width). With the given image size and the parameters given above for computing the HOG feature, there are 20 x 12 cells and 19 x 11 blocks in the detection window. The dimension of the HOG feature vector is 7,524 (19 x 11 blocks x 4 cells per block x 9 bins per cell).
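The cell, block and feature counts can be checked with a few lines of arithmetic. The helper below is a hypothetical sanity check, not part of the project; its default arguments are the parameter values stated above.

```python
def hog_dimensions(height, width, cell=8, block_cells=2, step_cells=1, bins=9):
    """Return ((cell rows, cell cols), (block rows, block cols), feature length)."""
    cells = (height // cell, width // cell)
    blocks = (
        (cells[0] - block_cells) // step_cells + 1,
        (cells[1] - block_cells) // step_cells + 1,
    )
    length = blocks[0] * blocks[1] * block_cells * block_cells * bins
    return cells, blocks, length


if __name__ == "__main__":
    print(hog_dimensions(160, 96))  # ((20, 12), (19, 11), 7524)
```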
For outputs and results, see Human Detection.pdf.