A woman/man image classifier based on a convolutional neural network. The architecture is a modified version of the VGG16 [1] model.
Example predictions:

| Man | Woman |
|---|---|
| 0.99846715 | 0.00153284 |
| 0.000028477365 | 0.99997151 |
Install the dependencies listed in requirements.txt with pip:
pip install -r /path/to/requirements.txt
Run the program with `python main.py` and follow the menu: choose option 1 and type the path to a JPG, PNG, or BMP image, e.g. data/Aman.jpg.
The network has 19 layers with parameters: 13 convolutional layers that reuse weights from the VGG16 [1] model via transfer learning, three fully connected layers, and three batch normalization layers, one applied before each fully connected layer.
| Layer | Type | Output dimensions | Parameters |
|---|---|---|---|
| conv2d_1 | Convolutional 3x3 | (None, 224, 224, 64) | 1792 |
| conv2d_2 | Convolutional 3x3 | (None, 224, 224, 64) | 36928 |
| max_pooling2d_1 | Max Pooling 2x2 | (None, 112, 112, 64) | 0 |
| conv2d_3 | Convolutional 3x3 | (None, 112, 112, 128) | 73856 |
| conv2d_4 | Convolutional 3x3 | (None, 112, 112, 128) | 147584 |
| max_pooling2d_2 | Max Pooling 2x2 | (None, 56, 56, 128) | 0 |
| conv2d_5 | Convolutional 3x3 | (None, 56, 56, 256) | 295168 |
| conv2d_6 | Convolutional 3x3 | (None, 56, 56, 256) | 590080 |
| conv2d_7 | Convolutional 3x3 | (None, 56, 56, 256) | 590080 |
| max_pooling2d_3 | Max Pooling 2x2 | (None, 28, 28, 256) | 0 |
| conv2d_8 | Convolutional 3x3 | (None, 28, 28, 512) | 1180160 |
| conv2d_9 | Convolutional 3x3 | (None, 28, 28, 512) | 2359808 |
| conv2d_10 | Convolutional 3x3 | (None, 28, 28, 512) | 2359808 |
| max_pooling2d_4 | Max Pooling 2x2 | (None, 14, 14, 512) | 0 |
| conv2d_11 | Convolutional 3x3 | (None, 14, 14, 512) | 2359808 |
| conv2d_12 | Convolutional 3x3 | (None, 14, 14, 512) | 2359808 |
| conv2d_13 | Convolutional 3x3 | (None, 14, 14, 512) | 2359808 |
| max_pooling2d_5 | Max Pooling 2x2 | (None, 7, 7, 512) | 0 |
| flatten_1 | Flatten | (None, 25088) | 0 |
| batch_norm_1 | Batch Normalization | (None, 25088) | 100352 |
| dense_1 | Fully Connected | (None, 4096) | 102764544 |
| batch_norm_2 | Batch Normalization | (None, 4096) | 16384 |
| dense_2 | Fully Connected | (None, 4096) | 16781312 |
| batch_norm_3 | Batch Normalization | (None, 4096) | 16384 |
| dense_3 | Fully Connected | (None, 2) | 8194 |
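The parameter counts in the layer summary can be cross-checked by hand. The sketch below recomputes them from the layer shapes using the standard formulas for convolutional, fully connected, and batch normalization layers. The helper names (`conv_params`, `dense_params`, `batch_norm_params`) are illustrative and are not taken from the repository.

```python
# Recompute the parameter counts of the layer summary from the layer shapes.
# Helper names are illustrative; they are not part of the repository.

def conv_params(in_ch, out_ch, k=3):
    """k x k weights per input channel plus one bias, for each output filter."""
    return (k * k * in_ch + 1) * out_ch

def dense_params(in_units, out_units):
    """One weight per input plus one bias, for each output unit."""
    return (in_units + 1) * out_units

def batch_norm_params(features):
    """gamma, beta, moving mean, and moving variance: four values per feature."""
    return 4 * features

# Output channels of the 13 convolutional layers, in order (from the summary).
CONV_CHANNELS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]

conv_counts = []
in_ch = 3  # RGB input
for out_ch in CONV_CHANNELS:
    conv_counts.append(conv_params(in_ch, out_ch))
    in_ch = out_ch

flat = 7 * 7 * 512  # 25088 features after the last pooling layer and flatten
head_counts = [
    batch_norm_params(flat),   # batch_norm_1
    dense_params(flat, 4096),  # dense_1
    batch_norm_params(4096),   # batch_norm_2
    dense_params(4096, 4096),  # dense_2
    batch_norm_params(4096),   # batch_norm_3
    dense_params(4096, 2),     # dense_3
]

total = sum(conv_counts) + sum(head_counts)
print(conv_counts)
print(head_counts)
print(total)
```

Pooling and flatten layers have no parameters, so they do not appear in the calculation.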
The model was trained with the Adam [2] optimizer and binary cross-entropy as the loss function.
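To make the loss concrete, here is a minimal, dependency-free sketch of binary cross-entropy for a single sample. It assumes a one-hot target over the two output units ([Man, Woman]), which the README does not state explicitly, and clips predictions with a small `eps` to avoid `log(0)`. The function is an illustration, not the training code of this repository.

```python
import math

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over the output units of one sample."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

# Assumed one-hot target [Man, Woman] = [1, 0], scored against the first
# example prediction listed in this README.
loss_confident = binary_crossentropy([1, 0], [0.99846715, 0.00153284])

# The same target with an uninformative prediction gives a much larger loss.
loss_unsure = binary_crossentropy([1, 0], [0.5, 0.5])

print(loss_confident, loss_unsure)
```

A confident, correct prediction yields a loss close to zero, while a 50/50 prediction yields ln 2.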
The input is an image of shape (224, 224, 3), i.e. 224 x 224 pixels with 3 RGB channels. If you supply an image of a different size or a greyscale image, the program converts it to the proper shape.
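The conversion can be pictured with a small dependency-free sketch. It assumes nearest-neighbour resizing and channel replication for greyscale input; the README does not say which method the program actually uses, and `to_model_input` is a hypothetical name. Images are represented as nested lists purely for illustration.

```python
TARGET = 224  # height and width expected by the network

def to_model_input(image):
    """Convert a nested-list image to TARGET x TARGET RGB.

    `image` is a list of rows; each pixel is either an int (greyscale)
    or a 3-tuple (R, G, B).
    """
    height, width = len(image), len(image[0])
    # Nearest-neighbour sampling: map each target pixel back to a source pixel.
    resized = [
        [image[r * height // TARGET][c * width // TARGET] for c in range(TARGET)]
        for r in range(TARGET)
    ]
    # Replicate a greyscale value into three identical channels.
    return [
        [px if isinstance(px, tuple) else (px, px, px) for px in row]
        for row in resized
    ]

# A made-up 50 x 100 greyscale image.
grey = [[(x + y) % 256 for x in range(100)] for y in range(50)]
out = to_model_input(grey)
print(len(out), len(out[0]), len(out[0][0]))  # prints: 224 224 3
```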
The decision threshold was chosen to give the best trade-off between sensitivity and specificity. Based on the ROC (receiver operating characteristic) curve and a grid search, it was set to 0.2, which gives a sensitivity of about 0.925 and a specificity of 0.956.

##### ROC curve

[Figure: ROC curve]
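The threshold selection described in this section can be sketched as a grid search over candidate thresholds. The example assumes the criterion is the sum of sensitivity and specificity (Youden's J plus one); the README only says both should be as high as possible. The labels and scores are made-up toy data, constructed so that the optimum falls on 0.2, and do not come from the unpublished dataset.

```python
def sensitivity_specificity(labels, scores, threshold):
    """Treat score >= threshold as a positive prediction."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def best_threshold(labels, scores, grid):
    """Grid search for the threshold maximising sensitivity + specificity."""
    return max(grid, key=lambda t: sum(sensitivity_specificity(labels, scores, t)))

# Hypothetical validation data: 1 = positive class, score = predicted probability.
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.30, 0.22, 0.04, 0.18, 0.10, 0.05, 0.02, 0.60]
grid = [i / 20 for i in range(1, 20)]  # 0.05, 0.10, ..., 0.95

t = best_threshold(labels, scores, grid)
sens, spec = sensitivity_specificity(labels, scores, t)
print(t, sens, spec)
```

Each point on a ROC curve corresponds to one threshold, so the grid search amounts to picking the point on the curve that maximises the chosen criterion.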
The dataset is not published because it consists of the author's private images.
D. Wiczew, Man-Woman Classifier (2019), GitHub repository https://github.com/danielwicz/manwoman-classifier
[1] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[2] Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.