experiencor / keras-yolo2

Easy training on custom datasets. Various backends (MobileNet and SqueezeNet) are supported. A YOLO demo that detects raccoons and runs entirely in the browser is available at https://git.io/vF7vI (not on Windows).

License: MIT License

Jupyter Notebook 96.01% Python 3.99%

convolutional-networks deep-learning yolo2 realtime regression

keras-yolo2's Introduction

YOLOv2 in Keras and Applications

This repo contains an implementation of YOLOv2 in Keras with the TensorFlow backend. It supports training YOLOv2 networks with various backends such as MobileNet and InceptionV3. Links to demo applications are shown below. Check out https://experiencor.github.io/yolo_demo/demo.html for a raccoon detector demo that runs entirely in the browser with DeepLearn.js and a MobileNet backend (it somehow breaks on Windows). The source code of this demo is at https://git.io/vF7vG.

Todo list:

Warmup training
Raccoon detection, Self-driving car, and Kangaroo detection
SqueezeNet, MobileNet, InceptionV3, and ResNet50 backends
Support python 2.7 and 3.6
Multiple-GPU training
Multiscale training
mAP Evaluation

Some example applications (click for videos):

Usage for python code

0. Requirements

python 2.7

keras >= 2.0.8

imgaug

1. Data preparation

Download the Raccoon dataset from https://github.com/experiencor/raccoon_dataset.

Organize the dataset into 4 folders:

train_image_folder <= the folder that contains the train images.
train_annot_folder <= the folder that contains the train annotations in VOC format.
valid_image_folder <= the folder that contains the validation images.
valid_annot_folder <= the folder that contains the validation annotations in VOC format.
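For reference, a VOC-format annotation is an XML file with a filename element and one object element per box, each holding a name and a bndbox with xmin, ymin, xmax and ymax. The sketch below reads one such annotation with Python's standard xml.etree.ElementTree and optionally keeps only the listed labels. It is a minimal illustration under those assumptions, not the repository's parser; parse_voc_xml is a hypothetical name. Note that ElementTree raises ParseError ("not well-formed") when the input is not XML at all.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_text, labels=None):
    """Hypothetical helper: parse one VOC annotation string.

    Returns (filename, boxes), where each box is a dict with
    'name', 'xmin', 'ymin', 'xmax', 'ymax' in pixel coordinates.
    Objects whose name is not in `labels` are skipped when `labels` is given.
    """
    root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        name = obj.findtext("name")
        if labels is not None and name not in labels:
            continue
        bndbox = obj.find("bndbox")
        box = {"name": name}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            # some annotation tools write floats, so go through float() first
            box[key] = int(float(bndbox.findtext(key)))
        boxes.append(box)
    return filename, boxes


sample = """<annotation>
  <filename>example.jpg</filename>
  <object><name>raccoon</name>
    <bndbox><xmin>10</xmin><ymin>20</ymin><xmax>110</xmax><ymax>220</ymax></bndbox>
  </object>
  <object><name>dog</name>
    <bndbox><xmin>5</xmin><ymin>5</ymin><xmax>50</xmax><ymax>50</ymax></bndbox>
  </object>
</annotation>"""

print(parse_voc_xml(sample, labels=["raccoon"]))
```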

Images and annotations correspond one-to-one by file name. If the validation set is empty, the training set is automatically split into a training set and a validation set with a ratio of 0.8 (80% training, 20% validation).
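The file-name matching and the automatic 0.8 split can be sketched as follows. This is an illustration under assumptions (files are matched on their name without extension, and the pairs are shuffled with a fixed seed before splitting), not the repository's exact code; pair_and_split is a hypothetical name.

```python
import os
import random


def pair_and_split(image_files, annot_files, train_ratio=0.8, seed=0):
    """Hypothetical helper: pair images with annotations by file name, then split.

    Files are matched on their name without extension ("img_001.jpg" <-> "img_001.xml").
    Images without an annotation are dropped.
    """
    annots = {os.path.splitext(a)[0]: a for a in annot_files}
    pairs = [(img, annots[os.path.splitext(img)[0]])
             for img in sorted(image_files)
             if os.path.splitext(img)[0] in annots]
    random.Random(seed).shuffle(pairs)  # fixed seed so the split is reproducible
    cut = int(train_ratio * len(pairs))
    return pairs[:cut], pairs[cut:]


images = ["img_%03d.jpg" % i for i in range(10)] + ["no_label.jpg"]
annots = ["img_%03d.xml" % i for i in range(10)]
train, valid = pair_and_split(images, annots)
print(len(train), len(valid))  # 8 2
```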

2. Edit the configuration file

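The configuration file is a JSON file that looks like this (the # comments are explanations only and must be removed from the real file, since JSON does not allow comments):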

{
    "model" : {
        "architecture":         "Full Yolo",    # "Tiny Yolo" or "Full Yolo" or "MobileNet" or "SqueezeNet" or "Inception3"
        "input_size":           416,
        "anchors":              [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "max_box_per_image":    10,        
        "labels":               ["raccoon"]
    },

    "train": {
        "train_image_folder":   "/home/andy/data/raccoon_dataset/images/",
        "train_annot_folder":   "/home/andy/data/raccoon_dataset/anns/",

        "train_times":          10,             # the number of times to cycle through the training set, useful for small datasets
        "pretrained_weights":   "",             # the path of the pretrained weights (optional; training can also start from scratch)
        "batch_size":           16,             # the number of images to read in each batch
        "learning_rate":        1e-4,           # the base learning rate of the Adam optimizer
        "nb_epoch":             50,             # the number of epochs
        "warmup_epochs":        3,              # the number of initial epochs during which the sizes of the 5 boxes in each cell are forced to match the sizes of the 5 anchors; this trick seems to improve precision empirically

        "object_scale":         5.0,            # how much to penalize wrong confidence predictions of object predictors
        "no_object_scale":      1.0,            # how much to penalize wrong confidence predictions of non-object predictors
        "coord_scale":          1.0,            # how much to penalize wrong position and size predictions (x, y, w, h)
        "class_scale":          1.0,            # how much to penalize wrong class predictions

        "debug":                true            # turn on/off the line that prints the current confidence, position, size and class losses, and recall
    },

    "valid": {
        "valid_image_folder":   "",
        "valid_annot_folder":   "",

        "valid_times":          1
    }
}

The model section defines the type of model to construct and its other parameters, such as the input image size and the list of anchors. The labels setting lists the labels to train on. Only images that contain at least one of the listed labels are fed to the network; the remaining images are ignored. This way, a dog detector can easily be trained on the VOC or COCO dataset by setting labels to ['dog'].
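To see how the model settings fit together, the sketch below derives the output grid size and the number of anchor boxes from the model section. It assumes the backend downsamples the input by a factor of 32 (consistent with the 416 input and the 13 x 13 grid printed in one of the issues below) and that anchors are (width, height) pairs in grid-cell units, as in the original YOLOv2 configuration; describe_model is a hypothetical helper.

```python
import json

# the "model" section from the example config, without the # comments
MODEL_JSON = """{
    "architecture": "Full Yolo",
    "input_size": 416,
    "anchors": [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
    "max_box_per_image": 10,
    "labels": ["raccoon"]
}"""


def describe_model(model_cfg, stride=32):
    """Hypothetical helper: grid size, box count and anchor sizes in pixels."""
    grid = model_cfg["input_size"] // stride       # integer division: 416 // 32 == 13
    anchors = model_cfg["anchors"]
    nb_box = len(anchors) // 2                    # anchors come as flat (w, h) pairs
    anchors_px = [(w * stride, h * stride)        # grid-cell units -> input pixels
                  for w, h in zip(anchors[0::2], anchors[1::2])]
    return grid, nb_box, anchors_px


grid, nb_box, anchors_px = describe_model(json.loads(MODEL_JSON))
print(grid, nb_box)    # 13 5
print(anchors_px[0])   # roughly (18.3, 21.7) pixels for the smallest anchor
```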

Download pretrained weights for backend (tiny yolo, full yolo, squeezenet, mobilenet, and inceptionV3) at:

https://drive.google.com/drive/folders/10oym4eL2RxJa0gro26vzXK__TtYOP5Ng

These weights must be put in the root folder of the repository. They are the pretrained weights for the backend only and will be loaded during model creation. The code does not work without these weights.

The pretrained weights for the whole model (both frontend and backend) of the raccoon detector can be downloaded at:

https://drive.google.com/drive/folders/10oym4eL2RxJa0gro26vzXK__TtYOP5Ng

These weights can be used as the pretrained weights for any one-class object detector.

3. Generate anchors for your dataset (optional)

python gen_anchors.py -c config.json

Copy the generated anchors printed on the terminal to the anchors setting in config.json.
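The YOLOv2 paper chooses anchors by running k-means on the widths and heights of the training boxes, using 1 - IoU instead of Euclidean distance so that large boxes do not dominate. A minimal pure-Python sketch of that idea follows; whether gen_anchors.py matches it in every detail (initialization, stopping rule, units) is an assumption, and iou_wh and kmeans_anchors are hypothetical names. To get values comparable to the anchors in config.json, the (w, h) inputs should be expressed in grid-cell units.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes given only (w, h), as if they shared the same center."""
    inter = min(box[0], centroid[0]) * min(box[1], centroid[1])
    union = box[0] * box[1] + centroid[0] * centroid[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, max_iter=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; returns k anchors sorted by size."""
    centroids = random.Random(seed).sample(boxes, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # smallest 1 - IoU distance == largest IoU
            nearest = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[nearest].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append((sum(w for w, _ in members) / len(members),
                                      sum(h for _, h in members) / len(members)))
            else:
                new_centroids.append(centroids[i])  # keep the old centroid for an empty cluster
        if new_centroids == centroids:  # assignments are stable
            break
        centroids = new_centroids
    return sorted(centroids)


boxes = [(1.0, 2.0)] * 10 + [(4.0, 5.0)] * 10
print(kmeans_anchors(boxes, k=2))  # [(1.0, 2.0), (4.0, 5.0)]
```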

4. Start the training process

python train.py -c config.json

By the end of this process, the code writes the weights of the best model to the file best_weights.h5 (or whatever name is specified in the "saved_weights_name" setting in config.json). Training stops when the loss on the validation set has not improved for 3 consecutive epochs.
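The stopping rule above amounts to early stopping on the validation loss with a patience of 3, combined with keeping the best weights. In Keras this is usually done with the EarlyStopping(monitor='val_loss', patience=3) and ModelCheckpoint(..., save_best_only=True) callbacks; that train.py uses exactly these settings is an assumption. The pure-Python sketch below only illustrates the bookkeeping, with a hypothetical function name.

```python
def early_stopping(val_losses, patience=3):
    """Return (best_epoch, stop_epoch), both 0-based, for a list of validation losses.

    The best epoch is where a checkpoint such as best_weights.h5 would be written;
    training stops once `patience` consecutive epochs bring no improvement.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0   # improvement: save weights, reset counter
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch     # stop training here
    return best_epoch, len(val_losses) - 1   # ran out of epochs first


print(early_stopping([5.0, 4.0, 4.5, 4.2, 4.1, 3.0]))  # (1, 4): the last epoch is never reached
```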

5. Perform detection with the trained weights on an image or video by running

python predict.py -c config.json -w /path/to/best_weights.h5 -i /path/to/image/or/video

It carries out detection on the input and writes the result, with the detected bounding boxes drawn on it, to the same folder.

Usage for jupyter notebook

Refer to the notebook (https://github.com/experiencor/basic-yolo-keras/blob/master/Yolo%20Step-by-Step.ipynb) for a complete walk-through implementation of YOLOv2 from scratch (training, testing, and scoring).

Evaluation of the current implementation:

Train	Test	mAP (with this implementation)	mAP (on released weights)
COCO train	COCO val	28.6	42.1

The code to evaluate detection results can be found at #27.
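For context on the mAP numbers above: mAP is the mean, over classes, of the average precision (AP), which is the area under the precision-recall curve obtained by sorting detections by confidence. The sketch below computes AP for one class in the all-point-interpolated style of PASCAL VOC. It assumes each detection has already been marked as a true or false positive (typically by matching it to an unused ground-truth box with IoU >= 0.5) and that there is at least one ground-truth box; it is not the evaluation code referenced in #27, and average_precision is a hypothetical name.

```python
def average_precision(scores, is_tp, n_gt):
    """AP for one class from detection confidences, TP/FP flags and the ground-truth count."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])  # highest confidence first
    tp = fp = 0
    recalls, precisions = [], []
    for i in order:
        if is_tp[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / n_gt)
        precisions.append(tp / (tp + fp))
    # precision envelope: make precision non-increasing from right to left
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # sum rectangle areas wherever recall increases
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


print(average_precision([0.9, 0.8, 0.7], [True, False, True], n_gt=2))  # about 0.833
```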

Copyright

See LICENSE for details.


keras-yolo2's Issues

pretrained model

You wouldn't happen to have a pretrained keras model would you?

Model for hand detection

can you please share the model for hand detection?

Training from scratch

First of all, good job.
I want to train this model from scratch (e.g. Tiny YOLO)
for real-time problems.
When I tried to run train.py, an error appeared saying it can't open the weights file:
" OSError: Unable to open file (Unable to open file: name = 'full_yolo_features.h5', errno = 2, error message = 'no such file or directory', flags = 0, o_flags = 0) "
I didn't want to load it;
I want to load the model architecture and then train it on my own dataset.

Thanks in advance.

Output size is wrong I think

https://github.com/experiencor/basic-yolo-keras/blob/88c47da1ba9681bf769b8599991366ba0f20f61f/frontend.py#L63

@experiencor I think this is part of the same issue as before where the output size should be 6 not 5 + nb_class

Output of Yolo-Keras

1425/3080 [============>.................] - ETA: 1803s - loss: 0.1587
2017-11-18 10:13:30.536160: I tensorflow/core/kernels/logging_ops.cc:79] Dummy Line [0]
2017-11-18 10:13:30.536237: I tensorflow/core/kernels/logging_ops.cc:79] Loss XY [0.0229914095]
2017-11-18 10:13:30.536257: I tensorflow/core/kernels/logging_ops.cc:79] Loss WH [0.0974636301]
2017-11-18 10:13:30.536286: I tensorflow/core/kernels/logging_ops.cc:79] Loss Conf [0.00172986195]
2017-11-18 10:13:30.536298: I tensorflow/core/kernels/logging_ops.cc:79] Loss Class [0.0058139083]
2017-11-18 10:13:30.536310: I tensorflow/core/kernels/logging_ops.cc:79] Total Loss [0.127998814]
2017-11-18 10:13:30.536321: I tensorflow/core/kernels/logging_ops.cc:79] Current Recall [0.607142866]
2017-11-18 10:13:30.536334: I tensorflow/core/kernels/logging_ops.cc:79] Average Recall [0.567189932]
l_bound:1232
r_bound:1248
r_bound:1248
l_bound:1232
1426/3080 [============>.................] - ETA: 1802s - loss: 0.1587
2017-11-18 10:13:31.626686: I tensorflow/core/kernels/logging_ops.cc:79] Dummy Line [0]
2017-11-18 10:13:31.626748: I tensorflow/core/kernels/logging_ops.cc:79] Loss XY [0.0277243108]
2017-11-18 10:13:31.626766: I tensorflow/core/kernels/logging_ops.cc:79] Loss WH [0.126982644]
2017-11-18 10:13:31.626779: I tensorflow/core/kernels/logging_ops.cc:79] Loss Conf [0.000806401658]
2017-11-18 10:13:31.626790: I tensorflow/core/kernels/logging_ops.cc:79] Loss Class [0.0110849598]
2017-11-18 10:13:31.626802: I tensorflow/core/kernels/logging_ops.cc:79] Total Loss [0.16659832]
2017-11-18 10:13:31.626812: I tensorflow/core/kernels/logging_ops.cc:79] Current Recall [0.763157904]
2017-11-18 10:13:31.626824: I tensorflow/core/kernels/logging_ops.cc:79] Average Recall [0.567207873]

Is this the correct output, or am I missing something? Please help.

Error in parse_annotation

Dear experiencor,
When I run the code in the notebook "Yolo Step-by-Step", I found the following error executing the 10th command (Image attached):

File "", line unknown ParseError: not well-formed (invalid token): line 1, column 0

I was using the pictures and annotations downloaded from COCO network, and I put them in a backup hard drive (because the size of files is huge for my computer...).

Thank you! Also thanks for your code and detailed tutorial!

How to fit kitti image shape?

Hi experiencor, sorry to bother you, but may I ask some questions about training and prediction?

1. I want to train on KITTI to detect Car, Pedestrian, etc., but the KITTI image shape is (1274, 375),
 while the YOLO input size is square (416). If I leave it at 416, the image would be cropped too
 much and the height would not be enough, so what should I set the input size to?
2. If I change the input size, should I edit the network structure?

Your advice would really help!

total size of new array must be unchanged

I tried to build the model and an error came up at this line:
output = Reshape((GRID_H, GRID_W, BOX, 4 + 1 + CLASS))(x)

The only thing I changed is LABELS, which now has only one class. Please help!

Also, could you change the model to the old style (create a Sequential model and add layers one by one), so we can view its summary whenever we want?
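The failing Reshape comes down to a size check: the layer before it must output exactly BOX * (4 + 1 + CLASS) channels per grid cell, because each anchor box predicts 4 coordinates, 1 confidence score and one score per class. The numpy sketch below mirrors that check (it is not the Keras code, and head_channels is a hypothetical name); for 20 classes and 5 boxes the count is 125, matching the conv_23 output in the model summary shown in a later issue.

```python
import numpy as np


def head_channels(nb_box, nb_class):
    """Channels the last conv layer must produce: per box 4 coords + 1 confidence + class scores."""
    return nb_box * (4 + 1 + nb_class)


grid, nb_box = 13, 5

# a feature map sized for 20 classes reshapes cleanly into (grid, grid, box, 4 + 1 + 20)
features = np.zeros((1, grid, grid, head_channels(nb_box, 20)))
reshaped = features.reshape((1, grid, grid, nb_box, 4 + 1 + 20))

# the same 125 channels cannot be reshaped for a single class (which needs 5 * 6 = 30)
try:
    features.reshape((1, grid, grid, nb_box, 4 + 1 + 1))
    mismatch = False
except ValueError as err:
    mismatch = True
    print("reshape failed:", err)
```

In other words, when LABELS changes, the number of filters in the final convolution has to change with it.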

trained weights

Could you provide a trained weights in HDF5 format?

great job, thank you for sharing.

Using Full Yolo Features

Hi. Trying to use your full_yolo_features.h5 which I believe are pretrained weights from ImageNet (or maybe something else?) as the basis for the weights of all the layers before the last couple+softmax.

I just ran python train.py -c config.json with the "pretrained_weights": "full_yolo_features.h5", in my config.json.

I'm unsure what I'm doing wrong. I think the model that's built isn't separating the layers properly?

Also, I'm trying to use this to train on VOC2007. What does the program do about boxes with labels that aren't included in the "labels" key in the config.json?

Included a snippet from my config.json as well as my console.

"model" : {
        "architecture":         "Full Yolo",
        "input_size":           416,
        "anchors":              [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "max_box_per_image":    10,        
        "labels":               ["aeroplane","bicycle","bird","boat","bottle","bus","car","cat","chair","cow","diningtable","dog","horse","motorbike","person","pottedplant","sheep","sofa","train","tvmonitor"]
    },

vivekme-mac02:basic-yolo-keras vivekme$ python train.py -c config.json
Using TensorFlow backend.
2017-11-02 10:56:49.309727: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.2 AVX
(13, 13)
____________________________________________________________________________________________________
Layer (type)                     Output Shape          Param #     Connected to
====================================================================================================
input_1 (InputLayer)             (None, 416, 416, 3)   0
____________________________________________________________________________________________________
model_1 (Model)                  (None, 13, 13, 1024)  50547936    input_1[0][0]
____________________________________________________________________________________________________
conv_23 (Conv2D)                 (None, 13, 13, 125)   128125      model_1[1][0]
____________________________________________________________________________________________________
reshape_1 (Reshape)              (None, 13, 13, 5, 25) 0           conv_23[0][0]
____________________________________________________________________________________________________
input_2 (InputLayer)             (None, 1, 1, 1, 10, 4 0
____________________________________________________________________________________________________
lambda_2 (Lambda)                (None, 13, 13, 5, 25) 0           reshape_1[0][0]
                                                                   input_2[0][0]
====================================================================================================
Total params: 50,676,061
Trainable params: 50,655,389
Non-trainable params: 20,672
____________________________________________________________________________________________________
Loading pre-trained weights in full_yolo_features.h5
Traceback (most recent call last):
  File "train.py", line 134, in <module>
    _main_(args)
  File "train.py", line 111, in _main_
    yolo.load_weights(config['train']['pretrained_weights'])
  File "/Users/vivekme/workspace/basic-yolo-keras/frontend.py", line 228, in load_weights
    self.model.load_weights(weight_path)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/keras/engine/topology.py", line 2619, in load_weights
    load_weights_from_hdf5_group(f, self.layers)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/keras/engine/topology.py", line 3068, in load_weights_from_hdf5_group
    str(len(filtered_layers)) + ' layers.')
ValueError: You are trying to load a weight file containing 44 layers into a model with 2 layers.

divided by 255.

In the code below, why did you divide the image by 255 ONLY in the 'Predict' function
and not in the 'Training' function?

Don't we need to divide in both cases?

    def preprocess(self, image):
        input_image = cv2.resize(image, (self.input_size, self.input_size))
        input_image = input_image / 255.

thank you!

not enough params

Block 14 of the notebook:
train_imgs = parse_annotation(coco_train_path)

which uses parse_annotation from the python preprocessing.py script should be given a second param 'labels'.

TypeError: parse_annotation() missing 1 required positional argument: 'labels'

Error when predicting with weights trained on another machine

Hi, the error is "You are trying to load a weight file containing 2 layers into a model with 44 layers".

Question mAP

Hi, thanks for posting your code :)

I have a question about mAP. I don't understand how to compute this metric.
I saw that you calculate it, but I can't find it in your code.

Thank you!

training own data

I am trying to train own data, i.e. scene text(road sign) detection.

Q1) I have 600 annotated images, but when I train with batch size = 16, there are ONLY 3~4 STEPS per epoch.

See the report below.

What's wrong with this? I expected 600/16 = 37.5, i.e. about 37 steps per epoch.

Q2) I wonder why it does not work well when I don't use pretrained weights.
The loss just explodes or oscillates meaninglessly.

Thank you very much in advance! :)

Epoch 1/50
1/3 [======>.......................] - ETA: 78s - loss: 52996.2266
2/3 [==============>...............] - ETA: 46s - loss: 26586.9480
3/3 [======================>.......] - ETA: 20s - loss: 17939.5313
Epoch 00000: val_loss improved from inf to 69.81903, saving model to best_weights.h5

4/3 [===============================] - 383s - loss: 13474.0479 - val_loss: 69.8190
Epoch 2/50
1/3 [======>.......................] - ETA: 64s - loss: 110.6660
2/3 [==============>...............] - ETA: 41s - loss: 110.1645
3/3 [======================>.......] - ETA: 18s - loss: 120.9267
Epoch 00001: val_loss did not improve

4/3 [===============================] - 390s - loss: 134.6366 - val_loss: 176.3403
Epoch 3/50
1/3 [======>.......................] - ETA: 70s - loss: 140.6994
2/3 [==============>...............] - ETA: 44s - loss: 131.1704
3/3 [======================>.......] - ETA: 19s - loss: 135.7138
Epoch 00002: val_loss did not improve

4/3 [===============================] - 385s - loss: 155.2880 - val_loss: 157.5658
Epoch 4/50
1/3 [======>.......................] - ETA: 65s - loss: 79.9112
2/3 [==============>...............] - ETA: 42s - loss: 43.6893
3/3 [======================>.......] - ETA: 18s - loss: 31.8849
Epoch 00003: val_loss improved from 69.81903 to 21.53916, saving model to best_weights.h5

4/3 [===============================] - 383s - loss: 33.7253 - val_loss: 21.5392
Epoch 5/50
1/3 [======>.......................] - ETA: 65s - loss: 15.6504
2/3 [==============>...............] - ETA: 41s - loss: 11.6136
3/3 [======================>.......] - ETA: 18s - loss: 9.0862

raccoon training problem

Hey, I am trying to test on the raccoon dataset. I cloned your dataset, model and code, and got an error that I can't understand. I am very new to the object detection field; could you please help me?

Epoch 1/50
Traceback (most recent call last):
  File "train.py", line 140, in <module>
    _main_(args)
  File "train.py", line 136, in _main_
    debug              = config['train']['debug'])
  File "/home/e44041034/basic-yolo-keras/frontend.py", line 445, in train
    max_queue_size   = 8)
  File "/home/e44041034/basic-yolo-keras/yolokeras/local/lib/python2.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/e44041034/basic-yolo-keras/yolokeras/local/lib/python2.7/site-packages/keras/engine/training.py", line 2083, in fit_generator
    generator_output = next(output_generator)
  File "/home/e44041034/basic-yolo-keras/yolokeras/local/lib/python2.7/site-packages/keras/utils/data_utils.py", line 553, in get
    raise StopIteration(e)
StopIteration: 'NoneType' object has no attribute 'shape'

{
    "model" : {
        "architecture":         "Full Yolo",
        "input_size":           416,
        "anchors":              [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "max_box_per_image":    10,
        "labels":               ["raccoon"]
    },

    "train": {
        "train_image_folder":   "/home/e44041034/basic-yolo-keras/raccoon_dataset/images/",
        "train_annot_folder":   "/home/e44041034/basic-yolo-keras/raccoon_dataset/annotations/",

        "train_times":          10,
        "pretrained_weights":   "",
        "batch_size":           16,
        "learning_rate":        1e-4,
        "nb_epoch":             50,
        "warmup_batches":       250,

        "object_scale":         5.0,
        "no_object_scale":      1.0,
        "coord_scale":          1.0,
        "class_scale":          1.0,

        "saved_weights_name":   "full_yolo_raccoon.h5",
        "debug":                true
    },

    "valid": {
        "valid_image_folder":   "",
        "valid_annot_folder":   "",

        "valid_times":          1
    }
}

4 coordinates of bounding box instead of 4 (xmin, xmax, ymin, ymax)

My dataset was annotated with 4 coordinates instead of 4. How can I change the code to make it work? Thank you!
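The issue does not say which alternative format the annotations use. Assuming they give the four corner points of each box, one simple approach is to convert them to the axis-aligned (xmin, xmax, ymin, ymax) form by taking minimums and maximums, as in the hypothetical helper below. For a rotated box this yields the enclosing upright rectangle, so the rotation is lost.

```python
def corners_to_box(points):
    """Hypothetical helper: four (x, y) corner points -> (xmin, xmax, ymin, ymax)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)


# a slightly rotated quadrilateral becomes its enclosing upright box
print(corners_to_box([(10, 20), (50, 22), (48, 60), (8, 58)]))  # (8, 50, 20, 60)
```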

Quick Questions

Hello

Are you using multiscale training? Also, do you have pretrained weights on VOC data? Below is an image of a blood smear:

I want to detect the purple and red cells, and I have done the annotations.
I only have 300 images, with 15-20 annotations per image. What do you recommend?

I have a few question!

I'm working on a school project about hand detection and I'm trying to use your model with this dataset:
http://www.robots.ox.ac.uk/~vgg/data/hands/. The problem is that the dataset is too large and I don't know yet whether it works with this model, so I want to make sure that it does. What I'm trying to do is split the dataset into a number of chunks (about 50 images each) and train the model chunk by chunk to see whether the prediction improves after each chunk. Is this a good approach? I don't have much time to experiment with this.
What val_loss value is low enough to give good predictions?
I see that you have used this model for hand detection; can you share the pretrained weights?

Thank you!

Issues Training

Error:

Using TensorFlow backend.
Traceback (most recent call last):
  File "train.py", line 132, in <module>
    _main_(args)
  File "train.py", line 103, in _main_
    anchors = config['model']['anchors'])
  File "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/models.py", line 213, in __init__
    output = Reshape((self.grid_h, self.grid_w, self.nb_box, 4 + 1 + self.nb_class))(x)
  File "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/yoloKeras/lib/python3.5/site-packages/keras/engine/topology.py", line 602, in __call__
    output = self.call(inputs, **kwargs)
  File "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/yoloKeras/lib/python3.5/site-packages/keras/layers/core.py", line 392, in call
    return K.reshape(inputs, (-1,) + target_shape)
  File "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/yoloKeras/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1722, in reshape
    return tf.reshape(x, shape)
  File "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/yoloKeras/lib/python3.5/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2619, in reshape
    name=name)
  File "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/yoloKeras/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 589, in apply_op
    param_name=input_name)
  File "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/yoloKeras/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 60, in _SatisfiesTypeConstraint
    ", ".join(dtypes.as_dtype(x).name for x in allowed_list)))
TypeError: Value passed to parameter 'shape' has DataType float32 not in list of allowed values: int32, int64

Config file:
{
    "model" : {
        "architecture":         "Tiny Yolo",
        "input_size":           416,
        "anchors":              [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "max_box_per_image":    1,
        "labels":               ["M4A1 Tank"]
    },

    "train": {
        "train_image_folder":   "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/train_image_folder/",
        "train_annot_folder":   "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/train_annot_folder/",

        "train_times":          10,
        "pretrained_weights":   "/home/alihusain/Documents/yoloConversion/keras/basic-yolo-keras/tiny.h5",
        "batch_size":           16,
        "learning_rate":        1e-4,
        "nb_epoch":             50,
        "warmup_batches":       100,

        "object_scale":         5.0,
        "no_object_scale":      1.0,
        "coord_scale":          1.0,
        "class_scale":          1.0,

        "debug":                false
    },

    "valid": {
        "valid_image_folder":   "",
        "valid_annot_folder":   "",

        "valid_times":          1
    }
}

Any ideas?
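One plausible cause, given that the traceback runs under Python 3.5: in Python 3 the / operator always returns a float (416 / 32 == 13.0), whereas in Python 2 dividing two ints gives an int. If the grid size is computed with /, Reshape receives a float dimension, which fits the float32 complaint. This diagnosis is an assumption; the snippet only shows the difference and that integer division (//) keeps the value an int in both versions.

```python
input_size = 416

grid_true_div = input_size / 32    # Python 3: 13.0, a float
grid_floor_div = input_size // 32  # 13, an int in both Python 2 and Python 3

print(type(grid_true_div).__name__, type(grid_floor_div).__name__)  # float int

# building the target shape from the integer value keeps every dimension an int
target_shape = (grid_floor_div, grid_floor_div, 5, 4 + 1 + 1)
print(target_shape)  # (13, 13, 5, 6)
```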

The code in preprocessing.py, lines 175-176:

center_w = (obj['xmax'] - obj['xmin']) / (float(self.config['IMAGE_W']) / self.config['GRID_W']) # unit: grid cell
center_h = (obj['ymax'] - obj['ymin']) / (float(self.config['IMAGE_W']) / self.config['GRID_W']) # unit: grid cell

Is there any effect when the image ratio is not 1:1?
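A quick way to reason about this: dividing by IMAGE_W / GRID_W converts pixels to grid cells using the cell width, so using it for the height is exact only when cells are as tall as they are wide, i.e. when IMAGE_W / GRID_W == IMAGE_H / GRID_H. The sketch compares the quoted formula with a per-axis version. It assumes the box coordinates are already in network-input pixels, and both helper names are hypothetical.

```python
def height_in_cells_quoted(obj, image_w, grid_w):
    # as quoted: the box height is divided by the *width* of a grid cell
    return (obj["ymax"] - obj["ymin"]) / (float(image_w) / grid_w)


def height_in_cells_per_axis(obj, image_h, grid_h):
    # per-axis version: the box height is divided by the *height* of a grid cell
    return (obj["ymax"] - obj["ymin"]) / (float(image_h) / grid_h)


obj = {"xmin": 0, "ymin": 0, "xmax": 64, "ymax": 96}

# square input, 416 x 416 on a 13 x 13 grid: cells are 32 px both ways, results agree
print(height_in_cells_quoted(obj, 416, 13), height_in_cells_per_axis(obj, 416, 13))  # 3.0 3.0

# hypothetical 416 x 208 input on a 13 x 13 grid: cells 32 px wide but 16 px tall
print(height_in_cells_quoted(obj, 416, 13), height_in_cells_per_axis(obj, 208, 13))  # 3.0 6.0
```

Note that a non-square input alone is not enough to cause a difference: if both axes are downsampled by the same factor (for example 32, giving a 20 x 12 grid for a 640 x 384 input), the cell width and height are still equal.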

Loss explodes and becomes NaN

Hi, I'm using the Full YOLO version and trying to train the model to predict only one class, "person". I have about 300 images (not loading pretrained weights!). I'm using the Adam optimizer with the default values that you use in the code. But when I train the model, the loss gets high very quickly in the 1st epoch and becomes NaN. Please help!

Retraining the model

Can I retrain the model by loading my already trained weights? Or I should add my new images and train the entire image set again?

Question about the explanation of the code

Hi author, I have two questions about your code.
1. I have used your code on the Raccoon dataset you provided. Since not many details are given, I used the pretrained Tiny YOLO weights and found no box detected, whether on raccoon images I found on the internet or on images I selected from the dataset you provided.

2. In your model.py code, pred_box_xy = tf.sigmoid(y_pred[..., :2]) + cell_grid. In my opinion, this code just calculates b_x = sigma(t_x) + c_x from the paper, but cell_grid is a fixed matrix, not a variable, and will not change with the location of the cell. I can't understand this.
Thanks
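On the second question: in the YOLOv2 formula, c_x and c_y are the offsets of each individual cell, so a tensor like cell_grid is fixed (not learned) but holds a different value for every cell position. Adding it to sigmoid(t_xy) places each predicted center inside its own cell, which is exactly b_x = sigma(t_x) + c_x and b_y = sigma(t_y) + c_y. The numpy sketch below builds such a grid for a small 3 x 3 example; it is an illustration, not the repository's TensorFlow code, and the [row, column, box, (x, y)] layout is an assumption.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


grid_h, grid_w, nb_box = 3, 3, 2

# cell_grid[row, col, box] = (col, row), i.e. (c_x, c_y) of that cell, same for every box
xs, ys = np.meshgrid(np.arange(grid_w), np.arange(grid_h))       # xs[r, c] = c, ys[r, c] = r
cell_grid = np.stack([xs, ys], axis=-1).astype(np.float32)       # shape (grid_h, grid_w, 2)
cell_grid = np.repeat(cell_grid[:, :, None, :], nb_box, axis=2)  # (grid_h, grid_w, nb_box, 2)

# raw network outputs t_x, t_y; zeros give sigmoid(0) = 0.5, the middle of each cell
t_xy = np.zeros((grid_h, grid_w, nb_box, 2), dtype=np.float32)
pred_box_xy = sigmoid(t_xy) + cell_grid   # b = sigma(t) + c, in grid-cell units

print(pred_box_xy[0, 0, 0])  # center of the top-left cell: x = 0.5, y = 0.5
print(pred_box_xy[2, 1, 0])  # center of the cell in row 2, column 1: x = 1.5, y = 2.5
```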

Issues in training on own dataset

I've made some changes to train the model on my own dataset (4 classes). The model architecture is built, but while training epoch 1, the following error occurs:

Epoch 1/100000
Traceback (most recent call last):
  File "train.py", line 137, in <module>
    _main_(args)
  File "train.py", line 133, in _main_
    debug = config['train']['debug'])
  File "/home/bhanu/Yolo-Keras/frontend.py", line 447, in train
    max_queue_size = 8)
  File "/usr/local/lib/python3.5/dist-packages/Keras-2.1.1-py3.5.egg/keras/legacy/interfaces.py", line 87, in wrapper
  File "/usr/local/lib/python3.5/dist-packages/Keras-2.1.1-py3.5.egg/keras/engine/training.py", line 2114, in fit_generator
  File "/usr/local/lib/python3.5/dist-packages/Keras-2.1.1-py3.5.egg/keras/engine/training.py", line 1826, in train_on_batch
  File "/usr/local/lib/python3.5/dist-packages/Keras-2.1.1-py3.5.egg/keras/engine/training.py", line 1411, in _standardize_user_data
  File "/usr/local/lib/python3.5/dist-packages/Keras-2.1.1-py3.5.egg/keras/engine/training.py", line 153, in _standardize_input_data
ValueError: Error when checking target: expected lambda_2 to have shape (None, 11, 11, 5, 9) but got array with shape (1, 11, 11, 5, 6)

Please help me with this.

Add accuracy

@experiencor what do you think about the used accuracy within darkflow project (Yolo implementation), thtrieu/darkflow#377

Look for advice for my Photo OCR application

I am working on a Photo OCR application. The objects I am trying to detect in an image are a few text areas. I am attaching one of the sample images here (the text areas I am trying to detect are the dollar amounts; this output will then feed into models for segmentation and, lastly, character recognition). The number of text areas varies from 1 to 3. I don't know which of the following is the most appropriate choice. The original image is 640 * 480; I can crop it to 480 * 480, since the text areas will most likely not appear at the far left or far right.

Use a grid. To me, this is really a waste of effort, because there can never be more than 3 text areas. However, I will still have to use something like a 1 * 12 grid. If that does not work, I may have to use a 12 * 12 grid. BTW, I saw the model uses a 13 * 13 grid, which happens to be the size of the final layer. If that is a must, I will have to use a 15 * 15 grid.
No grid. However, all the text areas will have almost the same shape, and anchor boxes don't seem to work for that.

Therefore, I would really appreciate it if you could give me some advice as an experienced user of this model.

Thank you so much!

You are trying to load a weight file containing 44 layers into a model with 45 layers.

I cannot load the Full YOLO pretrained weights, please help!

Low loss, but bad prediction

Hello, I'm trying to train on my own dataset. When I load pretrained weights I get a low loss, around 0.2, but without them I get something like 6 or 7. Yet even with the pretrained weights, when I predict on the training dataset I don't get any correct predictions. I tried disabling data augmentation to overfit the training dataset (on all the data and on a single image as well), but it did not work; I also tried changing the threshold.
I also modified the data generator and the parser to read data from a txt file.
The reference for this adaptation is: https://github.com/yhenon/keras-frcnn
Link: yolo_lib.zip

Faster R-CNN seems easier to train: even with a higher cost per epoch, it converges more quickly than YOLO.

ps: My own dataset have around 1k images with around 7k objects.

Thank you for sharing your code, it is very easy to read and understand what you did.

Training with more than one class

I think there may be a bug in how the loss_class is determined.

I noticed that in preprocessing.py you define the y_batch two different ways:

    y_batch = np.zeros((self.config['BATCH_SIZE'], self.config['GRID_H'],
                        self.config['GRID_W'], self.config['BOX'], 4+1+1))
    y_batch = np.zeros((self.config['BATCH_SIZE'], self.config['GRID_H'],
                        self.config['GRID_W'], self.config['BOX'], 5+self.config['CLASS']))

which happens to be the same if only have a single class.
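The observation can be checked with plain shape arithmetic: a last axis of 4 + 1 + 1 holds 4 box coordinates, 1 confidence and a single class slot, while 5 + CLASS has one slot per class, so the two agree only when CLASS == 1. With 4 classes they give 6 and 9, the same pair of sizes as in the "expected lambda_2 to have shape (None, 11, 11, 5, 9) but got array with shape (1, 11, 11, 5, 6)" error in an earlier issue. The sketch uses a hypothetical helper and only compares shapes.

```python
import numpy as np


def y_batch_shapes(n_class, batch=2, grid=13, box=5):
    """Shapes produced by the two quoted np.zeros(...) calls for a given class count."""
    fixed = np.zeros((batch, grid, grid, box, 4 + 1 + 1)).shape        # one class slot
    per_class = np.zeros((batch, grid, grid, box, 5 + n_class)).shape  # one slot per class
    return fixed, per_class


for n_class in (1, 4):
    fixed, per_class = y_batch_shapes(n_class)
    print(n_class, fixed[-1], per_class[-1], fixed == per_class)
# 1 6 6 True
# 4 6 9 False
```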

Basically I have been training this model and it doesn't learn classes at all when the data set contains more than one class.

Further edit: I trained the model on a single image (to overfit). I know it can obviously detect this image because the class loss is very low 1e-7. However when I try and predict on it, I get no boxes detected -- so something is wrong here.

EDIT: Ignore the following part, I see that tf.nn.sparse_softmax_cross_entropy_with_logits() works with a single number being given not a one-hot vector

The second part to this is that I'm assuming the label that you assign to y_batch should actually be a vector with 1s corresponding to class index and zero otherwise?
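The point in the edit can be illustrated numerically: tf.nn.sparse_softmax_cross_entropy_with_logits takes an integer class index per example, while tf.nn.softmax_cross_entropy_with_logits takes a one-hot (or probability) vector, and for a one-hot target both give the same loss. The numpy sketch below reproduces the two formulas without TensorFlow.

```python
import numpy as np


def log_softmax(logits):
    shifted = logits - logits.max()          # subtract the max for numerical stability
    return shifted - np.log(np.exp(shifted).sum())


logits = np.array([2.0, 0.5, -1.0])   # raw class scores for one box
class_index = 0                        # "sparse" label: just the index
one_hot = np.eye(3)[class_index]       # dense label: [1, 0, 0]

sparse_loss = -log_softmax(logits)[class_index]
dense_loss = -np.sum(one_hot * log_softmax(logits))
print(sparse_loss, dense_loss)         # the two values are equal
```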

Pre-trained weights for Raccoon model

The link to the pre-trained weights for raccoon model does not work. I'm trying to train my own one class detector. It will be great help if you could make the raccoon weights available again. Thanks 😃

FileNotFoundError: [WinError 3] The system cannot find the path specified: '../logs/yolo/

Any idea how to proceed?
I receive this error when I run the following section of the code --
tb_counter = max([int(num) for num in os.listdir('../logs/yolo/')] or [0]) + 1
tensorboard = TensorBoard(log_dir='../logs/yolo/' + str(tb_counter), histogram_freq=0, write_graph=True, write_images=False)

sgd = SGD(lr=0.00001, decay=0.0005, momentum=0.9)

model.compile(loss=custom_loss, optimizer=sgd)  # or optimizer='adagrad'
model.fit_generator(data_gen(all_img, BATCH_SIZE),
                    int(len(all_img) / BATCH_SIZE),
                    epochs=100,
                    verbose=2,
                    callbacks=[early_stop, checkpoint, tensorboard],
                    max_q_size=3)
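The error comes from os.listdir(), which raises it when '../logs/yolo/' does not exist. That path is relative to the current working directory, so it depends on where the notebook or script is started. A minimal sketch (Python 3; the helper name is hypothetical) that creates the folder before listing it:

```python
import os

def next_run_dir(base="../logs/yolo/"):
    """Return a new numbered TensorBoard run directory under `base`."""
    # os.listdir() raises FileNotFoundError (WinError 3 on Windows) if `base`
    # does not exist, so create it (and any missing parents) first.
    os.makedirs(base, exist_ok=True)
    # Only count purely numeric entries; anything else would make int() fail.
    runs = [int(name) for name in os.listdir(base) if name.isdigit()]
    return os.path.join(base, str(max(runs or [0]) + 1))
```

The returned path can then be passed as `log_dir` to the `TensorBoard` callback in place of the string built from `tb_counter`.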

predict video

How do I run prediction on a video? I'm using the old code from the IPython notebook, and Python dies when writing frames. Please help!
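A minimal sketch of a frame-by-frame loop, written against the OpenCV reader/writer interface so it can be exercised without a video file. `detect` is a hypothetical placeholder; in this repo it would presumably wrap the model's prediction plus box drawing (check predict.py for the exact calls).

```python
def annotate_video(reader, writer, detect):
    """Read frames until the stream ends, annotate each one, and write it out.

    `reader` and `writer` follow the cv2.VideoCapture / cv2.VideoWriter
    interface: read() -> (ok, frame), write(frame), release().
    `detect` is a hypothetical callable: frame in, annotated frame out.
    """
    n_frames = 0
    try:
        while True:
            ok, frame = reader.read()
            if not ok:  # end of stream (or a read error)
                break
            writer.write(detect(frame))
            n_frames += 1
    finally:
        # Always release both handles so the output file gets finalized,
        # even if detection raises halfway through.
        reader.release()
        writer.release()
    return n_frames
```

With OpenCV, `reader` would be `cv2.VideoCapture(path)` and `writer` a `cv2.VideoWriter`. It is worth opening the writer with the input's frame size as `(width, height)` and writing frames of exactly that size as 8-bit, 3-channel images; mismatches there are a common source of broken output.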

Why do you carry true_boxes during the training?

I don't understand why you pass true_boxes as an input to the model, since the ground truth is already in y_true. This choice affects the YOLO model (backend), the BatchGenerator, and the loss function (custom_loss). I assume the intention is to use this information in the loss function. But why not just extract it from the y_true parameter instead of using self.true_boxes (which I assume is continuously fed the ground-truth bounding boxes during training)?

Thank you very much for the time and help!
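For context, one common reason YOLOv2-style losses take the raw ground-truth boxes as a separate input is the "no object" term. Each predicted box is compared against every ground-truth box in the image, not just the one assigned to its cell and anchor in y_true, and predictions that already overlap some object well are excluded from the no-object penalty. Whether that is the exact motivation here is an assumption. The NumPy sketch below only shows the computation; names are hypothetical, boxes are (x_center, y_center, w, h), and the 0.6 threshold is an example value.

```python
import numpy as np

def to_corners(boxes):
    # (x_center, y_center, w, h) -> (x_min, y_min, x_max, y_max)
    xy, wh = boxes[..., :2], boxes[..., 2:4]
    return np.concatenate([xy - wh / 2, xy + wh / 2], axis=-1)

def best_iou(pred, true):
    """For each predicted box, the highest IoU with ANY ground-truth box."""
    p = to_corners(pred)[:, None, :]  # (P, 1, 4)
    t = to_corners(true)[None, :, :]  # (1, T, 4)
    inter_wh = np.clip(np.minimum(p[..., 2:], t[..., 2:]) -
                       np.maximum(p[..., :2], t[..., :2]), 0, None)
    inter = inter_wh[..., 0] * inter_wh[..., 1]
    area_p = (p[..., 2] - p[..., 0]) * (p[..., 3] - p[..., 1])
    area_t = (t[..., 2] - t[..., 0]) * (t[..., 3] - t[..., 1])
    iou = inter / (area_p + area_t - inter)  # (P, T)
    return iou.max(axis=1)

def no_object_mask(pred, true, assigned, thresh=0.6):
    # Penalize "no object" only for predictions that are not assigned to an
    # object in y_true AND do not already overlap any true box well.
    return (best_iou(pred, true) < thresh) & ~assigned
```

The assignment in y_true alone cannot express the second condition, because an unassigned prediction may still sit on top of an object that belongs to a neighboring cell or anchor.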

Unable to load provided pretrained weights

Dear Friend,
First of all, thanks for sharing your work.
It works fine for me when I train from scratch, but when I try to load the pretrained weights provided here (without the last layer), I get the error "You are trying to load a weight file containing 16 layers into a model with 2 layers".

When I load the full-model pretrained weights (raccoon example), it works fine, so I understand that the problem is that the weights are being loaded into the full model when they should be loaded only into the model_1 layer.

Sorry, but I am quite new to Python and Keras, so I cannot find a solution.
Thanks for your support.
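Going by the description above, one likely fix is to call `load_weights` on the nested backbone model instead of on the full detector. A minimal sketch, assuming the nested model is named `model_1` as in the post (Keras auto-generates such names, so check `model.summary()`). `get_layer` and `load_weights` are standard Keras `Model` methods; the helper itself is hypothetical.

```python
def load_backbone_weights(full_model, weights_path, backbone_name="model_1"):
    """Load feature-extractor weights into the nested backbone sub-model.

    full_model.load_weights() fails when the file only holds the backbone's
    layers. In the functional API a nested Model is itself a layer of the
    outer model, so fetch it by name and load the weights there instead.
    """
    backbone = full_model.get_layer(backbone_name)
    backbone.load_weights(weights_path)
    return backbone
```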

Are you reversing the image channels?

I am not sure I understand this correctly, but I see similar lines in both the training-data preprocessing and the prediction code:

input_image = image[:, :, ::-1]

Are you reversing the RGB channels of the image(s)? If so, why is that?

Sorry to keep bothering you. Thank you so much for the help!
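A likely explanation, assuming the images are read with OpenCV's `cv2.imread` (which returns channels in BGR order): `[:, :, ::-1]` reverses the last axis, turning BGR into RGB, presumably the order the pretrained backbone expects. Doing it in both training and prediction keeps the two consistent. A tiny NumPy check of what the slice does:

```python
import numpy as np

# One 1x1 "image" whose channels are stored as B=10, G=20, R=30 (OpenCV order).
bgr = np.array([[[10, 20, 30]]], dtype=np.uint8)

# Reversing the channel axis swaps B and R; G stays in the middle.
rgb = bgr[:, :, ::-1]
print(rgb[0, 0])  # [30 20 10]
```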

Bug in __overlap(self, interval_a, interval_b)

Line 35 in utils.py should be

 return min(x2,x4) - x3

instead of

 return min(x2,x4) - x1
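For comparison, the 1-D overlap of two intervals can be written without branching as the clipped difference between the earlier end and the later start. A minimal sketch (the function name is illustrative, not the one in utils.py):

```python
def interval_overlap(interval_a, interval_b):
    """Length of the overlap of [x1, x2] and [x3, x4]; 0 if they are disjoint."""
    x1, x2 = interval_a
    x3, x4 = interval_b
    # The overlap runs from the later start to the earlier end.
    return max(0, min(x2, x4) - max(x1, x3))
```

When `x3 >= x1`, this reduces to `min(x2, x4) - x3`, the expression proposed in the issue.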

Problem with loss

I'm training Tiny YOLO on 600 pictures without loading pretrained weights. In the 1st epoch, the loss is very low (about 0.07) and the model is saved. In the 2nd epoch, the loss increases to about 2, and after that it slowly decreases. The problem is that the model was saved in the 1st epoch and can't be saved again, because the loss is always higher than in the 1st epoch! Please help.

Epoch 1/50
38/39 [============================>.] - ETA: 20s - loss: 0.7124 Epoch 00000: val_loss improved from inf to 0.07960, saving model to best_weights.h5
39/39 [==============================] - 1103s - loss: 0.6963 - val_loss: 0.0796
Epoch 2/50
38/39 [============================>.] - ETA: 20s - loss: 0.9856 Epoch 00001: val_loss did not improve
39/39 [==============================] - 1093s - loss: 1.0072 - val_loss: 1.6463
Epoch 3/50
38/39 [============================>.] - ETA: 20s - loss: 1.4972 Epoch 00002: val_loss did not improve
39/39 [==============================] - 1096s - loss: 1.4875 - val_loss: 1.5270
Epoch 4/50
38/39 [============================>.] - ETA: 20s - loss: 1.2201 Epoch 00003: val_loss did not improve
39/39 [==============================] - 1100s - loss: 1.2121 - val_loss: 1.8843
Epoch 5/50
38/39 [============================>.] - ETA: 20s - loss: 1.0720 Epoch 00004: val_loss did not improve
39/39 [==============================] - 1091s - loss: 1.0613 - val_loss: 1.4738
Epoch 6/50
38/39 [============================>.] - ETA: 20s - loss: 0.9571 Epoch 00005: val_loss did not improve
39/39 [==============================] - 1085s - loss: 0.9489 - val_loss: 1.4657
Epoch 7/50
38/39 [============================>.] - ETA: 20s - loss: 0.8815 Epoch 00006: val_loss did not improve
39/39 [==============================] - 1083s - loss: 0.8774 - val_loss: 1.4764
Epoch 8/50
38/39 [============================>.] - ETA: 20s - loss: 0.7926 Epoch 00007: val_loss did not improve
39/39 [==============================] - 1085s - loss: 0.7873 -

PS: I'm using my training set as my test set to evaluate the model. Is that OK?

Problem detecting hand palms with the model

Hi, I'm trying to train the Full YOLO model to detect hand palms. I loaded the pretrained weights and am testing the model with 10 pictures for training and 2 for testing. After training for about 17 epochs, the model predicts the box correctly, but it also predicts some small boxes inside that box with a higher score than the main box. Please help. This is a result picture:

https://i.imgur.com/h1BpflZ.jpg
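One possible explanation for boxes nested inside a larger box: standard greedy non-maximum suppression only removes a box whose IoU with a higher-scoring box exceeds the threshold, and for a small box fully inside a big one the IoU is just the ratio of their areas, which can be well below typical thresholds. The sketch below is generic greedy NMS (boxes as (x_min, y_min, x_max, y_max), threshold 0.3 chosen for illustration, not necessarily this repo's code) and shows both boxes surviving:

```python
def iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    return inter / (area(a) + area(b) - inter)

def greedy_nms(boxes, scores, iou_thresh=0.3):
    """Keep the highest-scoring box, drop boxes overlapping it above thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep

palm = (0, 0, 100, 100)   # main box, lower score
inner = (10, 10, 50, 50)  # small box inside it, higher score
print(iou(palm, inner))                       # 0.16
print(greedy_nms([palm, inner], [0.6, 0.8]))  # [1, 0]: both kept
```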

problems with training

Hello,
I have trouble getting the same training results as you with the last layer reset. The SGD optimizer didn't work for me; I only got results with Adam.
As in darkflow, I don't understand why the mean of the loss function is multiplied by .5:
"loss = .5 * tf.reduce_mean(loss)". I didn't see it in the YOLO paper.

Default Anchors

Why did you choose these anchors? Must I change them if I use another dataset?

Thanks
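On choosing anchors: the YOLOv2 paper picks its anchor priors by running k-means on the (width, height) of the training boxes with 1 − IoU as the distance, so anchors tuned on one dataset are not necessarily a good fit for another. The config values appear to be in grid-cell units (the largest, about 9.8, spans most of a 13-cell grid for a 416-pixel input). A minimal NumPy sketch of that clustering; the function names, random initialization, and iteration count are assumptions, and this is not necessarily the script used in this repo.

```python
import numpy as np

def wh_iou(boxes, anchors):
    """IoU of (w, h) pairs as if all boxes shared one center: (N,2),(K,2) -> (N,K)."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; returns k anchors sorted by area."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        # Smallest 1 - IoU distance is the same as the largest IoU.
        assign = np.argmax(wh_iou(boxes, anchors), axis=1)
        # Move each anchor to the mean of its cluster (keep it if the cluster is empty).
        new = np.array([boxes[assign == j].mean(axis=0) if np.any(assign == j)
                        else anchors[j] for j in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

The box sizes would first be converted to grid units (for example, width in pixels / image width × 13), and `anchors.ravel().tolist()` then gives the flat `[w1, h1, w2, h2, ...]` layout used by the `anchors` field in the config.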

Loss function

I am attempting to implement a custom version of this algorithm, and I can't wrap my head around the loss function. I've read the paper many times, and it seems as though everyone's interpretation of this function differs. Can someone explain to me in simple terms what is happening in this function? I see sigmoid functions and exponentials, while neither of those appears in the original paper.
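The sigmoid and exponential come from the YOLOv2 paper (section "Direct location prediction"), not the original YOLO paper. For a cell at offset (c_x, c_y) and an anchor prior (p_w, p_h), the raw outputs t are decoded as b_x = σ(t_x) + c_x, b_y = σ(t_y) + c_y, b_w = p_w·exp(t_w), b_h = p_h·exp(t_h), and the objectness as Pr(object)·IOU = σ(t_o). A minimal NumPy sketch of that decoding for one box (grid-cell units; the function name is illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(t, cell_xy, anchor_wh):
    """Decode raw outputs t = (t_x, t_y, t_w, t_h, t_o) for one anchor in one cell.

    Returns (b_x, b_y, b_w, b_h, objectness), coordinates in grid-cell units.
    """
    t_x, t_y, t_w, t_h, t_o = t
    c_x, c_y = cell_xy
    p_w, p_h = anchor_wh
    b_x = sigmoid(t_x) + c_x  # sigmoid keeps the center inside its cell
    b_y = sigmoid(t_y) + c_y
    b_w = p_w * np.exp(t_w)   # exp keeps the size positive and scales the prior
    b_h = p_h * np.exp(t_h)
    return b_x, b_y, b_w, b_h, sigmoid(t_o)
```

With all raw outputs at zero, the box sits at the center of its cell with exactly the anchor's size and objectness 0.5, which is a convenient sanity check.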

MultiScale Training

Hey, I want to incorporate multi-scale training into your model. Any idea which resources/papers I should read to implement this?
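As a starting point, the YOLOv2 paper describes multi-scale training as follows: every 10 batches, pick a new input size at random from the multiples of 32 between 320 and 608. Because the network downsamples by 32, the output grid changes with it (e.g. 416 gives 13×13). A minimal sketch of just the schedule (function name and seed are assumptions); rebuilding the model and the target tensors for each size is the harder part and is not shown.

```python
import random

SIZES = list(range(320, 608 + 1, 32))  # 320, 352, ..., 608

def multiscale_schedule(n_batches, every=10, seed=0):
    """Yield (input_size, grid_size) per batch, re-drawing the size every `every` batches."""
    rng = random.Random(seed)
    size = rng.choice(SIZES)
    for b in range(n_batches):
        if b > 0 and b % every == 0:
            size = rng.choice(SIZES)
        yield size, size // 32
```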

Link for hand detection dataset is not correct.

train own dataset

Hi. I want to train with my own dataset.
I want to use my own dataset or only part of the VOC dataset (for example, learning only three classes: person, dog, and cat).
The pretrained model has 20 classes and loads the tiny-yolo-voc.weights file.
But I want to train without tiny-yolo-voc.weights. What should I do?
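For training on only a few VOC classes, one option is to filter the annotations before training so that only objects of the chosen classes remain, and to set `labels` in the config to the same list. A minimal sketch using the standard-library XML parser on a VOC-format annotation (the helper name is hypothetical):

```python
import xml.etree.ElementTree as ET

def keep_classes(voc_xml, wanted):
    """Return a VOC annotation string with only <object>s whose <name> is in `wanted`."""
    root = ET.fromstring(voc_xml)
    for obj in list(root.findall("object")):  # copy the list: we remove while iterating
        if obj.findtext("name") not in wanted:
            root.remove(obj)
    return ET.tostring(root, encoding="unicode")
```

Images that end up with no objects after filtering may also need to be dropped from the training folders.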

Raccoon training

First, congratulations on this implementation.

I'm trying to train the raccoon detector.
I had some problems because I use Windows and Python 3.5, but I solved them all.
I have a GTX 1060 and 32 GB of RAM.

Now my training does not improve. I tried both Full Yolo and MobileNet.

Epoch 1/50
199/200 [============================>.] - ETA: 0s - loss: 0.2688Epoch 00000: val_loss improved from inf to 0.09791, saving model to my_MobileNet_raccoon2.h5
200/200 [==============================] - 77s - loss: 0.2678 - val_loss: 0.0979
Epoch 2/50
199/200 [============================>.] - ETA: 0s - loss: 1.0182Epoch 00001: val_loss did not improve
200/200 [==============================] - 72s - loss: 1.0150 - val_loss: 1.9232
Epoch 3/50
199/200 [============================>.] - ETA: 0s - loss: 0.5514Epoch 00002: val_loss did not improve
200/200 [==============================] - 74s - loss: 0.5503 - val_loss: 1.1664
Epoch 4/50
199/200 [============================>.] - ETA: 0s - loss: 0.4256Epoch 00003: val_loss did not improve
200/200 [==============================] - 74s - loss: 0.4245 - val_loss: 1.5531
Epoch 5/50
199/200 [============================>.] - ETA: 0s - loss: 0.3441Epoch 00004: val_loss did not improve
200/200 [==============================] - 75s - loss: 0.3440 - val_loss: 1.4772
Epoch 00004: early stopping

My config is

{
    "model": {
        "architecture":         "MobileNet",
        "input_size":           416,
        "anchors":              [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "max_box_per_image":    10,
        "labels":               ["raccoon"]
    },

    "train": {
        "train_image_folder":   "E:/Datasets/raccoon_dataset/train_image_folder/",
        "train_annot_folder":   "E:/Datasets/raccoon_dataset/train_annot_folder/",

        "train_times":          10,
        "pretrained_weights":   "",
        "batch_size":           8,
        "learning_rate":        1e-4,
        "nb_epoch":             50,
        "warmup_batches":       250,

        "object_scale":         5.0,
        "no_object_scale":      1.0,
        "coord_scale":          1.0,
        "class_scale":          1.0,

        "saved_weights_name":   "my_MobileNet_raccoon2.h5",
        "debug":                false
    },

    "valid": {
        "valid_image_folder":   "E:/Datasets/raccoon_dataset/valid_image_folder/",
        "valid_annot_folder":   "E:/Datasets/raccoon_dataset/valid_annot_folder/",

        "valid_times":          1
    }
}

Can you help me?
Thanks
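This log and the one under "Problem with loss" both show a very low val_loss in epoch 1 followed by a jump. One thing worth checking is whether epoch 1 falls entirely inside the warm-up period. Assuming steps per epoch = ceil(n_images / batch_size) × train_times (which reproduces the 200/200 above with about 160 training images, batch_size 8 and train_times 10), the arithmetic is:

```python
import math

def steps_per_epoch(n_images, batch_size, train_times):
    # Assumption: one epoch = every image seen `train_times` times, in batches.
    return math.ceil(n_images / batch_size) * train_times

def epoch_fully_in_warmup(n_images, batch_size, train_times, warmup_batches):
    # True if the whole first epoch runs before warm-up ends.
    return warmup_batches >= steps_per_epoch(n_images, batch_size, train_times)

# Values from the config above; 160 training images is an assumption.
print(steps_per_epoch(160, 8, 10))             # 200
print(epoch_fully_in_warmup(160, 8, 10, 250))  # True
```

If the warm-up phase computes the loss differently from later training, the epoch-1 val_loss is not comparable with later epochs, and both checkpointing on val_loss and early stopping would key off that value. Lowering warmup_batches, or not monitoring val_loss until warm-up has ended, are two things that may be worth trying.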

Mismatched tensor shapes

Thanks a lot for making this repo. I have been using it with my own dataset under Python 3. I have resolved a few version-compatibility issues, but now I'm stuck on an issue that I don't think is related to the Python version or library versions.

I'm trying to train YOLO on my own dataset with 54 classes, and I'm getting the error:

Traceback (most recent call last):
  File "train.py", line 135, in <module>
    _main_(args)
  File "train.py", line 131, in _main_
    debug              = config['train']['debug'])
  File "/home/aschoen/git/basic-yolo-keras/frontend.py", line 444, in train
    max_queue_size   = 8)
  File "/home/aschoen/.local/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/aschoen/.local/lib/python3.5/site-packages/keras/engine/training.py", line 2114, in fit_generator
    class_weight=class_weight)
  File "/home/aschoen/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1826, in train_on_batch
    check_batch_axis=True)
  File "/home/aschoen/.local/lib/python3.5/site-packages/keras/engine/training.py", line 1411, in _standardize_user_data
    exception_prefix='target')
  File "/home/aschoen/.local/lib/python3.5/site-packages/keras/engine/training.py", line 153, in _standardize_input_data
    str(array.shape))
ValueError: Error when checking target: expected lambda_2 to have shape (None, 13, 13, 5, 59) but got array with shape (16, 13, 13, 5, 6)

I'm getting the same error regardless of whether I use the train.py script or the Yolo Step-by-Step notebook.

I notice that the raccoon example has one category, so the last dimension of each tensor seems to correspond to the number of classes + 5. So I assume this problem arises because some part of the train function does not know the correct number of classes.

I set the LABELS variable, and the model seems to get the correct output dimension via output = Reshape((GRID_H, GRID_W, BOX, 4 + 1 + CLASS))(x), so I think the model has the correct size and some other operand has the wrong size. However, it's hard for me to tell where the error is coming from, so I haven't been able to identify the other tensor and fix its size.
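The shapes in the error fit the first issue in "Training with more than one class": the model outputs 4 + 1 + 54 = 59 values per anchor, while a target built with a last dimension of 4 + 1 + 1 = 6 only has room for a single class value, and Keras checks the target array's shape against the output shape. A minimal sketch of allocating the target with one slot per class and writing a one-hot label (NumPy; the function names and the box layout are assumptions, with the grid and anchor counts taken from the error message):

```python
import numpy as np

def empty_targets(batch_size, grid_h, grid_w, n_box, n_class):
    # Last axis: 4 box coordinates + 1 objectness + one slot per class.
    return np.zeros((batch_size, grid_h, grid_w, n_box, 4 + 1 + n_class))

def set_object(y, b, row, col, anchor, box, class_idx):
    """Write one ground-truth object into the target tensor (one-hot class)."""
    y[b, row, col, anchor, 0:4] = box    # x, y, w, h
    y[b, row, col, anchor, 4] = 1.0      # objectness
    y[b, row, col, anchor, 5:] = 0.0
    y[b, row, col, anchor, 5 + class_idx] = 1.0

y = empty_targets(16, 13, 13, 5, 54)
print(y.shape)  # (16, 13, 13, 5, 59)
```

A one-hot target like this pairs with a dense (non-sparse) class cross-entropy in the loss.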

confidence score inconsistency with original paper

This is a question about the content of the YOLOv2 paper. In the paper, the predicted confidence pred_c is the product of the object probability and the IOU:

Pr(object)*IOU(box,object) = sigmoid(pred_c)

However, in the interpret_netout function, I saw that you compute Pr(class) = Pr(class) * sigmoid(pred_c), where Pr(class) comes from softmax(pred_prob). While it makes sense to do so, I don't see how that relates to the equation above from the paper, and I don't know whether this is the approach used in Darknet.

Could you help me understand your approach here? Thanks!
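For reference, the test-time formula in the original YOLO paper multiplies the conditional class probabilities by the box confidence: Pr(class_i | object) × Pr(object) × IOU = Pr(class_i) × IOU. If σ(pred_c) is read as Pr(object) × IOU and softmax(pred_prob) as Pr(class_i | object), then their product is exactly this class-specific score. A minimal NumPy sketch for one predicted box (names are illustrative):

```python
import numpy as np

def class_scores(pred_c, pred_prob):
    """Class-specific confidence for one box: sigmoid(pred_c) * softmax(pred_prob)."""
    objectness = 1.0 / (1.0 + np.exp(-pred_c))  # ~ Pr(object) * IOU
    e = np.exp(pred_prob - np.max(pred_prob))
    cond_probs = e / e.sum()                    # ~ Pr(class_i | object)
    return objectness * cond_probs              # ~ Pr(class_i) * IOU
```

Since the conditional probabilities sum to 1, the class scores of a box always sum to its objectness.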

Slow training speed

Training is very slow: almost 2000 s per epoch on a dataset of 300 images.

I'm using train.py because I haven't yet figured out how to get Tiny YOLO working through the Jupyter implementation (I'm still working on it, though, so I'll keep everyone updated!).

My CPU usage is almost maxed out, but my GPU usage is essentially zero. Does anyone have suggestions on how to offload some of the work onto the GPU, or a working Jupyter implementation of Tiny YOLO?

nearly all person detection results

Hi, thanks for the awesome work. I have a problem when I follow the process in the notebook (ipynb): the loss plateaus at around 5.7, and when I lower the threshold to see more detection boxes, nearly all detections on the images are persons. I cannot get results like the ones you show, and I don't know why. Do you have any ideas? Do I need to train longer? It seems the output will be all persons anyway.
Thanks so much~