
wizyoung / yolov3_tensorflow


Complete YOLO v3 TensorFlow implementation. Support training on your own dataset.

License: MIT License

Python 100.00%
yolov3 tensorflow object-detection real-time tensorflow-yolo

yolov3_tensorflow's Introduction

YOLOv3_TensorFlow

NOTE: This repo is no longer maintained (actually, I dropped support a long time ago), as I have been using PyTorch for a year now. Life is short, I use PyTorch.


1. Introduction

This is my implementation of YOLOv3 in pure TensorFlow. It contains the full pipeline of training and evaluation on your own dataset. The key features of this repo are:

  • Efficient tf.data pipeline
  • Weights converter (converts pretrained darknet weights on the COCO dataset to a TensorFlow checkpoint).
  • Extremely fast GPU non-maximum suppression.
  • Full training and evaluation pipeline.
  • K-means algorithm to select prior anchor boxes.

2. Requirements

Python version: 2 or 3

Packages:

  • tensorflow >= 1.8.0 (theoretically any version that supports tf.data is ok)
  • opencv-python
  • tqdm

3. Weights conversion

The pretrained darknet weights file can be downloaded here. Place this weights file under directory ./data/darknet_weights/ and then run:

python convert_weight.py

The converted TensorFlow checkpoint file will then be saved to the ./data/darknet_weights/ directory.

You can also download the TensorFlow checkpoint I converted via [Google Drive link] or [Github Release] and place it in the same directory.

4. Running demos

There are some demo images and videos under ./data/demo_data/. You can run the demos as follows:

Single image test demo:

python test_single_image.py ./data/demo_data/messi.jpg

Video test demo:

python video_test.py ./data/demo_data/video.mp4

Some results:

Compare the kite detection results with TensorFlow's official API result here.

(The kite detection result is under input image resolution 1344x896)

5. Inference speed

How fast is inference? With images scaled to 416x416:

Backbone                 GPU        Time (ms)
Darknet-53 (paper)       Titan X    29
Darknet-53 (my impl.)    Titan XP   ~23

Why is it so fast? Check the ImageNet classification result comparison from the paper:

6. Model architecture

For a better understanding of the model architecture, you can refer to the following picture. Many thanks to Levio for the excellent work!

7. Training

7.1 Data preparation

(1) annotation file

Generate train.txt/val.txt/test.txt files under the ./data/my_data/ directory, one line per image, in the format image_index image_absolute_path img_width img_height box_1 box_2 ... box_n. Each box_x has the format label_index x_min y_min x_max y_max. (The origin of coordinates is the top-left corner: top-left => (x_min, y_min), bottom-right => (x_max, y_max).) image_index is the zero-based line index. label_index is in the range [0, class_num - 1].

For example:

0 xxx/xxx/a.jpg 1920 1080 0 453 369 473 391 1 588 245 608 268
1 xxx/xxx/b.jpg 1920 1080 1 466 403 485 422 2 793 300 809 320
...
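A minimal sketch of checking one annotation line against this format before training. The function name parse_annotation_line is hypothetical and not part of this repo; it assumes space-separated fields and image paths that contain no spaces.

```python
from typing import List, Tuple

# (label_index, x_min, y_min, x_max, y_max)
Box = Tuple[int, float, float, float, float]


def parse_annotation_line(line: str, class_num: int):
    """Parse 'image_index image_absolute_path img_width img_height box_1 ... box_n' and validate it."""
    parts = line.strip().split()
    if len(parts) < 4 or (len(parts) - 4) % 5 != 0:
        raise ValueError(f"malformed line (expected 4 + 5*n fields): {line!r}")
    image_index = int(parts[0])
    path = parts[1]
    width, height = int(parts[2]), int(parts[3])
    boxes: List[Box] = []
    for i in range(4, len(parts), 5):
        label = int(parts[i])
        x_min, y_min, x_max, y_max = map(float, parts[i + 1:i + 5])
        if not 0 <= label < class_num:
            raise ValueError(f"label_index {label} outside [0, {class_num - 1}]")
        # Coordinates are in original-image pixels, origin at the top-left corner.
        if not (0 <= x_min < x_max <= width and 0 <= y_min < y_max <= height):
            raise ValueError(f"box {(x_min, y_min, x_max, y_max)} invalid for a {width}x{height} image")
        boxes.append((label, x_min, y_min, x_max, y_max))
    return image_index, path, width, height, boxes
```

Running every line of train.txt/val.txt through a check like this surfaces out-of-range label_index values and shuffled box fields early, rather than as an IndexError deep inside the data pipeline.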

Since many users report using tools like LabelImg to generate XML annotations, I added a demo script for the VOC dataset to do the conversion. Check the misc/parse_voc_xml.py file for more details.

(2) class_names file:

Generate the data.names file under the ./data/my_data/ directory. Each line represents a class name.

For example:

bird
person
bike
...

The COCO dataset class names file is placed at ./data/coco.names.

(3) prior anchor file:

Use the k-means algorithm to get the prior anchors:

python get_kmeans.py

Then you will get 9 anchors and the average IoU. Save the anchors to a txt file.
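For reference, here is a minimal sketch of the usual YOLO-style anchor clustering, where the distance between a box and a cluster center is 1 - IoU of their (w, h) pairs with centers aligned. It is not this repo's get_kmeans.py; the median update rule and the function names are assumptions.

```python
import numpy as np


def iou_wh(boxes, clusters):
    """IoU between (w, h) pairs, treating every box and cluster as sharing one center."""
    inter = (np.minimum(boxes[:, None, 0], clusters[None, :, 0])
             * np.minimum(boxes[:, None, 1], clusters[None, :, 1]))
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_clusters = clusters[:, 0] * clusters[:, 1]
    return inter / (area_boxes[:, None] + area_clusters[None, :] - inter)


def kmeans_anchors(boxes, k=9, seed=0, max_iter=300):
    """Cluster [N, 2] (w, h) boxes with distance 1 - IoU; returns (anchors sorted by area, avg IoU)."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    clusters = boxes[rng.choice(len(boxes), k, replace=False)].copy()
    assign = np.full(len(boxes), -1)
    for _ in range(max_iter):
        # The smallest 1 - IoU distance is the largest IoU.
        new_assign = np.argmax(iou_wh(boxes, clusters), axis=1)
        if np.array_equal(new_assign, assign):
            break
        assign = new_assign
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old center if a cluster goes empty
                clusters[j] = np.median(members, axis=0)
    avg_iou = float(iou_wh(boxes, clusters).max(axis=1).mean())
    order = np.argsort(clusters[:, 0] * clusters[:, 1])
    return clusters[order], avg_iou
```

The returned average IoU plays the same role as the one the script reports: it measures how well the anchors fit the boxes, with 1 - IoU rather than Euclidean distance so large boxes do not dominate.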

The COCO dataset anchors provided by YOLO's author are at ./data/yolo_anchors.txt; you can use them too.

The anchors computed by the k-means script are on the resized image scale. The default resize method is letterbox resize, i.e., the original aspect ratio is kept in the resized image.
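A minimal sketch of the letterbox mapping, assuming the image is scaled by a single factor and padded equally on both sides (the repo's own resize helper may round or place the padding differently). Because there is only one scale factor, boxes and anchors are transformed consistently, and annotations can presumably stay in original-image pixel coordinates. All function names here are hypothetical.

```python
def letterbox_params(ori_w, ori_h, new_w=416, new_h=416):
    """Single scale factor plus left/top padding for an aspect-preserving resize."""
    scale = min(new_w / ori_w, new_h / ori_h)
    resized_w, resized_h = round(ori_w * scale), round(ori_h * scale)
    pad_x, pad_y = (new_w - resized_w) // 2, (new_h - resized_h) // 2
    return scale, pad_x, pad_y


def letterbox_boxes(boxes, ori_w, ori_h, new_w=416, new_h=416):
    """Map (x_min, y_min, x_max, y_max) boxes from original pixels to letterboxed pixels."""
    scale, pad_x, pad_y = letterbox_params(ori_w, ori_h, new_w, new_h)
    return [(x0 * scale + pad_x, y0 * scale + pad_y, x1 * scale + pad_x, y1 * scale + pad_y)
            for x0, y0, x1, y1 in boxes]


def letterbox_anchor(anchor_w, anchor_h, ori_w, ori_h, new_w=416, new_h=416):
    """Anchors only need the scale factor; padding shifts position, not size."""
    scale, _, _ = letterbox_params(ori_w, ori_h, new_w, new_h)
    return anchor_w * scale, anchor_h * scale
```

For an 832x416 image resized to 416x416, the scale is 0.5 and 104 pixels of padding go above and below, so a full-image box becomes (0, 104, 416, 312).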

7.2 Training

Use train.py. The hyper-parameters and their descriptions can be found in args.py:

CUDA_VISIBLE_DEVICES=GPU_ID python train.py

Check args.py for more details. You should set the parameters yourself for your specific task.

8. Evaluation

Use eval.py to evaluate the validation or test dataset. The parameters are as follows:

$ python eval.py -h
usage: eval.py [-h] [--eval_file EVAL_FILE] 
               [--restore_path RESTORE_PATH]
               [--anchor_path ANCHOR_PATH] 
               [--class_name_path CLASS_NAME_PATH]
               [--batch_size BATCH_SIZE]
               [--img_size [IMG_SIZE [IMG_SIZE ...]]]
               [--num_threads NUM_THREADS]
               [--prefetech_buffer PREFETECH_BUFFER]
               [--nms_threshold NMS_THRESHOLD]
               [--score_threshold SCORE_THRESHOLD] 
               [--nms_topk NMS_TOPK]

Check eval.py for more details. You should set the parameters yourself.

You will get the loss, recall, precision, average precision and mAP metrics.

For higher mAP, you should set score_threshold to a small value.
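Why a small score_threshold helps: mAP integrates precision over the whole recall range, so discarding low-confidence boxes early truncates the precision-recall curve. Below is a minimal NumPy sketch of score filtering followed by greedy NMS for a single class, showing where score_threshold, nms_threshold and nms_topk act. It is a CPU illustration, not the repo's GPU NMS, and the default values are illustrative rather than eval.py's.

```python
import numpy as np


def filter_and_nms(boxes, scores, score_threshold=0.01, nms_threshold=0.45, nms_topk=200):
    """boxes: [N, 4] as (x_min, y_min, x_max, y_max) with positive area; scores: [N] for one class."""
    keep_mask = scores >= score_threshold          # 1) drop low-confidence boxes
    boxes, scores = boxes[keep_mask], scores[keep_mask]
    order = np.argsort(-scores)                    # highest score first
    kept = []
    while order.size > 0 and len(kept) < nms_topk:  # 3) cap the number of outputs
        i = order[0]
        kept.append(i)
        rest = order[1:]
        xx0 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy0 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx1 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy1 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx1 - xx0, 0, None) * np.clip(yy1 - yy0, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= nms_threshold]         # 2) suppress heavy overlaps
    kept = np.array(kept, dtype=int)
    return boxes[kept], scores[kept]
```

In a multi-class detector this runs once per class.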

9. Some tricks

Here are some training tricks in my experiment:

(1) Apply the two-stage training strategy or the one-stage training strategy:

Two-stage training:

First stage: Restore the darknet53_body weights from the COCO checkpoint, and train yolov3_head with a large learning rate such as 1e-3 until the loss reaches a low level.

Second stage: Restore the weights from the first stage, then train the whole model with a small learning rate such as 1e-4 or smaller. At this stage, remember to restore the optimizer parameters if you use an optimizer like Adam.

One-stage training:

Just restore the whole weight file except the last three convolution layers (Conv_6, Conv_14, Conv_22). In this case, watch out for possible NaN loss values.
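Both strategies come down to choosing which checkpoint variables to restore. A minimal sketch over variable names: the yolov3/yolov3_head/Conv_6/weights style follows this repo's checkpoint naming, while the yolov3/darknet53_body/ prefix, the strategy labels and the helper name are assumptions. In TF 1.x, you would filter the graph's variables by name this way and pass the survivors as var_list to tf.train.Saver for restoring.

```python
import re
from typing import Iterable, List

# Last conv layer of each of the three detection scales (the layers whose shape depends on class_num).
OUTPUT_LAYERS = ("Conv_6", "Conv_14", "Conv_22")
# Trailing "/" so that e.g. Conv_60 is not mistaken for Conv_6.
_OUTPUT_RE = re.compile(r"yolov3/yolov3_head/(%s)/" % "|".join(OUTPUT_LAYERS))


def select_restore_vars(var_names: Iterable[str], strategy: str) -> List[str]:
    """Names of variables to restore for a given (hypothetical) strategy label."""
    if strategy == "backbone_only":    # first stage of two-stage training
        return [n for n in var_names if n.startswith("yolov3/darknet53_body/")]
    if strategy == "all_but_output":   # one-stage training
        return [n for n in var_names if not _OUTPUT_RE.search(n)]
    raise ValueError(f"unknown strategy: {strategy}")
```

The excluded output layers are the ones whose channel count depends on the number of classes, which is why they cannot be loaded from a COCO checkpoint into a model with a different class count.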

(2) I've included many useful training strategies in args.py:

  • Cosine decay of lr (SGDR)
  • Multi-scale training
  • Label smoothing
  • Mix up data augmentation
  • Focal loss

These are all good strategies, but that does not mean they will definitely improve performance. You should choose the strategies appropriate for your own task.
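Two of these strategies are easy to state precisely. Below is a minimal sketch of a linear warm-up followed by a single cosine decay cycle (SGDR additionally restarts the cycle, which is omitted here), plus standard label smoothing on a one-hot vector. The function and parameter names are hypothetical, and the exact formulas and defaults in args.py may differ.

```python
import math


def warmup_cosine_lr(step, total_steps, warm_up_steps, lr_init=1e-3, lr_min=1e-6):
    """Linear warm-up from 0 to lr_init, then one cosine decay from lr_init down to lr_min."""
    if step < warm_up_steps:
        return lr_init * step / warm_up_steps
    progress = min(1.0, (step - warm_up_steps) / max(1, total_steps - warm_up_steps))
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + math.cos(math.pi * progress))


def smooth_labels(one_hot, epsilon=0.01):
    """Move epsilon of the probability mass from the true class to a uniform distribution."""
    num_classes = len(one_hot)
    return [y * (1.0 - epsilon) + epsilon / num_classes for y in one_hot]
```

A longer warm-up keeps the first updates small, which is one reason it helps against early NaN losses.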

This paper from gluon-cv showed that data augmentation is critical to YOLOv3, which is completely consistent with my own experiments. Some data augmentation strategies that seem reasonable may lead to poor performance. For example, after I introduced random color jittering, the mAP on my own dataset dropped heavily. So I hope you pay extra attention to data augmentation.

(3) NaN loss? Set a larger warm_up_epoch number or a smaller learning rate and try a few more times. If you fine-tune the whole model, Adam may sometimes produce NaN values; you can try the momentum optimizer instead.

10. Fine-tune on VOC dataset

I did a quick training run on the VOC dataset. The parameters I used in my experiments are included in the misc/experiments_on_voc/ folder for your reference. The training set is VOC 2007 + 2012 trainval, and the test set is VOC 2007 test.

In the end, with 416x416 input images, I got 87.54% test mAP (not using the 07 metric), without spending much effort on fine-tuning. You should get similar or better results.
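For readers unsure what "not using the 07 metric" means: the VOC 2007 devkit averaged interpolated precision at 11 recall points, while later VOC evaluation integrates the precision envelope over all recall points, which usually gives a somewhat different number. A minimal sketch of both, modeled on the voc_ap helper found in common Faster R-CNN codebases; inputs are assumed to be recall/precision arrays ordered by descending detection score.

```python
import numpy as np


def average_precision(recall, precision, use_07_metric=False):
    """AP from recall/precision arrays ordered by descending detection score."""
    recall = np.asarray(recall, dtype=float)
    precision = np.asarray(precision, dtype=float)
    if use_07_metric:
        # VOC2007: mean of the max precision at recall >= t, for 11 thresholds t = 0.0, 0.1, ..., 1.0.
        ap = 0.0
        for t in np.linspace(0.0, 1.0, 11):
            above = precision[recall >= t]
            ap += (above.max() if above.size else 0.0) / 11.0
        return float(ap)
    # All-point: make precision monotonically non-increasing, then sum the area under the step curve.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(mpre.size - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    change = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[change + 1] - mrec[change]) * mpre[change + 1]))
```

For recall [0.45, 1.0] and precision [1.0, 0.5], the all-point AP is 0.45 * 1.0 + 0.55 * 0.5 = 0.725, while the 11-point AP is (5 * 1.0 + 6 * 0.5) / 11, about 0.727.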

My pretrained weights on VOC dataset can be downloaded here.

11. TODO

[ ] Multi-GPU training with synchronized batch norm.

[ ] Maybe TF 2.0?


Credits:

I referred to many fantastic repos during the implementation:

YunYang1994/tensorflow-yolov3

qqwweee/keras-yolo3

eriklindernoren/PyTorch-YOLOv3

pjreddie/darknet

dmlc/gluon-cv

yolov3_tensorflow's People

Contributors

matthew-jack, wizyoung


yolov3_tensorflow's Issues

How to Retrain Model?

I've already changed restore_path and set it to the right path, but it didn't work. Could anyone help me out? (I want to change the lr.)
[WeChat screenshot 20190411101533]

OutOfRangeError (see above for traceback): End of sequence

Training error

I get the training error shown below:
[screenshot from 2019-01-27 11-47-15]

I have no negative samples in my train.txt, and I left a blank line at the end of both train.txt and val.txt.
test.txt
train.txt

What could be causing this error?

About training

Hello, how should I understand the following sentences:
First stage: Restore darknet53_body part weights from COCO checkpoints, train the yolov3_head with big learning rate like 1e-3 until the loss reaches to a low level, like less than 1.
Second stage: Restore the weights from the first stage, then train the whole model with small learning rate like 1e-4 or smaller. At this stage remember to restore the optimizer parameters if you use optimizers like adam.

speed question

Hello, your project is very good; thank you very much. However, inference is slow in my tests. I trained on my own training set with only one category. Testing one picture takes 2.4 s, which is much slower than your reported time. I also tried building the model and NMS before creating the session in test_single_image.py, but it had no effect. What should I do? (By the way, my TensorFlow version is 1.11.0, with CUDA 9.0 and cuDNN 7.)

problems

Epoch: 1, global_step: 200 | loss: total: nan, xy: 11.83, wh: 1146.74, conf: nan, class: nan | Last batch: rec: 0.000, prec: 0.000 | lr: 0.0004329
Traceback (most recent call last):
File "E:/net/new/YOLOv3_TensorFlow-master/train.py", line 160, in <module>
'Gradient exploded! Please train again and you may need modify some parameters.')
ArithmeticError: Gradient exploded! Please train again and you may need modify some parameters.

Performance on VOC 2007 test set

Has anyone tried training on VOC 2007+2012 trainval?

I only got 0.742 mAP on VOC 2007 test:

recall: 0.703, precision: 0.793, total_loss: 0.440, loss_xy: 0.017, loss_wh: 0.007, loss_conf: 0.376, loss_class: 0.040

I applied the 2-stage training:

  1. I loaded the darknet weights first and trained only yolov3_head.
  2. When the loss was < 2, I lowered the learning rate and trained the whole model.

about convert_weights

Hello wizyoung, when I run python convert_weight.py, I get the following error:
Traceback (most recent call last):
File "convert_weight.py", line 30,
File "D:\ocr\YOLOv3_TensorFlow\utils\misc_utils.py", line 101, in load_weights
(shape[3], shape[2], shape[0], shape[1]))
ValueError: cannot reshape array of size 285787 into shape (256,128,3,3)
What should I do? Thanks.

Help!

Error when running train.py. I hope you can help, thank you!

File "D:\Tensorflow yolov3\YOLOv3_TensorFlow-master\utils\data_utils.py", line 142, in parse_data
img, boxes = resize_image_and_correct_boxes(img, boxes, img_size)

File "D:\Tensorflow yolov3\YOLOv3_TensorFlow-master\utils\data_utils.py", line 53, in resize_image_and_correct_boxes
boxes[:, 0] = boxes[:, 0] / ori_width * new_width

IndexError: too many indices for array

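Question about preparing my own training dataset

Hello! I hope you can find a moment in your busy schedule to read my question and give me some guidance. Many thanks!
How do I prepare a new dataset for your version of YOLOv3? Is it different from the original YOLOv3?
In your example xxx/xxx/1.jpg 0 453 369 473 391 1 588 245 608 268, is the order of the numbers: class, top-left x of the box, top-left y of the box, bottom-right x of the box, bottom-right y of the box?
Do you have any links to articles about preparing a custom dataset? What are the general steps?
Thanks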

Question about the 5-D tensor

Hi, why does YOLOv3 use reshape to turn the 4-D tensor into a 5-D one? Is it because this suits the GPU better, or for some other reason?
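A small NumPy sketch of what the reshape does, assuming the head's 3 * (5 + class_num) output channels are laid out anchor by anchor. Reshaping only reinterprets the same data (no copy), so as far as the arithmetic goes its benefit is convenience rather than speed: each anchor's box offsets, objectness and class scores become slices along one axis. The variable names are illustrative.

```python
import numpy as np

batch, grid, num_anchors, class_num = 2, 13, 3, 80

# Raw 4-D head output: [batch, grid, grid, num_anchors * (5 + class_num)].
raw = np.random.rand(batch, grid, grid, num_anchors * (5 + class_num)).astype(np.float32)

# 5-D view: one row of (5 + class_num) values per anchor per grid cell.
per_anchor = raw.reshape(batch, grid, grid, num_anchors, 5 + class_num)

# Split the last axis into (x, y), (w, h), objectness and class scores.
box_centers, box_sizes, conf_logits, class_logits = np.split(per_anchor, [2, 4, 5], axis=-1)
```

The second anchor's first value in a cell is simply channel 85 of the raw output, which is what makes per-anchor decoding and loss computation straightforward.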

Problem of 'y_true' in data_utils.py

When I train your model on my own data, I run into the following problem. I've attached my data. Thank you!

train.txt

File "D:\Project_Data\YOLOv3_TensorFlow\utils\data_utils.py", line 122, in process_box
y_true[feature_map_group][y, x, k, 5+c] = 1.

IndexError: index 10 is out of bounds for axis 3 with size 10

about freeze graph and convert TFlite problem

Hi, thank you so much for sharing your code! I have some questions and hope you can help me. I am trying to freeze the graph and convert the *.pb file to a TFLite model so that I can deploy it on Android phones. This is my source code:

But I get an unsupported operation error, as shown below:

b'2019-01-29 16:12:22.550452: I tensorflow/contrib/lite/toco/import_tensorflow.cc:1080] Converting unsupported operation: ResizeNearestNeighbor\n

Regarding the TFLite unsupported operation error, could you help me modify the YOLOv3 source code so that this operation is supported?

I ran into the following error when running the code. Please give me some hints.

Traceback (most recent call last):
File "test_single_image.py", line 52, in <module>
saver.restore(sess, args.restore_path)
File "C:\Program Files\Python35\lib\site-packages\tensorflow\python\training\saver.py", line 1538, in restore
+ compat.as_text(save_path))
ValueError: The passed save_path is not a valid checkpoint: ./data/darknet_weights/yolov3.ckpt

A problem about the txt file type

2019-03-06 22:22:51.632340: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1306] Invalid argument: TypeError: a bytes-like object is required, not 'str'
Traceback (most recent call last):

File "C:\Users\Xiaos\Anaconda3\lib\site-packages\tensorflow\python\ops\script_ops.py", line 158, in call
ret = func(*args)

File "F:\DL\YOLOv3_TensorFlow\utils\data_utils.py", line 130, in parse_data
pic_path, boxes, labels = parse_line(line)

File "F:\DL\YOLOv3_TensorFlow\utils\data_utils.py", line 19, in parse_line
s = line.strip().split(' ')

TypeError: a bytes-like object is required, not 'str'

What does this error mean? Could someone tell me how to construct train.txt? I use the COCO dataset; the txt file looks like this:
F:/PascalVOCdataset/VOCdevkit/VOC2007/JPEGImages/003949.jpg 11 210 151 245 193

Does anyone know where the error is?
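A likely cause, offered as a hedged reading of the traceback: in TF 1.x, tf.py_func hands string tensors to the Python function as bytes objects, so under Python 3 `line.strip().split(' ')` mixes bytes with a str separator. A minimal sketch of decoding first; the helper names are hypothetical. Separately, the sample line above has no image_index, img_width or img_height fields, which the annotation format in the Data preparation section requires.

```python
def to_text(line) -> str:
    """Return line as str, decoding bytes (e.g. what tf.py_func passes for a tf.string input)."""
    if isinstance(line, bytes):
        return line.decode("utf-8")
    return line


def split_fields(line):
    # Decode before splitting so that a str separator works under Python 3.
    return to_text(line).strip().split(" ")
```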

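How to reproduce neural networks

Hello author, could you give a systematic introduction to some techniques for reproducing neural networks? The code you wrote for this reimplementation is beautiful. Teaching myself this seems very hard: when I need to write a feature, I often don't know how to find out whether the function I need exists, where it is, or what it is called. How do you solve these problems? Surely you can't memorize every library's API first. Thanks for replying.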

Question about the prepared data

Should I transform the original images' bx, by, bw, bh?
E.g. if my image is 213x213 with bx = 1 and by = 1, should I resize the image to 416x416, replace bx with 2, and put the transformed image's path and bx, by, bh, bw in data.txt?

I don't see the multi-scale training

Hi, I don't see the multi-scale training. If I go from a [416, 416] model to a [512, 512] model, do I need to reset the model after saving?
If I load the model trained at [416, 416] and set img_size to [512, 512] for retraining, does that count as multi-scale training?
As I understand it, multi-scale training sets up several scales [320, 416, 512, 640] and randomly picks one size for each batch during training.
Thank you in advance for your assistance. @wizyoung
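A minimal sketch of per-batch size sampling for multi-scale training. Sizes are multiples of 32 because the coarsest YOLOv3 feature map is the input downsampled 32 times (13x13 at 416x416). The 320 to 608 range and the helper names are assumptions, not necessarily what this repo uses. Since the network is fully convolutional, its weights do not depend on the input size, which is why a checkpoint trained at 416x416 can in principle be fine-tuned at other sizes without resetting it.

```python
import random

# Assumed candidate sizes: multiples of 32 from 320 to 608 inclusive.
MULTI_SCALE_SIZES = list(range(320, 608 + 1, 32))


def sample_img_size(rng=random):
    """Pick one square input size for the next batch."""
    size = rng.choice(MULTI_SCALE_SIZES)
    return [size, size]


def grid_sizes(size):
    """Feature-map sizes of the three YOLOv3 output scales (strides 32, 16, 8)."""
    return size // 32, size // 16, size // 8
```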

Error regarding 'exception'

Hello

I tried to train on my data, but I received the error below:
[[{{node PyFunc}}]]
[[{{node IteratorGetNext}}]]

'During handling of the above exception, another exception occurred:'

[[{{node PyFunc}}]]
[[node IteratorGetNext (defined at train.py: 60) ]]

and line 60 is:
image_ids, image, y_true_13, y_true_26, y_true_52 = iterator.get_next()

My system has no GPU and I am running the program on CPU. How can I solve it?

Regards

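A problem about the k-means algorithm for getting the prior anchors

Hello, I read the source code carefully and there is one place I really don't understand! In get_kmeans.py, when k-means clusters the prior anchors, the [W, H] data used is not scaled to [0, 1], but during training the anchors are scaled as anchor_w = anchor_w / W * 416, anchor_h = anchor_h / H * 416. My question is: would it be better to scale the samples' widths and heights [W, H] to [0, 1] before running k-means clustering? Looking forward to your reply, thanks!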

About precision and recall

I have a question about precision and recall. I trained a model on VOC2007, and precision and recall reach 90%+ during training, but when I run on the eval dataset, precision is 80% and, more importantly, recall is only 30%! What can I do to improve precision and especially recall?

leaky_relu or relu

Hello author, the original YOLOv3 cfg file uses leaky activation layers, but this code uses the default relu. Doesn't that make a difference?
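For reference, darknet's leaky activation uses a negative slope of 0.1. A minimal NumPy sketch is below; note that tf.nn.leaky_relu defaults to alpha=0.2, so matching darknet in TensorFlow means passing alpha=0.1 explicitly. Whether this repo's layers use leaky or plain ReLU should be checked in its model code rather than assumed; with converted darknet weights, a mismatched activation would change the network's outputs.

```python
import numpy as np


def leaky_relu(x, alpha=0.1):
    """Darknet-style leaky ReLU: x for x > 0, alpha * x otherwise."""
    x = np.asarray(x, dtype=np.float32)
    return np.where(x > 0, x, alpha * x)
```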

issues on indexError

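Hello, I tried training on Pascal VOC 2007 data and got the following error. Please take a look, thanks.

train.txt and val.txt have been converted to the following format:
/.../JPEGImages/002611.jpg 14 1 397 1 351 11 152 337 131 298
/.../JPEGImages/002332.jpg 14 121 254 39 231 14 337 494 61 182 14 101 500 81 362
/.../JPEGImages/005445.jpg 7 32 353 233 497
Previously, the bbox coordinates were values scaled relative to width/height; training ran, but after 3 iterations everything became NaN.
I guessed the values were too small, so I only changed the magnitude of xmin, ymin, xmax, ymax in the bboxes, and then the error below appeared. Please take a look when you have time.

The .names file has been replaced with the 20 VOC classes.

Error message: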
File "/home/jiapy/workspace/yolo-v3/YOLOv3_TensorFlow/utils/data_utils.py", line 115, in process_box
y_true[feature_map_group][y, x, k, 0:2] = box_centers[i]
IndexError: index 59 is out of bounds for axis 0 with size 52

Got this error

tensorflow.python.framework.errors_impl.InvalidArgumentError: 0-th value returned by pyfunc_0 is int32, but expects int64
What should I do about this error? Experts, please help.
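A hedged explanation: the message means a tf.py_func was declared (via Tout) to return int64, but the Python function returned int32. One common way this happens is on Windows, where NumPy 1.x's default integer type is 32-bit, so np.array([1, 2]) is int32 there. A minimal sketch of pinning the dtypes explicitly; the function name is hypothetical.

```python
import numpy as np


def make_outputs(image_index, labels):
    """Return py_func outputs with pinned dtypes, so they match Tout=int64 on every platform."""
    # np.asarray without a dtype would use the platform default integer (int32 on Windows with NumPy 1.x).
    index_arr = np.asarray(image_index, dtype=np.int64)
    label_arr = np.asarray(labels, dtype=np.int64)
    return index_arr, label_arr
```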

loss nan

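Hello author, I'm training 4 classes with your model: person, car, traffic light, bicycle.
I reimplemented your code, and the network runs! Many thanks.

At first everything ran normally and the loss kept decreasing, but at some point my loss became NaN. What could be the problem?

I did not add any data augmentation of my own.
Thanks.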

Assign requires shapes of both tensors to match error

I have trained a model successfully, but when I use test_single_image.py to test it, I get the following error. I can't find the reason; can anyone help me? Thanks.

InvalidArgumentError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a mismatch between the current graph and the graph from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Assign requires shapes of both tensors to match. lhs shape= [1,1,1024,21] rhs shape= [1,1,1024,255]
[[Node: save/Assign_350 = Assign[T=DT_FLOAT, _class=["loc:@yolov3/yolov3_head/Conv_6/weights"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](yolov3/yolov3_head/Conv_6/weights, save/RestoreV2/_701)]]
[[Node: save/RestoreV2/_372 = _SendT=DT_FLOAT, client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_378_save/RestoreV2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
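The shapes in this message can be decoded: each YOLOv3 output conv layer has anchors_per_scale * (5 + class_num) channels, with 3 anchors per scale. So 255 corresponds to 80 classes (COCO) and 21 to 2 classes. The graph being built (lhs) expects 2 classes while the checkpoint being restored (rhs) has 80, which suggests that restore_path points at a COCO checkpoint rather than the newly trained one, or that the class names file does not match. A small sketch of the arithmetic; the helper names are hypothetical.

```python
ANCHORS_PER_SCALE = 3


def head_channels(class_num, anchors_per_scale=ANCHORS_PER_SCALE):
    """Output channels of a YOLOv3 detection conv: per anchor, 4 box values + 1 objectness + class scores."""
    return anchors_per_scale * (5 + class_num)


def class_num_from_channels(channels, anchors_per_scale=ANCHORS_PER_SCALE):
    """Invert head_channels, e.g. to read the class count off a checkpoint's Conv_6 weights shape."""
    per_anchor, remainder = divmod(channels, anchors_per_scale)
    if remainder or per_anchor <= 5:
        raise ValueError(f"{channels} channels is not a valid YOLOv3 head size")
    return per_anchor - 5
```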

Training issue

I am training with custom data.
The image size is 720x1280.
My training data looks like this:

[image]
Training does not start, and
I am getting an error like this:

OutOfRangeError: End of sequence
[[node IteratorGetNext (defined at C:/Users/madhu/Desktop/YOLOv3_TensorFlow-master/train.py:160) = IteratorGetNextoutput_shapes=[, , , ], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[{{node IteratorGetNext/_535}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_671_IteratorGetNext", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Please help me resolve this issue.

Few questions

  1. If I want to export the output graph and use it with the C++ API, do I need to change the name of the input placeholder? Currently 'image' has no name; only phase_train decides whether images come from the data generator.

  2. How do I train with negative samples (images with no annotations) to reduce false positives?

  3. Is it possible to get the original image size before the image is reshaped to 'img_size'?

Thank you in advance :)

Question about resizing the anchor

I noticed that you mention in README.md that

NOTE: The YOLO anchors should be scaled to the rescaled new image size. Suppose your image size is [W, H] and the image will be rescaled to 416*416 as input; for each generated anchor [anchor_w, anchor_h], you should apply the transformation anchor_w = anchor_w / W * 416, anchor_h = anchor_h / H * 416.

However, the images in my dataset vary in size. Do I need to calculate the mean H and W of my images and apply the above transformation?

training error, just cant fix it

I got an error when trying to train on my own dataset.

[screenshot: Capture]

Is that because I run it on Windows?
When I googled this error, I found someone saying it works correctly on Linux.
Could you please help me?

Running test_single_image.py only gets 10 FPS

Hi wizyoung:

Thank you for providing this awesome repo!

Based on my test, it takes around 100 s to run boxes_, scores_, labels_ = sess.run([boxes, scores, labels], feed_dict={input_data: img}) 1000 times with a 416x416 image. That is about 100 ms per run, more than 4 times slower than the 23 ms you claimed. I didn't change anything in your test_single_image.py file.

Could you help me a little bit with this?

Thank you!
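100 s for 1000 calls is about 100 ms per call, i.e. the 10 FPS in the title. When comparing against a per-image latency figure, it helps to exclude the first few calls, which in TF 1.x can include one-off graph and cuDNN setup cost, and to time only the inference call. A minimal, framework-agnostic sketch; the helper name is hypothetical, and hardware differences (the README's figure is on a Titan XP) still matter.

```python
import time
from typing import Callable


def benchmark_ms(run_once: Callable[[], object], warm_up: int = 10, iters: int = 100) -> float:
    """Average wall-clock milliseconds per call, excluding the warm-up calls."""
    for _ in range(warm_up):
        run_once()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    return (time.perf_counter() - start) / iters * 1000.0
```

With the session and tensors of test_single_image.py in scope, this would be called as benchmark_ms(lambda: sess.run([boxes, scores, labels], feed_dict={input_data: img})).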

Question about converting .weights to .ckpt or .pb

Thanks for your great work. I need to convert .ckpt.meta or .pb to a .dlc file,
but I only have the .weights file.
Using your convert_weight.py, I can successfully convert it to .ckpt,
but I can't convert that to .dlc; the converter needs some specific nodes. These are the nodes:
[node screenshot]
Why are they different from the ones in the link?
Can you give me some advice? Thank you so much.

data_utils.py

When generating y_true in data_utils.py:

for i, idx in enumerate(best_match_idx):
    # idx: 0,1,2 ==> 2; 3,4,5 ==> 1; 6,7,8 ==> 2
    feature_map_group = 2 - idx // 3

Isn't there a problem here? According to anchors_mask = [[6,7,8], [3,4,5], [0,1,2]],
the idx mapping should be 0,1,2 ==> 2; 3,4,5 ==> 1; 6,7,8 ==> 0,
which doesn't match the formula for feature_map_group.
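A quick check of the formula against anchors_mask: 2 - idx // 3 maps 0, 1, 2 to group 2, maps 3, 4, 5 to group 1, and maps 6, 7, 8 to group 0, which is exactly the mapping described. So the formula appears consistent, and the mismatch seems to be only in the inline comment's "6,7,8 ==> 2", which should read "==> 0". A sketch of the check:

```python
anchors_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]


def feature_map_group(idx):
    # Same formula as in data_utils.py.
    return 2 - idx // 3


mapping = {idx: feature_map_group(idx) for idx in range(9)}

# Each anchor index must land in the group whose mask contains it.
consistent = all(idx in anchors_mask[group] for idx, group in mapping.items())
```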
