yunyang1994 / tensorflow-yolov3

🔥 TensorFlow Code for technical report: "YOLOv3: An Incremental Improvement"

Home Page: https://yunyang1994.gitee.io/2018/12/28/YOLOv3-算法的一点理解/

License: MIT License

Python 100.00%

yolov3 tensorflow object-detection deep-learning

tensorflow-yolov3's Issues

I implemented a new one that also supports training

Hey, guys, I implemented a new version of YOLOv3. It supports training, and the experimental results on my own dataset are fantastic. The repo is here: https://github.com/wizyoung/YOLOv3_TensorFlow

Updating moving_variance and moving_mean

Hello, in your code I couldn't find where the moving_variance and moving_mean of the batch norm layers are updated during the training phase. Is this a mistake, or did you do it on purpose?

ValueError about video_demo

Hi! I'm facing this issue when I run video_demo.py. I'm looking forward to your reply, thanks.

VIDEOIO ERROR: V4L: can't open camera by index 0
Traceback (most recent call last):
File "video_demo.py", line 37, in
raise ValueError("No image!")
ValueError: No image!

A problem about train.py

There is a problem when I run train.py
$python train.py
File "train.py", line 40
images,*y_true = example
^
SyntaxError: invalid syntax

After a couple of epochs, I want to save the graph to a .pb file and use nms_demo.py to try out the network.
I tried using utils.freeze_graph, but I don't know which tensors I should give it to save and which tensors to use to get the output.
In convert_weight.py you are using utils.freeze_graph(sess, './checkpoint/yolov3_gpu_nms.pb', ["concat_10", "concat_11", "concat_12"]), but those tensors do not exist in the new graph.
And nms_demo.py needs to know what the input and output tensors are:
input_tensor, output_tensors = utils.read_pb_return_tensors(gpu_nms_graph, "./checkpoint/yolov3_gpu_nms.pb", ["Placeholder:0", "concat_10:0", "concat_11:0", "concat_12:0"])

What do you suggest?

Batches bigger than 1

With the current setup, we cannot use batches bigger than 1. It says the examples in the tfrecord have different shapes. I resized the images, corrected the bboxes for the new image size, and saved them into the tfrecords file. But then it said that the bboxes have different sizes (some pictures have more boxes than others).
So, I saved the y_true (for all different sizes 13x13, 26x26, 52x52) in the tfrecords file. This increased the size of the tfrecords file significantly. But my system runs out of memory if I choose batches bigger than 4.

What do you suggest I use to make all the input samples uniform, so that we can use bigger batches?
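
One common way to make samples uniform, sketched here as an assumption rather than as this repository's approach, is to pad every image's box list to a fixed number of rows and keep the raw boxes in the tfrecord instead of the full y_true grids. The helper name `pad_boxes` and the `max_boxes` value are hypothetical; TensorFlow's `tf.data.Dataset.padded_batch` does a similar job inside the input pipeline.

```python
def pad_boxes(boxes, max_boxes, pad_value=0.0):
    """Pad (or truncate) a list of [xmin, ymin, xmax, ymax, class_id] rows
    so that every image contributes exactly max_boxes rows."""
    padded = [list(box) for box in boxes[:max_boxes]]
    while len(padded) < max_boxes:
        # All-zero rows act as "no object" placeholders (hypothetical convention).
        padded.append([pad_value] * 5)
    return padded


# Two images with a different number of ground-truth boxes.
batch = [
    [[10, 20, 110, 220, 0]],
    [[5, 5, 50, 50, 1], [60, 60, 120, 140, 2]],
]
uniform = [pad_boxes(boxes, max_boxes=3) for boxes in batch]
print([len(boxes) for boxes in uniform])
```

The padded rows have to be ignored later, for example by skipping boxes whose width and height are both zero when building y_true.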

Do you have any plans to change the batch size?

cpu and Gpu

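About the loss of accuracy

I trained on my images with darknet and tested with darknet, and the results were quite good. But after converting the .weights file to .pb and testing with nms_demo.py, the results are worse than darknet's. I don't know what the problem is o.o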

Train on your own dataset/quick_train_data.py

tensorflow 1.12

3.1 quick train

python quick_train.py

Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.OutOfRangeError: End of sequence
[[{{node IteratorGetNext}} = IteratorGetNextoutput_shapes=[[?,416,416,3], , , ], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
[[{{node IteratorGetNext/_735}} = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_382_IteratorGetNext", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Problems loading the pretrained weights

utils.load_weights(...., ) makes the loss "NaN". Do you have this issue?

A typo in quick_test.py.

Hi, I think there is a typo in script 'quick_test.py':
img = PIL.Image.open(path)
img.resize(size = (IMAGE_H, IMAGE_W))

which should be:
img.resize(size = (IMAGE_W, IMAGE_H))

Error in calc iou ?

Hi, thanks for your wonderful work!
I saw code in loss_layer()

    # caculate iou between true boxes and pred boxes
    intersect_xy1 = tf.maximum(true_box_xy - true_box_wh / 2.0,
                               pred_box_xy - pred_box_xy / 2.0)
    intersect_xy2 = tf.maximum(true_box_xy + true_box_wh / 2.0,
                               pred_box_xy + pred_box_wh / 2.0)
    intersect_wh = tf.maximum(intersect_xy2 - intersect_xy1, 0.)

Is there something wrong? I think "intersect_xy2 = tf.minimum(true_box_xy + true_box_wh / 2.0, pred_box_xy + pred_box_wh / 2.0)" may be right.
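
For reference, here is a minimal plain-Python sketch of IoU for boxes in (x_center, y_center, width, height) form; it is not the repository's code, and the function name is made up for illustration. The top-left corner of the intersection is the maximum of the two top-left corners and the bottom-right corner is the minimum of the two bottom-right corners. Note also that the quoted snippet subtracts `pred_box_xy / 2.0` where `pred_box_wh / 2.0` looks intended.

```python
def iou_center_format(true_box, pred_box):
    """IoU of two boxes given as (x_center, y_center, width, height)."""
    tx, ty, tw, th = true_box
    px, py, pw, ph = pred_box
    # Top-left corner of the intersection: the larger of the two minima.
    ix1 = max(tx - tw / 2.0, px - pw / 2.0)
    iy1 = max(ty - th / 2.0, py - ph / 2.0)
    # Bottom-right corner of the intersection: the smaller of the two maxima.
    ix2 = min(tx + tw / 2.0, px + pw / 2.0)
    iy2 = min(ty + th / 2.0, py + ph / 2.0)
    # Negative extents mean the boxes do not overlap.
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    union = tw * th + pw * ph - inter
    return inter / union if union > 0 else 0.0


overlap = iou_center_format((5, 5, 10, 10), (10, 10, 10, 10))
print(overlap)
```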

Can it support multi-GPU?

The FPS is higher than other implementations, so I hope it can support multi-GPU.

The speed in gpu and in cpu

Hi, I ran the demo code and it seems that the CPU graph is faster than the GPU graph.
Is this normal? @YunYang1994

File ".../tensorflow-yolov3/core/utils.py", line 379. IndexError: index 58 is out of bounds for axis 3 with size 25

File ".../tensorflow-yolov3/core/utils.py", line 379, in preprocess_true_boxes
y_true[l][i, j, k, 5+c] = 1

IndexError: index 58 is out of bounds for axis 3 with size 25
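
The numbers in this message can be read off directly if the last axis is laid out as 4 box values, 1 confidence, and then one slot per class, which is what the `5+c` index suggests. The sketch below is only that arithmetic plus a hypothetical `check_class_ids` helper; it assumes the cause is a class id in the annotations that is larger than the number of classes the label tensor was built for.

```python
def check_class_ids(class_ids, num_classes):
    """Return the class ids that do not fit into a (5 + num_classes)-wide label vector."""
    return [c for c in class_ids if not 0 <= c < num_classes]


last_axis_size = 25   # "axis 3 with size 25" from the error message
failing_index = 58    # "index 58" from the error message

num_classes = last_axis_size - 5      # class slots available in y_true
offending_class = failing_index - 5   # class id that was being written

print(num_classes, offending_class)
print(check_class_ids([0, 19, 53], num_classes))
```

Under that reading, the labels were built for 20 classes while the annotation file contains class id 53, so the class names file and the annotation file are worth comparing.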

I feel the loss function is written incorrectly?

Used RAM memory gradually increases killing the process

Hi,

When I try to train, RAM usage gradually increases until the process is killed for running out of memory. I tested it with tf 1.12 and 1.10. Thanks.

evaluating the trained model performance based on ckpt file

Hi,

thanks for sharing your code, which helps a lot.

But there is a problem: when we run train.py, three ckpt files are saved. How can we run these model files to test performance?

I have tried two ways: **one is to use the test.py,** but it says

Traceback (most recent call last):
File "test.py", line 28, in
saver = tf.train.Saver()
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1293, in __init__
self.build()
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1302, in build
self._build(self._filename, build_save=True, build_restore=True)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/training/saver.py", line 1327, in _build
raise ValueError("No variables to save")
ValueError: No variables to save

The other way I tried was convert_weight.py --ckpt_file file --freeze, but it says
Traceback (most recent call last):
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: Assign requires shapes of both tensors to match. lhs shape= [255] rhs shape= [33]
[[Node: save/Assign_349 = Assign[T=DT_FLOAT, _class=["loc:@yolov3/yolo-v3/Conv_6/biases"], use_locking=true, validate_shape=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](yolov3/yolo-v3/Conv_6/biases, save/RestoreV2/_149)]]

Could you help me with this problem? Thanks a lot!
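
The two shapes in the error are consistent with a class-count mismatch, assuming each YOLOv3 detection layer has 3 anchors and predicts 4 box values, 1 confidence, and one score per class for each anchor. The helper names below are hypothetical; this is arithmetic, not a fix.

```python
def detection_channels(num_classes, anchors_per_scale=3):
    """Number of output channels (and biases) of one detection layer."""
    return anchors_per_scale * (5 + num_classes)


def classes_from_channels(channels, anchors_per_scale=3):
    """Invert detection_channels; raises if the count does not fit the layout."""
    per_anchor, remainder = divmod(channels, anchors_per_scale)
    if remainder or per_anchor < 5:
        raise ValueError("channel count does not match 3 * (5 + num_classes)")
    return per_anchor - 5


print(classes_from_channels(255))  # shape of the variable in the graph
print(classes_from_channels(33))   # shape of the value being restored
```

Read this way, the graph was built for 80 classes while the checkpoint was trained with 6, so the graph used for conversion would need to be built with the same number of classes as the training run.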

Incorrect assignment of GT boxes (?!), and a suggested alternative

Hi,

First, thanks a lot for making this project.
I spot a potential issue in assigning box GT in this code (line 99 and 100), rewritten as:

i = np.floor(gt_boxes[t,0]/self.image_w*grid_sizes[l][1]).astype('int32')
j = np.floor(gt_boxes[t,1]/self.image_h*grid_sizes[l][0]).astype('int32')

With the code above, a label may be assigned to the wrong cell, as illustrated in the attached picture: the label should be in grid cell (1,1), but the code puts it in cell (0,0) because of the floor operation. So I suggest using a round operation instead. I know the network can still learn from the training data, but I think rounding would be more consistent in this case and would make the network easier to train. What do you think?
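
To compare the two options concretely, here is a small sketch with made-up numbers (a 416-pixel-wide image and a 13-cell grid, so each cell is 32 pixels wide). The `cell_index` helper is hypothetical. Under the usual YOLO convention that a box belongs to the cell containing its center, floor returns that cell, while round can move the box to the neighbouring cell and can produce an index equal to the grid size.

```python
import math


def cell_index(center, image_size, grid_size, rounding):
    """Map a pixel coordinate to a grid index using the given rounding function."""
    return int(rounding(center / float(image_size) * grid_size))


IMAGE_W, GRID_W = 416, 13  # hypothetical sizes: each cell is 32 px wide

results = {}
for center_x in (20, 410):
    results[center_x] = (
        cell_index(center_x, IMAGE_W, GRID_W, math.floor),
        cell_index(center_x, IMAGE_W, GRID_W, round),
    )
    print(center_x, results[center_x])
```

A center at x = 20 lies inside cell 0 (pixels 0 to 31), which is what floor returns; round gives cell 1. A center at x = 410 gives index 13 with round, which is outside a 13-cell grid.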

Thanks.

Value Error

__init__.py in core

I think there should be an __init__.py in the core dir. I was getting an import error for yolov3.py and utils.py before adding it when running convert_weight.py

I'm on windows.

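A question about choosing anchor boxes with kmeans

I actually have a question: where is the kmeans anchor selection applied? quick_train.py does not seem to involve it. I'm a beginner and have many doubts, so I hope you can answer.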

correcting the bboxes.

In utils.py in the function resize_image_correct_bbox, you are getting the image size by
image_size = tf.to_float(tf.shape(image)[1:3])[::-1]
but I think it should be [0:2] instead of [1:3]
image_size = tf.to_float(tf.shape(image)[0:2])[::-1]

There is a mistake in the loss computation.

When computing loss_class:
loss_class = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true_i[..., 5:], logits=pred_boxes_class)

pred_boxes_class has already been passed through a sigmoid function, so I think tf.nn.sigmoid_cross_entropy_with_logits can't be used here.
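
The effect of feeding probabilities where logits are expected can be checked numerically. The sketch below reimplements the documented formula of `tf.nn.sigmoid_cross_entropy_with_logits`, `max(x, 0) - x * z + log(1 + exp(-abs(x)))`, in plain Python; the logit value 8.0 is an arbitrary example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_cross_entropy_with_logits(label, logit):
    """Numerically stable form: max(x, 0) - x * z + log(1 + exp(-|x|))."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


label, logit = 1.0, 8.0  # a confident, correct prediction

# Intended use: pass the raw logit.
loss_from_logit = sigmoid_cross_entropy_with_logits(label, logit)
# Reported situation: the value has already been squashed by a sigmoid.
loss_from_probability = sigmoid_cross_entropy_with_logits(label, sigmoid(logit))

print(loss_from_logit, loss_from_probability)
```

With the raw logit the loss is close to zero. With the sigmoid output as input, the value passed in can never exceed 1, so for a positive label the loss cannot drop below log(1 + exp(-1)), roughly 0.313, no matter how confident the prediction is.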

FileNotFoundError: [Errno 2] No such file or directory: '../yolo_weghts/darknet53.conv.74'

File "train.py", line 69, in
weights_file="../yolo_weghts/darknet53.conv.74")
File "/media/cjw/data/AI_allin/model/computerVision/tensorflow-yolov3/core/utils.py", line 258, in load_weights
with open(weights_file, "rb") as fp:
FileNotFoundError: [Errno 2] No such file or directory: '../yolo_weghts/darknet53.conv.74'

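Hello, how do I set a dynamic learning rate?

I want to start with a larger learning rate and then gradually reduce it, to speed up training. Thanks.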
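
A common schedule for this is exponential decay. The plain-Python sketch below mirrors the formula documented for TensorFlow 1.x's `tf.train.exponential_decay`, `lr * decay_rate ** (global_step / decay_steps)`; the numbers are arbitrary examples and this is not taken from the repository's training script.

```python
def exponential_decay(initial_lr, step, decay_steps, decay_rate, staircase=False):
    """Learning rate after `step` training steps."""
    if staircase:
        # Decay in discrete jumps every decay_steps steps.
        exponent = step // decay_steps
    else:
        exponent = step / float(decay_steps)
    return initial_lr * decay_rate ** exponent


schedule = [exponential_decay(0.001, step, decay_steps=100, decay_rate=0.5)
            for step in (0, 100, 200)]
print(schedule)
```

In a TensorFlow 1.x graph, the equivalent tensor would be created with `tf.train.exponential_decay` and passed to the optimizer as its learning rate, with the optimizer's `minimize` call given the same `global_step` variable.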

Changing network input size

Hi @YunYang1994

I am curious as to why we cannot easily change the network input size of YOLOv3 in a TensorFlow implementation. In the original implementation by pjreddie and alexey, the network input size can be changed in an instant to compensate for accuracy or speed. Is this because TensorFlow is graph-based? Thank you!

Training issue shows out of range

Hi,

I ran train.py from the latest package to train YOLOv3, but it shows
loss_class += result[3]
IndexError: tuple index out of range

With quick_train.py, the process runs successfully and shows the results 1.0 and 0.0.

Could you help check the issue in train.py?
Thanks a lot for your help! =)

cpu and gpu

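Hello,

I'd like to ask about the pb files generated in the checkpoint folder. There are three versions (cpu, gpu, and feature), and you use different versions in different places. What is the difference between them? Also, what does nms mean?
Thank you very much.

I'm preparing tiny-yolo. Why do the BN layer parameters come before the convolution layer parameters in darknet's weights file? The darknet weights file has no structure at all; it is just a flat list. What is the order of the weights in this list?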
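
As far as I can tell from darknet's weight-loading code, each convolutional layer is stored as its biases first, then (only when the layer uses batch normalization) the BN scales, rolling means, and rolling variances, and finally the convolution kernel in (filters, in_channels, height, width) order. The file starts with a small header (version numbers and an images-seen counter) before the first layer. The sketch below only computes how many float32 values each part occupies; the function name is hypothetical and the layout should be checked against the darknet version in use.

```python
def conv_layer_layout(in_channels, filters, kernel_size, batch_normalize=True):
    """Return [(name, float_count), ...] in the order the values appear in the file."""
    layout = [("biases (BN beta when batch_normalize is set)", filters)]
    if batch_normalize:
        layout += [
            ("BN scales (gamma)", filters),
            ("BN rolling mean", filters),
            ("BN rolling variance", filters),
        ]
    # The kernel is stored as (filters, in_channels, k, k); TensorFlow expects
    # (k, k, in_channels, filters), so it has to be transposed after reading.
    layout.append(("convolution weights",
                   filters * in_channels * kernel_size * kernel_size))
    return layout


# Example: a 3x3 convolution from 3 input channels to 32 filters with BN.
layout = conv_layer_layout(in_channels=3, filters=32, kernel_size=3)
total = sum(count for _, count in layout)
for name, count in layout:
    print(name, count)
print("total floats:", total)
```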

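tensorflow-yolov3 Chinese-language discussion

Hello everyone, I am the author of this repository. Since I ran into so many pitfalls while reproducing tensorflow-yolov3, I'm opening this thread to help everyone avoid detours. If you have questions, leave a comment below.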

Can you provide the loss function of YOLOv3?

The code of loss_layer() is hard to understand without the loss function written out.
Thanks a lot ! @YunYang1994

Shouldn't image preprocessing for training divide by 255? I didn't find this operation in your code.

Have you solved it? I also hit the same problem.

about dataset.py

When I trained on pascal_voc2012, I got this error in dataset.py
line 107, in preprocess_true_boxes
y_true[l][j, i, k, 5+c] = 1.

IndexError: index 425 is out of bounds for axis 3 with size 25
Why do I get this error?
If anyone has a solution, please tell me.

"Nan" always appears during training!

"Nan" always appears during training.

=> EPOCH: 132 loss_xy: 1.5719 loss_wh: 2.2539 loss_conf:189.6436 loss_class: 3.0253 total_loss:196.4947 rec_50:0.00 prec_50:0.00
=> EPOCH: 133 loss_xy: 1.6151 loss_wh: 2.1801 loss_conf:194.6695 loss_class: 2.7355 total_loss:201.2003 rec_50:0.00 prec_50:0.00
=> EPOCH: 134 loss_xy: 1.0573 loss_wh: 1.9318 loss_conf:184.6295 loss_class: 1.5075 total_loss:189.1261 rec_50:0.00 prec_50:0.00
=> EPOCH: 135 loss_xy: 1.2824 loss_wh: 1.9386 loss_conf:178.7914 loss_class: 1.0579 total_loss:183.0704 rec_50:0.00 prec_50:0.00
=> EPOCH: 136 loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan total_loss: nan rec_50:0.00 prec_50:0.00
=> EPOCH: 137 loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan total_loss: nan rec_50:0.00 prec_50:0.00
=> EPOCH: 138 loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan total_loss: nan rec_50:0.00 prec_50:0.00
=> EPOCH: 139 loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan total_loss: nan rec_50:0.00 prec_50:0.00
=> EPOCH: 140 loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan total_loss: nan rec_50:0.00 prec_50:0.00
=> EPOCH: 141 loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan total_loss: nan rec_50:0.00 prec_50:0.00
=> EPOCH: 142 loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan total_loss: nan rec_50:0.00 prec_50:0.0

================================================================
Below is my configuration information:

IMAGE_H, IMAGE_W = 416, 416
BATCH_SIZE = 8
EPOCHS = 2000*1000
LR = 0.0005 #0.0005
SHUFFLE_SIZE = 1000

Type Error

Hi! I am facing an error when I run python nms_demo.py. Looking forward to your reply,
thank you.
File "E:/tensorflow-yolov3-master/nms_demo.py", line 48, in
image = utils.draw_boxes(img, boxes, scores, labels, classes, SIZE,show=True)

File "E:\tensorflow-yolov3-master\core\utils.py", line 190, in draw_boxes
draw.rectangle(bbox, outline=colors[labels[i]], width=3)

TypeError: rectangle() got an unexpected keyword argument 'width'
![qq 20181214122643](https://user-images.githubusercontent.com/37647733/49983334-1316a680-ff9d-11e8-81c8-bc8b3bcd12a9.png)
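
This error usually means the installed Pillow predates the `width` argument of `ImageDraw.rectangle` (added in Pillow 5.3.0, as far as I know), so upgrading Pillow is the simplest fix. If upgrading is not an option, a thick outline can be approximated by drawing several nested one-pixel rectangles. The helper below is a hypothetical sketch; it is exercised with a small stand-in object so it runs without Pillow, but it only calls `draw.rectangle(xy, outline=...)`, which older Pillow versions also accept.

```python
def draw_thick_rectangle(draw, bbox, outline, width=3):
    """Approximate width= by drawing `width` nested 1-pixel rectangles."""
    x0, y0, x1, y1 = bbox
    for offset in range(width):
        draw.rectangle([x0 + offset, y0 + offset, x1 - offset, y1 - offset],
                       outline=outline)


class RecordingDraw:
    """Stand-in for PIL.ImageDraw.Draw that records calls (illustration only)."""

    def __init__(self):
        self.calls = []

    def rectangle(self, xy, outline=None):
        self.calls.append((list(xy), outline))


draw = RecordingDraw()
draw_thick_rectangle(draw, [10, 10, 50, 50], outline="red", width=3)
print(draw.calls)
```

In draw_boxes, the call `draw.rectangle(bbox, outline=colors[labels[i]], width=3)` would then become `draw_thick_rectangle(draw, bbox, outline=colors[labels[i]], width=3)`.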

show_input_image ---- error

OutOfRangeError: End of sequence
[[{{node cond/IteratorGetNext_1}} = IteratorGetNextoutput_shapes=[[?,416,416,3], [?,?,5]], output_types=[DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'cond/IteratorGetNext_1', defined at:
............................................

When I run show_input_image.py, the problem above occurs. Please help analyze it.

Is Tiny YOLOv3 supported?

Hi, do you have a plan to support tiny YOLOv3 soon?

I think it could be really helpful.

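In video_demo, if yolov3_gpu_nms.pb is used, the arguments passed to draw_boxes are Tensors and an error is raised. You have to add boxes = sess.run(boxes) scores = sess.run(scores) labels = sess.run(labels) beforehand, but I found that this is very slow (I'm using a 1080Ti). It would be more appropriate to include the NMS in the pb.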

Error in file

Hi, thanks for this repository. I'm facing an issue when I run $ python convert_weight.py --convert --freeze. I'm getting this error:
File "/home/ahsan/YOLOtf/core/yolov3.py", line 125, in get_boxes_confs_scores
box_centers = box_centers * stride
TypeError: can't multiply sequence by non-int of type 'Tensor'

I'm using Tensorflow-gpu 1.5.0

Comments in English

It would be more helpful if you translate your comments to English. This way I can also help with adding more features, such as training.

training error

=> EPOCH: 341 loss_xy: 1.6257 loss_wh: 4.0940 loss_conf:4570.6030 loss_class:14.2002
=> EPOCH: 342 loss_xy: 1.5741 loss_wh: 2.6803 loss_conf:4656.8652 loss_class:11.8622
=> EPOCH: 343 loss_xy: 1.3638 loss_wh: 2.3212 loss_conf:4657.7090 loss_class:11.4552
2019-01-22 16:34:48.172989: W tensorflow/core/framework/op_kernel.cc:1261] Unknown: IndexError: index 13 is out of bounds for axis 0 with size 13
Traceback (most recent call last):

File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/script_ops.py", line 206, in call
ret = func(*args)

File "........../tensorflow-yolov3/core/dataset.py", line 105, in preprocess_true_boxes
y_true[l][j, i, k, 0:4] = gt_boxes[t, 0:4]

IndexError: index 13 is out of bounds for axis 0 with size 13

What's wrong?
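
An index of 13 on an axis of size 13 is what a grid lookup produces when a box center lies exactly on the far edge of the image, or beyond it. Assuming that is the cause here (the traceback does not confirm it), one defensive option is to clamp the computed cell index into the valid range. The helper name and the sizes below are hypothetical.

```python
import math


def safe_cell_index(center, image_size, grid_size):
    """Grid cell for a pixel coordinate, clamped to [0, grid_size - 1]."""
    index = int(math.floor(center / float(image_size) * grid_size))
    return min(max(index, 0), grid_size - 1)


IMAGE_H, GRID_H = 416, 13  # hypothetical sizes

indices = [safe_cell_index(y, IMAGE_H, GRID_H) for y in (0, 200, 416, 430)]
print(indices)
```

Clamping hides the symptom, so it is still worth checking the annotation file for boxes that extend past the image border.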

Training in quick_train_data.py & train.py

Hi,
I have a question: when I run quick_train_data.py, nothing happens.

Further, when I run train.py:

No tfrecord file is saved when running dataset.py. It is not in ./model_data, even though the script reports progress via
print('Processed {} of {} images'.format(index + 1, len(image_data)))
When I do have the tfrecord, the following message is shown:
OutOfRangeError (see above for traceback): End of sequence
[[{{node PyFunc}} = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT], token="pyfunc_0"]]
[[node IteratorGetNext (defined at train.py:27) = IteratorGetNextoutput_shapes=[[?,?,?,3], [?,?,5], , , ], output_types=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Could you please tell me how to train your work? Thank you.

W tensorflow/core/framework/op_kernel.cc:1192] Unknown: IndexError: index 57 is out of bounds for axis 1 with size 52

When I trained my own data according to your steps, the program ran incorrectly, as follows:
File "/home/lgl/PycharmProjects/tensorflow-yolov3-New/quick_train.py", line 76, in
run_items = sess.run([train_op, write_op, y_pred, y_true] + loss, feed_dict={is_training:True})
File "/home/lgl/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
run_metadata_ptr)
File "/home/lgl/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
feed_dict_tensor, options, run_metadata)
File "/home/lgl/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
options, run_metadata)
File "/home/lgl/.conda/envs/tensorflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: IndexError: index 57 is out of bounds for axis 1 with size 52.
After I changed the anchors, the error became UnknownError: IndexError: index ** is out of bounds for axis 1 with size **.

tuple index out of range

File "train.py", line 44, in
loss = model.compute_loss(y_pred, y_true)
File ".../tensorflow-yolov3/core/yolov3.py", line 269, in compute_loss
loss_class += result[3]
IndexError: tuple index out of range

c++ api

Has anyone exported the output graph and used it with the C++ API?

There is a small problem in your gpu_nms: tf.image.non_max_suppression expects boxes in the form [ymin, xmin, ymax, xmax], but the boxes you pass are [xmin, ymin, xmax, ymax]
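
Reordering the columns is a one-liner, shown below in plain Python with a hypothetical helper name. It is worth noting that IoU does not change when x and y are swapped consistently for every box, so the set of boxes kept by NMS should be the same either way; the order matters mainly for consistency with other ops that interpret the axes. The small `iou` function is included only to demonstrate that invariance.

```python
def xyxy_to_yxyx(boxes):
    """Convert [xmin, ymin, xmax, ymax] rows to [ymin, xmin, ymax, xmax]."""
    return [[ymin, xmin, ymax, xmax] for xmin, ymin, xmax, ymax in boxes]


def iou(a, b):
    """IoU of two boxes given as [min0, min1, max0, max1]; axis names do not matter."""
    inter0 = max(min(a[2], b[2]) - max(a[0], b[0]), 0)
    inter1 = max(min(a[3], b[3]) - max(a[1], b[1]), 0)
    inter = inter0 * inter1
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)


boxes_xyxy = [[0, 0, 10, 20], [5, 5, 15, 30]]
boxes_yxyx = xyxy_to_yxyx(boxes_xyxy)

print(boxes_yxyx)
print(iou(*boxes_xyxy), iou(*boxes_yxyx))
```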

How to extract label directly from .pb

Hi,
This line

tensorflow-yolov3/video_demo.py

Line 27 in 60c39ee

    
           input_tensor, output_tensors = utils.read_pb_return_tensors(tf.get_default_graph(),

shows that input_tensor="Placeholder:0" and that the output tensors "concat_9:0" and "mul_6:0" are the boxes and scores. How can I extract the labels directly from the graph by tensor name?
I just want to detect persons, so after getting the labels I would run NMS only on the bboxes predicted as person. This may improve FPS.
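
If the scores output holds one score per class for each box (an assumption about this graph, not something confirmed here), the label of a box is simply the index of its highest score, so no separate label tensor is needed. The sketch below uses plain lists with made-up values; `select_class` is a hypothetical helper, and `PERSON_CLASS = 0` assumes the standard COCO names order, which should be checked against the names file in use.

```python
PERSON_CLASS = 0  # assumption: "person" is the first entry in the class names file


def select_class(boxes, class_scores, wanted_class):
    """Keep (box, score) pairs whose highest-scoring class is wanted_class."""
    selected = []
    for box, scores in zip(boxes, class_scores):
        label = max(range(len(scores)), key=scores.__getitem__)  # argmax
        if label == wanted_class:
            selected.append((box, scores[label]))
    return selected


boxes = [[10, 10, 50, 90], [60, 20, 120, 80], [15, 12, 55, 95]]
class_scores = [[0.90, 0.05, 0.01], [0.10, 0.70, 0.02], [0.60, 0.20, 0.10]]

persons = select_class(boxes, class_scores, PERSON_CLASS)
print(persons)
```

NMS would then be run only on the selected boxes and scores.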

invalid syntax

images, *y_true = example
^
SyntaxError: invalid syntax

Is there a way to fix this without using Python 3?
I'm currently using Python 2 and I don't want to install TensorFlow again.
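
Starred assignment targets (`a, *rest = seq`) are Python 3 syntax, which is why Python 2 reports a SyntaxError on that line. The same unpacking can be written with indexing and slicing, which works in both versions. The tuple below is a made-up stand-in for `example`; other parts of the repository may still require Python 3.

```python
# Stand-in for the tuple returned by the dataset iterator (hypothetical values).
example = ("images", "y_true_13", "y_true_26", "y_true_52")

# Equivalent of `images, *y_true = example` without the starred target.
images, y_true = example[0], list(example[1:])

print(images)
print(y_true)
```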