evaluating the trained model performance based on ckpt file about tensorflow-yolov3 HOT 14 OPEN

yunyang1994 commented on May 18, 2024

evaluating the trained model performance based on ckpt file

from tensorflow-yolov3.

Comments (14)

qiaomai89 commented on May 18, 2024

@YunYang1994 Hey, could you give some advice on these issues?


forwardwfg commented on May 18, 2024

Have you solved it? I have run into the same problem.


forwardwfg commented on May 18, 2024

Add _ = tf.Variable(initial_value='fake_variable') before saver = tf.train.Saver(). It works in my code, so you can try it.


qiaomai89 commented on May 18, 2024

@WeifaGan Thanks a lot for sharing!

I tried adding _ = tf.Variable(initial_value='fake_variable') before saver = tf.train.Saver(), but I get this error:

NotFoundError (see above for traceback): Key Variable not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/Assign/_2 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_7_save/Assign", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]

Here is my question. With the previous version of train.py, training runs and creates model checkpoints, but when I run test.py it raises the error above.

When I use the latest version of train.py, it shows:
loss_class += result[3]
IndexError: tuple index out of range
For this one, I tried the suggestion to comment out this line (line 338):

return object_mask, intersect_area, iou_scores

Then this error appears:
InvalidArgumentError (see above for traceback): ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
[[Node: yolov3/PyFunc_1 = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Could you share your train.py or give some advice?
Thanks again!


forwardwfg commented on May 18, 2024

I ran into the errors you mentioned above.
For the NotFoundError, you can try the following code instead of saver.restore(sess, save_path=WEIGHTS_PATH):
module_file = tf.train.latest_checkpoint(WEIGHTS_PATH)
sess.run(tf.global_variables_initializer())
if module_file is not None:
    saver.restore(sess, module_file)
It works in my code.
For the IndexError, you can try commenting out the line "return object_mask, intersect_area, iou_scores". As a matter of fact, you will find that the last line in yolov3.py is the one we need.
Also, what are the previous and latest versions of train.py? I only have one version.


qiaomai89 commented on May 18, 2024

@WeifaGan Hi, you are so helpful, thanks very much.

For the IndexError, I tried commenting out "return object_mask, intersect_area, iou_scores",

but this error appears:

=> loading yolov3/darknet-53/Conv_49/BatchNorm/gamma:0
=> loading yolov3/darknet-53/Conv_50/weights:0
=> loading yolov3/darknet-53/Conv_50/BatchNorm/gamma:0
=> loading yolov3/darknet-53/Conv_51/weights:0
=> loading yolov3/darknet-53/Conv_51/BatchNorm/gamma:0
=> EPOCH: 0 total_loss: nan loss_xy: 0.0066 loss_wh: nan loss_conf: 0.9094 loss_class: 0.0058 rec_50: 0.0000 rec_70: 0.0000 avg_iou: 0.0000
=> EPOCH: 1 total_loss: nan loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan rec_50: 0.0000 rec_70: 0.0000 avg_iou: nan
=> EPOCH: 2 total_loss: nan loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan rec_50: 0.0000 rec_70: 0.0000 avg_iou: nan
=> EPOCH: 3 total_loss: nan loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan rec_50: 0.0000 rec_70: 0.0000 avg_iou: nan
=> EPOCH: 4 total_loss: nan loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan rec_50: 0.0000 rec_70: 0.0000 avg_iou: nan
2019-01-16 00:54:23.044334: W tensorflow/core/framework/op_kernel.cc:1190] Invalid argument: ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
Traceback (most recent call last):
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
[[Node: yolov3/PyFunc_1 = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "train.py", line 73, in <module>
run_items = sess.run([train_op, write_op] + loss, feed_dict={is_training:True})
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
[[Node: yolov3/PyFunc_1 = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

Caused by op 'yolov3/PyFunc_1', defined at:
File "train.py", line 44, in <module>
loss = model.compute_loss(y_pred, y_true)
File "/DALAB/DATA1/suhuiqiao/new-tensorflow-yolov3-master/core/yolov3.py", line 256, in compute_loss
result = self.loss_layer(y_pred[i], y_true[i], _ANCHORS[i], ignore_thresh, max_box_per_image)
File "/DALAB/DATA1/suhuiqiao/new-tensorflow-yolov3-master/core/yolov3.py", line 356, in loss_layer
true_boxes = tf.py_func(pick_out_gt_box, [y_true], [tf.float32] )[0]
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 317, in py_func
func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 225, in _internal_py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 93, in _py_func
"PyFunc", input=input, token=token, Tout=Tout, name=name)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
[[Node: yolov3/PyFunc_1 = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]

You can see there are two problems: one is that the training loss becomes nan, the other is the shape error.

Do you have any advice?
Thanks!!


forwardwfg commented on May 18, 2024

@qiaomai89
For the first error, I changed "boxes = tf.concat([box_centers, box_sizes], axis=-1)" to "boxes = tf.concat([box_centers - box_sizes/2, box_centers + box_sizes/2], axis=-1)".
If you trace the error, you will find that the code "pred_box_wh = pred_boxes[..., 2:4] - pred_boxes[..., 0:2]" at line 302 of yolov3.py makes pred_box_wh negative, so the log function returns nan. If the comment "# the first two coordinates of pred_boxes are the top-left corner, the last two are the bottom-right corner" were true, pred_box_wh would be positive, so pred_boxes does not match that comment. My change makes the comment true. It runs, but I am not sure it is absolutely correct.
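
To make the reasoning above concrete, here is a minimal pure-Python sketch (not the repository's code). The helper names and the sample box are hypothetical; it only illustrates why a box stored as centre and size yields negative "widths" when it is read as corners.

```python
def center_to_corners(box):
    """Convert (cx, cy, w, h) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def width_height(box):
    """Treat a box as corners and return (x_max - x_min, y_max - y_min)."""
    return (box[2] - box[0], box[3] - box[1])


# Hypothetical box: centre (200, 150), size (40, 30).
center_box = (200.0, 150.0, 40.0, 30.0)

# Reading the centre/size box as if it were corners gives negative sizes,
# and the logarithm of a negative number is undefined (nan in TensorFlow).
wrong_wh = width_height(center_box)

# Converting to corners first recovers the original positive size.
right_wh = width_height(center_to_corners(center_box))

print(wrong_wh, right_wh)
```

Whether the rest of the loss expects corner format everywhere is not confirmed in this thread, so treat this as an illustration of the symptom rather than a verified fix.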

For the second issue, I changed "true_boxes_batch[i][0][0][0][0:len(true_boxes_per_layer)] = true_boxes_per_layer" at about line 370 of yolov3.py as follows:
if len(true_boxes_per_layer) < max_box_per_image:
    true_boxes_batch[i][0][0][0][0:len(true_boxes_per_layer)] = true_boxes_per_layer
else:
    true_boxes_batch[i][0][0][0][0:max_box_per_image] = true_boxes_per_layer[0:max_box_per_image]
The reason is that "true_boxes_batch = np.zeros([bs, 1, 1, 1, max_box_per_image, 4], dtype=np.float32)" at about line 357 limits the shape to max_box_per_image boxes. Of course, I am not sure this is absolutely correct either, but it is mostly correct.
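
The same idea can be sketched in plain Python, without NumPy. The function name pad_or_truncate and the sample data are hypothetical; the sketch assumes that dropping the boxes beyond max_boxes is acceptable, which silently discards ground truth for crowded images.

```python
def pad_or_truncate(boxes, max_boxes):
    """Return exactly max_boxes rows: extra boxes are dropped, missing ones are zero-filled."""
    kept = [list(b) for b in boxes[:max_boxes]]
    padding = [[0.0, 0.0, 0.0, 0.0] for _ in range(max_boxes - len(kept))]
    return kept + padding


# Ten hypothetical ground-truth boxes, as in the (10,4) into (8,4) error.
ten_boxes = [[float(i)] * 4 for i in range(10)]

fixed = pad_or_truncate(ten_boxes, 8)      # truncated to 8 rows
short = pad_or_truncate(ten_boxes[:3], 8)  # padded up to 8 rows

print(len(fixed), len(short))
```

An alternative that keeps every box would be to raise max_box_per_image above the largest number of objects in any training image, at the cost of a larger buffer.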

After these modifications, I think it will run.


qiaomai89 commented on May 18, 2024

@WeifaGan You are really so nice, kind, patient and helpful! I really appreciate your help! Now I can run both training and testing.

By the way, how does your model perform? Does it match the model trained with the author's original darknet code on the same dataset?


forwardwfg commented on May 18, 2024

@qiaomai89
Hi, you are welcome. I only trained for a short time yesterday and the result is not satisfactory. I will keep training over the next few days. I hope we can stay in touch.


qiaomai89 commented on May 18, 2024

@WeifaGan Hi, I am training both frameworks (darknet and TensorFlow) on the same dataset. If I get any results, I will let you know. And if you find anything, please share it with me. Thanks!


qiaomai89 commented on May 18, 2024

@WeifaGan Could you run testing on ckpt files?

I have tried two ways. First, I ran convert_weight.py and got three pb files, then ran nms_demo.py to see the jpg results, but nothing is drawn on the picture. Here is the log:
=> nms on gpu the number of boxes= 0 time=5239.83 ms
=> nms on gpu the number of boxes= 0 time=42.90 ms
=> nms on gpu the number of boxes= 0 time=37.91 ms
=> nms on gpu the number of boxes= 0 time=36.61 ms
=> nms on gpu the number of boxes= 0 time=40.40 ms

The other way I tried was running test.py on the tfrecords data. Here is the log:
=> EPOCH: 0 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 1 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 2 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 3 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 4 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 5 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 6 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 7 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 8 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 9 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 10 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 11 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 12 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 13 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 14 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 15 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 16 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 17 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 18 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 19 rec:0.00 prec:0.00 mAP:0.00

Any advice about this one?


forwardwfg commented on May 18, 2024

@qiaomai89
Try lowering score_thresh in the evaluate function in utils.py; you will see an mAP that is not zero but still small. I think there are some problems with the code. I ran train.py, but the total loss keeps fluctuating in the range of about 0.1 to 2 and does not converge. So I tried another implementation, https://github.com/aloyschen/tensorflow-yolo3. At least it tends to converge. I am training it now, and I recommend that you try it too.
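
As a rough illustration of why lowering score_thresh changes the reported numbers, here is a small self-contained sketch. It is not the evaluate function from utils.py; the function name, the detections and both threshold values are made up for the example.

```python
def filter_by_score(detections, score_thresh):
    """Keep only detections whose confidence is at least score_thresh."""
    return [d for d in detections if d[1] >= score_thresh]


# Hypothetical (label, score) pairs from an under-trained model:
# every confidence is low, so a strict threshold discards all of them.
detections = [("person", 0.12), ("car", 0.05), ("dog", 0.27)]

kept_strict = filter_by_score(detections, 0.3)   # assumed strict threshold
kept_lowered = filter_by_score(detections, 0.1)  # lowered threshold

print(len(kept_strict), len(kept_lowered))
```

If no boxes survive even at a very low threshold, the scores themselves are probably near zero, which points at the training or weight conversion rather than at the threshold.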

To make communication more convenient, I think we can add each other on QQ or WeChat.


qiaomai89 commented on May 18, 2024

@WeifaGan You can add me on WeChat: 455741772


stevenwuaggie507 commented on May 18, 2024

@WeifaGan Hi, I followed your suggested code changes and also lowered score_thresh to 0.1, but I still got the same result as @qiaomai89:

l GPU (device: 0, name: TITAN V, pci bus id: 0000:65:00.0, compute capability: 7.0)
=> nms on gpu the number of boxes= 0 time=4340.55 ms
=> nms on gpu the number of boxes= 0 time=28.67 ms
=> nms on gpu the number of boxes= 0 time=27.60 ms
=> nms on gpu the number of boxes= 0 time=28.12 ms
=> nms on gpu the number of boxes= 0 time=29.70 ms

Did you solve this issue?
Thanks!!

