Comments (14)
@YunYang1994 Hey, could you give some advice on these issues?
from tensorflow-yolov3.
Have you solved it? I also ran into the same problem.
Add _ = tf.Variable(initial_value='fake_variable') before saver = tf.train.Saver(). It works in my code, so you can try it.
@WeifaGan Thanks a lot for sharing!
I tried adding _ = tf.Variable(initial_value='fake_variable') before saver = tf.train.Saver(), but I get an error:
NotFoundError (see above for traceback): Key Variable not found in checkpoint
[[Node: save/RestoreV2 = RestoreV2[dtypes=[DT_STRING], _device="/job:localhost/replica:0/task:0/device:CPU:0"](_arg_save/Const_0_0, save/RestoreV2/tensor_names, save/RestoreV2/shape_and_slices)]]
[[Node: save/Assign/_2 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_7_save/Assign", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]]
Here is my question. With the previous version of train.py, training runs and creates models, but when I run test.py it raises the error above.
With the latest version of train.py, however, it shows
loss_class += result[3]
IndexError: tuple index out of range
For this one, I tried the suggested fix
"commented this in line 338
return object_mask, intersect_area, iou_scores"
and then this error appears:
"InvalidArgumentError (see above for traceback): ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
[[Node: yolov3/PyFunc_1 = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]"
Could you share your train.py or give some advice?
Thanks again!
I ran into the errors you mentioned above.
For the NotFoundError, you can try the following code in place of saver.restore(sess, save_path=WEIGHTS_PATH):
module_file = tf.train.latest_checkpoint(WEIGHTS_PATH)
sess.run(tf.global_variables_initializer())
if module_file is not None:
    saver.restore(sess, module_file)
It works in my code.
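A note on why the earlier NotFoundError names the key "Variable": in TensorFlow 1.x a tf.Variable created without a name gets the default name "Variable", and a default tf.train.Saver() tries to restore every variable in the graph, so a variable added after the checkpoint was written has no matching key. The following is a plain-Python sketch of that name comparison, not TensorFlow code. The helper missing_keys and the variable names are hypothetical and only serve as an illustration.

```python
def missing_keys(graph_variable_names, checkpoint_keys):
    """Return graph variable names that have no entry in the checkpoint."""
    available = set(checkpoint_keys)
    return [name for name in graph_variable_names if name not in available]


# Hypothetical names, for illustration only.
checkpoint_keys = [
    "yolov3/darknet-53/Conv/weights",
    "yolov3/darknet-53/Conv/BatchNorm/gamma",
]
# The graph now also holds one extra variable created without a name.
graph_variable_names = checkpoint_keys + ["Variable"]

print(missing_keys(graph_variable_names, checkpoint_keys))
```

Any name this check reports is one that a restore of the whole graph would fail on.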
For the IndexError, try commenting out the line return object_mask, intersect_area, iou_scores. In fact, the last line of yolov3.py is the return we actually need.
Also, what are the previous and latest versions of train.py? I only have one version.
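The IndexError discussed above is what Python raises when a caller indexes past the end of a returned tuple. Here is a minimal sketch of that situation, assuming an early return that yields only three values. Both stand-in functions and their return values are hypothetical and are not the code from yolov3.py.

```python
def loss_layer_early_return():
    # Hypothetical stand-in: an early return with only three values.
    object_mask, intersect_area, iou_scores = 1.0, 2.0, 3.0
    return object_mask, intersect_area, iou_scores


def loss_layer_full_return():
    # Hypothetical stand-in: a final return with at least four values.
    return 0.1, 0.2, 0.3, 0.4


def class_loss(result):
    """Read the fourth element, as in the line loss_class += result[3]."""
    return result[3]


try:
    class_loss(loss_layer_early_return())
    raised = False
except IndexError:
    # A three-element tuple has no index 3.
    raised = True

print(raised)
print(class_loss(loss_layer_full_return()))
```

Once the early return is removed, the caller receives the longer tuple and the index lookup succeeds.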
@WeifaGan Hi, you are so helpful, thank you very much.
For the IndexError, I tried commenting out return object_mask, intersect_area, iou_scores,
but then this error appears:
=> loading yolov3/darknet-53/Conv_49/BatchNorm/gamma:0
=> loading yolov3/darknet-53/Conv_50/weights:0
=> loading yolov3/darknet-53/Conv_50/BatchNorm/gamma:0
=> loading yolov3/darknet-53/Conv_51/weights:0
=> loading yolov3/darknet-53/Conv_51/BatchNorm/gamma:0
=> EPOCH: 0 total_loss: nan loss_xy: 0.0066 loss_wh: nan loss_conf: 0.9094 loss_class: 0.0058 rec_50: 0.0000 rec_70: 0.0000 avg_iou: 0.0000
=> EPOCH: 1 total_loss: nan loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan rec_50: 0.0000 rec_70: 0.0000 avg_iou: nan
=> EPOCH: 2 total_loss: nan loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan rec_50: 0.0000 rec_70: 0.0000 avg_iou: nan
=> EPOCH: 3 total_loss: nan loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan rec_50: 0.0000 rec_70: 0.0000 avg_iou: nan
=> EPOCH: 4 total_loss: nan loss_xy: nan loss_wh: nan loss_conf: nan loss_class: nan rec_50: 0.0000 rec_70: 0.0000 avg_iou: nan
2019-01-16 00:54:23.044334: W tensorflow/core/framework/op_kernel.cc:1190] Invalid argument: ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
Traceback (most recent call last):
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1361, in _do_call
return fn(*args)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1340, in _run_fn
target_list, status, run_metadata)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/errors_impl.py", line 516, in __exit__
c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
[[Node: yolov3/PyFunc_1 = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 73, in <module>
run_items = sess.run([train_op, write_op] + loss, feed_dict={is_training:True})
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 905, in run
run_metadata_ptr)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1137, in _run
feed_dict_tensor, options, run_metadata)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1355, in _do_run
options, run_metadata)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1374, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
[[Node: yolov3/PyFunc_1 = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Caused by op 'yolov3/PyFunc_1', defined at:
File "train.py", line 44, in <module>
loss = model.compute_loss(y_pred, y_true)
File "/DALAB/DATA1/suhuiqiao/new-tensorflow-yolov3-master/core/yolov3.py", line 256, in compute_loss
result = self.loss_layer(y_pred[i], y_true[i], _ANCHORS[i], ignore_thresh, max_box_per_image)
File "/DALAB/DATA1/suhuiqiao/new-tensorflow-yolov3-master/core/yolov3.py", line 356, in loss_layer
true_boxes = tf.py_func(pick_out_gt_box, [y_true], [tf.float32] )[0]
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 317, in py_func
func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 225, in _internal_py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 93, in _py_func
"PyFunc", input=input, token=token, Tout=Tout, name=name)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3271, in create_op
op_def=op_def)
File "/home/suhuiqiao/anaconda3/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1650, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InvalidArgumentError (see above for traceback): ValueError: could not broadcast input array from shape (10,4) into shape (8,4)
[[Node: yolov3/PyFunc_1 = PyFuncTin=[DT_FLOAT], Tout=[DT_FLOAT], token="pyfunc_2", _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
As you can see, there are two problems: the training loss becomes nan, and there is a shape error.
Do you have any advice?
Thanks!!
@qiaomai89
For the first error, I changed "boxes = tf.concat([box_centers, box_sizes], axis=-1)" to "boxes = tf.concat([box_centers - box_sizes / 2, box_centers + box_sizes / 2], axis=-1)".
If you trace the error, you will find that the code "pred_box_wh = pred_boxes[..., 2:4] - pred_boxes[..., 0:2]" at line 302 of yolov3.py makes pred_box_wh negative, so the log function returns nan. If the comment "# pred_boxes 前面两个坐标是左上角,后面两个是右下角" (the first two coordinates of pred_boxes are the top-left corner and the last two are the bottom-right corner) were true, pred_box_wh would be positive, so pred_boxes does not actually satisfy that comment. My change makes the comment true. The code runs, but I am not sure the change is fully correct.
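To illustrate the box-format mismatch described above, here is a small plain-Python sketch. It assumes boxes stored as (cx, cy, w, h) on one side and (x_min, y_min, x_max, y_max) on the other. The helper names and the sample box are hypothetical and are not taken from yolov3.py.

```python
def center_to_corners(box):
    """Convert (cx, cy, w, h) to (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def width_height_from_corners(box):
    """Compute (w, h), assuming the box is already in corner format."""
    x_min, y_min, x_max, y_max = box
    return (x_max - x_min, y_max - y_min)


center_box = (50.0, 40.0, 20.0, 10.0)  # hypothetical box: cx, cy, w, h

# Treating a center-format box as corners yields negative sizes.
wrong = width_height_from_corners(center_box)
# Converting to corners first recovers the positive width and height.
right = width_height_from_corners(center_to_corners(center_box))

print(wrong)  # (-30.0, -30.0)
print(right)  # (20.0, 10.0)
```

A negative width or height passed to a logarithm is what produces nan in the loss, which matches the behaviour reported in the training log.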
For the second issue, I changed "true_boxes_batch[i][0][0][0][0:len(true_boxes_per_layer)] = true_boxes_per_layer" at around line 370 of yolov3.py as follows:
if len(true_boxes_per_layer) < max_box_per_image:
    true_boxes_batch[i][0][0][0][0:len(true_boxes_per_layer)] = true_boxes_per_layer
else:
    true_boxes_batch[i][0][0][0][0:len(true_boxes_per_layer)] = true_boxes_per_layer[0:max_box_per_image]
The reason is that "true_boxes_batch = np.zeros([bs, 1, 1, 1, max_box_per_image, 4], dtype=np.float32)" at around line 357 limits the shape. Of course, I am also not sure this is fully correct, but it mostly is.
After these modifications, I think it will run.
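The same idea can be written without the if/else by clamping the number of boxes copied. Here is a plain-Python sketch that uses lists in place of the NumPy array. The helper fit_boxes and the sample boxes are hypothetical and are not part of the repository.

```python
def fit_boxes(boxes, max_box_per_image):
    """Truncate or zero-pad a list of 4-value boxes to a fixed count."""
    count = min(len(boxes), max_box_per_image)
    # Start from an all-zero buffer with the fixed number of rows.
    padded = [[0.0, 0.0, 0.0, 0.0] for _ in range(max_box_per_image)]
    for i in range(count):
        padded[i] = list(boxes[i])
    return padded


# Ten hypothetical ground-truth boxes, but room for only eight.
ten_boxes = [[float(i), float(i), float(i + 1), float(i + 1)] for i in range(10)]
fitted = fit_boxes(ten_boxes, max_box_per_image=8)

print(len(fitted))  # 8
```

Note that truncation discards the extra ground-truth boxes, so images with more than max_box_per_image objects lose some labels during training.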
@WeifaGan You are really so nice, kind, patient and helpful! I really appreciate your help! Now I can run both training and testing.
By the way, how does your model perform? Does it match the model trained with the code provided by the author (darknet) on the same dataset?
@qiaomai89
Hi, you are welcome. I only trained for a short time yesterday and the result was not satisfactory. I will train it over the next few days, and I hope we can keep communicating.
@WeifaGan Hi, I am training with both frameworks (darknet and TF) on the same dataset. If I get any results, I will let you know. And if you find anything, please share it with me. Thanks!
@WeifaGan Could you test with the ckpt files?
I have tried two ways. First, I ran convert_weight.py and got three pb files, then ran nms_demo.py to see the jpg results, but nothing is drawn on the picture. Here is the log:
=> nms on gpu the number of boxes= 0 time=5239.83 ms
=> nms on gpu the number of boxes= 0 time=42.90 ms
=> nms on gpu the number of boxes= 0 time=37.91 ms
=> nms on gpu the number of boxes= 0 time=36.61 ms
=> nms on gpu the number of boxes= 0 time=40.40 ms
Second, I ran test.py on the tfrecords data. Here is the log:
=> EPOCH: 0 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 1 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 2 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 3 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 4 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 5 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 6 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 7 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 8 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 9 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 10 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 11 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 12 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 13 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 14 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 15 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 16 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 17 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 18 rec:0.00 prec:0.00 mAP:0.00
=> EPOCH: 19 rec:0.00 prec:0.00 mAP:0.00
Any advice about this one?
@qiaomai89
Try lowering score_thresh in the evaluate function in utils.py; you will see a mAP that is nonzero but small. I think there are some problems with the code. I ran train.py, but the total loss keeps fluctuating in the range of about 0.1 to 2 and does not converge. So I tried another implementation, https://github.com/aloyschen/tensorflow-yolo3. It at least tends to converge. I am training it now, and I recommend that you try it too.
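As a rough illustration of why lowering score_thresh changes the number of reported boxes, here is a plain-Python sketch of score filtering. The helper filter_by_score and the sample detections are hypothetical. The evaluate function in utils.py may apply the threshold differently.

```python
def filter_by_score(detections, score_thresh):
    """Keep (box, score) pairs whose score is at least score_thresh."""
    return [(box, score) for box, score in detections if score >= score_thresh]


# Hypothetical low-confidence detections from an under-trained model.
detections = [((10, 10, 50, 50), 0.12), ((20, 30, 80, 90), 0.04)]

print(len(filter_by_score(detections, score_thresh=0.3)))  # 0
print(len(filter_by_score(detections, score_thresh=0.1)))  # 1
```

If every score stays below the threshold, no boxes reach NMS at all. That would be consistent with the "number of boxes= 0" log lines, although it does not by itself explain why the scores are so low.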
To make communication easier, I think we can add each other on QQ or WeChat.
@WeifaGan You can add me on WeChat: 455741772
@WeifaGan Hi, I applied the code changes you suggested and also lowered score_thresh to 0.1, but I still get the same result as @qiaomai89:
l GPU (device: 0, name: TITAN V, pci bus id: 0000:65:00.0, compute capability: 7.0)
=> nms on gpu the number of boxes= 0 time=4340.55 ms
=> nms on gpu the number of boxes= 0 time=28.67 ms
=> nms on gpu the number of boxes= 0 time=27.60 ms
=> nms on gpu the number of boxes= 0 time=28.12 ms
=> nms on gpu the number of boxes= 0 time=29.70 ms
Have you solved this issue?
Thanks!!