I tested all models and they work very well except for YOLO.
Below is error message when I tried to train.
tf-docker /app/K210_Yolo_framework > make train MODEL=yolo MAXEP=1 ILR=0.001 DATASET=voc CLSNUM=20 IAA=False BATCH=16
python3 ./keras_train.py \
--train_set voc \
--class_num 20 \
--pre_ckpt "" \
--model_def yolo \
--depth_multiplier 0.75 \
--augmenter False \
--image_size 224 320 \
--output_size 7 10 14 20 \
--batch_size 16 \
--rand_seed 3 \
--max_nrof_epochs 1 \
--init_learning_rate 0.001 \
--learning_rate_decay_factor 0 \
--obj_weight 1 \
--noobj_weight 1 \
--wh_weight 1 \
--obj_thresh 0.7 \
--iou_thresh 0.5 \
--vaildation_split 0.05 \
--log_dir log \
--is_prune False \
--prune_initial_sparsity 0.5 \
--prune_final_sparsity 0.9 \
--prune_end_epoch 5 \
--prune_frequency 100
2020-06-11 08:58:42.428160: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-06-11 08:58:42.432710: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2020-06-11 08:58:42.573288: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.573645: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x57b2a90 executing computations on platform CUDA. Devices:
2020-06-11 08:58:42.573665: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce GTX 1650, Compute Capability 7.5
2020-06-11 08:58:42.593525: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2599990000 Hz
2020-06-11 08:58:42.593991: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x56b8550 executing computations on platform Host. Devices:
2020-06-11 08:58:42.594015: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2020-06-11 08:58:42.594243: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.594556: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1650 major: 7 minor: 5 memoryClockRate(GHz): 1.56
pciBusID: 0000:01:00.0
2020-06-11 08:58:42.594836: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-11 08:58:42.595891: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-06-11 08:58:42.596735: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-06-11 08:58:42.596998: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-06-11 08:58:42.598197: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-06-11 08:58:42.599063: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-06-11 08:58:42.601488: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-06-11 08:58:42.601588: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.601883: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.602095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-06-11 08:58:42.602130: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-11 08:58:42.602803: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-11 08:58:42.602815: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-06-11 08:58:42.602821: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-06-11 08:58:42.602904: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.603170: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.603412: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2946 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5)
[ INFO ] data augment is False
WARNING: Logging before flag parsing goes to stderr.
W0611 08:58:42.764019 139703720445760 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/ops/dataset_ops.py:494: py_func (from tensorflow.python.ops.script_ops) is deprecated and will be removed in a future version.
Instructions for updating:
tf.py_func is deprecated in TF V2. Instead, there are two
options available in V2.
- tf.py_function takes a python function which manipulates tf eager
tensors instead of numpy arrays. It's easy to convert a tf eager tensor to
an ndarray (just call tensor.numpy()) but having access to eager tensors
means `tf.py_function`s can use accelerators such as GPUs as well as
being differentiable using a gradient tape.
- tf.numpy_function maintains the semantics of the deprecated tf.py_func
(it is not differentiable, and manipulates numpy arrays). It drops the
stateful argument making all functions stateful.
W0611 08:58:42.782631 139703720445760 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/data/util/random_seed.py:58: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
2020-06-11 08:58:42.786745: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.787202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1650 major: 7 minor: 5 memoryClockRate(GHz): 1.56
pciBusID: 0000:01:00.0
2020-06-11 08:58:42.787283: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-11 08:58:42.787312: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-06-11 08:58:42.787349: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-06-11 08:58:42.787398: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-06-11 08:58:42.787451: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-06-11 08:58:42.787504: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-06-11 08:58:42.787537: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-06-11 08:58:42.787690: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.788133: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.788488: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-06-11 08:58:42.788899: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.789267: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce GTX 1650 major: 7 minor: 5 memoryClockRate(GHz): 1.56
pciBusID: 0000:01:00.0
2020-06-11 08:58:42.789312: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.0
2020-06-11 08:58:42.789348: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10.0
2020-06-11 08:58:42.789362: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10.0
2020-06-11 08:58:42.789388: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10.0
2020-06-11 08:58:42.789429: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10.0
2020-06-11 08:58:42.789455: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10.0
2020-06-11 08:58:42.789468: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2020-06-11 08:58:42.789565: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.789932: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.790293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2020-06-11 08:58:42.790313: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-06-11 08:58:42.790318: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2020-06-11 08:58:42.790322: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
2020-06-11 08:58:42.790438: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.790889: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-06-11 08:58:42.791276: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 2946 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1650, pci bus id: 0000:01:00.0, compute capability: 7.5)
[ INFO ] data augment is False
W0611 08:58:42.951060 139703720445760 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:1251: calling VarianceScaling.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0611 08:58:46.494499 139703720445760 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:97: calling GlorotUniform.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0611 08:58:46.494857 139703720445760 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:97: calling Zeros.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0611 08:58:46.495954 139703720445760 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/init_ops.py:97: calling Ones.__init__ (from tensorflow.python.ops.init_ops) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor
W0611 08:58:54.212770 139703720445760 hdf5_format.py:221] No training configuration found in save file: the model was *not* compiled. Compile it manually.
Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1864, in _create_c_op
c_op = c_api.TF_FinishOperation(op_desc)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimensions must be equal, but are 13 and 7 for 'loss/l1_loss/calc_mask_0/xywh_to_all_0/add' (op: 'Add') with input shapes: [?,13,13,3,2], [7,10,1,2].
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./keras_train.py", line 163, in <module>
args.prune_frequency)
File "./keras_train.py", line 88, in main
metrics=[Yolo_Precision(obj_thresh, name='p'), Yolo_Recall(obj_thresh, name='r')])
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 337, in compile
self._compile_weights_loss_and_weighted_metrics()
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/training/tracking/base.py", line 457, in _method_wrapper
result = method(self, *args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1710, in _compile_weights_loss_and_weighted_metrics
self.total_loss = self._prepare_total_loss(masks)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/engine/training.py", line 1770, in _prepare_total_loss
per_sample_losses = loss_fn.call(y_true, y_pred)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/keras/losses.py", line 215, in call
return self.fn(y_true, y_pred, **self._fn_kwargs)
File "/app/K210_Yolo_framework/tools/utils.py", line 760, in loss_fn
iou_thresh, layer, h)
File "/app/K210_Yolo_framework/tools/utils.py", line 689, in calc_ignore_mask
pred_xy, pred_wh = tf_xywh_to_all(p_xy, p_wh, layer, helper)
File "/app/K210_Yolo_framework/tools/utils.py", line 545, in tf_xywh_to_all
all_pred_xy = (tf.sigmoid(grid_pred_xy[..., :]) + h.xy_offset[layer]) / h.out_hw[layer][::-1]
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py", line 897, in binary_op_wrapper
return func(x, y, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py", line 387, in add
"Add", x=x, y=y, name=name)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
op_def=op_def)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 2027, in __init__
control_input_ops)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ops.py", line 1867, in _create_c_op
raise ValueError(str(e))
ValueError: Dimensions must be equal, but are 13 and 7 for 'loss/l1_loss/calc_mask_0/xywh_to_all_0/add' (op: 'Add') with input shapes: [?,13,13,3,2], [7,10,1,2].
Makefile:35: recipe for target 'train' failed
make: *** [train] Error 1
I'm using TensorFlow docker with GPU support. But I think it's not problem because training with all other models works very well in the same environment.
I downloaded pretrained weights yolo_weights.h5
from your Google Drive.
Please, help me.