Hello, I try to train the faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_

Which parameters to reduce to avoid ResourceExhaustedError about pedestrian-detection HOT 3 CLOSED

thatbrguy commented on July 3, 2024

Which parameters to reduce to avoid ResourceExhaustedError

from pedestrian-detection.

Comments (3)

thatbrguy commented on July 3, 2024 3

Oh alright. You can try using Google Colab to train them then. You can fit FasterRCNN+ResNet-50 (and other models with similar param count) over there.

from pedestrian-detection.

thatbrguy commented on July 3, 2024 1

Changing parameters (besides batch size) won't help your case that much if you're using pretrained models. The model faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28 is pretty large. I would suggest using a smaller model such as FasterRCNN_ResNet50 or SSD_MobileNet.

from pedestrian-detection.

MounirB commented on July 3, 2024

Same problem occurring again, even with SSD_MobileNet :/
I have a P400 GPU

2018-10-18 16:09:49.781181: W tensorflow/core/common_runtime/bfc_allocator.cc:279] ____**********___********************************************************************xxxxx
2018-10-18 16:09:49.781198: W tensorflow/core/framework/op_kernel.cc:1275] OP_REQUIRES failed at conv_ops.cc:693 : Resource exhausted: OOM when allocating tensor with shape[24,128,75,75] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
INFO:tensorflow:Error reported to Coordinator: <class 'tensorflow.python.framework.errors_impl.ResourceExhaustedError'>, OOM when allocating tensor with shape[24,128,75,75] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6, FeatureExtractor/MobilenetV1/Conv2d_3_pointwise/weights/read/_3593)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[Node: Loss/Where_260/_6409 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_14144_Loss/Where_260", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Conv2D', defined at:
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/train.py", line 165, in
tf.app.run()
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/trainer.py", line 228, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/trainer.py", line 165, in _create_losses
prediction_dict = detection_model.predict(images)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/meta_architectures/ssd_meta_arch.py", line 264, in predict
preprocessed_inputs)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/models/ssd_mobilenet_v1_feature_extractor.py", line 106, in extract_features
scope=scope)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/nets/mobilenet_v1.py", line 258, in mobilenet_v1_base
scope=end_point)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
conv_dims=2)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1057, in convolution
outputs = layer.apply(inputs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 805, in apply
return self.call(inputs, *args, **kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 362, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 736, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 186, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 868, in call
return self.conv_op(inp, filter)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 520, in call
return self.call(inp, filter)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 204, in call
name=self.name)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 956, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[24,128,75,75] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6, FeatureExtractor/MobilenetV1/Conv2d_3_pointwise/weights/read/_3593)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[Node: Loss/Where_260/_6409 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_14144_Loss/Where_260", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Traceback (most recent call last):
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1278, in _do_call
return fn(*args)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1263, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1350, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[24,128,75,75] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6, FeatureExtractor/MobilenetV1/Conv2d_3_pointwise/weights/read/_3593)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[Node: Loss/Where_260/_6409 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_14144_Loss/Where_260", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/train.py", line 165, in
tf.app.run()
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/trainer.py", line 332, in train
saver=saver)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 770, in train
sess, train_op, global_step, train_step_kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/slim/python/slim/learning.py", line 487, in train_step
run_metadata=run_metadata)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 877, in run
run_metadata_ptr)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1100, in _run
feed_dict_tensor, options, run_metadata)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1272, in _do_run
run_metadata)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1291, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[24,128,75,75] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6, FeatureExtractor/MobilenetV1/Conv2d_3_pointwise/weights/read/_3593)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[Node: Loss/Where_260/_6409 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_14144_Loss/Where_260", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Caused by op 'FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Conv2D', defined at:
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/train.py", line 165, in
tf.app.run()
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
_sys.exit(main(argv))
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/train.py", line 161, in main
worker_job_name, is_chief, FLAGS.train_dir)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/trainer.py", line 228, in train
clones = model_deploy.create_clones(deploy_config, model_fn, [input_queue])
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/deployment/model_deploy.py", line 193, in create_clones
outputs = model_fn(*args, **kwargs)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/trainer.py", line 165, in _create_losses
prediction_dict = detection_model.predict(images)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/meta_architectures/ssd_meta_arch.py", line 264, in predict
preprocessed_inputs)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/object_detection/models/ssd_mobilenet_v1_feature_extractor.py", line 106, in extract_features
scope=scope)
File "/home/mounir/PycharmProjects/Pedestrian-detection-DL/nets/mobilenet_v1.py", line 258, in mobilenet_v1_base
scope=end_point)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1154, in convolution2d
conv_dims=2)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1057, in convolution
outputs = layer.apply(inputs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 805, in apply
return self.call(inputs, *args, **kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 362, in call
outputs = super(Layer, self).call(inputs, *args, **kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 736, in call
outputs = self.call(inputs, *args, **kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 186, in call
outputs = self._convolution_op(inputs, self.kernel)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 868, in call
return self.conv_op(inp, filter)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 520, in call
return self.call(inp, filter)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 204, in call
name=self.name)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 956, in conv2d
data_format=data_format, dilations=dilations, name=name)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 454, in new_func
return func(*args, **kwargs)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3155, in create_op
op_def=op_def)
File "/home/mounir/anaconda3/envs/tflow-gpu/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1717, in init
self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[24,128,75,75] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[Node: FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_pointwise/Conv2D = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](FeatureExtractor/MobilenetV1/MobilenetV1/Conv2d_3_depthwise/Relu6, FeatureExtractor/MobilenetV1/Conv2d_3_pointwise/weights/read/_3593)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

[[Node: Loss/Where_260/_6409 = _Recvclient_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_14144_Loss/Where_260", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

from pedestrian-detection.

Which parameters to reduce to avoid ResourceExhaustedError about pedestrian-detection HOT 3 CLOSED

Comments (3)

Related Issues (11)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent