...
2019-09-14 09:19:42,314 maskrcnn_benchmark.trainer INFO: eta: 8:44:11 iter: 740 loss: 3.6038 (3.8930) loss_retina_positive: 3.4092 (3.6858) loss_retina_negative: 0.1473 (0.2072) time: 0.3526 (0.3524) data: 0.0130 (0.0210) lr: 0.010000 max mem: 4004
2019-09-14 09:19:49,227 maskrcnn_benchmark.trainer INFO: eta: 8:43:48 iter: 760 loss: 3.6373 (3.8917) loss_retina_positive: 3.4743 (3.6837) loss_retina_negative: 0.1995 (0.2080) time: 0.3483 (0.3522) data: 0.0153 (0.0209) lr: 0.010000 max mem: 4004
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [1,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [2,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [3,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [4,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [5,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [6,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [7,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [8,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [9,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [10,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [11,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [12,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [13,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [14,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
/opt/conda/conda-bld/pytorch_1556653099582/work/aten/src/THCUNN/BCECriterion.cu:57: void bce_updateOutput_no_reduce_functor<Dtype, Acctype>::operator()(const Dtype *, const Dtype *, Dtype *) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [15,0,0] Assertion `*input >= 0. && *input <= 1.` failed.
Traceback (most recent call last):
  File "tools/train_net.py", line 171, in <module>
    main()
  File "tools/train_net.py", line 164, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 73, in train
    arguments,
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/engine/trainer.py", line 74, in do_train
    loss_dict = model(images, targets)
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/modeling/detector/retinanet.py", line 62, in forward
    (anchors, detections), detector_losses = self.rpn(images, rpn_features, targets)
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 152, in forward
    return self._forward_train(anchors, box_cls, box_regression, targets)
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/modeling/rpn/retinanet.py", line 159, in _forward_train
    anchors, box_cls, box_regression, targets
  File "/home/zlq/code/FreeAnchor/maskrcnn_benchmark/modeling/rpn/free_anchor_loss.py", line 114, in __call__
    (object_box_iou - self.bbox_threshold) / (H - self.bbox_threshold)
RuntimeError: CUDA error: device-side assert triggered
Traceback (most recent call last):
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/site-packages/torch/distributed/launch.py", line 235, in <module>
    main()
  File "/home/zlq/anaconda3/envs/torch2/lib/python3.6/site-packages/torch/distributed/launch.py", line 231, in main
    cmd=process.args)
subprocess.CalledProcessError: Command '['/home/zlq/anaconda3/envs/torch2/bin/python', '-u', 'tools/train_net.py', '--local_rank=0', '--config-file', 'configs/free_anchor_R-50-FPN_1x.yaml']' returned non-zero exit status 1.
Has anyone else run into this error? Any suggestions on how to solve it would be appreciated. Thanks a lot!
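For anyone trying to reproduce this: the assertion `*input >= 0. && *input <= 1.` in the log means that at least one value passed to the BCE loss was outside [0, 1], and a NaN fails the same check. Below is a minimal sketch of how such values could be located, assuming the probabilities are first copied to a plain Python list. `find_invalid_probabilities` is a hypothetical helper written for illustration only. It is not part of FreeAnchor, maskrcnn_benchmark, or PyTorch.

```python
def find_invalid_probabilities(values):
    """Return (index, value) pairs that would fail a 0 <= x <= 1 check.

    Hypothetical debugging helper, not part of any library.
    """
    invalid = []
    for i, v in enumerate(values):
        # NaN makes every comparison False, so it is caught by the same test
        # as values below 0 or above 1.
        if not (0.0 <= v <= 1.0):
            invalid.append((i, v))
    return invalid


# Example with made-up values: indices 2, 3 and 4 are invalid.
probs = [0.2, 1.0, float("nan"), 1.5, -0.1]
bad = find_invalid_probabilities(probs)
print([i for i, _ in bad])
```

With a PyTorch tensor `p`, the list could be obtained via `p.detach().flatten().tolist()` just before the loss call. Since CUDA kernels run asynchronously, the Python line reported in the traceback may not be the one that actually failed. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, which usually gives a more accurate location.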