Comments (6)

fcjian commented on June 1, 2024

You can try to clamp the value of the box area when computing GIoU loss, e.g.,

# Clamp width and height at 0 so a degenerate box cannot yield a negative area
area1 = fp16_clamp((bboxes1[..., 2] - bboxes1[..., 0]), min=0) * fp16_clamp((
    bboxes1[..., 3] - bboxes1[..., 1]), min=0)
area2 = fp16_clamp((bboxes2[..., 2] - bboxes2[..., 0]), min=0) * fp16_clamp((
    bboxes2[..., 3] - bboxes2[..., 1]), min=0)
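
To illustrate what the clamping does, here is a minimal pure-Python sketch of GIoU for a single pair of (x1, y1, x2, y2) boxes. It is not the mmdet implementation: `clamp_min0` is a hypothetical scalar stand-in for `fp16_clamp(..., min=0)`, and the `eps` guard is an assumption added to avoid division by zero.

```python
def clamp_min0(x):
    """Hypothetical scalar stand-in for fp16_clamp(x, min=0)."""
    return max(x, 0.0)


def giou(box1, box2, eps=1e-6):
    """GIoU of two (x1, y1, x2, y2) boxes with side lengths clamped at 0."""
    # Clamped areas: a degenerate box (x2 < x1 or y2 < y1) gets area 0
    area1 = clamp_min0(box1[2] - box1[0]) * clamp_min0(box1[3] - box1[1])
    area2 = clamp_min0(box2[2] - box2[0]) * clamp_min0(box2[3] - box2[1])

    # Intersection rectangle
    iw = clamp_min0(min(box1[2], box2[2]) - max(box1[0], box2[0]))
    ih = clamp_min0(min(box1[3], box2[3]) - max(box1[1], box2[1]))
    overlap = iw * ih
    union = max(area1 + area2 - overlap, eps)

    # Smallest box enclosing both inputs
    ew = clamp_min0(max(box1[2], box2[2]) - min(box1[0], box2[0]))
    eh = clamp_min0(max(box1[3], box2[3]) - min(box1[1], box2[1]))
    enclose = max(ew * eh, eps)

    iou = overlap / union
    return iou - (enclose - union) / enclose
```

For example, `giou((2, 0, 0, 2), (0, 0, 2, 2))` treats the first box (whose x2 is smaller than its x1) as having zero area instead of area -4, and the result stays inside [-1, 1].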

from tood.

MeteoriteWeny commented on June 1, 2024

@fcjian Thanks for the reply! It solves the CUDA error, but the model cannot converge. During training, a problem similar to gradient clipping occurred: the log shows a sudden increase in the loss (around iteration 950 of epoch 1), after which the loss fluctuates within a narrow range. I'll try again with the original TOOD code, without porting it to a higher mmdet version.

2021-12-29 09:32:52,217 - mmdet - INFO - Epoch [1][600/1162]	lr: 2.000e-03, eta: 8:39:18, time: 0.544, data_time: 0.013, memory: 5142, loss_cls: 0.6940, loss_bbox: 1.2061, loss: 1.9001
2021-12-29 09:33:18,832 - mmdet - INFO - Epoch [1][650/1162]	lr: 2.000e-03, eta: 8:38:09, time: 0.532, data_time: 0.013, memory: 5142, loss_cls: 0.6794, loss_bbox: 1.1886, loss: 1.8680
2021-12-29 09:33:45,535 - mmdet - INFO - Epoch [1][700/1162]	lr: 2.000e-03, eta: 8:37:13, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 0.6674, loss_bbox: 1.0485, loss: 1.7159
2021-12-29 09:34:12,217 - mmdet - INFO - Epoch [1][750/1162]	lr: 2.000e-03, eta: 8:36:19, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 0.6646, loss_bbox: 1.0119, loss: 1.6765
2021-12-29 09:34:38,781 - mmdet - INFO - Epoch [1][800/1162]	lr: 2.000e-03, eta: 8:35:20, time: 0.531, data_time: 0.013, memory: 5142, loss_cls: 0.6487, loss_bbox: 0.9564, loss: 1.6051
2021-12-29 09:35:05,190 - mmdet - INFO - Epoch [1][850/1162]	lr: 2.000e-03, eta: 8:34:14, time: 0.528, data_time: 0.013, memory: 5142, loss_cls: 0.6176, loss_bbox: 0.8406, loss: 1.4582
2021-12-29 09:35:31,799 - mmdet - INFO - Epoch [1][900/1162]	lr: 2.000e-03, eta: 8:33:26, time: 0.532, data_time: 0.013, memory: 5142, loss_cls: 0.6210, loss_bbox: 0.9229, loss: 1.5439
2021-12-29 09:35:58,144 - mmdet - INFO - Epoch [1][950/1162]	lr: 2.000e-03, eta: 8:32:24, time: 0.527, data_time: 0.013, memory: 5142, loss_cls: 1.1693, loss_bbox: 1.1850, loss: 2.3543
2021-12-29 09:36:25,339 - mmdet - INFO - Exp name: tood_r50_fpn_on_input_1x_coco_cloth.py
2021-12-29 09:36:25,340 - mmdet - INFO - Epoch [1][1000/1162]	lr: 2.000e-03, eta: 8:32:14, time: 0.544, data_time: 0.013, memory: 5142, loss_cls: 1.2817, loss_bbox: 1.3174, loss: 2.5991
2021-12-29 09:36:52,114 - mmdet - INFO - Epoch [1][1050/1162]	lr: 2.000e-03, eta: 8:31:39, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2358, loss_bbox: 1.2847, loss: 2.5205
2021-12-29 09:37:18,908 - mmdet - INFO - Epoch [1][1100/1162]	lr: 2.000e-03, eta: 8:31:07, time: 0.536, data_time: 0.013, memory: 5142, loss_cls: 1.2365, loss_bbox: 1.3173, loss: 2.5538
2021-12-29 09:37:45,867 - mmdet - INFO - Epoch [1][1150/1162]	lr: 2.000e-03, eta: 8:30:43, time: 0.539, data_time: 0.013, memory: 5142, loss_cls: 1.2022, loss_bbox: 1.2296, loss: 2.4319
2021-12-29 09:37:52,329 - mmdet - INFO - Saving checkpoint at 1 epochs
2021-12-29 09:38:47,804 - mmdet - INFO - Evaluating bbox...
2021-12-29 09:38:51,494 - mmdet - INFO - Exp name: tood_r50_fpn_on_input_1x_coco_cloth.py
2021-12-29 09:38:51,495 - mmdet - INFO - Epoch(val) [1][793]	bbox_mAP: 0.0170, bbox_mAP_50: 0.0560, bbox_mAP_75: 0.0090, bbox_mAP_s: -1.0000, bbox_mAP_m: 0.0240, bbox_mAP_l: 0.0190, bbox_mAP_copypaste: 0.017 0.056 0.009 -1.000 0.024 0.019
2021-12-29 09:39:21,128 - mmdet - INFO - Epoch [2][50/1162]	lr: 2.000e-03, eta: 8:27:14, time: 0.592, data_time: 0.062, memory: 5142, loss_cls: 1.2236, loss_bbox: 1.2423, loss: 2.4659
2021-12-29 09:39:47,839 - mmdet - INFO - Epoch [2][100/1162]	lr: 2.000e-03, eta: 8:26:45, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2410, loss_bbox: 1.2517, loss: 2.4927
2021-12-29 09:40:14,530 - mmdet - INFO - Epoch [2][150/1162]	lr: 2.000e-03, eta: 8:26:16, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2827, loss_bbox: 1.2900, loss: 2.5726
2021-12-29 09:40:41,392 - mmdet - INFO - Epoch [2][200/1162]	lr: 2.000e-03, eta: 8:25:54, time: 0.537, data_time: 0.013, memory: 5142, loss_cls: 1.2351, loss_bbox: 1.2374, loss: 2.4725
2021-12-29 09:41:08,168 - mmdet - INFO - Epoch [2][250/1162]	lr: 2.000e-03, eta: 8:25:28, time: 0.536, data_time: 0.013, memory: 5142, loss_cls: 1.1736, loss_bbox: 1.1955, loss: 2.3691
2021-12-29 09:41:34,806 - mmdet - INFO - Epoch [2][300/1162]	lr: 2.000e-03, eta: 8:24:57, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2357, loss_bbox: 1.2372, loss: 2.4729
2021-12-29 09:42:01,528 - mmdet - INFO - Epoch [2][350/1162]	lr: 2.000e-03, eta: 8:24:29, time: 0.534, data_time: 0.013, memory: 5142, loss_cls: 1.2839, loss_bbox: 1.2587, loss: 2.5425
2021-12-29 09:42:28,154 - mmdet - INFO - Epoch [2][400/1162]	lr: 2.000e-03, eta: 8:23:58, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2595, loss_bbox: 1.2359, loss: 2.4954
2021-12-29 09:42:54,986 - mmdet - INFO - Epoch [2][450/1162]	lr: 2.000e-03, eta: 8:23:35, time: 0.537, data_time: 0.013, memory: 5142, loss_cls: 1.2725, loss_bbox: 1.3049, loss: 2.5773
2021-12-29 09:43:21,637 - mmdet - INFO - Epoch [2][500/1162]	lr: 2.000e-03, eta: 8:23:05, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2867, loss_bbox: 1.2862, loss: 2.5730
2021-12-29 09:43:48,377 - mmdet - INFO - Epoch [2][550/1162]	lr: 2.000e-03, eta: 8:22:38, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2554, loss_bbox: 1.2227, loss: 2.4781
2021-12-29 09:44:15,013 - mmdet - INFO - Epoch [2][600/1162]	lr: 2.000e-03, eta: 8:22:08, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.2519, loss_bbox: 1.2955, loss: 2.5474
2021-12-29 09:44:42,014 - mmdet - INFO - Epoch [2][650/1162]	lr: 2.000e-03, eta: 8:21:49, time: 0.540, data_time: 0.013, memory: 5142, loss_cls: 1.2472, loss_bbox: 1.2727, loss: 2.5199
2021-12-29 09:45:08,675 - mmdet - INFO - Epoch [2][700/1162]	lr: 2.000e-03, eta: 8:21:20, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.1740, loss_bbox: 1.2461, loss: 2.4200
2021-12-29 09:45:35,666 - mmdet - INFO - Epoch [2][750/1162]	lr: 2.000e-03, eta: 8:21:00, time: 0.540, data_time: 0.013, memory: 5142, loss_cls: 1.2391, loss_bbox: 1.2960, loss: 2.5351
2021-12-29 09:46:02,395 - mmdet - INFO - Epoch [2][800/1162]	lr: 2.000e-03, eta: 8:20:33, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2462, loss_bbox: 1.2470, loss: 2.4933
2021-12-29 09:46:29,543 - mmdet - INFO - Epoch [2][850/1162]	lr: 2.000e-03, eta: 8:20:17, time: 0.543, data_time: 0.013, memory: 5142, loss_cls: 1.2525, loss_bbox: 1.3128, loss: 2.5653
2021-12-29 09:46:56,271 - mmdet - INFO - Epoch [2][900/1162]	lr: 2.000e-03, eta: 8:19:50, time: 0.535, data_time: 0.013, memory: 5142, loss_cls: 1.2501, loss_bbox: 1.2733, loss: 2.5234
2021-12-29 09:47:22,898 - mmdet - INFO - Epoch [2][950/1162]	lr: 2.000e-03, eta: 8:19:19, time: 0.533, data_time: 0.013, memory: 5142, loss_cls: 1.3215, loss_bbox: 1.2575, loss: 2.5790
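
The point where the loss jumps can be located automatically. Below is a minimal sketch that parses training lines in the format shown above and reports sudden increases in the total loss. The function name `find_loss_jumps` and the `ratio=1.3` threshold are assumptions chosen for illustration, not part of mmdet.

```python
import re

# Extracts epoch, iteration and total loss from lines such as:
# "... Epoch [1][950/1162]\tlr: ..., loss_cls: 1.1693, loss_bbox: 1.1850, loss: 2.3543"
LINE_RE = re.compile(r"Epoch \[(\d+)\]\[(\d+)/\d+\].*?\bloss: ([0-9.]+)")


def find_loss_jumps(lines, ratio=1.3):
    """Return (epoch, iteration, previous_loss, loss) for each sudden increase."""
    jumps = []
    prev = None
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip checkpoint, evaluation and other non-training lines
        epoch, it, loss = int(m.group(1)), int(m.group(2)), float(m.group(3))
        # Flag a step whose loss exceeds the previous logged loss by the given ratio
        if prev is not None and loss > ratio * prev:
            jumps.append((epoch, it, prev, loss))
        prev = loss
    return jumps
```

Applied to the log above, the only step that crosses this threshold is epoch 1, iteration 950, where the loss goes from 1.5439 to 2.3543.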

one23sunnyQQ commented on June 1, 2024

Same issue

Bo396543018 commented on June 1, 2024

Same issue. My mmdet version is 2.19.0, and the error is raised during the 3rd epoch of training.

GloriaHM commented on June 1, 2024

You can try to clamp the value of the box area when computing GIoU loss, e.g.,

area1 = fp16_clamp((bboxes1[..., 2] - bboxes1[..., 0]), min=0) * fp16_clamp((
    bboxes1[..., 3] - bboxes1[..., 1]), min=0)
area2 = fp16_clamp((bboxes2[..., 2] - bboxes2[..., 0]), min=0) * fp16_clamp((
    bboxes2[..., 3] - bboxes2[..., 1]), min=0)

Hello, I have clamped the box area as you showed, but training still crashes at the 5th epoch. My mmdet version is 2.14.0+d3e713d.

Error Report:

/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [508,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [32,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [33,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [34,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [35,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [36,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [37,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [38,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [39,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [40,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [41,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [42,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [43,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [44,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [45,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [46,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [47,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [48,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [49,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [50,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [51,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [52,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [53,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [54,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [55,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [56,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [57,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [58,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [59,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [60,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [61,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [62,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [238,0,0], thread: [63,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [0,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [1,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [2,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [3,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [4,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [5,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [6,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [7,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [8,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [9,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [10,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [11,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [12,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [13,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [14,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [15,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [16,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [17,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [18,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [19,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [20,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [21,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [22,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [23,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [25,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [328,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [32,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [33,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [34,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [35,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [36,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [37,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [38,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [39,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [40,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [41,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [42,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [43,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [44,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [45,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [46,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [47,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [48,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [49,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [50,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [51,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [52,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [53,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [54,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [55,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [56,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [57,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [58,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [59,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [60,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [61,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [62,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [418,0,0], thread: [63,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [0,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [1,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [2,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [3,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [4,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [5,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [6,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [7,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [8,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [9,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [10,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [11,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [12,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [13,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [14,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [15,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [16,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [17,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [18,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [19,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [20,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [21,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [22,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [23,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [24,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [25,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [26,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [27,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [28,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [29,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [30,0,0] Assertion input_val >= zero && input_val <= one failed.
/opt/conda/conda-bld/pytorch_1614378098133/work/aten/src/ATen/native/cuda/Loss.cu:102: operator(): block: [568,0,0], thread: [31,0,0] Assertion input_val >= zero && input_val <= one failed.
Traceback (most recent call last):
File "./tools/train.py", line 188, in <module>
main()
File "./tools/train.py", line 184, in main
meta=meta)
File "/mnt/mhm/project/TODO/TOOD/mmdet/apis/train.py", line 170, in train_detector
runner.run(data_loaders, cfg.workflow)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
self.run_iter(data_batch, train_mode=True, **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
**kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/parallel/distributed.py", line 52, in train_step
output = self.module.train_step(*inputs[0], **kwargs[0])
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/base.py", line 237, in train_step
losses = self(**data)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
return old_func(*args, **kwargs)
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/base.py", line 171, in forward
return self.forward_train(img, img_metas, **kwargs)
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/detectors/single_stage.py", line 83, in forward_train
gt_labels, gt_bboxes_ignore)
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/base_dense_head.py", line 54, in forward_train
losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 186, in new_func
return old_func(*args, **kwargs)
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/tood_head.py", line 447, in loss
num_total_samples=num_total_samples)
File "/mnt/mhm/project/TODO/TOOD/mmdet/core/utils/misc.py", line 29, in multi_apply
return tuple(map(list, zip(*map_results)))
File "/mnt/mhm/project/TODO/TOOD/mmdet/models/dense_heads/tood_head.py", line 354, in loss_single
& (labels < bg_class_ind)).nonzero().squeeze(1)
RuntimeError: CUDA error: device-side assert triggered
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered
Exception raised from create_event_internal at /opt/conda/conda-bld/pytorch_1614378098133/work/c10/cuda/CUDACachingAllocator.cpp:733 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fdf062a32f2 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fdf062a067b in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: c10::cuda::CUDACachingAllocator::raw_delete(void*) + 0x809 (0x7fdf064fc219 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::TensorImpl::release_resources() + 0x54 (0x7fdf0628b3a4 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #4: <unknown function> + 0x6e6a3a (0x7fdf5d204a3a in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #5: <unknown function> + 0x6e6ae1 (0x7fdf5d204ae1 in /root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #6: <unknown function> + 0x1817da (0x55b94fc797da in /root/anaconda3/envs/open-mmlab/bin/python)
frame #7: <unknown function> + 0xfbfa9 (0x55b94fbf3fa9 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #8: <unknown function> + 0xfa8c8 (0x55b94fbf28c8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #9: <unknown function> + 0xfa8c8 (0x55b94fbf28c8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #10: <unknown function> + 0xfa2d8 (0x55b94fbf22d8 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #11: <unknown function> + 0xfad68 (0x55b94fbf2d68 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #12: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #13: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #14: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #15: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #16: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #17: <unknown function> + 0xfad7c (0x55b94fbf2d7c in /root/anaconda3/envs/open-mmlab/bin/python)
frame #18: <unknown function> + 0x12b327 (0x55b94fc23327 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #19: PyDict_SetItemString + 0x89 (0x55b94fc2fe59 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #20: PyImport_Cleanup + 0xab (0x55b94fca4d0b in /root/anaconda3/envs/open-mmlab/bin/python)
frame #21: Py_FinalizeEx + 0x64 (0x55b94fd19304 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #22: <unknown function> + 0x232960 (0x55b94fd2a960 in /root/anaconda3/envs/open-mmlab/bin/python)
frame #23: _Py_UnixMain + 0x3c (0x55b94fd2accc in /root/anaconda3/envs/open-mmlab/bin/python)
frame #24: __libc_start_main + 0xf0 (0x7fdf9851e830 in /lib/x86_64-linux-gnu/libc.so.6)
frame #25: <unknown function> + 0x1d7555 (0x55b94fccf555 in /root/anaconda3/envs/open-mmlab/bin/python)

Killing subprocess 19911
Traceback (most recent call last):
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
main()
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
sigkill_handler(signal.SIGTERM, None) # not coming back
File "/root/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/train.py', '--local_rank=0',

Thank you for your reply.

from tood.
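
The assertion text in the log above, `input_val >= zero && input_val <= one`, matches the input range check in PyTorch's CUDA binary cross-entropy kernel, so some value reaching that loss was probably outside [0, 1] or NaN. Because CUDA errors are reported asynchronously, the Python line named in the traceback (`tood_head.py`, line 354) is not necessarily where the bad value was produced. Below is a minimal pure-Python sketch of the same range check; the helper name `find_invalid_probabilities` and the sample scores are hypothetical and are not part of TOOD or mmdet.

```python
import math


def find_invalid_probabilities(values):
    """Return (index, value) pairs that would fail the check `0 <= v <= 1`.

    NaN is reported as well, because every comparison involving NaN is False.
    """
    return [(i, v) for i, v in enumerate(values) if not (0.0 <= v <= 1.0)]


# Hypothetical scores: one NaN and one value slightly above 1.
scores = [0.2, float("nan"), 1.0, 1.0001, 0.0]
bad = find_invalid_probabilities(scores)
bad_indices = [i for i, _ in bad]
print(bad_indices)
```

Running an equivalent check on the tensors passed to the classification loss, just before the loss call, may help locate the first iteration at which a value leaves the valid range.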

beeper00 avatar beeper00 commented on June 1, 2024

I ran into the same issue. My code already contains the clamp:
"area1 = fp16_clamp((bboxes1[..., 2] - bboxes1[..., 0]), min=0) * fp16_clamp((
bboxes1[..., 3] - bboxes1[..., 1]), min=0)"
I cloned the current code, which already includes this change, so I did not have to modify anything.
The bug still happens, though, and it occurs at a random point each time I train.

from tood.
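
To illustrate what the clamp quoted above guards against, here is a minimal pure-Python sketch of the area computation for a single box. It assumes the `(x1, y1, x2, y2)` layout implied by the indexing in the snippet; `box_area` and the sample boxes are hypothetical, and plain `max` stands in for mmdet's `fp16_clamp`.

```python
def box_area(box, clamp=True):
    """Area of an (x1, y1, x2, y2) box.

    With clamp=True, a negative width or height is replaced by 0,
    mirroring `fp16_clamp(..., min=0)` in the snippet above.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    if clamp:
        w, h = max(w, 0.0), max(h, 0.0)
    return w * h


valid = (0.0, 0.0, 2.0, 3.0)
flipped_x = (2.0, 0.0, 0.0, 3.0)      # x2 < x1: negative width
flipped_both = (2.0, 3.0, 0.0, 0.0)   # both sides negative

for box in (valid, flipped_x, flipped_both):
    print(box, box_area(box, clamp=False), box_area(box))
```

Without the clamp, a box with one flipped side gets a negative area, and a box with both sides flipped gets a positive area even though it is degenerate. The clamp maps both cases to zero. It only constrains box areas, so it would not by itself stop NaN or out-of-range values from reaching the classification loss, which is consistent with the error still appearing when the clamp is already present.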
