Hi, I'm running the command below to train on the same dataset provided in your repository, but I can't reproduce the results cited in the paper. I'm using a ResNet-50 encoder with the default PSPNet decoder, yet the IoU barely improves: even after 20 epochs it is still stuck below 0.4 on the training set and around 0.2 on validation, far from the 0.6 reported in the paper.
python run.py train --data.path preprocessing/ --scheduler.target=poly --optimizer.encoder-lr=1e-4 --optimizer.decoder-lr=1e-3 --optimizer.lr=1e-3 --data.mask-body-ratio=0.0 --data.in-channels=2 --data.weighted-sampling --loss.target=tversky --model.encoder=resnet50
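For context, here is my understanding of the objective selected by `--loss.target=tversky` (a minimal numpy sketch, not the repo's actual implementation; I don't know the repo's alpha/beta defaults, so I use the common alpha=0.3, beta=0.7):

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.3, beta=0.7, eps=1e-6):
    """Tversky loss for binary masks: 1 - TP / (TP + alpha*FP + beta*FN).

    pred: predicted probabilities in [0, 1], same shape as target.
    target: binary ground-truth mask.
    alpha/beta weight false positives/negatives; alpha=beta=0.5 reduces to Dice.
    """
    pred = pred.ravel().astype(np.float64)
    target = target.ravel().astype(np.float64)
    tp = (pred * target).sum()
    fp = (pred * (1.0 - target)).sum()
    fn = ((1.0 - pred) * target).sum()
    return 1.0 - (tp + eps) / (tp + alpha * fp + beta * fn + eps)

# Sanity check: perfect prediction -> loss ~0, fully wrong -> loss ~1
mask = np.array([[0, 1], [1, 1]])
print(tversky_loss(mask.astype(float), mask))        # ~0.0
print(tversky_loss(1.0 - mask.astype(float), mask))  # ~1.0
```

If the repo's defaults skew alpha/beta differently, that alone could change the precision/recall balance I see below.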
[2023-11-08 22:06] floods.training : INFO - Run started
[2023-11-08 22:06] floods.training : INFO - Experiment ID: 2023-11-08-22-06
[2023-11-08 22:06] floods.training : INFO - Output folder: outputs/2023-11-08-22-06
[2023-11-08 22:06] floods.training : INFO - Models folder: outputs/2023-11-08-22-06/models
[2023-11-08 22:06] floods.training : INFO - Logs folder: outputs/2023-11-08-22-06/logs
[2023-11-08 22:06] floods.training : INFO - Configuration: outputs/2023-11-08-22-06/config.yaml
[2023-11-08 22:06] floods.training : INFO - Using seed: 1337
[2023-11-08 22:06] floods.training : INFO - Loading datasets...
[2023-11-08 22:06] floods.prepare : INFO - Train transforms: Compose([
  RandomSizedCrop(always_apply=False, p=0.8, min_max_height=(256, 512), height=512, width=512, w2h_ratio=1.0, interpolation=1),
  Flip(always_apply=False, p=0.5),
  RandomRotate90(always_apply=False, p=0.5),
  ElasticTransform(always_apply=False, p=0.5, alpha=1, sigma=50, alpha_affine=50, interpolation=1, border_mode=4, value=None, mask_value=None, approximate=False, same_dxdy=False),
  GridDistortion(always_apply=False, p=0.5, num_steps=5, distort_limit=(-0.3, 0.3), interpolation=1, border_mode=4, value=None, mask_value=None, normalized=False),
], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}, is_check_shapes=True)
Compose([
  OneOf([
    GaussianBlur(always_apply=False, p=0.5, blur_limit=(3, 13), sigma_limit=(0, 0)),
    MultiplicativeNoise(always_apply=False, p=0.5, multiplier=(0.7, 1.3), per_channel=True, elementwise=True),
  ], p=0.6),
], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}, is_check_shapes=True)
Compose([
], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}, is_check_shapes=True)
[2023-11-08 22:06] floods.prepare : INFO - Eval. transforms: Compose([
  ClipNormalize(always_apply=False, p=1.0, mean=(0.049329374, 0.011776519), std=(0.0391287043, 0.0103687926), max_pixel_value=1.0, clip_min=-30.0, clip_max=30.0),
ToTensorV2(always_apply=True, p=1.0, transpose_mask=False),
], p=1.0, bbox_params=None, keypoint_params=None, additional_targets={}, is_check_shapes=True)
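As a sanity check on the hard-coded `ClipNormalize` statistics above, I recomputed per-channel mean/std over my preprocessed tiles to make sure they roughly match (rough numpy sketch; `channel_stats` is my own helper, shown here on random toy data rather than the real tiles):

```python
import numpy as np

def channel_stats(images, clip_min=-30.0, clip_max=30.0):
    """Per-channel mean/std after clipping, comparable to ClipNormalize's mean/std.

    images: iterable of (H, W, C) float arrays (here C=2 SAR bands).
    """
    pixels = np.concatenate(
        [np.clip(img, clip_min, clip_max).reshape(-1, img.shape[-1]) for img in images],
        axis=0,
    )
    return pixels.mean(axis=0), pixels.std(axis=0)

# Toy stand-in for (H, W, 2) tiles; the real ones come from preprocessing/
rng = np.random.default_rng(0)
tiles = [rng.normal(0.05, 0.04, size=(64, 64, 2)) for _ in range(4)]
mean, std = channel_stats(tiles)
print(mean, std)  # roughly [0.05, 0.05] and [0.04, 0.04] for this toy data
```

On my actual data the values were in the same ballpark as the logged constants, so I don't think normalization is the culprit.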
[2023-11-08 22:06] floods.training : INFO - Full sets - train set: 6181 samples, validation set: 559 samples
[2023-11-08 22:06] floods.prepare : INFO - Found an existing array of sample weights
[2023-11-08 22:06] floods.training : INFO - Preparing model...
[2023-11-08 22:06] floods.prepare : INFO - Returning intermediate features: False
[2023-11-08 22:06] floods.training : INFO - Visualize: True, num. batches for visualization: 8
[2023-11-08 22:06] floods.trainer.base : INFO - [Epoch 0]
Epoch 0 - train: 0%| | 0/1545 [00:00<?, ?batch/s, loss=--]
/root/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/conv.py:456: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at /opt/conda/conda-bld/pytorch_1695392020195/work/aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
  return F.conv2d(input, weight, bias, self.stride,
Epoch 0 - train: 100%|| 1545/1545 [42:42<00:00, 1.66s/batch, loss=0.6215]
[2023-11-08 22:49] floods.trainer.base : INFO - train/iou: 0.3486
Epoch 0 - val: 100%|| 70/70 [02:18<00:00, 1.97s/batch, loss=0.3641]
[2023-11-08 22:51] floods.trainer.base : INFO - val/f1: 0.3629, val/iou: 0.2216, val/precision: 0.3172, val/recall: 0.4239, val/class_iou: 0.2216, val/class_f1: 0.3629
[2023-11-08 22:52] floods.trainer.callbacks: INFO - [Epoch 0] Checkpoint saved: outputs/2023-11-08-22-06/models/model-00_loss-0.62_iou-0.22.pth
[2023-11-08 22:52] floods.trainer.base : INFO - [Epoch 1]
Epoch 1 - train: 100%|| 1545/1545 [56:50<00:00, 2.21s/batch, loss=0.3676]
[2023-11-08 23:48] floods.trainer.base : INFO - train/iou: 0.3754
Epoch 1 - val: 100%|| 70/70 [00:58<00:00, 1.20batch/s, loss=0.4147]
[2023-11-08 23:49] floods.trainer.base : INFO - val/f1: 0.3433, val/iou: 0.2072, val/precision: 0.3358, val/recall: 0.3511, val/class_iou: 0.2072, val/class_f1: 0.3433
[2023-11-08 23:49] floods.trainer.callbacks: INFO - [Epoch 1] Early stopping patience increased to: 1/25
[2023-11-08 23:49] floods.trainer.callbacks: INFO - [Epoch 1] Checkpoint saved: outputs/2023-11-08-22-06/models/model-01_loss-0.37_iou-0.21.pth
[2023-11-08 23:49] floods.trainer.base : INFO - [Epoch 2]
Epoch 2 - train: 100%|| 1545/1545 [44:25<00:00, 1.73s/batch, loss=0.3085]
[2023-11-09 00:34] floods.trainer.base : INFO - train/iou: 0.3809
Epoch 2 - val: 100%|| 70/70 [00:40<00:00, 1.71batch/s, loss=0.3751]
[2023-11-09 00:35] floods.trainer.base : INFO - val/f1: 0.3669, val/iou: 0.2247, val/precision: 0.3050, val/recall: 0.4603, val/class_iou: 0.2247, val/class_f1: 0.3669
[2023-11-09 00:35] floods.trainer.callbacks: INFO - [Epoch 2] Checkpoint saved: outputs/2023-11-08-22-06/models/model-02_loss-0.31_iou-0.22.pth
[2023-11-09 00:35] floods.trainer.base : INFO - [Epoch 3]
Epoch 3 - train: 100%|| 1545/1545 [39:03<00:00, 1.52s/batch, loss=0.1254]
[2023-11-09 01:14] floods.trainer.base : INFO - train/iou: 0.3838
Epoch 3 - val: 100%|| 70/70 [00:45<00:00, 1.55batch/s, loss=0.3296]
[2023-11-09 01:14] floods.trainer.base : INFO - val/f1: 0.3741, val/iou: 0.2301, val/precision: 0.2948, val/recall: 0.5117, val/class_iou: 0.2301, val/class_f1: 0.3741
[2023-11-09 01:14] floods.trainer.callbacks: INFO - [Epoch 3] Checkpoint saved: outputs/2023-11-08-22-06/models/model-03_loss-0.13_iou-0.23.pth
[2023-11-09 01:14] floods.trainer.base : INFO - [Epoch 4]
Epoch 4 - train: 100%|| 1545/1545 [37:03<00:00, 1.44s/batch, loss=0.1508]
[2023-11-09 01:51] floods.trainer.base : INFO - train/iou: 0.3762
Epoch 4 - val: 100%|| 70/70 [00:34<00:00, 2.00batch/s, loss=0.3494]
[2023-11-09 01:52] floods.trainer.base : INFO - val/f1: 0.3793, val/iou: 0.2340, val/precision: 0.3142, val/recall: 0.4784, val/class_iou: 0.2340, val/class_f1: 0.3793
[2023-11-09 01:52] floods.trainer.callbacks: INFO - [Epoch 4] Checkpoint saved: outputs/2023-11-08-22-06/models/model-04_loss-0.15_iou-0.23.pth
[2023-11-09 01:52] floods.trainer.base : INFO - [Epoch 5]
Epoch 5 - train: 100%|| 1545/1545 [36:07<00:00, 1.40s/batch, loss=0.1026]
[2023-11-09 02:28] floods.trainer.base : INFO - train/iou: 0.3872
Epoch 5 - val: 100%|| 70/70 [00:52<00:00, 1.34batch/s, loss=0.3265]
[2023-11-09 02:29] floods.trainer.base : INFO - val/f1: 0.3757, val/iou: 0.2313, val/precision: 0.3201, val/recall: 0.4547, val/class_iou: 0.2313, val/class_f1: 0.3757
[2023-11-09 02:29] floods.trainer.callbacks: INFO - [Epoch 5] Early stopping patience increased to: 1/25
[2023-11-09 02:29] floods.trainer.callbacks: INFO - [Epoch 5] No checkpoint saved
[2023-11-09 02:29] floods.trainer.base : INFO - [Epoch 6]
Epoch 6 - train: 100%|| 1545/1545 [35:36<00:00, 1.38s/batch, loss=0.4175]
[2023-11-09 03:05] floods.trainer.base : INFO - train/iou: 0.3982
Epoch 6 - val: 100%|| 70/70 [00:32<00:00, 2.15batch/s, loss=0.3370]
[2023-11-09 03:05] floods.trainer.base : INFO - val/f1: 0.3900, val/iou: 0.2423, val/precision: 0.3238, val/recall: 0.4904, val/class_iou: 0.2423, val/class_f1: 0.3900
[2023-11-09 03:05] floods.trainer.callbacks: INFO - [Epoch 6] Checkpoint saved: outputs/2023-11-08-22-06/models/model-06_loss-0.42_iou-0.24.pth
[2023-11-09 03:05] floods.trainer.base : INFO - [Epoch 7]
Epoch 7 - train: 100%|| 1545/1545 [35:40<00:00, 1.39s/batch, loss=0.0647]
[2023-11-09 03:41] floods.trainer.base : INFO - train/iou: 0.4065
Epoch 7 - val: 100%|| 70/70 [00:43<00:00, 1.62batch/s, loss=0.3275]
[2023-11-09 03:42] floods.trainer.base : INFO - val/f1: 0.3743, val/iou: 0.2303, val/precision: 0.3162, val/recall: 0.4586, val/class_iou: 0.2303, val/class_f1: 0.3743
[2023-11-09 03:42] floods.trainer.callbacks: INFO - [Epoch 7] Early stopping patience increased to: 1/25
[2023-11-09 03:42] floods.trainer.callbacks: INFO - [Epoch 7] No checkpoint saved
[2023-11-09 03:42] floods.trainer.base : INFO - [Epoch 8]
Epoch 8 - train: 100%|| 1545/1545 [36:11<00:00, 1.41s/batch, loss=0.2009]
[2023-11-09 04:18] floods.trainer.base : INFO - train/iou: 0.3958
Epoch 8 - val: 100%|| 70/70 [00:34<00:00, 2.02batch/s, loss=0.3475]
[2023-11-09 04:19] floods.trainer.base : INFO - val/f1: 0.3891, val/iou: 0.2415, val/precision: 0.3554, val/recall: 0.4298, val/class_iou: 0.2415, val/class_f1: 0.3891
[2023-11-09 04:19] floods.trainer.callbacks: INFO - [Epoch 8] Early stopping patience increased to: 2/25
[2023-11-09 04:19] floods.trainer.callbacks: INFO - [Epoch 8] No checkpoint saved
[2023-11-09 04:19] floods.trainer.base : INFO - [Epoch 9]
Epoch 9 - train: 100%|| 1545/1545 [35:56<00:00, 1.40s/batch, loss=0.1640]
[2023-11-09 04:55] floods.trainer.base : INFO - train/iou: 0.3963
Epoch 9 - val: 100%|| 70/70 [00:36<00:00, 1.91batch/s, loss=0.4421]
[2023-11-09 04:55] floods.trainer.base : INFO - val/f1: 0.3628, val/iou: 0.2216, val/precision: 0.3357, val/recall: 0.3947, val/class_iou: 0.2216, val/class_f1: 0.3628
[2023-11-09 04:55] floods.trainer.callbacks: INFO - [Epoch 9] Early stopping patience increased to: 3/25
[2023-11-09 04:55] floods.trainer.callbacks: INFO - [Epoch 9] No checkpoint saved
[2023-11-09 04:55] floods.trainer.base : INFO - [Epoch 10]
Epoch 10 - train: 100%|| 1545/1545 [35:57<00:00, 1.40s/batch, loss=0.3653]
[2023-11-09 05:31] floods.trainer.base : INFO - train/iou: 0.3876
Epoch 10 - val: 100%|| 70/70 [00:34<00:00, 2.03batch/s, loss=0.8312]
[2023-11-09 05:32] floods.trainer.base : INFO - val/f1: 0.3029, val/iou: 0.1785, val/precision: 0.3336, val/recall: 0.2773, val/class_iou: 0.1785, val/class_f1: 0.3029
[2023-11-09 05:32] floods.trainer.callbacks: INFO - [Epoch 10] Early stopping patience increased to: 4/25
[2023-11-09 05:32] floods.trainer.callbacks: INFO - [Epoch 10] No checkpoint saved
[2023-11-09 05:32] floods.trainer.base : INFO - [Epoch 11]
Epoch 11 - train: 100%|| 1545/1545 [35:35<00:00, 1.38s/batch, loss=0.1300]
[2023-11-09 06:07] floods.trainer.base : INFO - train/iou: 0.3983
Epoch 11 - val: 100%|| 70/70 [00:40<00:00, 1.73batch/s, loss=0.5205]
[2023-11-09 06:08] floods.trainer.base : INFO - val/f1: 0.3787, val/iou: 0.2336, val/precision: 0.3345, val/recall: 0.4363, val/class_iou: 0.2336, val/class_f1: 0.3787
[2023-11-09 06:08] floods.trainer.callbacks: INFO - [Epoch 11] Early stopping patience increased to: 5/25
[2023-11-09 06:08] floods.trainer.callbacks: INFO - [Epoch 11] No checkpoint saved
[2023-11-09 06:08] floods.trainer.base : INFO - [Epoch 12]
Epoch 12 - train: 100%|| 1545/1545 [35:45<00:00, 1.39s/batch, loss=0.1798]
[2023-11-09 06:44] floods.trainer.base : INFO - train/iou: 0.3968
Epoch 12 - val: 100%|| 70/70 [00:32<00:00, 2.13batch/s, loss=0.6031]
[2023-11-09 06:44] floods.trainer.base : INFO - val/f1: 0.3537, val/iou: 0.2148, val/precision: 0.3924, val/recall: 0.3219, val/class_iou: 0.2148, val/class_f1: 0.3537
[2023-11-09 06:44] floods.trainer.callbacks: INFO - [Epoch 12] Early stopping patience increased to: 6/25
[2023-11-09 06:44] floods.trainer.callbacks: INFO - [Epoch 12] No checkpoint saved
[2023-11-09 06:44] floods.trainer.base : INFO - [Epoch 13]
Epoch 13 - train: 100%|| 1545/1545 [35:33<00:00, 1.38s/batch, loss=0.3209]
[2023-11-09 07:20] floods.trainer.base : INFO - train/iou: 0.3617
Epoch 13 - val: 100%|| 70/70 [00:32<00:00, 2.13batch/s, loss=0.4442]
[2023-11-09 07:20] floods.trainer.base : INFO - val/f1: 0.3666, val/iou: 0.2244, val/precision: 0.4198, val/recall: 0.3253, val/class_iou: 0.2244, val/class_f1: 0.3666
[2023-11-09 07:20] floods.trainer.callbacks: INFO - [Epoch 13] Early stopping patience increased to: 7/25
[2023-11-09 07:20] floods.trainer.callbacks: INFO - [Epoch 13] No checkpoint saved
[2023-11-09 07:21] floods.trainer.base : INFO - [Epoch 14]
Epoch 14 - train: 100%|| 1545/1545 [35:51<00:00, 1.39s/batch, loss=0.0421]
[2023-11-09 07:56] floods.trainer.base : INFO - train/iou: 0.3806
Epoch 14 - val: 100%|| 70/70 [00:31<00:00, 2.19batch/s, loss=0.5249]
[2023-11-09 07:57] floods.trainer.base : INFO - val/f1: 0.3868, val/iou: 0.2398, val/precision: 0.3947, val/recall: 0.3792, val/class_iou: 0.2398, val/class_f1: 0.3868
[2023-11-09 07:57] floods.trainer.callbacks: INFO - [Epoch 14] Early stopping patience increased to: 8/25
[2023-11-09 07:57] floods.trainer.callbacks: INFO - [Epoch 14] No checkpoint saved
[2023-11-09 07:57] floods.trainer.base : INFO - [Epoch 15]
Epoch 15 - train: 100%|| 1545/1545 [35:42<00:00, 1.39s/batch, loss=0.3494]
[2023-11-09 08:33] floods.trainer.base : INFO - train/iou: 0.3696
Epoch 15 - val: 100%|| 70/70 [00:40<00:00, 1.74batch/s, loss=0.5384]
[2023-11-09 08:33] floods.trainer.base : INFO - val/f1: 0.3603, val/iou: 0.2197, val/precision: 0.3488, val/recall: 0.3726, val/class_iou: 0.2197, val/class_f1: 0.3603
[2023-11-09 08:33] floods.trainer.callbacks: INFO - [Epoch 15] Early stopping patience increased to: 9/25
[2023-11-09 08:33] floods.trainer.callbacks: INFO - [Epoch 15] No checkpoint saved
[2023-11-09 08:33] floods.trainer.base : INFO - [Epoch 16]
Epoch 16 - train: 100%|| 1545/1545 [35:20<00:00, 1.37s/batch, loss=0.1894]
[2023-11-09 09:09] floods.trainer.base : INFO - train/iou: 0.3755
Epoch 16 - val: 100%|| 70/70 [00:33<00:00, 2.08batch/s, loss=0.5183]
[2023-11-09 09:09] floods.trainer.base : INFO - val/f1: 0.3650, val/iou: 0.2233, val/precision: 0.3838, val/recall: 0.3480, val/class_iou: 0.2233, val/class_f1: 0.3650
[2023-11-09 09:09] floods.trainer.callbacks: INFO - [Epoch 16] Early stopping patience increased to: 10/25
[2023-11-09 09:09] floods.trainer.callbacks: INFO - [Epoch 16] No checkpoint saved
[2023-11-09 09:09] floods.trainer.base : INFO - [Epoch 17]
Epoch 17 - train: 100%|| 1545/1545 [35:37<00:00, 1.38s/batch, loss=0.2187]
[2023-11-09 09:45] floods.trainer.base : INFO - train/iou: 0.3761
Epoch 17 - val: 100%|| 70/70 [00:32<00:00, 2.15batch/s, loss=0.4508]
[2023-11-09 09:45] floods.trainer.base : INFO - val/f1: 0.3846, val/iou: 0.2381, val/precision: 0.3961, val/recall: 0.3737, val/class_iou: 0.2381, val/class_f1: 0.3846
[2023-11-09 09:45] floods.trainer.callbacks: INFO - [Epoch 17] Early stopping patience increased to: 11/25
[2023-11-09 09:45] floods.trainer.callbacks: INFO - [Epoch 17] No checkpoint saved
[2023-11-09 09:45] floods.trainer.base : INFO - [Epoch 18]
Epoch 18 - train: 100%|| 1545/1545 [35:14<00:00, 1.37s/batch, loss=0.7247]
[2023-11-09 10:21] floods.trainer.base : INFO - train/iou: 0.3833
Epoch 18 - val: 100%|| 70/70 [00:35<00:00, 1.95batch/s, loss=0.5297]
[2023-11-09 10:21] floods.trainer.base : INFO - val/f1: 0.2582, val/iou: 0.1482, val/precision: 0.4622, val/recall: 0.1791, val/class_iou: 0.1482, val/class_f1: 0.2582
[2023-11-09 10:21] floods.trainer.callbacks: INFO - [Epoch 18] Early stopping patience increased to: 12/25
[2023-11-09 10:21] floods.trainer.callbacks: INFO - [Epoch 18] No checkpoint saved
[2023-11-09 10:21] floods.trainer.base : INFO - [Epoch 19]
Epoch 19 - train: 100%|| 1545/1545 [35:36<00:00, 1.38s/batch, loss=0.2034]
[2023-11-09 10:57] floods.trainer.base : INFO - train/iou: 0.3750
Epoch 19 - val: 100%|| 70/70 [00:37<00:00, 1.84batch/s, loss=0.4709]
[2023-11-09 10:58] floods.trainer.base : INFO - val/f1: 0.3656, val/iou: 0.2237, val/precision: 0.3960, val/recall: 0.3396, val/class_iou: 0.2237, val/class_f1: 0.3656
[2023-11-09 10:58] floods.trainer.callbacks: INFO - [Epoch 19] Early stopping patience increased to: 13/25
[2023-11-09 10:58] floods.trainer.callbacks: INFO - [Epoch 19] No checkpoint saved
[2023-11-09 10:58] floods.trainer.base : INFO - [Epoch 20]
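One thing I did verify: the logged metrics are internally consistent, since for a binary confusion matrix F1 (Dice) and IoU satisfy F1 = 2·IoU/(1 + IoU). So the plateau looks like a real optimization issue rather than a metric bug. Quick check against the epoch-0 validation values above:

```python
def f1_from_iou(iou):
    """For a binary confusion matrix, Dice/F1 = 2*IoU / (1 + IoU)."""
    return 2.0 * iou / (1.0 + iou)

# Epoch 0 validation: val/iou 0.2216 -> val/f1 should be ~0.363
print(round(f1_from_iou(0.2216), 4))  # 0.3628, matching the logged 0.3629 up to rounding
```

Any idea what could explain the gap to the paper's 0.6 IoU? Different defaults for the Tversky alpha/beta, the learning-rate split, or the mask-body-ratio setting?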