Comments (17)

mittagessen commented on August 30, 2024

Or can the old models still be fine-tuned?

I assume fine-tuning should be sufficient. It is a very minor change in how input data is presented, but we haven't tested it extensively. A note, since you're training segmentation models: the change only concerns recognition. The generation of baselines and bounding polygons remains unchanged. The 5.x release changelog explains exactly what changed.

from kraken.

johnlockejrr commented on August 30, 2024

Trying to train from the old segmentation model raises an error:

ketos segtrain -d cuda:0 -f alto -t output.txt -tl --resize both -i /home/incognito/kraken-train/sam_finetuning/ubma_sam_v2.mlmodel -o /home/incognito/kraken-train/sam_finetuning/ubma_sam_v4/ubma_sam_v4

TypeError: MetricsTextColumn.__init__() missing 2 required positional arguments: 'text_delimiter' and 'metrics_format'

mittagessen commented on August 30, 2024

Can you run the first command with --raise-on-error? That should give me some information about what exactly is breaking. The models should be compatible; kraken will just fall back to a version of the polygon extractor that doesn't have all speedups enabled.

For the second part, I think that's a regression caused by bumping up lightning to 2.2.
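The `TypeError` reported above (`MetricsTextColumn.__init__() missing 2 required positional arguments`) is the generic error Python raises when a constructor gains required parameters that an existing caller does not pass. The sketch below uses hypothetical stand-in classes (not Lightning's or kraken's actual code) to reproduce that failure and to show one version-tolerant way to call such a constructor via `inspect.signature`. The default values for `text_delimiter` and `metrics_format` are illustrative assumptions only.

```python
import inspect


class OldMetricsColumn:
    """Hypothetical stand-in for a column class before a signature change."""

    def __init__(self, trainer, style):
        self.trainer, self.style = trainer, style


class NewMetricsColumn:
    """Hypothetical stand-in after two required positional arguments were added."""

    def __init__(self, trainer, style, text_delimiter, metrics_format):
        self.trainer, self.style = trainer, style
        self.text_delimiter, self.metrics_format = text_delimiter, metrics_format


def build_column(cls, trainer, style, text_delimiter=" ", metrics_format=".3f"):
    """Pass the newer arguments only if the constructor accepts them."""
    params = inspect.signature(cls).parameters  # parameters of __init__, minus self
    kwargs = {}
    if "text_delimiter" in params:
        kwargs["text_delimiter"] = text_delimiter
    if "metrics_format" in params:
        kwargs["metrics_format"] = metrics_format
    return cls(trainer, style, **kwargs)


if __name__ == "__main__":
    try:
        NewMetricsColumn(None, "bold")  # old-style call against the new signature
    except TypeError as err:
        print(err)
    print(type(build_column(NewMetricsColumn, None, "bold")).__name__)
```

Checking the signature avoids hard-pinning one library version, at the cost of silently supplying defaults that may not match what the newer API expects.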

johnlockejrr commented on August 30, 2024

Sure, here we go:

(kraken-5.2.0) incognito@DESKTOP-NHKR7QL:~$ kraken --raise-on-error -i 159.jpg lines.json segment -bl -i ubma_sam_v2.mlmodel
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.1.2.post303 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Loading ANN ubma_sam_v2.mlmodel ✓
Segmenting      [04/22/24 12:15:30] ERROR    Failed processing 159.jpg: list index out of range    kraken.py:429
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/incognito/miniconda3/envs/kraken-5.2.0/bin/kraken:8 in <module>                            │
│                                                                                                  │
│   5 from kraken.kraken import cli                                                                │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(cli())                                                                          │
│   9                                                                                              │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1157 in  │
│ __call__                                                                                         │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1078 in  │
│ main                                                                                             │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1720 in  │
│ invoke                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1657 in  │
│ _process_result                                                                                  │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:783 in   │
│ invoke                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/kraken.py:427   │
│ in process_pipeline                                                                              │
│                                                                                                  │
│   424 │   │   │   │   if len(fc) - 2 == idx:                                                     │
│   425 │   │   │   │   │   ctx.meta['last_process'] = True                                        │
│   426 │   │   │   │   with threadpool_limits(limits=ctx.meta['threads']):                        │
│ ❱ 427 │   │   │   │   │   task(input=input, output=output)                                       │
│   428 │   │   except Exception as e:                                                             │
│   429 │   │   │   logger.error(f'Failed processing {io_pair[0]}: {str(e)}')                      │
│   430 │   │   │   if ctx.meta['raise_failed']:                                                   │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/kraken.py:154   │
│ in segmenter                                                                                     │
│                                                                                                  │
│   151 │   │   │   │   │   │   │   │     pad=pad,                                                 │
│   152 │   │   │   │   │   │   │   │     mask=mask)                                               │
│   153 │   │   else:                                                                              │
│ ❱ 154 │   │   │   res = blla.segment(im, text_direction, mask=mask, model=model, device=device   │
│   155 │   │   │   │   │   │   │      raise_on_error=ctx.meta['raise_failed'], autocast=ctx.met   │
│   156 │   except Exception:                                                                      │
│   157 │   │   if ctx.meta['raise_failed']:                                                       │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/blla.py:396 in  │
│ segment                                                                                          │
│                                                                                                  │
│   393 │   │   │   │   line_regs.append(reg_id)                                                   │
│   394 │   │   blls.append(BaselineLine(id=str(uuid.uuid4()), baseline=line['baseline'], bounda   │
│   395 │                                                                                          │
│ ❱ 396 │   return Segmentation(text_direction=text_direction,                                     │
│   397 │   │   │   │   │   │   imagename=getattr(im, 'filename', None),                           │
│   398 │   │   │   │   │   │   type='baselines',                                                  │
│   399 │   │   │   │   │   │   lines=blls,                                                        │
│ in __init__:10                                                                                   │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/containers.py:1 │
│ 97 in __post_init__                                                                              │
│                                                                                                  │
│   194 │   │   if len(self.lines) and not isinstance(self.lines[0], BBoxLine) and not isinstanc   │
│   195 │   │   │   line_cls = BBoxLine if self.type == 'bbox' else BaselineLine                   │
│   196 │   │   │   self.lines = [line_cls(**line) for line in self.lines]                         │
│ ❱ 197 │   │   if len(self.regions) and not isinstance(next(iter(self.regions.values()))[0], Re   │
│   198 │   │   │   regs = {}                                                                      │
│   199 │   │   │   for k, v in self.regions.items():                                              │
│   200 │   │   │   │   regs[k] = [Region(**reg) for reg in v]                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range

johnlockejrr commented on August 30, 2024

For the second part, I think that's a regression caused by bumping up lightning to 2.2.

Here?

(kraken-5.2.0) incognito@DESKTOP-NHKR7QL:~/kraken-train/sam_finetuning$ grep 'lightning' ~/kraken/environment_cuda.yml
  - conda-forge::pytorch-lightning~=2.0.0
(kraken-5.2.0) incognito@DESKTOP-NHKR7QL:~/kraken-train/sam_finetuning$ grep 'lightning' ~/kraken-5.2.0/environment_cuda.yml
  - conda-forge::lightning~=2.2.0
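The grep above only shows what each environment file requests (`pytorch-lightning~=2.0.0` versus `lightning~=2.2.0`), not what is actually installed. A small sketch using the standard library's `importlib.metadata` can report the installed versions of both distribution names in the active environment. The `>= 2.2` flag merely mirrors the version bump discussed in this thread and is not an official compatibility rule.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def major_minor(ver: str) -> Tuple[int, int]:
    """Parse the leading 'X.Y' of a version string such as '2.2.1'."""
    major, minor = ver.split(".")[:2]
    return int(major), int(minor)


if __name__ == "__main__":
    # The two environment files use different distribution names.
    for dist in ("lightning", "pytorch-lightning"):
        ver = installed_version(dist)
        if ver is None:
            print(f"{dist}: not installed")
        else:
            print(f"{dist}: {ver} (>= 2.2: {major_minor(ver) >= (2, 2)})")
```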

mittagessen commented on August 30, 2024

For the second part, I think that's a regression caused by bumping up lightning to 2.2.

Here?

Yes, I've just confirmed it and pushed a bugfix.

For the inference error, I suspect it is caused by the model not having any regions, combined with an incorrect test for that case. I'll fix it as soon as I've found a new laptop charger.
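The traceback points at a check of the form `next(iter(self.regions.values()))[0]`, which raises `IndexError` whenever the regions dictionary has keys (region types) but the first list is empty. The sketch below uses a hypothetical `Region` stand-in (not kraken's actual container) to reproduce that failure and to show one more defensive variant that looks at the first non-empty list instead. It illustrates the suspected faulty test; it is not the fix that was pushed upstream.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Region:
    """Hypothetical stand-in for a segmentation region container."""
    id: str
    boundary: List[Any] = field(default_factory=list)
    tags: Dict[str, Any] = field(default_factory=dict)


def needs_conversion_fragile(regions: Dict[str, list]) -> bool:
    # Same shape as the check in the traceback: indexes [0] of the first
    # region list, so {'Main': [], 'text': []} raises IndexError.
    return bool(len(regions)) and not isinstance(next(iter(regions.values()))[0], Region)


def needs_conversion(regions: Dict[str, list]) -> bool:
    # Inspect the first non-empty list; if every list is empty,
    # there is nothing to convert.
    for regs in regions.values():
        if regs:
            return not isinstance(regs[0], Region)
    return False


def normalize_regions(regions: Dict[str, list]) -> Dict[str, list]:
    # Convert plain dicts to Region objects only when necessary.
    if needs_conversion(regions):
        return {k: [Region(**r) for r in v] for k, v in regions.items()}
    return regions


if __name__ == "__main__":
    empty = {"Main": [], "text": []}  # region types known, but no regions found
    try:
        needs_conversion_fragile(empty)
    except IndexError:
        print("fragile check: IndexError")
    print(needs_conversion(empty))
```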

johnlockejrr commented on August 30, 2024

Perfect, I'll reinstall and test the first part.

EDIT: I'll just patch kraken/lib/progress.py

EDIT: reinstalled fresh

johnlockejrr commented on August 30, 2024

I don't know, I just tested it and the same thing happens...

(kraken-5.2.0) incognito@DESKTOP-NHKR7QL:~$ kraken --raise-on-error -i 159.jpg lines.json segment -bl -i ubma_sam_v2.mlmodel
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.1.2.post303 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Loading ANN ubma_sam_v2.mlmodel ✓
Segmenting      [04/22/24 12:32:58] ERROR    Failed processing 159.jpg: list index out of range    kraken.py:429
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/incognito/miniconda3/envs/kraken-5.2.0/bin/kraken:8 in <module>                            │
│                                                                                                  │
│   5 from kraken.kraken import cli                                                                │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(cli())                                                                          │
│   9                                                                                              │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1157 in  │
│ __call__                                                                                         │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1078 in  │
│ main                                                                                             │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1720 in  │
│ invoke                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1657 in  │
│ _process_result                                                                                  │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:783 in   │
│ invoke                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/kraken.py:427   │
│ in process_pipeline                                                                              │
│                                                                                                  │
│   424 │   │   │   │   if len(fc) - 2 == idx:                                                     │
│   425 │   │   │   │   │   ctx.meta['last_process'] = True                                        │
│   426 │   │   │   │   with threadpool_limits(limits=ctx.meta['threads']):                        │
│ ❱ 427 │   │   │   │   │   task(input=input, output=output)                                       │
│   428 │   │   except Exception as e:                                                             │
│   429 │   │   │   logger.error(f'Failed processing {io_pair[0]}: {str(e)}')                      │
│   430 │   │   │   if ctx.meta['raise_failed']:                                                   │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/kraken.py:154   │
│ in segmenter                                                                                     │
│                                                                                                  │
│   151 │   │   │   │   │   │   │   │     pad=pad,                                                 │
│   152 │   │   │   │   │   │   │   │     mask=mask)                                               │
│   153 │   │   else:                                                                              │
│ ❱ 154 │   │   │   res = blla.segment(im, text_direction, mask=mask, model=model, device=device   │
│   155 │   │   │   │   │   │   │      raise_on_error=ctx.meta['raise_failed'], autocast=ctx.met   │
│   156 │   except Exception:                                                                      │
│   157 │   │   if ctx.meta['raise_failed']:                                                       │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/blla.py:396 in  │
│ segment                                                                                          │
│                                                                                                  │
│   393 │   │   │   │   line_regs.append(reg_id)                                                   │
│   394 │   │   blls.append(BaselineLine(id=str(uuid.uuid4()), baseline=line['baseline'], bounda   │
│   395 │                                                                                          │
│ ❱ 396 │   return Segmentation(text_direction=text_direction,                                     │
│   397 │   │   │   │   │   │   imagename=getattr(im, 'filename', None),                           │
│   398 │   │   │   │   │   │   type='baselines',                                                  │
│   399 │   │   │   │   │   │   lines=blls,                                                        │
│ in __init__:10                                                                                   │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/containers.py:1 │
│ 97 in __post_init__                                                                              │
│                                                                                                  │
│   194 │   │   if len(self.lines) and not isinstance(self.lines[0], BBoxLine) and not isinstanc   │
│   195 │   │   │   line_cls = BBoxLine if self.type == 'bbox' else BaselineLine                   │
│   196 │   │   │   self.lines = [line_cls(**line) for line in self.lines]                         │
│ ❱ 197 │   │   if len(self.regions) and not isinstance(next(iter(self.regions.values()))[0], Re   │
│   198 │   │   │   regs = {}                                                                      │
│   199 │   │   │   for k, v in self.regions.items():                                              │
│   200 │   │   │   │   regs[k] = [Region(**reg) for reg in v]                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
IndexError: list index out of range

mittagessen commented on August 30, 2024

I don't know, I just tested it and the same thing happens...

Yes, those are two separate problems. The segtrain failure was caused by the progress bar; the inference one is most likely a faulty test, which I'll fix once my laptop has some battery charge again.

johnlockejrr commented on August 30, 2024

I think it's something more than that, at least in my environment...
Why I'm now getting a CUDA library error is a mystery... CUDA works for me: I use it with other versions of kraken and also for training BERT models.
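In the log that follows, most `undefined symbol` errors load `libcudnn_cnn_train.so.8` from the kraken-5.2.0 environment, but one line resolves it from a different virtualenv (`pytorch-pretrained-bert-py3.10`). One possible explanation, which is an assumption and not something confirmed in this thread, is that a library search path such as `LD_LIBRARY_PATH` points into another environment, so cuDNN pieces from different builds get mixed. A minimal standard-library sketch to list `LD_LIBRARY_PATH` entries that lie outside the active interpreter's prefix:

```python
import os
import sys
from typing import List


def foreign_lib_dirs(ld_path: str, prefix: str) -> List[str]:
    """Return library-path entries that are not inside the given environment prefix."""
    prefix = os.path.realpath(prefix)
    foreign = []
    for entry in ld_path.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing ':'
        real = os.path.realpath(entry)
        # commonpath (unlike startswith) does not treat /env2 as inside /env
        if os.path.commonpath([real, prefix]) != prefix:
            foreign.append(entry)
    return foreign


if __name__ == "__main__":
    ld = os.environ.get("LD_LIBRARY_PATH", "")
    print("interpreter prefix:", sys.prefix)
    for d in foreign_lib_dirs(ld, sys.prefix):
        print("outside this environment:", d)
```

An empty result does not rule out other loading mechanisms; it only shows that `LD_LIBRARY_PATH` itself is not pointing elsewhere.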

(kraken-5.2.0) incognito@DESKTOP-NHKR7QL:~/kraken-train/sam_finetuning$ ketos segtrain -d cuda:0 -f alto -t output.txt -tl --resize both -i /home/incognito/kraken-train/sam_finetuning/ubma_sam_v2.mlmodel -o /home/incognito/kraken-train/sam_finetuning/ubma_sam_v4/ubma_sam_v4
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.1.2.post303 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Training line types:
  default       2       1912
Training region types:
  Main  3       63
  text  4       2
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
`Trainer(val_check_interval=1.0)` was configured so validation will run at the end of the training epoch..
You are using a CUDA device ('NVIDIA GeForce RTX 4070') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
┏━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃    ┃ Name              ┃ Type                     ┃ Params ┃                      In sizes ┃                Out sizes ┃
┡━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ 0  │ net               │ MultiParamSequential     │  1.3 M │             [1, 3, 1800, 300] │   [[1, 5, 450, 75], '?'] │
│ 1  │ net.C_0           │ ActConv2D                │  9.5 K │      [[1, 3, 1800, 300], '?'] │ [[1, 64, 900, 150], '?'] │
│ 2  │ net.Gn_1          │ GroupNorm                │    128 │ [[1, 64, 900, 150], '?', '?'] │ [[1, 64, 900, 150], '?'] │
│ 3  │ net.C_2           │ ActConv2D                │ 73.9 K │ [[1, 64, 900, 150], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 4  │ net.Gn_3          │ GroupNorm                │    256 │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 5  │ net.C_4           │ ActConv2D                │  147 K │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 6  │ net.Gn_5          │ GroupNorm                │    256 │ [[1, 128, 450, 75], '?', '?'] │ [[1, 128, 450, 75], '?'] │
│ 7  │ net.C_6           │ ActConv2D                │  295 K │ [[1, 128, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 8  │ net.Gn_7          │ GroupNorm                │    512 │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 9  │ net.C_8           │ ActConv2D                │  590 K │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 10 │ net.Gn_9          │ GroupNorm                │    512 │ [[1, 256, 450, 75], '?', '?'] │ [[1, 256, 450, 75], '?'] │
│ 11 │ net.L_10          │ TransposedSummarizingRNN │ 74.2 K │ [[1, 256, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 12 │ net.L_11          │ TransposedSummarizingRNN │ 25.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 13 │ net.C_12          │ ActConv2D                │  2.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 32, 450, 75], '?'] │
│ 14 │ net.Gn_13         │ GroupNorm                │     64 │  [[1, 32, 450, 75], '?', '?'] │  [[1, 32, 450, 75], '?'] │
│ 15 │ net.L_14          │ TransposedSummarizingRNN │ 16.9 K │  [[1, 32, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 16 │ net.L_15          │ TransposedSummarizingRNN │ 25.1 K │  [[1, 64, 450, 75], '?', '?'] │  [[1, 64, 450, 75], '?'] │
│ 17 │ net.l_16          │ ActConv2D                │    325 │  [[1, 64, 450, 75], '?', '?'] │   [[1, 5, 450, 75], '?'] │
│ 18 │ val_px_accuracy   │ MultilabelAccuracy       │      0 │                             ? │                        ? │
│ 19 │ val_mean_accuracy │ MultilabelAccuracy       │      0 │                             ? │                        ? │
│ 20 │ val_mean_iu       │ MultilabelJaccardIndex   │      0 │                             ? │                        ? │
│ 21 │ val_freq_iu       │ MultilabelJaccardIndex   │      0 │                             ? │                        ? │
└────┴───────────────────┴──────────────────────────┴────────┴───────────────────────────────┴──────────────────────────┘
Trainable params: 1.3 M
Non-trainable params: 0
Total params: 1.3 M
Total estimated model params size (MB): 5
stage 0/50 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/56 0:00:00 • -:--:-- 0.00it/s val_accuracy: 0.994 val_mean_acc: 0.994 val_mean_iu: 0.533 val_freq_iu: 0.944
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/pytorch-pretrained-bert-py3.10/lib/python3.10/site-packages/nvidia/cudnn/lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn3cnn34layerNormFwd_execute_internal_implERKNS_7backend11VariantPackEP11CUstream_stRNS0_18LayerNormFwdParamsERNS1_12OperationSetERP12cudnnContextmb, version libcudnn_cnn_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
Could not load library libcudnn_cnn_train.so.8. Error: /home/incognito/miniconda3/envs/kraken-5.2.0/bin/../lib/libcudnn_cnn_train.so.8: undefined symbol: _ZN5cudnn14cublasSaxpy_v2EP13cublasContextiPKfS3_iPfi, version libcudnn_ops_infer.so.8
stage 0/50 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/56 0:00:00 • -:--:-- 0.00it/s val_accuracy: 0.994 val_mean_acc: 0.994 val_mean_iu: 0.533 val_freq_iu: 0.944
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/incognito/miniconda3/envs/kraken-5.2.0/bin/ketos:8 in <module>                             │
│                                                                                                  │
│   5 from kraken.ketos import cli                                                                 │
│   6 if __name__ == '__main__':                                                                   │
│   7 │   sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])                         │
│ ❱ 8 │   sys.exit(cli())                                                                          │
│   9                                                                                              │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1157 in  │
│ __call__                                                                                         │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1078 in  │
│ main                                                                                             │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1688 in  │
│ invoke                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:1434 in  │
│ invoke                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/core.py:783 in   │
│ invoke                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/click/decorators.py:33 │
│ in new_func                                                                                      │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/ketos/segmentat │
│ ion.py:366 in segtrain                                                                           │
│                                                                                                  │
│   363 │   │   │   │   │   │   │   **val_check_interval)                                          │
│   364 │                                                                                          │
│   365 │   with threadpool_limits(limits=threads):                                                │
│ ❱ 366 │   │   trainer.fit(model)                                                                 │
│   367 │                                                                                          │
│   368 │   if model.best_epoch == -1:                                                             │
│   369 │   │   logger.warning('Model did not improve during training.')                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/lib/train.py:13 │
│ 2 in fit                                                                                         │
│                                                                                                  │
│    129 │   │   with warnings.catch_warnings():                                                   │
│    130 │   │   │   warnings.filterwarnings(action='ignore', category=UserWarning,                │
│    131 │   │   │   │   │   │   │   │   │   message='The dataloader,')                            │
│ ❱  132 │   │   │   super().fit(*args, **kwargs)                                                  │
│    133                                                                                           │
│    134                                                                                           │
│    135 class KrakenFreezeBackbone(BaseFinetuning):                                               │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/trai │
│ ner/trainer.py:544 in fit                                                                        │
│                                                                                                  │
│    541 │   │   self.state.fn = TrainerFn.FITTING                                                 │
│    542 │   │   self.state.status = TrainerStatus.RUNNING                                         │
│    543 │   │   self.training = True                                                              │
│ ❱  544 │   │   call._call_and_handle_interrupt(                                                  │
│    545 │   │   │   self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule,  │
│    546 │   │   )                                                                                 │
│    547                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/trai │
│ ner/call.py:44 in _call_and_handle_interrupt                                                     │
│                                                                                                  │
│    41 │   try:                                                                                   │
│    42 │   │   if trainer.strategy.launcher is not None:                                          │
│    43 │   │   │   return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer,    │
│ ❱  44 │   │   return trainer_fn(*args, **kwargs)                                                 │
│    45 │                                                                                          │
│    46 │   except _TunerExitException:                                                            │
│    47 │   │   _call_teardown_hook(trainer)                                                       │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/trai │
│ ner/trainer.py:580 in _fit_impl                                                                  │
│                                                                                                  │
│    577 │   │   │   model_provided=True,                                                          │
│    578 │   │   │   model_connected=self.lightning_module is not None,                            │
│    579 │   │   )                                                                                 │
│ ❱  580 │   │   self._run(model, ckpt_path=ckpt_path)                                             │
│    581 │   │                                                                                     │
│    582 │   │   assert self.state.stopped                                                         │
│    583 │   │   self.training = False                                                             │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/trai │
│ ner/trainer.py:987 in _run                                                                       │
│                                                                                                  │
│    984 │   │   # ----------------------------                                                    │
│    985 │   │   # RUN THE TRAINER                                                                 │
│    986 │   │   # ----------------------------                                                    │
│ ❱  987 │   │   results = self._run_stage()                                                       │
│    988 │   │                                                                                     │
│    989 │   │   # ----------------------------                                                    │
│    990 │   │   # POST-Training CLEAN UP                                                          │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/trai │
│ ner/trainer.py:1033 in _run_stage                                                                │
│                                                                                                  │
│   1030 │   │   │   with isolate_rng():                                                           │
│   1031 │   │   │   │   self._run_sanity_check()                                                  │
│   1032 │   │   │   with torch.autograd.set_detect_anomaly(self._detect_anomaly):                 │
│ ❱ 1033 │   │   │   │   self.fit_loop.run()                                                       │
│   1034 │   │   │   return None                                                                   │
│   1035 │   │   raise RuntimeError(f"Unexpected state {self.state}")                              │
│   1036                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/fit_loop.py:205 in run                                                                         │
│                                                                                                  │
│   202 │   │   while not self.done:                                                               │
│   203 │   │   │   try:                                                                           │
│   204 │   │   │   │   self.on_advance_start()                                                    │
│ ❱ 205 │   │   │   │   self.advance()                                                             │
│   206 │   │   │   │   self.on_advance_end()                                                      │
│   207 │   │   │   │   self._restarting = False                                                   │
│   208 │   │   │   except StopIteration:                                                          │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/fit_loop.py:363 in advance                                                                     │
│                                                                                                  │
│   360 │   │   │   )                                                                              │
│   361 │   │   with self.trainer.profiler.profile("run_training_epoch"):                          │
│   362 │   │   │   assert self._data_fetcher is not None                                          │
│ ❱ 363 │   │   │   self.epoch_loop.run(self._data_fetcher)                                        │
│   364 │                                                                                          │
│   365 │   def on_advance_end(self) -> None:                                                      │
│   366 │   │   trainer = self.trainer                                                             │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/training_epoch_loop.py:140 in run                                                              │
│                                                                                                  │
│   137 │   │   self.on_run_start(data_fetcher)                                                    │
│   138 │   │   while not self.done:                                                               │
│   139 │   │   │   try:                                                                           │
│ ❱ 140 │   │   │   │   self.advance(data_fetcher)                                                 │
│   141 │   │   │   │   self.on_advance_end(data_fetcher)                                          │
│   142 │   │   │   │   self._restarting = False                                                   │
│   143 │   │   │   except StopIteration:                                                          │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/training_epoch_loop.py:250 in advance                                                          │
│                                                                                                  │
│   247 │   │   │   with trainer.profiler.profile("run_training_batch"):                           │
│   248 │   │   │   │   if trainer.lightning_module.automatic_optimization:                        │
│   249 │   │   │   │   │   # in automatic optimization, there can only be one optimizer           │
│ ❱ 250 │   │   │   │   │   batch_output = self.automatic_optimization.run(trainer.optimizers[0]   │
│   251 │   │   │   │   else:                                                                      │
│   252 │   │   │   │   │   batch_output = self.manual_optimization.run(kwargs)                    │
│   253                                                                                            │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/optimization/automatic.py:190 in run                                                           │
│                                                                                                  │
│   187 │   │   # ------------------------------                                                   │
│   188 │   │   # gradient update with accumulated gradients                                       │
│   189 │   │   else:                                                                              │
│ ❱ 190 │   │   │   self._optimizer_step(batch_idx, closure)                                       │
│   191 │   │                                                                                      │
│   192 │   │   result = closure.consume_result()                                                  │
│   193 │   │   if result.loss is None:                                                            │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/optimization/automatic.py:268 in _optimizer_step                                               │
│                                                                                                  │
│   265 │   │   │   self.optim_progress.optimizer.step.increment_ready()                           │
│   266 │   │                                                                                      │
│   267 │   │   # model hook                                                                       │
│ ❱ 268 │   │   call._call_lightning_module_hook(                                                  │
│   269 │   │   │   trainer,                                                                       │
│   270 │   │   │   "optimizer_step",                                                              │
│   271 │   │   │   trainer.current_epoch,                                                         │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/trai │
│ ner/call.py:157 in _call_lightning_module_hook                                                   │
│                                                                                                  │
│   154 │   pl_module._current_fx_name = hook_name                                                 │
│   155 │                                                                                          │
│   156 │   with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__}.{hoo   │
│ ❱ 157 │   │   output = fn(*args, **kwargs)                                                       │
│   158 │                                                                                          │
│   159 │   # restore current_fx when nested context                                               │
│   160 │   pl_module._current_fx_name = prev_fx_name                                              │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/kraken/lib/train.py:11 │
│ 06 in optimizer_step                                                                             │
│                                                                                                  │
│   1103 │                                                                                         │
│   1104 │   def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_closure):             │
│   1105 │   │   # update params                                                                   │
│ ❱ 1106 │   │   optimizer.step(closure=optimizer_closure)                                         │
│   1107 │   │                                                                                     │
│   1108 │   │   # linear warmup between 0 and the initial learning rate `lrate` in `warmup`       │
│   1109 │   │   # steps.                                                                          │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/core │
│ /optimizer.py:152 in step                                                                        │
│                                                                                                  │
│   149 │   │   │   raise MisconfigurationException("When `optimizer.step(closure)` is called, t   │
│   150 │   │                                                                                      │
│   151 │   │   assert self._strategy is not None                                                  │
│ ❱ 152 │   │   step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)    │
│   153 │   │                                                                                      │
│   154 │   │   self._on_after_step()                                                              │
│   155                                                                                            │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/stra │
│ tegies/strategy.py:239 in optimizer_step                                                         │
│                                                                                                  │
│   236 │   │   model = model or self.lightning_module                                             │
│   237 │   │   # TODO(fabric): remove assertion once strategy's optimizer_step typing is fixed    │
│   238 │   │   assert isinstance(model, pl.LightningModule)                                       │
│ ❱ 239 │   │   return self.precision_plugin.optimizer_step(optimizer, model=model, closure=clos   │
│   240 │                                                                                          │
│   241 │   def _setup_model_and_optimizers(self, model: Module, optimizers: List[Optimizer]) ->   │
│   242 │   │   """Setup a model and multiple optimizers together.                                 │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/plug │
│ ins/precision/precision.py:122 in optimizer_step                                                 │
│                                                                                                  │
│   119 │   ) -> Any:                                                                              │
│   120 │   │   """Hook to run the optimizer step."""                                              │
│   121 │   │   closure = partial(self._wrap_closure, model, optimizer, closure)                   │
│ ❱ 122 │   │   return optimizer.step(closure=closure, **kwargs)                                   │
│   123 │                                                                                          │
│   124 │   def _clip_gradients(                                                                   │
│   125 │   │   self,                                                                              │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/torch/optim/optimizer. │
│ py:373 in wrapper                                                                                │
│                                                                                                  │
│   370 │   │   │   │   │   │   │   │   f"{func} must return None or a tuple of (new_args, new_k   │
│   371 │   │   │   │   │   │   │   )                                                              │
│   372 │   │   │   │                                                                              │
│ ❱ 373 │   │   │   │   out = func(*args, **kwargs)                                                │
│   374 │   │   │   │   self._optimizer_step_code()                                                │
│   375 │   │   │   │                                                                              │
│   376 │   │   │   │   # call optimizer step post hooks                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/torch/optim/optimizer. │
│ py:76 in _use_grad                                                                               │
│                                                                                                  │
│    73 │   │   │   # see https://github.com/pytorch/pytorch/issues/104053                         │
│    74 │   │   │   torch.set_grad_enabled(self.defaults['differentiable'])                        │
│    75 │   │   │   torch._dynamo.graph_break()                                                    │
│ ❱  76 │   │   │   ret = func(self, *args, **kwargs)                                              │
│    77 │   │   finally:                                                                           │
│    78 │   │   │   torch._dynamo.graph_break()                                                    │
│    79 │   │   │   torch.set_grad_enabled(prev_grad)                                              │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/torch/optim/adam.py:14 │
│ 3 in step                                                                                        │
│                                                                                                  │
│   140 │   │   loss = None                                                                        │
│   141 │   │   if closure is not None:                                                            │
│   142 │   │   │   with torch.enable_grad():                                                      │
│ ❱ 143 │   │   │   │   loss = closure()                                                           │
│   144 │   │                                                                                      │
│   145 │   │   for group in self.param_groups:                                                    │
│   146 │   │   │   params_with_grad = []                                                          │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/plug │
│ ins/precision/precision.py:108 in _wrap_closure                                                  │
│                                                                                                  │
│   105 │   │   consistent with the ``Precision`` subclasses that cannot pass ``optimizer.step(c   │
│   106 │   │                                                                                      │
│   107 │   │   """                                                                                │
│ ❱ 108 │   │   closure_result = closure()                                                         │
│   109 │   │   self._after_closure(model, optimizer)                                              │
│   110 │   │   return closure_result                                                              │
│   111                                                                                            │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/optimization/automatic.py:144 in __call__                                                      │
│                                                                                                  │
│   141 │                                                                                          │
│   142 │   @override                                                                              │
│   143 │   def __call__(self, *args: Any, **kwargs: Any) -> Optional[Tensor]:                     │
│ ❱ 144 │   │   self._result = self.closure(*args, **kwargs)                                       │
│   145 │   │   return self._result.loss                                                           │
│   146                                                                                            │
│   147                                                                                            │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/torch/utils/_contextli │
│ b.py:115 in decorate_context                                                                     │
│                                                                                                  │
│   112 │   @functools.wraps(func)                                                                 │
│   113 │   def decorate_context(*args, **kwargs):                                                 │
│   114 │   │   with ctx_factory():                                                                │
│ ❱ 115 │   │   │   return func(*args, **kwargs)                                                   │
│   116 │                                                                                          │
│   117 │   return decorate_context                                                                │
│   118                                                                                            │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/optimization/automatic.py:138 in closure                                                       │
│                                                                                                  │
│   135 │   │   │   self._zero_grad_fn()                                                           │
│   136 │   │                                                                                      │
│   137 │   │   if self._backward_fn is not None and step_output.closure_loss is not None:         │
│ ❱ 138 │   │   │   self._backward_fn(step_output.closure_loss)                                    │
│   139 │   │                                                                                      │
│   140 │   │   return step_output                                                                 │
│   141                                                                                            │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/loop │
│ s/optimization/automatic.py:239 in backward_fn                                                   │
│                                                                                                  │
│   236 │   │   │   return None                                                                    │
│   237 │   │                                                                                      │
│   238 │   │   def backward_fn(loss: Tensor) -> None:                                             │
│ ❱ 239 │   │   │   call._call_strategy_hook(self.trainer, "backward", loss, optimizer)            │
│   240 │   │                                                                                      │
│   241 │   │   return backward_fn                                                                 │
│   242                                                                                            │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/trai │
│ ner/call.py:309 in _call_strategy_hook                                                           │
│                                                                                                  │
│   306 │   │   return None                                                                        │
│   307 │                                                                                          │
│   308 │   with trainer.profiler.profile(f"[Strategy]{trainer.strategy.__class__.__name__}.{hoo   │
│ ❱ 309 │   │   output = fn(*args, **kwargs)                                                       │
│   310 │                                                                                          │
│   311 │   # restore current_fx when nested context                                               │
│   312 │   pl_module._current_fx_name = prev_fx_name                                              │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/stra │
│ tegies/strategy.py:213 in backward                                                               │
│                                                                                                  │
│   210 │   │   assert self.lightning_module is not None                                           │
│   211 │   │   closure_loss = self.precision_plugin.pre_backward(closure_loss, self.lightning_m   │
│   212 │   │                                                                                      │
│ ❱ 213 │   │   self.precision_plugin.backward(closure_loss, self.lightning_module, optimizer, *   │
│   214 │   │                                                                                      │
│   215 │   │   closure_loss = self.precision_plugin.post_backward(closure_loss, self.lightning_   │
│   216 │   │   self.post_backward(closure_loss)                                                   │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/plug │
│ ins/precision/precision.py:72 in backward                                                        │
│                                                                                                  │
│    69 │   │   │   \**kwargs: Keyword arguments for the same purpose as ``*args``.                │
│    70 │   │                                                                                      │
│    71 │   │   """                                                                                │
│ ❱  72 │   │   model.backward(tensor, *args, **kwargs)                                            │
│    73 │                                                                                          │
│    74 │   @override                                                                              │
│    75 │   def post_backward(self, tensor: Tensor, module: "pl.LightningModule") -> Tensor:  #    │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/lightning/pytorch/core │
│ /module.py:1090 in backward                                                                      │
│                                                                                                  │
│   1087 │   │   if self._fabric:                                                                  │
│   1088 │   │   │   self._fabric.backward(loss, *args, **kwargs)                                  │
│   1089 │   │   else:                                                                             │
│ ❱ 1090 │   │   │   loss.backward(*args, **kwargs)                                                │
│   1091 │                                                                                         │
│   1092 │   def toggle_optimizer(self, optimizer: Union[Optimizer, LightningOptimizer]) -> None:  │
│   1093 │   │   """Makes sure only the gradients of the current optimizer's parameters are calcu  │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/torch/_tensor.py:492   │
│ in backward                                                                                      │
│                                                                                                  │
│    489 │   │   │   │   create_graph=create_graph,                                                │
│    490 │   │   │   │   inputs=inputs,                                                            │
│    491 │   │   │   )                                                                             │
│ ❱  492 │   │   torch.autograd.backward(                                                          │
│    493 │   │   │   self, gradient, retain_graph, create_graph, inputs=inputs                     │
│    494 │   │   )                                                                                 │
│    495                                                                                           │
│                                                                                                  │
│ /home/incognito/miniconda3/envs/kraken-5.2.0/lib/python3.11/site-packages/torch/autograd/__init_ │
│ _.py:251 in backward                                                                             │
│                                                                                                  │
│   248 │   # The reason we repeat the same comment below is that                                  │
│   249 │   # some Python versions print out the first line of a multi-line function               │
│   250 │   # calls in the traceback and some print out the last line                              │
│ ❱ 251 │   Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the bac   │
│   252 │   │   tensors,                                                                           │
│   253 │   │   grad_tensors_,                                                                     │
│   254 │   │   retain_graph,                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: GET was unable to find an engine to execute this computation
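`GET was unable to find an engine to execute this computation` comes from PyTorch's cuDNN code path. It is often reported when the process loads a CUDA or cuDNN library other than the one PyTorch was built against. Directories on `LD_LIBRARY_PATH` are searched before the system's default library directories, so one quick check is to list which of them contain CUDA/cuDNN shared objects. The sketch below is a stdlib-only diagnostic. The function name and the set of library prefixes are illustrative assumptions, not part of kraken or PyTorch.

```python
import os
from pathlib import Path

# Hypothetical list of prefixes to flag: CUDA/cuDNN runtime libraries that could
# shadow the copies a PyTorch installation already depends on.
SUSPECT_PREFIXES = ("libcudnn", "libcublas", "libcudart", "libnvrtc")


def find_shadowing_libs(ld_path=None):
    """Map each LD_LIBRARY_PATH directory to the CUDA/cuDNN shared objects it holds."""
    if ld_path is None:
        ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    hits = {}
    for entry in ld_path.split(os.pathsep):
        if not entry or not Path(entry).is_dir():
            continue  # skip empty entries and directories that do not exist
        try:
            names = sorted(p.name for p in Path(entry).iterdir())
        except OSError:
            continue  # unreadable directory
        libs = [n for n in names if n.startswith(SUSPECT_PREFIXES) and ".so" in n]
        if libs:
            hits[entry] = libs
    return hits


if __name__ == "__main__":
    for directory, libs in find_shadowing_libs().items():
        print(f"{directory}: {', '.join(libs)}")
```

Any directory it prints is a candidate for removal from `LD_LIBRARY_PATH` before rerunning the training command.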


johnlockejrr commented on August 30, 2024

Could this be related to my Python version under conda?

(kraken-5.2.0) incognito@DESKTOP-NHKR7QL:~/kraken-train/sam_finetuning$ python --version
Python 3.11.9


johnlockejrr commented on August 30, 2024

Please disregard the latest problem. It was my mistake: $LD_LIBRARY_PATH included paths to other libraries. As soon as I unset the LD_LIBRARY_PATH environment variable, it worked:
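The fix here was to unset `LD_LIBRARY_PATH`. The dynamic loader reads that variable when a program starts, so deleting it from `os.environ` inside an already running Python process does not change libraries that are already loaded; it has to be absent when the process launches. A minimal sketch, assuming you want to start a child process with a clean environment without changing your shell (the helper names are hypothetical):

```python
import os
import subprocess


def env_without_ld_library_path(base=None):
    """Return a copy of the environment with LD_LIBRARY_PATH removed.

    The current process environment is left untouched; only a child process
    started with the returned mapping is affected.
    """
    env = dict(os.environ if base is None else base)
    env.pop("LD_LIBRARY_PATH", None)
    return env


def run_clean(cmd):
    """Run a command (a list of arguments) without LD_LIBRARY_PATH set."""
    return subprocess.run(cmd, env=env_without_ld_library_path(), check=True)
```

For example, `run_clean(["ketos", "--help"])`. In a shell, `unset LD_LIBRARY_PATH` or GNU `env -u LD_LIBRARY_PATH ketos ...` achieves the same thing.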

Trainable params: 1.3 M
Non-trainable params: 0
Total params: 1.3 M
Total estimated model params size (MB): 5
stage 0/50 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 0:00:18 • 0:00:00 3.11it/s val_accuracy: 0.991 val_mean_acc: 0.991 val_mean_iu: 0.470 val_freq_iu: 0.917
stage 1/50 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 0:00:18 • 0:00:00 3.11it/s val_accuracy: 0.993 val_mean_acc: 0.993 val_mean_iu: 0.504 val_freq_iu: 0.931
stage 2/50 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╸━━ 53/56 0:00:17 • 0:00:01 3.03it/s val_accuracy: 0.993 val_mean_acc: 0.993 val_mean_iu: 0.504 val_freq_iu: 0.931
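The log reports `val_mean_iu` around 0.47 to 0.50 next to `val_freq_iu` above 0.9. Assuming these follow the usual semantic-segmentation definitions (kraken's own implementation may differ in details, for example in how classes without pixels are handled), mean IU averages the per-class intersection-over-union uniformly, while frequency-weighted IU weights each class by its share of ground-truth pixels. A rare class with a poor IU therefore pulls mean IU down much more than frequency-weighted IU. The function below is a hypothetical, pure-Python sketch computing all four quantities from a confusion matrix.

```python
def segmentation_metrics(confusion):
    """Pixel accuracy, mean accuracy, mean IU and frequency-weighted IU.

    confusion[i][j] counts pixels of true class i predicted as class j.
    """
    k = len(confusion)
    true_totals = [sum(row) for row in confusion]                        # pixels per true class
    pred_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    correct = [confusion[i][i] for i in range(k)]                       # diagonal
    total = sum(true_totals)

    # Classes with no ground-truth pixels are skipped in mean accuracy, and
    # classes with an empty union are skipped in mean IU, instead of giving 0/0.
    acc_per_class = [correct[i] / true_totals[i] for i in range(k) if true_totals[i]]
    iu = []
    for i in range(k):
        union = true_totals[i] + pred_totals[i] - correct[i]
        iu.append(correct[i] / union if union else None)
    valid_iu = [x for x in iu if x is not None]

    return {
        "accuracy": sum(correct) / total,
        "mean_acc": sum(acc_per_class) / len(acc_per_class),
        "mean_iu": sum(valid_iu) / len(valid_iu),
        "freq_iu": sum(true_totals[i] * iu[i] for i in range(k) if iu[i] is not None) / total,
    }
```

For example, `segmentation_metrics([[90, 10], [5, 5]])` gives accuracy ≈ 0.864, mean_acc 0.7, mean_iu ≈ 0.554 and freq_iu ≈ 0.802: the small second class (IU 0.25) drags the mean IU down while barely moving the frequency-weighted value.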


johnlockejrr commented on August 30, 2024

Side question: to use the newly implemented line extractor, should I train a new model from scratch, or can the old models still be fine-tuned?


johnlockejrr commented on August 30, 2024

I just ran a test by fine-tuning an old model. As far as I can tell, in this case it outperforms the old kraken 4.x by roughly 40-50% in performance/accuracy. I just love it! I will test further. Good job @mittagessen!
EDIT: thanks for the link to the changelog; I hadn't found it yet and was very curious.


mittagessen commented on August 30, 2024

I've tagged a 5.2.1 release and pushed the packages to PyPI/Anaconda.
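To confirm that an environment actually picked up 5.2.1 rather than 5.2.0 after upgrading, the installed version can be read with the standard library's `importlib.metadata`. The sketch below uses hypothetical helper names and only compares the numeric release segments; for full PEP 440 semantics (pre-releases, post-releases), `packaging.version.Version` is the better tool.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(v):
    """Leading numeric release segments, e.g. '5.2.1' -> (5, 2, 1).

    Suffixes such as '.post303' or 'rc1' are ignored, which is enough for a
    minimum-version check but is not a full PEP 440 comparison.
    """
    parts = []
    for piece in v.split("."):
        m = re.match(r"\d+", piece)
        if not m:
            break  # e.g. 'post303'
        parts.append(int(m.group()))
        if m.end() != len(piece):
            break  # e.g. '1rc1': keep 1, stop afterwards
    return tuple(parts)


def has_kraken_at_least(minimum="5.2.1"):
    """True if the installed kraken distribution is at least `minimum`."""
    try:
        installed = version("kraken")
    except PackageNotFoundError:
        return False
    return release_tuple(installed) >= release_tuple(minimum)
```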


johnlockejrr commented on August 30, 2024


johnlockejrr commented on August 30, 2024

Seems OK now; segmentation works.

Model newly trained on v5.2.0:

(kraken-5.2.1) incognito@DESKTOP-NHKR7QL:~$ kraken -i 159.jpg lines.json segment -bl -i ./kraken-train/sam_finetuning/ubma_sam_v4.mlmodel
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.1.2.post303 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Loading ANN ./kraken-train/sam_finetuning/ubma_sam_v4.mlmodel   ✓
Segmenting      [04/22/24 17:31:44] WARNING  Polygonizer failed on line 0: LineStrings must have at least 2 coordinate tuples   segmentation.py:771
✓
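The warning text appears to come from shapely, which raises it when a `LineString` is built from fewer than two points. That suggests (a guess; the thread does not confirm it) that the model produced a degenerate baseline for line 0. A quick way to look for such lines in the output is sketched below. The JSON layout it assumes, a top-level `lines` list whose entries carry a `baseline` list of `[x, y]` points, is hypothetical; check it against the actual file before relying on it.

```python
import json


def degenerate_baselines(lines):
    """Return indices of lines whose baseline has fewer than two points.

    `lines` is assumed to be a list of dicts with a 'baseline' key holding a
    list of [x, y] points (hypothetical layout; adapt to the real output).
    """
    bad = []
    for idx, line in enumerate(lines):
        baseline = line.get("baseline") or []
        if len(baseline) < 2:
            bad.append(idx)
    return bad


def check_file(path):
    """Load a segmentation JSON file and report degenerate baselines."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    # Assumption: the lines are stored under a top-level 'lines' key.
    return degenerate_baselines(data.get("lines", []))
```

For example, `check_file("lines.json")` on the output of the command above.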

Old model trained on 4.3.13:

(kraken-5.2.1) incognito@DESKTOP-NHKR7QL:~$ kraken -i 159.jpg lines-old.json segment -bl -i ./ubma_sam_v2.mlmodel
scikit-learn version 1.2.2 is not supported. Minimum required version: 0.17. Maximum required version: 1.1.2. Disabling scikit-learn conversion API.
Torch version 2.1.2.post303 has not been tested with coremltools. You may run into unexpected errors. Torch 2.0.0 is the most recent version that has been tested.
Loading ANN ./ubma_sam_v2.mlmodel       ✓
Segmenting      ✓

