When I run the experiment with training bert based qe models and predicting only the s

training failed when only predicting source tags about openkiwi HOT 10 CLOSED

iamhere1 commented on August 20, 2024

training failed when only predicting source tags

from openkiwi.

Comments (10)

daandouwe commented on August 20, 2024

This is interesting, thank you. I believe we have never trained on just source tags, so that must be why we have never encountered this error before. Thank you for reporting it.

Let me set up a small experiment to try to reproduce this. I will report back once I understand how it can be fixed.


daandouwe commented on August 20, 2024

I just tried training BERT- and XLM-R-based models on GPU, predicting only source tags, and did not encounter this problem. I used

trainer:
    main_metric:
        - source_tags_MCC

and

system:
    model:
        outputs:
            word_level:
                target: false
                gaps: false
                source: true

Could you share (the relevant parts of) your config? For example, which main_metric are you using?


iamhere1 commented on August 20, 2024

Thank you for your help. I use source_tags_F1_MULT and source_tags_CORRECT as my main metric, and the sentence-level tag is not predicted in my experiment. The relevant parts of my config are as follows.

outputs:
        ####################################################
        # Output options configure the downstream tasks the
        #  model will be trained on by adding specific layers
        #  responsible for transforming decoder features into
        #  predictions.
        word_level:
            target: false
            gaps: false
            source: true
            class_weights:
                target_tags:
                    BAD: 3.0
                gap_tags:
                    BAD: 5.0
                source_tags:
                    BAD: 3.0
        sentence_level:
            hter: false
            use_distribution: false
            binary: false
        n_layers_output: 2
        sentence_loss_weight: 1

and

main_metric:
    - source_tags_F1_MULT
    - source_tags_CORRECT


iamhere1 commented on August 20, 2024

Hi @daandouwe,
I tried replacing the code in callbacks.py like this, and it seems to work now:

    # mode_dict = {
    #     'min': np.less,
    #     'max': np.greater,
    #     'auto': np.greater if 'acc' in self.monitor else np.less,
    # }
    mode_dict = {
        'min': torch.lt,
        'max': torch.gt,
        'auto': torch.gt if 'acc' in self.monitor else torch.lt,
    }


daandouwe commented on August 20, 2024

Thanks!

Turns out that the problem is caused by the fact that not all the metric values have been moved to CPU.

Inspecting the values in the dictionary metrics in https://github.com/Unbabel/OpenKiwi/blob/master/kiwi/training/callbacks.py#L94 tells us the following:

{'F1_BAD': tensor(0.6000),
 'F1_OK': tensor(0.6092),
 'loss': tensor(205.8955, device='cuda:0'),
 'metrics': {'F1_BAD': tensor(0.6000),
             'F1_OK': tensor(0.6092),
             'source_tags_CORRECT': tensor(0.6047, device='cuda:0'),
             'source_tags_F1_MULT': 0.36551724137931035,
             'source_tags_F1_MULT+source_tags_CORRECT': tensor(0.9702, device='cuda:0'),
             'source_tags_MCC': tensor(0.2713, dtype=torch.float64)},
 'source_tags_CORRECT': tensor(0.6047, device='cuda:0'),
 'source_tags_F1_MULT': 0.36551724137931035,
 'source_tags_F1_MULT+source_tags_CORRECT': tensor(0.9702, device='cuda:0'),
 'source_tags_MCC': tensor(0.2713, dtype=torch.float64),
 'val_F1_BAD': tensor(0.4514),
 'val_F1_OK': tensor(0.6182),
 'val_loss': tensor(141.6612, device='cuda:0'),
 'val_loss_source_tags': tensor(141.6612, device='cuda:0'),
 'val_source_tags_CORRECT': tensor(0.5497, device='cuda:0'),
 'val_source_tags_F1_MULT': 0.27903935726135765,
 'val_source_tags_F1_MULT+source_tags_CORRECT': tensor(0.8288, device='cuda:0'),
 'val_source_tags_MCC': tensor(0.1847, dtype=torch.float64)}

This is also the reason that main metric source_tags_MCC works but source_tags_CORRECT does not. (I believe this flew under the radar because it seems to be the case for the lesser-used metrics).

We will need to solve this by making sure all the metrics return torch tensors that have been moved to CPU. Preferably, all the values returned by metrics will actually just be Python floats.

We will try and fix this in a PR.
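
To illustrate the "plain Python floats" idea, here is a minimal sketch of a normalization step. It is not the OpenKiwi fix: the helper names `to_python_float` and `normalize_metrics` are hypothetical, and the sketch assumes every metric value is a scalar. It relies only on `.item()`, which torch tensors (on CPU or GPU) and NumPy scalars both provide, so it does not need to import torch.

```python
from collections.abc import Mapping


def to_python_float(value):
    """Return a scalar metric value as a plain Python float.

    Hypothetical helper: objects exposing `.item()` (torch tensors on any
    device, NumPy scalars) are unwrapped through it; anything else is passed
    to float() directly.
    """
    if hasattr(value, "item"):
        value = value.item()
    return float(value)


def normalize_metrics(metrics):
    """Convert every leaf of a possibly nested metrics dict to a float."""
    return {
        name: normalize_metrics(value)
        if isinstance(value, Mapping)
        else to_python_float(value)
        for name, value in metrics.items()
    }
```

Applied to a dictionary like the one above, this would leave source_tags_F1_MULT unchanged and turn the CUDA and float64 tensors into ordinary floats, so any comparison operator can handle them.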


iamhere1 commented on August 20, 2024

Yes, thank you for your help! And replacing the NumPy functions with PyTorch operations, is that maybe another solution?


iamhere1 commented on August 20, 2024

Yeah, that's more reasonable. Thank you!


daandouwe commented on August 20, 2024

maybe another solution?

You could edit

        current = metrics.get(self.monitor)
        if self.monitor_op(current - self.min_delta, self.best):

to

        current = metrics.get(self.monitor)
        if not isinstance(current, float):
            current = current.cpu().item()
        if self.monitor_op(current - self.min_delta, self.best):

in https://github.com/Unbabel/OpenKiwi/blob/master/kiwi/training/callbacks.py#L96 (in your local kiwi path).

I tried this and it worked, but of course it's not as nice as dealing with the problem at the root ;). We'll keep you updated on that.
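
For anyone who wants to see the patched check in isolation, here is a self-contained sketch. `BestMetricTracker` is a hypothetical stand-in, not the actual callback class in kiwi/training/callbacks.py. It assumes scalar metrics and compares plain floats with the standard `operator` module instead of NumPy or PyTorch functions. Flipping the sign of `min_delta` in 'min' mode is also an assumption of this sketch.

```python
import operator


class BestMetricTracker:
    """Hypothetical, simplified version of a callback's monitor check."""

    def __init__(self, mode="max", min_delta=0.0):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.monitor_op = operator.gt if mode == "max" else operator.lt
        # In 'min' mode an improvement means a decrease, so flip the sign.
        self.min_delta = min_delta if mode == "max" else -min_delta
        self.best = float("-inf") if mode == "max" else float("inf")

    def update(self, current):
        """Record `current` and return True if it improves on the best value."""
        # Unwrap tensors / NumPy scalars before comparing; plain numbers
        # have no `.item()` and go straight to float().
        if hasattr(current, "item"):
            current = current.item()
        current = float(current)
        if self.monitor_op(current - self.min_delta, self.best):
            self.best = current
            return True
        return False
```

Because the value is converted before the comparison, it no longer matters whether a metric arrives as a CUDA tensor, a CPU tensor, or a float.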


iamhere1 commented on August 20, 2024

OK, thanks for your help. It's working now.


captainvera commented on August 20, 2024

Hey @iamhere1,

I'm glad it's working. Closing this issue due to inactivity.
