Comments (6)

mikemoritz commented on May 21, 2024

I had also noticed that the English -> Chinese translation was just providing the English string back. Looking at multiple hypotheses for that shows that the "valid" translation is getting a lower score:

hypotheses=0
        batch=0:
                raw: {'tokens': ['▁H', 'ello', '▁world', '!'], 'score': -2.543808937072754}
                tokens: ['▁H', 'ello', '▁world', '!']
        debug translation:
                "Hello world!"
hypotheses=1
        batch=0:
                raw: {'tokens': ['▁', '希', '洛', '世界', '!'], 'score': -3.0267810821533203}
                tokens: ['▁', '希', '洛', '世界', '!']
        debug translation:
                "希洛世界!"
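To make the scoring concrete, here is a minimal sketch (plain Python, using the two hypotheses from the dump above) of why the untranslated copy wins: the scores are log-probabilities, so the hypothesis closest to zero is selected.

```python
# Toy reconstruction of the two hypotheses from the debug dump above.
# Scores are log-probabilities, so "best" means least negative.
hypotheses = [
    {"tokens": ["▁H", "ello", "▁world", "!"], "score": -2.543808937072754},
    {"tokens": ["▁", "希", "洛", "世界", "!"], "score": -3.0267810821533203},
]

def best_hypothesis(hyps):
    """Return the hypothesis with the highest (least negative) score."""
    return max(hyps, key=lambda h: h["score"])

best = best_hypothesis(hypotheses)
# The English copy wins (-2.54 > -3.03), matching the observed behavior.
text = "".join(best["tokens"]).replace("▁", " ").strip()
```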

from argos-translate.

mikemoritz commented on May 21, 2024

Thanks for the quick response.

Yes I ruled out the Stanza and SentencePiece steps as the source of non-determinism.

I didn't have any luck with CTranslate, but I did find this interesting paper from Google on their NMT and beam search implementation: https://arxiv.org/abs/1609.08144

Specifically this on length normalization/penalty:

With length normalization, we aim to account for the fact that we have to compare hypotheses of different length. Without some form of length-normalization regular beam search will favor shorter results over longer ones on average since a negative log-probability is added at each step, yielding lower (more negative) scores for longer sentences.

They recommend 0.2 as a default for both the length normalization/penalty and the coverage penalty. CTranslate defaults both to zero, but the forums do recommend them as tuning parameters. Setting both parameters to 0.2 gave me consistent results between my two hosts, so this could be something to consider, though the best values may need to be determined experimentally for your models.
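As a rough illustration of the effect, here is the GNMT length penalty from the linked paper, lp(Y) = ((5 + |Y|) / 6)^α, applied to the two hypotheses from the earlier debug dump (token counts 4 and 5). Note this is the paper's formula; CTranslate's length_penalty option may normalize slightly differently.

```python
import math

def length_penalty(length, alpha):
    """GNMT length penalty: lp(Y) = ((5 + |Y|) / 6) ** alpha."""
    return ((5 + length) / 6) ** alpha

def normalized_score(log_prob, length, alpha):
    """Divide the raw log-probability by the length penalty."""
    return log_prob / length_penalty(length, alpha)

# Raw scores from the debug dump above.
raw_english, raw_chinese = -2.5438, -3.0268
english = normalized_score(raw_english, 4, alpha=0.2)
chinese = normalized_score(raw_chinese, 5, alpha=0.2)
# With alpha > 0 the longer hypothesis is penalized less per token,
# narrowing the gap between the two scores (though in this particular
# example the English copy still ranks first).
```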

Thanks for the details on the Chinese translation (and sorry for conflating it in this issue...).

PJ-Finlay commented on May 21, 2024

I asked about length_penalty on the OpenNMT forum, and it sounds like ideally this would be fine-tuned for individual language pairs. For now I'm just setting it to 0.2, which seems like a better default than 0 based on the linked paper and keeps the simplicity of one pipeline for all languages.

I'm closing this issue now but leaving the other one open for Chinese translations.

PJ-Finlay commented on May 21, 2024

Thanks for the detailed bug report, and happy to hear you're considering using the library. I didn't attempt to make translations deterministic, but you're right that this would be a good feature to have for people who need it.

You said, "[you] didn't notice any difference in the actual argos-translate parsing logic", so you've ruled out Stanza sentence boundary detection and SentencePiece tokenization as the source of non-determinism? If not, it's possible that CTranslate is deterministic but the sentence boundary detection or tokenization is causing overall non-determinism.

I don’t think the random seed issue you linked is going to be the solution if this is a CTranslate issue. My understanding is that CTranslate does not depend on PyTorch and Argos Translate only requires PyTorch for Stanza. Also the models are trained using OpenNMT for TensorFlow.

I looked at the CTranslate documentation and I don’t see anything about it being deterministic or having the option of being deterministic. I'm not that familiar with the specifics of CTranslate and I would recommend making an issue on the CTranslate project. I've also gotten pretty good support from the OpenNMT forum in the past so you could post there too.

For debugging, the only thing I can think of is that CTranslate generates a random seed somewhere or somehow behaves differently with different amounts of memory available. If so, the best solution may be to use a container or virtual machine as a standard environment when you need deterministic results. It looks like a larger beam size uses more memory, so if CTranslate falls back to a smaller beam size when less memory is available, setting the beam size to 1 could give you deterministic translations (but potentially lower quality).

The Chinese translation is a separate issue that I am aware of, and it is caused by a lack of data. I got all of the translation data from the OPUS open parallel corpus, and there's a surprising lack of data available for the English-Chinese pair. There are only 333 million tokens available for English-Chinese, which isn't very many compared to other languages (there are 4.5 billion for English-German). This causes the model to often repeat the English input when translating to Chinese, especially with single words or short sentences. I've found that entering a longer sentence or putting a period after your input increases the likelihood of getting a real translation. This gets to your question of whether "Hello world!" is a good test string: it should be fine in most cases, but a full sentence is probably closer to the training data and may give you better results.

The ideas I've had to fix this are to either find more data (possibly including an English-Chinese dictionary to supplement what is mostly full-sentence data and help with short translations), or, if this is an over-training issue, train for fewer epochs than I do for languages with more data. It's interesting that if you generate multiple hypotheses you do get a real translation, just at a worse score. I hadn't thought to try this, and it seems like evidence for overfitting. There could be a hack where you use the second-best translation for short English to Chinese translations, but that's probably not an ideal solution.
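The fallback hack mentioned above could be sketched like this (plain Python, using the two detokenized hypotheses from the earlier debug dump; `pick_translation` is a hypothetical helper, not part of Argos Translate):

```python
def pick_translation(source, hypotheses):
    """If the top hypothesis is just a copy of the source text,
    fall back to the next-best hypothesis that differs from it.
    `hypotheses` is a list of detokenized strings, best score first."""
    for hyp in hypotheses:
        if hyp.strip() != source.strip():
            return hyp
    return hypotheses[0]  # everything was a copy; keep the best

# Hypotheses from the debug dump, ordered by score.
hyps = ["Hello world!", "希洛世界!"]
result = pick_translation("Hello world!", hyps)  # skips the copy
```

As noted, this is a workaround rather than a fix: it papers over the scoring problem and would misfire on inputs that legitimately translate to themselves (names, numbers, etc.).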

Please share if you find any good solutions; adding support for deterministic translations would be a nice feature to have. Ideally this could be done in a way that relies on a stable interface from CTranslate.

PJ-Finlay commented on May 21, 2024

Very interesting. The coverage penalty looks worth investigating as an enhancement: it could lead to improved translation quality with a small code change and no model retraining.

pierotofy commented on May 21, 2024

I had also noticed that the English -> Chinese translation was just providing the English string back.

Came to open an issue about this, but somebody already beat me to it. 🥂
