
Comments (6)

captainvera commented on August 20, 2024

Aah, that sounds good. I'll follow these instructions. Is there any reason why QUETCH wasn't a part of the OpenKiwi submission? Is it because QUETCH is low performing when compared to others?

Yes, exactly. Also, since it works similarly to NuQE, it didn't provide enough predictive variety to be worth including in the ensemble.

Well, to answer your question, I first need to clarify that the NuQE model and a Predictor model are different things. The Predictor is one half of a two-part QE model called Predictor-Estimator. Pre-training the Predictor became popular because QE-specific data is scarce. To have it predict QE labels you also have to train the Estimator part. NuQE, on the other hand, is an end-to-end QE model that learns to predict word-level quality tags directly.

In order to train (any) QE model you need the following data:
(names might not be exactly as in the config files as I'm doing it from memory)

  • train-source: English content (src)
  • train-target: Translated content (tgt)
  • (optionally) train-alignments: Models like NuQE depend on having alignments between src and tgt
  • (optionally) train-pe: Post-edited data (derived from translations) to continue training the Predictor in a Predictor-Estimator model.

Most importantly, one (or both) of the following:

  • train-target-tags: OK/BAD tags derived from comparing a translation with a correct post-edition of the data.
  • train-sentence-scores: Scores in [0, 1] derived from either direct assessments (DAs) of the translation or HTER between translation and post-edition.
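
Since all of these files are parallel (line N of each file describes the same sentence), a quick sanity check before training can save some confusion. The following is only a sketch and not part of OpenKiwi: the function name and dictionary keys are made up for illustration, and it assumes whitespace-tokenized text with exactly one OK/BAD tag per target token (some WMT formats, I believe, also include gap tags, which would change that check).

```python
from typing import Dict, List


def check_parallel(files: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found; an empty list means the data looks consistent.

    `files` maps a hypothetical name ("source", "target", "target_tags",
    "sentence_scores") to the lines of the corresponding file.
    """
    problems: List[str] = []

    # Every file must have the same number of lines.
    n_lines = {name: len(lines) for name, lines in files.items()}
    if len(set(n_lines.values())) > 1:
        problems.append(f"line counts differ: {n_lines}")
        return problems

    # Assumption: one word-level tag per whitespace-separated target token.
    if "target" in files and "target_tags" in files:
        pairs = zip(files["target"], files["target_tags"])
        for i, (tgt, tags) in enumerate(pairs, start=1):
            n_tok, n_tag = len(tgt.split()), len(tags.split())
            if n_tok != n_tag:
                problems.append(f"line {i}: {n_tok} target tokens but {n_tag} tags")

    # Sentence scores are expected to lie in [0, 1].
    for i, raw in enumerate(files.get("sentence_scores", []), start=1):
        if not 0.0 <= float(raw) <= 1.0:
            problems.append(f"line {i}: score {raw} is outside [0, 1]")

    return problems


if __name__ == "__main__":
    data = {
        "source": ["the house is small"],
        "target": ["das Haus ist klein"],
        "target_tags": ["OK OK OK BAD"],
        "sentence_scores": ["0.25"],
    }
    print(check_parallel(data))
```

In practice you would read each file with `open(path, encoding="utf-8").read().splitlines()` and pass the resulting lists in.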

If you're trying to train your own QE model, you need these triplets (src, tgt, post-edited translation) so that you can generate the tags and then train the model. If you only want to analyse the quality of your translated content but have no corrected translations, then I'm afraid you cannot train your own model and would need to use a pre-trained one. Unfortunately, the ones we provide in this repo are very WMT-specific.
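
To give a rough idea of how tags and scores come out of a (tgt, post-edition) pair, here is a simplified sketch. It is not how the WMT data is built: the official labels come from dedicated TER tooling, whereas this swaps in Python's `difflib` as an approximation. It does not handle word shifts, and `difflib` does not guarantee a minimal edit sequence, so the score is only HTER-like. The function name is made up for illustration.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def tags_and_score(mt: str, pe: str) -> Tuple[List[str], float]:
    """Approximate word-level OK/BAD tags and an HTER-like sentence score.

    mt: machine translation (tgt), whitespace-tokenized
    pe: post-edited (corrected) version of that translation
    """
    mt_tokens, pe_tokens = mt.split(), pe.split()
    tags = ["BAD"] * len(mt_tokens)

    # Align the two token sequences; autojunk is disabled so that frequent
    # tokens are not silently ignored on long sentences.
    matcher = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)

    edits = 0
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Tokens the post-editor left untouched are tagged OK.
            for i in range(i1, i2):
                tags[i] = "OK"
        else:
            # Count a replaced/inserted/deleted span by its longer side.
            edits += max(i2 - i1, j2 - j1)

    # Normalize by post-edition length and clip to [0, 1].
    score = min(1.0, edits / max(1, len(pe_tokens)))
    return tags, score


if __name__ == "__main__":
    tags, score = tags_and_score("das Haus ist klein", "das Haus ist groß")
    print(" ".join(tags), score)  # OK OK OK BAD 0.25
```

Without post-editions (or human scores) there is nothing to compute these labels from, which is why the triplets are required for training.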

from openkiwi.

captainvera commented on August 20, 2024

Hey Achyuth,

The config files for QUETCH are very similar to those for NuQE, since NuQE is based on QUETCH. We didn't make these files available as we didn't use QUETCH for our OpenKiwi paper submission.

If you take a look at kiwi/cli/models/quetch.py you'll see that the options available are the same as in NuQE (kiwi/cli/models/nuqe.py), so you can use the configs interchangeably.

As for the prediction and evaluation configuration files, they are mostly model independent. You can tweak the parameters and use them to predict and evaluate a trained QUETCH model.

Let me know if you have any other questions,

Miguel


warlock2k commented on August 20, 2024

Aah, that sounds good. I'll follow these instructions. Is there any reason why QUETCH wasn't a part of the OpenKiwi submission? Is it because QUETCH is low performing when compared to others?


warlock2k commented on August 20, 2024

@captainvera Also, could you kindly let me know the inputs required for training the NuQE predictor model? It is unclear at the moment, and we are trying to train a NuQE model with our own data. Is there a simple document that explains exactly what the training module consumes, if we treat it as a black box?

So far I have gathered that the following files are required for training a predictor model that will help evaluate the quality of translations.

Let us take English to German (EN-DE) as an example:

  • train-source : English content.
  • train-target : Translated content corresponding to English.
  • extend-source-vocab, extend-target-vocab, valid-source, valid-target.
    (what is valid-source, valid-target here?)

Right now I'm just interested in evaluating the quality of the translations of my English content.


warlock2k commented on August 20, 2024

It all makes a lot more sense to me now, thanks for taking the time. Appreciate it!


captainvera commented on August 20, 2024

No worries!
I will close this issue for now. Feel free to open another if you have any other questions!

