Comments (6)

captainvera commented on July 20, 2024

Closing this since the issue has been solved.

Feel free to reopen if you have any further questions @MiroFurtado

from openkiwi.

MiroFurtado commented on July 20, 2024

Oops, I didn't mean to tag this as a bug!

captainvera commented on July 20, 2024

Hello @MiroFurtado

First of all, thanks for your interest in OpenKiwi!

You are right, the artificially generated training corpus (with ~500k triplets) that we reference in our paper is not publicly available. It is based on the in-domain German corpus provided by WMT, which is what we used to pre-train the predictors. We provide a link to this corpus in our Quickstart section here.

We had no plans for making this data available, but you raise a valid concern for reproducibility.
Thanks for bringing this to our attention!
We have to take some things into consideration, but we will get back to you soon!

MiroFurtado commented on July 20, 2024

Great! Let me know what you end up deciding.

trenous commented on July 20, 2024

Actually, the triplets are available in the data section on this site. The download there contains both the 500k triplets we used and a larger corpus of 4 million triplets.

To generate QE tags from the triplets, you can follow the instructions in this repository.

trenous commented on July 20, 2024

Hello @MiroFurtado,

I uploaded the missing tags and sentence scores for the artificial roundtrip data to our releases.

If you want to recreate these files using the repository I posted in my previous comment, you need to train a FastAlign model, which is used to generate the source tags. We trained FastAlign on the English-German in-domain corpus of 3 million parallel sentences. If you use a different FastAlign model, you will get slightly different source tags.
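
For anyone preparing their own FastAlign training data, here is a minimal Python sketch of the file format involved. It is not the code from the repository mentioned above, and the helper names `to_fast_align_lines` and `parse_alignment_line` are hypothetical. It relies only on two things fast_align documents: the input is one sentence pair per line in the form `source ||| target`, and the output is one line of `i-j` index pairs per sentence pair. The sketch assumes the sentences are already tokenized.

```python
def to_fast_align_lines(sources, targets):
    """Yield 'source ||| target' lines, one per sentence pair.

    Whitespace is normalized. An empty side raises an error instead of
    being skipped, so line numbers keep matching the original triplets.
    """
    for line_no, (src, tgt) in enumerate(zip(sources, targets), start=1):
        src = " ".join(src.split())
        tgt = " ".join(tgt.split())
        if not src or not tgt:
            raise ValueError(f"empty sentence in pair {line_no}")
        yield f"{src} ||| {tgt}"


def parse_alignment_line(line):
    """Parse one output line such as '0-0 1-2' into (src_idx, tgt_idx) pairs."""
    return [tuple(int(i) for i in pair.split("-")) for pair in line.split()]


if __name__ == "__main__":
    # Illustrative sentences only, not taken from the WMT corpus.
    for out in to_fast_align_lines(["the house is small"], ["das Haus ist klein"]):
        print(out)
    print(parse_alignment_line("0-0 1-1 2-2 3-3"))
```

The parsed index pairs are what a tagging step would then use to map target-side tags back onto source tokens. How the repository does that projection is not shown here.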
