fredblain / docqe
Resources for sentence- and document-level Quality Estimation

License: BSD 3-Clause "New" or "Revised" License


This repository contains the resources used for our COLING'18 paper, released as part of DeepQuest, our framework for neural-based Quality Estimation. If you use this data, please cite:

DeepQuest: a framework for neural-based Quality Estimation. Julia Ive, Frédéric Blain, Lucia Specia (2018).

@article{ive2018deepquest,
  title={DeepQuest: a framework for neural-based Quality Estimation},
  author={Julia Ive and Frédéric Blain and Lucia Specia},
  journal={In the Proceedings of COLING 2018, the 27th International Conference on Computational Linguistics, Santa Fe, New Mexico, USA},
  year={2018}
}

**Update May '19**

Following the update that added both the 2018 and 2019 datasets and system submissions to the WMT MT task, I've added a script that computes and gathers sentence-level TER (using TERcom) and BLEU (using NLTK) scores (note: the scores are computed against the reference translation, not against post-edits (PE)!). This results in a parallel corpus, aligned at sentence level, with both TER and BLEU scores as quality labels.
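To illustrate what these labels measure, here is a minimal, dependency-free sketch. It is not the repository's script: it replaces TERcom with TER *without block shifts* (plain word-level edit distance divided by the reference length) and replaces NLTK with a from-scratch smoothed sentence-level BLEU, so its numbers will generally differ from those produced by the scripts. The function names (`ter_no_shifts`, `smoothed_sentence_bleu`, `score_corpus`) are hypothetical, and inputs are assumed to be already tokenized and sentence-aligned.

```python
import math
from collections import Counter


def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(prev[j] + 1,             # delete hypothesis word h
                          curr[j - 1] + 1,         # insert reference word r
                          prev[j - 1] + (h != r))  # substitute (or match)
        prev = curr
    return prev[-1]


def ter_no_shifts(hyp, ref):
    """Approximate TER: edits / reference length, ignoring block shifts."""
    return edit_distance(hyp, ref) / max(len(ref), 1)


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def smoothed_sentence_bleu(hyp, ref, max_n=4, epsilon=0.1):
    """Sentence BLEU with uniform weights; orders with no match get epsilon."""
    if not hyp:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_ngrams = ngram_counts(hyp, n)
        # Clipped matches: an n-gram counts at most as often as in the reference.
        matches = sum((hyp_ngrams & ngram_counts(ref, n)).values())
        total = max(sum(hyp_ngrams.values()), 1)
        precision = matches / total if matches else epsilon / total
        log_precision += math.log(precision) / max_n
    # Brevity penalty for hypotheses that are not longer than the reference.
    c, r = len(hyp), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision)


def score_corpus(hyps, refs):
    """Pair each tokenized hypothesis with its (TER, BLEU) quality labels."""
    if len(hyps) != len(refs):
        raise ValueError("hypotheses and references must be sentence-aligned")
    return [(" ".join(h), ter_no_shifts(h, r), smoothed_sentence_bleu(h, r))
            for h, r in zip(hyps, refs)]
```

For example, `score_corpus([hyp.split()], [ref.split()])` yields one `(sentence, TER, BLEU)` row per segment, which can be written out as tab-separated lines to form a labelled parallel corpus. Lower TER and higher BLEU both indicate a translation closer to the reference.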

Instructions

Before running anything, check the shell scripts under scripts/ and update the paths to the required third-party tools (such as TERcom and the Moses tokenizer). Then, to download and process the datasets along with the official submissions, run: `bash process_all.sh`.
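Since a wrong tool path may only surface partway through a long download, it can help to verify the paths first. The helper below is hypothetical and not part of the repository: the entries in `REQUIRED_TOOLS` are placeholder locations to replace with the values you set in the scripts under scripts/.

```python
from pathlib import Path

# Hypothetical placeholder paths: replace them with the ones set in scripts/.
REQUIRED_TOOLS = {
    "TERcom jar": "~/tools/tercom/tercom.jar",
    "Moses tokenizer": "~/tools/mosesdecoder/scripts/tokenizer/tokenizer.perl",
}


def missing_tools(tools):
    """Return the names of tools whose configured path is not an existing file."""
    return [name for name, path in tools.items()
            if not Path(path).expanduser().is_file()]
```

If `missing_tools(REQUIRED_TOOLS)` returns an empty list, the configured files exist and `bash process_all.sh` can be launched; otherwise, the returned names show which paths still need updating.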

By default, the scripts download and process the language pairs I needed (e.g. English<>German, English-French and English-Russian), from WMT 2008 to WMT 2019. Feel free to update get_newstests.sh and get_systems.sh to suit your needs; modifying them should be straightforward. If not, feel free to ask for help by opening an issue, so that others in a similar situation can benefit from the discussion.
