Giter Site home page Giter Site logo

goandseeleah / sacrebleu Goto Github PK

View Code? Open in Web Editor NEW

This project forked from mjpost/sacrebleu

0.0 0.0 0.0 134 KB

Reference BLEU implementation that auto-downloads test sets and reports a version string to facilitate cross-lab comparisons

License: Apache License 2.0

Python 95.20% Shell 4.80%

sacrebleu's Introduction

SacreBLEU (Post, 2018) provides hassle-free computation of shareable, comparable, and reproducible BLEU scores. Inspired by Rico Sennrich's multi-bleu-detok.perl, it produces the official WMT scores but works with plain text. It also knows all the standard test sets and handles downloading, processing, and tokenization for you.

Why use this version of BLEU?

  • It automatically downloads common WMT test sets and processes them to plain text
  • It produces a short version string that facilitates cross-paper comparisons
  • It properly computes scores on detokenized outputs, using WMT (Conference on Machine Translation) standard tokenization
  • It produces the same values as official script (mteval-v13a.pl) used by WMT
  • It outputs the BLEU score without the comma, so you don't have to remove it with sed (Looking at you, multi-bleu.perl)

QUICK START

Install the Python module (Python 3 only)

pip3 install sacrebleu

This installs a shell script, sacrebleu. (You can also directly run the shell script sacrebleu.py in the source repository).

Get a list of available test sets:

sacrebleu

Download the source for one of the pre-defined test sets:

sacrebleu -t wmt14 -l de-en --echo src > wmt14-de-en.src

(you can also use long parameter names for readability):

sacrebleu --test-set wmt14 --langpair de-en --echo src > wmt14-de-en.src

After tokenizing, translating, and detokenizing it, you can score your decoder output easily:

cat output.detok.txt | sacrebleu -t wmt14 -l de-en

SacreBLEU knows about common WMT test sets, but you can also use it to score system outputs with arbitrary references. It also works in backwards compatible model where you manually specify the reference(s), similar to the format of multi-bleu.txt:

cat output.detok.txt | sacrebleu REF1 [REF2 ...]

Note that the system output and references will all be tokenized internally.

SacreBLEU generates version strings like the following. Put them in a footnote in your paper! Use --short for a shorter hash if you like.

BLEU+case.mixed+lang.de-en+test.wmt17 = 32.97 66.1/40.2/26.6/18.1 (BP = 0.980 ratio = 0.980 hyp_len = 63134 ref_len = 64399)

MOTIVATION

Comparing BLEU scores is harder than it should be. Every decoder has its own implementation, often borrowed from Moses, but maybe with subtle changes. Moses itself has a number of implementations as standalone scripts, with little indication of how they differ (note: they mostly don't, but multi-bleu.pl expects tokenized input). Different flags passed to each of these scripts can produce wide swings in the final score. All of these may handle tokenization in different ways. On top of this, downloading and managing test sets is a moderate annoyance. Sacre bleu! What a mess.

SacreBLEU aims to solve these problems by wrapping the original Papineni reference implementation together with other useful features. The defaults are set the way that BLEU should be computed, and furthermore, the script outputs a short version string that allows others to know exactly what you did. As an added bonus, it automatically downloads and manages test sets for you, so that you can simply tell it to score against 'wmt14', without having to hunt down a path on your local file system. It is all designed to take BLEU a little more seriously. After all, even with all its problems, BLEU is the default and---admit it---well-loved metric of our entire research community. Sacre BLEU.

LICENSE

SacreBLEU is licensed under the Apache 2.0 License.

CREDITS

This was all Rico Sennrich's idea. Originally written by Matt Post. The official version can be found at https://github.com/awslabs/sockeye/tree/master/contrib/sacrebleu.

sacrebleu's People

Contributors

mjpost avatar ozancaglayan avatar cfedermann avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.