
apertium / apertium-regtest


Regression testing system for Apertium language data and translators

Home Page: https://wiki.apertium.org/wiki/Apertium-regtest

License: GNU General Public License v3.0

Languages: Python 53.59%, HTML 7.27%, CSS 1.95%, JavaScript 35.30%, Makefile 0.65%, M4 0.23%, Shell 1.03%

Topics: testing, golden-master, characterization-tests, apertium, mt, approval-testing

apertium-regtest's Introduction

apertium-regtest

Regression testing system for Apertium.

Full documentation: https://wiki.apertium.org/wiki/Apertium-regtest

Installation

It can be run as-is by invoking the apertium-regtest.py file in this directory, or installed by running

$ autoreconf -fvi
$ ./configure
# make install

which will install it as the command apertium-regtest.

Usage

Static testing

apertium-regtest test runs all tests and reports the results, exiting with error code 0 if all pass and 1 otherwise.
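
Since pass/fail is signalled through the exit code, the command slots straight into CI. A minimal sketch of gating a job on it (assuming nothing beyond apertium-regtest being on PATH):

import subprocess
import sys

# apertium-regtest exits 0 if all tests pass and 1 otherwise.
result = subprocess.run(['apertium-regtest', 'test'])
if result.returncode != 0:
    sys.exit('Regression tests failed; update them interactively before merging.')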

Interactively updating tests

Test data can be updated either from a browser or from a terminal. For browser mode, run apertium-regtest web; for terminal mode, run apertium-regtest cli.

apertium-regtest's People

Contributors: mr-martian, tinodidriksen, unhammer

apertium-regtest's Issues

show changed, final results first?

Here's a typical session of how it looks when I open the web UI after make test complained:

[video: regtest-changes.webm]

I try to hide/show unchanged: nothing happens. I try to show only generated: that doesn't seem to help. Then I remember that one of the modes uses postgen; that changes something, but I guess I'm still including unchanged things. Another hide/show unchanged. Finally I notice there's a page 2 – and there are the entries I'm after (i.e. things that changed in the generator).

I feel like something could be done to improve the flow here, but I'm not sure what. I do think that showing things that actually changed in the last step of the pipeline first in the list would help (most of the time I don't care about changes in the middle of the pipeline).

Alternate success threshold

We often want nightly builds to succeed even if they aren't perfect. It would be nice to have a way to set a different criterion for success, e.g. an envvar AP_REGTEST_MIN=80 saying 80% is good enough to pass, with the default value being 100.

It should also be settable as a command-line parameter, but envvars are easier to use in many builds.
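
A minimal sketch of how the proposed check might look (AP_REGTEST_MIN and the function name come from this issue's proposal and are illustrative, not an existing interface):

import os

def enough_passed(passing: int, total: int) -> bool:
    # Default of 100 keeps today's behaviour: every test must pass.
    threshold = float(os.environ.get('AP_REGTEST_MIN', '100'))
    return total == 0 or 100 * passing / total >= threshold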

diff acts weird with non-ASCII characters

In a bunch of different words containing æ, we get all the characters before æ diffed, but not æ or any of the characters after it. E.g. (with red parts shown in bold and green parts in italics):

  • xyXyæhargle
  • xyzXyzæbargle
  • xyzaXyzaæfoo

(Note that this issue was encountered while fiddling with modes before we had added the -g switch back, which is why it's the capitalisation that differs. That underlying issue is resolved and we are no longer getting these diffs.)
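
For illustration, diffing at the codepoint level (Python str rather than UTF-8 bytes) produces the expected single-character diff for these strings; if the UI were diffing bytes, a multi-byte character like æ could explain the misaligned highlights (a guess at the cause, not a confirmed diagnosis):

import difflib

old, new = 'xyæhargle', 'Xyæhargle'
for op, a0, a1, b0, b1 in difflib.SequenceMatcher(None, old, new).get_opcodes():
    print(op, repr(old[a0:a1]), '->', repr(new[b0:b1]))
# Prints a single replace of 'x' -> 'X'; 'æ' and everything after it match.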

Extra resources pulled from the page

It seems these resources are not really needed (at least, they do not exist in the repo).

<link rel="stylesheet" type="text/css" href="local.css">
<script src="local.js"></script>

loses accepted-choices after hitting `run`

 apertium-regtest cli

Running regression tests for apertium-sme-smj
Type `help` for a list of available commands.

Loading corpora...
Corpus sme-smj has 0 lines to be examined.
Corpus sme-smj-pending has 0 lines to be examined.
Corpus sme-smj-regression has 1 lines to be examined.
sme-smj-regression 63 of 181
INPUT:
  Dávda sáhttá dagahit šattalmasa ja oktiišaddama.
EXPECTED OUTPUT:
  Dávdda máhttá sjattalvisáv ja aktijsjaddamav dahkat.
ACTUAL OUTPUT:
  Dávdda máhttá sjattalvisáv ja aktij sjaddamav dahkat.
IDEAL OUTPUTS:
  Dávdda máhttá sjattalvissaj ja aktijsjaddamij vájkkudit.
> a
> run
Running sme-smj
Corpus sme-smj has 0 lines to be examined.
Running sme-smj-pending
Corpus sme-smj-pending has 0 lines to be examined.
Running sme-smj-regression
Corpus sme-smj-regression has 1 lines to be examined.
sme-smj-regression 63 of 181
INPUT:
  Dávda sáhttá dagahit šattalmasa ja oktiišaddama.
EXPECTED OUTPUT:
  Dávdda máhttá sjattalvisáv ja aktijsjaddamav dahkat.
ACTUAL OUTPUT:
  Dávdda máhttá sjattalvisáv ja aktij sjaddamav dahkat.
IDEAL OUTPUTS:
  Dávdda máhttá sjattalvissaj ja aktijsjaddamij vájkkudit.

Very annoying if you've carefully accepted/golded/skipped 30 things.

can't add as gold

 apertium-regtest -p 3333 web
Starting server
Open http://localhost:3333 in your browser
----------------------------------------
Exception happened during processing of request from ('127.0.0.1', 57376)
Traceback (most recent call last):
  File "/usr/lib/python3.8/socketserver.py", line 683, in process_request_thread
    self.finish_request(request, client_address)
  File "/usr/lib/python3.8/socketserver.py", line 360, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/usr/bin/apertium-regtest", line 649, in __init__
    super().__init__(request, client_address, server, directory=directory)
  File "/usr/lib/python3.8/http/server.py", line 647, in __init__
    super().__init__(*args, **kwargs)
  File "/usr/lib/python3.8/socketserver.py", line 747, in __init__
    self.handle()
  File "/usr/lib/python3.8/http/server.py", line 427, in handle
    self.handle_one_request()
  File "/usr/lib/python3.8/http/server.py", line 415, in handle_one_request
    method()
  File "/usr/bin/apertium-regtest", line 662, in do_POST
    self.do_callback(urllib.parse.parse_qs(data.decode('utf-8')))
  File "/usr/bin/apertium-regtest", line 743, in do_callback
    Corpus.all_corpora[corp].set_gold(hsh, golds, stp)
  File "/usr/bin/apertium-regtest", line 556, in set_gold
    blob = self.step(step)
  File "/usr/bin/apertium-regtest", line 472, in step
    return self.data['cmds'][self.commands.get(s, -1)]
KeyError: 'cmds'
----------------------------------------

optionally show forms that correspond to the numbers in test mode

Currently apertium-regtest test has output like this:

$ apertium-regtest -c .-morph test
Corpus 1 of 3: -s plurals-morph
  13/13 (100.0%) tests pass (3/13 (23.08%) match gold)

Corpus 2 of 3: -es plurals-morph
  20/20 (100.0%) tests pass (3/20 (15.0%) match gold)

Corpus 3 of 3: irregular plurals-morph
  22/22 (100.0%) tests pass (2/22 (9.09%) match gold)

All tests pass.

It would be nice if there were a way to have output like morph-test/aq-morphtest to show forms that do and/or don't match gold (or, alternatively, tests that are and/or are not expected).

wrong order of debug-modes, irrelevant steps shown

generator should be last, autoseq between pretransfer and biltrans:
[screenshot]
And nob-dan doesn't even have interchunk/postchunk/chunker! Those steps shouldn't show when nob-dan is selected.
(The other directions still do, though.)

Default to last step

I always end up clicking the last three steps (because they're named slightly differently in the different modes) whenever I open up regtest. I always want to see the difference in output first (typically it's expected improvements, or I can tell from the final output what went wrong); only when something changed that I don't understand do I want to delve into the pipeline.

Can we default to showing the last possible step, or make it a preference cookie or something?

multiline diff

I got confused by the inline red/green colors of the diffs: is green expected, current, or gold here?

[screenshot]

It'd be nice to have (perhaps optional) multiline diffs like:

EXPECTED: ^Det<SN><@SUBJ→><nt><pl><nom>{^den<det><nt><pl><acc>$}$ …
CURRENT: ^Det<SN><@SUBJ→><nt><pl><nom>{^den<det><dem><nt><pl><acc>$}$  …
(GOLDs: …)

And notably, include the label on the left so that new users (or old users with bad memories) immediately see what's what.
(Golds are only applicable for generation, of course.)

Also, even better would be if the changed words were colored like in magit and Wikipedia diffs:
[screenshot]
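
A sketch of one way to produce the labelled layout with inline change markers using difflib (the marker syntax and function name are illustrative, not anything regtest currently has):

import difflib

def labeled_diff(expected: str, current: str) -> str:
    exp_parts, cur_parts = [], []
    sm = difflib.SequenceMatcher(None, expected, current)
    for op, a0, a1, b0, b1 in sm.get_opcodes():
        if op == 'equal':
            exp_parts.append(expected[a0:a1])
            cur_parts.append(current[b0:b1])
        else:
            if a1 > a0:
                exp_parts.append('[-%s-]' % expected[a0:a1])  # shown red in a UI
            if b1 > b0:
                cur_parts.append('{+%s+}' % current[b0:b1])   # shown green in a UI
    return 'EXPECTED: %s\nCURRENT:  %s' % (''.join(exp_parts), ''.join(cur_parts))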

Option to remove golds

If the wrong gold was added by accident, or the user changes their mind about what they want to be gold, it would be cool if there were an option to select golds and remove them as the user sees fit.

sort analyses

^a/b<n>/c<n>$ and ^a/c<n>/b<n>$ should be treated as the same string.
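
A minimal sketch of such a canonicalisation (it ignores escaped characters in the Apertium stream format, which a real implementation would need to handle):

import re

def sort_readings(line: str) -> str:
    # Sort the readings inside each ^surface/reading1/reading2$ unit.
    def canon(match):
        surface, *readings = match.group(1).split('/')
        return '^%s$' % '/'.join([surface] + sorted(readings))
    return re.sub(r'\^([^$^]*)\$', canon, line)

assert sort_readings('^a/b<n>/c<n>$') == sort_readings('^a/c<n>/b<n>$')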

single test focus / IDE-mode

I use a script to recompile and re-run on save, which looks like this:

[video: Peek.12-12-2022.14-28.webm]

but for Most People I think it'd be really nice to have something like that in regtest. Regtest already kind of does this: you can change files, click re-run corpus, and see the new output. But when working on transfer in particular, I often like to zoom in on one sentence and see the full analysis tree as well as the generated output and final translation all in one go, every time I click save.

Regtest could have a 🔍 button on each input sentence, and on click you are shown a page for just that sentence, looking a bit like apertium-viewer, but preferably with a fairly big <div> for the -tree output :)

There could be a button to re-run tests for that sentence, but even more magical would be a "recompile-and-rerun-on-change" button (like dev/r does). If this is easiest to do with entr rather than an external python lib for inotify, IMO it'd be fine if that one feature depended on having entr installed.
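
A rough sketch of such a watch loop using plain mtime polling, which needs neither entr nor an inotify library (the paths and recompile command are illustrative):

import subprocess
import time
from pathlib import Path

def watch_and_rerun(sources, corpus):
    last = 0.0
    while True:
        newest = max(Path(p).stat().st_mtime for p in sources)
        if newest > last:
            last = newest
            subprocess.run(['make'])  # recompile
            subprocess.run(['apertium-regtest', '-c', corpus, 'test'])
        time.sleep(1)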

make clearer a 'make test' error

I get:

$ make test
apertium-regtest test
Corpus 1 of 1: analysis
  7/9 (77.78%) tests pass (0/7 (0.0%) match gold)

There were changes! Rerun in interactive mode to update tests.
Changed corpora: analysis
make: *** [Makefile:928: test] Error 1

It would be easier if the message explained what has to be done to "rerun in interactive mode". Moreover, searching the wiki for "rerun in interactive mode" or "run in interactive mode" does not give meaningful results.

Very long distance testing

This is for tests where the result involves very long distances, e.g. identifying characters in a novel, done with at least ±6 windows and relations that span whole chapters via stepping-stones.

I think this falls out of scope, but maybe I or someone else can come up with a workable method.

Interactive streaming results

It should be possible to stream results to the interface and interact with it while the test is running, instead of having to wait for the full run to finish. This may require storing results in SQLite temporarily.
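
A sketch of what that temporary store might look like (the schema and function are hypothetical):

import sqlite3

conn = sqlite3.connect('regtest-partial.db')
conn.execute('CREATE TABLE IF NOT EXISTS results '
             '(corpus TEXT, hash TEXT, step TEXT, output TEXT)')

def record(corpus, hsh, step, output):
    # Commit per result so the web UI can poll partial results mid-run.
    with conn:
        conn.execute('INSERT INTO results VALUES (?, ?, ?, ?)',
                     (corpus, hsh, step, output))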

bigger manual gold field

[screenshot]

The field could at least extend as far right as the buttons above it.

It could also be prefilled with the current output when there's just one minor thing to fix.

Support preferences

[17:12:31] <Unhammer> yeah, been wondering what the best way to do that would be. Simplest is probably to just have a whole test set limited to a value of AP_SETVAR, e.g.

    "dan-nno-moderate": {
        "input": "dan-nno-input.txt",
        "mode": "dan-nno",
        "setvar": "infa_infe,me_vi,ggj_gg"
    },

Parallelize runs

A pipe often has a single bottleneck (usually a complex CG), so even though the pipe is multi-process, the benefit is limited. Splitting the input into ~4 chunks and running that many pipes can thus take full advantage of the available CPUs.

This should work both per-corpus and across corpora, so runs should internally be changed to a single task list.

(quick'n'dirty per-corpus TinoDidriksen/regtest@c18ed0c)
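
A sketch of the chunk-and-merge idea (cmd stands for whatever shell pipeline the mode defines; a real change would also have to keep per-line hashes aligned across chunks):

import concurrent.futures
import subprocess

def run_chunk(cmd, lines):
    proc = subprocess.run(cmd, shell=True, input='\n'.join(lines) + '\n',
                          capture_output=True, text=True)
    return proc.stdout

def run_parallel(cmd, lines, jobs=4):
    size = max(1, (len(lines) + jobs - 1) // jobs)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # Threads suffice here: the real work happens in the subprocess pipes.
    with concurrent.futures.ThreadPoolExecutor(len(chunks)) as pool:
        return ''.join(pool.map(lambda c: run_chunk(cmd, c), chunks))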

error on printing error: ident is not defined

This happens in sme-nob:


Traceback (most recent call last):
  File "/usr/local/bin/apertium-regtest", line 1175, in <module>
    if not static_test(args.ignore_add, threshold=args.threshold,
  File "/usr/local/bin/apertium-regtest", line 1058, in static_test
    corp.load()
  File "/usr/local/bin/apertium-regtest", line 499, in load
    golddata = load_gold(goldfile)
  File "/usr/local/bin/apertium-regtest", line 119, in load_gold
    print('ERROR: Empty entry %s in %s' % (ident, fname))
NameError: name 'ident' is not defined

make: *** [test] Error 1
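
The fix is presumably just to print the identifier that is actually in scope at that point in load_gold. A hypothetical sketch of the corrected shape (the file format and variable names are guesses, not the real apertium-regtest code):

def load_gold(fname):
    golds = {}
    with open(fname) as fin:
        for line in fin:
            hsh, _, content = line.rstrip('\n').partition('\t')
            if not content:
                # Report the in-scope id instead of the undefined `ident`.
                print('ERROR: Empty entry %s in %s' % (hsh, fname))
                continue
            golds.setdefault(hsh, []).append(content)
    return golds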

Can we have generated and editable files in different folders?

Currently there are 165 files in nno-nob/test for 6 testsets. If I want to add inputs/golds, that's quite a lot of file names for my human eyes to parse. Would it be possible to have something like the following?

/test/humaneditable.input.txt
/test/humaneditable.gold.txt
/test/generated/pendingtaggeroutputexpected.txt

(In my quickly-hacked-together regtest approximation in nno-nob/tests I do this, with 20 different test sets)

total count of gold matches in test mode

Currently apertium-regtest test has output like this:

$ apertium-regtest -c .-morph test
Corpus 1 of 3: -s plurals-morph
  13/13 (100.0%) tests pass (3/13 (23.08%) match gold)

Corpus 2 of 3: -es plurals-morph
  20/20 (100.0%) tests pass (3/20 (15.0%) match gold)

Corpus 3 of 3: irregular plurals-morph
  22/22 (100.0%) tests pass (2/22 (9.09%) match gold)

All tests pass.

It would be nice if the bottom line (or somewhere around there) said something like "8/55 tests match gold" or similar.

Tooltip for number of inputs matching gold

It would be cool if, hovering over a corpus filter, you could see the number of passing tests out of the total tests for that filter, including for the all-corpora button. Right now, interpreting the percentages below is slightly complicated.

Interface idea: button partially filled by percent passing tests

Buttons in web need some kind of explanation on hover or similar

[screenshot]
Nothing happens when I click Diff/Inserted/Deleted – when are they supposed to do anything? There is a diff between output and gold.

Also, especially when looking at Generator output, it's a bit confusing that the second line there actually is the generator output (gold and input are labelled, but it doesn't say Output before the output).

And why are there three buttons, "Replace as gold", "Add as gold" and "Add new gold" – why aren't "Replace as gold" and "Add new gold" enough? (If there isn't a gold already, perhaps s/Replace/Add.)

I can see that generator is accepted because I can't press the button, but what then does Accept result do?

Could perhaps have a little tutorial thing at the top of the screen.

Add pkg-config warning to langs/pairs

Modules that use regtest should warn if it's not present, à la:

PKG_CHECK_MODULES(REGTEST, apertium-regtest >= 0.0.1, [], [AC_MSG_WARN([Running tests requires apertium-regtest])])

Editability for non-apertium developers

With the old, wiki-based tests, it was possible for people who had linguistic knowledge but no apertium/git/developer know-how to check and edit tests in a basic web UI. I didn't even know about this workflow before, but apparently it happened a lot with the sme pairs. What are the possibilities for making the current regtests editable by non-developer linguists?

  • Editing through Github – possible, but gold/input are in different files and it seems very easy to mess up the hashes
  • Run apertium-regtest web on a server – possible, but then you have to deal with logins and access
  • Some kind of export/import/sync to e.g. wiki – sounds complicated and fragile
  • Other ideas?
