
hdat's Introduction

High Dimensional Algorithm Tester (HDAT)

HDAT is a python library and command line tool that simplifies testing algorithms with high dimensional inputs and outputs, e.g. image processing algorithms.

HDAT shares some principles with snapshot testing tools, such as those in Jest. However, those tools are usually designed to catch regressions or changes in user interfaces within specific frameworks, whereas HDAT works with any high-dimensional python algorithm.

An example implementation of HDAT testing can be found in the example directory.

Suites

A suite is a python class that subclasses the Suite class in the hdat module and implements a few methods. In particular, it implements methods that:

  1. Collect test cases
  2. Run a test case
  3. Compare the result of running a test case against a previously verified result for the same test case
  4. View a result

Each suite must also have a unique id, referred to as the suite_id.

We discuss each of these methods in detail below.

Collecting Test Cases

A suite is responsible for collecting all of the test cases that it contains, and it must return a dict whose keys are ids that uniquely identify a test case over time, and whose values contain all of the inputs necessary to run the test.
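For instance, a hypothetical suite for a brightness-adjustment algorithm might collect its cases from files on disk. This is only a sketch: the method name collect, the BrightnessSuite class, and the test_data/brightness directory are assumptions rather than part of hdat's documented API (see the example directory for the real interface).

import os

from hdat.suite import Suite


class BrightnessSuite(Suite):
    # Hypothetical suite for a brightness-adjustment algorithm.
    id = 'brightness'

    def collect(self):
        # NOTE: the method name 'collect' is assumed here.
        # Keys must uniquely identify each test case over time; values hold
        # everything needed to run that case (here, a path to a saved array).
        cases_dir = 'test_data/brightness'
        return {
            os.path.splitext(filename)[0]: os.path.join(cases_dir, filename)
            for filename in sorted(os.listdir(cases_dir))
            if filename.endswith('.npy')
        }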

Running Test Cases

A suite is responsible for running test cases. A test case must be run with the inputs provided from the collection stage.

The output of the algorithm we are testing will usually be a large array of some sort, which is difficult to compare over time as the algorithm evolves due to its large size, floating point changes, etc.

In other words, unlike "low-dimensional" algorithms where it is possible to write assertions to verify that the output is correct, when working with "high-dimensional" algorithms, the outputs are too large and complex for simple assertions; in fact, for many types of problems, finding a simple verification assertion is equivalent to solving the underlying problem! Thus, the only way we can really verify the result is through semi-manual verification by a human.

But of course, we don't want to manually verify the test case output every time! To get around this, we reduce the high dimensionality algorithm output into one or more "metrics" that can be compared automatically most of the time. The first time we run a new test case, a human manually verifies the output. We then save these metrics into a "golden result" file. Subsequent algorithm runs are compared with these golden results to see if the metrics have changed too much (where "too much" is set by the hdat suite).

If the automated comparison fails, it does not necessarily mean that the algorithm has regressed. It may simply mean that a change in the algorithm has shifted the metrics enough that the full output needs to be re-verified by a human.

The metrics alone are insufficient for manual verification; thus the run method should also record the full-dimensional output (and possibly intermediate outputs) that can help a human verify that the result is in fact correct. This additional information is called the "context".

Thus, the run method of the suite must return two items (see the sketch after this list):

  1. Metrics - Reduced, low-dimensional numbers that can be used to automatically verify that the algorithm is still behaving as expected, yet sensitive enough to change if the algorithm has changed substantially
  2. Context - High dimensional output of the algorithm, as well as any other intermediate output that can be used by a human to verify a result.
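Continuing the hypothetical BrightnessSuite above, a run method might look like the following sketch. The metric names, the np.clip stand-in for the algorithm under test, and the exact method signature are illustrative assumptions; only the requirement that run returns metrics and context comes from the text above.

import numpy as np

from hdat.suite import Suite


class BrightnessSuite(Suite):  # continuing the hypothetical suite sketched above
    id = 'brightness'

    def run(self, case_input):
        # case_input is whatever the collection stage stored for this case;
        # here it is assumed to be a path to a saved numpy array.
        image = np.load(case_input)
        output = np.clip(image * 1.2, 0, 255)  # stand-in for the algorithm under test

        # Metrics: a handful of numbers that can be compared automatically.
        metrics = {
            'mean_intensity': float(np.mean(output)),
            'max_intensity': float(np.max(output)),
            'clipped_fraction': float(np.mean(output >= 255)),
        }

        # Context: the full-dimensional output (and intermediates) that a human
        # needs in order to verify the result visually.
        context = {
            'input_image': image,
            'output_image': output,
        }
        return metrics, context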

Comparing the result

A suite is also responsible for comparing two results of running the algorithm. Typically this method is used to compare the result of running a new version of the algorithm against the last human-verified result for a given test case.

This method is given the metrics from a previous, verified run (the golden metrics), as well as the metrics from a new run. It must return a boolean indicating whether the new metrics are acceptably close to the golden metrics (if they are not, the result must be re-verified by a human), along with a string containing any comments about the failure.
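For example, the hypothetical suite above could pass a comparison whenever each metric stays within a per-metric tolerance of its golden value. The tolerance values below are placeholders; the check method name and its (golden_metrics, metrics) -> (passed, comments) shape follow the runner traceback quoted in the issues section further down.

from hdat.suite import Suite


class BrightnessSuite(Suite):  # continuing the hypothetical suite sketched above
    id = 'brightness'

    def check(self, golden_metrics, metrics):
        # Tolerances are illustrative; tune them per metric for your algorithm.
        tolerances = {
            'mean_intensity': 0.5,
            'max_intensity': 1.0,
            'clipped_fraction': 0.01,
        }
        comments = []
        for name, golden_value in golden_metrics.items():
            new_value = metrics.get(name)
            if new_value is None:
                comments.append('metric "{}" is missing from the new run'.format(name))
            elif abs(new_value - golden_value) > tolerances.get(name, 0):
                comments.append('metric "{}" changed from {} to {}'.format(
                    name, golden_value, new_value))
        passed = not comments
        return passed, '; '.join(comments)

hdat also provides a MetricsChecker helper (see the Import Locations issue below), which appears intended for expressing comparisons like this; its exact API is not shown here.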

Visualizing the result

A suite must also make it easy for humans to verify the results of running its test cases. The show method is given a full result (including the metrics and the context), and it must display these results in an easy to view manner.
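A show method for the hypothetical BrightnessSuite might display the input and output images side by side. The use of matplotlib and the 'context' dictionary layout are assumptions; only the fact that the result carries the metrics and the context is stated above.

import matplotlib.pyplot as plt

from hdat.suite import Suite


class BrightnessSuite(Suite):  # continuing the hypothetical suite sketched above
    id = 'brightness'

    def show(self, result):
        # The result is assumed to carry the metrics and the context saved by run().
        context = result['context']
        fig, (left, right) = plt.subplots(1, 2)
        left.imshow(context['input_image'], cmap='gray')
        left.set_title('input')
        right.imshow(context['output_image'], cmap='gray')
        right.set_title('output')
        fig.suptitle(str(result['metrics']))
        plt.show()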

Stores

Unlike many test runners that treat previous results as transient items that quickly lose their interest, the nature of hdat requires that historical test results be given a little more respect! After all, the historical results are what allow us to avoid manually inspecting all of the high-dimensional algorithm outputs after each test run.

Test results are stored in a Store. A store is a class that implements a few methods.

When running the hdat CLI, there are always two stores involved:

  1. The golden store - the current "gold standard" metrics for each test case
  2. The save store - where all historical runs are stored for easy comparison

The requirements of each of these stores are different.

The golden store should be saved inside the git repository, so that if two developers are working on an algorithm at once, they will know if there is a merge conflict. Because the golden store must be stored in the repository, it must also be small; we do not want it to save the result context, but just the result metrics. Finally, we want the golden store to only retain a single result per test case.

The save store, on the other hand, should keep all historical results, and should keep the full context of the results so that they can be visually compared. It should also not be kept inside the git repository.

Abstractions

HDAT has the following abstractions:

  • A test suite runs tests against a particular algorithm
  • A test case is a particular "scenario" that a test suite runs
  • A test result is the result of running a test case

Test results contain three pieces of information:

  • metrics
  • context
  • meta data

Metrics are low-dimensional and easily comparable pieces of data that have been derived from the algorithm's raw high dimensional output. The metrics are used by the test suite to automatically verify new test results against older human-verified test results.

The context includes the high-dimensional output of the algorithm, along with any relevant intermediate data. The context is used when a human verifies a test run.

Meta data about the test result includes the current git commit, the date and time of the run, etc.

Test results are kept in stores. The command line tool interacts with two stores: the golden store, which contains the most up-to-date, human-verified test results, and the save store, which retains every historical result along with its full context.

Design Goals

  • A conceptually simple API
  • Be picky and abort easily at the start of a run, but after the run begins, try to catch any errors and continue.

Casespecs

A casespec is a string that selects one or more test cases. A casespec may specify a single test case, or it may specify many test cases.

Here are several casespecs along with the test cases they would select:

  • `` (the empty string) Selects all test cases in all suites.
  • a Selects all test cases in the test suite with id "a".
  • a/b Selects the test case with id "b" in the suite with id "a".
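For example, casespecs are what the run subcommand (shown in the issues below) accepts, so assuming a suite with id "a" containing a case with id "b":

$ hdat run a        (runs every case in suite "a")
$ hdat run a/b      (runs only case "b" in suite "a")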

Resultspecs

A resultspec is a string that selects one or more test results. Every casespec can also act as a resultspec; in addition, a resultspec may include an extra component that selects among the (typically many) results recorded for a test case.

Here are several resultspecs along with the test results they would select:

  • `` (the empty string) Selects the most recent result for every test case in every test suite.
  • a Selects the most recent results for every test case in the test suite with id "a".
  • a/b Selects the most recent result for the test case with id "b" in the test suite with id "a".
  • a/b/c Selects the test result with id "c" for the test case with id "b" in the test suite with id "a".
  • a/b/~0 Selects the most recent result for the test case with id "b" in the test suite with id "a".
  • a/b/~1 Selects the previous result for the test case with id "b" in the test suite with id "a".
  • a/b/~4 Selects the result four runs older than the most recent result for the test case with id "b" in the test suite with id "a".
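For example, when iterating on an algorithm (see the tagging issue below), the ~N syntax makes it easy to look back at earlier runs; assuming the show subcommand accepts resultspecs:

$ hdat show a/b         (show the most recent result for case "b")
$ hdat show a/b/~1      (show the previous result for case "b")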


hdat's Issues

Results reporting

We need a way to generate tables with the results of running a test suite. Perhaps the API could work something like:

hdat csv --keys=test_id,metrics.a,metrics.b my_test_suite

and would print the following to standard output:

test_id, metrics.a, metrics.b
my_test_suite/case-001, 13.1, 32.1
my_test_suite/case-002, 22.1, 40.4

runshow command failing with AttributeError

The list of casespecs is passed into the resultspec resolver, causing problems. If we want to keep supporting multiple casespecs on a command, they need to be passed in one-by-one.

Traceback (most recent call last):
  File "/Users/dillon/Applications/miniconda3/bin/hdat", line 11, in <module>
    load_entry_point('hdat', 'console_scripts', 'hdat')()
  File "/Users/dillon/innolitics/hdat/hdat/main.py", line 31, in main
    hdat_cli(sys.argv[1:], suites, golden_store, archive, git_info)
  File "/Users/dillon/innolitics/hdat/hdat/hdat_cli.py", line 76, in hdat_cli
    results = resolve_resultspecs(archive, args.casespecs)
  File "/Users/dillon/innolitics/hdat/hdat/resultspec.py", line 19, in resolve_resultspecs
    resultspec_parts = resultspec.split('/')
AttributeError: 'list' object has no attribute 'split'

Add bash auto-complete for the command line tool

I don't know exactly how bash auto-completion works; however, it would be nice if:

  1. We could auto-complete subcommands
  2. We could auto-complete result and case specs

Regarding item (2), if you type in:

$ hdat run <TAB><TAB>

It would be nice if it printed out a list of available suites, just like when you type:

$ ls <TAB><TAB>

Also, if you type:

$ hdat run SUITE_ID/<TAB><TAB>

it would be nice if it printed the available case ids. Auto completing result ids is less important.

Get the tests passing on travis

I pulled this code out of another project.

I also started getting it prepped to be pushed to pypi, but it is not quite ready.

The first step will be getting all the tests passing. Note, unless it is easy, we do NOT need to support Python 2.7, although the travis config is currently set up to test on 2.7, 3.4, 3.5, and 3.6.

You can see the travis tester runs here:

https://travis-ci.org/innolitics/hdat

Make resultspec searches more efficient

Currently, result files are stored as big pickled blobs; the pickled blobs can be quite large, depending on how much information is put into the "context" in the HDAT suite.

Because the pickle files are so big, and because we need to unpickle each file in order to resolve a resultspec, resolving resultspecs can be quite slow.

It would be nice if we could resolve resultspecs using only the filenames... this would probably be a lot faster.
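As a rough illustration of the proposed approach, and assuming a purely hypothetical naming scheme in which each result is saved as '<suite_id>_<case_id>_<result_id>.pkl', a resultspec could then be resolved with string matching alone:

import os


def resolve_resultspec_from_filenames(results_dir, resultspec):
    # Hypothetical sketch: assumes result files are named
    # '<suite_id>_<case_id>_<result_id>.pkl', so a resultspec such as
    # 'a/b' or 'a/b/c' can be matched against filenames without unpickling.
    wanted = resultspec.split('/') if resultspec else []
    matches = []
    for filename in sorted(os.listdir(results_dir)):
        name, ext = os.path.splitext(filename)
        if ext != '.pkl':
            continue
        ids = name.split('_')
        if ids[:len(wanted)] == wanted:
            matches.append(os.path.join(results_dir, filename))
    return matches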

Hook up hdat_main_script in setup.py

After running pip install hdat we should have the hdat_main_script available as hdat on the shell PATH. I am not sure how to do this, but we will probably need to update our setup.py and hook up either a "script" or an "entrypoint".

Import Locations

We should expose Suite and MetricsChecker at the package level, so that users can do:

from hdat import Suite, MetricsChecker

instead of

from hdat.suite import Suite, MetricsChecker

Also, perhaps the MetricsChecker should go in a separate module from the suite?

Add ability to tag results and specify them

While working on image processing algorithms, a frequently encountered workflow is:

  1. Run the HDAT suite
  2. Show the results
  3. Make some changes to the code
  4. Run the suite
  5. Show the results and possibly compare to the results from (1)
  6. ... repeat

It is easy to get confused regarding which run is which. It would be nice to be able to tag the results of a particular hdat run ..., and to provide a way to specify a particular tagged run in a resultspec.

I think there should be a single result with a particular tag in a given store for a particular suite.

Maybe we should look at git for inspiration on the CLI, since developers are used to git.

Perhaps we can use symbolic links to implement the tags? Then resolving a resultspec for a tagged result would amount to checking for the existence of a symbolic link with a particular name (I think this is roughly what Git does).
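A rough sketch of the symbolic-link idea; every path and naming convention here is hypothetical:

import os


def tag_result(store_dir, tag, result_filename):
    # Point (or re-point) a tag at a specific result file.
    tags_dir = os.path.join(store_dir, 'tags')
    os.makedirs(tags_dir, exist_ok=True)
    link_path = os.path.join(tags_dir, tag)
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(os.path.join(store_dir, result_filename), link_path)


def resolve_tag(store_dir, tag):
    # Resolving a tagged resultspec amounts to following the link, if present.
    link_path = os.path.join(store_dir, 'tags', tag)
    return os.path.realpath(link_path) if os.path.islink(link_path) else None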

Further simplifications to the HDAT API

We should consider even further simplifications to the HDAT API in a future pre-v1 version.

For example, we should discuss:

  1. Removing the need for an id attribute on hdat.Suite classes
  2. Whether the current hdat.MetricsChecker method names are intuitive. For example, @zach-waggoner made a good point that .close is usually the opposite of .open, so perhaps .close should be renamed to .isclose.

It would be good to discuss this with the whole team to see if there are other possible improvements.

`hdat show suite` displays results for cases that no longer exist

i.e., if an HDAT suite S has cases A, B, C

and we run hdat run S, then delete case B, then run hdat run S again, then hdat show S will still try to show B even though its case was deleted.

If a case is deleted, we should not display old results for it.

Improve missing method error messages

I forgot that we had changed Suite.verify to Suite.check.

When I ran an hdat case I saw:

Traceback (most recent call last):
  File "/Users/johndavidgiese/.pyenv/versions/3.6.3/envs/lib/python3.6/site-packages/hdat/runner.py", line 27, in run_cases
    status, comments = run_case(suite, golden_store, archive, git_info, case_id)
  File "/Users/johndavidgiese/.pyenv/versions/3.6.3/envs/lib/python3.6/site-packages/hdat/runner.py", line 50, in run_case
    passed, comments = suite.check(golden_result['metrics'], metrics)
  File "/Users/johndavidgiese/.pyenv/versions/3.6.3/envs/lib/python3.6/site-packages/hdat/suite.py", line 30, in check
    raise NotImplementedError()
NotImplementedError

It would be nice if the error messages were more helpful when a method like this is missing; e.g., new users may not fully understand what all of the methods are for.

Add ability to specify more than one result spec or case spec

For most of the hdat subcommands, it makes sense to allow users to specify multiple result specs or multiple case specs.

In particular, it would be nice if:

hdat run suite-name/case-01 suite-name/case-03 would run cases 1 and 3.

Currently, it seems that this is not supported by at least some of the subcommands.
