metadata-automation-challenge's Introduction

Metadata Automation Challenge

Using the baseline demo in RStudio

Environment setup

  1. Clone this repository

  2. Open metadata-automation-challenge.Rproj

  3. Install packages. In the RStudio console, run:

renv::restore()

This may take some time to complete - get something nice to drink :)

  4. Create the folders input, data and output in your current directory (see the snippet after this list).

  5. Create a .synapseConfig file

See this vignette about Managing Synapse Credentials to learn how to store your credentials so that you can log in without specifying your username and password each time.
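
For step 4, the folders can also be created from the RStudio console. A minimal sketch, assuming you are working from the project root:

# Create the directories expected by the demo, skipping any that already exist
for (d in c("input", "data", "output")) {
  if (!dir.exists(d)) dir.create(d)
}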

Open and run the demo notebook

You can find the baseline demo R Notebook at baseline_demo/baseline_demo.Rmd. After opening the notebook, you should be able to step through and execute each chunk in order.

Building Docker images

docker build -t metadata-baseline -f Dockerfile.baseline .
docker build -t metadata-validation -f Dockerfile.validation .
docker build -t metadata-scoring -f Dockerfile.scoring .

Running the baseline method with Docker

Here we describe how to apply the baseline method to automatically annotate a dataset (see Data Description).

  1. Create the folders input, data and output in your current directory.
  2. Place the input dataset in input, e.g. input/APOLLO-2-leaderboard.tsv
  3. Run the following command:
docker run \
  -v $(pwd)/input:/input:ro \
  -v $(pwd)/data:/data:ro \
  -v $(pwd)/output:/output \
  metadata-baseline APOLLO-2-leaderboard

where APOLLO-2-leaderboard is the name of the dataset file in the folder input (without the .tsv extension). Here $(pwd) is automatically replaced by the absolute path of the current directory.

The file /output/APOLLO-2-leaderboard-Submission.json is created upon successful completion of the above command.

Validating the submission file

The following command checks that the format of the submission file generated is valid.

$ docker run \
  -v $(pwd)/output/APOLLO-2-leaderboard-Submission.json:/input.json:ro \
  metadata-validation \
  validate-submission --json_filepath /input.json
Your JSON file is valid!

where $(pwd)/output/APOLLO-2-leaderboard-Submission.json points to the location of the submission file generated in the previous section.

Alternatively, the validation script can be run directly using Python.

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install click jsonschema

Here is the generic command to validate the format of a submission file.

$ python schema/validate.py validate-submission \
  --json_filepath yourjson.json \
  --schema_filepath schema/output-schema.json

To validate the submission file generated in the previous section, the command becomes:

$ python schema/validate.py validate-submission \
  --json_filepath output/APOLLO-2-leaderboard-Submission.json \
  --schema_filepath schema/output-schema.json
Your JSON file is valid!

Scoring the submission

Here we evaluate the performance of the submission by comparing the content of the submission file to a gold standard (e.g. manual annotations).

$ docker run \
  -v $(pwd)/output/APOLLO-2-leaderboard-Submission.json:/submission.json:ro \
  -v $(pwd)/data/Annotated-APOLLO-2-leaderboard.json:/goldstandard.json:ro \
  metadata-scoring score-submission /submission.json /goldstandard.json
1.24839015151515

metadata-automation-challenge's People

Contributors

jaeddy, thomasyu888, tschaffter, vpchung

metadata-automation-challenge's Issues

Create script/function to score entire submission file

In the scoring_demo app, there's a function (get_res_score) to compute the score for each result submitted for a column, and #2 will extend that to an aggregate score for each column. Eventually (for the scoring harness), we want to iterate over all columns and results to compute a combined score for the submission — but this should also be useful for the app, minimizing the number of runtime calculations.
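
A minimal sketch of one way such a harness-level function could be structured, assuming a per-column aggregate is already available; the columns field and the use of the mean are placeholders, not the actual implementation:

# Combine per-column scores into a single submission-level score
get_overall_score <- function(submission_data, anno_data) {
  col_scores <- purrr::map2_dbl(
    submission_data$columns, anno_data$columns,
    function(sub_col, anno_col) get_col_score(sub_col, anno_col)
  )
  mean(col_scores)
}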

Update public and private Leaderboard datasets

  • Give instructions to Denise and Gilberto so they can modify the existing file
  • Denise and Gilberto to update the file
  • Release the new public data and update the cloud environment with the private data

Double check the scores of the baseline method

@vpchung submitted the baseline method today to check that the pipeline is working well. I noticed that the performance for the dataset Outcome-Predictors is slightly different from the (same?) model submitted on Feb 24 (this actually looks like an old submission, and we have updated the scoring script since then).

  • Submit the baseline model and check that we get the expected score
  • If the submission from Feb 24 was made using the old scoring script, mark that submission as INVALID.
    • The scores of this submission are up to date

Scoring script returns NA

I'm using the latest version of the scoring script from the branch scoring-baseline-update.

Here is the output for a subset of Outcome-Predictors:

Scoring column 1...
Scoring column 2...
Scoring column 3...
Scoring column 4...
Scoring column 5...
Scoring column 6...
Scoring column 7...
Scoring column 8...
Scoring column 9...
Scoring column 10...
Scoring column 11...
Scoring column 12...
Scoring column 13...
Scoring column 14...
Scoring column 15...
Scoring column 16...
Scoring column 17...
Error: `.x` must be a list, not NULL
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/purrr_error_bad_type>
`.x` must be a list, not NULL
Backtrace:
  1. global::get_overall_score(submission_data, anno_data)
 18. purrr:::stop_bad_type(NULL, "a list", what = NULL, arg = ".x")
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/purrr_error_bad_type>
`.x` must be a list, not NULL
Backtrace:
     █
  1. ├─global::get_overall_score(submission_data, anno_data)
  2. │ └─purrr::map(...)
  3. │   └─.f(.x[[i]], ...)
  4. │     └─global::get_col_score(...)
  5. │       └─purrr::map(...)
  6. │         └─.f(.x[[i]], ...)
  7. │           └─global::get_res_score(...)
  8. │             └─global::get_result_data(anno_col_data)
  9. │               └─purrr::keep(col_results, ~.$resultNumber == res_num) %>% purrr::flatten()
 10. │                 ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 11. │                 └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 12. │                   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 13. │                     └─`_fseq`(`_lhs`)
 14. │                       └─magrittr::freduce(value, `_function_list`)
 15. │                         ├─base::withVisible(function_list[[k]](value))
 16. │                         └─function_list[[k]](value)
 17. │                           └─purrr::flatten(.)
 18. └─purrr:::stop_bad_type(NULL, "a list", what = NULL, arg = ".x")
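
The backtrace suggests that get_result_data() ends up calling purrr::flatten() on NULL when the annotation has no results entry for a column (here, column 17). A minimal defensive sketch, with a guessed signature and field names rather than the actual code:

get_result_data <- function(anno_col_data, res_num = 1) {
  col_results <- anno_col_data[["results"]]
  # If the annotation lists no results for this column, return an empty list
  # instead of letting purrr::flatten() fail on NULL
  if (is.null(col_results)) {
    return(list())
  }
  purrr::flatten(purrr::keep(col_results, ~ .$resultNumber == res_num))
}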

Make sure input and reference files are mounted to correct locations in containers

These are the main folder paths that should be available to participant tools (input, output, and data), where...

  • input: should include each of the four (currently) "testing" TSV input files and the "validation" input file for Apollo5
  • output: should be a writable directory where the output of submitted tools is saved
  • data: should include all annotated JSON files for challenge datasets, as well as any reference data files (e.g., the caDSR dump table)


Note: we may or may not want to use a separate directory (e.g., user_data) to store reference data that is generated by the participant's tool. Or... they can include that data at any location they want within their Docker image, as long as it doesn't collide with the three "protected" paths above.
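
To make the layout concrete, here is how a tool running inside a participant container might resolve these paths (a sketch; dataset_name is a placeholder variable):

# Paths as seen from inside the container, given the mounts described above
dataset_name <- "APOLLO-2-leaderboard"
input_file   <- file.path("/input", paste0(dataset_name, ".tsv"))
cadsr_dump   <- file.path("/data", "caDSR-export.tsv")
output_file  <- file.path("/output", paste0(dataset_name, "-Submission.json"))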

Add functionality to compute score for entire column

Currently, scores are computed for each row, but we need to aggregate them in some way. I have a placeholder function (get_col_score) in global.R, but haven't attempted to get it working yet. Because submissions might not necessarily include the same number of results (for different columns and for different participants), we should aggregate using something like max() or median() to avoid size bias.
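
One possible shape for the aggregation, assuming the per-result scores for a column are already collected into a numeric vector (argument and option names are placeholders):

# Aggregate per-result scores for one column, using a size-insensitive summary
get_col_score <- function(result_scores, method = c("max", "median")) {
  method <- match.arg(method)
  if (length(result_scores) == 0) return(NA_real_)
  switch(method,
    max    = max(result_scores, na.rm = TRUE),
    median = median(result_scores, na.rm = TRUE)
  )
}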

Update caDSR file

In the evaluation environment, the caDSR file is mounted with the date in its filename. This is not convenient when we need to update the file, as we are doing now. Instead, the file will be mounted with a more generic name from now on (caDSR-export.tsv).

Fix Thesaurus file extension issue

A participant reported to Gilberto that the file Thesaurus.tsv is mounted as Thesaurus.txt. I suspect the documentation may not be accurate. The participant said that this affected two of their submissions.

  • Figure out whether the issue is with the way the file is mounted or with the documentation
  • Tom reported that there may be a discrepancy in the file version. Clarify with him and Denise/Gilberto
  • Figure out how to compensate the team (CEDAR? to confirm). They may not be the only team affected, so one solution could be to look at which other teams have made submissions in this round and give additional submission credits only to them.

Update JSON Schema and set up validation

The current schema (schema/output-schema.json) needs to be updated to incorporate recent changes to the submission file structure (see scoring/test_rembrandt.json). Once the schema is good to go, we can presumably use an out-of-the-box JSON Schema validator to validate submission outputs.
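
For example, a submission could be checked against the schema from R with the jsonvalidate package (this is only an illustration; the package is not currently a project dependency, and the Python validator in schema/validate.py covers the same ground):

# Validate a submission file against the output schema; returns TRUE/FALSE
jsonvalidate::json_validate(
  json    = "output/APOLLO-2-leaderboard-Submission.json",
  schema  = "schema/output-schema.json",
  verbose = TRUE,
  engine  = "ajv"
)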

Clean up Validation dataset

Opening a ticket to track progress on the preparation and curation of the dataset for the validation phase of the Metadata DREAM Challenge.

To further validate the validation dataset, we plan to apply one of the best methods submitted in the leaderboard phase.

As a reminder, we should not post here sensitive information about the validation dataset (public repo).

Test scoring code on gold standard files

If one of the manually curated annotation files is provided as input as well as the gold standard, the resulting score should be "perfect" (5.0). If not, we need to do some debugging.
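
A quick sanity check along these lines, assuming the scoring functions are loaded and gold_data holds a parsed gold-standard annotation file (testthat is used here only for illustration):

# Scoring a gold standard against itself should give the maximum score of 5.0
testthat::expect_equal(get_overall_score(gold_data, gold_data), 5.0)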

Connect match weighting/threshold parameters to scoring function

The function in #2 should be able to take inputs for the weightings and thresholds used to calculate scores for each check — either as individual arguments or as a single list/table. When the user updates the values for these parameters in the sidenav, we want some sort of button+observe combo to update scores.
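
A rough sketch of the single-list option, with invented check names and default values purely for illustration:

# Placeholder bundle of weights/thresholds; the real check names differ
score_params <- list(
  weights    = list(de_match = 1.0, top_result = 0.75, value_match = 0.5),
  thresholds = list(string_similarity = 0.8)
)

# Apply the configured weight for one check; unknown checks get weight 0
get_check_score <- function(check_name, raw_score, params = score_params) {
  weight <- params$weights[[check_name]]
  if (is.null(weight)) weight <- 0
  weight * raw_score
}

In the app, the same list could be rebuilt from the sidenav inputs inside an observeEvent() tied to an "Update scores" button, so scores are only recomputed when the button is pressed.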

Fix scoring code

Running the scoring locally, I got the following scores:

APOLLO-2: 4.56666666666667
Outcome-Predictors: 4.70588235294118
REMBRANDT: 4.5625
ROI-Masks: 4.8125

Originally posted by @vpchung in #61 (comment)

Scoring environment updates

Code:

  • R/scoring.R (needs updated Docker image)
  • R/baseline_annotator.R (needs updated Docker image)

Inputs:

Gold standard:

Schema:

  • Verify that local schema/output-schema.json is latest version
  • Verify that local schema/validate.py is latest version
