metadata-automation-challenge's Introduction

Metadata Automation Challenge

Using the baseline demo in RStudio

Environment setup

  1. Clone this repository

  2. Open metadata-automation-challenge.Rproj

  3. Install packages. In the RStudio console, run:

renv::restore()

This may take some time to complete - get something nice to drink :)

  4. Create the folders input, data and output in your current directory (see the snippet after this list).

  5. Create a .synapseConfig file

See this vignette about Managing Synapse Credentials to learn how to store your credentials so that you can log in without specifying your username and password each time.
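
For step 4, the folders can also be created from the RStudio console. A minimal sketch, assuming you are working from the project root:

# Create the directories expected by the demo, skipping any that already exist
for (d in c("input", "data", "output")) {
  if (!dir.exists(d)) dir.create(d)
}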

Open and run the demo notebook

You can find the baseline demo R Notebook at baseline_demo/baseline_demo.Rmd. After opening the notebook, you should be able to step through and execute each chunk in order.

Building Docker images

docker build -t metadata-baseline -f Dockerfile.baseline .
docker build -t metadata-validation -f Dockerfile.validation .
docker build -t metadata-scoring -f Dockerfile.scoring .

Running the baseline method with Docker

Here we describe how to apply the baseline method to automatically annotate a dataset (see Data Description).

  1. Create the folders input, data and output in your current directory.
  2. Place the input dataset in input, e.g. input/APOLLO-2-leaderboard.tsv
  3. Run the following command:
docker run \
  -v $(pwd)/input:/input:ro \
  -v $(pwd)/data:/data:ro \
  -v $(pwd)/output:/output \
  metadata-baseline APOLLO-2-leaderboard

where APOLLO-2-leaderboard is the name of the dataset file in the folder input (without the .tsv extension). Here $(pwd) is automatically replaced by the absolute path of the current directory.

The file /output/APOLLO-2-leaderboard-Submission.json is created upon successful completion of the above command.

Validating the submission file

The following command checks that the format of the submission file generated is valid.

$ docker run \
  -v $(pwd)/output/APOLLO-2-leaderboard-Submission.json:/input.json:ro \
  metadata-validation \
  validate-submission --json_filepath /input.json
Your JSON file is valid!

where $(pwd)/output/APOLLO-2-leaderboard-Submission.json points to the location of the submission file generated in the previous section.

Alternatively, the validation script can be run directly using Python.

$ python3 -m venv venv
$ source venv/bin/activate
$ pip install click jsonschema

Here is the generic command to validate the format of a submission file.

$ python schema/validate.py validate-submission \
  --json_filepath yourjson.json \
  --schema_filepath schema/output-schema.json

To validate the submission file generated in the previous section, the command becomes:

$ python schema/validate.py validate-submission \
  --json_filepath output/APOLLO-2-leaderboard-Submission.json \
  --schema_filepath schema/output-schema.json
Your JSON file is valid!

Scoring the submission

Here we evaluate the performance of the submission by comparing the content of the submission file to a gold standard (e.g. manual annotations).

$ docker run \
  -v $(pwd)/output/APOLLO-2-leaderboard-Submission.json:/submission.json:ro \
  -v $(pwd)/data/Annotated-APOLLO-2-leaderboard.json:/goldstandard.json:ro \
  metadata-scoring score-submission /submission.json /goldstandard.json
1.24839015151515

metadata-automation-challenge's People

Contributors

jaeddy, thomasyu888, tschaffter, vpchung

metadata-automation-challenge's Issues

Create script/function to score entire submission file

In the scoring_demo app, there's a function (get_res_score) to compute the score for each result submitted for a column, and #2 will extend that to an aggregate score for each column. Eventually (for the scoring harness), we want to iterate over all columns and results to compute a combined score for the submission — but this should also be useful for the app, minimizing the number of runtime calculations.
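
A minimal sketch of one way such a harness-level function could be structured, assuming a per-column aggregate is already available; the columns field and the use of the mean are placeholders, not the actual implementation:

# Combine per-column scores into a single submission-level score
get_overall_score <- function(submission_data, anno_data) {
  col_scores <- purrr::map2_dbl(
    submission_data$columns, anno_data$columns,
    function(sub_col, anno_col) get_col_score(sub_col, anno_col)
  )
  mean(col_scores)
}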

Update public and private Leaderboard datasets

  • Give instructions to Denise and Gilberto so they can modify the existing file
  • Denise and Gilberto to update the file
  • Release the new public data and update the cloud environment with the private data

Double check the scores of the baseline method

@vpchung submitted the baseline method today to check that the pipeline is working well. I noticed that the performance for the dataset Outcome-Predictors is slightly different from the (same?) model submitted on Feb 24 (this actually looks like an old submission, and we have updated the scoring script since then).

  • Submit the baseline model and check that we get the expected score
  • If the submission from Feb 24 was made using the old scoring script, mark that submission as INVALID.
    • The scores of this submission are up to date

Scoring script returns NA

I'm using the latest version of the scoring script from the branch scoring-baseline-update.

Here is the output for a subset of Outcome-Predictors:

Scoring column 1...
Scoring column 2...
Scoring column 3...
Scoring column 4...
Scoring column 5...
Scoring column 6...
Scoring column 7...
Scoring column 8...
Scoring column 9...
Scoring column 10...
Scoring column 11...
Scoring column 12...
Scoring column 13...
Scoring column 14...
Scoring column 15...
Scoring column 16...
Scoring column 17...
Error: `.x` must be a list, not NULL
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/purrr_error_bad_type>
`.x` must be a list, not NULL
Backtrace:
  1. global::get_overall_score(submission_data, anno_data)
 18. purrr:::stop_bad_type(NULL, "a list", what = NULL, arg = ".x")
Run `rlang::last_trace()` to see the full context.
> rlang::last_trace()
<error/purrr_error_bad_type>
`.x` must be a list, not NULL
Backtrace:
     █
  1. ├─global::get_overall_score(submission_data, anno_data)
  2. │ └─purrr::map(...)
  3. │   └─.f(.x[[i]], ...)
  4. │     └─global::get_col_score(...)
  5. │       └─purrr::map(...)
  6. │         └─.f(.x[[i]], ...)
  7. │           └─global::get_res_score(...)
  8. │             └─global::get_result_data(anno_col_data)
  9. │               └─purrr::keep(col_results, ~.$resultNumber == res_num) %>% purrr::flatten()
 10. │                 ├─base::withVisible(eval(quote(`_fseq`(`_lhs`)), env, env))
 11. │                 └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 12. │                   └─base::eval(quote(`_fseq`(`_lhs`)), env, env)
 13. │                     └─`_fseq`(`_lhs`)
 14. │                       └─magrittr::freduce(value, `_function_list`)
 15. │                         ├─base::withVisible(function_list[[k]](value))
 16. │                         └─function_list[[k]](value)
 17. │                           └─purrr::flatten(.)
 18. └─purrr:::stop_bad_type(NULL, "a list", what = NULL, arg = ".x")
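
The backtrace suggests that get_result_data() ends up calling purrr::flatten() on NULL when the annotation has no results entry for a column (here, column 17). A minimal defensive sketch, with a guessed signature and field names rather than the actual code:

get_result_data <- function(anno_col_data, res_num = 1) {
  col_results <- anno_col_data[["results"]]
  # If the annotation lists no results for this column, return an empty list
  # instead of letting purrr::flatten() fail on NULL
  if (is.null(col_results)) {
    return(list())
  }
  purrr::flatten(purrr::keep(col_results, ~ .$resultNumber == res_num))
}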

Make sure input and reference files are mounted to correct locations in containers

These are the main folder paths that should be available to participant tools (input, output, and data), where...

  • input: should include each of the four (currently) "testing" TSV input files and the "validation" input file for Apollo5
  • output: should be a writable directory where the output of submitted tools is saved
  • data: should include all annotated JSON files for challenge datasets, as well as any reference data files (e.g., the caDSR dump table)


Note: we may or may not want to use a separate directory (e.g., user_data) to store reference data that is generated by the participant's tool. Or... they can include that data at any location they want within their Docker image, as long as it doesn't collide with the three "protected" paths above.
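
To make the layout concrete, here is how a tool running inside a participant container might resolve these paths (a sketch; dataset_name is a placeholder variable):

# Paths as seen from inside the container, given the mounts described above
dataset_name <- "APOLLO-2-leaderboard"
input_file   <- file.path("/input", paste0(dataset_name, ".tsv"))
cadsr_dump   <- file.path("/data", "caDSR-export.tsv")
output_file  <- file.path("/output", paste0(dataset_name, "-Submission.json"))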

Add functionality to compute score for entire column

Currently, scores are computed for each row, but we need to aggregate them in some way. I have a placeholder function (get_col_score) in global.R, but haven't attempted to get it working yet. Because submissions might not necessarily include the same number of results (for different columns and for different participants), we should aggregate using something like max() or median() to avoid size bias.
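
One possible shape for the aggregation, assuming the per-result scores for a column are already collected into a numeric vector (argument and option names are placeholders):

# Aggregate per-result scores for one column, using a size-insensitive summary
get_col_score <- function(result_scores, method = c("max", "median")) {
  method <- match.arg(method)
  if (length(result_scores) == 0) return(NA_real_)
  switch(method,
    max    = max(result_scores, na.rm = TRUE),
    median = median(result_scores, na.rm = TRUE)
  )
}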

Update caDSR file

In the evaluation environment, the caDSR file is mounted with the date in its filename. This is not convenient when we need to update the file, as we are doing now. Instead, the file will be mounted with a more generic name from now on (caDSR-export.tsv).

Fix Thesaurus file extension issue

A participant reported to Gilberto that the file Thesaurus.tsv is mounted as Thesaurus.txt. I suspect the documentation may not be accurate. The participant said that this affected two of their submissions.

  • Figure out whether the issue is with the way the file is mounted or with the documentation
  • Tom reported that there may be a discrepancy in the file version. Clarify with him and Denise/Gilberto
  • Figure out how to compensate the team (CEDAR? to confirm). They may not be the only team affected, so one solution could be to look at which other teams have made submissions in this round and give additional submission credits only to them.

Update JSON Schema and set up validation

The current schema (schema/output-schema.json) needs to be updated to incorporate recent changes to the submission file structure (see scoring/test_rembrandt.json). Once the schema is good to go, we can presumably use an out-of-the-box JSON Schema validator to validate submission outputs.
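
For example, a submission could be checked against the schema from R with the jsonvalidate package (this is only an illustration; the package is not currently a project dependency, and the Python validator in schema/validate.py covers the same ground):

# Validate a submission file against the output schema; returns TRUE/FALSE
jsonvalidate::json_validate(
  json    = "output/APOLLO-2-leaderboard-Submission.json",
  schema  = "schema/output-schema.json",
  verbose = TRUE,
  engine  = "ajv"
)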

Clean up Validation dataset

Opening a ticket to track progress on the preparation and curation of the dataset for the validation phase of the Metadata DREAM Challenge.

To further validate the validation dataset, we plan to apply one of the best methods submitted in the leaderboard phase.

As a reminder, we should not post here sensitive information about the validation dataset (public repo).

Test scoring code on gold standard files

If one of the manually curated annotation files is provided as input as well as the gold standard, the resulting score should be "perfect" (5.0). If not, we need to do some debugging.
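
A quick sanity check along these lines, assuming the scoring functions are loaded and gold_data holds a parsed gold-standard annotation file (testthat is used here only for illustration):

# Scoring a gold standard against itself should give the maximum score of 5.0
testthat::expect_equal(get_overall_score(gold_data, gold_data), 5.0)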

Connect match weighting/threshold parameters to scoring function

The function in #2 should be able to take inputs for the weightings and thresholds used to calculate scores for each check — either as individual arguments or as a single list/table. When the user updates the values for these parameters in the sidenav, we want some sort of button+observe combo to update scores.
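
A rough sketch of the single-list option, with invented check names and default values purely for illustration:

# Placeholder bundle of weights/thresholds; the real check names differ
score_params <- list(
  weights    = list(de_match = 1.0, top_result = 0.75, value_match = 0.5),
  thresholds = list(string_similarity = 0.8)
)

# Apply the configured weight for one check; unknown checks get weight 0
get_check_score <- function(check_name, raw_score, params = score_params) {
  weight <- params$weights[[check_name]]
  if (is.null(weight)) weight <- 0
  weight * raw_score
}

In the app, the same list could be rebuilt from the sidenav inputs inside an observeEvent() tied to an "Update scores" button, so scores are only recomputed when the button is pressed.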

Fix scoring code

Running the scoring locally, I got the following scores:

APOLLO-2: 4.56666666666667
Outcome-Predictors: 4.70588235294118
REMBRANDT: 4.5625
ROI-Masks: 4.8125

Originally posted by @vpchung in #61 (comment)

Scoring environment updates

Code:

  • R/scoring.R (needs updated Docker image)
  • R/baseline_annotator.R (needs updated Docker image)

Inputs:

Gold standard:

Schema:

  • Verify that local schema/output-schema.json is latest version
  • Verify that local schema/validate.py is latest version
