dream-challenge's People

Contributors

cgcook, eichmann, jguinney, jmcmurry

dream-challenge's Issues

Problem with permissions of UW logs

When the pipeline at UW writes its generated log file to Synapse, the permissions are set so that only the ehrdreamservice account can access it. We need to change this so that every log file written to Synapse can be read by the submitter and by the EHR DREAM admin team.
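A minimal sketch of one possible fix, under stated assumptions: after the pipeline uploads a log file, it grants read and download access to the submitter and to the admin team. The sketch assumes the Synapse Python client's `setPermissions(entity, principalId, accessType)` method. The helper name `share_log` and all IDs are hypothetical placeholders, not values from this project.

```python
def share_log(syn, log_entity_id, submitter_id, admin_team_id):
    """Grant read/download access on an uploaded log file.

    `syn` is assumed to be a logged-in synapseclient.Synapse instance
    (or any object exposing a compatible setPermissions method).
    All IDs are supplied by the caller; none are real project values.
    """
    access = ["READ", "DOWNLOAD"]
    for principal_id in (submitter_id, admin_team_id):
        # One call per principal: the submitter, then the admin team.
        syn.setPermissions(log_entity_id, principalId=principal_id, accessType=access)
    return access
```

The pipeline would call this once per uploaded log, right after the upload step, so that access does not depend on a manual fix.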

Draft the challenge timeline

Background: DREAM challenges typically provide a figure representing the timeline of the challenge. This figure, which is often used in presentations too, should give participants the start and end dates of the different phases of the challenge (open phase, leaderboard rounds, validation phase). Here is an example of a timeline:

[Image: example of a challenge timeline]

Task: Draft the challenge timeline, which will be displayed on the EHR Challenge wiki.

Identify the Challenge Questions

  • Describe the Challenge Question(s) that participants will address
    • Include a brief description of the input data. Another page is used to provide detailed information about the data.
    • Describe the format of the predictions that models must generate

Please add content to the wiki page 2.1 - Challenge Questions

Instantiate UW server replica and give access to Sage IT Engineers

There are currently technical issues that prevent UW engineers from running the challenge infrastructure developed by Sage on their server. To speed up the resolution of these issues, it was decided yesterday that @jprosser will instantiate a second server that has the same configuration (at least OS and security-wise) as the server that will be used for the challenge. This replica will hold no data. @jprosser will then give the Sage team (Bruce, @thomasyu888 and myself) ssh access to this second server so that we can troubleshoot the errors more effectively.

Come up with 4-5 suggestions of names for this challenge

On March 18, Justin Guinney asked us to come up with 4-5 suggestions of names for this challenge.

Notes:

  • The "Patient mortality Challenge" name used in some documents is too specific as this challenge will address additional questions (?)

Timed out docker containers are not being stopped

When containers run for more than the allotted time (10 hrs.), the Toil workflow hook that is running the container is stopped but the submitted docker container is not. The container continues until the administrator manually kills it or it stops on its own.
This could be a problem if the workflow hook assumes that the still-running container has been stopped and pulls in a new submission, leading to a memory overflow.
The other issue is that the workflow hook is not being stopped gracefully, so logs are not being saved after the time quota.
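A minimal sketch of the behaviour we want, under stated assumptions: the container is started with a known name, and when the quota expires the logs are collected and the container itself is stopped, not only the process that launched it. The helper names `docker_commands` and `run_with_quota` are hypothetical; the sketch relies only on the standard `docker run --name`, `docker logs` and `docker stop` CLI commands and is not the workflow hook's actual implementation.

```python
import subprocess


def docker_commands(image, name):
    """Build the docker CLI calls for one named submission container."""
    return {
        "run": ["docker", "run", "--name", name, image],
        "logs": ["docker", "logs", name],
        "stop": ["docker", "stop", name],
    }


def run_with_quota(cmds, quota_s):
    """Run cmds["run"]; on timeout, save logs and stop the container.

    Returns (status, logs) where status is "finished" or "timed_out".
    """
    proc = subprocess.Popen(cmds["run"])
    try:
        proc.wait(timeout=quota_s)
        status = "finished"
    except subprocess.TimeoutExpired:
        status = "timed_out"
    # Collect logs in both cases so nothing is lost after the quota.
    logs = subprocess.run(cmds["logs"], capture_output=True, text=True).stdout
    if status == "timed_out":
        # Stop the container itself, then reap the client process.
        subprocess.run(cmds["stop"], check=False)
        proc.kill()
        proc.wait()
    return status, logs
```

Because the container has an explicit name, the stop command can always target it, even if the process that started it has already been killed.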

Identify how best to use the WashU synthetic dataset

These are "meaningful" synthetic data that we could use for a sub-challenge, for example.

We have a call with Randi on June 18 between 8 and 9am PDT.

  • Contact Randi for additional information about the dataset (number of positives, etc.)
  • Depending on the number, prepare a proposal on how we recommend using the data

Deploy and test challenge infrastructure at NCATS

Justin mentioned that NCATS will host the synthetic data (Synpuf and/or Wash U) while the challenge data will be hosted at UW.

Notes:

  • If we want to automatically run the models on Synpuf data before running on the real data, then a copy of Synpuf data must be hosted on UW cloud.
    • Is it really beneficial to run models on Synpuf data before running them on the challenge data? The answer depends on the amount of time we expect a model to take to make predictions for all the subjects in the evaluation set.
    • If the purpose of NCATS is to host only the Synpuf data, why not use only UW cloud? (I understand that in the long run we want to make our infrastructure available on NCATS so that anyone could host a challenge).

Depending on the above answers:

  • Contact Usman Sheikh and Raju Hemadri (NCATS) to initiate the deployment of the infrastructure at NCATS
  • Connect to NCATS (AWS?) resources allocated to this project
  • Deploy the Synapse Workflow Hook + Synpuf data
  • Test the infrastructure

Provide challenge data to Sage to enable development and testing of the submission infrastructure

Background: Sage is taking care of developing the IT infrastructure responsible for:

  1. Pulling participant submissions from Synapse (Docker images)
  2. Running the submissions (training, inference) and pushing the results to Synapse

Task: Provide Sage with the following components to enable the development and testing of the IT infrastructure for the EHR Challenge:

  • Data (synthetic training and evaluation)
  • Model (docker image, description of input and output)
  • Gold standard
    • synpuf_clean/evaluate/evaluation_patient_status.csv
  • Scoring script (could be a dummy script at first that will be replaced later)

According to Tom, we could deploy and test an initial version of the IT infrastructure on Sage AWS instances in 1-2 days once we have received the above components.
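As an illustration of what a dummy scoring script could look like, here is a minimal sketch that reads a gold standard file and a prediction file and returns an area under the ROC curve. Everything specific in it is an assumption: the metric itself, the column names `person_id`, `status` and `score`, and the helper names are hypothetical and would need to match the real files and the metric finally chosen.

```python
import csv
import io


def read_column(handle, key, value):
    """Map each row's `key` column to its `value` column as a float."""
    return {row[key]: float(row[value]) for row in csv.DictReader(handle)}


def auroc(gold, pred):
    """Area under the ROC curve via pairwise comparison (ties count 0.5).

    `gold` maps an ID to 0 or 1; `pred` maps the same IDs to a score.
    A missing prediction raises KeyError, which fails the submission.
    """
    pos = [pred[k] for k, y in gold.items() if y == 1]
    neg = [pred[k] for k, y in gold.items() if y == 0]
    if not pos or not neg:
        raise ValueError("gold standard needs both classes")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Stand-in file contents; real inputs would be opened from disk.
GOLD_CSV = "person_id,status\n1,1\n2,0\n3,1\n4,0\n"
PRED_CSV = "person_id,score\n1,0.9\n2,0.1\n3,0.4\n4,0.6\n"

gold = read_column(io.StringIO(GOLD_CSV), "person_id", "status")
pred = read_column(io.StringIO(PRED_CSV), "person_id", "score")
score = auroc(gold, pred)
```

The pairwise computation is quadratic in the number of subjects, which is acceptable for a placeholder but would be replaced by the final scoring script.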

Quota time leads models sent on the fast lane to fail

On Monday 14, @thomasyu888 reported that models are failing on the NCATS server because they hit the maximum runtime. I believe that the maximum runtime is currently set to 1 hour.

Resolution:

  • @thomasyu888 to confirm that the maximum runtime is set to 1h
  • @trberg to report on the status of the Synpuf data update on the NCATS server and on how to size the dataset so that methods can reasonably run on it in less than 1h.

Test challenge infrastructure at UW

@thomasyu888 tested the challenge infrastructure on the Sage network using a part of the Synpuf data and one of Yao's models. The goal of this task is for Tim to confirm that he can run the infrastructure on the UW network using the instructions provided by Tim. Once done, the next step will be to have the infrastructure deployed by NCATS.

Prepare 10-min presentation for DREAM Directors (May 28)

On May 28, I'll be giving a 10-min overview of the EHR DREAM Challenge during the DREAM Directors meeting.

Here is the information that I plan to include:

  • Identity of the organizers and partner organizations
  • Scientific questions
  • Timeline
  • Data
    • Source of data: synpuf, UW data
    • Description of the population and curation protocol
  • Scoring metrics
  • Submission & IT infrastructure
  • Baseline method
    • High-level description
    • Performance on UW data?
  • Where we stand regarding the timeline / remaining tasks

Note: Whenever relevant, include links to resources: Synapse/GitHub project, code of the baseline method, etc.

@trberg @yy6linda Can you please point me to existing presentations on the EHR DREAM Challenge? If not done yet, we need to add them to a Google folder so that we can easily refer to and reuse them.

Submit revised IRB request

The original IRB was narrow in scope, limiting who could submit models. The new IRB will expand the challenge to enable public submissions.

Build UW infrastructure

Pilot a pipeline that pulls docker submissions from Synapse into UW RIT environments that will train and test on UW OMOP data.

Create challenge advertisement material

Background: DREAM Challenges usually have a graphic banner that advertises the challenge on the home page of the wiki. A placeholder for this banner is actually included in the DREAM Challenge Wiki Template which has been used for the EHR Challenge Staging project. The banner usually includes: 1) the full name of the challenge, 2) one short sentence that describes what the challenge is about, 3) an illustration and 4) the logos of participating organizations.

Task:

  1. Provide name of the challenge (tracked here: #10 )
  2. Provide list of organizing institutions and partners (tracked here: #12 )
    • Collect logos
  3. Provide illustration or ideas of illustrations (ideas tracked in this ticket)
  4. Make initial version of the banner (Thomas or X)

Improve workflow hook to enable horizontal deployment

Tess Thyer at Sage is working on improving the Synapse Workflow Hook (used in this challenge) to support, among other things, horizontal deployment of Toil engines to process more submissions when under heavy load.

While horizontal deployment is not a requirement for this challenge, it could be a nice feature to have. It will also be a requirement for the infrastructure that we will deploy at NCATS in the long run.

Create and release Synthetic Data Version 3

We have had a couple of instances of models passing the fast lane but failing on the real UW data. Most of these failures seem to be caused by small discrepancies between the real data and the synthetic data, mainly the presence of null values in the real data and the absence of null values in the synthetic data. We will attempt to address these issues in the next version of the synthetic data.
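One way to close this gap is to blank out a fraction of the values in the synthetic tables, so that models meet missing values on the fast lane. The sketch below is a minimal illustration under stated assumptions: rows are dictionaries as produced by `csv.DictReader`, an empty string stands in for a null, and the function name, column names and rate are hypothetical rather than taken from the actual data release.

```python
import random


def inject_nulls(rows, columns, rate, seed=0):
    """Return copies of `rows` with values in `columns` blanked at roughly `rate`.

    The input rows are left untouched; a fixed seed keeps the
    synthetic release reproducible.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        new_row = dict(row)
        for col in columns:
            if rng.random() < rate:
                new_row[col] = ""  # empty CSV field stands in for NULL
        out.append(new_row)
    return out
```

The rate for each column would ideally be estimated from the real data so that the synthetic tables show a similar pattern of missing values.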

Participants are unable to see their dashboard

The dashboard shows an error similar to "Index: -1, Size: 33".

According to Tim, the dashboard is displayed only if there is at least one successful submission. Users with only failed submissions cannot see their dashboard.

Fill in challenge pre-registration page on Synapse

Background: The pre-registration page for the EHR Challenge is available here: https://www.synapse.org/#!Synapse:syn18405991/wiki/589657. The goal of this page is to provide information about the challenge (Overview, Challenge Organizers, Data Contributors, Journal Partners, Funders and Sponsors). This page also enables Synapse users to pre-register in order to receive news about the challenge launch. The pre-registration page is currently visible only to organizers.

Task: Fill in the different sections of the pre-registration page before making it publicly visible.
