data2health / dream-challenge
EHR DREAM Challenge
From Julie:
Hi Thomas
We are now aiming for end of July to align with an announcement for a challenge for the Allen Institute. Just plan to have something to me by mid July.
Thanks
When the pipeline at UW writes its generated log file to Synapse, the permissions are set so that only the ehrdreamservice account can access it. We need to change this so that all log files written to Synapse can be read by the submitter and by the EHR DREAM admin team.
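A minimal sketch of how the pipeline could share each log file right after uploading it, assuming the Synapse Python client is used and that `syn` is an already logged-in `synapseclient.Synapse` object. The helper name `share_log_file` and the IDs passed to it are hypothetical and are not part of the existing pipeline.

```python
def share_log_file(syn, log_entity_id, principal_ids, access=("READ", "DOWNLOAD")):
    """Grant read/download access on one Synapse log file to each principal.

    Assumptions: `syn` is a logged-in synapseclient.Synapse object and
    `principal_ids` holds the Synapse user or team IDs to share with
    (for example the submitter and the admin team).
    """
    for principal_id in principal_ids:
        # setPermissions is the Synapse Python client call for editing an ACL.
        syn.setPermissions(
            log_entity_id,
            principalId=principal_id,
            accessType=list(access),
        )
    return len(principal_ids)
```

The pipeline would call this once per uploaded log, passing the submitter's ID and the ID of the EHR DREAM admin team.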
Jupyter notebook describing the curation of the data: https://github.com/data2health/DREAM-Challenge/tree/master/documentation
Background: DREAM challenges typically provide a figure representing the timeline of the challenge. This figure, which is often used in presentations too, should give participants the start and end dates of the different phases of the challenge (open phase, leaderboard rounds, validation phase). Here is an example of a timeline:
Task: Draft the challenge timeline, which will be displayed on the EHR Challenge wiki.
On Monday 14, @thomasyu888 reported that some issues on NCATS cause submissions to stop randomly. Tom is restarting the submissions, but we need a better fix.
@thomasyu888 Can you provide a better description of the issue?
We needed to adjust the gold standard benchmarks to calculate patients' time to death using a fixed 180 days rather than the 6-month function. This makes the prediction window more consistent.
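The difference between the two definitions can be sketched as follows. This is an illustrative example, not the benchmark code: the choice of reference date and the inclusive bounds are assumptions.

```python
from datetime import date

PREDICTION_WINDOW_DAYS = 180  # fixed-length window described above


def days_to_death(reference_date, death_date):
    """Days from the reference date to death, or None if no death is recorded."""
    if death_date is None:
        return None
    return (death_date - reference_date).days


def died_within_window(reference_date, death_date, window_days=PREDICTION_WINDOW_DAYS):
    """True if death falls 0..window_days days after the reference date.

    Assumption: both bounds are inclusive; the real benchmark may differ.
    """
    delta = days_to_death(reference_date, death_date)
    return delta is not None and 0 <= delta <= window_days


# Six calendar months cover a different number of days depending on the start date.
print((date(2019, 8, 1) - date(2019, 2, 1)).days)  # Feb 1 to Aug 1
print((date(2019, 9, 1) - date(2019, 3, 1)).days)  # Mar 1 to Sep 1
```

The two printed values are 181 and 184, so a six-month window varies by several days across patients, whereas the 180-day window is the same length for everyone.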
Please add content to the wiki page 2.1 - Challenge Questions
Waiting on #33 to get access to the UW server on which we will troubleshoot the issues that the challenge infrastructure has due to the specific configuration of the UW servers.
@thomasyu888 Is it correct that you are doing some work on tidying the submission IDs? Can you please keep track of what you are doing here and close this ticket once the task is completed?
We are working with the Mt. Sinai site to onboard them into the DREAM Challenge evaluation network.
There are currently technical issues that prevent UW engineers from running the challenge infrastructure developed by Sage on their server. To speed up the resolution, it was decided yesterday that @jprosser will instantiate a second server with the same configuration (at least OS and security wise) as the server that will be used for the challenge. This replica will hold no data. @jprosser will then give the Sage team (Bruce, @thomasyu888 and myself) ssh access to this second server so that we can troubleshoot the errors more effectively.
On March 18, Justin Guinney asked us to come up with 4-5 suggestions of names for this challenge.
Notes:
We have created a list of concepts that appear in more than 100 patients in the UW data. We are currently reviewing this list and plan to release it shortly.
Currently, the codes are uploaded here.
@yy6linda and @tschaffter can you review this data?
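A minimal sketch of the kind of filter that yields such a list: count the distinct patients per concept and keep the concepts above the threshold. The input shape, `(person_id, concept_id)` pairs, is an assumption loosely modeled on OMOP column names and is not taken from the actual curation code.

```python
from collections import defaultdict


def frequent_concepts(records, min_patients=100):
    """Map concept_id to its distinct patient count, keeping counts > min_patients.

    `records` is an iterable of (person_id, concept_id) pairs (assumed shape).
    """
    patients_by_concept = defaultdict(set)
    for person_id, concept_id in records:
        # A set ensures each patient is counted once per concept.
        patients_by_concept[concept_id].add(person_id)
    return {
        concept_id: len(patients)
        for concept_id, patients in patients_by_concept.items()
        if len(patients) > min_patients
    }
```

Counting distinct patients rather than rows matters here, because a single patient can carry the same concept many times.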
When containers run for more than the allotted time (10 hours), the Toil workflow hook that is running the container is stopped, but the submitted Docker container is not. The container continues until the administrator manually kills it or it stops on its own.
This could be a problem if the workflow hook assumes that the still-running container has been stopped and pulls in a new submission, leading to a memory overflow.
The other issue is that the workflow hook is not being stopped gracefully, so logs are not being saved once the time quota is reached.
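One possible way to handle both problems is to let the supervising process own the timeout, stop the workload itself, and only then return, so that the log is kept. The sketch below uses a plain child process as a stand-in for the submitted container. That substitution is an assumption made to keep the example runnable; in the real pipeline the stop step would presumably be a `docker stop` on the container. `run_with_quota` is a hypothetical helper, not part of the Synapse Workflow Hook.

```python
import subprocess


def run_with_quota(cmd, quota_seconds, log_path, grace_seconds=5):
    """Run cmd under a time quota; stop it on timeout and keep its log.

    Returns "finished" or "timed_out". The child process stands in for the
    submitted container (assumption made for illustration).
    """
    with open(log_path, "w") as log:
        # The child writes straight into the log file, so output survives a timeout.
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
        try:
            proc.wait(timeout=quota_seconds)
            return "finished"
        except subprocess.TimeoutExpired:
            proc.terminate()  # ask the workload to stop
            try:
                proc.wait(timeout=grace_seconds)
            except subprocess.TimeoutExpired:
                proc.kill()  # force-stop if it ignores the request
                proc.wait()
            return "timed_out"
```

Because the function returns only after the child has exited, the caller can safely pull the next submission without two workloads competing for memory.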
These are "meaningful" synthetic data that we could use for a sub-challenge, for example.
We have a call with Randi on June 18 between 8 and 9am PDT.
Justin mentioned that NCATS will host the synthetic data (Synpuf and/or Wash U) while the challenge data will be hosted at UW.
Notes:
Depending on the above answers:
Background: Sage is taking care of developing the IT infrastructure responsible for:
Task: Provide Sage with the following components to enable the development and testing of the IT infrastructure for the EHR Challenge:
According to Tom, we could deploy and test an initial version of the IT infrastructure on Sage AWS instances in 1-2 days once we have received the above components.
On Monday 14, @thomasyu888 reported that models are failing on the NCATS server because they hit the maximum runtime. I believe that the maximum runtime is currently set to 1 hour.
Resolution:
@thomasyu888 tested the challenge infrastructure on the Sage network using part of the Synpuf data and one of Yao's models. The goal of this task is for Tim to confirm that he can run the infrastructure on the UW network using the instructions provided by Tom. Once done, the next step will be to have the infrastructure deployed by NCATS.
@trberg Can you please describe here what the participant who is requesting additional data would like us to provide? Can you also describe the solution that you had in mind on Monday for us to review? Thanks!
On May 28, I'll be giving a 10-min overview of the EHR DREAM Challenge during the DREAM Directors meeting.
Here is the information that I plan to include:
Note: Whenever relevant, include links to resources: Synapse/GitHub project, code of the baseline method, etc.
@trberg @yy6linda Can you please point me to existing presentations on the EHR DREAM Challenge? If this has not been done yet, we need to add them to a shared Google folder so that we can easily refer to and reuse them.
The goal of this task is to describe how to dockerize the baseline model developed by Yao and run it locally.
Baseline codebase: https://github.com/yy6linda/mortality_prediction_docker_model
Documentation: https://www.synapse.org/#!Synapse:syn18405992/wiki/589659
The original IRB was narrow in scope, limiting who could submit models. The new IRB will expand the challenge to enable public submissions.
Pilot a pipeline that pulls Docker submissions from Synapse into UW RIT environments, where they will be trained and tested on UW OMOP data.
Release Webinar #2 material
Background: DREAM Challenges usually have a graphic banner that advertises the challenge on the home page of the wiki. A placeholder for this banner is actually included in the DREAM Challenge Wiki Template which has been used for the EHR Challenge Staging project. The banner usually includes: 1) the full name of the challenge, 2) one short sentence that describes what the challenge is about, 3) an illustration and 4) the logos of participating organizations.
Task:
Send SNOMED use license language and finalize the MoU with Sean and Julie
E.g. cancer, coronary heart disease, type-II diabetes, chronic obstructive pulmonary disease.
@yy6linda Could you have a look at the number of patients who died for each of these diseases?
Email address: [email protected]
Tess Thyer at Sage is working on improving the Synapse Workflow Hook (used in this challenge) to support, among other things, horizontal deployment of Toil engines to process more submissions under heavy load.
While horizontal deployment is not a requirement for this challenge, it would be a nice feature to have. It will also be a requirement for the infrastructure that we will deploy at NCATS in the long run.
We have had a couple of instances of models passing the fast lane but failing on the real UW data. Most of these failures seem to be caused by small discrepancies between the real data and the synthetic data, mainly the presence of null values in the real data and their absence from the synthetic data. We will attempt to address these issues in the next version of the synthetic data.
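One way to narrow this gap would be to inject missing values into the synthetic records so that fast-lane runs exercise the same null-handling paths as the real data. The sketch below is only an illustration of that idea. The function name, the row format (a list of dicts), and the null rate are hypothetical; a realistic rate per column would have to be derived from the real data.

```python
import random


def inject_nulls(rows, columns, null_rate=0.1, seed=0):
    """Return copies of `rows` in which values of `columns` are randomly set to None.

    `rows` is a list of dicts (assumed format); the input is left unmodified.
    """
    rng = random.Random(seed)  # seeded so the synthetic data stay reproducible
    result = []
    for row in rows:
        new_row = dict(row)
        for column in columns:
            if column in new_row and rng.random() < null_rate:
                new_row[column] = None
        result.append(new_row)
    return result
```

A model that passes the fast lane on data prepared this way has at least been run against missing values before it reaches the real UW data.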
The dashboard shows an error similar to: Index: -1, Size: 33.
According to Tim, the dashboard is displayed only if there is at least one successful submission. Users with only failed submissions cannot see their dashboard.
We are working with Randi to onboard the WashU site into the DREAM Challenge evaluation network.
Background: The pre-registration page for the EHR Challenge is available here: https://www.synapse.org/#!Synapse:syn18405991/wiki/589657. The goal of this page is to provide information about the challenge (Overview, Challenge Organizers, Data Contributors, Journal Partners, Funders and Sponsors). This page also enables Synapse users to pre-register in order to receive news about the challenge launch. The pre-registration page is currently visible only to organizers.
Task: Fill in the different sections of the pre-registration page before making it publicly visible.