healthylaife / mimic-iv-data-pipeline
A customizable pipeline for data extraction from MIMIC-IV.
License: MIT License
Please refer to the UserInterface file to understand this.
We need to create a function such as extract() which takes the following inputs:
The output is the data files, stored according to the chosen options, and a summary of the data.
I noticed that for specific diseases your project assigns particular values to icd_code (for CKD, for example, the icd_code is N18). Do you provide the exact item IDs for each specific disease? Where can I find them? Alternatively, could you tell me how you define kidney disease? Thanks.
I run into a problem whenever I try to preprocess non-ICU data:
ValueError: could not broadcast input array from shape (36,13) into shape (35,13)
Hi there,
Thanks for the amazing work!
When selecting the prediction window using the jupyter notebook (mainPipeline.ipynb), it seems that the corresponding code should be:
if (radio_input6.value=='Custom'):
    predW=int(text3.value)
else:
    predW=int(radio_input6.value[0].strip())
for 'section 7. Time-Series Representation', cell 2. The current code seems to mix up the inputs of the prediction window and the bucket.
Best wishes
For procedures data in data\long_format\proc\long_proc_icd10_norm.csv.gz:
Add a time column giving the hour of the admission at which each procedure happened.
For example, if a procedure was performed 3 hours after the admit time, then its time column should say 3, and if it was performed 30 hours after the admit time, then its time column should say 30.
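A minimal sketch of the requested time column, using only the Python standard library. The function name hours_since_admit and the sample timestamps are hypothetical; in the pipeline the same arithmetic would presumably be applied to the admission and procedure timestamp columns of the long-format file.

```python
from datetime import datetime

def hours_since_admit(admit_time: datetime, event_time: datetime) -> int:
    """Whole hours elapsed between admission and the event, rounded down."""
    elapsed = event_time - admit_time
    return int(elapsed.total_seconds() // 3600)

# Made-up admission time, used only for illustration.
admit = datetime(2150, 1, 1, 8, 0)
print(hours_since_admit(admit, datetime(2150, 1, 1, 11, 0)))  # 3
print(hours_since_admit(admit, datetime(2150, 1, 2, 14, 0)))  # 30
```

Rounding down is a design choice made here: an event 3 hours 59 minutes after admission still falls in hour 3.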
Dear all,
When I ran the following code from 'mainPipeline.ipynb', I got an error in 'preprocess_features_icu'. It seems 'left_thresh' was not defined. Is there any requirement for 'left_thresh'?
if data_icu:
    if diag_flag:
        group_diag=radio_input4.value
    preprocess_features_icu(cohort_output, diag_flag, group_diag,False,False,False,0)
else:
    if diag_flag:
        group_diag=radio_input4.value
    if med_flag:
        group_med=radio_input5.value
    if proc_flag:
        group_proc=radio_input6.value
    preprocess_features_hosp(cohort_output, diag_flag,proc_flag,med_flag,False,group_diag,group_med,group_proc,False,False,0)
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_867107/3036563163.py in <module>
      5     if diag_flag:
      6         group_diag=radio_input4.value
----> 7     preprocess_features_icu(cohort_output, diag_flag, group_diag,False,False,False,0)
      8 else:
      9     if diag_flag:
TypeError: preprocess_features_icu() missing 1 required positional argument: 'left_thresh'
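One way to check which arguments a function requires before calling it is inspect.signature from the standard library. The function below is a hypothetical stand-in, not the pipeline's actual preprocess_features_icu; every parameter name other than left_thresh is an assumption made for illustration.

```python
import inspect

# Hypothetical stand-in; the real signature lives in the pipeline's source.
def preprocess_features_icu_stub(cohort_output, diag_flag, group_diag,
                                 chart_flag, clean_chart, impute_outlier,
                                 thresh, left_thresh):
    pass

def required_positional(func):
    """Names of positional parameters that have no default value."""
    params = inspect.signature(func).parameters.values()
    return [p.name for p in params
            if p.default is inspect.Parameter.empty
            and p.kind in (p.POSITIONAL_ONLY, p.POSITIONAL_OR_KEYWORD)]

required = required_positional(preprocess_features_icu_stub)
print(len(required), required[-1])  # 8 left_thresh
```

With this stand-in, a call that passes only seven positional arguments, as the notebook cell does, would raise the same kind of TypeError about the eighth.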
Inputs:
Output:
Stored files and summary
Could you please declare a license in your repo? Without a LICENSE file or declaration of the license terms in the README.md, the code is technically "all rights reserved" by the authors and reuse of the code is limited. If you are unopinionated on the matter, I suggest the MIT license as it is highly compatible, particularly with university IP licenses (such as this one).
If an MIT license is acceptable I'd be happy to submit a PR to include it.
Thanks!
For medications data in data\long_format\meds\long_med_nonproprietaryname_norm.csv.gz:
Add a time column giving the hour of the admission at which each medication was given.
For example, if a medication was given 3 hours after the admit time, then its time column should say 3, and if it was given 30 hours after the admit time, then its time column should say 30.
Similarly, create a time_end column to record the end time of each medication.
For labs data in data\long_format\labs\long_labs_units_cleaned_norm.csv.gz:
Add a time column giving the hour of the admission at which each lab was performed.
For example, if a lab was performed 3 hours after the admit time, then its time column should say 3, and if it was performed 30 hours after the admit time, then its time column should say 30.
Also, for labs with a missing hadm_id, find the hadm_id from the charttime of the lab, and then calculate the time for labs in each admission.
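The hadm_id lookup could be sketched as follows, assuming a lab belongs to the admission whose admittime and dischtime bracket its charttime. That matching rule, the function name find_hadm_id, and the sample values are assumptions for illustration, not the pipeline's implementation.

```python
from datetime import datetime

def find_hadm_id(charttime, admissions):
    """Return the hadm_id of the admission whose stay contains charttime.

    admissions: list of (hadm_id, admittime, dischtime) for one patient.
    Returns None when the lab falls outside every admission.
    """
    for hadm_id, admittime, dischtime in admissions:
        if admittime <= charttime <= dischtime:
            return hadm_id
    return None

# Made-up admissions for a single patient.
admissions = [
    (101, datetime(2150, 1, 1, 8, 0), datetime(2150, 1, 5, 12, 0)),
    (102, datetime(2150, 3, 10, 20, 0), datetime(2150, 3, 12, 9, 0)),
]
lab_time = datetime(2150, 3, 11, 2, 0)

hadm_id = find_hadm_id(lab_time, admissions)
print(hadm_id)  # 102

# Hour of the admission at which the lab was charted.
admit_time = next(a for h, a, _ in admissions if h == hadm_id)
lab_hour = int((lab_time - admit_time).total_seconds() // 3600)
print(lab_hour)  # 6
```

Labs charted outside every admission keep a missing hadm_id under this rule, so they would need to be dropped or handled separately.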
Hi,
Thank you for creating such a useful pipeline. I have a question regarding feature selection in the pipeline.
As you mention in the paper, the list of selected features is provided in the work of Wang et al. (2020). However, it does not appear in mainPipeline.ipynb. Is it included anywhere in the code? I'd like to take a look for reproducibility purposes.
Thanks again,
Hung
Hi,
Thank you for this useful pipeline.
One suggestion: as you may know, many papers use hospital lab measurements to predict outcomes of ICU stays. For example, this paper uses many items in the "hosp/labevents.csv.gz" file as features for the ICU stay (linking via the hospital admission ID, "hadm_id", which is a column in the ICU stay matrix). However, I noticed that your pipeline does not natively allow the user to include lab events if they select the ICU flag.
It shouldn't be too hard to support that with the code you've already written, so just wanted to flag this point. Let me know if I'm misunderstanding something and this is already an option. Thanks!
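The linkage described above can be sketched with plain Python dictionaries. The rows are made-up samples and labs_by_stay is a hypothetical helper, not part of the pipeline; only the hadm_id join key comes from the description above.

```python
from collections import defaultdict

# Made-up sample rows; only the hadm_id link matters here.
icu_stays = [
    {"stay_id": 9001, "hadm_id": 101},
    {"stay_id": 9002, "hadm_id": 102},
]
lab_events = [
    {"hadm_id": 101, "itemid": 1, "valuenum": 1.1},
    {"hadm_id": 102, "itemid": 1, "valuenum": 2.3},
    {"hadm_id": 101, "itemid": 2, "valuenum": 4.0},
    {"hadm_id": None, "itemid": 1, "valuenum": 0.9},  # no admission recorded
]

def labs_by_stay(icu_stays, lab_events):
    """Group lab rows under each ICU stay that shares their hadm_id."""
    labs_for_hadm = defaultdict(list)
    for lab in lab_events:
        if lab["hadm_id"] is not None:
            labs_for_hadm[lab["hadm_id"]].append(lab)
    return {stay["stay_id"]: labs_for_hadm.get(stay["hadm_id"], [])
            for stay in icu_stays}

linked = labs_by_stay(icu_stays, lab_events)
print({stay_id: len(labs) for stay_id, labs in linked.items()})
# {9001: 2, 9002: 1}
```

This attaches every lab from the hospital admission to the stay; restricting to labs charted during the ICU stay itself would need an additional time filter.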
Check the previous pipeline to replicate the outlier detection tasks in our pipeline.
Hi,
Thank you all for creating this useful and flexible pipeline.
Could you please suggest a recommended system hardware configuration for the pipeline, and say how long pre-processing the data takes for version1? Thanks!
Thanks for releasing this wonderful pipeline! I am interested in using this pipeline, but also adding features extracted from the raw text of the notes to help prediction.
Is it possible to optionally load the raw text of the note alongside the current features?
Thanks!