healthylaife / mimic-iv-data-pipeline


153 stars, 4 watchers, 50 forks, 229.94 MB

A customizable pipeline for data extraction from MIMIC-IV.

License: MIT License

Languages: Jupyter Notebook 93.97%, Python 6.03%

Topics: electronic-health-records, machine-learning, reproducible-research, mimic, mimic-iv

mimic-iv-data-pipeline's People

ruiatelsevier vzshi mehak25 docunichorn rajsinghusa su-lan yirufang2001 sakheed pickleyang zero506 bsconsoli will570 stijnberendse yihaotan amnalhosani shaoxuanren duygutopaloglu wetiqe jxtreehouse isalj yaorongge hyeonhoonlee harel-coffee bernardo1998 fdvanleeuwen xiaochenwang-psu ocelottamer raif-fl mathias-samuelides vanderschaarlab lengocduc195khtn dalanpaa annie983284450-1 dtopaloglu tomerzipori sophiewharrie za3331 bryanjangeesingh drfaisal mosthumble jjfeng sophmrtn kdongyoung andystevens98 rohanpandey opencv13 damaohongtu

mimic-iv-data-pipeline's Issues

Create Function to extract data

Please refer to the UserInterface file to understand this.
We need to create a function such as extract() that takes the following inputs:

Type of data: ICU or non-ICU
Prediction task: 30-day readmission, 60-day readmission, or mortality

Output: data files stored according to the chosen options, plus a summary of the data.
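A minimal sketch of the requested interface, assuming the option labels below (taken from the issue text) and a cohort passed in as a list of rows carrying a 0/1 `label`. `ExtractOptions`, `extract()` and the `data/cohort/...` folder layout are hypothetical; the sketch only validates the options, derives an output folder and builds the summary, leaving the actual MIMIC-IV reading and file writing out.

```python
from dataclasses import dataclass
from pathlib import Path

# Hypothetical option labels, taken from the issue text; the real
# pipeline's labels may differ.
DATA_TYPES = ("ICU", "Non-ICU")
TASKS = ("30-day readmission", "60-day readmission", "mortality")


@dataclass(frozen=True)
class ExtractOptions:
    data_type: str
    task: str

    def __post_init__(self):
        # Reject unknown options early instead of failing mid-extraction.
        if self.data_type not in DATA_TYPES:
            raise ValueError(f"data_type must be one of {DATA_TYPES}, got {self.data_type!r}")
        if self.task not in TASKS:
            raise ValueError(f"task must be one of {TASKS}, got {self.task!r}")

    def output_dir(self, root: str = "data/cohort") -> Path:
        # One folder per option combination, e.g. data/cohort/icu_30-day_readmission
        slug = f"{self.data_type}_{self.task}".lower().replace(" ", "_")
        return Path(root) / slug


def extract(data_type: str, task: str, cohort: list[dict]) -> dict:
    """Validate the options and return a summary of the given cohort.

    `cohort` stands in for the rows a real extraction would read from
    MIMIC-IV; writing files to output_dir is intentionally left out.
    """
    opts = ExtractOptions(data_type, task)
    n_pos = sum(row["label"] for row in cohort)
    return {
        "output_dir": str(opts.output_dir()),
        "n_admissions": len(cohort),
        "n_positive": n_pos,
        "n_negative": len(cohort) - n_pos,
    }
```

Validating in `__post_init__` means an invalid combination fails before any data is loaded, which matters when extraction over MIMIC-IV takes a long time.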

How to get exact item code for specific disease

I found that in your project, specific diseases are selected by letting icd_code take a particular value (e.g., for CKD the icd_code is N18). Do you offer exact item IDs for specific diseases? Where can I find them? Or could you tell me how you define kidney disease? Thanks.
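Assuming the pipeline selects a disease by ICD code prefix (the issue mentions N18 for CKD), every more specific code under that category matches as well. The hypothetical helpers below show that prefix rule on (hadm_id, icd_code) pairs; whether the pipeline matches exactly this way is an assumption, and ICD-9 diagnoses would need their own prefixes.

```python
def has_disease(icd_codes, prefixes):
    """Return True if any ICD code starts with one of the given prefixes.

    Dots are ignored and case is normalized, so 'N18' matches both 'N186'
    and 'N18.4'.
    """
    normalized = tuple(p.replace(".", "").upper() for p in prefixes)
    return any(code.replace(".", "").upper().startswith(normalized) for code in icd_codes)


def admissions_with_disease(diagnoses, prefixes):
    """diagnoses: iterable of (hadm_id, icd_code) pairs; returns sorted hadm_ids."""
    return sorted({hadm for hadm, code in diagnoses if has_disease([code], prefixes)})


# Example: flag admissions with any code in the N18 category.
dx = [(1, "N186"), (1, "I10"), (2, "E119"), (3, "N18.4")]
print(admissions_with_disease(dx, ["N18"]))  # [1, 3]
```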

non-ICU data

I have a problem whenever I try to preprocess non-ICU data:
ValueError: could not broadcast input array from shape (36,13) into shape (35,13)
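The full traceback isn't included, so the exact cause is unknown. One common way NumPy raises this message is assigning an array with one extra row into a preallocated slot (for example, an off-by-one in the number of time buckets). The hypothetical helper `fill_fixed_window` below checks lengths before copying: extra rows are dropped and missing rows are left as NaN. Whether truncation is right for the pipeline is an assumption; correcting the row count upstream may be the better fix.

```python
import numpy as np


def fill_fixed_window(values, n_rows):
    """Copy `values` into a new (n_rows, n_cols) array without a shape error.

    A plain `slot[:] = values` raises "could not broadcast input array from
    shape (36,13) into shape (35,13)" when `values` has one row too many.
    Here extra rows are truncated and missing rows stay NaN.
    """
    values = np.asarray(values, dtype=float)
    out = np.full((n_rows, values.shape[1]), np.nan)
    k = min(n_rows, values.shape[0])
    out[:k] = values[:k]
    return out


# Reproduce the reported error, then use the guarded copy.
slot = np.zeros((35, 13))
try:
    slot[:] = np.ones((36, 13))
except ValueError as err:
    print(err)
print(fill_fixed_window(np.ones((36, 13)), 35).shape)  # (35, 13)
```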

Issue with the prediction window

Hi there,

Thanks for the amazing work!

When selecting the prediction window in the Jupyter notebook (mainPipeline.ipynb), it seems that the code in section 7 (Time-Series Representation), cell 2, should be:

    if radio_input6.value == 'Custom':
        predW = int(text3.value)
    else:
        predW = int(radio_input6.value[0].strip())

The current code seems to mix up the inputs for the prediction window and the bucket.

Best wishes
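As written, `radio_input6.value[0]` reads only the first character of the selected label, which works for single-digit options but would turn a label such as '12 hours' into 1. A hypothetical helper that parses the whole leading number, assuming the radio labels look like '<number> <unit>' or 'Custom':

```python
def parse_window(choice: str, custom_text: str = "") -> int:
    """Turn a radio-button label into an integer window length.

    `choice` is assumed to look like '2 hours' or 'Custom'; the real widget
    labels in mainPipeline.ipynb may differ.
    """
    if choice == "Custom":
        return int(custom_text.strip())
    # Parse the whole leading token, not just its first character,
    # so '12 hours' yields 12 rather than 1.
    return int(choice.strip().split()[0])


# Usage in the notebook would look like:
# predW = parse_window(radio_input6.value, text3.value)
```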

Add time information to procedures

For the procedures data in data\long_format\proc\long_proc_icd10_norm.csv.gz, add a time column that tells at which hour of the admission a particular procedure happened.
For example, if a procedure was performed 3 hours after admit time, its time column should say 3; if it was performed 30 hours after admit time, its time column should say 30.
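A minimal pandas sketch of the requested column, assuming the procedure rows can be joined to admissions on `hadm_id` and that the event timestamp sits in a column whose name is passed in (`proc_time` in the usage example is a placeholder, not a pipeline column). Hours are floored, so an event 3 h 59 min after admittime gets 3.

```python
import pandas as pd


def add_hours_since_admit(events, admissions, event_col):
    """Add a 'time' column: whole hours between admittime and `event_col`.

    Both frames must share 'hadm_id'; `admissions` must carry 'admittime'.
    Rows without a matching admission get a missing (<NA>) time.
    """
    merged = events.merge(admissions[["hadm_id", "admittime"]], on="hadm_id", how="left")
    delta = pd.to_datetime(merged[event_col]) - pd.to_datetime(merged["admittime"])
    # Floor to whole hours; nullable Int64 keeps missing values as <NA>.
    merged["time"] = (delta.dt.total_seconds() // 3600).astype("Int64")
    return merged.drop(columns="admittime")


adm = pd.DataFrame({"hadm_id": [1], "admittime": ["2150-01-01 08:00"]})
proc = pd.DataFrame({"hadm_id": [1, 1],
                     "proc_time": ["2150-01-01 11:00", "2150-01-02 14:30"]})
print(add_hours_since_admit(proc, adm, "proc_time")["time"].tolist())  # [3, 30]
```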

TypeError: preprocess_features_icu() missing 1 required positional argument: 'left_thresh'

Dear all,

When I ran the following code from 'mainPipeline.ipynb', I got an error in 'preprocess_features_icu'. It seems 'left_thresh' was not defined. Is there any requirement for 'left_thresh'?

    if data_icu:
        if diag_flag:
            group_diag=radio_input4.value
        preprocess_features_icu(cohort_output, diag_flag, group_diag,False,False,False,0)
    else:
        if diag_flag:
            group_diag=radio_input4.value
        if med_flag:
            group_med=radio_input5.value
        if proc_flag:
            group_proc=radio_input6.value
        preprocess_features_hosp(cohort_output, diag_flag,proc_flag,med_flag,False,group_diag,group_med,group_proc,False,False,0)

    TypeError                                 Traceback (most recent call last)
    /tmp/ipykernel_867107/3036563163.py in <module>
          5     if diag_flag:
          6         group_diag=radio_input4.value
    ----> 7     preprocess_features_icu(cohort_output, diag_flag, group_diag,False,False,False,0)
          8 else:
          9     if diag_flag:

    TypeError: preprocess_features_icu() missing 1 required positional argument: 'left_thresh'
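The error means the function definition expects one more positional parameter than the notebook cell passes, i.e. the cell and the function in the repository are out of sync. Running `inspect.signature(preprocess_features_icu)` in the notebook prints what the installed version expects. The sketch below goes a step further and lists which required parameters a given call leaves unfilled; `preprocess_stand_in` is a hypothetical stand-in, and apart from `left_thresh` (from the traceback) its parameter names are made up. The correct value for `left_thresh` has to come from the repository itself.

```python
import inspect


def missing_required_args(func, *args, **kwargs):
    """List required parameters of `func` that a call with these arguments leaves unfilled."""
    missing = []
    params = list(inspect.signature(func).parameters.values())
    for index, param in enumerate(params):
        # *args/**kwargs and parameters with defaults are never "missing".
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue
        if param.default is not param.empty:
            continue
        filled_by_position = (
            param.kind in (param.POSITIONAL_ONLY, param.POSITIONAL_OR_KEYWORD)
            and index < len(args)
        )
        if not filled_by_position and param.name not in kwargs:
            missing.append(param.name)
    return missing


# Hypothetical stand-in with one more parameter than the notebook passes;
# only 'left_thresh' comes from the traceback, the other names are made up.
def preprocess_stand_in(cohort_output, diag_flag, group_diag,
                        flag_a, flag_b, flag_c, thresh, left_thresh):
    return None


call_args = ("cohort", True, "group", False, False, False, 0)
print(missing_required_args(preprocess_stand_in, *call_args))  # ['left_thresh']
```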

create a function to preprocess data

Inputs:

Whether to group codes
Outlier removal
Time-series smoothing
Whether all admission data is needed, or only the last 24 or 48 hours

Output:
Stored files and a summary
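The last input above (whole admission versus the last 24 or 48 hours) can be sketched as a simple filter. This assumes each event carries its time in hours since admission and that the length of stay is known in hours; `keep_last_hours` is a hypothetical name, not a pipeline function.

```python
def keep_last_hours(events, los_hours, window=None):
    """Keep (time, value) events that fall in the last `window` hours of a stay.

    `time` is hours since admission and `los_hours` is the length of stay in
    hours. window=None keeps the whole admission.
    """
    if window is None:
        return list(events)
    if window <= 0:
        raise ValueError("window must be a positive number of hours")
    start = max(0.0, los_hours - window)
    return [(t, v) for t, v in events if start <= t <= los_hours]


events = [(1, "a"), (50, "b"), (70, "c"), (80, "d")]
print(keep_last_hours(events, los_hours=80, window=24))  # [(70, 'c'), (80, 'd')]
```

Clamping `start` at 0 means a stay shorter than the window simply keeps all of its events.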

No license is declared

Could you please declare a license in your repo? Without a LICENSE file or a declaration of the license terms in the README.md, the code is technically "all rights reserved" by the authors, and reuse of the code is limited. If you are unopinionated on the matter, I suggest the MIT license, as it is highly compatible, particularly with university IP licenses (such as this one).

If an MIT license is acceptable I'd be happy to submit a PR to include it.

Thanks!

Add time to Medications

For the medications data in data\long_format\meds\long_med_nonproprietaryname_norm.csv.gz, add a time column that tells at which hour of the admission a particular medication was given.
For example, if a medication was given 3 hours after admit time, its time column should say 3; if it was given 30 hours after admit time, its time column should say 30.
Similarly, create a time_end column to get the end time of each medication.
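A minimal pandas sketch of both columns, assuming the medication rows carry `starttime` and `stoptime` (as MIMIC-IV `prescriptions` does) and join to admissions on `hadm_id`. `add_med_times` is a hypothetical name; hours are floored, and a missing stop time gives a missing `time_end`.

```python
import pandas as pd


def add_med_times(meds, admissions):
    """Add 'time' and 'time_end' (whole hours after admittime) to each med row."""
    out = meds.merge(admissions[["hadm_id", "admittime"]], on="hadm_id", how="left")
    admit = pd.to_datetime(out["admittime"])
    for src, dst in (("starttime", "time"), ("stoptime", "time_end")):
        hours = (pd.to_datetime(out[src]) - admit).dt.total_seconds() // 3600
        # Nullable Int64 keeps rows with no start/stop time as <NA>.
        out[dst] = hours.astype("Int64")
    return out.drop(columns="admittime")


adm = pd.DataFrame({"hadm_id": [1], "admittime": ["2150-01-01 00:00"]})
meds = pd.DataFrame({"hadm_id": [1, 1],
                     "starttime": ["2150-01-01 03:00", "2150-01-02 06:00"],
                     "stoptime": ["2150-01-02 06:00", None]})
print(add_med_times(meds, adm)[["time", "time_end"]])
```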

Add time for labs

For the labs data in data\long_format\labs\long_labs_units_cleaned_norm.csv.gz, add a time column that tells at which hour of the admission a particular lab was performed.
For example, if a lab was performed 3 hours after admit time, its time column should say 3; if it was performed 30 hours after admit time, its time column should say 30.
Also, for labs with a missing hadm_id, find the admission by looking at the lab's charttime, and calculate the time of each lab within its admission.
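One way to recover missing `hadm_id` values is to look for the admission of the same patient whose [admittime, dischtime] window contains the lab's `charttime`. The sketch assumes MIMIC-IV column names (`labevents`: `subject_id`, `hadm_id`, `charttime`; `admissions`: `subject_id`, `hadm_id`, `admittime`, `dischtime`); `resolve_lab_admissions` is hypothetical, and labs falling outside every admission keep a missing `hadm_id` and `time`.

```python
import pandas as pd


def resolve_lab_admissions(labs, admissions):
    """Fill missing hadm_id from admission windows, then add 'time' in whole hours."""
    labs = labs.reset_index(drop=True).copy()
    labs["charttime"] = pd.to_datetime(labs["charttime"])
    adm = admissions[["subject_id", "hadm_id", "admittime", "dischtime"]].copy()
    adm["admittime"] = pd.to_datetime(adm["admittime"])
    adm["dischtime"] = pd.to_datetime(adm["dischtime"])

    # Pair every lab row (its position kept in 'index') with every
    # admission of the same patient.
    cand = labs.reset_index().merge(adm, on="subject_id", suffixes=("", "_adm"))
    known = cand["hadm_id"].notna()
    in_window = cand["charttime"].between(cand["admittime"], cand["dischtime"])
    # Keep the recorded admission when hadm_id is present; otherwise keep
    # an admission whose window contains the charttime.
    keep = (known & (cand["hadm_id"] == cand["hadm_id_adm"])) | (~known & in_window)
    match = cand[keep].drop_duplicates("index").set_index("index")

    # Fill only the missing ids; unmatched labs stay empty.
    labs["hadm_id"] = labs["hadm_id"].fillna(match["hadm_id_adm"].reindex(labs.index))
    admit = match["admittime"].reindex(labs.index)
    labs["time"] = ((labs["charttime"] - admit).dt.total_seconds() // 3600).astype("Int64")
    return labs
```

If two admissions of a patient overlap, `drop_duplicates` keeps the first match; a stricter pipeline might flag such rows instead.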

Code for the labs/vitals feature selection mentioned in the paper

Hi,

Thank you for creating such a useful pipeline. I have a question regarding feature selection in the pipeline.
As you mentioned in the paper, the list of selected features is provided in the work of Wang et al. (2020). However, it is not present in mainPipeline.ipynb. Is it included anywhere in the code? I'd like to take a look for reproducibility purposes.

Thanks again,
Hung

Using hospital labs for ICU prediction tasks?

Hi,

Thank you for this useful pipeline.
One suggestion: as you may know, many papers use hospital lab measurements to predict outcomes of ICU stays. For example, this paper uses many items in the "hosp/labevents.csv.gz" file as features for the ICU stay (linking via the hospital admission ID, "hadm_id", which is a column in the ICU stay matrix). However, I noticed that your pipeline does not natively allow the user to include lab events if they select the ICU flag.

It shouldn't be too hard to support that with the code you've already written, so just wanted to flag this point. Let me know if I'm misunderstanding something and this is already an option. Thanks!
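A minimal pandas sketch of the linkage described above, assuming MIMIC-IV column names (`icustays`: `stay_id`, `hadm_id`, `intime`, `outtime`; `labevents`: `hadm_id`, `charttime`). `labs_for_icu_stays` is a hypothetical name. Keeping only labs charted between `intime` and `outtime` is a design choice of this sketch; some work also uses labs from before ICU admission.

```python
import pandas as pd


def labs_for_icu_stays(labs, icustays):
    """Attach hospital lab events to ICU stays via hadm_id.

    Keeps only labs charted between the stay's intime and outtime and adds
    'hours_in', the hours since ICU admission. Labs without an hadm_id are
    dropped by the inner join.
    """
    stays = icustays[["stay_id", "hadm_id", "intime", "outtime"]].copy()
    stays["intime"] = pd.to_datetime(stays["intime"])
    stays["outtime"] = pd.to_datetime(stays["outtime"])
    merged = labs.merge(stays, on="hadm_id", how="inner")
    merged["charttime"] = pd.to_datetime(merged["charttime"])
    out = merged[merged["charttime"].between(merged["intime"], merged["outtime"])].copy()
    out["hours_in"] = (out["charttime"] - out["intime"]).dt.total_seconds() / 3600
    return out
```

The resulting `hours_in` column could then be binned with the same time-series buckets used for chart events.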

Outlier Detection

Check the previous pipeline to replicate the outlier detection tasks in our pipeline.
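The issue doesn't say which rule the previous pipeline used. As a hedged sketch only, the function below assumes a percentile-based rule, a common approach that may differ from the previous pipeline: values outside chosen percentiles are either removed (set to NaN) or clipped to the nearest bound. `handle_outliers` and its defaults are hypothetical.

```python
import numpy as np


def handle_outliers(values, lower_pct=1.0, upper_pct=99.0, mode="impute"):
    """Treat values outside the [lower_pct, upper_pct] percentile range as outliers.

    mode='remove' replaces them with NaN; mode='impute' clips them to the
    nearest percentile bound. Existing NaNs are ignored and kept.
    """
    arr = np.asarray(values, dtype=float)
    lo, hi = np.nanpercentile(arr, [lower_pct, upper_pct])
    if mode == "impute":
        return np.clip(arr, lo, hi)
    if mode == "remove":
        out = arr.copy()
        out[(arr < lo) | (arr > hi)] = np.nan
        return out
    raise ValueError("mode must be 'impute' or 'remove'")


print(handle_outliers([1, 2, 3, 4, 100], lower_pct=0, upper_pct=75))  # [1. 2. 3. 4. 4.]
```

In practice the percentiles would be computed per item (e.g. per lab itemid), since a single threshold across different measurements is meaningless.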

Suggested system hardware configuration for the pipeline?

Hi,

Thank you all for creating this useful and flexible pipeline.

Could you please suggest a system hardware configuration for the pipeline, and how long it takes to preprocess the data for version 1? Thanks!

Option to load raw notes?

Thanks for releasing this wonderful pipeline! I am interested in using this pipeline, but also adding features extracted from the raw text of the notes to help prediction.

Is it possible to optionally load the raw text of the note alongside the current features?

Thanks!

Errors exist in '9. Deep Learning Models' and '10. Running BEHRT'

Dear Sir/Madam,

It seems there are errors in '9. Deep Learning Models' and '10. Running BEHRT'. Could you help check them? Thanks.