mld3 / fiddle-experiments

Experiments applying FIDDLE on MIMIC-III and eICU. https://doi.org/10.1093/jamia/ocaa139

Python 34.11% Jupyter Notebook 62.74% Shell 3.15%

fiddle-experiments's Issues

FIDDLE output format

I tried to reproduce the FIDDLE experiments, but the output X.npz is not a SciPy sparse matrix, so it won't load with scipy.sparse.load_npz(); I used numpy.load() instead. X.npz contains:

X['data']: a long vector of only 1's
X['shape']: a vector describing the correct dimensions of the expected output tensor
X['fill_value']: a vector with just a single zero in it
X['coords']: a vector with 3 rows and the same number of columns as the length of X['data']

Is this an error or do I need to process this output first in order to get the sparse N x L x D tensor? I did not see anything in the documentation or paper regarding this. Cheers.
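
The four arrays listed above (data, coords, shape, fill_value) look like a coordinate-format (COO) encoding of an N-dimensional array, which appears to match what the pydata `sparse` package writes; if that assumption holds, `sparse.load_npz` from that package may load the file directly. As a minimal NumPy-only sketch under the same assumption, the dense tensor can be rebuilt by hand. The helper name `coo_npz_to_dense` and the tiny stand-in file are hypothetical, and a dense copy of a real N x L x D tensor may not fit in memory.

```python
import io

import numpy as np


def coo_npz_to_dense(npz):
    """Rebuild a dense array from COO-style arrays: coords, data, shape, fill_value."""
    shape = tuple(int(s) for s in npz["shape"])
    data = np.asarray(npz["data"])
    fill = np.asarray(npz["fill_value"]).ravel()[0]
    dense = np.full(shape, fill, dtype=data.dtype)
    # coords has one row per axis and one column per stored value.
    coords = np.asarray(npz["coords"])
    dense[tuple(coords)] = data
    return dense


# Tiny in-memory stand-in for X.npz with N=2, L=2, D=3 (made-up values).
buf = io.BytesIO()
np.savez(
    buf,
    data=np.array([1, 1, 1]),
    coords=np.array([[0, 0, 1], [0, 1, 1], [2, 0, 1]]),
    shape=np.array([2, 2, 3]),
    fill_value=np.array([0]),
)
buf.seek(0)
X = coo_npz_to_dense(np.load(buf))
print(X.shape)
```

For the real file, `np.load("X.npz")` would replace the in-memory buffer.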

Bugs in mimic3_experiments

Hi Shengpu,

I have summarized some bugs in the mimic3_experiments directory. Please check them when you have time.

1_data_extraction

extract_data.py

Exceptions:

Line 251: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Result is too large for pandas.Timedelta. Convert inputs to datetime.datetime with 'Timestamp.to_pydatetime()' before subtracting.

Suggestions:

Replace x.INTIME with x.INTIME.to_pydatetime().
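
For context, the error message indicates that the subtraction produces a span longer than pandas.Timedelta can hold. By default pandas stores a Timedelta as a signed 64-bit count of nanoseconds, which tops out at roughly 292 years. The sketch below uses only the standard library and hypothetical dates (not taken from MIMIC-III) to show that a span of about 300 years exceeds that limit, while plain datetime objects, which is what to_pydatetime() returns, subtract without trouble.

```python
from datetime import datetime, timedelta

# Largest span a signed 64-bit nanosecond counter can represent (about 292 years).
MAX_NS = 2**63 - 1
MAX_SPAN = timedelta(microseconds=MAX_NS // 1000)

# Hypothetical timestamps, chosen only to be about 300 years apart.
intime = datetime(2150, 1, 1)
earlier = datetime(1850, 1, 1)

# Plain datetime subtraction is not limited to the 64-bit nanosecond range.
span = intime - earlier
exceeds_ns_range = span > MAX_SPAN
span_years = span.days / 365.25

print(exceeds_ns_range, MAX_SPAN.days, span.days)
```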

LabelDistributions.ipynb

Exceptions:

Line 44: FileNotFoundError
Line 54: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Result is too large for pandas.Timedelta. Convert inputs to datetime.datetime with 'Timestamp.to_pydatetime()' before subtracting.

Suggestions:

Replace open('config.yaml') with open('../config.yaml')
Replace x.INTIME with x.INTIME.to_pydatetime()

InclusionExclusion.ipynb

Exceptions:

Line 29: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Result is too large for pandas.Timedelta. Convert inputs to datetime.datetime with 'Timestamp.to_pydatetime()' before subtracting.

Suggestions:

Replace x.INTIME with x.INTIME.to_pydatetime()

PopulationSummary.ipynb

Exceptions:

Line 24: KeyError
Line 26: FileNotFoundError
Line 68: pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Result is too large for pandas.Timedelta. Convert inputs to datetime.datetime with 'Timestamp.to_pydatetime()' before subtracting.

Suggestions:

Replace set_index('ICUSTAY_ID') with set_index('D')
The file pop.mortality_benchmark.csv does not exist
Replace x.INTIME with x.INTIME.to_pydatetime()

2_apply_FIDDLE

Suggestion: I think it's better to include FIDDLE module in this directory. After that, there are some other bugs.

README.md

Exceptions:

Line 41: FileNotFoundError

Suggestion:

There is no file named make_features.py

run_make_all.sh

Exceptions:

output_dir is required
FileNotFoundError

Suggestion:

You should set the output_dir for each run, since it's required in run.py

Since the directory features/outcome=mortality,T=48.0,dt=1.0 is replaced by features/benchmark,outcome=mortality,T=48.0,dt=1.0 in 1_data_extraction/run_prepare_all.sh, this script cannot run:

OUTCOME=mortality
T=48.0
dt=1.0
python run.py \
    --data_fname="$DATAPATH/features/outcome=$OUTCOME,T=$T,dt=$dt/input_data.p" \

Since the file pop.mortality_benchmark.csv does not exist, this script cannot run:

python run.py \
    --data_fname="$DATAPATH/features/benchmark,outcome=mortality,T=48.0,dt=1.0/input_data.p" \
    --population="$DATAPATH/population/pop.mortality_benchmark.csv" \
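
One way to surface these problems before launching a long run is a small preflight check on the expected inputs. The sketch below is only an illustration: the helper `missing_inputs` is hypothetical and not part of the repository, and the two relative paths are copied from the script quoted above.

```python
import tempfile
from pathlib import Path


def missing_inputs(datapath, required):
    """Return the entries of `required` (paths relative to datapath) that do not exist."""
    root = Path(datapath)
    return [rel for rel in required if not (root / rel).exists()]


# Relative paths taken from the run.py invocation quoted above.
REQUIRED = [
    "features/benchmark,outcome=mortality,T=48.0,dt=1.0/input_data.p",
    "population/pop.mortality_benchmark.csv",
]

# Demo on a throwaway directory that contains only the features file.
with tempfile.TemporaryDirectory() as tmp:
    features = Path(tmp) / REQUIRED[0]
    features.parent.mkdir(parents=True)
    features.touch()
    missing = missing_inputs(tmp, REQUIRED)

print(missing)
```

In practice the check would be called with the real data directory in place of the temporary one.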

3_ML_models

lib/data.py

Exceptions:

Lines 75, 121: FileNotFoundError
Lines 123, 124: Directory does not exist

Suggestion:

The file pop.mortality_benchmark.csv does not exist
The directory features/outcome=mortality,T=48.0,dt=1.0 does not exist; it has been replaced by features/benchmark,outcome=mortality,T=48.0,dt=1.0

config.yaml

Exceptions:

Line 21: The feature_dimension of ARF 4.0 is not 4143

Suggestion:

Set to 4381

run_deep_eval.py

Exceptions:

Line 57: import error

Suggestion:

Replace from sklearn.externals.joblib import Parallel, delayed with from joblib import Parallel, delayed

Is the code for eICU data complete?

Hi,
Thanks for open-sourcing the code. However, it seems that the code at https://github.com/MLD3/FIDDLE-experiments/tree/jamia-replication/eicu_experiments/1_data_extraction only preprocesses a subset of the tables mentioned in the paper, namely the 'medication', 'nurseCharting', 'patient', 'lab', 'respiratoryCare', and 'intakeOutput' tables.
Am I missing something?

Thanks,
YD

Update README

IHM Benchmark: Location of "train_listfile.csv", etc. not clear.

I am trying to reproduce your results using these instructions: https://github.com/MLD3/FIDDLE-experiments/tree/master/mimic3_experiments.

However, when running the IHM_Benchmark notebook, I get an error [Errno 2] No such file or directory 'train_listfile.csv'

I am not sure at which step the csv was supposed to be created since the other steps do not appear to generate it anywhere. Any help would be greatly appreciated.

Dimension error reproduction eICU experiments

Hi!

I tried to replicate the eICU experiments with the discretize option turned off, but got an error in the FIDDLE code: "TypeError: bad operand type for unary ~: 'float'". I adjusted the FIDDLE code and eventually it ran, but then I got a dimension error while training the CNN, saying that matrix 1 could not be multiplied with matrix 2 because their shapes did not match.

Do you have any idea in what direction I could go to fix this problem? Thank you so much!

Are FIDDLE features comparable across MIMIC and eICU?

I want to run an experiment to assess whether a model trained on MIMIC generalizes to eICU. Are the FIDDLE features comparable as is? If not, is it possible to carve out a subset that is comparable across the datasets?
