This project forked from ratschlab/dpsom

Code associated with arXiv manuscript 'DPSOM: Deep Probabilistic Clustering with Self-Organizing Maps'

Home Page: https://arxiv.org/abs/1910.01590

License: MIT License

T-DPSOM - An Interpretable Clustering Method for Unsupervised Learning of Patient Health States

Reference

Laura Manduchi, Matthias Hüser, Martin Faltys, Julia Vogt, Gunnar Rätsch, and Vincent Fortuin. 2021. T-DPSOM - An Interpretable Clustering Method for Unsupervised Learning of Patient Health States. In ACM Conference on Health, Inference, and Learning (ACM CHIL ’21), April 8–10, 2021, Virtual Event, USA. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3450439.3451872

Training and Evaluation

Deep Probabilistic SOM

The training script for the DPSOM model is dpsom/DPSOM.py; the model itself is defined in dpsom/DPSOM_model.py. To train and test the DPSOM model on the MNIST dataset using default parameters and feed-forward layers:

python DPSOM.py

This trains the model and then outputs its clustering performance on the test set.

To use convolutional layers:

python DPSOM.py with convolution=True

Other possible configurations:

  • validation: if True, the model is evaluated on the validation set (default False).
  • val_epochs: if True, the clustering results are saved every 10 epochs to the default output files (default False).
  • more_runs: if True, the model is run 10 times and the means of NMI and purity are output together with their standard errors (default False).
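The quantities reported by more_runs can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names are hypothetical, and normalizing the mutual information by the arithmetic mean of the two entropies is an assumption, since the normalization used by DPSOM.py is not stated here.

```python
import math
from collections import Counter


def purity(labels_true, labels_pred):
    """Fraction of samples that belong to the majority class of their cluster."""
    clusters = {}
    for t, p in zip(labels_true, labels_pred):
        clusters.setdefault(p, Counter())[t] += 1
    majority_total = sum(max(c.values()) for c in clusters.values())
    return majority_total / len(labels_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_true, labels_pred):
    """Mutual information divided by the mean of the two label entropies.

    Assumption: arithmetic-mean normalization; other conventions exist.
    """
    n = len(labels_true)
    joint = Counter(zip(labels_true, labels_pred))
    count_t = Counter(labels_true)
    count_p = Counter(labels_pred)
    mi = sum(
        (c / n) * math.log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denom = 0.5 * (entropy(labels_true) + entropy(labels_pred))
    return mi / denom if denom > 0 else 1.0


def mean_and_stderr(values):
    """Sample mean and standard error (sample std divided by sqrt(n))."""
    n = len(values)
    mean = sum(values) / n
    if n < 2:
        return mean, 0.0
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)
```

Given one NMI value per run, mean_and_stderr(nmi_per_run) yields the pair that would be reported as mean and standard error.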

To train and test the DPSOM model on the Fashion MNIST dataset using default parameters and feed-forward layers:

python DPSOM.py with data_set="fMNIST" beta=0.4

To use convolutional layers:

python DPSOM.py with data_set="fMNIST" beta=0.4 convolution=True

To investigate the effect of the weight of the SOM loss, use:

python DPSOM.py with beta=<new_value>

The default is beta=0.25.
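Several values of beta can be tried in sequence with a small driver script. The sketch below only assembles the command lines shown above; the helper names build_commands and run_sweep are hypothetical, the beta values are chosen purely for illustration, and it assumes the script is launched from the directory containing DPSOM.py.

```python
import subprocess
import sys


def build_commands(betas, script="DPSOM.py", extra=()):
    """Return one argv list per beta value, using the `with key=value` syntax."""
    return [
        [sys.executable, script, "with", f"beta={beta}", *extra]
        for beta in betas
    ]


def run_sweep(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep if one training run fails.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Illustrative values only; 0.25 is the documented default.
    run_sweep(build_commands([0.1, 0.25, 0.4]), dry_run=True)
```

Passing extra=("convolution=True",) appends further configuration overrides to every run.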

To reconstruct the centroids of the learned 2D SOM grid in the input space, see the notebook notebooks/centroids_rec.ipynb.

Temporal DPSOM

eICU preprocessing pipeline

The major preprocessing steps, which have to be performed sequentially, starting from the raw eICU tables in CSV format, are listed below. The scripts expect the tables to be stored in data/csv. Intermediate data is stored in various sub-folders of data.

(a) Conversion of raw CSV tables, which can be downloaded from https://eicu-crd.mit.edu/ after access is granted, to HDF versions of the tables. (eicu_preproc/hdf_convert.py)

(b) Filtering of ICU stays based on inclusion criteria. (eicu_preproc/save_all_pids.py, eicu_preproc/filter_patients.py)

(c) Batching of patient IDs for cluster processing. (eicu_preproc/compute_patient_batches.py)

(d) Selection of variables to include in the multi-variate time series, from the vital signs and lab measurement tables. (eicu_preproc/filter_variables.py)

(e) Conversion of the eICU data, using forward-filling imputation, to a regular time-grid format that can be processed by VarTPSOM. (eicu_preproc/timegrid_all_patients.py, eicu_preproc/timegrid_one_batch.py)

(f) Labeling of the time points in the time series with the current/future worst physiology scores as well as dynamic mortality, which are used in the enrichment analyses and data visualizations. (eicu_preproc/label_all_patients.py, eicu_preproc/label_one_batch.py)
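The forward-filling step in (e) can be illustrated with a short, self-contained sketch. It is not taken from eicu_preproc/timegrid_one_batch.py: the function name, its arguments, and the choice to leave grid points before the first observation as NaN are all assumptions made for illustration.

```python
import math


def forward_fill_to_grid(times, values, grid_start, grid_step, n_steps):
    """Resample irregular (time, value) observations onto a regular grid.

    Each grid point takes the most recent observation at or before it.
    Grid points that precede the first observation stay NaN (assumption).
    """
    observations = sorted(zip(times, values))
    grid = []
    last_value = math.nan
    i = 0
    for k in range(n_steps):
        t = grid_start + k * grid_step
        # Advance past every observation recorded up to this grid time.
        while i < len(observations) and observations[i][0] <= t:
            last_value = observations[i][1]
            i += 1
        grid.append(last_value)
    return grid
```

For example, observations at times 0.5 and 2.2 resampled onto the grid 0, 1, 2, 3 give a missing value at 0, the first observation at 1 and 2, and the second observation at 3.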

Saving the eICU data-set

Insert the paths of the obtained preprocessed data into the script eicu_preproc/save_model_inputs.py and run it.

The script selects the last 72 time steps of each time series and the following labels:

'full_score_1', 'full_score_6', 'full_score_12', 'full_score_24', 'hospital_discharge_expired_1', 'hospital_discharge_expired_6', 'hospital_discharge_expired_12', 'hospital_discharge_expired_24', 'unit_discharge_expired_1', 'unit_discharge_expired_6', 'unit_discharge_expired_12', 'unit_discharge_expired_24'

It then saves the dataset as a CSV table at data/eICU_data.csv.
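Selecting the last 72 time steps per series can be sketched as follows. This is an illustration under stated assumptions rather than the logic of eicu_preproc/save_model_inputs.py: the row layout (patient_id, time, features) and the function name are hypothetical, and keeping series shorter than 72 steps unchanged is an assumption, since the script's handling of short stays is not described here.

```python
from collections import defaultdict


def last_steps_per_patient(rows, n_steps=72):
    """Group (patient_id, time, features) rows by patient and keep the last n_steps."""
    by_patient = defaultdict(list)
    for patient_id, time, features in rows:
        by_patient[patient_id].append((time, features))

    selected = {}
    for patient_id, series in by_patient.items():
        # Order each series chronologically before truncating.
        series.sort(key=lambda item: item[0])
        # Assumption: series shorter than n_steps are kept whole.
        selected[patient_id] = series[-n_steps:]
    return selected
```

The result maps each patient ID to a chronologically ordered list of at most n_steps (time, features) pairs.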

Training the model

Once the data is saved in data/eICU_data.csv, the entire model can be trained using:

python TempDPSOM.py

This outputs the NMI clustering results, computed using APACHE scores as labels, and saves them in results_eICU.txt.

Better prediction performance can be obtained with:

python TempDPSOM.py with latent_dim=100

This saves the prediction performance to the file results_eICU_pred.txt.

To train the model without prediction, use:

python TempDPSOM.py with eta=0

To train the model without smoothness loss and without prediction, use:

python TempDPSOM.py with eta=0 kappa=0

Other experiments, such as computing heatmaps and trajectories, can be found in notebook/eICU_experiments.ipynb.

Contributors

lauramanduchi, mhueser
