vdda / survival-analysis-benchmark

This project forked from soda-inria/survival-analysis-benchmark

Exploratory repository to study predictive survival analysis models

License: MIT License

Python 0.90% Jupyter Notebook 99.10%

Benchmarking predictive survival analysis models

This repository is dedicated to the evaluation of predictive survival models on large-ish datasets.

Software dependencies

To run the jupytercon-2023 tutorial notebooks, you will need:

conda create -n jupytercon-survival -c conda-forge python jupyterlab scikit-learn lifelines scikit-survival matplotlib-base plotly seaborn pandas pyarrow ibis-duckdb polars 

conda activate jupytercon-survival
jupyter lab

Notebooks

The notebooks folder holds the two main notebooks for the jupytercon-2023 tutorial, namely:

  • tutorial_part_1.ipynb
  • tutorial_part_2.ipynb

and the ancillary notebook that generates the dataset used in "part 1", namely:

  • truck_dataset.ipynb

The following notebooks display our benchmark results and show how to use our wrappers to cross-validate various models:

  • kkbox_cv_benchmark.ipynb

    Benchmark on the KKBox challenge, inspired by the pycox paper.

  • msk_mettropism.ipynb

    Exploration of the MSK cancer dataset and survival probability predictions using our models.

Datasets

WSDM - KKBox's Churn Prediction Challenge (from Kaggle)

The datasets/kkbox_churn folder contains Python code to efficiently preprocess the raw transaction logs of KKBox's Churn Prediction Challenge using ibis and duckdb.

The objectives are to:

  • make everything reproducible from the event-based logs;
  • implement efficient, parallel and out-of-core "sessionization" of the past transactions for all members: here, a "session" is an uninterrupted sequence of transactions;
  • implement efficient, parallel and out-of-core tabularization (feature and churn target with censoring);
  • make it possible to compute the cumulative state of the subscription data and the censored churn events at any point in time.
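The sessionization and censoring steps listed above can be sketched on a toy table. This is a minimal in-memory pandas illustration, not the repository's out-of-core ibis and duckdb implementation. The column names (member_id, transaction_date, expire_date), the helper names and the rule "a new session starts when a transaction happens after the previous expiry date" are assumptions made for the example.

```python
import pandas as pd


def sessionize(transactions: pd.DataFrame, max_gap_days: int = 0) -> pd.DataFrame:
    """Label each transaction with a per-member session_id (hypothetical rule)."""
    df = transactions.sort_values(["member_id", "transaction_date"]).copy()
    # Expiry date of the previous transaction of the same member (NaT for the first).
    prev_expire = df.groupby("member_id")["expire_date"].shift()
    gap_days = (df["transaction_date"] - prev_expire).dt.days
    # A session starts at the first transaction or after an interruption.
    new_session = prev_expire.isna() | (gap_days > max_gap_days)
    df["session_id"] = new_session.astype(int).groupby(df["member_id"]).cumsum() - 1
    return df


def churn_target(sessions: pd.DataFrame, observation_date: pd.Timestamp) -> pd.DataFrame:
    """Duration and churn indicator of each session as seen at observation_date."""
    seen = sessions[sessions["transaction_date"] <= observation_date]
    out = (
        seen.groupby(["member_id", "session_id"])
        .agg(start=("transaction_date", "min"), end=("expire_date", "max"))
        .reset_index()
    )
    # A session that expired by the observation date is a churn event;
    # otherwise it is still running and is censored at the observation date.
    out["churned"] = out["end"] <= observation_date
    stop = out["end"].where(out["churned"], observation_date)
    out["duration_days"] = (stop - out["start"]).dt.days
    return out


transactions = pd.DataFrame(
    {
        "member_id": [1, 1, 1, 2],
        "transaction_date": pd.to_datetime(
            ["2017-01-01", "2017-01-31", "2017-04-01", "2017-02-01"]
        ),
        "expire_date": pd.to_datetime(
            ["2017-01-31", "2017-03-02", "2017-05-01", "2017-03-03"]
        ),
    }
)
sessions = sessionize(transactions)
targets = churn_target(sessions, pd.Timestamp("2017-04-15"))
print(targets)
```

Changing the observation date changes which sessions count as censored, which is the point of being able to recompute the target at any point in time.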

Models

The models section defines the following models:

  • Yet Another Survival Gradient Boosting Tree (YASGBT): wrapper around scikit-learn's histogram-based gradient boosting trees that optimizes the Brier score by sampling observation times and recomputing the associated target y_c at each iteration.
    from models.yasgbt import YASGBTClassifier
  • Kaplan Tree and Kaplan Neighbor: ready-to-use models whose architecture is adapted from XGBSE with scikit-learn estimators.
    from models.kaplan_tree import KaplanTree
  • Meta GridBC and Tree transformer: wrappers to reproduce the XGBSEDebiasedBCE architecture.
    from sklearn.pipeline import make_pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestRegressor
    
    from models.tree_transformer import TreeTransformer
    from models.meta_grid_bc import MetaGridBC
    from model_selection.wrappers import PipelineWrapper
    
    tree_transformer = TreeTransformer(
        # ignores censoring, which introduces some bias
        # in exchange for a speed increase
        RandomForestRegressor()
    )
    
    meta_grid_bc = MetaGridBC(
        LogisticRegression(),
        verbose=False,
        n_jobs=4,
    )
    
    forest_grid_bc = make_pipeline(
        tree_transformer,
        meta_grid_bc,
    )
    
    forest_grid_bc = PipelineWrapper(
        forest_grid_bc,
        name="BiasedForestGridBC"
    )
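The time-sampling idea mentioned for YASGBT above can be illustrated with a small NumPy sketch. It is not the repository's implementation: the function name sample_time_targets, the uniform sampling of times and the masking of rows censored before the sampled time are assumptions made for the example. The actual wrapper may handle censoring differently, for instance with inverse probability of censoring weights.

```python
import numpy as np


def sample_time_targets(event, duration, rng):
    """Draw one time horizon per row and compute the binary target y_c (sketch)."""
    event = np.asarray(event, dtype=bool)
    duration = np.asarray(duration, dtype=float)
    # One random time horizon per observation, redrawn at every iteration.
    times = rng.uniform(0.0, duration.max(), size=duration.shape[0])
    # y_c is 1 when the event was observed at or before the sampled time.
    y_c = (event & (duration <= times)).astype(int)
    # The label is unknown when the row was censored before the sampled time.
    known = event | (duration > times)
    return times, y_c, known


rng = np.random.default_rng(0)
event = np.array([True, False, True, False, True])
duration = np.array([5.0, 8.0, 2.0, 3.0, 10.0])

for iteration in range(3):
    times, y_c, known = sample_time_targets(event, duration, rng)
    # A classifier would be updated here on (features, times) -> y_c,
    # restricted to the rows where the label is known.
    print(iteration, times.round(2), y_c, known)
```

Fitting a probabilistic classifier on targets of this kind, with the sampled time as an extra input feature, is one way to obtain time-dependent event probabilities that can be scored with the Brier score.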

Contributors

ogrisel, vincent-maladiere
