autonlab / auton-survival

Auton Survival - an open source package for Regression, Counterfactual Estimation, Evaluation and Phenotyping with Censored Time-to-Events

Home Page: http://autonlab.github.io/auton-survival

License: MIT License

Python 100.00%

survival-analysis reliability-analysis python data-science deep-learning machine-learning time-to-event counterfactual-inference regression causal-inference

auton-survival's Issues

Time varying

Hi!

Thank you for this work. Is there any possibility of using this library for survival analysis with time-varying data using deep learning approaches?

Thanks
Pablo

Instantiating DeepRecurrentSurvivalMachines object without parameters breaks many things

Instantiating DeepRecurrentSurvivalMachines object without parameters does not raise errors/warnings, but it breaks when calling fit.

This is currently valid:

model = DeepRecurrentSurvivalMachines()

But breaks when calling fit:

model.fit(x, t, e)

Generates the following error with python 3.7:

File "test.py", line 13, in <module>
    model.fit(x, t, e)
File "auton_survival/models/dsm/__init__.py", line 256, in fit
    model = self._gen_torch_model(inputdim, optimizer, risks=maxrisk)
File "auton_survival/models/dsm/__init__.py", line 535, in _gen_torch_model
risks=risks)
File "auton_survival/models/dsm/dsm_torch.py", line 271, in __init__
    self._init_dsm_layers(hidden)
File "auton_survival/models/dsm/dsm_torch.py", line 164, in _init_dsm_layers
    ) for r in range(self.risks)})
File "auton_survival/models/dsm/dsm_torch.py", line 164, in <dictcomp>
    ) for r in range(self.risks)})
File "anaconda3/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 76, in __init__
    self.weight = Parameter(torch.Tensor(out_features, in_features))
TypeError: new(): argument 'size' must be tuple of ints, but found element of type NoneType at pos 2

Error with python 3.9:

File "test.py", line 13, in <module>
    model.fit(x, t, e)
File "auton_survival/models/dsm/__init__.py", line 256, in fit
    model = self._gen_torch_model(inputdim, optimizer, risks=maxrisk)
File "auton_survival/models/dsm/__init__.py", line 526, in _gen_torch_model
    return DeepRecurrentSurvivalMachinesTorch(inputdim,
File "auton_survival/models/dsm/dsm_torch.py", line 271, in __init__
    self._init_dsm_layers(hidden)
File "auton_survival/models/dsm/dsm_torch.py", line 162, in _init_dsm_layers
    self.gate = nn.ModuleDict({str(r+1): nn.Sequential(
File "auton_survival/models/dsm/dsm_torch.py", line 163, in <dictcomp>
    nn.Linear(lastdim, self.k, bias=False)
File "/anaconda3/lib/python3.9/site-packages/torch/nn/modules/linear.py", line 81, in __init__
    self.weight = Parameter(torch.empty((out_features, in_features), **factory_kwargs))
TypeError: empty(): argument 'size' must be tuple of ints, but found element of type NoneType at pos 2

A quick fix could be raising an error in the DeepRecurrentSurvivalMachines constructor based on required parameters or using appropriate default parameters.
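
The first suggestion, failing early in the constructor, could look roughly like the sketch below. `RecurrentModelConfig` is a hypothetical stand-in written for illustration, not the library's class, and the assumption that a missing `hidden` size is what later reaches `nn.Linear` as `None` is inferred from the traceback rather than confirmed.

```python
class RecurrentModelConfig:
    """Hypothetical stand-in that validates required arguments up front."""

    ALLOWED_TYPES = {"LSTM", "GRU", "RNN"}  # assumed set of cell types

    def __init__(self, k=3, layers=None, hidden=None, typ="LSTM"):
        # Fail here, with a readable message, instead of deep inside torch.
        if not isinstance(hidden, int) or hidden <= 0:
            raise ValueError(
                f"`hidden` must be a positive integer, got {hidden!r}")
        if typ not in self.ALLOWED_TYPES:
            raise ValueError(
                f"`typ` must be one of {sorted(self.ALLOWED_TYPES)}, got {typ!r}")
        self.k = k
        self.layers = layers
        self.hidden = hidden
        self.typ = typ


# Usage: the missing argument is reported at construction time.
try:
    RecurrentModelConfig()
    error_message = None
except ValueError as err:
    error_message = str(err)

config = RecurrentModelConfig(hidden=32)
print(error_message)
```

The alternative mentioned above, choosing a sensible default for the missing parameter, avoids the exception but silently picks a model size on the user's behalf.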

RuntimeError: expected scalar type Float but found Double

Hi! Thanks for the great package!
Any ideas on why the same dataset works for all models, except the dsm one?

here is the error log:

At hyper-param {'distribution': 'Weibull', 'k': 2, 'layers': [100, 100], 'learning_rate': 1e-05}
At fold: 0
100%|███████████████████████████████████| 10000/10000 [00:05<00:00, 1678.94it/s]
100%|██████████████████████████████████████████| 50/50 [00:00<00:00, 222.40it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In [126], line 34
     24 # Instantiate an auton_survival Experiment 
     25 #dsm  cph
     26 #Survival model choices include:
   (...)
     30 # |      - 'rsf' : Random Survival Forests [1] model
     31 # |      - 'cph' : Cox Proportional Hazards [2] model
     32 experiment = SurvivalRegressionCV(model='dsm', num_folds=6, 
     33                                     hyperparam_grid=param_grid)
---> 34 model = experiment.fit(x, outcomes, metric='ibs',horizons=times)
     36 times = np.quantile(outcomes.time[outcomes.event==1], [0.25, 0.5, 0.6]).tolist()
     38 # Fit the `experiment` object with the specified Cox model.
     39 #experiment = estimators.SurvivalModel(model='dsm')
     40 #model = experiment.fit(x, outcomes)

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/experiments.py:164, in SurvivalRegressionCV.fit(self, features, outcomes, horizons, metric)
    162 model = SurvivalModel(self.model, random_seed=self.random_seed, **hyper_param)
    163 model.fit(features.loc[self.folds!=fold], outcomes.loc[self.folds!=fold])
--> 164 predictions = model.predict_survival(features.loc[self.folds==fold], times=horizons)
    166 score = survival_regression_metric(metric=self.metric, 
    167                                    outcomes=outcomes.loc[self.folds==fold],
    168                                    predictions=predictions,
    169                                    times=horizons,
    170                                    outcomes_train=outcomes.loc[self.folds!=fold])
    171 fold_scores.append(np.mean(score))

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/estimators.py:701, in SurvivalModel.predict_survival(self, features, times)
    699   return _predict_rsf(self._model, features, times)
    700 elif self.model == 'dsm':
--> 701   return _predict_dsm(self._model, features, times)
    702 elif self.model == 'dcph':
    703   return _predict_dcph(self._model, features, times)

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/estimators.py:420, in _predict_dsm(model, features, times)
    400 def _predict_dsm(model, features, times):
    402   """Predict survival at specified time(s) using the Deep Survival Machines.
    403 
    404   Parameters
   (...)
    417 
    418   """
--> 420   survival_predictions = model.predict_survival(x=features.values, t=times)
    421   survival_predictions = pd.DataFrame(survival_predictions, columns=times).T
    423   return __interpolate_missing_times(survival_predictions, times)

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/models/dsm/__init__.py:415, in DSMBase.predict_survival(self, x, t, risk)
    413   t = [t]
    414 if self.fitted:
--> 415   scores = losses.predict_cdf(self.torch_model, x, t, risk=str(risk))
    416   return np.exp(np.array(scores)).T
    417 else:

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/models/dsm/losses.py:518, in predict_cdf(model, x, t_horizon, risk)
    516 torch.no_grad()
    517 if model.dist == 'Weibull':
--> 518   return _weibull_cdf(model, x, t_horizon, risk)
    519 if model.dist == 'LogNormal':
    520   return _lognormal_cdf(model, x, t_horizon, risk)

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/models/dsm/losses.py:335, in _weibull_cdf(model, x, t_horizon, risk)
    331 def _weibull_cdf(model, x, t_horizon, risk='1'):
    333   squish = nn.LogSoftmax(dim=1)
--> 335   shape, scale, logits = model.forward(x, risk)
    336   logits = squish(logits)
    338   k_ = shape

File /mnt/survival_notebooks/../auton-survival-master/auton_survival/models/dsm/dsm_torch.py:204, in DeepSurvivalMachinesTorch.forward(self, x, risk)
    196 def forward(self, x, risk='1'):
    197   """The forward function that is called when data is passed through DSM.
    198 
    199   Args:
   (...)
    202 
    203   """
--> 204   xrep = self.embedding(x)
    205   dim = x.shape[0]
    206   return(self.act(self.shapeg[risk](xrep))+self.shape[risk].expand(dim, -1),
    207          self.act(self.scaleg[risk](xrep))+self.scale[risk].expand(dim, -1),
    208          self.gate[risk](xrep)/self.temp)

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/torch/nn/modules/container.py:139, in Sequential.forward(self, input)
    137 def forward(self, input):
    138     for module in self:
--> 139         input = module(input)
    140     return input

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/torch/nn/modules/module.py:1130, in Module._call_impl(self, *input, **kwargs)
   1126 # If we don't have any hooks, we want to skip the rest of the logic in
   1127 # this function, and just call forward.
   1128 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1129         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1130     return forward_call(*input, **kwargs)
   1131 # Do not call functions when jit is used
   1132 full_backward_hooks, non_full_backward_hooks = [], []

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: expected scalar type Float but found Double

and the code:

# auton_survival cross-validation experiment.
import numpy as np
from auton_survival.datasets import load_dataset
from auton_survival.preprocessing import Preprocessor
from auton_survival.metrics import survival_regression_metric
from auton_survival import estimators

param_grid = {'k' : [2],
              'distribution' : ['Weibull'],
              'learning_rate' : [1e-5],
              'layers' : [[100,100]]}

#outcomes, features = load_dataset(dataset='SUPPORT')
cat_feats = []
num_feats = list(features.columns)

preprocessor = Preprocessor(cat_feat_strat='ignore', num_feat_strat= 'mean') 
x = preprocessor.fit_transform(features, cat_feats=cat_feats, num_feats=num_feats,
                                one_hot=True, fill_value=-1)

x_val = preprocessor.fit_transform(features_val, cat_feats=cat_feats, num_feats=num_feats,
                                one_hot=True, fill_value=-1)

from auton_survival.experiments import SurvivalRegressionCV
# Instantiate an auton_survival Experiment 
#dsm  cph
#Survival model choices include:
# |      - 'dsm' : Deep Survival Machines [3] model
# |      - 'dcph' : Deep Cox Proportional Hazards [2] model
# |      - 'dcm' : Deep Cox Mixtures [4] model
# |      - 'rsf' : Random Survival Forests [1] model
# |      - 'cph' : Cox Proportional Hazards [2] model
experiment = SurvivalRegressionCV(model='dsm', num_folds=6, 
                                    hyperparam_grid=param_grid)
model = experiment.fit(x, outcomes, metric='ibs',horizons=times)

times = np.quantile(outcomes.time[outcomes.event==1], [0.25, 0.5, 0.6]).tolist()

# Fit the `experiment` object with the specified Cox model.
#experiment = estimators.SurvivalModel(model='dsm')
#model = experiment.fit(x, outcomes)

times_val = np.quantile(outcomes_val.time[outcomes_val.event==1], [0.25, 0.5, 0.6]).tolist()
out_risk = model.predict_risk(x_val, times)
out_survival = model.predict_survival(x_val, times)  

print("Times:",times_val)
print("Brier scores")
print(survival_regression_metric('brs', outcomes_val, 
                                     out_survival, 
                                     times=times_val))
    

print("Time Dependent Concordance Index")
print(survival_regression_metric('ctd', outcomes_val, 
                                     out_survival, 
                                     times=times_val))
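
For context, this class of error can be reproduced with plain PyTorch, independent of auton-survival: `nn.Linear` parameters are float32 by default, while NumPy and pandas default to float64. The sketch below only illustrates that mismatch and the usual cast. Whether casting the features to float32 before calling `fit` resolves the case reported here is an assumption, not something confirmed in this thread.

```python
import numpy as np
import torch
from torch import nn

layer = nn.Linear(4, 2)        # parameters are float32 by default
x64 = np.random.rand(3, 4)     # NumPy arrays default to float64

try:
    layer(torch.from_numpy(x64))   # float64 input, float32 weights
    mismatch_raised = False
except RuntimeError as err:
    mismatch_raised = True
    print("dtype mismatch:", err)

# Casting the input to float32 lets the same layer run.
x32 = torch.from_numpy(x64.astype(np.float32))
out = layer(x32)
print(out.dtype, tuple(out.shape))
```

With a pandas DataFrame the equivalent cast would be `features.astype('float32')`; the exact wording of the error message differs between PyTorch versions.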

Predict Score Function for Deep Cox Mixtures (PyTorch Implementation)

Hi! I was really inspired by this work and tried to use the DCM model in our project. I noticed that there is a predict_scores function in the TensorFlow implementation of DCM, but I am unable to find the same function in the PyTorch implementation. I would really appreciate your help here.
Thank you!

Deep Cox Mixtures Competing Risks

Is there any implementation of Deep Cox Mixtures that supports competing risks?

Usage of GPU

Hello. I'm Jamie from Yonsei graduate school.

I've been researching deep survival models using SEER data and came across yours.
I tried to use DSM (DeepSurvivalMachines), but it takes a very long time to produce results because my dataset is large. So I tried the gpu_support branch, but it did not help: training still ran only on the CPU. :(
As far as I understand, DSM is built on PyTorch, which supports GPUs. Do all branches, including 'GPU_support', currently run only on the CPU?
I look forward to hearing from you. Thanks for your sincere efforts.

Link with SHAP value

Hello. This is Jamie from South Korea.

https://github.com/slundberg/shap <- This is what I'd like to link with Deep Survival Machines. I tried KernelExplainer to obtain SHAP values, but it failed. As the authors of DSM, do you think any of the SHAP explainers can support DSM with survival data?

Thank you, as always, for your efforts!
Best,
Jamie.

Make code a package

Visualize the structure of the model and the training process

Sorry for the interruption. I would like to know whether there is any way to visualize the best model or the final model, similar to a summary or print function. It would be even better if TensorBoard were available for visualizing the training process.

Increasing validation loss in RDSM with time-varying data

Hi auton-survival community @chiragnagpal @Jeanselme @chufangao @salvaRC , I appreciate your contribution to time-varying survival analysis and thank you for making this library open to the public.

I am having an issue with training RDSM using my own custom dataset. I made sure that my dataset looks like the one you used in jupyter notebook tutorials; my T column is the remaining time to event, and E is an event indicator. However, during training I have been continuously seeing decreasing training loss but increasing validation loss.

Then I tried to see what happens if I train with PBC dataset that you used in demo notebooks, and I noticed the same situation there; decreasing training loss and increasing validation loss.

I haven't made any changes to the methodology. Is this something intrinsic to RDSM or am I doing something wrong? These are the logs from the model

random_seed is not defined in DeepRecurrentSurvivalMachines class

NameError: name 'random_seed' is not defined

https://github.com/autonlab/auton-survival/blob/4af6ebe2bdb24e8840c50de86c1864b8fa3c18a/auton_survival/models/dsm/__init__.py#L517

Possible fix:

  def __init__(self, k=3, layers=None, hidden=None,
               distribution="Weibull", temp=1000., discount=1.0, typ="LSTM", random_seed=0):
    super(DeepRecurrentSurvivalMachines, self).__init__(k=k,
                                                        layers=layers,
                                                        distribution=distribution,
                                                        temp=temp,
                                                        discount=discount,
                                                        random_seed=random_seed)

ValueError: Input estimate contains NaN

Hello! Thanks for such a unique package. I am trying to use DeepSurvivalMachines (note: on the same dataset, DeepCoxMixtures works without any issues). Here is the error log:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [24], line 12
      8 et_val = np.array([(e_val[i], t_val[i]) for i in range(len(e_val))],
      9                  dtype = [('e', bool), ('t', float)])
     11 for i, _ in enumerate(times):
---> 12     cis.append(concordance_index_ipcw(et_train, et_test, out_risk[:, i], times[i])[0])
     13 #brs.append(brier_score(et_train, et_test, out_survival, times)[1])
     14 roc_auc = []

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sksurv/metrics.py:324, in concordance_index_ipcw(survival_train, survival_test, estimate, tau, tied_tol)
    321     mask = test_time < tau
    322     survival_test = survival_test[mask]
--> 324 estimate = _check_estimate_1d(estimate, test_time)
    326 cens = CensoringDistributionEstimator()
    327 cens.fit(survival_train)

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sksurv/metrics.py:36, in _check_estimate_1d(estimate, test_time)
     35 def _check_estimate_1d(estimate, test_time):
---> 36     estimate = check_array(estimate, ensure_2d=False, input_name="estimate")
     37     if estimate.ndim != 1:
     38         raise ValueError(
     39             'Expected 1D array, got {:d}D array instead:\narray={}.\n'.format(
     40                 estimate.ndim, estimate))

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sklearn/utils/validation.py:899, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    893         raise ValueError(
    894             "Found array with dim %d. %s expected <= 2."
    895             % (array.ndim, estimator_name)
    896         )
    898     if force_all_finite:
--> 899         _assert_all_finite(
    900             array,
    901             input_name=input_name,
    902             estimator_name=estimator_name,
    903             allow_nan=force_all_finite == "allow-nan",
    904         )
    906 if ensure_min_samples > 0:
    907     n_samples = _num_samples(array)

File ~/miniconda3/envs/pycox310/lib/python3.10/site-packages/sklearn/utils/validation.py:146, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    124         if (
    125             not allow_nan
    126             and estimator_name
   (...)
    130             # Improve the error message on how to handle missing values in
    131             # scikit-learn.
    132             msg_err += (
    133                 f"\n{estimator_name} does not accept missing values"
    134                 " encoded as NaN natively. For supervised learning, you might want"
   (...)
    144                 "#estimators-that-handle-nan-values"
    145             )
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input estimate contains NaN.

Some more details:

I convert all feature columns to float64 and the outcome columns to int64:

features = df_features.copy().astype('float64')

outcomes = pd.DataFrame()
outcomes['event'] = pd.DataFrame(data_y)['Status'].astype('int64')
outcomes['time'] = pd.DataFrame(data_y)['Survival_in_days'].astype('int64')

features_val = df_features_val.copy().astype('float64')
outcomes_val = pd.DataFrame()
outcomes_val['event'] = pd.DataFrame(data_y_val)['Status'].astype('int64')
outcomes_val['time'] = pd.DataFrame(data_y_val)['Survival_in_days'].astype('int64')

Then training the model:

from auton_survival.models.dsm import DeepSurvivalMachines
from sklearn.model_selection import ParameterGrid

param_grid = {'k' : [3, 4, 6],
              'distribution' : ['LogNormal', 'Weibull'],
              'learning_rate' : [ 1e-4, 1e-3],
              'layers' : [ [], [100], [100, 100] ]
             }
params = ParameterGrid(param_grid)

models = []
for param in params:
    model = DeepSurvivalMachines(k = param['k'],
                                 distribution = param['distribution'],
                                 layers = param['layers'])
    
    # The fit method is called to train the model
    model.fit(x, outcomes.time, outcomes.event, iters = 100, learning_rate = param['learning_rate'])
    models.append([[model.compute_nll(x_val, outcomes_val.time, outcomes_val.event), model]])
best_model = min(models)
model = best_model[0][1]

And then it fails on the evaluation step:

cis = []
brs = []

et_train = np.array([(e_train[i], t_train[i]) for i in range(len(e_train))],
                 dtype = [('e', bool), ('t', float)])
et_test = np.array([(e_test[i], t_test[i]) for i in range(len(e_test))],
                 dtype = [('e', bool), ('t', float)])
et_val = np.array([(e_val[i], t_val[i]) for i in range(len(e_val))],
                 dtype = [('e', bool), ('t', float)])
times = np.quantile(outcomes.time[outcomes.event==1], [0.25, 0.5, 0.6]).tolist()
for i, _ in enumerate(times):
    cis.append(concordance_index_ipcw(et_train, et_test, out_risk[:, i], times[i])[0])

When I check out_risk[:, i], which was created by model.predict_risk(x_val, times), it is all NaNs:

[nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan,
       nan, nan, nan, nan, nan, nan, nan, nan, nan, nan]

Does that mean that the model did not converge? Any tips are appreciated!
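
One way to localize the problem before the metric call is to report how many predictions are NaN at each evaluation time. The sketch below uses only NumPy. `nan_report` is a hypothetical helper, and the small `out_risk` array is a stand-in for the output of `model.predict_risk(x_val, times)`, not real output. It does not by itself say whether the model failed to converge.

```python
import numpy as np


def nan_report(predictions, times):
    """Return the fraction of NaN predictions for each evaluation time."""
    predictions = np.asarray(predictions, dtype=float)
    if predictions.ndim != 2 or predictions.shape[1] != len(times):
        raise ValueError("expected predictions of shape (n_samples, len(times))")
    return {t: float(np.isnan(predictions[:, i]).mean())
            for i, t in enumerate(times)}


# Stand-in data: the second horizon is entirely NaN.
out_risk = np.array([[0.1, np.nan],
                     [0.2, np.nan],
                     [0.3, np.nan]])
report = nan_report(out_risk, times=[30.0, 90.0])
print(report)
```

Running the same check on the validation NLL of each candidate in the grid would show whether every hyperparameter setting produces NaNs or only some of them.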

`models.{dcm, sdcm, dsm, dcph}` should support fit on `pandas.Dataframe`

Where is dsm_api?

I can't find dsm_api. It would be helpful for debugging object instantiation of the recurrent deep survival machine class.

Add usage example notebooks

Currently we do not have notebooks that compare performance of DSM with other models.

We would need to compare against:
- DeepHit
- DeepSurv
- Random Survival Forest
- Cox PH

on Time Dependent CI and Brier Score.

DeepCoxMixtures is missing predict_risk method

The DeepCoxMixtures is missing the predict_risk method. Most other classes (I haven't fully checked) have a predict_risk method.

AUC time horizon

Thank you so much for the hard work
I have one question regarding the AUC metric.
And the time horizon.
The AUC is reported as an array and the times are reported as an array.

For example, if the time intervals are 0 to 1 , 1 to 2 and 2 to 3.

And the array is ( 0,1,2,3)

While the AUC is 0.6, 0.7, 0.8

So the array is (0.6, 0.7, 0.8)

Does this mean that the AUC for the period 0 till 3 is 0.8?
Or the AUC for the period 2 to 3 is 0.8?

The parameter of shape and scale become NaN

Hello, I use DeepRecurrentSurvivalMachines to model the length of stay in the ICU. The shape and scale parameters become NaN after the first backpropagation step. How should I handle the input data?

License issue

Hello,

I noticed that you use the MIT license for your library; however, in the dependencies, you have GPL3 libraries.

More exactly, the scikit-survival dependency is distributed under the GPL-3.0 License.

If you incorporate this kind of license, you also need to be GPL-3 (reference)

This can be problematic in the open-source community, for anyone using your library.

Don't get me wrong, I want to use your library, but your dependencies would force me to also distribute under the GPL-3 license.
Please remove the GPL-3 dependencies or update your license accordingly.

metrics.py indexes problem

Hi,
I am using your package for the development of a dcph model. In order to see the performances on a validation set, I call the functions from metrics.py; as an example:

brier = survival_regression_metric('brs', outcomes = validation_outcome,
                                      predictions = out_survival,
                                      times=times, outcomes_train=training_outcome)

However, it throws me the following error:

File "/Users/micheleatzeni/PycharmProjects/brainteaser/auton-survival/auton_survival/metrics.py", line 235, in survival_regression_metric
    return _metric(survival_train, survival_test, predictions, times)
File "/Users/micheleatzeni/PycharmProjects/brainteaser/auton-survival/auton_survival/metrics.py", line 265, in cumulative_dynamic_auc
    return metrics.cumulative_dynamic_auc(survival_train, survival_test[idx], 1-predictions[idx], times)[0]
IndexError: index 628 is out of bounds for axis 0 with size 628

Since python lists as well as numpy arrays are 0-based, the largest index in an array of size 628 would be 627.
So looking at the specific metrics.py functions, I think the possible problem is in the return statement (idx):

def _brier_score(survival_train, survival_test, predictions, times, random_seed=None):

  idx = np.arange(len(predictions))
  if random_seed is not None:
    np.random.seed(random_seed)
    idx = np.random.choice(idx, len(predictions), replace=True)

  return metrics.brier_score(survival_train, survival_test[idx], predictions[idx], times)[-1]

Should it be idx-1?
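
A quick NumPy check may help narrow this down. Indices drawn from `np.arange(n)` already lie in `[0, n-1]`, so subtracting 1 should not be needed. An index equal to the array size instead suggests, though it does not prove, that `predictions` has more rows than `survival_test`. The sketch below uses stand-in arrays and `np.random.default_rng` rather than the library's own seeding, purely for illustration.

```python
import numpy as np

n = 628
rng = np.random.default_rng(0)

# Bootstrap indices drawn from arange(n) are always valid positions 0..n-1.
idx = rng.choice(np.arange(n), size=n, replace=True)
in_bounds = bool(idx.min() >= 0 and idx.max() <= n - 1)

# Stand-in arrays with mismatched lengths: 629 predictions, 628 test outcomes.
survival_test = np.zeros(n)
predictions = np.zeros(n + 1)
bad_idx = np.arange(len(predictions))   # indices built from the longer array

try:
    survival_test[bad_idx]
    message = None
except IndexError as err:
    message = str(err)

print(in_bounds, message)
```

Comparing `len(predictions)` with `len(survival_test)` just before the metric call would confirm or rule out this explanation.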

Need logging module for debugging

We need to be able to log events (Info, warnings etc) at various levels for debugging purposes.

install problem

Hello,
I want to install the package. Please tell me how to install it with conda or pip.
Thank you!

Implementation for time-varying survival analysis

Hello,

Thanks for all the good work put into this package; it is definitely a big contribution to the community. I am currently looking into this package to model moving dates in a real estate context.

I wanted to know about the time-varying implementation of the RDSM model. I am currently working with time series data and time-to-event outcomes, so I would like to know what caveats to keep in mind when implementing RDSM in comparison to the baseline DSM. Are there any particular steps that need care during pre-processing (apart from the obvious expanded dataset), such as high dimensionality, categorical variables, missing values, or anything else?

For anyone wondering about the time-varying implementation, there is an example added:
https://github.com/autonlab/auton-survival/blob/master/examples/RDSM%20on%20PBC%20Dataset.ipynb

and a publication:
http://proceedings.mlr.press/v146/nagpal21a/nagpal21a.pdf

Add support for GPU execution

Currently the torch implementations use CPUs, it should be easy to change the API to allow for GPU training.
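
For reference, the usual PyTorch pattern for device selection is sketched below. It is generic PyTorch, not auton-survival's API: the small `nn.Sequential` network is a stand-in, and how a device argument would be exposed through the library's `fit` and `predict` methods is left open.

```python
import torch
from torch import nn

# Use a GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in network; parameters are moved to the chosen device.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1)).to(device)

# Inputs must live on the same device as the parameters.
x = torch.randn(32, 8, device=device)
with torch.no_grad():
    out = model(x)

print(out.device.type, tuple(out.shape))
```

Every tensor passed to the model, including batches created inside the training loop, has to be moved the same way, which is where most of the API work would likely be.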

survival_diff_metric is not defined

Hello!
In the README there is an import of survival_diff_metric, but it is not defined in the metrics.py file.

what is "uncensored time control" and "uncensored time treated"

Hi there! Thanks for the great repo!
I have a question: what are "uncensored time control" and "uncensored time treated" in the outcomes DataFrame of the CMHE notebook example, and how are they derived? Thanks! https://nbviewer.org/github/autonlab/auton-survival/blob/master/examples/Demo%20of%20CMHE%20on%20Synthetic%20Data.ipynb

ECE for survival as used in Deep Cox Mixtures Paper

Can you provide (via response to this message or via gist) the source code used to calculate the expected calibration error as used in the Deep Cox Mixtures paper?

Applying DSM on competing risks data

Hello!

I am trying to apply Deep Survival Machine to my dataset which has 23 events.
And the results are not as good as I expected.

In the DSM paper, with SEER data, DSM and DeepHit c-index results are pretty comparable,
but with my data DeepHit c-index results are about 10% better than DSM.

I made code based on your DSM example notebook code.
I only changed the evaluation part since the example notebook code is for a single event.

Is it possible that DSM is unsuitable for data with many events?
Or just because my code has a problem?

Here is my code.

import numpy as np
from process_severance import make_data
from auton_survival.preprocessing import Preprocessor
from sklearn.model_selection import train_test_split
from sklearn.model_selection import KFold
from sklearn.model_selection import ParameterGrid
from auton_survival.models.dsm import DeepSurvivalMachines
from sksurv.metrics import concordance_index_ipcw, brier_score

data_path = './data/dummy_data.csv'
input, Y, features = make_data(data_path)

cat_feats = ["SEX1"]
num_feats = features
num_feats.remove('SEX1')
        
features = Preprocessor().fit_transform(input, cat_feats=cat_feats, num_feats=num_feats)

horizons = [0.25, 0.5, 0.75]
times = np.nanquantile(Y["event_time"], horizons).tolist()

x, t, e = input.to_numpy(), Y["event_time"].to_numpy(), Y["label"].to_numpy()

kf = KFold(n_splits=5, shuffle=True, random_state=1234)
fold = 0

for train_index, test_index in kf.split(x):
    x_train     = x[train_index]
    t_train     = t[train_index]
    e_train     = e[train_index]
    x_test      = x[test_index]
    t_test      = t[test_index]
    e_test      = e[test_index]

    (x_train, x_val, t_train,t_val, e_train,e_val)  = train_test_split(x_train, t_train, e_train, test_size=0.20, random_state=1234) 
    

    param_grid = {'k' : [3, 4, 6],
            'distribution' : ['LogNormal', 'Weibull'],
            'learning_rate' : [ 1e-4, 1e-3],
            'layers' : [ [], [100], [100, 100] ]
            }
    params = ParameterGrid(param_grid)

    models = []
    for param in params:
        model = DeepSurvivalMachines(k = param['k'],
                                    distribution = param['distribution'],
                                    layers = param['layers'])
        # The fit method is called to train the model
        model.fit(x_train, t_train, e_train, iters = 100, learning_rate = param['learning_rate'])
        models.append([[model.compute_nll(x_val, t_val, e_val), model, param]])
        #break
    best_model = min(models)

    out_risk = model.predict_risk(x_test, times)
    out_survival = model.predict_survival(x_test, times)

    for ev in range(23):
        cis = []
        brs = []
        e_train_new = (e_train == ev+1)
        e_test_new = (e_test == ev+1)

        et_train = np.array([(e_train_new[i], t_train[i]) for i in range(len(e_train_new))],
                        dtype = [('e', bool), ('t', float)])
        et_test = np.array([(e_test_new[i], t_test[i]) for i in range(len(e_test_new))],
                        dtype = [('e', bool), ('t', float)])

        for i, _ in enumerate(times):
            try:
                cis.append(concordance_index_ipcw(et_train, et_test, out_risk[:, i], times[i])[0])
            except:
                cis.append(np.nan)
        try:
            brs.append(brier_score(et_train, et_test, out_survival, times)[1])
        except:
            brs.append([np.nan,np.nan,np.nan])

        for i, horizon in enumerate(horizons):
            print(f"For {horizon} quantile,")
            print("TD Concordance Index:", cis[i])
            print("Brier Score:", brs[0][i])
    fold = fold + 1
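
One detail visible in the listing: `best_model` is computed, but the predictions are taken from `model`, which is the last candidate trained rather than the selected one. A minimal sketch of selecting by validation NLL and unpacking the winner follows. `select_best` and the string placeholders for fitted models are hypothetical and used only to keep the example self-contained. Whether this accounts for the gap to DeepHit is not established here.

```python
import math


def select_best(candidates):
    """Pick the (nll, model, params) entry with the lowest finite NLL."""
    finite = [c for c in candidates if math.isfinite(c[0])]
    if not finite:
        raise ValueError("no candidate has a finite validation NLL")
    # Compare on the NLL only, never on the model objects themselves.
    return min(finite, key=lambda c: c[0])


# Placeholders standing in for fitted models and their validation NLLs.
candidates = [
    (float("nan"), "model_a", {"k": 3}),
    (2.4, "model_b", {"k": 4}),
    (1.9, "model_c", {"k": 6}),
]
best_nll, best_model, best_params = select_best(candidates)
print(best_nll, best_model, best_params)
```

Predictions for the test fold would then be taken from `best_model`. Since `predict_survival` accepts a `risk` argument, it is also worth checking that each event is evaluated against the predictions for its own risk.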

Problems in DeepCoxMixture with possible solutions

When I used DeepCoxMixture on my synthetic data I found some problems:

Patience is too low. It cannot be controlled through the fit method, so I had to change it manually to 50. I also changed the code in that part slightly, so that patience only takes the best result into account:

if valcn > valc:
  patience_ += 1
else:
  patience_ = 0
  valc = valcn

In some executions I get a SIGSEGV error. After some digging I found that the error is caused by the UnivariateSpline module. I also found that using this spline causes some undesirable effects, such as producing negative values. I have replaced it with the more stable PchipInterpolator, which preserves the monotonicity of the curve. I have obtained more stable results with this approach. I have only changed this function:

def fit_spline(t, surv, s=1e-4):
  # return UnivariateSpline(t, surv, s=s, ext=3, k=1)
  return PchipInterpolator(t, surv)

I also suggest changing the repair_probs function to prevent some infinite values that can appear:

def repair_probs(probs):
  probs[torch.isnan(probs)] = -10
  probs[probs>10] = 10
  probs[probs<-10] = -10
  return probs
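
The monotonicity claim above can be checked on a small example. This is only a sketch: the time grid and survival values are made up for illustration, and it assumes SciPy is installed. It interpolates a decreasing survival curve with `PchipInterpolator` and checks that the result never increases and stays within [0, 1].

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Hypothetical survival curve: strictly increasing times, non-increasing survival.
t = np.array([0.0, 1.0, 2.0, 4.0, 8.0])
surv = np.array([1.0, 0.9, 0.5, 0.45, 0.1])

# PCHIP is a shape-preserving cubic interpolant: on monotone data it stays monotone.
spline = PchipInterpolator(t, surv)

grid = np.linspace(t[0], t[-1], 200)
values = spline(grid)

# The interpolated curve should never increase and should not overshoot the data range.
is_monotone = bool(np.all(np.diff(values) <= 1e-12))
in_range = bool(values.min() >= -1e-12 and values.max() <= 1.0 + 1e-12)
```

Note that `PchipInterpolator` requires strictly increasing `t` and has no smoothing parameter, so the `s` argument of `fit_spline` is unused after this change.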

C-index calculation for RDSM

Hi there,

In your impressive work, you compared the performance of the longitudinal model RDSM with several time-independent models. And RDSM achieved the best performance in most cases.

In the example notebook demonstrating the usage of RDSM, I noticed that the C-index is calculated using something like:

et_train = np.array([(e_train[i][j], t_train[i][j]) for i in range(len(e_train)) for j in range(len(e_train[i]))], dtype = [('e', bool), ('t', float)])

the input essentially treats each time step of a patient as an individual sample. This is rather different from the evaluation of DSM, and the interpretation of the resulting C-index should be different as well.

So I'm wondering: how did you evaluate the performance of RDSM and DSM in your paper?
Thank you very much.
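
To make the distinction concrete, here is a small sketch with made-up data. It builds the structured array in the flattened way quoted above and, for comparison, with one row per patient. Using each patient's first time step is purely an assumption for illustration; it is not taken from the notebook or the paper.

```python
import numpy as np

# Hypothetical nested data: one array per patient, one entry per time step.
e_train = [np.array([1, 1]), np.array([0, 0, 0])]
t_train = [np.array([1.1, 0.6]), np.array([14.2, 13.7, 13.2])]

dtype = [('e', bool), ('t', float)]

# Flattened: every time step of every patient becomes its own sample.
et_flat = np.array([(e_train[i][j], t_train[i][j])
                    for i in range(len(e_train))
                    for j in range(len(e_train[i]))],
                   dtype=dtype)

# Per patient (assumption): keep only the first time step of each patient.
et_first = np.array([(e[0], t[0]) for e, t in zip(e_train, t_train)],
                    dtype=dtype)
```

With the flattened version, patients with more visits contribute more comparable pairs, which is why the resulting C-index is not directly comparable to a one-row-per-patient evaluation.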

DSM / SUPPORT - Paper Results

Hi!

Thanks for creating this great repo with such clean code and documentation! It has been very helpful with understanding the papers, building benchmarks, and using nice survival datasets, easily!

I have a question about DSM and the SUPPORT dataset: I've been using your code to reproduce the results of https://arxiv.org/pdf/2003.01176.pdf, but the scores that I get (C-index, Brier score, etc.) are a little different from those in the paper. I would appreciate any help or tips (preprocessing, model usage, hyperparameters, etc.)!

Best!

Readme.md needs to be cleaned

Loss function and the pretrained model (DSM)

Thank you for this wonderful job.

I have some questions about the loss function.

According to the original paper of DSM (Deep Survival Machines: Fully Parametric Survival Regression and Representation Learning for Censored Data with Competing Risks. IEEE Journal of Biomedical & Health Informatics (2021)), the loss function consists of three parts, i.e., uncensored, censoring and prior.

I have found the uncensored loss and censoring loss in the code losses.py, but I haven't seen anything related to the prior loss. In the code

auton-survival/auton_survival/models/dsm/utilities.py

Line 132 in eb88f79

premodel = pretrain_dsm(model,

I noticed that during training of the DSM model, a pretrained model is first trained to fill in the shape and scale values of DSM. (I think these two parameters correspond to [symbol missing] and [symbol missing].) This pretraining process is not mentioned in the paper.

So my questions are:

Is the prior loss implemented in the code? If so, which part of the code implements it?

Is the pretrained model related to the prior loss? (In the generative story of the paper, it says: "the set of parameters $\{\tilde{\beta}_k\}_{k=1}^{K}$ and $\{\tilde{\eta}_k\}_{k=1}^{K}$ are drawn from the prior [symbol missing] and [symbol missing].")

(This one is not about the loss.) Why does [symbol missing] follow [symbol missing]? Could you give me some hints?

Thank you very much.

Error evaluating AUC

Dear authors,

Thank you very much for the hard work.
I am trying to evaluate the AUC.
It works very well for the training and validation datasets, but not for the test dataset, where I get the following error:

censoring survival function is zero at one or more time points

How can I fix this problem?
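
This message appears to come from the inverse-probability-of-censoring weights used by scikit-survival's metrics. A common trigger is test-set times that reach or exceed the largest time seen in the training set, where the estimated censoring survival function drops to zero. A possible workaround, sketched below under that assumption, is to restrict the test samples and the evaluation horizons to the training follow-up range. The function name `restrict_to_train_range` is hypothetical and is not part of auton-survival.

```python
import numpy as np

def restrict_to_train_range(t_train, t_test, times):
    """Return a mask over test samples and the evaluation times that lie
    strictly below the largest training time (assumed cause of the error)."""
    t_train = np.asarray(t_train, dtype=float)
    t_test = np.asarray(t_test, dtype=float)
    times = np.asarray(times, dtype=float)

    t_max = t_train.max()
    keep_test = t_test < t_max          # test samples inside the training range
    kept_times = times[times < t_max]   # horizons inside the training range
    return keep_test, kept_times

# Made-up example values.
keep_test, kept_times = restrict_to_train_range(
    t_train=[1.0, 5.0, 10.0], t_test=[2.0, 12.0, 9.0], times=[3.0, 8.0, 11.0])
```

The mask can then be applied to the test features, times, and event indicators before computing the metric. Whether dropping those samples is acceptable depends on the study design.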

Need Better Test Coverage

Currently dsm does not have adequate test coverage. We would need more unit tests to improve coverage.

Recurrent DSM on PBC Dataset notebook is broken

I have to go back to commit c454774 to make it work

Installing with pip

Hi,

pip install https://github.com/autonlab/auton-survival.git - doesn't work for me.

Looking for your advice.

Thanks.

Any way to visualize DSM features? SHAP DeepExplainer does not work with outputs

Hi,

I'm trying to find a way to interpret the model outputs by visualizing feature importance in the output. Is there a way to do that?

shap.DeepExplainer usually works with PyTorch models, but I've been having problems making it work with a DSM model, whose output is a tuple (scale, shape, logits).
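
One model-agnostic alternative, which sidesteps the tuple output entirely, is a permutation-based sensitivity check on a scalar prediction such as the risk at a fixed time. The sketch below is not SHAP and is not part of auton-survival. `permutation_sensitivity` is a hypothetical helper, and it measures how much predictions change when a feature is shuffled, not the change in a loss.

```python
import numpy as np

def permutation_sensitivity(predict_fn, x, n_repeats=5, seed=0):
    """Mean absolute change in predictions when each feature column is shuffled."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    baseline = predict_fn(x)
    importance = np.zeros(x.shape[1])
    for j in range(x.shape[1]):
        deltas = []
        for _ in range(n_repeats):
            x_perm = x.copy()
            # Shuffle only column j, breaking its link to the prediction.
            x_perm[:, j] = rng.permutation(x_perm[:, j])
            deltas.append(np.mean(np.abs(predict_fn(x_perm) - baseline)))
        importance[j] = np.mean(deltas)
    return importance

# Toy check with a stand-in predictor that depends only on the first feature.
x_demo = np.random.default_rng(1).normal(size=(50, 3))
scores = permutation_sensitivity(lambda x: 2.0 * x[:, 0], x_demo)
```

For a fitted DSM model, `predict_fn` could be something like `lambda x: model.predict_risk(x, t_median).ravel()`, assuming `predict_risk` returns one value per sample at the chosen time.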

Run demonstration data error

Hello, I want to apply DSM to the construction of clinical prediction models. When running the demonstration data, an error occurred. Could you see where the problem lies? Thank you for your reply.

image_path = os.path.join(Base_Dir, filename)
before = int(temp[0])
after = int(temp[1])
failure = int(temp[2])
image_path.append(image_path)

NameError: name 'temp' is not defined

Early stopping?

Hi,

I was wondering if there's an early stopping parameter for DSM that's already implemented, or an ideal workaround that you would use now to get it going. I've been digging through the source code and can't seem to find one. I imagine it's not too complicated, as you're extending PyTorch, but I'm also not sure what the best way to extend your functions would be.
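
A generic patience-based early stopping helper can be sketched as follows. It is independent of auton-survival: the class `EarlyStopping` and the function `stopping_epoch` are hypothetical names, and how the validation loss is obtained each epoch is left to the caller.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` checks."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record a new validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss      # new best result: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1      # no improvement over the best so far
        return self.bad_epochs >= self.patience


def stopping_epoch(losses, patience=3):
    """Return the index at which training would stop, or None if it never does."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(losses):
        if stopper.step(loss):
            return epoch
    return None
```

Comparing against the best loss so far, and not the previous epoch's loss, keeps a slow upward drift from resetting the counter.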

test_pbc_dataset and test_framingham_dataset unit tests are wrongly indented

test_pbc_dataset and test_framingham_dataset unit tests are wrongly indented, so they fail to execute when running pytest. Fixing the indentation allows all 3 unit tests to run.

auton-survival/tests/test_dsm.py

Lines 39 to 84 in b9946a4

    
    def test_pbc_dataset(self):
      """Test function to load and test the PBC dataset.
      """

      x, t, e = datasets.load_dataset('PBC')
      t_median = np.median(t[e==1])

      self.assertIsInstance(x, np.ndarray)
      self.assertIsInstance(t, np.ndarray)
      self.assertIsInstance(e, np.ndarray)

      self.assertEqual(x.shape, (1945, 25))
      self.assertEqual(t.shape, (1945,))
      self.assertEqual(e.shape, (1945,))

      model = DeepSurvivalMachines()
      self.assertIsInstance(model, DeepSurvivalMachines)
      model.fit(x, t, e, iters=10)
      self.assertIsInstance(model.torch_model,
                            DeepSurvivalMachinesTorch)
      risk_score = model.predict_risk(x, t_median)
      survival_probability = model.predict_survival(x, t_median)
      np.testing.assert_equal((risk_score+survival_probability).all(), 1.0)

    def test_framingham_dataset(self):
      """Test function to load and test the Framingham dataset.
      """
      x, t, e = datasets.load_dataset('FRAMINGHAM')
      t_median = np.median(t)

      self.assertIsInstance(x, np.ndarray)
      self.assertIsInstance(t, np.ndarray)
      self.assertIsInstance(e, np.ndarray)

      self.assertEqual(x.shape, (11627, 18))
      self.assertEqual(t.shape, (11627,))
      self.assertEqual(e.shape, (11627,))

      model = DeepSurvivalMachines()
      self.assertIsInstance(model, DeepSurvivalMachines)
      model.fit(x, t, e, iters=10)
      self.assertIsInstance(model.torch_model,
                            DeepSurvivalMachinesTorch)
      risk_score = model.predict_risk(x, t_median)
      survival_probability = model.predict_survival(x, t_median)
      np.testing.assert_equal((risk_score+survival_probability).all(), 1.0)

typo

on https://autonlab.github.io/auton-survival/, installation instructions are:

foo@bar:$ git clone https://github.com/autonlab/auton_survival
foo@bar:$ pip install -r requirements.txt

It should be https://github.com/autonlab/auton-survival

ValueError: optimizer got an empty parameter list

param_grid = {'k' : [3, 4, 6],
              'distribution' : ['LogNormal', 'Weibull'],
              'learning_rate' : [1e-4, 1e-3],
              'batch_size': [64, 128],
              'hidden': [50, 100],
              'layers': [3, 2, 1],
              'typ': ['LSTM', 'GRU', 'RNN'],
              'optim': ['Adam', 'SGD'],
             }
params = ParameterGrid(param_grid)

models = []
for param in params:
    model = DeepRecurrentSurvivalMachines(k = param['k'],
                                          distribution = param['distribution'],
                                          hidden = param['hidden'], 
                                          typ = param['typ'],
                                          layers = param['layers'])
    # The fit method is called to train the model
    model.fit(x_train, t_train, e_train, iters = 1, learning_rate=param['learning_rate'], 
             batch_size=param['batch_size'], optimizer=param['optim'])
    models.append([[model.compute_nll(x_valid, t_valid, e_valid), model]])

best_model = min(models)
model = best_model[0][1]

As soon as I ran the script above, I got the error below. What should I do to solve this problem?

---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
8 # The fit method is called to train the model
9 model.fit(x_train, t_train, e_train, iters = 1, learning_rate=param['learning_rate'],
---> 10 batch_size=param['batch_size'], optimizer=param['optim'])
11 models.append([[model.compute_nll(x_valid, t_valid, e_valid), model]])
12

~/data/nas125/hepa/codes/auton_survival/models/dsm/__init__.py in fit(self, x, t, e, vsize, val_data, iters, learning_rate, batch_size, elbo, optimizer)
265 elbo=elbo,
266 bs=batch_size,
--> 267 random_seed=self.random_seed)
268
269 self.torch_model = model.eval()

~/data/nas125/hepa/codes/auton_survival/models/dsm/utilities.py in train_dsm(model, x_train, t_train, e_train, x_valid, t_valid, e_valid, n_iter, lr, elbo, bs, random_seed)
137 n_iter=10000,
138 lr=1e-2,
--> 139 thres=1e-4)
140
141 for r in range(model.risks):

~/data/nas125/hepa/codes/auton_survival/models/dsm/utilities.py in pretrain_dsm(model, t_train, e_train, t_valid, e_valid, n_iter, lr, thres)
61 premodel.double()
62
---> 63 optimizer = get_optimizer(premodel, lr)
64
65 oldcost = float('inf')

~/data/nas125/hepa/codes/auton_survival/models/dsm/utilities.py in get_optimizer(model, lr)
43
44 if model.optimizer == 'Adam':
---> 45 return torch.optim.Adam(model.parameters(), lr=lr)
46 elif model.optimizer == 'SGD':
47 return torch.optim.SGD(model.parameters(), lr=lr)

~/anaconda3/envs/ml/lib/python3.7/site-packages/torch/optim/adam.py in __init__(self, params, lr, betas, eps, weight_decay, amsgrad)
40 defaults = dict(lr=lr, betas=betas, eps=eps,
41 weight_decay=weight_decay, amsgrad=amsgrad)
---> 42 super(Adam, self).__init__(params, defaults)
43
44 def __setstate__(self, state):

~/anaconda3/envs/ml/lib/python3.7/site-packages/torch/optim/optimizer.py in __init__(self, params, defaults)
44 param_groups = list(params)
45 if len(param_groups) == 0:
---> 46 raise ValueError("optimizer got an empty parameter list")
47 if not isinstance(param_groups[0], dict):
48 param_groups = [{'params': param_groups}]

ValueError: optimizer got an empty parameter list

Measure Harrell C score

Hello,
Is there a way to evaluate the model using Harrell's concordance index (C-index) rather than the time-dependent concordance?
What would the code for that be?
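
A plain NumPy sketch of Harrell's C-index is shown below. It is a simplified illustration and not the library's implementation: `harrell_c_index` is a hypothetical name, pairs with tied times are ignored, and ties in the risk score count as half concordant.

```python
import numpy as np

def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable pairs in which the subject who fails earlier
    has the higher predicted risk (ties in risk count as 0.5)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    risk_scores = np.asarray(risk_scores, dtype=float)

    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a pair is only comparable if the earlier time is an event
        later = times > times[i]  # subjects known to outlive subject i
        comparable += int(later.sum())
        concordant += float((risk_scores[i] > risk_scores[later]).sum())
        concordant += 0.5 * float((risk_scores[i] == risk_scores[later]).sum())

    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Made-up example: risk decreases as survival time increases.
c_perfect = harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [4.0, 3.0, 2.0, 1.0])
c_reversed = harrell_c_index([1, 2, 3, 4], [1, 1, 0, 1], [1.0, 2.0, 3.0, 4.0])
```

The risk score could be the output of `model.predict_risk(x, t)` at a single chosen time, flattened to one value per subject. Unlike the time-dependent IPCW version, this estimate does not correct for the censoring distribution.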

Documentation is broken

Hi,

I was trying to check out the documentation, but I am getting a 404.
docs

Format of time varying dataset

Hello,

I want to use RDSM, but I would like to confirm a few things regarding the formatting:

Why does the time variable decrease as you move through the observations for a particular ID? Normally, when I use a survival package, time is strictly increasing. For example:

id	status2	time	drug_D-penicil	sex_female
1	1	1.095170	1.0	1.0
1	1	0.569489	1.0	1.0
2	0	14.152338	1.0	1.0
2	0	13.654036	1.0	1.0
2	0	13.152995	1.0	1.0
2	0	12.049611	1.0	1.0
2	0	9.251451	1.0	1.0
2	0	8.263060	1.0	1.0
2	0	7.266455	1.0	1.0
2	0	6.261636	1.0	1.0
2	0	5.319790	1.0	1.0
3	1	2.770781	1.0	0.0
3	1	2.288906	1.0	0.0
3	1	1.774176	1.0	0.0
3	1	0.736502	1.0	0.0
4	1	5.270507	1.0	1.0
4	1	4.755777	1.0	1.0
4	1	4.251999	1.0	1.0
4	1	3.274559	1.0	1.0
4	1	1.837148	1.0	1.0

Does each record per ID represent a change in one of the covariates, or is each record just an increment in time, regardless of whether a covariate changed?
Is the format you always have to feed the model a list containing one numpy matrix per ID, with one row per record?

Thank you :)

Amazing package!
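
For the last question, a long-format table like the one above can be split into per-ID arrays with a few lines of NumPy. This is a sketch only: `group_by_id` is a hypothetical helper, and the list-of-arrays layout is an assumption based on how the RDSM example indexes `e_train[i][j]`.

```python
import numpy as np

def group_by_id(ids, features, times, events):
    """Split long-format rows into one array per subject, keeping row order."""
    ids = np.asarray(ids)
    features = np.asarray(features, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events)

    x, t, e = [], [], []
    for pid in np.unique(ids):
        rows = ids == pid          # boolean mask preserves the original order
        x.append(features[rows])
        t.append(times[rows])
        e.append(events[rows])
    return x, t, e

# First rows of the table above: id, status2, time, and two covariates.
ids = [1, 1, 2, 2, 2]
status = [1, 1, 0, 0, 0]
time = [1.095170, 0.569489, 14.152338, 13.654036, 13.152995]
covariates = [[1.0, 1.0]] * 5

x, t, e = group_by_id(ids, covariates, time, status)
```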

mat1 and mat2 shapes cannot be multiplied (10424x58 and 57x100)

Dear authors,

When I run Deep Cox Mixtures and Deep Cox Proportional Hazards on my data, I get the following error:
RuntimeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_14400\3101778186.py in
23
24 # Obtain survival probabilities for validation set and compute the Integrated Brier Score
---> 25 predictions_val = model.predict_survival(x_val, times)
26 metric_val = survival_regression_metric('ibs', y_val, predictions_val, times, y_tr)
27 models.append([metric_val, model])

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\estimators.py in predict_survival(self, features, times)
701 return _predict_dsm(self._model, features, times)
702 elif self.model == 'dcph':
--> 703 return _predict_dcph(self._model, features, times)
704 elif self.model == 'dcm':
705 return _predict_dcm(self._model, features, times)

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\estimators.py in _predict_dcph(model, features, times)
232 times = times.ravel().tolist()
233
--> 234 return model.predict_survival(x=features.values, t=times)
235
236 def _fit_cph(features, outcomes, val_data, random_seed, **hyperparams):

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\models\cph\__init__.py in predict_survival(self, x, t)
231 t = [t]
232
--> 233 scores = predict_survival(self.torch_model, x, t)
234 return scores
235

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\models\cph\dcph_utilities.py in predict_survival(model, x, t)
144
145 model, breslow_spline = model
--> 146 lrisks = model(x).detach().cpu().numpy()
147
148 unique_times = breslow_spline.baseline_survival_.x

~\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
1193 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194 return forward_call(*input, **kwargs)
1195 # Do not call functions when jit is used
1196 full_backward_hooks, non_full_backward_hooks = [], []

~\Desktop\auton_survival 2\auton-survival-master\auton_survival\models\cph\dcph_torch.py in forward(self, x)
27 def forward(self, x):
28
---> 29 return self.expert(self.embedding(x))
30
31 class DeepRecurrentCoxPHTorch(DeepCoxPHTorch):

~\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\container.py in forward(self, input)
202 def forward(self, input):
203 for module in self:
--> 204 input = module(input)
205 return input
206

~\AppData\Roaming\Python\Python39\site-packages\torch\nn\modules\linear.py in forward(self, input)
112
113 def forward(self, input: Tensor) -> Tensor:
--> 114 return F.linear(input, self.weight, self.bias)
115
116 def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (10424x58 and 57x100)

I wonder if you could kindly help me with that.
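
The shapes in the message say that the input has 58 columns while the first layer was built for 57, so the validation features presumably contain a column that the training features did not have (for example, an extra one-hot category). Under that assumption, one way to line the two up is to reindex the validation frame on the training columns. The helper `align_columns` is hypothetical, and the sketch assumes pandas DataFrames.

```python
import pandas as pd

def align_columns(train_df, other_df, fill_value=0.0):
    """Give `other_df` exactly the columns of `train_df`, in the same order.
    Extra columns are dropped; missing ones are filled with `fill_value`."""
    return other_df.reindex(columns=train_df.columns, fill_value=fill_value)

# Made-up frames: the validation set has one extra column and lacks another.
x_tr_demo = pd.DataFrame({"age": [60.0, 71.0], "sex_female": [1.0, 0.0]})
x_val_demo = pd.DataFrame({"age": [55.0], "race_other": [1.0]})

x_val_aligned = align_columns(x_tr_demo, x_val_demo)
```

Comparing `x_tr.shape[1]` with `x_val.shape[1]` before calling `predict_survival` is a quick way to confirm whether this is the cause.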

Importance Sampling: `auton_survival.estimators` to separately oversample the validation dataset.

There should be no leakage between the training and validation sets.

Perform some validation of input to models

Some basic validation should be performed on the input, i.e. checking for NaN or proper datatype.

For example, a NaN value in the time (event duration) array generates an obscure error (I discovered it because of a bug in my data preprocessing).

To replicate:

x, t, e = datasets.load_dataset('PBC')
model = DeepSurvivalMachines()
t[-1] = np.nan
model.fit(x, t, e)

Generates the following error:

File "test.py", line 17, in <module>
model.fit(x, t, e)
File "auton_survival/models/dsm/__init__.py", line 257, in fit
model, _ = train_dsm(model,
File "auton_survival/models/dsm/utilities.py", line 132, in train_dsm
premodel = pretrain_dsm(model,
File "auton_survival/models/dsm/utilities.py", line 73, in pretrain_dsm
loss += unconditional_loss(premodel, t_train, e_train, str(r+1))
File "auton_survival/models/dsm/losses.py", line 121, in unconditional_loss
return _weibull_loss(model, t, e, risk)
File "auton_survival/models/dsm/losses.py", line 113, in _weibull_loss
ll += f[uncens].sum() + s[cens].sum()
IndexError: index 1653 is out of bounds for dimension 0 with size 1653
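
A basic check of the kind requested could look like the sketch below. It is not part of the library: `validate_survival_inputs` is a hypothetical name, and the set of checks (dimensions, matching lengths, finite values) is only a starting point.

```python
import numpy as np

def validate_survival_inputs(x, t, e):
    """Raise ValueError early if the inputs are malformed; return them as arrays."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    e = np.asarray(e, dtype=float)

    if x.ndim != 2:
        raise ValueError(f"x must be 2-dimensional, got shape {x.shape}")
    if t.ndim != 1 or e.ndim != 1:
        raise ValueError("t and e must be 1-dimensional")
    if not (len(x) == len(t) == len(e)):
        raise ValueError("x, t and e must have the same number of rows")

    # Report which array holds the non-finite values (NaN or +/- inf).
    for name, arr in (("x", x), ("t", t), ("e", e)):
        if not np.isfinite(arr).all():
            raise ValueError(f"{name} contains NaN or infinite values")
    return x, t, e
```

Calling this at the top of `fit` would turn the `IndexError` above into a message that points directly at the `t` array.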
