
rulefit's Introduction

! This package is no longer actively maintained. If you are interested in maintaining this package, please feel free to reach out to me via GitHub issue !

RuleFit

Implementation of a rule-based prediction algorithm, based on the RuleFit algorithm from Friedman and Popescu.

The algorithm can be used for predicting an output vector y given an input matrix X. In the first step, a tree ensemble is generated with gradient boosting. The trees are then turned into rules: the path to each node in each tree forms one rule. A rule is a binary indicator of whether an observation falls into a given node, which depends on the input features used in the splits along the path. The ensemble of rules, together with the original input features, is then fed into an L1-regularized linear model (the Lasso), which estimates the effect of each rule on the output target while shrinking many of those effects to exactly zero.

You can use rulefit for predicting a numeric response (categorical not yet implemented). The input has to be a numpy matrix with only numeric values.

Installation

The latest version can be installed from the master branch using pip:

pip install git+https://github.com/christophM/rulefit.git

Another option is to clone the repository and install using python setup.py install or python setup.py develop.

Usage

Train your model:

import numpy as np
import pandas as pd

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.to_numpy()  # .as_matrix() was removed in pandas 1.0

rf = RuleFit()
rf.fit(X, y, feature_names=features)

If you want to influence the tree generator, you can pass it as an argument:

from sklearn.ensemble import GradientBoostingRegressor
gb = GradientBoostingRegressor(n_estimators=500, max_depth=10, learning_rate=0.01)
rf = RuleFit(tree_generator=gb)  # pass by keyword; the first positional parameter of RuleFit is tree_size

rf.fit(X, y, feature_names=features)

Predict:

rf.predict(X)

Inspect rules:

rules = rf.get_rules()

rules = rules[rules.coef != 0].sort_values("support", ascending=False)

print(rules)

Notes

  • In contrast to the original paper, the generated trees are always fitted with the same maximum depth. In the original implementation, the maximum depth of each tree is drawn from a distribution.
  • This implementation is a work in progress. If you find a bug, don't hesitate to contact me.

Changelog

All notable changes to this project will be documented here.

[v0.3] - IN PROGRESS

  • Set the default of exclude_zero_coef in get_rules() to False
  • syntax fix (Issue 21)

[v0.2] - 2017-11-24

  • Introduces classification for RuleFit
  • Adds scaling of variables (Friedscale)
  • Allows random size trees for creating rules

[v0.1] - 2016-06-18

  • Start changelog and versions


rulefit's Issues

Be careful if you are using this package!

Here is an example of why:

X = X.as_matrix()
rf.fit(X, y, feature_names=features)

The function fit accepts a matrix, and converting to a matrix casts your integers to floats. So if you have an integer-valued column, for example gender (0, 1), the code will create rules that don't read sensibly: it will create a rule like 'gender > 0.5' where it should create 'gender == 1' or 'gender == 0'.

Inconsistent rules (a <5 and a < 10 in the same rule)

Let's say we have six numeric input features: a, b, c, d, e, f and a target variable y.

When I run a rulefit model using this data, I get the following rule in the result:

  • a < 5 and a < 10 and b > 5 | coeff = 0.23

Is anyone else facing this issue? Any reason why we get a<5 and a<10 in the same rule? How would we interpret the results in this case?

** This is just sample output for explanation.
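For what it's worth, a single root-to-leaf path may split on the same feature more than once, so "a < 5 and a < 10" is logically just "a < 5". A minimal post-processing sketch (not part of the package; it assumes rule strings in the "cond and cond" form shown above, while get_rules() actually joins conditions with " & "):

def simplify_rule(rule, sep=" and "):
    # Keep only the tightest threshold per (feature, operator) pair.
    best = {}
    for cond in rule.split(sep):
        feature, op, value = cond.split()
        key = (feature, op)
        value = float(value)
        if key not in best:
            best[key] = value
        elif op in ("<", "<="):
            best[key] = min(best[key], value)  # tighter upper bound wins
        else:
            best[key] = max(best[key], value)  # tighter lower bound wins
    return sep.join(f"{f} {op} {v:g}" for (f, op), v in best.items())

print(simplify_rule("a < 5 and a < 10 and b > 5"))  # -> a < 5 and b > 5

Interpretation-wise, the rule is equivalent to its simplified form.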

Error on passed estimator

When I want to use my model:

gb = GradientBoostingClassifier(n_estimators=500, max_depth=10, learning_rate=0.01)
rf = RuleFit(gb)
rf.fit(X_train.as_matrix(), y_train.values, feature_names=feats)

I get the error:

File "rulefit.py", line 324, in fit
n_estimators_default=int(np.ceil(self.max_rules/self.tree_size))
TypeError: unsupported operand type(s) for /: 'int' and 'GradientBoostingClassifier'
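The traceback itself points at the cause: RuleFit's first positional parameter is tree_size, so RuleFit(gb) binds the classifier to tree_size (hence the int / GradientBoostingClassifier division). Passing the estimator by keyword avoids this; the same mis-binding appears in the "Your own example is not working" issue below:

rf = RuleFit(tree_generator=gb)  # not RuleFit(gb): the first positional parameter is tree_size
rf.fit(X_train.as_matrix(), y_train.values, feature_names=feats)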

predict() does not work in rules-only mode

Reproduction:

import numpy as np
import pandas as pd

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.as_matrix()

rf = RuleFit(model_type='r')
rf.fit(X, y, feature_names=features)

rf.predict(X)

Produces:

IndexError: index 1717 is out of bounds for axis 0 with size 1717

I investigated the issue. It is about array addressing that works in linear and linear+rules modes, but not in rules-only ('r') mode. Will send a PR shortly.

rulefit.py - SyntaxError: invalid syntax, line 105

Against the latest commit: 646d8ee

Mar 26 14:45:43 ubuntu-xenial celery[27022]:   File "/app/venv/local/lib/python2.7/site-packages/rulefit/__init__.py", line 1, in <module>
Mar 26 14:45:43 ubuntu-xenial celery[27022]:     from .rulefit import RuleCondition, Rule, RuleEnsemble, RuleFit, FriedScale
Mar 26 14:45:43 ubuntu-xenial celery[27022]:   File "/app/venv/local/lib/python2.7/site-packages/rulefit/rulefit.py", line 105
Mar 26 14:45:43 ubuntu-xenial celery[27022]:     self.scale_multipliers=scale_multipliers
Mar 26 14:45:43 ubuntu-xenial celery[27022]:        ^
Mar 26 14:45:43 ubuntu-xenial celery[27022]: SyntaxError: invalid syntax

Redundant rules in linear model

The current implementation of rulefit can sometimes produce redundant features that are then fed into the Lasso. This comes from the stochastic nature of random trees and the lack of rule pruning.

To illustrate this let's have a look at the Boston example in the readme:

import numpy as np
import pandas as pd
np.random.seed(42)

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.as_matrix()

rf = RuleFit()
rf.fit(X, y, feature_names=features)

The inner workings of RuleFit rely on L1-penalised linear model which acts on the rules-transformed input variable, i.e.:

transformed_x = pd.DataFrame(rf.transform(X), 
                             columns=list(rf.rule_ensemble.rules))

Now what happens (even with the Boston dataset) is that this transformed_x contains columns that are to all intents and purposes duplicated, given the training set. This comes from different decision trees having slightly different splits that end up having the same effect on X. To see this, we can construct a hash table of rules keyed on the transformed_x values:

equivalence_classes = {}


for rule, col in transformed_x.T.iterrows():
    
    # Tuple is hashable
    key_pos = tuple(col)
    
    try:
        equivalence_classes[key_pos].append(rule)
    except KeyError:
        equivalence_classes[key_pos] = [rule]
        
redundant_rules = {k: v for k, v in equivalence_classes.items() if len(v) > 1}
for v in redundant_rules.values():
    print('Redundant rules')
    for vv in v:
        print('{}: support={:.3f}, prediction={:.3f}'.format(vv, vv.support, vv.prediction_value))

Some of the output groups are quite obviously redundant, such as:

lstat <= 4.630000114440918: support=0.107, prediction=7.044
lstat <= 4.664999961853027: support=0.107, prediction=7.654
lstat <= 4.644999980926514: support=0.107, prediction=4.838
lstat <= 4.650000095367432: support=0.073, prediction=12.706

Others are perhaps less intuitive:

nox <= 0.6569999754428864 & dis <= 1.3727499842643738 & rm <= 7.648000001907349: support=0.009, prediction=13.002
nox <= 0.6569999754428864 & dis <= 1.3980499505996704: support=0.013, prediction=14.541
crim <= 17.186299800872803 & rm > 7.797999858856201: support=0.030, prediction=4.344
rm > 6.850500106811523 & rm > 7.7834999561309814: support=0.056, prediction=13.445

I guess the difference in support comes from random subsampling during gradient boosting.

This makes the linear model badly specified and will cause model instability in the best case, as the L1 penalty will keep picking one rule from each equivalent group at random.

To solve this one would probably need to merge these rules. I'm happy to give a PR that does this, but I think it needs to be discussed on what would be the best way to approach this.

The logical option is probably an OR operator. This would also handle cases where the rules are not exactly equivalent on datasets other than the training set.

The question is, of course, what to do with the support and prediction values during the merge.
If I understand the code correctly, prediction_value is not used anywhere.
Can support be recalculated at the linear-model step?
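Until rules are merged upstream, one crude workaround (a sketch, not a fix for the library itself) is to drop the exactly-duplicated columns before fitting a linear model of your own:

# Rows of transformed_x.T are rules; drop_duplicates keeps one rule per
# equivalence class (with respect to the training data).
deduped_x = transformed_x.T.drop_duplicates().T
print(transformed_x.shape, "->", deduped_x.shape)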

Classification option needs Logistic Regression (not LassoCV?)

Hi Christoph, Thanks for this code! I'm curious that although it allows Classification based tree generators, when the coefficients are calculated everything goes through LassoCV - but doesn't this use straight SSE (sum of squared error) loss? For binary classification purposes (or OVA multiclass) you'd want L1 regularised Logistic Regression wouldn't you (ie with log loss)? (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)? Happy to be corrected if I've not got it right. Friedman's 2005 paper seems a bit vague on this... Cheers, Chris
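For reference, a minimal sketch of the alternative being proposed: scikit-learn's L1-penalised logistic regression (log loss) fitted on the rule-transformed features of an already-fitted RuleFit rf. This is an illustration, not the package's current behaviour:

from sklearn.linear_model import LogisticRegression

# liblinear supports the L1 penalty; C is the inverse regularisation strength
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(rf.transform(X), y)  # y assumed binary
print(clf.coef_)  # sparse rule coefficients, analogous to the Lasso's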

Argument description in rulefit.py needs correcting

Hello!
The description of exclude_zero_coef on line 557 of rulefit.py states that True is the default, but False looks to be the default for the argument in the function:

exclude_zero_coef: If True (default), returns only the rules with an estimated
coefficient not equalt to zero.


Compatibility with GridSearchCV of sklearn

In order to optimize hyperparameters using sklearn's GridSearchCV, I think it's preferable to define a score function on the estimator:

from sklearn.model_selection import GridSearchCV
from rulefit import RuleFit
from sklearn.datasets import load_diabetes

model = RuleFit()
X, y = load_diabetes(return_X_y=True)
param_grid = {"tree_size": [4, 6]}
gcv = GridSearchCV(model, param_grid=param_grid)
gcv.fit(X, y)

Produces:

TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator RuleFit(max_iter=1000) does not.

As shown below, the error can be avoided by passing a scoring argument to GridSearchCV, so GridSearchCV is usable even now.

gcv = GridSearchCV(model, param_grid=param_grid, scoring='neg_mean_squared_error')
gcv.fit(X,y)

However, fitting with gcv's best_estimator_ gives an error:

gcv.best_estimator_.fit(X,y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9504/1281572257.py in <module>
----> 1 gcv.best_estimator_.fit(X,y)

~\Anaconda3\envs\\lib\site-packages\rulefit\rulefit.py in fit(self, X, y, feature_names)
    416                     self.tree_generator.set_params(random_state=i_size+random_state_add) # warm_state=True seems to reset random_state, such that the trees are highly correlated, unless we manually change the random_sate here.
    417                     self.tree_generator.get_params()['n_estimators']
--> 418                     self.tree_generator.fit(np.copy(X, order='C'), np.copy(y, order='C'))
    419                     curr_est_=curr_est_+1
    420                 self.tree_generator.set_params(warm_start=False)

~\Anaconda3\envs\env\lib\site-packages\sklearn\ensemble\_gb.py in fit(self, X, y, sample_weight, monitor)
    492                                  'warm_start==True'
    493                                  % (self.n_estimators,
--> 494                                     self.estimators_.shape[0]))
    495             begin_at_stage = self.estimators_.shape[0]
    496             # The requirements of _decision_function (called in two lines

ValueError: n_estimators=1 must be larger or equal to estimators_.shape[0]=552 when warm_start==True

Versions:

  • scikit-learn 0.24 and 1.0
  • Python 3.7
  • rulefit 0.3
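One way to give the estimator a score method is a thin subclass; a sketch, assuming sklearn's clone() handles the subclass like any other estimator:

from sklearn.metrics import r2_score
from rulefit import RuleFit

class ScoredRuleFit(RuleFit):
    def score(self, X, y):
        # R^2, matching sklearn's convention for regressors
        return r2_score(y, self.predict(X))

gcv = GridSearchCV(ScoredRuleFit(), param_grid=param_grid)

Note this does not address the best_estimator_ refit error above, which comes from the already-fitted tree_generator being reused with warm_start.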

Fails when model_type='l'

rf = RuleFit(tree_size=4,
             sample_fract='default',
             max_rules=2000,
             memory_par=0.01,
             tree_generator=None,
             rfmode='regress',
             lin_trim_quantile=0.025,
             lin_standardise=True, 
             exp_rand_tree_size=True,
             model_type='l',    
             random_state=1) 

rules = rf.get_rules()

Fails at:
n_features = len(self.coef_) - len(self.rule_ensemble.rules)
Error message:
AttributeError: 'RuleFit' object has no attribute 'rule_ensemble'

It seems that with linear terms only, self.rule_ensemble is never set, so len(self.rule_ensemble.rules) fails.
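A minimal guard sketch, assuming the intent is that linear-only mode simply has zero rules:

# hedged fix sketch for get_rules(): in model_type='l' no rule ensemble is built
n_rules = len(self.rule_ensemble.rules) if hasattr(self, "rule_ensemble") else 0
n_features = len(self.coef_) - n_rules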

Your own example is not working

#if you want to have influence on the tree generator you can pass the generator as argument:

from sklearn.ensemble import GradientBoostingRegressor
gb = GradientBoostingRegressor(n_estimators=50, max_depth=10, learning_rate=0.01)
rf = RuleFit(gb)

rf.fit(X, y, feature_names=features)

#Predict
rf.predict(X)
#Inspect rules:
rules = rf.get_rules()

rules = rules[rules.coef != 0].sort_values("support", ascending=False)

print(rules)

ERROR:

TypeError                                 Traceback (most recent call last)
in
      5 rf = RuleFit(gb)
      6
----> 7 rf.fit(X, y, feature_names=features)
      8
      9 #Predict

~/anaconda3/lib/python3.7/site-packages/rulefit/rulefit.py in fit(self, X, y, feature_names)
    362     ## initialise tree generator
    363     if self.tree_generator is None:
--> 364         n_estimators_default=int(np.ceil(self.max_rules/self.tree_size))
    365         self.sample_fract_=min(0.5,(100+6*np.sqrt(N))/N)
    366         if self.rfmode=='regress':

TypeError: unsupported operand type(s) for /: 'int' and 'GradientBoostingRegressor'

import issue for rulefit

Hi Christoph,

While importing rulefit module, I am getting below error. Can you please suggest?


ImportError                               Traceback (most recent call last)
in ()
----> 1 import rulefit

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rulefit/__init__.py in ()
----> 1 from rulefit import RuleCondition, Rule, RuleEnsemble, RuleFit
      2
      3 __all__ = ["rulefit"]

ImportError: cannot import name 'RuleCondition'

ValueError in example_simulated.py

Hi, running example_simulated.py gives the following error:

/Users/navid/opt/anaconda3/envs/rulefit-venv/bin/python "/Users/navid/Google Drive/PhD/Repositories/XAI-2016/rulefit/example_simulated.py"
/Users/navid/opt/anaconda3/envs/rulefit-venv/lib/python3.7/site-packages/sklearn/linear_model/_coordinate_descent.py:472: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 0.4675243067167685, tolerance: 0.4523666850279077
  tol, rng, random, positive)
/Users/navid/opt/anaconda3/envs/rulefit-venv/lib/python3.7/site-packages/sklearn/linear_model/_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 5.518840165506575, tolerance: 0.6706225710160342
  positive)
Traceback (most recent call last):
  File "/Users/navid/Google Drive/PhD/Repositories/XAI-2016/rulefit/example_simulated.py", line 25, in <module>
    rf.fit(X.values, y)
  File "/Users/navid/Google Drive/PhD/Repositories/XAI-2016/rulefit/rulefit/rulefit.py", line 398, in fit
    self.tree_generator.fit(np.copy(X, order='C'), np.copy(y, order='C'))
  File "/Users/navid/opt/anaconda3/envs/rulefit-venv/lib/python3.7/site-packages/sklearn/ensemble/_gb.py", line 1523, in fit
    self.estimators_.shape[0]))
ValueError: n_estimators=1 must be larger or equal to estimators_.shape[0]=560 when warm_start==True

exp_rand_tree_size=True issue: results are not reproducible when running multiple times

Hi,
when running the model with the following specs:

RuleFit(tree_generator=rf, rfmode='classify', exp_rand_tree_size=True)

where rf is:

rf = RandomForestClassifier(n_estimators=500,
                            criterion="gini",
                            max_depth=10,
                            min_samples_split=2,
                            min_samples_leaf=1,
                            min_weight_fraction_leaf=0.,
                            max_features="auto",
                            max_leaf_nodes=None,
                            min_impurity_decrease=0.,
                            min_impurity_split=None,
                            bootstrap=True,
                            oob_score=False,
                            n_jobs=3,
                            random_state=777,
                            verbose=0,
                            warm_start=False,
                            class_weight=None,
                            ccp_alpha=0.0,
                            max_samples=20)

The model is not reproducible.
The np.random.seed() call in lines 378ff of rulefit.py:

## fit tree generator
if not self.exp_rand_tree_size: # simply fit with constant tree size
    self.tree_generator.fit(X, y)
else: # randomise tree size as per Friedman 2005 Sec 3.3
    np.random.seed(self.random_state)

seems to randomise all parameters of the specified random forest and the results are not reproducible.

With:

RuleFit(tree_generator=rf, rfmode='classify', exp_rand_tree_size=False)

the results are always the same.

Of note: this is also the case if np.random.seed() is set directly before the modelling process.

I am not sure if this is intended, but in a research setting, for example, one would at least like to be able to reproduce the findings by setting np.random.seed() before the model.
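Reading the quoted snippet, np.random.seed(self.random_state) uses RuleFit's own random_state parameter, not the forest's; if it is left at its default, NumPy is reseeded arbitrarily on every run. So a plausible workaround (an assumption based on the quoted code, not verified across versions) is to seed RuleFit itself:

# seed RuleFit, not only the forest: its random_state feeds np.random.seed(...)
model = RuleFit(tree_generator=rf, rfmode='classify',
                exp_rand_tree_size=True, random_state=777)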

InvalidIndexError: (slice(None, None, None), 0)

Python 3.10
rulefit==0.3.1

Problem: unable to follow the official documentation.

Steps to reproduce:

  1. pip install rulefit
  2. Follow the "Train your model" section, but omit .as_matrix() because DataFrame no longer has that method

Expected result:
Be able to train model

As is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/core/indexes/base.py:3600, in Index.get_loc(self, key, method, tolerance)
   3599 try:
-> 3600     return self._engine.get_loc(casted_key)
   3601 except KeyError as err:

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/_libs/index.pyx:142, in pandas._libs.index.IndexEngine.get_loc()

TypeError: '(slice(None, None, None), 0)' is an invalid key

During handling of the above exception, another exception occurred:

InvalidIndexError                         Traceback (most recent call last)
Input In [71], in <module>
----> 1 rf.fit(X, y, feature_names=features)

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:410, in RuleFit.fit(self, X, y, feature_names)
    406     self.rule_ensemble = RuleEnsemble(tree_list = tree_list,
    407                                       feature_names=self.feature_names)
    409     ## concatenate original features and rules
--> 410     X_rules = self.rule_ensemble.transform(X)
    412 ## standardise linear variables if requested (for regression model only)
    413 if 'l' in self.model_type: 
    414 
    415     ## standard deviation and mean of winsorized features

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:277, in RuleEnsemble.transform(self, X, coefs)
    275 rule_list=list(self.rules) 
    276 if   coefs is None :
--> 277     return np.array([rule.transform(X) for rule in rule_list]).T
    278 else: # else use the coefs to filter the rules we bother to interpret
    279     res= np.array([rule_list[i_rule].transform(X) for i_rule in np.arange(len(rule_list)) if coefs[i_rule]!=0]).T

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:277, in <listcomp>(.0)
    275 rule_list=list(self.rules) 
    276 if   coefs is None :
--> 277     return np.array([rule.transform(X) for rule in rule_list]).T
    278 else: # else use the coefs to filter the rules we bother to interpret
    279     res= np.array([rule_list[i_rule].transform(X) for i_rule in np.arange(len(rule_list)) if coefs[i_rule]!=0]).T

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:155, in Rule.transform(self, X)
    144 def transform(self, X):
    145     """Transform dataset.
    146 
    147     Parameters
   (...)
    153     X_transformed: array-like matrix, shape=(n_samples, 1)
    154     """
--> 155     rule_applies = [condition.transform(X) for condition in self.conditions]
    156     return reduce(lambda x,y: x * y, rule_applies)

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:155, in <listcomp>(.0)
    144 def transform(self, X):
    145     """Transform dataset.
    146 
    147     Parameters
   (...)
    153     X_transformed: array-like matrix, shape=(n_samples, 1)
    154     """
--> 155     rule_applies = [condition.transform(X) for condition in self.conditions]
    156     return reduce(lambda x,y: x * y, rule_applies)

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:66, in RuleCondition.transform(self, X)
     55 """Transform dataset.
     56 
     57 Parameters
   (...)
     63 X_transformed: array-like matrix, shape=(n_samples, 1)
     64 """
     65 if self.operator == "<=":
---> 66     res =  1 * (X[:,self.feature_index] <= self.threshold)
     67 elif self.operator == ">":
     68     res = 1 * (X[:,self.feature_index] > self.threshold)

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/core/frame.py:3504, in DataFrame.__getitem__(self, key)
   3502 if self.columns.nlevels > 1:
   3503     return self._getitem_multilevel(key)
-> 3504 indexer = self.columns.get_loc(key)
   3505 if is_integer(indexer):
   3506     indexer = [indexer]

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/core/indexes/base.py:3607, in Index.get_loc(self, key, method, tolerance)
   3602         raise KeyError(key) from err
   3603     except TypeError:
   3604         # If we have a listlike key, _check_indexing_error will raise
   3605         #  InvalidIndexError. Otherwise we fall through and re-raise
   3606         #  the TypeError.
-> 3607         self._check_indexing_error(key)
   3608         raise
   3610 # GH#42269

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/core/indexes/base.py:5609, in Index._check_indexing_error(self, key)
   5605 def _check_indexing_error(self, key):
   5606     if not is_scalar(key):
   5607         # if key is not a scalar, directly raise an error (the code below
   5608         # would convert to numpy arrays and raise later any way) - GH29926
-> 5609         raise InvalidIndexError(key)

InvalidIndexError: (slice(None, None, None), 0)
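The traceback bottoms out at X[:,self.feature_index], which is NumPy indexing that a pandas DataFrame rejects. Converting the frame before fitting sidesteps the error; to_numpy() is the modern replacement for the removed .as_matrix():

rf.fit(X.to_numpy(), y, feature_names=features)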

Why is max depth fixed?

"In contrast to the original paper, the generated trees are always fitted with the same maximum depth. In the original implementation the maximum depth of the tree are drawn from a distribution each time"

Is this just an artefact of the sklearn implementation of random forest or is there a different motivation behind it? Thanks.

predict_prob

Hi guys, it would be nice to have probability predictions for the rulefit classifier, which are useful for calculating things like AUC. It is just one line of code.
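A sketch of the idea, with the caveat that the attribute name of the fitted linear model (lscv_ below) is a guess at rulefit's internals rather than confirmed API:

import numpy as np

def predict_proba(rf, X):
    # hypothetical: delegate to the fitted linear model's predict_proba on the
    # same design matrix fit() used (original features + rule columns)
    X_concat = np.concatenate((X, rf.transform(X)), axis=1)
    return rf.lscv_.predict_proba(X_concat)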

Input variable importance

(screenshot from the paper showing the input variable importance formulas)
Is the input variable importance from section 7 of the paper implemented in this project? I looked at the code and didn't find this part.
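It can at least be approximated from get_rules() output. A hedged sketch of the paper's measure: rule importance I_k = |coef_k| * sqrt(s_k(1 - s_k)), shared equally among the m_k variables appearing in rule k (linear terms omitted here). The 'type' column and the "a <= 5.0 & b > 2.0" rule-string format are assumptions about what get_rules() returns:

import numpy as np

def input_importance(rules_df):
    importance = {}
    for _, row in rules_df[rules_df.type == "rule"].iterrows():
        s = row.support
        i_k = abs(row.coef) * np.sqrt(s * (1 - s))  # rule importance I_k
        variables = {cond.split()[0] for cond in row.rule.split(" & ")}
        for v in variables:  # share I_k over the m_k variables in the rule
            importance[v] = importance.get(v, 0.0) + i_k / len(variables)
    return importance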

all the input arrays must have same number of dimensions

Thanks for a great library. When I try to run it on my dataset with 25 columns, I get the following error:


ValueError                                Traceback (most recent call last)
in ()
      1 rf = RuleFit()
----> 2 rf.fit(X, y)

/Users/arshakn/anaconda/lib/python2.7/site-packages/rulefit/rulefit.pyc in fit(self, X, y, feature_names)
    266     ## concatenate original features and rules
    267     X_rules = self.rule_ensemble.transform(X)
--> 268     X_concat = np.concatenate((X, X_rules), axis=1)
    269
    270     ## initialise Lasso

ValueError: all the input arrays must have same number of dimensions

How can I calculate the list of rules that fired for a given prediction ?

Hi,

RuleFit's get_rules() returns the list of rules in the model. I would like to enhance predict() (or create a new predict() variation) so that it determines which of the rules returned by get_rules() were used by predict(), and in what order, and returns those rules as a list of indices into the list that get_rules() originally returned. Would anyone be able to provide any pointers on how to write that, or even better, sample code for it?

Many Thanks
Andrew
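One pointer: rf.transform(X) (used in the "Redundant rules" issue above) returns a 0/1 matrix of rule activations whose columns follow list(rf.rule_ensemble.rules), the same ordering RuleEnsemble.transform builds internally, per the InvalidIndexError traceback. So the rules that fired for one observation are simply its nonzero columns. Note that a linear model applies all fired rules simultaneously, so there is no inherent "order" beyond, say, sorting by absolute coefficient. A sketch:

import numpy as np

rule_list = list(rf.rule_ensemble.rules)  # assumed to match transform's column order
activations = rf.transform(X[0:1])[0]     # rule indicators for one observation
fired = np.where(activations == 1)[0]     # indices of rules that fired
for i in fired:
    print(rule_list[i])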
