
rulefit's Introduction

! This package is no longer actively maintained. If you are interested in maintaining this package, please feel free to reach out to me via GitHub issue !

RuleFit

Implementation of a rule-based prediction algorithm, based on the RuleFit algorithm from Friedman and Popescu.

The algorithm can be used for predicting an output vector y given an input matrix X. In the first step, a tree ensemble is generated with gradient boosting. The trees are then turned into rules: the path to each node in each tree forms one rule. A rule is a binary indicator of whether an observation falls into a given node, which depends on the input features used in the splits along the path. The ensemble of rules, together with the original input features, is then fed into an L1-regularized linear model (the Lasso), which estimates the effect of each rule on the output target while shrinking many of those effects to exactly zero.

You can use rulefit for predicting a numeric response (categorical not yet implemented). The input has to be a numpy matrix with only numeric values.

Installation

The latest version can be installed from the master branch using pip:

pip install git+https://github.com/christophM/rulefit.git

Another option is to clone the repository and install using python setup.py install or python setup.py develop.

Usage

Train your model:

import numpy as np
import pandas as pd

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.to_numpy()  # .as_matrix() was removed in pandas 1.0

rf = RuleFit()
rf.fit(X, y, feature_names=features)

If you want to influence the tree generator, you can pass it as an argument:

from sklearn.ensemble import GradientBoostingRegressor
gb = GradientBoostingRegressor(n_estimators=500, max_depth=10, learning_rate=0.01)
rf = RuleFit(tree_generator=gb)  # pass by keyword; the first positional parameter of RuleFit is tree_size

rf.fit(X, y, feature_names=features)

Predict:

rf.predict(X)

Inspect rules:

rules = rf.get_rules()

rules = rules[rules.coef != 0].sort_values("support", ascending=False)

print(rules)

Notes

  • In contrast to the original paper, the generated trees are always fitted with the same maximum depth. In the original implementation, the maximum depth of each tree is drawn from a distribution.
  • This implementation is a work in progress. If you find a bug, don't hesitate to contact me.

Changelog

All notable changes to this project will be documented here.

[v0.3] - IN PROGRESS

  • Set the default of exclude_zero_coef in get_rules() to False
  • syntax fix (Issue 21)

[v0.2] - 2017-11-24

  • Introduces classification for RuleFit
  • Adds scaling of variables (Friedscale)
  • Allows random size trees for creating rules

[v0.1] - 2016-06-18

  • Start changelog and versions


rulefit's Issues

Be careful if you are using this package!

Here is an example of why:

X = X.as_matrix()
rf.fit(X, y, feature_names=features)

The function fit accepts a matrix, and converting to a matrix casts your integers to floats. So if you have an integer-valued column, for example gender (0, 1), the code will create rules that don't read sensibly: it will create a rule like 'gender > 0.5' where it should create 'gender == 1' or 'gender == 0'.

Inconsistent rules (a <5 and a < 10 in the same rule)

Let's say we have six numeric input features: a, b, c, d, e, f and a target variable y.

When I run a rulefit model using this data, I get the following rule in the result:

  • a < 5 and a < 10 and b > 5 | coeff = 0.23

Is anyone else facing this issue? Any reason why we get a<5 and a<10 in the same rule? How would we interpret the results in this case?

** This is just sample output for explanation.
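For what it's worth, a single root-to-leaf path may split on the same feature more than once, so "a < 5 and a < 10" is logically just "a < 5". A minimal post-processing sketch (not part of the package; it assumes rule strings in the "cond and cond" form shown above, while get_rules() actually joins conditions with " & "):

def simplify_rule(rule, sep=" and "):
    # Keep only the tightest threshold per (feature, operator) pair.
    best = {}
    for cond in rule.split(sep):
        feature, op, value = cond.split()
        key = (feature, op)
        value = float(value)
        if key not in best:
            best[key] = value
        elif op in ("<", "<="):
            best[key] = min(best[key], value)  # tighter upper bound wins
        else:
            best[key] = max(best[key], value)  # tighter lower bound wins
    return sep.join(f"{f} {op} {v:g}" for (f, op), v in best.items())

print(simplify_rule("a < 5 and a < 10 and b > 5"))  # -> a < 5 and b > 5

Interpretation-wise, the rule is equivalent to its simplified form.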

Error on passed estimator

When I want to use my model:

gb = GradientBoostingClassifier(n_estimators=500, max_depth=10, learning_rate=0.01)
rf = RuleFit(gb)
rf.fit(X_train.as_matrix(), y_train.values, feature_names=feats)

I get the error:

File "rulefit.py", line 324, in fit
n_estimators_default=int(np.ceil(self.max_rules/self.tree_size))
TypeError: unsupported operand type(s) for /: 'int' and 'GradientBoostingClassifier'
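The traceback itself points at the cause: RuleFit's first positional parameter is tree_size, so RuleFit(gb) binds the classifier to tree_size (hence the int / GradientBoostingClassifier division). Passing the estimator by keyword avoids this; the same mis-binding appears in the "Your own example is not working" issue below:

rf = RuleFit(tree_generator=gb)  # not RuleFit(gb): the first positional parameter is tree_size
rf.fit(X_train.as_matrix(), y_train.values, feature_names=feats)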

predict() does not work in rules-only mode

Reproduction:

import numpy as np
import pandas as pd

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.as_matrix()

rf = RuleFit(model_type='r')
rf.fit(X, y, feature_names=features)

rf.predict(X)

Produces:

IndexError: index 1717 is out of bounds for axis 0 with size 1717

I investigated the issue. It is about array addressing that works in linear and linear+rules modes, but not in rules-only ('r') mode. Will send a PR shortly.

rulefit.py - SyntaxError: invalid syntax, line 105

Against the latest commit: 646d8ee

Mar 26 14:45:43 ubuntu-xenial celery[27022]:   File "/app/venv/local/lib/python2.7/site-packages/rulefit/__init__.py", line 1, in <module>
Mar 26 14:45:43 ubuntu-xenial celery[27022]:     from .rulefit import RuleCondition, Rule, RuleEnsemble, RuleFit, FriedScale
Mar 26 14:45:43 ubuntu-xenial celery[27022]:   File "/app/venv/local/lib/python2.7/site-packages/rulefit/rulefit.py", line 105
Mar 26 14:45:43 ubuntu-xenial celery[27022]:     self.scale_multipliers=scale_multipliers
Mar 26 14:45:43 ubuntu-xenial celery[27022]:        ^
Mar 26 14:45:43 ubuntu-xenial celery[27022]: SyntaxError: invalid syntax

Redundant rules in linear model

The current implementation of rulefit can sometimes produce redundant features that are then fed into the Lasso. This comes from the stochastic nature of random trees and the lack of rule pruning.

To illustrate this let's have a look at the Boston example in the readme:

import numpy as np
import pandas as pd
np.random.seed(42)

from rulefit import RuleFit

boston_data = pd.read_csv("boston.csv", index_col=0)

y = boston_data.medv.values
X = boston_data.drop("medv", axis=1)
features = X.columns
X = X.as_matrix()

rf = RuleFit()
rf.fit(X, y, feature_names=features)

The inner workings of RuleFit rely on L1-penalised linear model which acts on the rules-transformed input variable, i.e.:

transformed_x = pd.DataFrame(rf.transform(X), 
                             columns=list(rf.rule_ensemble.rules))

Now what happens (even with the Boston dataset) is that this transformed_x contains columns that are to all intents and purposes duplicated, given the training set. This comes from different decision trees having slightly different splits that end up having the same effect on X. To see this, we can construct a hash table of rules keyed on the transformed_x values:

equivalence_classes = {}


for rule, col in transformed_x.T.iterrows():
    
    # Tuple is hashable
    key_pos = tuple(col)
    
    try:
        equivalence_classes[key_pos].append(rule)
    except KeyError:
        equivalence_classes[key_pos] = [rule]
        
redundant_rules = {k: v for k, v in equivalence_classes.items() if len(v) > 1}
for v in redundant_rules.values():
    print('Redundant rules')
    for vv in v:
        print('{}: support={:.3f}, prediction={:.3f}'.format(vv, vv.support, vv.prediction_value))

Some of the output groups are quite obviously redundant, such as:

lstat <= 4.630000114440918: support=0.107, prediction=7.044
lstat <= 4.664999961853027: support=0.107, prediction=7.654
lstat <= 4.644999980926514: support=0.107, prediction=4.838
lstat <= 4.650000095367432: support=0.073, prediction=12.706

Others are perhaps less intuitive:

nox <= 0.6569999754428864 & dis <= 1.3727499842643738 & rm <= 7.648000001907349: support=0.009, prediction=13.002
nox <= 0.6569999754428864 & dis <= 1.3980499505996704: support=0.013, prediction=14.541
crim <= 17.186299800872803 & rm > 7.797999858856201: support=0.030, prediction=4.344
rm > 6.850500106811523 & rm > 7.7834999561309814: support=0.056, prediction=13.445

I guess the difference in support comes from random subsampling during gradient boosting.

This makes the linear model badly specified and will cause model instability in the best case, as the L1 penalty will keep picking one rule from each equivalent group at random.

To solve this one would probably need to merge these rules. I'm happy to give a PR that does this, but I think it needs to be discussed on what would be the best way to approach this.

The logical option is probably an OR operator. This would also handle cases where the rules are not exactly equivalent on datasets other than the training set.

The question is, of course, what to do with the support and prediction values during the merge.
If I understand the code correctly, prediction_value is not used anywhere.
Can support be recalculated at the linear-model step?
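Until rules are merged upstream, one crude workaround (a sketch, not a fix for the library itself) is to drop the exactly-duplicated columns before fitting a linear model of your own:

# Rows of transformed_x.T are rules; drop_duplicates keeps one rule per
# equivalence class (with respect to the training data).
deduped_x = transformed_x.T.drop_duplicates().T
print(transformed_x.shape, "->", deduped_x.shape)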

Classification option needs Logistic Regression (not LassoCV?)

Hi Christoph, Thanks for this code! I'm curious that although it allows Classification based tree generators, when the coefficients are calculated everything goes through LassoCV - but doesn't this use straight SSE (sum of squared error) loss? For binary classification purposes (or OVA multiclass) you'd want L1 regularised Logistic Regression wouldn't you (ie with log loss)? (http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)? Happy to be corrected if I've not got it right. Friedman's 2005 paper seems a bit vague on this... Cheers, Chris
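For reference, a minimal sketch of the alternative being proposed: scikit-learn's L1-penalised logistic regression (log loss) fitted on the rule-transformed features of an already-fitted RuleFit rf. This is an illustration, not the package's current behaviour:

from sklearn.linear_model import LogisticRegression

# liblinear supports the L1 penalty; C is the inverse regularisation strength
clf = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
clf.fit(rf.transform(X), y)  # y assumed binary
print(clf.coef_)  # sparse rule coefficients, analogous to the Lasso's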

Argument description in rulefit.py needs correcting

Hello!
The description of exclude_zero_coef on line 557 of rulefit.py states that True is the default, but False looks to be the default for the argument in the function:

exclude_zero_coef: If True (default), returns only the rules with an estimated
coefficient not equalt to zero.


Compatibility with GridSearchCV of sklearn

In order to optimize hyperparameters using sklearn's GridSearchCV, I think it's preferable to define a score function on the estimator:

from sklearn.model_selection import GridSearchCV
from rulefit import RuleFit
from sklearn.datasets import load_diabetes

model = RuleFit()
X, y = load_diabetes(return_X_y=True)
param_grid = {"tree_size": [4, 6]}
gcv = GridSearchCV(model, param_grid=param_grid)
gcv.fit(X, y)

Produces:

TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator RuleFit(max_iter=1000) does not.

As shown below, the error can be avoided by passing a scoring argument to GridSearchCV, so GridSearchCV is usable even now.

gcv = GridSearchCV(model, param_grid=param_grid, scoring='neg_mean_squared_error')
gcv.fit(X,y)

However, fitting with gcv's best_estimator_ gives an error:

gcv.best_estimator_.fit(X,y)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9504/1281572257.py in <module>
----> 1 gcv.best_estimator_.fit(X,y)

~\Anaconda3\envs\\lib\site-packages\rulefit\rulefit.py in fit(self, X, y, feature_names)
    416                     self.tree_generator.set_params(random_state=i_size+random_state_add) # warm_state=True seems to reset random_state, such that the trees are highly correlated, unless we manually change the random_sate here.
    417                     self.tree_generator.get_params()['n_estimators']
--> 418                     self.tree_generator.fit(np.copy(X, order='C'), np.copy(y, order='C'))
    419                     curr_est_=curr_est_+1
    420                 self.tree_generator.set_params(warm_start=False)

~\Anaconda3\envs\env\lib\site-packages\sklearn\ensemble\_gb.py in fit(self, X, y, sample_weight, monitor)
    492                                  'warm_start==True'
    493                                  % (self.n_estimators,
--> 494                                     self.estimators_.shape[0]))
    495             begin_at_stage = self.estimators_.shape[0]
    496             # The requirements of _decision_function (called in two lines

ValueError: n_estimators=1 must be larger or equal to estimators_.shape[0]=552 when warm_start==True

Versions:

  • scikit-learn 0.24 and 1.0
  • Python 3.7
  • rulefit 0.3
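One way to give the estimator a score method is a thin subclass; a sketch, assuming sklearn's clone() handles the subclass like any other estimator:

from sklearn.metrics import r2_score
from rulefit import RuleFit

class ScoredRuleFit(RuleFit):
    def score(self, X, y):
        # R^2, matching sklearn's convention for regressors
        return r2_score(y, self.predict(X))

gcv = GridSearchCV(ScoredRuleFit(), param_grid=param_grid)

Note this does not address the best_estimator_ refit error above, which comes from the already-fitted tree_generator being reused with warm_start.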

Fails when model_type='l'

rf = RuleFit(tree_size=4,
             sample_fract='default',
             max_rules=2000,
             memory_par=0.01,
             tree_generator=None,
             rfmode='regress',
             lin_trim_quantile=0.025,
             lin_standardise=True, 
             exp_rand_tree_size=True,
             model_type='l',    
             random_state=1) 

rules = rf.get_rules()

Fails at:
n_features = len(self.coef_) - len(self.rule_ensemble.rules)
Error message:
AttributeError: 'RuleFit' object has no attribute 'rule_ensemble'

It seems that with linear terms only, self.rule_ensemble is never set, so len(self.rule_ensemble.rules) fails.
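A minimal guard sketch, assuming the intent is that linear-only mode simply has zero rules:

# hedged fix sketch for get_rules(): in model_type='l' no rule ensemble is built
n_rules = len(self.rule_ensemble.rules) if hasattr(self, "rule_ensemble") else 0
n_features = len(self.coef_) - n_rules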

Your own example is not working

#if you want to have influence on the tree generator you can pass the generator as argument:

from sklearn.ensemble import GradientBoostingRegressor
gb = GradientBoostingRegressor(n_estimators=50, max_depth=10, learning_rate=0.01)
rf = RuleFit(gb)

rf.fit(X, y, feature_names=features)

#Predict
rf.predict(X)
#Inspect rules:
rules = rf.get_rules()

rules = rules[rules.coef != 0].sort_values("support", ascending=False)

print(rules)

ERROR:

TypeError                                 Traceback (most recent call last)
in
      5 rf = RuleFit(gb)
      6
----> 7 rf.fit(X, y, feature_names=features)
      8
      9 #Predict

~/anaconda3/lib/python3.7/site-packages/rulefit/rulefit.py in fit(self, X, y, feature_names)
    362     ## initialise tree generator
    363     if self.tree_generator is None:
--> 364         n_estimators_default=int(np.ceil(self.max_rules/self.tree_size))
    365         self.sample_fract_=min(0.5,(100+6*np.sqrt(N))/N)
    366         if self.rfmode=='regress':

TypeError: unsupported operand type(s) for /: 'int' and 'GradientBoostingRegressor'

import issue for rulefit

Hi Christoph,

While importing rulefit module, I am getting below error. Can you please suggest?


ImportError                               Traceback (most recent call last)
in ()
----> 1 import rulefit

/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/rulefit/__init__.py in ()
----> 1 from rulefit import RuleCondition, Rule, RuleEnsemble, RuleFit
      2
      3 __all__ = ["rulefit"]

ImportError: cannot import name 'RuleCondition'

ValueError in example_simulated.py

Hi, running example_simulated.py gives the following error:

/Users/navid/opt/anaconda3/envs/rulefit-venv/bin/python "/Users/navid/Google Drive/PhD/Repositories/XAI-2016/rulefit/example_simulated.py"
/Users/navid/opt/anaconda3/envs/rulefit-venv/lib/python3.7/site-packages/sklearn/linear_model/_coordinate_descent.py:472: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 0.4675243067167685, tolerance: 0.4523666850279077
  tol, rng, random, positive)
/Users/navid/opt/anaconda3/envs/rulefit-venv/lib/python3.7/site-packages/sklearn/linear_model/_coordinate_descent.py:476: ConvergenceWarning: Objective did not converge. You might want to increase the number of iterations. Duality gap: 5.518840165506575, tolerance: 0.6706225710160342
  positive)
Traceback (most recent call last):
  File "/Users/navid/Google Drive/PhD/Repositories/XAI-2016/rulefit/example_simulated.py", line 25, in <module>
    rf.fit(X.values, y)
  File "/Users/navid/Google Drive/PhD/Repositories/XAI-2016/rulefit/rulefit/rulefit.py", line 398, in fit
    self.tree_generator.fit(np.copy(X, order='C'), np.copy(y, order='C'))
  File "/Users/navid/opt/anaconda3/envs/rulefit-venv/lib/python3.7/site-packages/sklearn/ensemble/_gb.py", line 1523, in fit
    self.estimators_.shape[0]))
ValueError: n_estimators=1 must be larger or equal to estimators_.shape[0]=560 when warm_start==True

exp_rand_tree_size=True issue: results are not reproducible when running multiple times

Hi,
when running the model with the following specs:

RuleFit(tree_generator=rf, rfmode='classify', exp_rand_tree_size=True)

where rf is:

rf = RandomForestClassifier(n_estimators=500,
                            criterion="gini",
                            max_depth=10,
                            min_samples_split=2,
                            min_samples_leaf=1,
                            min_weight_fraction_leaf=0.,
                            max_features="auto",
                            max_leaf_nodes=None,
                            min_impurity_decrease=0.,
                            min_impurity_split=None,
                            bootstrap=True,
                            oob_score=False,
                            n_jobs=3,
                            random_state=777,
                            verbose=0,
                            warm_start=False,
                            class_weight=None,
                            ccp_alpha=0.0,
                            max_samples=20)

The model is not reproducible.
The np.random.seed() call in lines 378ff of rulefit.py:

## fit tree generator
if not self.exp_rand_tree_size: # simply fit with constant tree size
    self.tree_generator.fit(X, y)
else: # randomise tree size as per Friedman 2005 Sec 3.3
    np.random.seed(self.random_state)

seems to randomise all parameters of the specified random forest and the results are not reproducible.

With:

RuleFit(tree_generator=rf, rfmode='classify', exp_rand_tree_size=False)

the results are always the same.

Of note: this is also the case if np.random.seed() is set directly before the modelling process.

I am not sure if this is intended, but in a research setting, for example, one would at least like to be able to reproduce the findings by setting np.random.seed() before the model.
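Reading the quoted snippet, np.random.seed(self.random_state) uses RuleFit's own random_state parameter, not the forest's; if it is left at its default, NumPy is reseeded arbitrarily on every run. So a plausible workaround (an assumption based on the quoted code, not verified across versions) is to seed RuleFit itself:

# seed RuleFit, not only the forest: its random_state feeds np.random.seed(...)
model = RuleFit(tree_generator=rf, rfmode='classify',
                exp_rand_tree_size=True, random_state=777)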

InvalidIndexError: (slice(None, None, None), 0)

Python 3.10
rulefit==0.3.1

Problem: unable to follow the official documentation.

Steps to reproduce:

  1. pip install rulefit
  2. Follow the "Train your model" section, but omit .as_matrix() because DataFrame no longer has that method

Expected result:
Be able to train model

As is:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/core/indexes/base.py:3600, in Index.get_loc(self, key, method, tolerance)
   3599 try:
-> 3600     return self._engine.get_loc(casted_key)
   3601 except KeyError as err:

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/_libs/index.pyx:136, in pandas._libs.index.IndexEngine.get_loc()

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/_libs/index.pyx:142, in pandas._libs.index.IndexEngine.get_loc()

TypeError: '(slice(None, None, None), 0)' is an invalid key

During handling of the above exception, another exception occurred:

InvalidIndexError                         Traceback (most recent call last)
Input In [71], in <module>
----> 1 rf.fit(X, y, feature_names=features)

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:410, in RuleFit.fit(self, X, y, feature_names)
    406     self.rule_ensemble = RuleEnsemble(tree_list = tree_list,
    407                                       feature_names=self.feature_names)
    409     ## concatenate original features and rules
--> 410     X_rules = self.rule_ensemble.transform(X)
    412 ## standardise linear variables if requested (for regression model only)
    413 if 'l' in self.model_type: 
    414 
    415     ## standard deviation and mean of winsorized features

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:277, in RuleEnsemble.transform(self, X, coefs)
    275 rule_list=list(self.rules) 
    276 if   coefs is None :
--> 277     return np.array([rule.transform(X) for rule in rule_list]).T
    278 else: # else use the coefs to filter the rules we bother to interpret
    279     res= np.array([rule_list[i_rule].transform(X) for i_rule in np.arange(len(rule_list)) if coefs[i_rule]!=0]).T

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:277, in <listcomp>(.0)
    275 rule_list=list(self.rules) 
    276 if   coefs is None :
--> 277     return np.array([rule.transform(X) for rule in rule_list]).T
    278 else: # else use the coefs to filter the rules we bother to interpret
    279     res= np.array([rule_list[i_rule].transform(X) for i_rule in np.arange(len(rule_list)) if coefs[i_rule]!=0]).T

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:155, in Rule.transform(self, X)
    144 def transform(self, X):
    145     """Transform dataset.
    146 
    147     Parameters
   (...)
    153     X_transformed: array-like matrix, shape=(n_samples, 1)
    154     """
--> 155     rule_applies = [condition.transform(X) for condition in self.conditions]
    156     return reduce(lambda x,y: x * y, rule_applies)

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:155, in <listcomp>(.0)
    144 def transform(self, X):
    145     """Transform dataset.
    146 
    147     Parameters
   (...)
    153     X_transformed: array-like matrix, shape=(n_samples, 1)
    154     """
--> 155     rule_applies = [condition.transform(X) for condition in self.conditions]
    156     return reduce(lambda x,y: x * y, rule_applies)

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/rulefit/rulefit.py:66, in RuleCondition.transform(self, X)
     55 """Transform dataset.
     56 
     57 Parameters
   (...)
     63 X_transformed: array-like matrix, shape=(n_samples, 1)
     64 """
     65 if self.operator == "<=":
---> 66     res =  1 * (X[:,self.feature_index] <= self.threshold)
     67 elif self.operator == ">":
     68     res = 1 * (X[:,self.feature_index] > self.threshold)

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/core/frame.py:3504, in DataFrame.__getitem__(self, key)
   3502 if self.columns.nlevels > 1:
   3503     return self._getitem_multilevel(key)
-> 3504 indexer = self.columns.get_loc(key)
   3505 if is_integer(indexer):
   3506     indexer = [indexer]

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/core/indexes/base.py:3607, in Index.get_loc(self, key, method, tolerance)
   3602         raise KeyError(key) from err
   3603     except TypeError:
   3604         # If we have a listlike key, _check_indexing_error will raise
   3605         #  InvalidIndexError. Otherwise we fall through and re-raise
   3606         #  the TypeError.
-> 3607         self._check_indexing_error(key)
   3608         raise
   3610 # GH#42269

File /t/pyenv/versions/py-default/lib/python3.10/site-packages/pandas/core/indexes/base.py:5609, in Index._check_indexing_error(self, key)
   5605 def _check_indexing_error(self, key):
   5606     if not is_scalar(key):
   5607         # if key is not a scalar, directly raise an error (the code below
   5608         # would convert to numpy arrays and raise later any way) - GH29926
-> 5609         raise InvalidIndexError(key)

InvalidIndexError: (slice(None, None, None), 0)
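The traceback bottoms out at X[:,self.feature_index], which is NumPy indexing that a pandas DataFrame rejects. Converting the frame before fitting sidesteps the error; to_numpy() is the modern replacement for the removed .as_matrix():

rf.fit(X.to_numpy(), y, feature_names=features)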

Why is max depth fixed?

"In contrast to the original paper, the generated trees are always fitted with the same maximum depth. In the original implementation the maximum depth of the tree are drawn from a distribution each time"

Is this just an artefact of the sklearn implementation of random forest or is there a different motivation behind it? Thanks.

predict_prob

Hi guys, it would be nice to have probability predictions for the rulefit classifier, which are useful for calculating things like AUC. It is just one line of code.
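A sketch of the idea, with the caveat that the attribute name of the fitted linear model (lscv_ below) is a guess at rulefit's internals rather than confirmed API:

import numpy as np

def predict_proba(rf, X):
    # hypothetical: delegate to the fitted linear model's predict_proba on the
    # same design matrix fit() used (original features + rule columns)
    X_concat = np.concatenate((X, rf.transform(X)), axis=1)
    return rf.lscv_.predict_proba(X_concat)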

Input variable importance

(screenshot from the paper showing the input variable importance formulas)
Is the input variable importance from section 7 of the paper implemented in this project? I looked at the code and didn't find this part.
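It can at least be approximated from get_rules() output. A hedged sketch of the paper's measure: rule importance I_k = |coef_k| * sqrt(s_k(1 - s_k)), shared equally among the m_k variables appearing in rule k (linear terms omitted here). The 'type' column and the "a <= 5.0 & b > 2.0" rule-string format are assumptions about what get_rules() returns:

import numpy as np

def input_importance(rules_df):
    importance = {}
    for _, row in rules_df[rules_df.type == "rule"].iterrows():
        s = row.support
        i_k = abs(row.coef) * np.sqrt(s * (1 - s))  # rule importance I_k
        variables = {cond.split()[0] for cond in row.rule.split(" & ")}
        for v in variables:  # share I_k over the m_k variables in the rule
            importance[v] = importance.get(v, 0.0) + i_k / len(variables)
    return importance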

all the input arrays must have same number of dimensions

Thanks for a great library. When I try to run it on my dataset with 25 columns, I get the following error:


ValueError                                Traceback (most recent call last)
in ()
      1 rf = RuleFit()
----> 2 rf.fit(X, y)

/Users/arshakn/anaconda/lib/python2.7/site-packages/rulefit/rulefit.pyc in fit(self, X, y, feature_names)
    266     ## concatenate original features and rules
    267     X_rules = self.rule_ensemble.transform(X)
--> 268     X_concat = np.concatenate((X, X_rules), axis=1)
    269
    270     ## initialise Lasso

ValueError: all the input arrays must have same number of dimensions

How can I calculate the list of rules that fired for a given prediction ?

Hi,

RuleFit's get_rules() returns the list of rules in the model. I would like to enhance predict() (or create a new predict() variation) so that it determines which of the rules returned by get_rules() were used by predict(), and in what order, and returns those rules as a list of indices into the list that get_rules() originally returned. Would anyone be able to provide any pointers on how to write that, or even better, sample code for it?

Many Thanks
Andrew
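One pointer: rf.transform(X) (used in the "Redundant rules" issue above) returns a 0/1 matrix of rule activations whose columns follow list(rf.rule_ensemble.rules), the same ordering RuleEnsemble.transform builds internally, per the InvalidIndexError traceback. So the rules that fired for one observation are simply its nonzero columns. Note that a linear model applies all fired rules simultaneously, so there is no inherent "order" beyond, say, sorting by absolute coefficient. A sketch:

import numpy as np

rule_list = list(rf.rule_ensemble.rules)  # assumed to match transform's column order
activations = rf.transform(X[0:1])[0]     # rule indicators for one observation
fired = np.where(activations == 1)[0]     # indices of rules that fired
for i in fired:
    print(rule_list[i])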
