saucecat / pdpbox
python partial dependence plot toolbox
Home Page: http://pdpbox.readthedocs.io/en/latest/
License: MIT License
I have a random forest model that gives output in the format array([[0, 1]], dtype=uint8) or array([[1, 0]], dtype=uint8). I can't figure out how to use pdp_isolate. My code and the error message are below:
CODE:
from matplotlib import pyplot as plt
from pdpbox import pdp, get_dataset, info_plots
pdp_inc = pdp.pdp_isolate(model=model, dataset=pd.DataFrame(features.iloc[0]).T, model_features=features.columns.tolist(), feature='r1')
pdp.pdp_plot(pdp_inc, 'SCI_Mainframe')
plt.show()
ERROR:
TypeError: list indices must be integers or slices, not tuple
I don't know what to do. I have spent a couple of hours looking for a solution without luck.
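One possible cause, offered as an assumption because the issue does not show how the model was trained: a scikit-learn forest fitted on a two-column one-hot target treats the problem as multi-output, and its predict_proba then returns a list of arrays rather than a single array. That would be consistent with "list indices must be integers or slices, not tuple". A minimal sketch of collapsing a one-hot target into 1-D class labels before fitting (the sample values are made up):

```python
import numpy as np

# One-hot encoded targets, in the same format as the model output quoted above.
y_onehot = np.array([[0, 1], [1, 0], [0, 1]], dtype=np.uint8)

# Collapse each row to the index of its 1, giving one class label per sample.
y_labels = y_onehot.argmax(axis=1)

print(y_labels)        # one class label per row
print(y_labels.shape)  # 1-D: (n_samples,)
```

A model refitted on y_labels would predict a single class column, which is the shape the pdp_isolate call above appears to expect.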
The pdp_interact_plot is meant to print 3 graphs isn't it? Feature 1 vs y, Feature 2 vs y and finally a contour graph. Mine only prints the contour graph.
If a feature has percentiles that vary by less than 0.01 the generated grid has duplicate values which (for some reason) leads to unequal dimensions of the .feature_grids and .pdp attributes of the pdpbox.pdp.pdp_isolate_obj.
When using this development version, the problem with the dimensions is gone, but the grid will have only one value in this example. I could fix the problem for my purposes by just removing the rounding statements in the _get_grids() function (see my fork), but assume there's a reason for the rounding so a real fix is probably more involved (?).
Here's a reproducible example (using the version from pypi):
import pandas as pd
import numpy as np
from pdpbox import pdp # Version from pypi
from sklearn.linear_model import SGDClassifier
np.random.seed(123)
df = pd.DataFrame({'y': np.random.randint(0,2,100),
                   'x1': np.random.uniform(0.5, 0.5001, 100),
                   'x2': np.random.uniform(0.5, 0.5001, 100)})
clf = SGDClassifier()
X = df[['x1', 'x2']]
y = df['y']
clf.fit(X, y)
P = pdp.pdp_isolate(model=clf, train_X=X, feature='x1')
print(P.feature_grids)
print(P.pdp)
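The collapse of the grid can be reproduced without pdpbox. This is a minimal sketch, assuming the grid is built from percentiles that are then rounded to two decimals; the issue mentions rounding in _get_grids() but does not quote it, so percentile_grid below is a hypothetical helper and not pdpbox code:

```python
import numpy as np

def percentile_grid(values, num_grid_points=10, decimals=None):
    """Hypothetical helper: percentile-based grid with optional rounding."""
    grid = np.percentile(values, np.linspace(0, 100, num_grid_points))
    if decimals is not None:
        # Assumed rounding step; values closer together than 10**-decimals collide.
        grid = np.round(grid, decimals)
    # Keep each grid value once, in sorted order.
    return np.unique(grid)

rng = np.random.RandomState(123)
x1 = rng.uniform(0.5, 0.5001, 100)

rounded_grid = percentile_grid(x1, decimals=2)
exact_grid = percentile_grid(x1)

print(len(rounded_grid))  # every percentile rounds to the same value
print(len(exact_grid))    # without rounding the grid points stay distinct
```

Whether dropping the rounding is acceptable for plotting labels is a separate design question that the issue leaves open.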
Hi,
I am unable to successfully plot the pdp. Find below the code being used -
feat_name = 'avg_albumin'
x_test_pdp = x_test[x_test[feat_name].notnull()]  # drop nulls, since pdp can't take null values
pdp_data = pdp.pdp_isolate(model=best_gbm, dataset=x_test_pdp, model_features=cols_best_rfe, feature=feat_name)
# plot it
pdp.pdp_plot(pdp_data, feat_name)
plt.show()
The error message is: 'PDPIsolate' object has no attribute 'pdp_isolate'.
It would be great if you could help me out with this. Thanks! @doctorperceptron @SauceCat
In the example provided, 'embarked' has three labels 'C', 'S', and 'Q', and all 3 labels are presented as features of the model.
However, in practice, encoding every label as a feature when training a model makes the feature matrix lose full rank. As the name "one-hot" indicates, the base label should not be used as a feature, in order to keep full rank. In this case, "Embarked_C" would not be a feature used to train the model.
So the PDP should display correct dependency values for feature=['Embarked_S', 'Embarked_Q'].
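For reference, the encoding described here, with the base label dropped, can be produced with pandas. A small sketch on made-up rows (the Embarked values are illustrative, not the Titanic data):

```python
import pandas as pd

df = pd.DataFrame({"Embarked": ["C", "S", "Q", "S"]})

# Full one-hot encoding: one column per label.
full = pd.get_dummies(df["Embarked"], prefix="Embarked")

# Drop the first (base) label so the remaining columns are not collinear
# with an intercept term.
reduced = pd.get_dummies(df["Embarked"], prefix="Embarked", drop_first=True)

print(list(full.columns))
print(list(reduced.columns))
```

With this encoding, rows where both remaining columns are 0 represent the base label 'C'.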
Hi,
first a real big 'Thank you' for this wonderful package! I also read the book from Chris and searched for python implementations of the ICE plots and it really solves some open things.
I assume, you used matplotlib as a figure-backend. And I found the list of parameters you can feed with the dict. But it would be glad to have the possibility to
Thanks
Jan
Hi - firstly I'd like to thank you for producing this package, it's really great! I was just reading the ICEBox paper recently and was considering building something, but was delighted to see somebody else already had :)
I'm having issues with calling pdp_isolate on a regression model - it throws the following exception:
usr/local/lib/python3.5/dist-packages/PDPbox-0.1-py3.5.egg/pdpbox/pdp.py in pdp_isolate(model, train_X, feature, num_grid_points, percentile_range)
113 # store the ICE lines
114 # for multi-classifier, a dictionary is created
--> 115 if n_classes > 2:
116 ice_lines = {}
117 for n_class in range(n_classes):
TypeError: unorderable types: NoneType() > int()
Even the 'Regression.ipynb' example in PDPbox/test/Regression/ does this. A cursory glance at the codebase seems to suggest that when we have a sklearn model without a classes property, n_classes gets set to None on pdp.py line 64. Then all subsequent comparisons of n_classes to an integer will throw this error. Any suggestions?
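The failure mode described above can be avoided by never letting n_classes be None. A minimal sketch, assuming the class list lives in scikit-learn's classes_ attribute; infer_n_classes and the two stand-in classes are hypothetical and not taken from pdpbox:

```python
class FakeRegressor:
    """Stand-in for a fitted regressor: it has no classes_ attribute."""

class FakeClassifier:
    """Stand-in for a fitted classifier with three classes."""
    classes_ = ["no", "unknown", "yes"]

def infer_n_classes(model):
    # Hypothetical helper: use 0 instead of None for regressors, so that
    # later integer comparisons stay valid on Python 3.
    classes = getattr(model, "classes_", None)
    return 0 if classes is None else len(classes)

for model in (FakeRegressor(), FakeClassifier()):
    n_classes = infer_n_classes(model)
    # The comparison that failed in the traceback now works for both models.
    print(type(model).__name__, n_classes, n_classes > 2)
```

On Python 2, None > 2 silently evaluated to False, which may be why the comparison went unnoticed there.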
Hi,
When I tried to increase num_grid_points to 100, the following ValueError was raised:
"x and y must have same first dimension, but have shapes (81,) and (91,)"
I am confused, because shouldn't they always be the same?
Code:
pdp_age = pdp.pdp_isolate(clf, X, target_feature, num_grid_points=100, percentile_range=(1, 100))
pdp.pdp_plot(pdp_age, target_feature, center=False, figsize=(9, 5), plot_lines=False)
It seems that after "pdp_isolate", pdp.size (which determines y) and feature_grids.size become different.
Can you please explain more about how to make sure that x and y have same shape?
Thanks!
I am having trouble figuring out how pdpbox assigns class labels to multiclass plots for single features.
Here is a code snippet I'm using for a target that is either yes, no, or unknown:
feature = <feat>
pdp_isolated = pdp_isolate(
    model=xgbcl,
    dataset=X_train_processed,
    model_features=X_train_processed.columns,
    feature=feature,
    num_grid_points=100
)
pdp_plot(
    pdp_isolated,
    feature_name=feature,
    center=True,
    x_quantile=True,
    ncols=2,
    plot_lines=True,
    frac_to_plot=100,
    which_classes=[0, 1],
    plot_pts_dist=True
)
Hi @SauceCat,
In my use case of PDPbox's pdp_interact_plot(), I create a bunch of interaction plots, choosing two features at a time, and look at which interaction produces the most variation in 'marginal effects' (encoded in the color scale). As a result, all of the plots have colors spread along the entire range. Tiny and large variations in marginal effects can be distinguished only by reading off the range of the color scale on each plot.
I think it would be an immense enhancement if we could pass along a custom normalize object that can be shared among multiple plots, thus encoding the same value by the same color in every plot. A fixed normalize routine shared among multiple plots brings consistency and highlights the degree of variation in marginal effect when sharing the plots with stakeholders.
The two plots below use the full color scale; however, the range of marginal effect is quite different.
Plot 1: Small Range of Marginal Effect
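The request can be illustrated with plain matplotlib, outside pdpbox. This is a sketch of a shared matplotlib.colors.Normalize instance; the two marginal-effect grids are made-up numbers, and nothing here implies that pdp_interact_plot currently accepts such an object:

```python
import numpy as np
from matplotlib.colors import Normalize

# Made-up marginal-effect grids for two feature pairs (illustrative data only).
small_effect = np.array([[0.00, 0.01], [0.02, 0.03]])
large_effect = np.array([[0.00, 0.20], [0.40, 0.60]])

# One Normalize instance spanning both grids, to be reused for every plot.
shared_norm = Normalize(
    vmin=min(small_effect.min(), large_effect.min()),
    vmax=max(small_effect.max(), large_effect.max()),
)

# The same value now maps to the same position on the color scale in both plots.
print(shared_norm(small_effect).max())  # small range uses a small part of the scale
print(shared_norm(large_effect).max())  # large range reaches the top of the scale
```

In plain matplotlib, the instance would be passed as norm=shared_norm to calls such as contourf or imshow, so that a plot with a small range visibly uses only a narrow band of colors.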
When trying to run an example from the docs, I get the following error:
https://pdpbox.readthedocs.io/en/latest/pdp_plot.html
/home/janvanrijn/anaconda3/envs/openml-defaults/bin/python /home/janvanrijn/projects/openml-defaults/test2.py
Traceback (most recent call last):
File "/home/janvanrijn/projects/openml-defaults/test2.py", line 14, in <module>
feature='Sex')
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/pdpbox/pdp.py", line 159, in pdp_isolate
for feature_grid in feature_grids)
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/joblib/parallel.py", line 983, in __call__
if self.dispatch_one_batch(iterator):
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/joblib/parallel.py", line 825, in dispatch_one_batch
self._dispatch(tasks)
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/joblib/parallel.py", line 782, in _dispatch
job = self._backend.apply_async(batch, callback=cb)
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 182, in apply_async
result = ImmediateResult(func)
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/joblib/_parallel_backends.py", line 545, in __init__
self.results = batch()
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/joblib/parallel.py", line 261, in __call__
for func, args, kwargs in self.items]
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/joblib/parallel.py", line 261, in <listcomp>
for func, args, kwargs in self.items]
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/pdpbox/pdp_calc_utils.py", line 44, in _calc_ice_lines
preds = predict(_data[model_features], **predict_kwds)
File "/home/janvanrijn/anaconda3/envs/openml-defaults/lib/python3.6/site-packages/xgboost/sklearn.py", line 797, in predict_proba
test_dmatrix = DMatrix(data, missing=self.missing, nthread=self.n_jobs)
AttributeError: 'XGBClassifier' object has no attribute 'n_jobs'
my pip freeze:
AnyQt==0.0.8
asn1crypto==0.24.0
Babel==2.6.0
Bottleneck==1.2.1
certifi==2018.8.24
cffi==1.11.5
chardet==3.0.4
click==6.7
cloudpickle==0.5.3
commonmark==0.8.0
ConfigSpace==0.4.7
cryptography==2.3.1
cycler==0.10.0
Cython==0.28.2
dask==0.18.1
debtcollector==1.19.0
decorator==4.3.0
distributed==1.22.0
docutils==0.14
entrypoints==0.2.3
fasteners==0.14.1
feather-format==0.4.0
future==0.16.0
HeapDict==1.0.0
holoviews==1.10.7
idna==2.7
iso8601==0.1.12
jeepney==0.3.1
joblib==0.12.3
keyring==13.2.1
keyrings.alt==3.1
kiwisolver==1.0.1
liac-arff==2.2.2
matplotlib==2.2.2
mkl-fft==1.0.0
mkl-random==1.0.1
monotonic==1.5
msgpack==0.5.6
netaddr==0.7.19
netifaces==0.10.7
networkx==2.1
numpy==1.14.3
Orange3==3.15.0
oslo.concurrency==3.27.0
oslo.config==6.2.1
oslo.i18n==3.20.0
oslo.utils==3.36.2
pandas==0.24.0.dev0+997.ga197837
param==1.7.0
pbr==4.0.4
PDPbox==0.2.0
psutil==5.4.6
PuLP==1.6.8
pyarrow==0.9.0
pycparser==2.18
pyparsing==2.2.0
pyqtgraph==0.10.0
python-dateutil==2.7.2
python-louvain==0.11
pytz==2018.4
pyviz-comms==0.6.0
PyYAML==3.12
requests==2.19.1
rfc3986==1.1.0
scikit-learn==0.20.0
scikit-optimize==0.5.2
scipy==0.19.1
seaborn==0.9.0
SecretStorage==3.1.0
serverfiles==0.2.1
six==1.11.0
sortedcontainers==2.0.4
stevedore==1.28.0
tblib==1.3.2
toolz==0.9.0
tornado==5.0.2
typing==3.6.4
urllib3==1.23
wrapt==1.10.11
xgboost==0.81
xlrd==1.1.0
xmltodict==0.11.0
zict==0.1.3
Code:
from pdpbox import pdp, get_dataset
test_titanic = get_dataset.titanic()
titanic_data = test_titanic['data']
titanic_target = test_titanic['target']
titanic_features = test_titanic['features']
titanic_model = test_titanic['xgb_model']
pdp_sex = pdp.pdp_isolate(model=titanic_model,
                          dataset=titanic_data,
                          model_features=titanic_features,
                          feature='Sex')
fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_sex, feature_name='sex')
Which XGBoost version do I need?
cc @prerna135
Is there a way to plot multiple features using the pdp.pdp_plot function? Currently, as I understand it, the function generates a plot for an individual feature and returns a matplotlib figure and axes. It is hard to manage the individual axes, assign them to a new figure, and compile them all into one figure.
On line 251 in pdp_plot_utils.py, one of the parameters for _pdp_contour_plot() is contour_label_fontsize and this causes the following error:
TypeError: clabel() got an unexpected keyword argument 'contour_label_fontsize'
According to the documentation for matplotlib.pyplot.clabel(), the parameter should be called fontsize.
Source: clabel() documentation
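For comparison, here is a standalone matplotlib call that labels contours with the fontsize keyword. It is only a sketch of the keyword usage on synthetic data (a simple bowl-shaped surface), not the pdpbox plotting code:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the example runs headless
import matplotlib.pyplot as plt
import numpy as np

# Synthetic surface: a simple bowl, used only to have something to contour.
x = np.linspace(-2, 2, 50)
y = np.linspace(-2, 2, 50)
X, Y = np.meshgrid(x, y)
Z = X ** 2 + Y ** 2

fig, ax = plt.subplots()
contours = ax.contour(X, Y, Z, levels=5)

# clabel takes the label size through `fontsize`.
labels = ax.clabel(contours, fontsize=9, inline=True)
print(len(labels))
```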
Only PDPbox 0.1 is available via pip. Please update it.
Still the same data ('Rossman Store Sales') and exact same code as tutorial:
ross_data = test_ross['data']
ross_features = test_ross['features']
ross_model = test_ross['rf_model']
ross_target = test_ross['target']
%%time
inter_rf = pdp.pdp_interact(
model=ross_model, dataset=ross_data, model_features=ross_features,
features=['weekofyear', ['StoreType_a', 'StoreType_b', 'StoreType_c', 'StoreType_d']]
)
'dataset' is not recognized as a keyword. I use Python 2.7 on Ubuntu 16.04.4 LTS.
Thank you!
When looking at the examples, we see various cases where PDPbox works on its bundled datasets. They all have a model stored in them. Is there a particular reason for this?
Furthermore, it is not clear how to run PDPbox on any other data frame. Should we provide such a model ourselves?
cc @prerna135
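For context, a partial dependence curve can be computed for any data frame and any fitted model that exposes predict. The sketch below is not pdpbox's implementation; LinearModel and manual_partial_dependence are hypothetical names and the data is made up. It only shows the idea: fix one feature at each grid value, predict for every row, and average.

```python
import numpy as np
import pandas as pd

class LinearModel:
    """Stand-in for any fitted estimator that exposes predict(X)."""
    def __init__(self, coef):
        self.coef = np.asarray(coef)

    def predict(self, X):
        return np.asarray(X) @ self.coef

def manual_partial_dependence(model, dataset, feature, grid):
    """Return per-row ICE lines and their average for each grid value."""
    ice = {}
    for value in grid:
        modified = dataset.copy()
        modified[feature] = value        # force the feature to the grid value
        ice[value] = model.predict(modified)
    ice_lines = pd.DataFrame(ice)        # rows: samples, columns: grid values
    return ice_lines, ice_lines.mean(axis=0)

df = pd.DataFrame({"x1": [0.0, 1.0, 2.0], "x2": [1.0, 1.0, 4.0]})
model = LinearModel([2.0, 1.0])          # prediction = 2*x1 + 1*x2

ice_lines, pd_curve = manual_partial_dependence(model, df, "x1", [0.0, 1.0])
print(pd_curve)
```

In pdpbox itself, the calls quoted elsewhere on this page pass a user-supplied model and data frame through the model, dataset, model_features and feature arguments.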
Hi,
When I tried importing get_dataset, I got an import error. Please advise.
ImportError Traceback (most recent call last)
in ()
----> 1 from pdpbox import get_dataset
ImportError: cannot import name 'get_dataset'
Thanks
Hi,
I ran a multiclass XGB with one target column and executed the following pdp_isolate code. I got an error: "ValueError: one-hot encoding feature should contain more than 1 element". How can I fix it? Thank you in advance.
pdp_xgb = pdp.pdp_isolate(
model=xgb1, dataset=data, model_features=features, feature=['dis_suwK']
)
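The error message suggests that a list passed as feature is interpreted as a group of one-hot columns, so a one-element list is rejected, while a plain string would name a single column. The sketch below mirrors that rule as an assumption drawn only from the message; describe_feature_arg is hypothetical and not copied from pdpbox:

```python
def describe_feature_arg(feature):
    # Hypothetical check, written to mirror the quoted error message.
    if isinstance(feature, str):
        return "single column"
    if len(feature) < 2:
        raise ValueError("one-hot encoding feature should contain more than 1 element")
    return "one-hot group of %d columns" % len(feature)

print(describe_feature_arg("dis_suwK"))  # plain column name
print(describe_feature_arg(["Embarked_C", "Embarked_S", "Embarked_Q"]))

try:
    describe_feature_arg(["dis_suwK"])   # one-element list, as in the call above
except ValueError as err:
    print("ValueError:", err)
```

If that reading is right, passing feature='dis_suwK' instead of feature=['dis_suwK'] would be the first thing to try.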
Some non-native implementations of the sklearn interface (most notably XGBoost) allow for keywords to be passed to the model's predict or predict_proba methods.
Concrete example: if you train an XGBoost model with early stopping, you need to specify ntree_limit = clf.best_ntree_limit in order to get the score that actually corresponds to the best model, rather than just the last iteration.
It would be nice to allow such predict keywords to be passed in when calling pdp methods.
I have implemented this feature in a local branch, and would be happy to submit a PR.
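The idea of forwarding predict keywords can be sketched without XGBoost. EarlyStoppedModel below is a toy stand-in whose ntree_limit and best_ntree_limit only imitate the names mentioned above, and predict_with_kwds is a hypothetical wrapper, not the proposed PR:

```python
class EarlyStoppedModel:
    """Toy stand-in: predictions depend on how many 'trees' are used."""
    best_ntree_limit = 2
    _tree_weights = [0.5, 0.25, -1.0]  # the last tree was added after the best iteration

    def predict(self, rows, ntree_limit=None):
        used = self._tree_weights if ntree_limit is None else self._tree_weights[:ntree_limit]
        return [sum(used) * x for x in rows]

def predict_with_kwds(model, rows, **predict_kwds):
    # Forward any extra keywords straight to the model's predict method.
    return model.predict(rows, **predict_kwds)

clf = EarlyStoppedModel()
last_iteration = predict_with_kwds(clf, [1.0, 2.0])
best_iteration = predict_with_kwds(clf, [1.0, 2.0], ntree_limit=clf.best_ntree_limit)

print(last_iteration)  # uses all trees
print(best_iteration)  # uses only the trees up to the best iteration
```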
Thanks for creating such a tool for Python partial dependence plots. I did find an issue, though. Right now in my project, the trained model is wrapped as a pipeline. Incoming data has a handful of features, and the categorical ones are transformed into one-hot columns by preprocessors within the pipeline object. PDPbox works fine when I'm calling the pipeline and a numerical feature that is available in the test dataframe. However, things get interesting when I'm trying to plot a one-hot encoded categorical feature...
ValueError: only accept pandas DataFrame
Is there a way to better support sklearn Pipeline objects? Ideally, users should be able to pass a pipeline and one-hot encoded feature names as arguments.
If you try to pickle a pdp_isolate_obj you get a PicklingError:
PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
The reason is that currently the model's predict (or predict_proba) method is added as a class member to the object, and pickling instance methods is verboten.
As far as I can tell, there's no reason to add the predict method to class. The pdp_isolate_obj.predict member isn't used anywhere in the code, and it could quite easily be reconstructed from the model, if it were needed. I'd propose to simply remove this member. Happy to submit a PR, if desired.
I've got pdp_isolate to work on other features, but it throws an exception "ValueError: No objects to concatenate" when plotted for column "f66". Does this mean there isn't enough information to make a plot of this feature?
Code that draws the error is:
pdp_feature = pdp.pdp_isolate(
    model=model,
    dataset=generic_features,
    model_features=generic_features.columns.tolist(),
    feature="f66",
    num_grid_points=10, grid_type='percentile', percentile_range=None, grid_range=None, cust_grid_points=None, memory_limit=0.5, n_jobs=1, predict_kwds={}, data_transformer=None
)
Error printout is:
ValueError Traceback (most recent call last)
<ipython-input-48-65f3cae83da8> in <module>
7 model_features=self._xgb_col_names,
8 feature=''.join(['f', str(self._featureName_to_featureIdx_map[feature])]),
----> 9 num_grid_points=10, grid_type='percentile', percentile_range=None, grid_range=None, cust_grid_points=None, memory_limit=0.5, n_jobs=1, predict_kwds={}, data_transformer=None
10 )
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pdpbox\pdp.py in pdp_isolate(model, dataset, model_features, feature, num_grid_points, grid_type, percentile_range, grid_range, cust_grid_points, memory_limit, n_jobs, predict_kwds, data_transformer)
165 ice_lines.append(ice_line_n_class)
166 else:
--> 167 ice_lines = pd.concat(grid_results, axis=1)
168
169 # calculate the counts
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, join_axes, ignore_index, keys, levels, names, verify_integrity, sort, copy)
226 keys=keys, levels=levels, names=names,
227 verify_integrity=verify_integrity,
--> 228 copy=copy, sort=sort)
229 return op.get_result()
230
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, join_axes, keys, levels, names, ignore_index, verify_integrity, copy, sort)
260
261 if len(objs) == 0:
--> 262 raise ValueError('No objects to concatenate')
263
264 if keys is None:
ValueError: No objects to concatenate
The array of data is:
array([ 1.42646187e-04, 2.20339505e-03, -3.71780779e-02, -3.07990126e-02,
-1.16102087e-03, -1.56650202e-02, -2.06472276e-03, -2.08325083e-02,
1.91286310e-02, 8.36141875e-03, -8.11077609e-04, -1.92386611e-02,
2.00920603e-02, -3.85844310e-02, -3.05273896e-03, 1.50174930e-03,
4.46882570e-03, -2.53156515e-02, 7.88625472e-03, -4.30667359e-03,
-9.39565665e-03, 8.74410281e-04, 5.34033060e-02, -8.75319328e-03,
-9.87543920e-03, -5.46208786e-03, -6.50628504e-03, -7.57054927e-03,
-3.93052166e-03, 2.16708143e-03, 1.42646187e-04, -7.25448253e-03,
-1.06866675e-02, 3.59743997e-02, -3.22111433e-03, -7.26964438e-03,
2.44697544e-02, 1.66259945e-02, -3.95567627e-03, 1.93527167e-02,
1.64243560e-03, 2.66140257e-02, 3.36265867e-02, -1.45173875e-03,
-6.50628504e-03, 2.01777145e-02, 5.00185982e-03, -2.68591401e-02,
5.35982088e-03, 7.15823115e-02, -4.28643707e-03, 1.62057894e-02,
-6.83444130e-03, 5.02746656e-02, 2.21616214e-03, -9.68313760e-03,
-3.70581025e-03, 8.12628710e-03, -3.70581025e-03, -8.75319328e-03,
9.87255512e-03, 6.73500014e-03, 5.45383777e-02, -3.86630195e-03,
2.85044068e-03, 3.06522130e-02, 3.29460041e-03, 9.06824300e-03,
-3.16553335e-03, 6.66199922e-03, 5.24429083e-04, -3.71585490e-03,
6.27464537e-03, -8.96973965e-03, 8.17747652e-03, -5.51070443e-04,
5.00562964e-03, -2.66748822e-02, 1.38245766e-02, 6.50494563e-04,
-9.44020712e-03, 1.96937834e-02, -2.77638032e-03, -1.82182974e-04,
7.87300556e-03, 4.61156800e-03, -4.69126662e-02, 1.10852780e-02,
9.89909316e-04, 3.60295491e-04, 1.57297735e-02, 4.62766075e-02,
1.33140739e-02, 6.19039552e-03, 1.39541582e-02, -1.69532139e-03,
-1.16102087e-03, 3.80017680e-03, -6.50628504e-03, 6.10508974e-03,
1.15785507e-03, 1.68395233e-02, 3.28879539e-03, 1.30998048e-02,
-7.75223512e-03, 1.76696777e-02, -2.06107510e-03, 2.55053545e-02,
-4.71198937e-02, 6.10950021e-02, 4.04423462e-05, -3.27229081e-02,
-1.28115640e-03, -8.02660978e-02, 4.14655803e-02, -2.31697945e-02,
3.25156125e-02, 3.42166041e-02, -8.96781154e-02, 1.60982957e-03,
4.81951811e-02, 1.33224275e-03, 1.14265095e-02, -2.24592418e-02,
-2.36253054e-03, 2.93535839e-02, 1.33670513e-02, 6.95616360e-03,
-4.05554484e-02, -1.17983934e-02, -4.75658884e-03, -1.99465422e-02,
1.67505552e-02, -3.21242621e-03, 4.16127660e-03, -3.64498680e-02,
-2.67821052e-03, -1.50303162e-02, 1.17175118e-02, 1.08760602e-02,
-5.16998437e-03, -2.69782135e-04, -2.70069166e-03, -4.10286181e-02,
1.24384512e-03, 1.07679725e-03, -2.89405080e-02, -1.72169721e-03,
6.27027032e-02, -5.21718738e-04, 3.55731258e-02, -2.26476632e-02,
1.39934603e-02, 1.19824431e-02, -1.38995866e-02, 4.92600824e-03,
-7.93047226e-03, 4.36035646e-02, -1.69812569e-04, -6.50628504e-03,
-7.92308831e-02, 1.37241203e-02, 1.26096808e-02, 1.72127291e-03,
-2.10604483e-02, -3.86630195e-03, 1.76504359e-02, -6.50628504e-03,
-4.36824084e-03, 3.47388067e-03, 2.25694330e-03, -1.09188288e-02,
-2.17498918e-02, -2.32474048e-02, 2.99691884e-02, -2.52163580e-02,
1.43217263e-02, -8.43296067e-04, 2.70521511e-02, -4.18762082e-02,
-1.55742569e-02, 9.67830047e-02, 3.12716409e-02, 1.42646187e-04,
3.46233434e-02, -4.70378050e-03, -1.00597171e-02, -3.10874944e-03,
8.58906318e-04, -5.80679408e-04, -1.04566714e-02, 7.34680904e-03,
1.05924652e-02, -2.06613277e-02, -3.16553335e-03, 1.77879256e-03,
-4.37393282e-02, -5.83798230e-03, -6.18573615e-03, -1.14453858e-02,
-7.59866074e-03, -2.21871359e-02, -3.16553335e-03, -1.00102993e-04,
1.18235288e-03, 8.06930693e-03, -1.96816126e-04, 1.84993239e-02,
2.13461349e-02, 7.07266741e-03, 1.42646187e-04, -5.47744218e-03,
-1.03254946e-02, -1.73400374e-02, -2.58117361e-04, 1.24010993e-02,
3.92948733e-03, -1.82873090e-02, 1.75378801e-04, -2.85078366e-03,
5.82887766e-03, -9.36706224e-03, 2.93535839e-02, 1.69861258e-02,
1.29833871e-02, -2.34977297e-03, 4.12969746e-04, -4.58429383e-03,
-1.97912848e-02, -1.71597336e-02, 1.00754861e-02, 1.46873022e-02,
1.29070842e-02, -8.75319328e-03, -2.44563897e-02, -2.40753008e-02,
-2.29969049e-03, 5.32787323e-03, -9.60620144e-04, 5.82650104e-03,
1.23422518e-03, 1.23086831e-02, -4.98560347e-03, -2.14837586e-03,
-2.19718025e-03, 3.04158477e-03, 1.42646187e-04, -6.03845600e-03,
2.68922183e-03, 4.90456739e-03, -2.64048818e-02, -4.25583690e-03,
1.13904307e-03, 1.82139863e-02, 5.86379367e-03, 3.09502502e-02,
1.04408664e-02, 3.30992549e-02, 3.26630269e-03, -8.84241306e-03,
2.53524525e-03, -1.93375418e-02, -1.65741420e-03, 3.01436365e-02,
-3.25712075e-04, -3.89183492e-02, -2.70689577e-03, 1.42646187e-04,
-4.11961023e-02, -1.19694286e-02, 3.78084147e-03, 5.03621425e-03,
-1.31770086e-03, 1.73333962e-03, 1.83711198e-02, 4.24728106e-03,
4.81215520e-03, -2.05966443e-03, 1.55374284e-02, 2.09652199e-04,
-5.55169662e-02, 1.13634914e-02, 2.09302581e-02, 1.42646187e-04,
1.55923464e-02, 5.68731417e-03, -3.66920155e-03, 3.35823081e-02,
2.59721886e-02, 1.55376680e-02, 1.98637057e-03, -5.16036812e-02,
2.23336166e-02, 8.63353952e-03, 7.94881900e-03, -6.61724505e-04,
1.48939556e-02, 1.56944799e-02, -1.52858182e-02, 1.28012873e-03,
-1.90668483e-02, -6.50628504e-03, 7.59052706e-03, -3.16190751e-02,
-2.24662311e-02, 1.34058745e-03, -2.34977297e-03, 6.91485104e-03,
2.07674911e-02, -3.43720015e-03, 5.12697636e-03, 5.68418815e-03,
-1.98302377e-02, 3.17619165e-03, 2.87206334e-02, -4.16068265e-04,
-1.70174842e-02, -4.25886490e-03, -1.90537711e-02, 1.40145629e-02,
1.91054495e-03, -1.76012040e-02, 5.62004485e-02, -7.56296167e-03,
1.93897412e-02, -1.08269795e-03, -4.12494685e-02, -8.52674004e-02,
2.93535839e-02, -2.34977297e-03, -4.35269843e-02, 8.49059462e-04,
-3.10987210e-02, -2.11906108e-03, 2.75347206e-02, -1.16102087e-03,
-2.14935108e-04, -9.37163282e-04, 2.08917108e-02, -1.49624887e-04,
-1.20360853e-03, -1.84003701e-02, 5.69942226e-02, -6.51881530e-03,
1.19581645e-02, 9.34846000e-03, 1.95967201e-02, 1.83098260e-02,
1.50381701e-02, -1.01716106e-02, 1.19326200e-02, -8.45938115e-05,
-5.39846427e-02, 4.12628460e-02, 5.94171647e-02, -3.16112209e-02,
-1.04566714e-02, 2.13981361e-03, 2.17117295e-03, 5.24219656e-03,
-7.49931840e-04, -7.69360893e-02, 1.04539920e-03, -1.25689487e-02,
-1.11945172e-02, -1.06232041e-02, 7.95284125e-03, -1.19005699e-03,
-7.01669455e-03, 1.35516141e-03, -1.18424677e-02, 7.34416499e-03,
1.82665318e-02, -6.50628504e-03, -1.66917762e-04, 7.93482177e-03,
7.15381390e-02, 7.11127527e-05, -1.48615269e-03, 2.64163194e-02,
-4.44932982e-02, -2.34977297e-03, 4.55293195e-02, -8.35069547e-03,
3.70642948e-03, 1.53445914e-02, 5.59201845e-03, -4.88137601e-02,
1.52571208e-03, 2.52508119e-03, -1.16102087e-03, 1.21301764e-02,
7.62865196e-03, -9.38930734e-03, -1.87311721e-03, 1.73434631e-02,
-9.26327361e-03, -7.73302270e-03, -8.66310438e-03, 1.59627685e-02,
2.15100444e-02, -2.01720393e-02, -6.26028935e-03, -1.04566714e-02,
1.94997264e-02, 7.40022891e-03, 6.63133282e-02, -2.86199529e-02,
1.88261116e-03, 1.98344906e-03, 2.16258155e-03, 1.30730564e-02,
-1.16102087e-03, -2.62273463e-02, -6.53665300e-03, -2.45184235e-02,
-4.12609788e-02, -2.54090496e-03, -2.54805998e-02, 6.54251064e-03,
-6.50628504e-03, 1.90305898e-02, 3.77064587e-02, 7.18832359e-03,
-3.14250026e-04, -3.75230486e-02, -3.92870243e-03, 1.00036256e-02,
1.99581498e-02, -1.45951448e-02, 3.94904799e-04, -1.47333697e-02,
-1.12384254e-02, -2.02326112e-02, 1.89755598e-02, -5.89852839e-03,
7.44942716e-03, -1.07168755e-02, -1.68000304e-02, 1.22239083e-03])
Hi,
I just installed the latest version. I am running your tutorial script "pdpbox_binary_classification.ipynb".
When running step 1.2
fig, axes, summary_df = info_plots.actual_plot(
model=titanic_model, X=titanic_data[titanic_features], feature='Sex', feature_name='gender'
)
I got an error as below:
TypeError Traceback (most recent call last)
in ()
1 fig, axes, summary_df = info_plots.actual_plot(
----> 2 model=titanic_model, X=titanic_data[titanic_features], feature='Sex', feature_name='gender'
3 )
~/.local/lib/python3.6/site-packages/pdpbox/info_plots.py in actual_plot(model, X, feature, feature_name, num_grid_points, grid_type, percentile_range, grid_range, cust_grid_points, show_percentile, show_outliers, endpoint, which_classes, predict_kwds, ncols, figsize, plot_params)
289 # make predictions
290 # info_df only contains feature value and actual predictions
--> 291 prediction = predict(X, **predict_kwds)
292 info_df = X[_make_list(feature)]
293 actual_prediction_columns = ['actual_prediction']
TypeError: predict_proba() argument after ** must be a mapping, not NoneType
I tried a different model and got the same error here.
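The TypeError itself is plain Python behaviour and can be reproduced in isolation: unpacking None with ** is rejected, while unpacking an empty dict is fine. In this sketch predict_proba is a dummy function, not a real model method:

```python
def predict_proba(X):
    # Dummy stand-in for a model's predict_proba.
    return [[0.5, 0.5] for _ in X]

predict_kwds = None  # the value that triggers the failure in the traceback
try:
    predict_proba([[1, 2]], **predict_kwds)
except TypeError as err:
    message = str(err)
    print(message)

# Unpacking an empty dict passes no extra keywords and succeeds.
result = predict_proba([[1, 2]], **{})
print(result)
```

Since the actual_plot signature in the traceback lists a predict_kwds parameter, passing predict_kwds={} explicitly may work around the problem; that is an untested assumption.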
Here is my code to reproduce the problem:
from pdpbox import pdp, get_dataset, info_plots
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
# Setup data
data = load_iris()
df = pd.DataFrame(data.data, columns = data.feature_names)
df.index = data.target
# Train basic model
estimator = RandomForestClassifier()
model = estimator.fit(df, df.index)
# pdp_interactions
pdp_paid = pdp.pdp_interact(
    model=model, dataset=df, model_features=df.columns, features=df.columns,
    num_grid_points=[5, 5, 5],
    percentile_ranges=[None, None, None],
    n_jobs=4
)
# plotting
fig, axes = pdp.pdp_interact_plot(
    pdp_paid, ['petal length (cm)', 'petal width (cm)'], plot_type='grid', x_quantile=True, ncols=2, plot_pdp=True,
    which_classes=[0, 1, 2]
)
The problem is with the subplots that show the dimensional values to the left of and above each class plot: in your reference docs they are aligned with the grid of the figure, but in my output they seem to be squished. I can probably figure out how to reference the axes or figure directly and correct them, but is this expected? Is there an easy fix?
Thanks! Great library!
I understand that this package supports all scikit-learn models, and PDPs should technically work as long as the model.predict function works. However, I get this error when trying to run pdp_isolate for a Keras model. Can you confirm it's the model that's causing this error? I was able to run it successfully for a scikit-learn RF model.
I have a couple of features which are scaled between 0 and 1. For all of those I get a "ValueError: cannot reindex from a duplicate axis". I assume that in creating the columns for the different values of a feature, some rounding happens for their naming, which results in several columns having the same name, although I couldn't trace back the error in the code. Multiplying the column by 10 solves the problem but is of course unintended.
The error message is below.
Thanks for this beautiful package.
/home/cdsw/.local/lib/python3.6/site-packages/pdpbox/pdp.py in pdp_plot(pdp_isolate_out, feature_name, center, plot_org_pts, plot_lines, frac_to_plot, cluster, n_cluster_centers, cluster_method, x_quantile, figsize, ncols, plot_params, multi_flag, which_class)
546 _pdp_plot(pdp_isolate_out=pdp_isolate_out, feature_name=feature_name, center=center, plot_org_pts=plot_org_pts, plot_lines=plot_lines,
547 frac_to_plot=frac_to_plot, cluster=cluster, n_cluster_centers=n_cluster_centers, cluster_method=cluster_method, x_quantile=x_quantile,
--> 548 ax=ax2, plot_params=plot_params)
549
550
/home/cdsw/.local/lib/python3.6/site-packages/pdpbox/pdp.py in _pdp_plot(pdp_isolate_out, feature_name, center, plot_org_pts, plot_lines, frac_to_plot, cluster, n_cluster_centers, cluster_method, x_quantile, ax, plot_params)
616 pdp_y -= pdp_y[0]
617 for col in display_columns[1:]:
--> 618 ice_lines[col] -= ice_lines[display_columns[0]]
619 ice_lines['actual_preds'] -= ice_lines[display_columns[0]]
620 ice_lines[display_columns[0]] = 0
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/ops.py in f(self, other)
895
896 def f(self, other):
--> 897 result = method(self, other)
898
899 # this makes sure that we are aligned like the input
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/ops.py in f(self, other, axis, level, fill_value)
1552 return _combine_series_frame(self, other, na_op,
1553 fill_value=fill_value, axis=axis,
-> 1554 level=level, try_cast=True)
1555 else:
1556 if fill_value is not None:
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/ops.py in _combine_series_frame(self, other, func, fill_value, axis, level, try_cast)
1437 # default axis is columns
1438 return self._combine_match_columns(other, func, level=level,
-> 1439 try_cast=try_cast)
1440
1441
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/frame.py in _combine_match_columns(self, other, func, level, try_cast)
4767 def _combine_match_columns(self, other, func, level=None, try_cast=True):
4768 left, right = self.align(other, join='outer', axis=1, level=level,
-> 4769 copy=False)
4770
4771 new_data = left._data.eval(func=func, other=right,
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/frame.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
3548 method=method, limit=limit,
3549 fill_axis=fill_axis,
-> 3550 broadcast_axis=broadcast_axis)
3551
3552 @appender(_shared_docs['reindex'] % _shared_doc_kwargs)
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/generic.py in align(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis, broadcast_axis)
7364 copy=copy, fill_value=fill_value,
7365 method=method, limit=limit,
-> 7366 fill_axis=fill_axis)
7367 else: # pragma: no cover
7368 raise TypeError('unsupported type: %s' % type(other))
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/generic.py in _align_series(self, other, join, axis, level, copy, fill_value, method, limit, fill_axis)
7461
7462 if lidx is not None:
-> 7463 fdata = fdata.reindex_indexer(join_index, lidx, axis=0)
7464 else:
7465 raise ValueError('Must specify axis=0 or 1')
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/internals.py in reindex_indexer(self, new_axis, indexer, axis, fill_value, allow_dups, copy)
4412 # some axes don't allow reindexing with dups
4413 if not allow_dups:
-> 4414 self.axes[axis]._can_reindex(indexer)
4415
4416 if axis >= self.ndim:
/home/cdsw/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py in _can_reindex(self, indexer)
3558 # trying to reindex on an axis with duplicates
3559 if not self.is_unique and len(indexer):
-> 3560 raise ValueError("cannot reindex from a duplicate axis")
3561
3562 def reindex(self, target, method=None, level=None, limit=None,
ValueError: cannot reindex from a duplicate axis
Doesn't work w/ XGBClassifier. The kernel dies.
When I execute the following code, just like the binary_classification tutorial:
"""
fig, axes, summary_df = info_plots.actual_plot(
model=titanic_model, X=titanic_data[titanic_features], feature=['Embarked_C', 'Embarked_S', 'Embarked_Q'],
feature_name='embarked'
)
"""
I get the following error:
"""
TypeError: predict_proba() argument after ** must be a mapping, not NoneType
"""
I also tried lgb.LGBMClassifier and a raw lgb model on my own data but got the same error.
Does anyone know how to fix it?
Looks like the two plots do not talk to each other: i.e. if I provide percentile_range to the pdp_isolate function, the ICE line axes limits look correct; however, the 'rug' plot is not adjusted accordingly.
Hey.
Do you have any interest in upstreaming part of this to scikit-learn?
We had a PR here: scikit-learn/scikit-learn#5653
but it's a bit stalled.
We probably don't want to have as much plotting code as you do, but some basics in sklearn would be cool.
I am trying to execute the code:
from pdpbox import pdp, get_dataset, info_plots
test_titanic = get_dataset.titanic()
And I'm getting the error below.
PDP 0.2.0+13.g73c6966
XGBoost 1.1.0-SNAPSHOT
conda environment
Stacktrace:
XGBoostError Traceback (most recent call last)
<ipython-input-2-931a5e8d7b9f> in <module>
----> 1 test_titanic = get_dataset.titanic()
~/anaconda3/lib/python3.6/site-packages/PDPbox-0.2.0+13.g73c6966-py3.6.egg/pdpbox/get_dataset.py in titanic()
7
8 def titanic():
----> 9 dataset = joblib.load(os.path.join(DIR, 'datasets/test_titanic.pkl'))
10 return dataset
11
~/anaconda3/lib/python3.6/site-packages/joblib/numpy_pickle.py in load(filename, mmap_mode)
603 return load_compatibility(fobj)
604
--> 605 obj = _unpickle(fobj, filename, mmap_mode)
606
607 return obj
~/anaconda3/lib/python3.6/site-packages/joblib/numpy_pickle.py in _unpickle(fobj, filename, mmap_mode)
527 obj = None
528 try:
--> 529 obj = unpickler.load()
530 if unpickler.compat_mode:
531 warnings.warn("The file '%s' has been generated with a "
~/anaconda3/lib/python3.6/pickle.py in load(self)
1048 raise EOFError
1049 assert isinstance(key, bytes_types)
-> 1050 dispatch[key[0]](self)
1051 except _Stop as stopinst:
1052 return stopinst.value
~/anaconda3/lib/python3.6/site-packages/joblib/numpy_pickle.py in load_build(self)
340 NDArrayWrapper is used for backward compatibility with joblib <= 0.9.
341 """
--> 342 Unpickler.load_build(self)
343
344 # For backward compatibility, we support NDArrayWrapper objects.
~/anaconda3/lib/python3.6/pickle.py in load_build(self)
1505 setstate = getattr(inst, "__setstate__", None)
1506 if setstate is not None:
-> 1507 setstate(state)
1508 return
1509 slotstate = None
~/anaconda3/lib/python3.6/site-packages/xgboost/core.py in __setstate__(self, state)
1096 ptr = (ctypes.c_char * len(buf)).from_buffer(buf)
1097 _check_call(
-> 1098 _LIB.XGBoosterUnserializeFromBuffer(handle, ptr, length))
1099 state['handle'] = handle
1100 self.__dict__.update(state)
~/anaconda3/lib/python3.6/site-packages/xgboost/core.py in _check_call(ret)
187 """
188 if ret != 0:
--> 189 raise XGBoostError(py_str(_LIB.XGBGetLastError()))
190
191
XGBoostError: [18:53:06] /home/sergey/xgboost/src/learner.cc:834: Check failed: header == serialisation_header_:
If you are loading a serialized model (like pickle in Python) generated by older
XGBoost, please export the model by calling `Booster.save_model` from that version
first, then load it back in current version. There's a simple script for helping
the process. See:
https://xgboost.readthedocs.io/en/latest/tutorials/saving_model.html
for reference to the script, and more details about differences between saving model and
serializing.
Stack trace:
[bt] (0) /home/sergey/anaconda3/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x64) [0x7fe81e08c784]
[bt] (1) /home/sergey/anaconda3/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(xgboost::LearnerIO::Load(dmlc::Stream*)+0x674) [0x7fe81e19f444]
[bt] (2) /home/sergey/anaconda3/lib/python3.6/site-packages/xgboost/lib/libxgboost.so(XGBoosterUnserializeFromBuffer+0x5e) [0x7fe81e07f61e]
[bt] (3) /home/sergey/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call_unix64+0x4c) [0x7fe84c23d630]
[bt] (4) /home/sergey/anaconda3/lib/python3.6/lib-dynload/../../libffi.so.6(ffi_call+0x22d) [0x7fe84c23cfed]
[bt] (5) /home/sergey/anaconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7fe84b3c509e]
[bt] (6) /home/sergey/anaconda3/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x13ad5) [0x7fe84b3c5ad5]
[bt] (7) /home/sergey/anaconda3/bin/python -m ipykernel -f /home/sergey/.local/share/jupyter/runtime/kernel-813f0269-7bc5-4ef8-b890-fb9b799698ce.json(_PyObject_FastCallDict+0x8b) [0x559094256f8b]
[bt] (8) /home/sergey/anaconda3/bin/python -m ipykernel -f /home/sergey/.local/share/jupyter/runtime/kernel-813f0269-7bc5-4ef8-b890-fb9b799698ce.json(+0x1a162e) [0x5590942e562e]
Accumulated local effects describe how features influence the prediction of a machine learning model on average. ALE plots are a faster and unbiased alternative to partial dependence plots (PDPs).
https://christophm.github.io/interpretable-ml-book/ale.html
There are R modules supporting them but no Python module.
Hi @SauceCat , quick conceptual question:
Say I selected a given feature to analyze in my multiclass classifier.
In the pdp.pdp_isolate() function for the PDP plot, when would it make sense to use the train set versus the test set to fill the dataset parameter?
Initially, I'd say it is more complete to build 2 PDP plots for the same feature, one using train set and another using test set. So you can verify if that feature is having an equivalent impact on both sets. But I am interested in your thoughts.
Regards, Fernando
Thanks for your great library! I'm using it in a course I'm teaching, so to make things easy, I'd like to make it installable from pypi. I'm happy to submit it, so you don't have to worry about it - but I figured I'd double-check that there wasn't any reason you wanted to avoid posting it to pypi (or whether you'd rather do it yourself). If I don't hear from you, I'll assume it's OK - but just ping me if you have any issues! :)
This command works fine and produces the expected results:
fig, axes = pdp.pdp_interact_plot(
pdp_interact_out = inter1,
feature_names=['NOx', 'NO_2'],
plot_type='grid'
)
However, changing only plot_type to contour gives an error related to the labels and the font size. The figure appears label-less at the bottom after this error. Any guess or help is appreciated.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-363-7b31c15b4793> in <module>()
2 pdp_interact_out = inter1,
3 feature_names=['NOx', 'NO_2'],
----> 4 plot_type='contour'
5 )
/Users/jsg/Documents/DrivenData_Cold_Forecast/venv/lib/python3.6/site-packages/pdpbox/pdp.py in pdp_interact_plot(pdp_interact_out, feature_names, plot_type, x_quantile, plot_pdp, which_classes, figsize, ncols, plot_params)
773 fig.add_subplot(inter_ax)
774 _pdp_inter_one(pdp_interact_out=pdp_interact_plot_data[0], inter_ax=inter_ax, norm=None,
--> 775 feature_names=feature_names_adj, **inter_params)
776 else:
777 wspace = 0.3
/Users/jsg/Documents/DrivenData_Cold_Forecast/venv/lib/python3.6/site-packages/pdpbox/pdp_plot_utils.py in _pdp_inter_one(pdp_interact_out, feature_names, plot_type, inter_ax, x_quantile, plot_params, norm, ticks)
330 # for numeric not quantile
331 X, Y = np.meshgrid(pdp_interact_out.feature_grids[0], pdp_interact_out.feature_grids[1])
--> 332 im = _pdp_contour_plot(X=X, Y=Y, **inter_params)
333 elif plot_type == 'grid':
334 im = _pdp_inter_grid(**inter_params)
/Users/jsg/Documents/DrivenData_Cold_Forecast/venv/lib/python3.6/site-packages/pdpbox/pdp_plot_utils.py in _pdp_contour_plot(X, Y, pdp_mx, inter_ax, cmap, norm, inter_fill_alpha, fontsize, plot_params)
249 c1 = inter_ax.contourf(X, Y, pdp_mx, N=level, origin='lower', cmap=cmap, norm=norm, alpha=inter_fill_alpha)
250 c2 = inter_ax.contour(c1, levels=c1.levels, colors=contour_color, origin='lower')
--> 251 inter_ax.clabel(c2, contour_label_fontsize=fontsize, inline=1)
252 inter_ax.set_aspect('auto')
253
/Users/jsg/Documents/DrivenData_Cold_Forecast/venv/lib/python3.6/site-packages/matplotlib/axes/_axes.py in clabel(self, CS, *args, **kwargs)
6221
6222 def clabel(self, CS, *args, **kwargs):
-> 6223 return CS.clabel(*args, **kwargs)
6224 clabel.__doc__ = mcontour.ContourSet.clabel.__doc__
6225
TypeError: clabel() got an unexpected keyword argument 'contour_label_fontsize'
Thank you in advance. Awesome library by the way!
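The final frame shows matplotlib rejecting the keyword `contour_label_fontsize`. In matplotlib, the label size for `clabel` is set with `fontsize`. A small standalone sketch with made-up data (not a patch to PDPbox) that labels contours this way:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np

# Made-up surface standing in for a PDP interaction matrix.
x = np.linspace(0.0, 1.0, 20)
y = np.linspace(0.0, 1.0, 20)
X, Y = np.meshgrid(x, y)
Z = X ** 2 + Y

fig, ax = plt.subplots()
filled = ax.contourf(X, Y, Z, levels=8, cmap="viridis", alpha=0.8)

# Draw contour lines on the interior levels only (skip the outermost two).
lines = ax.contour(X, Y, Z, levels=filled.levels[1:-1], colors="white")

# `fontsize` is the keyword clabel understands for the label size.
labels = ax.clabel(lines, fontsize=9, inline=True)
fig.canvas.draw()
```

Until the library call is changed, downgrading or pinning matplotlib to a version that still tolerated the old keyword might be a workaround; which version that is has not been checked here.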
I find this library to be incredibly useful, though I would like to know if there are ways to customize the pdp_plot a bit more. Specifically, I would like to be able to:
I don't see a way to do either of these at present, but it would be incredibly helpful to have these as optional arguments.
ross_data = test_ross['data']
ross_features = test_ross['features']
ross_model = test_ross['rf_model']
ross_target = test_ross['target']
fig, axes, summary_df = info_plots.target_plot(
df=ross_data, feature='SchoolHoliday', feature_name='SchoolHoliday', target=ross_target
)
_ = axes['bar_ax'].set_xticklabels(['Not SchoolHoliday', 'SchoolHoliday'])
I use the exact same dataset 'Rossmann Store Sales' and the same code (Tutorial: pdpbox_regression.ipynb), but I encounter the error, and the plot is empty.
Please advise, thank you!
I just recently started to use this excellent repository, which fills a much-needed gap in scikit-learn. A suggestion for clarity in the parameters of pdpbox.pdp.pdp_isolate is to require train_X to be a deduplicated pandas DataFrame; it caused a bit of confusion on my part when I wasn't able to plot due to the indexing issues from duplicated values. It's really just as simple as df.drop_duplicates(). Thanks for all of your work!
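A quick pre-check along these lines, assuming the indexing problem comes from repeated index labels or repeated rows (the toy frame below is made up):

```python
import pandas as pd

# Toy frame with a repeated index label (0) and a repeated row.
train_X = pd.DataFrame(
    {"x1": [0.1, 0.1, 0.3], "x2": [1, 1, 0]},
    index=[0, 0, 1],
)

has_unique_index = train_X.index.is_unique          # False for this frame
n_duplicate_rows = int(train_X.duplicated().sum())  # counts repeated rows

# Drop repeated rows as suggested above, then rebuild a unique 0..n-1 index.
clean_X = train_X.drop_duplicates().reset_index(drop=True)
```

Dropping rows changes the data distribution the plot is computed on, so if only the index labels repeat, `reset_index(drop=True)` on its own may be the gentler fix.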
EDIT:
Another data-checking step should be added at line 303 in pdp.py for pdp.pdp_interact. If the feature grids are not specified and default to 10, and train_X.shape[0] is less than 100, then you will get an error on line 305 because data_chunk_size rounds to 0. I just needed to specify num_grid_points=[5,5] so that it would run when train_X.shape[0] = 25.
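The arithmetic behind this can be sketched as follows. The formula is an assumption that matches the numbers in the report (rows divided by the number of grid combinations), not a quote of pdp.py, and both helpers are hypothetical:

```python
def chunk_size(n_rows, num_grid_points):
    """Hypothetical helper: rows available per grid combination (floor division)."""
    n_combinations = num_grid_points[0] * num_grid_points[1]
    return n_rows // n_combinations


def safe_grid_points(n_rows, num_grid_points):
    """Shrink the grid until each grid combination gets at least one row."""
    g1, g2 = num_grid_points
    while n_rows // (g1 * g2) < 1 and (g1 > 1 or g2 > 1):
        # Reduce the larger side first so the grid stays roughly square.
        if g1 >= g2:
            g1 -= 1
        else:
            g2 -= 1
    return [g1, g2]


default_chunk = chunk_size(25, [10, 10])     # 25 // 100
reduced_chunk = chunk_size(25, [5, 5])       # 25 // 25
suggested = safe_grid_points(25, [10, 10])   # largest near-square grid that fits
```

A check like `safe_grid_points` could run before calling `pdp.pdp_interact`, with the result passed as `num_grid_points`.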
Does PDPbox support PySpark models as well or any plan of releasing PySpark support in a future release?
It would be great to have the option to customise the PDPbox plot colours. Could this be added as a feature?
When I run
obj = pdp.pdp_isolate(model, X_train, X_train.columns, 'addy_change')
pdp.pdp_plot(obj, 'addy_change', plot_pts_dist=True, x_quantile=True)
where "addy_change" is a binary variable, I get the error pasted below.
The problem seems to be that count_data['xticklabels'] doesn't exist for binary variables, but when x_quantile = True, _pdp_plot looks for that key anyway.
My use case is that I'm actually looping through a large list of variables, with x_quantile set to True for all of them. I'm wondering if it would make sense for pdp_plot to ignore x_quantile=True if the variable is binary.
Barring that, it would be helpful to have a more informative error message.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/.conda/envs/checking/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2524 try:
-> 2525 return self._engine.get_loc(key)
2526 except KeyError:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'xticklabels'
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-70-faa0732e94a6> in <module>()
----> 1 pdp.pdp_plot(obj, 'addy_change', plot_pts_dist=True, x_quantile=True)
~/.conda/envs/checking/lib/python3.6/site-packages/pdpbox/pdp.py in pdp_plot(pdp_isolate_out, feature_name, center, plot_pts_dist, plot_lines, frac_to_plot, cluster, n_cluster_centers, cluster_method, x_quantile, show_percentile, figsize, ncols, plot_params, which_classes)
414
415 _pdp_plot(pdp_isolate_out=pdp_plot_data[0], feature_name=feature_name_adj, pdp_ax=_pdp_ax,
--> 416 count_ax=_count_ax, **pdp_plot_params)
417 else:
418 pdp_ax = plt.subplot(outer_grid[1])
~/.conda/envs/checking/lib/python3.6/site-packages/pdpbox/pdp_plot_utils.py in _pdp_plot(pdp_isolate_out, feature_name, center, plot_lines, frac_to_plot, cluster, n_cluster_centers, cluster_method, x_quantile, show_percentile, pdp_ax, count_data, count_ax, plot_params)
97 # need to plot data distribution
98 if x_quantile:
---> 99 count_display_columns = count_data['xticklabels'].values
100 # number of grids = number of bins + 1
101 # count_x: min -> max + 1
~/.conda/envs/checking/lib/python3.6/site-packages/pandas/core/frame.py in __getitem__(self, key)
2137 return self._getitem_multilevel(key)
2138 else:
-> 2139 return self._getitem_column(key)
2140
2141 def _getitem_column(self, key):
~/.conda/envs/checking/lib/python3.6/site-packages/pandas/core/frame.py in _getitem_column(self, key)
2144 # get column
2145 if self.columns.is_unique:
-> 2146 return self._get_item_cache(key)
2147
2148 # duplicate columns & possible reduce dimensionality
~/.conda/envs/checking/lib/python3.6/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
1840 res = cache.get(item)
1841 if res is None:
-> 1842 values = self._data.get(item)
1843 res = self._box_item_values(item, values)
1844 cache[item] = res
~/.conda/envs/checking/lib/python3.6/site-packages/pandas/core/internals.py in get(self, item, fastpath)
3841
3842 if not isna(item):
-> 3843 loc = self.items.get_loc(item)
3844 else:
3845 indexer = np.arange(len(self.items))[isna(self.items)]
~/.conda/envs/checking/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2525 return self._engine.get_loc(key)
2526 except KeyError:
-> 2527 return self._engine.get_loc(self._maybe_cast_indexer(key))
2528
2529 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'xticklabels'
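Until the library handles this case, the loop over variables can decide per feature whether to request quantile axes. A minimal sketch, assuming binary features are coded as 0/1; the helper names and the sample columns are made up:

```python
def is_binary(values):
    """Return True when every value is 0 or 1."""
    return set(values) <= {0, 1}


def choose_x_quantile(values, requested=True):
    """Hypothetical helper: ignore the x_quantile request for binary features."""
    return requested and not is_binary(values)


# Made-up columns standing in for X_train.
features = {
    "addy_change": [0, 1, 1, 0, 1],
    "income": [12.5, 40.0, 33.1, 8.2, 19.9],
}
plot_settings = {name: choose_x_quantile(col) for name, col in features.items()}
```

Inside the loop, `x_quantile=plot_settings[name]` would then be passed to `pdp.pdp_plot` instead of a fixed `True`.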
When I run it with my own dataset, I get the following error:
AttributeError Traceback (most recent call last)
in
4 feature='sex',
5 feature_name='Gender',
----> 6 predict_kwds={}
7 )
/opt/anaconda2/envs/python35/lib/python3.5/site-packages/pdpbox/info_plots.py in actual_plot(model, X, feature, feature_name, num_grid_points, grid_type, percentile_range, grid_range, cust_grid_points, show_percentile, show_outliers, endpoint, which_classes, predict_kwds, ncols, figsize, plot_params)
289 # make predictions
290 # info_df only contains feature value and actual predictions
--> 291 prediction = predict(X, **predict_kwds)
292 info_df = X[_make_list(feature)]
293 actual_prediction_columns = ['actual_prediction']
/opt/anaconda2/envs/python35/lib/python3.5/site-packages/xgboost/core.py in predict(self, data, output_margin, ntree_limit, pred_leaf, pred_contribs, approx_contribs, pred_interactions, validate_features)
1282
1283 if validate_features:
-> 1284 self._validate_features(data)
1285
1286 length = c_bst_ulong()
/opt/anaconda2/envs/python35/lib/python3.5/site-packages/xgboost/core.py in _validate_features(self, data)
1669 """
1670 if self.feature_names is None:
-> 1671 self.feature_names = data.feature_names
1672 self.feature_types = data.feature_types
1673 else:
/opt/anaconda2/envs/python35/lib/python3.5/site-packages/pandas/core/generic.py in __getattr__(self, name)
5065 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5066 return self[name]
-> 5067 return object.__getattribute__(self, name)
5068
5069 def __setattr__(self, name, value):
AttributeError: 'DataFrame' object has no attribute 'feature_names'
So I would like to know how the titanic_model in the example was trained.
Thanks for your advice.
Hi,
I was wondering how exactly to interpret the values on the y-axis of the partial dependence plots in the case of classification. The classifier outputs probabilities between 0 and 1; however, the plots show negative and positive values, which can also be greater than one.
Thanks in advance
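One possible reading, offered as an assumption rather than a confirmed answer: if the plot is drawn with centering (the `pdp_plot` signature in a traceback on this page has a `center` argument), the curve shows the change relative to the first grid point, which can be negative even though every prediction lies in [0, 1]. A plain-Python illustration with made-up numbers, not PDPbox code:

```python
# Made-up average predicted probabilities at five grid points of one feature.
pdp_values = [0.62, 0.55, 0.48, 0.70, 0.81]

# Centering subtracts the value at the first grid point from every point,
# so the curve starts at zero and shows differences instead of probabilities.
centered = [round(v - pdp_values[0], 2) for v in pdp_values]
```

Centering alone would not explain values above one; those might come from a model that returns raw scores rather than probabilities, which is worth checking separately.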