How to train titanic_model (pdpbox issue, open)

saucecat commented on August 13, 2024
How to train titanic_model

from pdpbox.

Comments (6)

dyerrington commented on August 13, 2024

Looks like you're referencing an attribute that doesn't exist in your dataframe @szz01. Why don't you post your full code example?

ivan-marroquin commented on August 13, 2024

Hi @dyerrington

I have the same issue with PDPbox version 0.2.0. I am using Python 3.6.5 on a Windows machine.

The classifier was generated using xgboost 0.90 with XGBClassifier, and I fitted it on Python arrays (the same data set is part of the attached zip file).

The attached zip file contains a Python script and the input data needed to reproduce the problem.

Many thanks,
Ivan

testing_pdpbox.zip

ivan-marroquin commented on August 13, 2024

Hi there,

I was wondering if someone has had the opportunity to look into this issue.

Many thanks,

Ivan

SauceCat commented on August 13, 2024

@ivan-marroquin can you put your error messages here?

ivan-marroquin commented on August 13, 2024

Hi @SauceCat

As per your request:

    pdpbox_interaction= pdp.pdp_interact(model= best_trained_model, dataset= pd_test_inputs, model_features= feature_names, features= features_to_plot)
      File "c:\temp\python\python3.6.5\lib\site-packages\pdpbox\pdp.py", line 558, in pdp_interact
        n_jobs=n_jobs, predict_kwds=predict_kwds, data_transformer=data_transformer)
      File "c:\temp\python\python3.6.5\lib\site-packages\pdpbox\pdp.py", line 159, in pdp_isolate
        for feature_grid in feature_grids)
      File "c:\temp\python\python3.6.5\lib\site-packages\joblib\parallel.py", line 921, in __call__
        if self.dispatch_one_batch(iterator):
      File "c:\temp\python\python3.6.5\lib\site-packages\joblib\parallel.py", line 759, in dispatch_one_batch
        self._dispatch(tasks)
      File "c:\temp\python\python3.6.5\lib\site-packages\joblib\parallel.py", line 716, in _dispatch
        job = self._backend.apply_async(batch, callback=cb)
      File "c:\temp\python\python3.6.5\lib\site-packages\joblib\_parallel_backends.py", line 182, in apply_async
        result = ImmediateResult(func)
      File "c:\temp\python\python3.6.5\lib\site-packages\joblib\_parallel_backends.py", line 549, in __init__
        self.results = batch()
      File "c:\temp\python\python3.6.5\lib\site-packages\joblib\parallel.py", line 225, in __call__
        for func, args, kwargs in self.items]
      File "c:\temp\python\python3.6.5\lib\site-packages\joblib\parallel.py", line 225, in <listcomp>
        for func, args, kwargs in self.items]
      File "c:\temp\python\python3.6.5\lib\site-packages\pdpbox\pdp_calc_utils.py", line 44, in _calc_ice_lines
        preds = predict(_data[model_features], **predict_kwds)
      File "c:\temp\python\python3.6.5\lib\site-packages\xgboost\core.py", line 1284, in predict
        self._validate_features(data)
      File "c:\temp\python\python3.6.5\lib\site-packages\xgboost\core.py", line 1675, in _validate_features
        if self.feature_names != data.feature_names:
      File "c:\temp\python\python3.6.5\lib\site-packages\pandas\core\generic.py", line 5180, in __getattr__
        return object.__getattribute__(self, name)
    AttributeError: 'DataFrame' object has no attribute 'feature_names'

Many thanks,
Ivan

dyerrington commented on August 13, 2024

To me, @ivan-marroquin, the error is descriptive:

      File "c:\temp\python\python3.6.5\lib\site-packages\xgboost\core.py", line 1675, in _validate_features
        if self.feature_names != data.feature_names:
      File "c:\temp\python\python3.6.5\lib\site-packages\pandas\core\generic.py", line 5180, in __getattr__
        return object.__getattribute__(self, name)
    AttributeError: 'DataFrame' object has no attribute 'feature_names'

The part of the code from xgboost that throws this error is this:

Line ~1675 of xgboost/core.py

    def _validate_features(self, data):
        """
        Validate Booster and data's feature_names are identical.
        Set feature_names and feature_types from DMatrix
        """
        if self.feature_names is None:
            self.feature_names = data.feature_names
            self.feature_types = data.feature_types
        else:
            # Booster can't accept data with different feature names
            if self.feature_names != data.feature_names:
                dat_missing = set(self.feature_names) - set(data.feature_names)
                my_missing = set(data.feature_names) - set(self.feature_names)

                msg = 'feature_names mismatch: {0} {1}'

                if dat_missing:
                    msg += ('\nexpected ' + ', '.join(str(s) for s in dat_missing) +
                            ' in input data')

                if my_missing:
                    msg += ('\ntraining data did not have the following fields: ' +
                            ', '.join(str(s) for s in my_missing))

                raise ValueError(msg.format(self.feature_names,
                                            data.feature_names))
    

As far as I can tell, xgboost is checking that the feature names the model was trained with match those of the data passed to predict. The check reads data.feature_names, which, going by the docstring above, is expected to come from a DMatrix. Here data is a pandas DataFrame, which has no .feature_names attribute, so the DataFrame's own attribute lookup raises the final error before the names are ever compared.
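
To make the order of events concrete, here is a minimal sketch that uses plain Python stand-ins rather than the real libraries. FrameLike, MatrixLike, validate_features and try_validate are all hypothetical names made up for illustration; only the comparison on data.feature_names mirrors the quoted xgboost code.

```python
class FrameLike:
    """Hypothetical stand-in for a DataFrame: has `columns`, no `feature_names`."""

    def __init__(self, columns):
        self.columns = list(columns)


class MatrixLike:
    """Hypothetical stand-in for a DMatrix: exposes `feature_names`."""

    def __init__(self, feature_names):
        self.feature_names = list(feature_names)


def validate_features(model_feature_names, data):
    """Simplified form of the comparison in the quoted _validate_features."""
    # `data.feature_names` is evaluated first, so an object without that
    # attribute fails with AttributeError before any names are compared.
    if model_feature_names != data.feature_names:
        raise ValueError("feature_names mismatch")


def try_validate(model_feature_names, data):
    """Return the name of the exception raised, or 'ok' if validation passes."""
    try:
        validate_features(model_feature_names, data)
    except (AttributeError, ValueError) as exc:
        return type(exc).__name__
    return "ok"


if __name__ == "__main__":
    print(try_validate(["a", "b"], FrameLike(["a", "b"])))
    print(try_validate(["a", "b"], MatrixLike(["a", "b"])))
    print(try_validate(["a", "b"], MatrixLike(["a"])))
```

In this sketch the frame-like object fails on the attribute lookup even though its column names are correct, which is the same shape as the traceback above.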

The first thing I would check is that the model you've trained matches the data you are trying to plot. I would double-check everything, including the encoding of the feature names. Assert that they match 100% before doing anything with PDP, then fix any problems. If it still fails, reduce the problem and re-evaluate: try building a model with fewer features and a very small number of observations so that it trains in seconds or milliseconds, then try to get it to work in the same file or in a notebook environment without any encoding, decoding, or serialization of models.
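
A rough sketch of the "assert that they match" step, assuming the names the model was trained with and the DataFrame's column names are both available as plain lists. feature_name_diff and assert_features_match are hypothetical helpers, not part of pdpbox or xgboost; the set differences follow the same idea as the quoted xgboost code.

```python
def feature_name_diff(trained_names, data_columns):
    """Return (missing_from_data, unexpected_in_data) as sorted lists of strings."""
    trained = [str(name) for name in trained_names]
    columns = [str(col) for col in data_columns]
    # Names the model expects but the data lacks.
    missing = sorted(set(trained) - set(columns))
    # Names present in the data that the model never saw.
    unexpected = sorted(set(columns) - set(trained))
    return missing, unexpected


def assert_features_match(trained_names, data_columns):
    """Raise ValueError unless both name lists are identical, order included."""
    missing, unexpected = feature_name_diff(trained_names, data_columns)
    if missing or unexpected:
        raise ValueError(
            "feature names differ: missing from data {0}, "
            "not seen in training {1}".format(missing, unexpected)
        )
    # Same set of names, but the column order may still differ.
    if [str(n) for n in trained_names] != [str(c) for c in data_columns]:
        raise ValueError("feature names match but are in a different order")


if __name__ == "__main__":
    assert_features_match(["age", "fare"], ["age", "fare"])
    print(feature_name_diff(["age", "fare", "sex"], ["age", "fare", "cabin"]))
```

With the names from this thread, the call before pdp.pdp_interact would look something like assert_features_match(feature_names, list(pd_test_inputs.columns)).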
