
Comments (16)

pplonski commented on May 18, 2024 5

@vivek2319 I've redesigned the AutoML class. It now always saves the models and all artifacts (such as metrics and learning curves) to disk. The AutoML constructor has a new argument, results_path, which names the directory where all results are saved. If the results_path directory already exists, AutoML will try to load the models from it. For example:

Train the model and set the path to AutoML_Results_1:

automl = AutoML(results_path="AutoML_Results_1")

This creates the AutoML_Results_1 directory and saves all models there.

If you create a new AutoML object (or run the script again) with results_path set to AutoML_Results_1, it will try to load the models from that directory.
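The load-on-existing-directory behaviour described above can be wrapped in a small guard for prediction-only use. This is a minimal sketch: has_saved_results and load_for_prediction are hypothetical helper names, and treating "directory exists and is not empty" as "a previous run was saved here" is an assumption, not something the library checks this way.

```python
from pathlib import Path


def has_saved_results(results_path):
    """Return True if results_path is an existing, non-empty directory."""
    path = Path(results_path)
    return path.is_dir() and any(path.iterdir())


def load_for_prediction(results_path):
    """Open a previously saved AutoML run instead of starting a new one."""
    if not has_saved_results(results_path):
        raise FileNotFoundError(f"No saved AutoML results in {results_path!r}")
    # Imported lazily so the directory check works even without the package.
    from supervised.automl import AutoML

    # Constructor argument as described in the comment above.
    return AutoML(results_path=results_path)
```

With a finished run on disk, `automl = load_for_prediction("AutoML_Results_1")` followed by `automl.predict(X_test)` would reuse the saved models; `X_test` here stands for your own data.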

from mljar-supervised.

mglowacki100 commented on May 18, 2024 2

This snippet works for me:

import pickle
...

MODEL_PKL = 'automl_b.pkl'
...
automl = AutoML(...)
automl.fit(...)

#saving
with open(MODEL_PKL, 'wb') as f:
    pickle.dump(automl.to_json(), f)

#loading
with open(MODEL_PKL, 'rb') as f:
    model_json = pickle.load(f)
automl = AutoML()
automl.from_json(model_json)

stin7 commented on May 18, 2024 2

When training on one system and wanting to load the model from_json onto another system, the load fails because it's trying to find something saved in a /tmp/ directory on the training system:

Traceback (most recent call last):
  File "mljar_pipeline.py", line 12, in <module>
    mljar_pipeline.from_json(model_json)
  File "/usr/local/lib/python3.6/site-packages/supervised/automl.py", line 306, in from_json
    self._best_model.from_json(json_data["best_model"])
  File "/usr/local/lib/python3.6/site-packages/supervised/models/ensemble.py", line 160, in from_json
    il.from_json(model)
  File "/usr/local/lib/python3.6/site-packages/supervised/iterative_learner_framework.py", line 157, in from_json
    with zipfile.ZipFile(json_desc.get("framework_file_path"), "r") as zip_ref:
  File "/usr/local/lib/python3.6/zipfile.py", line 1090, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/bec1c3cd-d29e-44ce-98d7-b5ec90cf14ea.framework'

Is there any way to work around this issue? (Pickle, Dill, Joblib and Cloudpickle all run into errors when trying to save the model directly.)

For reference, here is a snippet of the much longer JSON output; its framework_file_path and model_file_path entries point into the /tmp directory:

{'model': {'algorithm_short_name': 'RF',
           'framework_file': 'eec09600-319d-4f03-92c2-775776d4c5d5.framework',
           'framework_file_path': '/tmp/eec09600-319d-4f03-92c2-775776d4c5d5.framework',
           'learners': [{'algorithm_name': 'Random Forest',
                         'algorithm_short_name': 'RF',
                         'library_version': '0.21.3',
                         'model_file': '9841f28d-d7a8-4a1e-906f-b7158a0700fa.rf.model',
                         'model_file_path': '/tmp/9841f28d-d7a8-4a1e-906f-b7158a0700fa.rf.model',
                         'params': {'criterion': 'entropy',
                                    'max_features': 0.7,
                                    'min_samples_leaf': 18,
                                    'min_samples_split': 20,
                                    'model_type': 'RF',
                                    'seed': 9},
                         'uid': '9841f28d-d7a8-4a1e-906f-b7158a0700fa'},
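Before moving a description like this to another machine, it can help to list every file it points at. The sketch below walks the nested structure and collects the values of keys ending in _file_path, then reports the ones that do not exist locally. It assumes the description is a plain dict/list structure like the snippet above; find_file_paths and missing_files are hypothetical helper names, not part of mljar-supervised.

```python
import os


def find_file_paths(node, suffix="_file_path"):
    """Recursively collect string values of keys ending in `suffix`."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, str) and key.endswith(suffix):
                found.append(value)
            else:
                found.extend(find_file_paths(value, suffix))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_file_paths(item, suffix))
    return found


def missing_files(description):
    """Return the referenced paths that do not exist on this machine."""
    return [p for p in find_file_paths(description) if not os.path.exists(p)]
```

Running `missing_files(model_json)` on the target system would show which files still need to be copied over, which is the situation behind the FileNotFoundError in this comment.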

TheAccountant777 commented on May 18, 2024 1

After training the model, I call model.to_json() and get the following error:

'ModelFramework' object has no attribute 'to_json'

Here's the code:

automl = AutoML(mode="Perform")
model = automl.fit(X_train, y_train)
model.to_json()

Has anything changed?

Matthew-J-Payne commented on May 18, 2024 1

@pplonski I got it to work! Your clf.classes_ = 3 prompt got me thinking. Instead of specifying the number of classes, I specified the class names:

classes = ["target1", "target2", "target3", "target4"]
clf.classes_ = classes

PDP runs fine after that.

(Predict methods executed happily, too.)
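A short illustration of why the list works where the integer did not: scikit-learn estimators expose classes_ as a sequence of the unique labels (in sorted order), and code that indexes into it fails on a bare count. This is a sketch under that assumption; class_labels and is_indexable are hypothetical helpers, and the sample y_train values are made up to match the names used in this comment.

```python
def class_labels(y):
    """Sorted unique labels, the shape scikit-learn uses for classes_."""
    return sorted(set(y))


def is_indexable(value):
    """Return False if value[0] raises TypeError, as an int does."""
    try:
        value[0]
    except TypeError:
        return False
    except IndexError:
        # An empty sequence is still indexable in principle.
        return True
    return True


# Illustrative training labels (not from the original data).
y_train = ["target2", "target1", "target3", "target1", "target4"]
labels = class_labels(y_train)

print(labels)
print(is_indexable(4), is_indexable(labels))
```

Setting `clf.classes_ = class_labels(y_train)` derives the names from the training target instead of typing them by hand. Whether a plain list is enough or a NumPy array is needed may depend on the scikit-learn version.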

pplonski commented on May 18, 2024

Hey!
Thank you for reporting this. It is a bug in the package that occurs when converting AutoML objects into a JSON string. I'm working on it, and it will be fixed in a new release.

DunaiFuentes commented on May 18, 2024

Building on the answer from @mglowacki100: to dump your model into a single zip file that you can take with you anywhere, you can use this:

import os
import pickle
import shutil
from supervised.automl import AutoML

# Save the AutoML JSON description and the model files it references to one zip archive
def automl_to_zip(automl, save_dir=None):
    automl_json = automl.to_json()

    # Create the directory where the tmp files will be copied.
    if save_dir is None:
        uid = automl_json['best_model']['uid']
        save_dir = 'model_' + uid
    os.mkdir(save_dir)

    # Copy the files to the new dir and replace their pointers in the json
    for model in automl_json['best_model']['models']:
        framework_file_path = model['model']['framework_file_path']
        new_path = os.path.join(save_dir, os.path.basename(framework_file_path))
        print('Copying: ', framework_file_path, ' to: ', new_path)
        shutil.copyfile(framework_file_path, new_path)
        model['model']['framework_file_path'] = new_path
        learners = model['model']['learners']
        for learner in learners:
            model_file_path = learner['model_file_path']
            new_path = os.path.join(save_dir, os.path.basename(model_file_path))
            print('Copying: ', model_file_path, ' to: ', new_path)
            shutil.copyfile(model_file_path, new_path)
            learner['model_file_path'] = new_path

    # Dump json with model description as a pickle (json.dump is troublesome)
    with open(os.path.join(save_dir, "mljar.model.pkl"), 'wb') as f:
        pickle.dump(automl_json, f)

    # Zip directory
    shutil.make_archive(save_dir, 'zip', save_dir)


# Restore an AutoML object from a zip archive written by automl_to_zip
def automl_from_zip(save_dir):

    # Unzip directory
    shutil.unpack_archive(save_dir + '.zip', save_dir, 'zip')

    # Load json description of the model
    with open(os.path.join(save_dir, "mljar.model.pkl"), 'rb') as f:
        automl_json = pickle.load(f)

    # Point the paths in the json at the unpacked files
    for model in automl_json['best_model']['models']:
        framework_file_path = model['model']['framework_file_path']
        new_path = os.path.join(save_dir, os.path.basename(framework_file_path))
        model['model']['framework_file_path'] = new_path
        learners = model['model']['learners']
        for learner in learners:
            model_file_path = learner['model_file_path']
            new_path = os.path.join(save_dir, os.path.basename(model_file_path))
            learner['model_file_path'] = new_path
    
    # Setup the model
    automl = AutoML()
    automl.from_json(automl_json)
    return automl

# Your autoML training 
automl = AutoML()
automl.fit(train_x, train_y)

automl_to_zip(automl, 'my_model')  # Save your trained model to zip file
automl2 = automl_from_zip('my_model')  # Load your saved model from zip file

# Predictions from automl and automl2 will be the same.
test_preds = automl.predict(test_x)
test_preds2 = automl2.predict(test_x)
check_same = test_preds == test_preds2
check_same.head()

pplonski commented on May 18, 2024

@TheAccountant777 why do you need to call to_json()?

If you would like to display AutoML results in a notebook, please try report(). Details: https://mljar.com/blog/automl-notebook/

If you run AutoML in a Python script, all details are saved in the results_path directory; its name is displayed when training starts.

Matthew-J-Payne commented on May 18, 2024

@pplonski Fantastic library you've created!
Is there a way to preserve xgb model attributes when mljar saves xgb?
For example, I get the following error:

# load automl from previous session
clf = AutoML(results_path = "previous_autoML")
print("automl loaded")

# load dataframe which model was trained on
x_train, x_test, y_train, y_test = train_test_split(
    df.drop(["class"], axis = 1), df["class"], train_size = 0.75
)

# predictions and classification report run fine
predictions = clf.predict(x_test)
print(classification_report(y_test, predictions))

pdp = PartialDependenceDisplay.from_estimator(estimator = clf,
                                                X = x_test,
                                                features = ["feature_name"],
                                                target = "target_name",
                                                random_state = 0)

File c:\Users\matth\anaconda3\envs\deforestationdynamics\lib\site-packages\sklearn\inspection\_plot\partial_dependence.py:989, in PartialDependenceDisplay.from_estimator(cls, estimator, X, features, feature_names, target, response_method, n_cols, grid_resolution, percentiles, method, n_jobs, verbose, line_kw, ice_lines_kw, pd_line_kw, contour_kw, ax, kind, centered, subsample, random_state)
    760 """Partial dependence (PD) and individual conditional expectation (ICE) plots.
    761 
...
    369     raise ValueError("Multiclass-multioutput estimators are not supported")
    371 # Use check_array only on lists and other non-array-likes / sparse. Do not
    372 # convert DataFrame into a NumPy array.

AttributeError: 'XGBClassifier' object has no attribute 'classes_'

I know AutoML produces SHAP decision plots, but I'd like to make PDPs.

I'm loading from the saved AutoML directory because I don't know of a way to access the individual models after AutoML has created them (i.e. while the AutoML object is still a variable in the current session).

pplonski commented on May 18, 2024

@Matthew-J-Payne AutoML objects should follow the scikit-learn API, so if PDP works with scikit-learn models, it should work here too. If some attributes are missing, you can add them before calling the method, for example:

clf.classes_ = 3
# and then call PDP

Matthew-J-Payne commented on May 18, 2024

@pplonski thanks for the reply!

Unfortunately, I get a TypeError when I call:

clf.classes_ = 4
pdp = PartialDependenceDisplay.from_estimator(estimator = clf,
                                                X = x_test,
                                                features = ["feature_name"],
                                                target = "target_name",
                                                random_state = 0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File c:\Users\matth\anaconda3\envs\deforestationdynamics\lib\site-packages\sklearn\inspection\_plot\partial_dependence.py:989, in PartialDependenceDisplay.from_estimator(cls, estimator, X, features, feature_names, target, response_method, n_cols, grid_resolution, percentiles, method, n_jobs, verbose, line_kw, ice_lines_kw, pd_line_kw, contour_kw, ax, kind, centered, subsample, random_state)
    760 """Partial dependence (PD) and individual conditional expectation (ICE) plots.
    761 
...
    369     raise ValueError("Multiclass-multioutput estimators are not supported")
    371 # Use check_array only on lists and other non-array-likes / sparse. Do not
    372 # convert DataFrame into a NumPy array.

TypeError: 'int' object is not subscriptable

PDP is a scikit-learn class, so it should accept AutoML objects.
According to the scikit-learn documentation, PDP expects the object to implement predict, predict_proba or decision_function.

pplonski commented on May 18, 2024

What if you run:

clf.predict(x_test)
# or
clf.predict_proba(x_test)

Do you get errors as well?

pplonski commented on May 18, 2024

@Matthew-J-Payne fantastic!

Matthew-J-Payne commented on May 18, 2024

thanks for your help!

nodirmcsd commented on May 18, 2024

Do I understand correctly that the "AutoML_X" folder contains everything needed to load the model in a production service and make predictions?
a = AutoML(r'C:\Users\...\AutoML_6')
Is this the right way to load the model in a REST service?

pplonski commented on May 18, 2024

That's right.
