Hi, traditionally I had been using pickle package to save models in pkl file and r

Saving mljar automl model for future use about mljar-supervised HOT 16 CLOSED

mljar commented on May 18, 2024

Saving mljar automl model for future use

from mljar-supervised.

Comments (16)

pplonski commented on May 18, 2024 5

@vivek2319 I've redesigned the AutoML class. It now always saves the models and all artifacts, such as metrics and learning curves, to the hard drive. The AutoML constructor has a new argument, results_path, which names the directory where all results will be saved. If the results_path directory already exists, AutoML will try to load models from it. For example:

Train the model and set the path to AutoML_Results_1

automl = AutoML(results_path="AutoML_Results_1")

This will create the AutoML_Results_1 directory and will save all models there.

If you create a new AutoML variable (or run the script again) and set results_path to the AutoML_Results_1 directory, it will try to load the models from this path.
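A minimal sketch of that workflow, assuming mljar-supervised is installed and that X_train, y_train and X_test are your own data. The helper name open_automl is hypothetical and not part of the library:

```python
def open_automl(results_path, X=None, y=None):
    """Create an AutoML object tied to results_path.

    Pass X and y on the first run to train; omit them on later runs to
    reuse whatever was saved in the directory.
    """
    # Imported lazily so the helper can be defined without the package installed.
    from supervised.automl import AutoML

    automl = AutoML(results_path=results_path)
    if X is not None and y is not None:
        automl.fit(X, y)  # models and artifacts are written to results_path
    return automl


# First run (training):
#   automl = open_automl("AutoML_Results_1", X_train, y_train)
# Later run (no training, models are loaded from the directory):
#   automl = open_automl("AutoML_Results_1")
#   predictions = automl.predict(X_test)
```

Because everything lives in the results directory, copying that directory to another machine should be enough to reuse the model there.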

mglowacki100 commented on May 18, 2024 2

This snippet works for me:

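import pickle
from supervised.automl import AutoML
...

MODEL_PKL = 'automl_b.pkl'
...
automl = AutoML(...)
automl.fit(...)

# Saving: pickle the JSON description returned by to_json()
with open(MODEL_PKL, 'wb') as f:
    pickle.dump(automl.to_json(), f)

# Loading: rebuild a fresh AutoML object from the stored description
with open(MODEL_PKL, 'rb') as f:
    model_json = pickle.load(f)
automl = AutoML()
automl.from_json(model_json)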

stin7 commented on May 18, 2024 2

When training on one system and wanting to load the model from_json onto another system, the load fails because it's trying to find something saved in a /tmp/ directory on the training system:

Traceback (most recent call last):
  File "mljar_pipeline.py", line 12, in <module>
    mljar_pipeline.from_json(model_json)
  File "/usr/local/lib/python3.6/site-packages/supervised/automl.py", line 306, in from_json
    self._best_model.from_json(json_data["best_model"])
  File "/usr/local/lib/python3.6/site-packages/supervised/models/ensemble.py", line 160, in from_json
    il.from_json(model)
  File "/usr/local/lib/python3.6/site-packages/supervised/iterative_learner_framework.py", line 157, in from_json
    with zipfile.ZipFile(json_desc.get("framework_file_path"), "r") as zip_ref:
  File "/usr/local/lib/python3.6/zipfile.py", line 1090, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/bec1c3cd-d29e-44ce-98d7-b5ec90cf14ea.framework'

Is there any way to work around this issue? (Pickle, Dill, Joblib and Cloudpickle all run into errors when trying to save the model directly.)

For reference, here is a snippet of the much longer json output that references the tmp directory in framework_file_path:

{'model': {'algorithm_short_name': 'RF',
           'framework_file': 'eec09600-319d-4f03-92c2-775776d4c5d5.framework',
           'framework_file_path': '/tmp/eec09600-319d-4f03-92c2-775776d4c5d5.framework',
           'learners': [{'algorithm_name': 'Random Forest',
                         'algorithm_short_name': 'RF',
                         'library_version': '0.21.3',
                         'model_file': '9841f28d-d7a8-4a1e-906f-b7158a0700fa.rf.model',
                         'model_file_path': '/tmp/9841f28d-d7a8-4a1e-906f-b7158a0700fa.rf.model',
                         'params': {'criterion': 'entropy',
                                    'max_features': 0.7,
                                    'min_samples_leaf': 18,
                                    'min_samples_split': 20,
                                    'model_type': 'RF',
                                    'seed': 9},
                         'uid': '9841f28d-d7a8-4a1e-906f-b7158a0700fa'},
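The load fails because the JSON stores absolute /tmp/ paths from the training machine. One possible workaround, sketched here under the assumption that the .framework and .model files have already been copied to the target machine, is to rewrite every key ending in _file_path before calling from_json. The helper name repoint_paths and the directory name "models" are hypothetical and not part of mljar-supervised:

```python
import os


def repoint_paths(node, new_dir):
    """Recursively replace the directory of every '*_file_path' value.

    Works in place on nested dicts and lists and returns the same object.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            if key.endswith("_file_path") and isinstance(value, str):
                # Keep the file name, swap the directory.
                node[key] = os.path.join(new_dir, os.path.basename(value))
            else:
                repoint_paths(value, new_dir)
    elif isinstance(node, list):
        for item in node:
            repoint_paths(item, new_dir)
    return node


# Example on a fragment shaped like the snippet above.
description = {
    "model": {
        "framework_file": "eec09600-319d-4f03-92c2-775776d4c5d5.framework",
        "framework_file_path": "/tmp/eec09600-319d-4f03-92c2-775776d4c5d5.framework",
        "learners": [
            {"model_file_path": "/tmp/9841f28d-d7a8-4a1e-906f-b7158a0700fa.rf.model"}
        ],
    }
}
repoint_paths(description, "models")
print(description["model"]["framework_file_path"])
```

This only fixes the pointers; the files themselves still have to be present in the new directory. Paths written on Windows with backslashes would need extra handling if the model is loaded on a POSIX system.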

TheAccountant777 commented on May 18, 2024 1

After training the model, I call model.to_json() and get the following error:

'ModelFramework' object has no attribute 'to_json'

Here's the code:

automl = AutoML(mode="Perform")
model = automl.fit(X_train, y_train)
model.to_json()

Has anything changed?

Matthew-J-Payne commented on May 18, 2024 1

@pplonski I got it to work! Your clf.classes_ = 3 prompt got me thinking. Instead of specifying the number of classes, I specified the class names:

classes = ["target1", "target2", "target3", "target4"]
clf.classes_ = classes

PDP runs fine after that.

(Predict methods executed happily, too.)

pplonski commented on May 18, 2024

Hey!
Thank you for reporting this. It is a bug in how the package converts AutoML objects into a JSON string. I'm working on it, and it will be fixed in a new release.

DunaiFuentes commented on May 18, 2024

Building on the answer from @mglowacki100. To dump your model into a single (zip) file you can take with you wherever you go, you may use this:

import os
import pickle
import shutil
from supervised.automl import AutoML

# Copy the model files next to a pickled JSON description, then zip the directory
def automl_to_zip(automl, save_dir=None):
    automl_json = automl.to_json()

    # Create the directory where tmp files will be copied.
    if save_dir is None:
        uid = automl_json['best_model']['uid']
        save_dir = 'model_' + uid
    os.makedirs(save_dir, exist_ok=True)  # do not fail if the directory already exists
    
    # Copy files to the new directory and replace their pointers in the JSON
    for model in automl_json['best_model']['models']:
        framework_file_path = model['model']['framework_file_path']
        new_path = os.path.join(save_dir, os.path.basename(framework_file_path))
        print('Copying: ', framework_file_path, ' to: ', new_path)
        shutil.copyfile(framework_file_path, new_path)
        model['model']['framework_file_path'] = new_path
        learners = model['model']['learners']
        for learner in learners:
            model_file_path = learner['model_file_path']
            new_path = os.path.join(save_dir, os.path.basename(model_file_path))
            print('Copying: ', model_file_path, ' to: ', new_path)
            shutil.copyfile(model_file_path, new_path)
            learner['model_file_path'] = new_path  
            
    # Dump json with model description as a pickle (json.dump is troublesome)
    with open(os.path.join(save_dir, "mljar.model.pkl"), 'wb') as f:
        pickle.dump(automl_json, f)
        
    # Zip directory
    shutil.make_archive(save_dir, 'zip', save_dir)


# Unzip the archive, repoint the paths in the JSON, and rebuild the AutoML object
def automl_from_zip(save_dir):
    
    # Unzip directory
    shutil.unpack_archive(save_dir + '.zip', save_dir, 'zip')  
    
    # Load json description of the model
    with open(os.path.join(save_dir, "mljar.model.pkl"), 'rb') as f:
        automl_json = pickle.load(f)

    # Replace pointers in the json
    for model in automl_json['best_model']['models']:
        framework_file_path = model['model']['framework_file_path']
        new_path = os.path.join(save_dir, os.path.basename(framework_file_path))
        model['model']['framework_file_path'] = new_path
        learners = model['model']['learners']
        for learner in learners:
            model_file_path = learner['model_file_path']
            new_path = os.path.join(save_dir, os.path.basename(model_file_path))
            learner['model_file_path'] = new_path
    
    # Setup the model
    automl = AutoML()
    automl.from_json(automl_json)
    return automl

# Your autoML training 
automl = AutoML()
automl.fit(train_x, train_y)

automl_to_zip(automl, 'my_model')  # Save your trained model to zip file
automl2 = automl_from_zip('my_model')  # Load your saved model from zip file

# Predictions from automl and automl2 will be the same.
test_preds = automl.predict(test_x)
test_preds2 = automl2.predict(test_x)
check_same = test_preds == test_preds2
check_same.head()

pplonski commented on May 18, 2024

@TheAccountant777 why do you need to call to_json()?

If you would like to display AutoML in a notebook, please try report(). Details: https://mljar.com/blog/automl-notebook/

If you run AutoML in a Python script, then all details are saved in the results_path directory; its name should be displayed when training starts.

Matthew-J-Payne commented on May 18, 2024

@pplonski Fantastic library you've created!
Is there a way to preserve xgb model attributes when mljar saves xgb?
For example, I get the following error:

# load automl from previous session
clf = AutoML(results_path = "previous_autoML")
print("automl loaded")

# load dataframe which model was trained on
x_train, x_test, y_train, y_test = train_test_split(
    df.drop(["class"], axis = 1), df["class"], train_size = 0.75
)

# predictions and classification report run fine
predictions = clf.predict(x_test)
print(classification_report(y_test, predictions))

pdp = PartialDependenceDisplay.from_estimator(estimator = clf,
                                                X = x_test,
                                                features = ["feature_name"],
                                                target = "target_name",
                                                random_state = 0)

File c:\Users\matth\anaconda3\envs\deforestationdynamics\lib\site-packages\sklearn\inspection\_plot\partial_dependence.py:989, in PartialDependenceDisplay.from_estimator(cls, estimator, X, features, feature_names, target, response_method, n_cols, grid_resolution, percentiles, method, n_jobs, verbose, line_kw, ice_lines_kw, pd_line_kw, contour_kw, ax, kind, centered, subsample, random_state)
    760 """Partial dependence (PD) and individual conditional expectation (ICE) plots.
    761 
...
    369     raise ValueError("Multiclass-multioutput estimators are not supported")
    371 # Use check_array only on lists and other non-array-likes / sparse. Do not
    372 # convert DataFrame into a NumPy array.

AttributeError: 'XGBClassifier' object has no attribute 'classes_'

I know autoML produces SHAP decision plots but I'd like to make PDPs.

I'm accessing the saved autoML directory because I don't know of a way to access a model after autoML has created them (i.e. whilst saved as a variable in the current session).

pplonski commented on May 18, 2024

@Matthew-J-Payne AutoML objects should expose the scikit-learn API, so if PDP works with scikit-learn models then it should work here too. If some attributes are missing, you can add them before calling a method, for example:

clf.classes_ = 3
# and then call PDP

Matthew-J-Payne commented on May 18, 2024

@pplonski thanks for the reply!

Unfortunately, when I call:


clf.classes_ = 4
pdp = PartialDependenceDisplay.from_estimator(estimator = clf,
                                                X = x_test,
                                                features = ["feature_name"],
                                                target = "target_name",
                                                random_state = 0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
File c:\Users\matth\anaconda3\envs\deforestationdynamics\lib\site-packages\sklearn\inspection\_plot\partial_dependence.py:989, in PartialDependenceDisplay.from_estimator(cls, estimator, X, features, feature_names, target, response_method, n_cols, grid_resolution, percentiles, method, n_jobs, verbose, line_kw, ice_lines_kw, pd_line_kw, contour_kw, ax, kind, centered, subsample, random_state)
    760 """Partial dependence (PD) and individual conditional expectation (ICE) plots.
    761 
...
    369     raise ValueError("Multiclass-multioutput estimators are not supported")
    371 # Use check_array only on lists and other non-array-likes / sparse. Do not
    372 # convert DataFrame into a NumPy array.

TypeError: 'int' object is not subscriptable

PDP is a scikit-learn class and so should accept automl objects.
According to the scikit-learn documentation, PDP expects the object to implement predict, predict_proba or decision_function.
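The TypeError is consistent with scikit-learn indexing into classes_ (for example classes_[0]) before computing the partial dependence, which an integer cannot support. A small stand-alone illustration, using a stub object rather than a real AutoML instance (FakeClassifier is a hypothetical name):

```python
class FakeClassifier:
    """Stand-in for a fitted classifier; only the classes_ attribute matters here."""


clf = FakeClassifier()

# An integer class count cannot be indexed, which reproduces the error message.
clf.classes_ = 4
try:
    clf.classes_[0]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# A sequence of class labels can be indexed, so the same access succeeds.
clf.classes_ = ["target1", "target2", "target3", "target4"]
first_label = clf.classes_[0]
print(first_label)
```

This suggests that classes_ should hold the class labels themselves (a list or array), not their count.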

pplonski commented on May 18, 2024

What if you run:

clf.predict(x_test)
# or
clf.predict_proba(x_test)

do you have errors as well?

pplonski commented on May 18, 2024

@Matthew-J-Payne fantastic!

Matthew-J-Payne commented on May 18, 2024

thanks for your help!

nodirmcsd commented on May 18, 2024

Do I understand correctly that the "AutoML_X" folder contains everything needed to load the model in a production service and make predictions?
a = AutoML(r'C:\Users\...\AutoML_6')
Is this the right way to load the model in a REST service?

pplonski commented on May 18, 2024

That's right.
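In a service it is usually preferable to construct the AutoML object once rather than on every request. A sketch of that idea, assuming mljar-supervised is installed; get_automl and predict_rows are hypothetical helper names, and the path passed in is whichever AutoML_X directory training produced:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_automl(results_path):
    """Load the AutoML object for results_path once and reuse it afterwards."""
    # Imported lazily; requires mljar-supervised to be installed.
    from supervised.automl import AutoML

    return AutoML(results_path=results_path)


def predict_rows(results_path, rows):
    """Return predictions for rows (for example a pandas DataFrame)."""
    return get_automl(results_path).predict(rows)
```

A request handler in any web framework could then call predict_rows with the incoming data, and only the first call would pay the cost of constructing the object.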
