Comments (16)
@vivek2319 I've redesigned the AutoML class. It always saves the models and all artifacts, such as metrics and learning curves, to the hard drive. The AutoML constructor has a new argument, results_path, where you define the name of the directory in which all results will be saved. If the results_path directory already exists, AutoML will try to load models from it. An example:
Train the model and set the path to AutoML_Results_1:
automl = AutoML(results_path="AutoML_Results_1")
This will create the AutoML_Results_1 directory and save all models there.
If you create a new AutoML variable (or run the script again) and set the AutoML_Results_1 directory in results_path, it will try to load models from this path.
from mljar-supervised.
This snippet works for me:
import pickle
...
MODEL_PKL = 'automl_b.pkl'
...
automl = AutoML(...)
automl.fit(...)
# saving
with open(MODEL_PKL, 'wb') as f:
    pickle.dump(automl.to_json(), f)
# loading
with open(MODEL_PKL, 'rb') as f:
    model_json = pickle.load(f)
automl = AutoML()
automl.from_json(model_json)
from mljar-supervised.
When I train on one system and then try to load the model with from_json on another system, the load fails because it tries to find files saved in a /tmp/ directory on the training system:
Traceback (most recent call last):
File "mljar_pipeline.py", line 12, in <module>
mljar_pipeline.from_json(model_json)
File "/usr/local/lib/python3.6/site-packages/supervised/automl.py", line 306, in from_json
self._best_model.from_json(json_data["best_model"])
File "/usr/local/lib/python3.6/site-packages/supervised/models/ensemble.py", line 160, in from_json
il.from_json(model)
File "/usr/local/lib/python3.6/site-packages/supervised/iterative_learner_framework.py", line 157, in from_json
with zipfile.ZipFile(json_desc.get("framework_file_path"), "r") as zip_ref:
File "/usr/local/lib/python3.6/zipfile.py", line 1090, in __init__
self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/bec1c3cd-d29e-44ce-98d7-b5ec90cf14ea.framework'
Is there any way to work around this issue? (Pickle, Dill, Joblib and Cloudpickle all run into errors when trying to save the model directly.)
For reference, here is a snippet of the much longer JSON output that references the tmp directory in framework_file_path:
{'model': {'algorithm_short_name': 'RF',
'framework_file': 'eec09600-319d-4f03-92c2-775776d4c5d5.framework',
'framework_file_path': '/tmp/eec09600-319d-4f03-92c2-775776d4c5d5.framework',
'learners': [{'algorithm_name': 'Random Forest',
'algorithm_short_name': 'RF',
'library_version': '0.21.3',
'model_file': '9841f28d-d7a8-4a1e-906f-b7158a0700fa.rf.model',
'model_file_path': '/tmp/9841f28d-d7a8-4a1e-906f-b7158a0700fa.rf.model',
'params': {'criterion': 'entropy',
'max_features': 0.7,
'min_samples_leaf': 18,
'min_samples_split': 20,
'model_type': 'RF',
'seed': 9},
'uid': '9841f28d-d7a8-4a1e-906f-b7158a0700fa'},
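Before moving such a description to another machine, it can help to list every file it points at and check which ones are missing. The following is a minimal sketch, assuming only that file locations are stored under keys ending in `_file_path`, as in the fragment above. The helper names `collect_file_paths` and `missing_files`, and the example dictionary with its paths, are hypothetical and not part of mljar-supervised.

```python
import os


def collect_file_paths(node, found=None):
    """Recursively gather every string stored under a key ending in '_file_path'."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if isinstance(value, str) and key.endswith("_file_path"):
                found.append(value)
            else:
                collect_file_paths(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_file_paths(item, found)
    return found


def missing_files(description):
    """Return the referenced paths that do not exist on this machine."""
    return [p for p in collect_file_paths(description) if not os.path.exists(p)]


# Hypothetical description shaped like the JSON fragment above.
example = {
    "model": {
        "framework_file": "a.framework",
        "framework_file_path": "/no-such-dir-for-demo/a.framework",
        "learners": [
            {
                "model_file": "b.rf.model",
                "model_file_path": "/no-such-dir-for-demo/b.rf.model",
            }
        ],
    }
}

for path in missing_files(example):
    print("missing:", path)
```

Running `missing_files` on the loaded description on the target system should show the same /tmp paths that trigger the FileNotFoundError, which confirms that the description carries absolute paths from the training machine.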
from mljar-supervised.
After training the model, I call model.to_json() and get the following error:
'ModelFramework' object has no attribute 'to_json'
Here's the code:
automl = AutoML(mode="Perform")
model = automl.fit(X_train, y_train)
model.to_json()
Has anything changed?
from mljar-supervised.
@pplonski I got it to work! Your clf.classes_ = 3 prompt got me thinking. Instead of specifying the number of classes, I specified the class names:
classes = ["target1", "target2", "target3", "target4"]
clf.classes_ = classes
PDP runs fine after that.
(Predict methods executed happily, too.)
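A likely reason the integer did not work: the traceback in this thread stops next to the multiclass-multioutput check, which suggests scikit-learn subscripts `classes_` there, and an integer cannot be subscripted. This is an inference from the traceback, not a confirmed reading of the scikit-learn source. The sketch below reproduces the same error message in plain Python; `first_class_label` and `target_index` are hypothetical helpers used only for illustration.

```python
def first_class_label(classes):
    """Subscript `classes` the way code expecting a sequence of labels would."""
    return classes[0]


def target_index(classes, target):
    """Look up the position of a target label in a sequence of class labels."""
    return list(classes).index(target)


# An integer count of classes cannot be subscripted.
try:
    first_class_label(4)
    int_error = None
except TypeError as exc:
    int_error = str(exc)

# A sequence of class names supports both subscripting and label lookup.
labels = ["target1", "target2", "target3", "target4"]

print(int_error)
print(first_class_label(labels), target_index(labels, "target3"))
```

This matches the scikit-learn convention that `classes_` holds the class labels themselves rather than their count.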
from mljar-supervised.
Hey!
Thank you for reporting this. It is a bug in the package that occurs while converting AutoML objects into a JSON string. I'm working on it, and it will be fixed in a new release.
from mljar-supervised.
Building on the answer from @mglowacki100. To dump your model into a single (zip) file you can take with you wherever you go, you may use this:
import os
import pickle
import shutil
from supervised.automl import AutoML

# Includes parameters
def automl_to_zip(automl, save_dir=None):
    automl_json = automl.to_json()
    # Create the directory where tmp files will be copied.
    if save_dir is None:
        uid = automl_json['best_model']['uid']
        save_dir = 'model_' + uid
    os.mkdir(save_dir)
    # Copy files to the new dir and replace their pointers in the json
    for i, model in enumerate(automl_json['best_model']['models']):
        framework_file_path = model['model']['framework_file_path']
        new_path = os.path.join(save_dir, os.path.basename(framework_file_path))
        print('Copying: ', framework_file_path, ' to: ', new_path)
        shutil.copyfile(framework_file_path, new_path)
        model['model']['framework_file_path'] = new_path
        learners = model['model']['learners']
        for learner in learners:
            model_file_path = learner['model_file_path']
            new_path = os.path.join(save_dir, os.path.basename(model_file_path))
            print('Copying: ', model_file_path, ' to: ', new_path)
            shutil.copyfile(model_file_path, new_path)
            learner['model_file_path'] = new_path
    # Dump json with model description as a pickle (json.dump is troublesome)
    with open(os.path.join(save_dir, "mljar.model.pkl"), 'wb') as f:
        pickle.dump(automl_json, f)
    # Zip directory
    shutil.make_archive(save_dir, 'zip', save_dir)

# Includes parameters
def automl_from_zip(save_dir):
    # Unzip directory
    shutil.unpack_archive(save_dir + '.zip', save_dir, 'zip')
    # Load json description of the model
    with open(os.path.join(save_dir, "mljar.model.pkl"), 'rb') as f:
        automl_json = pickle.load(f)
    # Replace pointers in the json so they point into save_dir
    for i, model in enumerate(automl_json['best_model']['models']):
        framework_file_path = model['model']['framework_file_path']
        new_path = os.path.join(save_dir, os.path.basename(framework_file_path))
        model['model']['framework_file_path'] = new_path
        learners = model['model']['learners']
        for learner in learners:
            model_file_path = learner['model_file_path']
            new_path = os.path.join(save_dir, os.path.basename(model_file_path))
            learner['model_file_path'] = new_path
    # Setup the model
    automl = AutoML()
    automl.from_json(automl_json)
    return automl

# Your autoML training
automl = AutoML()
automl.fit(train_x, train_y)
automl_to_zip(automl, 'my_model')  # Save your trained model to zip file
automl2 = automl_from_zip('my_model')  # Load your saved model from zip file
# Predictions from automl and automl2 will be the same.
test_preds = automl.predict(test_x)
test_preds2 = automl2.predict(test_x)
check_same = test_preds == test_preds2
check_same.head()
from mljar-supervised.
@TheAccountant777 why do you need to call to_json()?
If you would like to display AutoML in a notebook, please try report(). Details: https://mljar.com/blog/automl-notebook/
If you run AutoML in a Python script, then all details are saved in the results_path directory. Its name should be displayed when training starts.
from mljar-supervised.
@pplonski Fantastic library you've created!
Is there a way to preserve xgb model attributes when mljar saves xgb?
For example, I get the following error:
# load automl from previous session
clf = AutoML(results_path = "previous_autoML")
print("automl loaded")
# load dataframe which model was trained on
x_train, x_test, y_train, y_test = train_test_split(
df.drop(["class"], axis = 1), df["class"], train_size = 0.75
)
# predictions and classification report run fine
predictions = clf.predict(x_test)
print(classification_report(y_test, predictions))
pdp = PartialDependenceDisplay.from_estimator(estimator = clf,
X = x_test,
features = ["feature_name"],
target = "target_name",
random_state = 0)
File c:\Users\matth\anaconda3\envs\deforestationdynamics\lib\site-packages\sklearn\inspection\_plot\partial_dependence.py:989, in PartialDependenceDisplay.from_estimator(cls, estimator, X, features, feature_names, target, response_method, n_cols, grid_resolution, percentiles, method, n_jobs, verbose, line_kw, ice_lines_kw, pd_line_kw, contour_kw, ax, kind, centered, subsample, random_state)
760 """Partial dependence (PD) and individual conditional expectation (ICE) plots.
761
...
369 raise ValueError("Multiclass-multioutput estimators are not supported")
371 # Use check_array only on lists and other non-array-likes / sparse. Do not
372 # convert DataFrame into a NumPy array.
AttributeError: 'XGBClassifier' object has no attribute 'classes_'
I know autoML produces SHAP decision plots but I'd like to make PDPs.
I'm accessing the saved autoML directory because I don't know of a way to access a model after autoML has created them (i.e. whilst saved as a variable in the current session).
from mljar-supervised.
@Matthew-J-Payne the AutoML() class objects should have the sklearn API. If PDP works with sklearn models, then it should work here too. If some attributes are missing, you can add them before calling a method, for example:
clf.classes_ = 3
# and then call PDP
from mljar-supervised.
@pplonski thanks for the reply!
Unfortunately, when I call:
clf.classes_ = 4
pdp = PartialDependenceDisplay.from_estimator(estimator = clf,
X = x_test,
features = ["feature_name"],
target = "target_name",
random_state = 0)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File c:\Users\matth\anaconda3\envs\deforestationdynamics\lib\site-packages\sklearn\inspection\_plot\partial_dependence.py:989, in PartialDependenceDisplay.from_estimator(cls, estimator, X, features, feature_names, target, response_method, n_cols, grid_resolution, percentiles, method, n_jobs, verbose, line_kw, ice_lines_kw, pd_line_kw, contour_kw, ax, kind, centered, subsample, random_state)
760 """Partial dependence (PD) and individual conditional expectation (ICE) plots.
761
...
369 raise ValueError("Multiclass-multioutput estimators are not supported")
371 # Use check_array only on lists and other non-array-likes / sparse. Do not
372 # convert DataFrame into a NumPy array.
TypeError: 'int' object is not subscriptable
PDP is a scikit-learn class and so should accept automl objects.
PDP expects the object to implement predict, predict_proba or decision_function. doc link
from mljar-supervised.
What if you run:
clf.predict(x_test)
# or
clf.predict_proba(x_test)
do you have errors as well?
from mljar-supervised.
@Matthew-J-Payne fantastic!
from mljar-supervised.
thanks for your help!
from mljar-supervised.
Do I understand correctly that the "AutoML_X" folder contains everything needed to load the model in a production service and predict on data?
a = AutoML(r'C:\Users\...\AutoML_6')
Is this the right way to load the model in a REST service?
from mljar-supervised.
That's right.
from mljar-supervised.