nok / sklearn-porter
Transpile trained scikit-learn estimators to C, Java, JavaScript and others.
License: BSD 3-Clause "New" or "Revised" License
I found that N_VECTORS in the exported C code is not the same as the number of training samples.
Method:
I use the sample code on https://github.com/nok/sklearn-porter/blob/stable/examples/estimator/classifier/SVC/c/basics.pct.ipynb
to export the C code.
I split the data into a training and a test set (90%:10%) with the following code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
# Load the iris data set and hold out 10% of it for testing.
irisdata = load_iris()
X = irisdata.data
y = irisdata.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=True)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)
Output:
(135, 4) (135,)
(15, 4) (15,)
Then I train the model:
from sklearn import svm
clf = svm.SVC(C=1.0, gamma=0.001, kernel='rbf', random_state=0)
clf.fit(X_train, y_train)
Finally I export the code:
from sklearn_porter import Porter
porter = Porter(clf, language='c')
output = porter.export()
print(output)
But I got:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#define N_FEATURES 4
#define N_CLASSES 3
#define N_VECTORS 132
#define N_ROWS 3
#define N_COEFFICIENTS 2
#define N_INTERCEPTS 3
#define KERNEL_TYPE 'r'
#define KERNEL_GAMMA 0.001
#define KERNEL_COEF 0.0
#define KERNEL_DEGREE 3
double vectors[132][4] = {{4.4, 3.2, 1.3, 0.2}, {5.4, 3.4, 1.5, 0.4}, {5.0, 3.2, 1.2, 0.2}, {5.0, 3.5, 1.3, 0.3}, {5.5, 4.2, 1.4, 0.2}, {5.1, 3.8, 1.5, 0.3}, {5.3, 3.7, 1.5, 0.2}, {5.2, 3.4, 1.4, 0.2}, {5.1, 3.5, 1.4, 0.3}, {5.7, 3.8, 1.7, 0.3}, {5.0, 3.6, 1.4, 0.2}, {4.8, 3.0, 1.4, 0.3}, {5.1, 3.4, 1.5, 0.2}, {5.5, 3.5, 1.3, 0.2}, {4.8, 3.4, 1.6, 0.2}, {4.8, 3.0, 1.4, 0.1}, {4.7, 3.2, 1.3, 0.2}, {4.6, 3.4, 1.4, 0.3}, {5.1, 3.8, 1.6, 0.2}, {5.4, 3.7, 1.5, 0.2}, {4.9, 3.1, 1.5, 0.2}, {5.2, 4.1, 1.5, 0.1}, {4.4, 3.0, 1.3, 0.2}, {5.2, 3.5, 1.5, 0.2}, {5.1, 3.3, 1.7, 0.5}, {4.9, 3.1, 1.5, 0.1}, {5.7, 4.4, 1.5, 0.4}, {4.5, 2.3, 1.3, 0.3}, {5.0, 3.4, 1.6, 0.4}, {5.0, 3.5, 1.6, 0.6}, ...
......
N_VECTORS is 132 instead of 135.
I tried other split ratios and the following are some examples:
Test set ratio | Training size | Exported N_VECTORS |
---|---|---|
0.5 | 75 | 75 |
0.4 | 90 | 89 |
0.3 | 105 | 100 |
0.2 | 120 | 113 |
0.1 | 135 | 132 |
0.05 | 142 | 141 |
0 | 150 | 150 |
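One possible explanation, offered as an assumption rather than a confirmed diagnosis: a fitted SVC keeps only its support vectors, so the exported N_VECTORS may reflect `clf.support_vectors_` rather than the size of the training set. The sketch below uses the same estimator settings as the report, with an arbitrary fixed seed for the split, and compares the two counts.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=1)  # seed chosen for illustration only

clf = SVC(C=1.0, gamma=0.001, kernel='rbf', random_state=0)
clf.fit(X_train, y_train)

n_train = X_train.shape[0]                  # number of training samples
n_vectors = clf.support_vectors_.shape[0]   # rows kept by the fitted model
per_class = clf.n_support_                  # support vectors per class

print(n_train, n_vectors, per_class)
```

If the two numbers differ in the same way as in the table above, the exported count is consistent with the number of support vectors and not with a loss of data during export.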
Original issue found by @Phyks in #18 (comment):
ValueError: Currently the given model 'OneVsRestClassifier(estimator=LinearSVC(C=1.0, class_weight=None, dual=True, fit_intercept=True,
intercept_scaling=1, loss='squared_hinge', max_iter=1000,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0),
n_jobs=1)' isn't supported.
This issue requires refactoring (maybe) all templates. The goal is to use object-oriented instances the right way instead of static methods. Furthermore, a solution is needed for how templates can be used with non-object-oriented programming languages.
Hi,
I am trying to run the command:
python -m sklearn_porter -i estimator.pkl --js
as instructed in the GitHub README, with a scikit-learn random forest classifier that I saved to estimator.pkl as instructed. I am using Python 3.6 from Anaconda on Ubuntu 16.04 LTS.
But it fails with the following error:
Traceback (most recent call last):
File "/home/user/anaconda3/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/user/anaconda3/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/user/anaconda3/lib/python3.6/site-packages/sklearn_porter/__main__.py", line 153, in <module>
main()
File "/home/user/anaconda3/lib/python3.6/site-packages/sklearn_porter/__main__.py", line 105, in main
estimator = joblib.load(input_path)
File "/home/user/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 578, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/user/anaconda3/lib/python3.6/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 508, in _unpickle
obj = unpickler.load()
File "/home/user/anaconda3/lib/python3.6/pickle.py", line 1050, in load
dispatch[key[0]](self)
KeyError: 239
Just wondering if there is an ETA for a JS implementation for RandomForestRegressor?
Thanks!
ValueError: Currently the given estimator 'MultiOutputClassifier(estimator=MLPClassifier(activation='relu', alpha=0.0001, batch_size='auto', beta_1=0.9,
beta_2=0.999, early_stopping=False, epsilon=1e-08,
hidden_layer_sizes=1, learning_rate='constant',
learning_rate_init=0.001, max_iter=1, momentum=0.9,
nesterovs_momentum=True, power_t=0.5, random_state=None,
shuffle=True, solver='adam', tol=0.0001, validation_fraction=0.1,
verbose=False, warm_start=False),
n_jobs=1)' isn't supported.
The C code exported by porter declares the feature values as double, whereas scikit-learn converts the input to float32 internally. This mismatch can lower the accuracy of the exported code.
scikit-learn code
def predict(self, X, check_input=True):
"""Predict class or regression value for X.
For a classification model, the predicted class for each sample in X is
returned. For a regression model, the predicted value based on X is
returned.
Parameters
----------
X : array-like or sparse matrix of shape = [n_samples, n_features]
The input samples. Internally, it will be converted to
``dtype=np.float32`` and if a sparse matrix is provided
to a sparse ``csr_matrix``.
check_input : boolean, (default=True)
Allow to bypass several input checking.
Don't use this parameter unless you know what you do.
Returns
-------
y : array of shape = [n_samples] or [n_samples, n_outputs]
The predicted classes, or the predict values.
"""
porter C Code:
int main(int argc, const char * argv[]) {{
/* Features: */
double features[argc-1];
int i;
for (i = 1; i < argc; i++) {{
features[i-1] = atof(argv[i]);
}}
/* Prediction: */
printf("%d", {method_name}(features, 0));
return 0;
}}
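The effect of the float32 conversion mentioned in the docstring can be illustrated without scikit-learn. The sketch below is only an illustration under stated assumptions: `to_float32` is a hypothetical helper, and the threshold and input value are made up to sit very close together. It shows that the same comparison can take a different branch depending on whether the input is first rounded to float32.

```python
import struct

def to_float32(x):
    """Round a Python float (a C double) to the nearest float32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

threshold = 0.800000011920929   # a split threshold as it might appear in exported code
x = 0.8000000119209295          # hypothetical input just above the threshold

as_double = x <= threshold               # comparison in double precision
as_float32 = to_float32(x) <= threshold  # comparison after rounding to float32

print(as_double, as_float32)  # prints: False True
```

Only inputs that lie extremely close to a threshold are affected, so this would explain occasional mismatches rather than a large drop in accuracy.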
I am using https://github.com/aigamedev/scikit-neuralnetwork, a Python wrapper that provides a scikit-learn-style multilayer perceptron, to train a neural network and save it to a file. Now I want to deploy it in production to predict in real time, and I was thinking of using Java for better concurrency than Python. My question is whether this library can read a model trained with the above wrapper. Below is the code I use to train the model; the last three lines are the part I want to port to Java for production:
import pickle
import numpy as np
import pandas as pd
from sknn.mlp import Classifier, Layer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
f = open("TrainLSDataset.csv")
data = np.loadtxt(f,delimiter = ',')
x = data[:, 1:]
y = data[:, 0]
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3)
nn = Classifier(
layers=[
Layer("Rectifier", units=5),
Layer("Softmax")],
learning_rate=0.001,
n_iter=100)
nn.fit(X_train, y_train)
filename = 'finalized_model.txt'
pickle.dump(nn, open(filename, 'wb'))
**The code below is what I want to write in Java/Go for production:**
loaded_model = pickle.load(open(filename, 'rb'))
result = loaded_model.score(X_test, y_test)
y_pred = loaded_model.predict(X_test)
I am wondering if it’s possible to add Golang support for SVM? Seems like Golang support is still limited.
I have created a RandomForestClassifier in Python using sklearn. Now I convert the code to C using sklearn-porter. In around 10-20% of the cases the prediction of the transpiled code is wrong.
I figured that the problem occurs when specifying max_depth.
Here's some code to reproduce the issue:
import numpy as np
import sklearn_porter
from sklearn.ensemble import RandomForestClassifier
train_x = np.random.rand(1000, 8)
train_y = np.random.randint(0, 4, 1000)
# with the default max_depth (None), the problem does not occur
rfc = RandomForestClassifier(n_estimators=10)
rfc.fit(train_x, train_y)
porter = sklearn_porter.Porter(rfc, language='c')
print(porter.integrity_score(train_x)) # 1.0
# now with max_depth=10 the integrity score drops
rfc = RandomForestClassifier(n_estimators=10, max_depth=10)
rfc.fit(train_x, train_y)
porter = sklearn_porter.Porter(rfc, language='c')
print(porter.integrity_score(train_x)) # 0.829
I also saw that Python is performing calculations with double while the C code seems to use float, might that be an issue? (changing float -> double did not change anything unfortunately).
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=1, train_size=0.0001)
clf = DecisionTreeClassifier()
clf.fit(train_X, train_y)
porter = Porter(clf, language='java')
output = porter.export(embed_data=True)
print(output)
fails with bigger train sizes:
python3.7/site-packages/sklearn_porter/estimator/classifier/DecisionTreeClassifier/__init__.py", line 308, in create_branches
out += temp.format(features[node], '<=', self.repr(threshold[node]))
IndexError: list index out of range
I am trying to use sklearn-porter to transpile my multilabel random forest classifier to JavaScript, but the transpiled classifier doesn't predict multiple labels.
Does sklearn-porter support multilabel prediction? If yes, could you please provide a small example of the implementation?
There are several compile errors when transpiling random forests to Java:
int[] classes = new int[[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]];
should be
int[] classes = new int[] { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 };
for (int i = 1; i < [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]; i++)
should be
for (int i = 1; i < classes.length; i++)
int n_classes = [2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]; int[] classes = new int[n_classes];
should be
int[] classes = new int[] { 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2 }; int n_classes = classes.length;
Maybe there are other errors too, because the transpiled random forest does not produce the same result as Python.
Sorry to bother you again, but when attempting to run:
python3 -m sklearn_porter -i model_notokenizer.pkl -l java
I get:
Traceback (most recent call last):
File "/usr/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.5/dist-packages/sklearn_porter/__main__.py", line 71, in <module>
main()
File "/usr/local/lib/python3.5/dist-packages/sklearn_porter/__main__.py", line 49, in main
porter = Porter(model, language=language)
File "/usr/local/lib/python3.5/dist-packages/sklearn_porter/Porter.py", line 65, in __init__
raise ValueError(error)
ValueError: The given model 'Pipeline(memory=None,
steps=[('vect', TfidfVectorizer(analyzer='word', binary=False, decode_error='strict',
dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
lowercase=True, max_df=0.5, max_features=None, min_df=0.001,
ngram_range=(1, 1), norm='l2', preprocessor=None, smooth_idf=True...ax_iter=1000,
multi_class='ovr', penalty='l2', random_state=None, tol=0.0001,
verbose=0))])' isn't supported.
I'm running python 3.5.2, numpy 1.13.1, and sklearn 0.19.0.
Firstly, I would like to thank the authors of the library; it is really useful.
Most Java linear algebra libraries are based on 1D primitive arrays instead of 2D ones (probably true for other languages too), because it is easy to map one to the other and algorithms on 1D arrays are simpler to write. One option is to create a new 1D array and copy the data from the 2D array, but that is not a desirable approach. I therefore suggest that you provide a way to save the data as a 1D primitive array (more specifically, a 1D column array). I started doing this in a copy of the repository, but I guess you can do it in a future release.
I have an observation about the SVC template (I guess it belongs in another place). When you save a model that has two classes, I think the starts and ends arrays are redundant, because coefficients is an ordered array (in the sense that all coefficients of class zero come before any coefficient of class one). It means you could change:
...
if (this.clf.nClasses == 2) {
for (int i = 0; i < kernels.length; i++) {
kernels[i] = -kernels[i];
}
double decision = 0.;
for (int k = starts[1]; k < ends[1]; k++) {
decision += kernels[k] * this.clf.coefficients[0][k];
}
for (int k = starts[0]; k < ends[0]; k++) {
decision += kernels[k] * this.clf.coefficients[0][k];
}
decision += this.clf.intercepts[0];
if (decision > 0) {
return 0;
}
return 1;
}
...
to:
...
if (this.clf.nClasses == 2) {
for (int i = 0; i < kernels.length; i++) {
kernels[i] = -kernels[i];
}
double decision = 0.;
for (int k = 0; k < clf.coefficients[0].length; k++) {
decision += kernels[k] * this.clf.coefficients[0][k];
}
decision += this.clf.intercepts[0];
if (decision > 0) {
return 0;
}
return 1;
}
...
I guess you could improve the case of more than two classes too, by merging the structures decisions, votes and amounts.
Best Regards,
Charles
Hello
I'm using anaconda3 and I want to install sklearn-porter
What will be the best way to do this?
conda install doesn't work:
okoob$ conda install sklearn-porter
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
- sklearn-porter
Current channels:
- https://repo.anaconda.com/pkgs/main/osx-64
- https://repo.anaconda.com/pkgs/main/noarch
- https://repo.anaconda.com/pkgs/free/osx-64
- https://repo.anaconda.com/pkgs/free/noarch
- https://repo.anaconda.com/pkgs/r/osx-64
- https://repo.anaconda.com/pkgs/r/noarch
- https://repo.anaconda.com/pkgs/pro/osx-64
- https://repo.anaconda.com/pkgs/pro/noarch
ERROR: Command "python setup.py egg_info" failed with error code 1 in
aohf\AppData\Local\Temp\pip-install-kzqmrh2h\sklearn-porter\
I am unable to install sklearn-porter for python3 through pip.
Collecting sklearn-porter
Using cached sklearn-porter-0.5.0.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-bdmsnkqx/sklearn-porter/setup.py", line 18, in <module>
with open(requirements_path) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/pip-build-bdmsnkqx/sklearn-porter/requirements.txt'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-bdmsnkqx/sklearn-porter/
I am running python 3.6 on Arch Linux.
When I try to export with "porter = Porter(model, language='java', method= 'predict_proba')" from an SVC model, it returns "Currently the chosen model method 'predict_proba' isn't supported." Is there any way to get the probability of the predicted class?
Attempted to port a somewhat large random forest classifier (7.2 MB) for Java and compiling the Java class ended up giving a "too many constants" error, because of the number of hardcoded values to compose the tree. I circumvented this by using a simple script to separate out all (static) methods into individual classes and files. Is there a cleaner way internally to get around this problem or achieve this effect?
I have trained a multi-label DecisionTreeClassifier, and when I ported it the result was the following:
public static int predict(double[] features) {
int[] classes = new int[2];
if (features[11] <= 12.5) {
if (features[10] <= 182.5) {
if (features[12] <= 72.5) {
if (features[13] <= 63.0) {
if (features[7] <= 767.5) {
classes[0] = 20;
classes[1] = 5;
// Here the result should be:
//classes[0][0] = 20; classes[0][1] = 5; classes[1][0] = 25; classes[1][1] = 0; And so on...
}else{
//Huge amount of ifs
}
}
}
}
}
The full decision tree is here:
I really appreciate this feature in your porter. =]
Additionally, if this feature is already present in the code, I haven't figured out how to use it.
Thank you.
For text mining it is also important to fit a CountVectorizer (or a TfidfTransformer), so it should be possible to export it to the target language.
Could you please add support for Multinomial Naive Bayes? Its performance on text classification makes it a very desirable target for porting.
Darius, are you fine with the idea to support C#? If so, I will go ahead whenever I have free time. I might also contribute on other parts when I'm done.
Thanks for letting me know, best, Balint
Hi -- tried installing and running the Go example, but I'm getting this error:
$ cd sklearn-porter/examples/classifier/LinearSVC/go
$ python basics.py
Traceback (most recent call last):
File "basics.py", line 12, in <module>
model = Porter(language='go').port(clf)
File "/Users/bjohnson/anaconda/lib/python2.7/site-packages/sklearn_porter-0.2.0-py2.7.egg/sklearn_porter/__init__.py", line 56, in port
ported_model = instance.port(model)
File "/Users/bjohnson/anaconda/lib/python2.7/site-packages/sklearn_porter-0.2.0-py2.7.egg/sklearn_porter/classifier/LinearSVC/__init__.py", line 69, in port
return self.predict()
File "/Users/bjohnson/anaconda/lib/python2.7/site-packages/sklearn_porter-0.2.0-py2.7.egg/sklearn_porter/classifier/LinearSVC/__init__.py", line 81, in predict
return self.create_class(self.create_method())
File "/Users/bjohnson/anaconda/lib/python2.7/site-packages/sklearn_porter-0.2.0-py2.7.egg/sklearn_porter/classifier/LinearSVC/__init__.py", line 111, in create_method
return self.temp('method', indentation=1, skipping=True).format(
File "/Users/bjohnson/anaconda/lib/python2.7/site-packages/sklearn_porter-0.2.0-py2.7.egg/sklearn_porter/classifier/__init__.py", line 114, in temp
raise AttributeError('Template \'%s\' not found.' % (name))
AttributeError: Template 'method' not found.
Any thoughts? Thanks!
Currently I am generating my Java file with "uniform" weights, but I want to generate Java code with "distance" weights.
When I try to generate it with the "distance" weights, it raises NotImplementedError.
If I have to implement it on my own, how can I do that?
Any idea would help me. :)
Hey guys, there's a little bug: "ValueError: ("The classifier doesn't support the given base estimator %s.", None)". It seems that the default base estimator for AdaBoost is a decision tree, but the check in the code here is "if not isinstance(estimator.base_estimator, DecisionTreeClassifier):". So please use clf.base_estimator_ (attributes ending with _ are generated automatically by the code, not given by the user).
Hi, I started using your code to port a random forest estimator. First off, I can't call the porter.integrity_score() function because I get the following error:
Traceback (most recent call last):
File "C:/Python Project/Euler.py", line 63, in <module>
accuracy = porter.integrity_score(test_X)
File "C:\Python\lib\site-packages\sklearn_porter\Porter.py", line 440, in integrity_score
keep_tmp_dir=True, num_format=num_format)
File "C:\Python\lib\site-packages\sklearn_porter\Porter.py", line 342, in predict
self._test_dependencies()
File "C:\Python\lib\site-packages\sklearn_porter\Porter.py", line 454, in _test_dependencies
raise EnvironmentError(error)
OSError: The required dependencies aren't available on Windows.
So I can't check the accuracy in Python, and when I used the Java code in Eclipse it gave very bad accuracy: the original scikit-learn model gave about 69% accuracy, whereas the accuracy of the Java code is less than 10%.
I need the code for an important project and would really appreciate some help with this.
I used the code tag, but I don't know why it doesn't show the line breaks properly.
There are two bugs in the generated Java code. They are of the same type.
double[] decisions = new double[13];
for (int i = 0, d = 0, l = 13; i < l; i++) {
for (int j = i + 1; j < l; j++) {
double tmp1 = 0., tmp2 = 0.;
for (int k = starts[j]; k < ends[j]; k++) {
tmp1 += kernels[k] * coeffs[i][k];
}
for (int k = starts[i]; k < ends[i]; k++) {
tmp2 += kernels[k] * coeffs[j - 1][k];
}
System.out.println("d=" + d);
decisions[d] = tmp1 + tmp2 + inters[d++];
}
}
In my understanding, the second loop was supposed to run with d and l always initialized to 0 and 13 respectively, but it does not: the code raises an ArrayIndexOutOfBoundsException.
I suggest adding the line d = 0; right after the first for statement (l = 13; is not needed because it is a constant here).
The same bug occurs in this code:
int[] votes = new int[13];
for (int i = 0, d = 0, l = 13; i < l; i++) {
for (int j = i + 1; j < l; j++) {
votes[d] = decisions[d++] > 0 ? i : j;
}
}
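To see how far the counter d advances in the loops quoted above, their structure can be mirrored in a few lines of Python. This is only a sketch of the loop shape; `count_pairs` is a hypothetical helper and not part of sklearn-porter.

```python
def count_pairs(n_classes):
    """Mirror the nested Java loops and return the final value of d."""
    d = 0  # initialised once, like the init clause of the outer for loop
    for i in range(n_classes):
        for j in range(i + 1, n_classes):
            d += 1  # one decision per unordered class pair (i, j)
    return d

n = 13
print(count_pairs(n), n * (n - 1) // 2)  # prints: 78 78
```

With 13 classes the counter reaches 78, so any array indexed by d needs room for n(n-1)/2 entries. Whether the array size or the handling of the counter is the right place to fix the generated code is left open here.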
I installed this by python setup.py install
I used python 3
$ python -m sklearn_porter --input cl.pkl --language java
Traceback (most recent call last):
File "/home/rem/anaconda3/lib/python3.5/runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "/home/rem/anaconda3/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/rem/anaconda3/lib/python3.5/site-packages/sklearn_porter-0.2.0-py3.5.egg/sklearn_porter/__main__.py", line 54, in <module>
main()
File "/home/rem/anaconda3/lib/python3.5/site-packages/sklearn_porter-0.2.0-py3.5.egg/sklearn_porter/__main__.py", line 35, in main
result = porter.port(raw_model)
File "/home/rem/anaconda3/lib/python3.5/site-packages/sklearn_porter-0.2.0-py3.5.egg/sklearn_porter/__init__.py", line 50, in port
locals(), [md_name], -1)
ValueError: level must be >= 0
Firstly, thank you for this great project.
For models based on decision trees, such as decision trees and random forests, the probability of a prediction is often as crucial as the prediction itself, as it carries more information than the bare result.
As far as the implementation goes, since the porter needs to build every leaf node anyway, I think it's possible to export the probabilities of the leaf nodes and then aggregate them.
So is there any way to do that?
Any plans to support XGBoost?
I'm creating a Random Forest Classifier that has 248 inputs and 108 outputs. Based on the Boolean state of each input, the 108 outputs will be on or off (they represent valves). The values of these discrete output states are what the system has learned. There are two issues I'm having with this:
The code generator only seems to create trees for one output, and I don't know which one. For each output I'd expect a separate set of trees, because the inputs remain the same, but the decision tree for each valve's state will be different.
The code for the single output is invalid C. See the example code fragment below.
`int predict_0(float features[]) {
int classes[[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]];
if (features[181] <= 0.5) { ... }
}`
I have an svm/svc classifier trained using sparse matrix as follows:
from sklearn_porter import Porter
from sklearn import svm
# load data and train the classifier:
clf = svm.SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=1/X_train_transformed.shape[1], kernel='linear',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)
clf.fit(X_train_transformed, X_train['label'])
type(X_train_transformed)
----------------------------------------
scipy.sparse.csr.csr_matrix
The problem is that exporting fails with the errors shown below:
# export:
porter = Porter(clf, language='java')
output = porter.export(embed_data=False, details=False)
with open('SVC.java', 'w') as f:
f.writelines(output)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-34-e7d647ff66cd> in <module>()
1 # export:
2 porter = Porter(clf, language='java')
----> 3 output = porter.export(embed_data=False, details=False)
4 with open('SVC.java', 'w') as f:
5 f.writelines(output)
~/.conda/envs/ml/lib/python3.6/site-packages/sklearn_porter/Porter.py in export(self, class_name, method_name, num_format, details, **kwargs)
187
188 output = self.template.export(class_name=class_name,
--> 189 method_name=method_name, **kwargs)
190 if not details:
191 return output
~/.conda/envs/ml/lib/python3.6/site-packages/sklearn_porter/estimator/classifier/SVC/__init__.py in export(self, class_name, method_name, export_data, export_dir, export_filename, export_append_checksum, **kwargs)
131 self.params = params
132
--> 133 self.n_features = len(est.support_vectors_[0])
134 self.svs_rows = est.n_support_
135 self.n_svs_rows = len(est.n_support_)
~/.conda/envs/ml/lib/python3.6/site-packages/scipy/sparse/base.py in __len__(self)
264 # non-zeros is more important. For now, raise an exception!
265 def __len__(self):
--> 266 raise TypeError("sparse matrix length is ambiguous; use getnnz()"
267 " or shape[0]")
268
TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]
I am attaching the training data as a CSV file; the first column is the target class to predict.
I generated a pickle file, converted it to C code with the sklearn-porter command line, and ran it.
The C code returns the index of the class, not the actual class, whereas Python's predict() function returns the actual class.
The training CSV file and the pickle file are attached:
csv_and_pickle_file.zip
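A fitted scikit-learn classifier lists its labels in the `classes_` attribute. Assuming the index returned by the generated code refers to a position in that array (an assumption, not something the report confirms), the label can be recovered with a lookup. The helper name and the label values below are hypothetical.

```python
def index_to_label(class_index, classes):
    """Translate the index returned by the exported code into a class label."""
    return classes[class_index]

# Hypothetical labels, standing in for list(clf.classes_) of the fitted estimator.
classes = [10, 20, 30]

print(index_to_label(2, classes))  # prints: 30
```

In a C program the same idea would be a small constant array of labels indexed by the return value of the predict function.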
For SVCs, the default value of gamma is 'auto', which causes porter to crash when export_data is True.
>>> output = porter.export(export_data=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/sklearn_porter/Porter.py", line 189, in export
method_name=method_name, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/sklearn_porter/estimator/classifier/SVC/__init__.py", line 192, in export
export_append_checksum)
File "/usr/local/lib/python2.7/dist-packages/sklearn_porter/estimator/classifier/SVC/__init__.py", line 239, in export_data
'gamma': float(self.gamma),
ValueError: could not convert string to float: auto
The solution is to check if it's equal to 'auto'; if so, then set gamma equal to 1/n_features, such as:
>>> clf.gamma
'auto'
>>> clf.gamma = 1/float(clf.support_vectors_.shape[1])
>>> output = porter.export(export_data=True) # success
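The check described above could be wrapped in a small helper. This is a minimal sketch; `resolve_gamma` is a hypothetical name and not an existing function in sklearn-porter, and it only covers the 'auto' case from this report.

```python
def resolve_gamma(gamma, n_features):
    """Return a numeric gamma, expanding 'auto' to 1 / n_features."""
    if gamma == 'auto':
        return 1.0 / float(n_features)
    return float(gamma)

print(resolve_gamma('auto', 4))   # prints: 0.25
print(resolve_gamma(0.001, 4))    # prints: 0.001
```

Here n_features would be taken from clf.support_vectors_.shape[1], as in the workaround above.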
Hi
the package installed with no errors using pip3.
I get the following error when importing. Using python 3.7.2 on MacOS.
Thanks
C.
$ python3
Python 3.7.2 (default, Jan 13 2019, 12:51:54)
[Clang 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from sklearn_porter import Porter
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/site-packages/sklearn_porter/__init__.py", line 42, in <module>
meta = _load_meta(package)
File "/usr/local/lib/python3.7/site-packages/sklearn_porter/__init__.py", line 26, in _load_meta
reqs = open(req_path, 'r').read().strip().split('\n')
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/lib/python3.7/site-packages/sklearn_porter/../requirements.txt'
I am trying to figure out the best machine learning approach for a pedometer project. So far I have gyroscope and accelerometer data for walking and not walking. When I train and test a Naive Bayes model on my machine I get nearly 70% accuracy. However, when I port it to Java, add it to my Android app and start using it, it always predicts the same label. Several questions arise from this: Why is this happening? Do I need to use an online learning algorithm for this scenario? Is the balance of my classes wrong?
I've built a very simple single feature RandomForestClassifier:
from sklearn.ensemble import RandomForestClassifier
import numpy as np
from sklearn_porter import Porter
rf = RandomForestClassifier()
features = [[i] for i in xrange(0, 10)]
labels = [i > 5 for i in xrange(0, 10)]
rf.fit(features, labels)
for feature in xrange(-20, 20):
print feature, '->', rf.predict(np.array([feature]).reshape(1, -1))
result = Porter(language='java').port(rf)
print result
which gives the following stack trace:
Traceback (most recent call last):
File "generateModel.py", line 21, in <module>
result = Porter(language='java').port(rf)
File "/usr/local/lib/python2.7/dist-packages/sklearn_porter/__init__.py", line 72, in port
ported_model = instance.port(model)
File "/usr/local/lib/python2.7/dist-packages/sklearn_porter/classifier/RandomForestClassifier/__init__.py", line 84, in port
return self.predict()
File "/usr/local/lib/python2.7/dist-packages/sklearn_porter/classifier/RandomForestClassifier/__init__.py", line 95, in predict
return self.create_class(self.create_method())
File "/usr/local/lib/python2.7/dist-packages/sklearn_porter/classifier/RandomForestClassifier/__init__.py", line 198, in create_method
tree = self.create_single_method(idx, model)
File "/usr/local/lib/python2.7/dist-packages/sklearn_porter/classifier/RandomForestClassifier/__init__.py", line 162, in create_single_method
indices.append([str(j) for j in range(model.n_features_)][i])
IndexError: list index out of range
The line in question involves indexing into the feature vector, but sometimes the index is negative, which is fine except when it wraps around the list twice. In this case, model.n_features_ is 1 but i (the index) is -2, giving the list index out of range exception. What is the best solution for this? Would simply taking the modulus of the index by the length of the list be correct?
Thanks!
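Taking the modulus would probably hide the problem rather than solve it. In scikit-learn's tree structure, leaf nodes are marked in `tree_.feature` with the value -2, so an index of -2 most likely denotes a leaf and not a real feature. The sketch below skips such nodes; `feature_names` is a hypothetical helper and not necessarily the fix adopted by the project.

```python
TREE_UNDEFINED = -2  # value scikit-learn stores in tree_.feature for leaf nodes

def feature_names(node_features, n_features):
    """Map each node's feature index to a name, leaving leaves as None."""
    names = [str(j) for j in range(n_features)]
    return [None if f == TREE_UNDEFINED else names[f] for f in node_features]

# Hypothetical tree_.feature array for a one-feature tree: a root split and two leaves.
print(feature_names([0, -2, -2], 1))  # prints: ['0', None, None]
```

Leaves carry no split, so no feature name is needed for them when the code for a branch is generated.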
Example: I would like to use float constants.
porter = Porter(clf, language='c')
output = porter.export(embed_data=True, num_format=lambda o: str(o) + 'f' )
C- Output
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
int predict_0(float features[]) {
int classes[3];
if (features[3f] <= 0.800000011920929f) {
classes[0] = 49;
classes[1] = 0;
classes[2] = 0;
} else {
if (features[3f] <= 1.75f) {
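The output above shows the suffix being applied to array indices as well (features[3f]), which is not valid C. One possible workaround is a formatter that leaves integers untouched. This is a sketch that assumes the indices reach num_format as integer types, which the report does not confirm; `float_suffix` is a hypothetical name.

```python
import numbers

def float_suffix(value):
    """Append 'f' to real-valued numbers but leave integers (indices) alone."""
    if isinstance(value, numbers.Integral):
        return str(value)
    return str(value) + 'f'

print(float_suffix(3))      # prints: 3
print(float_suffix(1.75))   # prints: 1.75f
```

It would be passed in place of the lambda, for example as num_format=float_suffix.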
I've noticed that the integrity score of the JavaScript-ported model for ExtraTrees is around 0.86 (after sampling a few thousand random inputs). What could cause such a large divergence? The ExtraTrees model has 3 estimators with depth 2.
Hi,
I know sklearn-porter doesn't support nu-SVCs, but those are mathematically equivalent to svm.SVC models (see http://scikit-learn.org/stable/modules/svm.html#nusvc).
I was wondering if there was a workaround for this ?
Thank you
Currently I am trying to export a model from sklearn to Android. For this I use the library sklearn-porter.
I have asked the same question on Stack Overflow.
This generates a Java class from the trained model, which looks like the following:
class DecisionTreeClassifier {
public static int predict(double[] features) {
int[] classes = new int[2];
if (features[350] <= 0.5156863033771515) {
if (features[568] <= 0.0019607844296842813) {
if (features[430] <= 0.0019607844296842813) {
if (features[405] <= 0.009803921915590763) {
...
}
This file has a size of about 1 MB and thus the error "Code too large" occurs in Android Studio.
Is there a solution for this problem?
The command: pip install sklearn-porter
produces the following:
Collecting sklearn-porter
Using cached sklearn-porter-0.3.2.tar.gz
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-build-2k6e9qlh/sklearn-porter/setup.py", line 6, in <module>
from sklearn_porter import Porter
File "/tmp/pip-build-2k6e9qlh/sklearn-porter/sklearn_porter/__init__.py", line 3, in <module>
from Porter import Porter
ImportError: No module named 'Porter'
----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-2k6e9qlh/sklearn-porter/
Which algorithms can be used for streams of data (e.g. passive-aggressive, perceptron)?
Hi @nok, the libsvm implementation seems to use subtraction where sklearn-porter's JavaScript predict method uses addition. I'm guessing both are equivalent if the intercepts have opposite signs, but I'm not sure. Could you please shed some light on this?
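The two forms agree whenever the stored intercept is the negated libsvm offset rho. That scikit-learn stores its intercept this way is stated here as an assumption, not as something confirmed in this thread. The values and function names below are hypothetical and only illustrate the arithmetic.

```python
def decision_libsvm(weighted_kernels, rho):
    """libsvm convention: sum of weighted kernel values minus rho."""
    return sum(weighted_kernels) - rho

def decision_sklearn(weighted_kernels, intercept):
    """Ported-code convention: sum of weighted kernel values plus the intercept."""
    return sum(weighted_kernels) + intercept

# Hypothetical values; the intercept is assumed to be the negated rho.
weighted = [0.5, -1.25, 2.0]
rho = 0.75
intercept = -rho

print(decision_libsvm(weighted, rho), decision_sklearn(weighted, intercept))  # prints: 0.5 0.5
```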
Hi,
This is a great tool and I've been looking for a while.
A classifier such as GradientBoostingClassifier is needed, so I am writing to you with a feature request. Actually, I don't mind implementing one and contributing it back to this repo. I checked the documentation and did not find any article about how to implement a classifier in sklearn.
It would be extremely helpful for this project and other users like me.
Thanks.
The variable features is accessed inside the function predict, but it is only in scope within the main function (the parameter of predict is named atts). It should either be a global variable or be passed as a function parameter.
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
int predict(float atts[2]) {
int classes[2];
if (features[0] <= 5.43762493134) {
if (features[1] <= 5.74491977692) {
if (features[0] <= 3.51197504997) {
classes[0] = 10;
classes[1] = 0;
} else {
classes[0] = 0;
classes[1] = 1;
}
} else {
if (features[1] <= 16.6829204559) {
if (features[0] <= 2.67515516281) {
if (features[1] <= 11.5629148483) {
if (features[1] <= 7.29798984528) {
if (features[1] <= 6.13995504379) {
classes[0] = 1;
classes[1] = 0;
} else {
classes[0] = 0;
classes[1] = 3;
}
} else {
if (features[0] <= 1.60292005539) {
if (features[1] <= 8.0366601944) {
classes[0] = 3;
classes[1] = 0;
} else {
if (features[1] <= 9.11940002441) {
classes[0] = 0;
classes[1] = 2;
} else {
if (features[0] <= 1.21078002453) {
if (features[0] <= 1.11364006996) {
classes[0] = 1;
classes[1] = 0;
} else {
classes[0] = 0;
classes[1] = 1;
}
} else {
classes[0] = 2;
classes[1] = 0;
}
}
}
} else {
classes[0] = 6;
classes[1] = 0;
}
}
} else {
if (features[0] <= 2.35693502426) {
classes[0] = 0;
classes[1] = 7;
} else {
classes[0] = 1;
classes[1] = 0;
}
}
} else {
if (features[1] <= 16.5127105713) {
if (features[1] <= 12.1385450363) {
if (features[1] <= 6.92804527283) {
if (features[1] <= 6.25199985504) {
classes[0] = 0;
classes[1] = 4;
} else {
if (features[0] <= 5.02503490448) {
classes[0] = 2;
classes[1] = 0;
} else {
classes[0] = 0;
classes[1] = 1;
}
}
} else {
if (features[1] <= 10.6784753799) {
classes[0] = 0;
classes[1] = 9;
} else {
if (features[1] <= 10.7935905457) {
classes[0] = 1;
classes[1] = 0;
} else {
classes[0] = 0;
classes[1] = 5;
}
}
}
} else {
if (features[0] <= 4.75841522217) {
if (features[0] <= 3.42268514633) {
classes[0] = 1;
classes[1] = 0;
} else {
classes[0] = 0;
classes[1] = 5;
}
} else {
classes[0] = 2;
classes[1] = 0;
}
}
} else {
classes[0] = 1;
classes[1] = 0;
}
}
} else {
if (features[0] <= 4.17648506165) {
classes[0] = 6;
classes[1] = 0;
} else {
if (features[0] <= 4.91468000412) {
classes[0] = 0;
classes[1] = 3;
} else {
classes[0] = 2;
classes[1] = 0;
}
}
}
}
} else {
if (features[0] <= 7.70522975922) {
if (features[0] <= 7.64461517334) {
if (features[0] <= 6.52222013474) {
if (features[0] <= 6.49937534332) {
if (features[1] <= 8.1920003891) {
if (features[1] <= 8.07668018341) {
classes[0] = 0;
classes[1] = 4;
} else {
classes[0] = 1;
classes[1] = 0;
}
} else {
classes[0] = 0;
classes[1] = 14;
}
} else {
classes[0] = 1;
classes[1] = 0;
}
} else {
if (features[1] <= 13.1301851273) {
classes[0] = 0;
classes[1] = 41;
} else {
if (features[1] <= 13.5656652451) {
classes[0] = 1;
classes[1] = 0;
} else {
classes[0] = 0;
classes[1] = 7;
}
}
}
} else {
classes[0] = 1;
classes[1] = 0;
}
} else {
classes[0] = 0;
classes[1] = 183;
}
}
int index = 0;
for (int i = 0; i < 2; i++) {
index = classes[i] > classes[index] ? i : index;
}
return index;
}
int main(int argc, const char * argv[]) {
/* Features: */
double features[argc-1];
int i;
for (i = 1; i < argc; i++) {
features[i-1] = atof(argv[i]);
}
/* Prediction: */
printf("%d", predict(features));
return 0;
}
Hi, great work with Porter, it's really helpful!
The following is a small issue, but one that took me a good while to debug, so I wanted to post both the problem and a possible solution that seems to work for me.
I have been porting an MLPClassifier to Android. Everything seemed fine in Java desktop tests, but on Android the classifier would usually produce values that were not completely wrong but slightly off. I kept running tests and found that the current Java implementation of MLPClassifier stores the input values of the network in the object every time a prediction is made. This means that once the method .predict has been run, any subsequent call reuses values that were changed inside the network. By this I do not mean the weights, but the actual input values and any subsequent estimations.
The results are not very different, only slightly off, which makes the problem very hard to debug; initially I thought it might just be a rounding issue. Also, the suggested terminal test on the desktop passes in a single input, so the problem is impossible to catch that way: it only appears when .predict is called multiple times in sequence.
A way to fix this issue is to add a method that resets the network values to zero.
public void reset() {
    // Cleans up the network values
    for (int i = 0; i < this.network.length; i++) {
        for (int i2 = 0; i2 < this.network[i].length; i2++) {
            this.network[i][i2] = 0;
        }
    }
}
The solution above has the caveat that it also sets the input values passed to .predict to zero, since predict does not copy the values but keeps a reference to the caller's array.
Deleting the MLPClassifier and creating a new one, or allocating a new this.network, are other options, but they may be much slower.
Hope this helps other people, and if you have a better solution please let me know.
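A possible alternative to reset(), sketched in Python under the assumption that the off values come from the network buffer keeping state between calls: copy the input into the first layer and overwrite every later neuron instead of accumulating into it. The TinyMLP class, its identity activation and its weights are hypothetical and are not the code generated by sklearn-porter.

```python
class TinyMLP:
    """Hypothetical stand-in for an exported network (identity activation)."""

    def __init__(self, weights, biases):
        # weights[i][l][j]: neuron l of layer i -> neuron j of layer i + 1
        self.weights = weights
        self.biases = biases
        sizes = [len(weights[0])] + [len(b) for b in biases]
        self.network = [[0.0] * n for n in sizes]

    def forward(self, features):
        # Copy the input so the caller's list is never aliased or modified.
        self.network[0] = list(features)
        for i in range(len(self.weights)):
            for j in range(len(self.network[i + 1])):
                # Start from the bias on every call, so nothing left over
                # from a previous prediction is added to the new value.
                total = self.biases[i][j]
                for l in range(len(self.network[i])):
                    total += self.network[i][l] * self.weights[i][l][j]
                self.network[i + 1][j] = total
        return list(self.network[-1])


mlp = TinyMLP(weights=[[[1.0], [2.0]]], biases=[[0.5]])
x = [1.0, 1.0]
first = mlp.forward(x)
second = mlp.forward(x)
print(first == second)  # repeated calls agree
print(x)                # the caller's input is unchanged
```

Because each call overwrites the buffer, no separate reset step is needed and the caller's input is left untouched.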
I was trying to implement the predict_proba function for an Extra Tree model when I realized that the result returned by the transpiled version of the model differed from the one returned by sklearn.
My model contains 30 trees and 3 classes. Below are the classes predicted by sklearn alongside the probabilities for each estimator:
Estimator | Proba Class 0 | Proba Class 1 | Proba Class 2 | Predicted class |
---|---|---|---|---|
Estimator 0 | 0.1765 | 0.0000 | 0.8235 | 2 |
Estimator 1 | 0.0000 | 0.0000 | 1.0000 | 2 |
Estimator 2 | 0.1667 | 0.0000 | 0.8333 | 2 |
Estimator 3 | 0.6923 | 0.0000 | 0.3077 | 0 |
Estimator 4 | 0.8125 | 0.0417 | 0.1458 | 0 |
Estimator 5 | 0.8374 | 0.0064 | 0.1562 | 0 |
Estimator 6 | 0.9727 | 0.0000 | 0.0273 | 0 |
Estimator 7 | 0.3429 | 0.0000 | 0.6571 | 2 |
Estimator 8 | 0.8391 | 0.0095 | 0.1514 | 0 |
Estimator 9 | 0.0000 | 0.0000 | 1.0000 | 2 |
Estimator 10 | 0.7266 | 0.0078 | 0.2656 | 0 |
Estimator 11 | 0.6220 | 0.0000 | 0.3780 | 0 |
Estimator 12 | 0.5000 | 0.0000 | 0.5000 | 0 |
Estimator 13 | 0.6117 | 0.0000 | 0.3883 | 0 |
Estimator 14 | 0.0000 | 0.0000 | 1.0000 | 2 |
Estimator 15 | 0.8687 | 0.0000 | 0.1313 | 0 |
Estimator 16 | 1.0000 | 0.0000 | 0.0000 | 0 |
Estimator 17 | 0.8468 | 0.0170 | 0.1362 | 0 |
Estimator 18 | 0.5595 | 0.0000 | 0.4405 | 0 |
Estimator 19 | 0.0714 | 0.0000 | 0.9286 | 2 |
Estimator 20 | 0.4600 | 0.0000 | 0.5400 | 2 |
Estimator 21 | 0.0000 | 0.0000 | 1.0000 | 2 |
Estimator 22 | 0.5217 | 0.0000 | 0.4783 | 0 |
Estimator 23 | 0.8322 | 0.0049 | 0.1629 | 0 |
Estimator 24 | 0.5000 | 0.0000 | 0.5000 | 0 |
Estimator 25 | 0.3333 | 0.0000 | 0.6667 | 2 |
Estimator 26 | 1.0000 | 0.0000 | 0.0000 | 0 |
Estimator 27 | 0.4545 | 0.0000 | 0.5455 | 2 |
Estimator 28 | 0.0000 | 0.0000 | 1.0000 | 2 |
Estimator 29 | 0.0000 | 0.0000 | 1.0000 | 2 |
MODEL | 0.4916 | 0.0029 | 0.5055 | 2 |
17 estimators predict class 0 and 13 predict class 2, BUT the model predicts class 2 because it has the highest averaged probability (0.5055 vs. 0.4916 for class 0).
Therefore it seems to me that the transpiled model should also base its decision on the predicted probabilities.
What do you think?
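The difference can be reproduced from the table alone. The sketch below compares a majority vote over the per-estimator predictions with the argmax of the averaged probabilities, which is how scikit-learn documents predict for its forest classifiers. The function names are made up for the example.

```python
# Per-estimator probabilities for classes 0, 1, 2, copied from the table above.
PROBA = [
    (0.1765, 0.0000, 0.8235), (0.0000, 0.0000, 1.0000), (0.1667, 0.0000, 0.8333),
    (0.6923, 0.0000, 0.3077), (0.8125, 0.0417, 0.1458), (0.8374, 0.0064, 0.1562),
    (0.9727, 0.0000, 0.0273), (0.3429, 0.0000, 0.6571), (0.8391, 0.0095, 0.1514),
    (0.0000, 0.0000, 1.0000), (0.7266, 0.0078, 0.2656), (0.6220, 0.0000, 0.3780),
    (0.5000, 0.0000, 0.5000), (0.6117, 0.0000, 0.3883), (0.0000, 0.0000, 1.0000),
    (0.8687, 0.0000, 0.1313), (1.0000, 0.0000, 0.0000), (0.8468, 0.0170, 0.1362),
    (0.5595, 0.0000, 0.4405), (0.0714, 0.0000, 0.9286), (0.4600, 0.0000, 0.5400),
    (0.0000, 0.0000, 1.0000), (0.5217, 0.0000, 0.4783), (0.8322, 0.0049, 0.1629),
    (0.5000, 0.0000, 0.5000), (0.3333, 0.0000, 0.6667), (1.0000, 0.0000, 0.0000),
    (0.4545, 0.0000, 0.5455), (0.0000, 0.0000, 1.0000), (0.0000, 0.0000, 1.0000),
]


def argmax(values):
    """Index of the largest value; ties go to the lowest index."""
    values = list(values)
    return values.index(max(values))


def hard_vote(proba):
    """Each estimator casts one vote for its own most probable class."""
    votes = [0] * len(proba[0])
    for row in proba:
        votes[argmax(row)] += 1
    return argmax(votes), votes


def soft_vote(proba):
    """Average the probabilities over all estimators, then take the argmax."""
    n = len(proba)
    means = [sum(row[k] for row in proba) / n for k in range(len(proba[0]))]
    return argmax(means), means


print(hard_vote(PROBA))  # class 0 wins with 17 votes against 13
print(soft_vote(PROBA))  # class 2 wins on the averaged probabilities
```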