Comments (5)
Yes, you are right. In general, all estimators in all target programming languages return the index of the resulting label (y), because it would be an overhead to reimplement the mapping for each programming language. Nevertheless, I have noted this requirement for a future release.
from sklearn-porter.
For the JSON export option, can we add the array of class names to the JSON data?
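Until the exporter does this itself, one workaround is to write the labels to a separate JSON file next to the exported model. The sketch below is only an illustration: the helper names and the "classes" key are hypothetical, not part of sklearn-porter, and the only thing it relies on from scikit-learn is that clf.classes_ lists the labels in index order.

```python
import json


def save_class_labels(classes, path):
    """Write the class labels, in index order, to a JSON file."""
    # NumPy scalars are not JSON serializable; .item() converts them to
    # plain Python values. Plain ints and strings pass through unchanged.
    labels = [c.item() if hasattr(c, "item") else c for c in classes]
    with open(path, "w") as f:
        json.dump({"classes": labels}, f)  # "classes" is a hypothetical key
    return labels


def load_class_labels(path):
    """Read the labels back; labels[i] is the label for predicted index i."""
    with open(path) as f:
        return json.load(f)["classes"]
```

With a fitted classifier this would be called as save_class_labels(clf.classes_, "model_classes.json"), and the target language then reads the array and indexes it with the predicted class index.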
A hack for this would be to put a few labelled training samples ((x_i, y_i), ...) from each class into the exported JSON data, choosing samples for which we know that the Python classifier predicts the class correctly (i.e. the most confident training samples from each class). Then, in the target language, one can match the indexes returned by clf.predict(x_i) to their actual labels y_i.
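The matching step could look roughly like this. It is written in Python only to show the logic; predict_index is a hypothetical stand-in for the ported predict function, and build_index_to_label is a name made up for this sketch.

```python
def build_index_to_label(reference_samples, predict_index):
    """Map each predicted index to the known label of a reference sample.

    reference_samples: iterable of (x, y) pairs that the original
        classifier is known to classify correctly.
    predict_index: callable returning the class index for x
        (hypothetical stand-in for the ported predict function).
    """
    mapping = {}
    for x, y in reference_samples:
        idx = predict_index(x)
        # If two reference samples disagree about one index, the port and
        # the original classifier differ on at least one of them.
        if mapping.setdefault(idx, y) != y:
            raise ValueError(
                f"index {idx} maps to both {mapping[idx]!r} and {y!r}"
            )
    return mapping
```

A class without a reference sample simply has no entry in the mapping, so the lookup fails loudly instead of returning a wrong label.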
I just came across this problem as well:
In my case, I have input classes [1,2,3,4,5], however there is no example in the training set for class 2. As a result, the C version of my random forest outputs classes [1,2,3,4], with 2,3,4 actually being 3,4,5. Is there any way to prevent that, or are there ideas to fix this without tampering with the C code?
I have a semi-production pipeline where sometimes classes are not part of the training set, and I would be glad to have some way to automatically correct that without manually putting class labels into the C code.
(see also BayesWitnesses/m2cgen#77 where I outline this problem in more detail)
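The shift can be reproduced without any model. scikit-learn stores the sorted unique training labels in classes_, and a ported model only knows positions in that array. The sketch below assumes a zero-based index and uses made-up training labels; fitted_classes is a hypothetical helper that mimics that ordering.

```python
def fitted_classes(y_train):
    """Labels in the order a fitted classifier indexes them (sorted unique)."""
    return sorted(set(y_train))


y_train = [1, 3, 4, 5, 3, 1]        # class 2 never occurs in training
classes = fitted_classes(y_train)   # [1, 3, 4, 5]
predicted_index = 1                 # what the ported code would return
label = classes[predicted_index]    # 3, although the index reads like "class 2"
```

Any fix therefore has to carry the classes array, or an equivalent table, over to the target language.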
I solved it temporarily by writing a small wrapper. It adds a conversion function like the following to the C code and replaces the final return class_idx with a call to it:
int idx2label(int class_idx) {
    int labels[5] = {0,2,3,4,5}; // your original ints
    return labels[class_idx];
}

import sklearn_porter

def save_model_sklearn_porter(clf, file):
    """
    Saves an sklearn model which keeps the original class IDs, even if they are not consecutive.
    """
    porter = sklearn_porter.Porter(clf, language='C')
    output = porter.export(embed_data=True)
    # see which labels are in the classifier, so far only ints are supported
    labels = [str(int(i)) for i in clf.classes_]
    # create new label code and conversion function
    labels_code = 'int labels[{}] = {{{}}}'.format(len(labels), ','.join(labels))
    convert_func = '\n\nint idx2label(int class_idx) { \n' +\
                   '    {};\n    return labels[class_idx];\n}}\n\n'.format(labels_code)
    # insert this function near the beginning of the file,
    # after the last line that starts with '#' (#include, #define, ...)
    lines = output.splitlines()
    position = 0
    for idx, line in enumerate(lines):
        if line.strip().startswith('#'):
            position = idx
    lines.insert(position + 1, convert_func)
    output = '\n'.join(lines)
    # replace last occurrence of `return class_idx` with the label transfer function
    # with [::-1] we can reverse the string and look for the first element as if it were the last
    output = output[::-1].replace('return class_idx'[::-1], 'return idx2label(class_idx)'[::-1], 1)[::-1]
    # use a separate name for the file handle so the `file` argument is not shadowed
    with open(file, 'w') as f:
        f.write(output)
    return output
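To try the text-patching step without sklearn-porter installed, it can be pulled into a function that works on any C source string. This is a variant sketch, not the wrapper above: add_label_lookup is a hypothetical name, it uses str.rpartition in place of the reversed-string trick, it declares the table static const, and it raises if the expected return statement is missing. It still assumes integer labels and an exported source that ends its predict function with return class_idx.

```python
def add_label_lookup(c_source, labels):
    """Return c_source with an idx2label() helper inserted and the last
    `return class_idx` routed through it. Labels must be integers."""
    values = ", ".join(str(int(label)) for label in labels)
    helper = (
        "int idx2label(int class_idx) {\n"
        f"    static const int labels[{len(labels)}] = {{{values}}};\n"
        "    return labels[class_idx];\n"
        "}\n"
    )
    lines = c_source.splitlines()
    # Insert after the last line starting with '#', so the helper is
    # defined before any function that calls it.
    last_directive = max(
        (i for i, line in enumerate(lines) if line.lstrip().startswith("#")),
        default=-1,
    )
    lines.insert(last_directive + 1, helper)
    patched = "\n".join(lines)
    # rpartition splits at the last occurrence only.
    head, sep, tail = patched.rpartition("return class_idx")
    if not sep:
        raise ValueError("no 'return class_idx' statement found")
    return head + "return idx2label(class_idx)" + tail
```

Called as add_label_lookup(porter.export(embed_data=True), clf.classes_), it would return the patched source, which can then be written to disk as before.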