Hello @lichard49 @chappers, I have good news: with the very latest commit on the master branch you can transpile a RandomForestClassifier with imported data. Have a look at the prepared notebook for a demonstration, which uses the `export_data=True` argument in the `predict` method.
You can use the following commands to install the latest version:

```shell
pip uninstall -y sklearn-porter
pip install --no-cache-dir https://github.com/nok/sklearn-porter/zipball/master
```
Hello @lichard49,
I ran into this issue in the past when porting and using a large SVM classifier. In my case I fixed it manually by using a properties file which stores the model data (the support vectors).
But in Java, a single method in a class may be at most 64KB of bytecode.
Currently I'm working on the next release, in which you will be able to run predictions against the ported models from Python.
After that I will fix this issue by adding an alternative export for larger models in Java, because most models are larger than 64KB of bytecode.
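To illustrate why moving the data out of the class helps: array literals such as `static double[] w = {...}` are compiled into instructions inside the class initializer, so every value costs bytecode and a large model quickly hits the 64KB method limit. A minimal sketch of the alternative, assuming a hypothetical length-prefixed binary format (not the format sklearn-porter uses), reads the values at runtime with `DataInputStream` so they never become bytecode:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BinaryModelData {

    // Hypothetical format: an int count followed by that many doubles.
    public static void writeDoubles(OutputStream out, double[] values) throws IOException {
        DataOutputStream data = new DataOutputStream(out);
        data.writeInt(values.length);
        for (double v : values) {
            data.writeDouble(v);
        }
        data.flush();
    }

    // Read the array back at runtime instead of embedding it as literals.
    public static double[] readDoubles(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int n = data.readInt();
        double[] values = new double[n];
        for (int i = 0; i < n; i++) {
            values[i] = data.readDouble();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        double[] supportVector = {0.5, -1.25, 3.0};
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        writeDoubles(buffer, supportVector);
        double[] loaded = readDoubles(new ByteArrayInputStream(buffer.toByteArray()));
        System.out.println(loaded.length + " " + loaded[1]); // prints "3 -1.25"
    }
}
```

In practice the `InputStream` would come from a `FileInputStream` or `getResourceAsStream(...)` rather than the in-memory buffer used here for the demo.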
Happy coding,
Darius 🌵
@nok hopefully this isn't a stupid request as I don't normally use Java; could you provide a template of how you got around this for Java export?
Hello @chappers, I tested different ways of storing large model data in separate files.
First I tested `.properties` files:
```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class Tmp {

    // Load a .properties file; try-with-resources closes the stream even on errors.
    public static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = new BufferedInputStream(new FileInputStream(path))) {
            props.load(in);
        }
        return props;
    }

    // Fill a pre-allocated 3D array from the flat list of values (row-major order).
    public static double[][][] convert(double[][][] output, String[] data) {
        for (int i = 0, x = 0, xl = output.length; x < xl; x++) {
            for (int y = 0, yl = output[x].length; y < yl; y++) {
                for (int z = 0, zl = output[x][y].length; z < zl; z++) {
                    output[x][y][z] = Double.parseDouble(data[i++]);
                }
            }
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        Properties model = Tmp.load(System.getProperty("user.dir") + "/src/model.properties");
        // model.properties: "inters=0.0, 0.0, 10.0, 12. ... "
        double[][][] inters = Tmp.convert(new double[2][3][4], model.getProperty("inters").split(","));
        System.out.println(inters[0][1][1]);
    }
}
```
But I don't like that solution, because it's not generic (`<?>` ...), which means that multiple versions of the `convert` method (method overloading) are required. Furthermore, the other programming languages don't work well with properties files. So I decided to use the JSON format for storing all dynamic model data, but again Java is the black sheep: unfortunately it doesn't have a JSON parser in its standard library. For now, I will give org.json a go.
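The overloading problem can be sidestepped without generics. A minimal sketch, using reflection (`java.lang.reflect.Array`) rather than the `<?>` approach mentioned above, builds a `double` array of any rank from a shape and fills it from the flat comma-separated values. `Reshape` and `reshape` are hypothetical names, not part of sklearn-porter:

```java
import java.lang.reflect.Array;
import java.util.Arrays;

public class Reshape {

    // Build a double array of any rank (e.g. shape {2, 3, 4} -> double[2][3][4])
    // and fill it from a flat, comma-separated list in row-major order.
    public static Object reshape(String csv, int... shape) {
        String[] tokens = csv.split(",");
        int expected = 1;
        for (int dim : shape) {
            expected *= dim;
        }
        if (tokens.length != expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " values, got " + tokens.length);
        }
        Object result = Array.newInstance(double.class, shape);
        fill(result, tokens, new int[] {0});
        return result;
    }

    // Recurse until the innermost double[] is reached, then copy values into it.
    // cursor[0] tracks the position in the flat token list across calls.
    private static void fill(Object array, String[] tokens, int[] cursor) {
        if (array instanceof double[]) {
            double[] leaf = (double[]) array;
            for (int i = 0; i < leaf.length; i++) {
                leaf[i] = Double.parseDouble(tokens[cursor[0]++]);
            }
            return;
        }
        for (int i = 0, n = Array.getLength(array); i < n; i++) {
            fill(Array.get(array, i), tokens, cursor);
        }
    }

    public static void main(String[] args) {
        double[][][] inters = (double[][][]) reshape("1, 2, 3, 4, 5, 6, 7, 8", 2, 2, 2);
        System.out.println(inters[1][0][1]); // 6.0
        System.out.println(Arrays.deepToString(inters));
    }
}
```

The caller still has to cast the result and know the shape, so in this sketch the shape would need to be stored next to the data, for example as a separate property.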
Thank you so much! I'm keen to see a more fleshed-out version in the future, but at least I have an ad hoc, manual workaround in the interim.
Okay, that's good 👍!
In the future the transpiled estimators will be cleaner, faster and more dynamic. Currently, small changes can affect over 40 different transformations and their related test cases.
I tried C with `export_data=True`, but it doesn't seem to work.
Do you plan to support exported model data in C in the future?
Hi, and thank you very much for your contribution.
I am trying to export a `RandomForestClassifier(n_estimators=100, max_features='sqrt', max_depth=100, n_jobs=-1, verbose=1)`, but I think my laptop runs out of memory. Should I try on a server with better specifications, or is the only option to reduce `n_estimators` and `max_depth`?