yinlou / mltk
License: BSD 3-Clause "New" or "Revised" License
Machine Learning Tool Kit
The regression tree methods in MLTK allocate and drop a huge number of objects, which causes GC churn and heavily strains the VM. A huge amount of time is spent in garbage collection, and the impact is even worse if you are trying to run several regressions in parallel, since the JVM doesn't do a good job of concurrent garbage collection. (The scalability issue alone will probably mean I can't use MLTK for my task.)
An object instance recycling scheme would help immensely with this problem.
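To make the suggestion concrete, here is a minimal sketch of what such a recycling scheme could look like for fixed-length double[] buffers. DoubleArrayPool is a hypothetical class written for illustration; it is not part of MLTK, and whether pooling actually helps the regression tree code would need to be measured.

```java
import java.util.ArrayDeque;
import java.util.Arrays;

/** Hypothetical buffer pool for illustration; not part of MLTK. */
final class DoubleArrayPool {
    private final ArrayDeque<double[]> free = new ArrayDeque<>();
    private final int length;

    DoubleArrayPool(int length) {
        this.length = length;
    }

    /** Returns a zeroed array, reusing a previously released one when available. */
    double[] acquire() {
        double[] a = free.pollFirst();
        if (a == null) {
            return new double[length]; // nothing to recycle, allocate fresh
        }
        Arrays.fill(a, 0.0); // clear stale values before handing it out again
        return a;
    }

    /** Hands an array back to the pool so a later acquire() can reuse it. */
    void release(double[] a) {
        if (a.length != length) {
            throw new IllegalArgumentException("expected length " + length);
        }
        free.addFirst(a);
    }

    /** Number of arrays currently waiting to be reused. */
    int available() {
        return free.size();
    }
}
```

This sketch is not thread-safe, so parallel regressions would each need their own pool (for example, one per worker thread).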
If the original features have ranges other than [-1, 1], do we need to normalize or calibrate their values before running mltk.predictor.gam.GAMLearner, or will GAMLearner take care of it?
Another question: for mltk.predictor.evaluation.Evaluator, where can we find its metric output? I don't see the metric numbers displayed in the output or saved in any output file.
I'm trying to build on OS X 10.14.4.
brew install cask java
brew install maven
mvn clean package
as directed here. I get this error:
1 error
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 9.082 s
[INFO] Finished at: 2019-04-23T21:20:38-05:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:2.10.3:jar (attach-javadocs) on project mltk: MavenReportException: Error while generating Javadoc:
[ERROR] Exit code: 1 - javadoc: error - The code being documented uses modules but the packages defined in http://docs.oracle.com/javase/8/docs/api/ are in the unnamed module.
[ERROR]
[ERROR] Command line was: /Library/Java/JavaVirtualMachines/openjdk-12.0.1.jdk/Contents/Home/bin/javadoc @options @packages
[ERROR]
[ERROR] Refer to the generated Javadoc files in '/Users/dan/work/mltk/target/apidocs' dir.
[ERROR]
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
Might just be docs?
This is a very interesting package for ML in Java.
I think there must have been some working data sets during development.
Could you provide some of them as demos to help users get started?
Does mltk have support for data structures which are already present in the runtime of the program? I'm loading and manipulating data after fetching it from a remote database, and would love to pump in my matrices directly into mltk without writing to disk.
Is there any way to build datasets in memory, using the API, or do I need to write out a file to disk, and read it back in?
I tried creating a dataset using the API, but the methods and constructors of Attribute are not visible, so I can't create a List, so I can't create an Instances object, so I can't create cross-validation folds.
It seems to me that there is a bug in Visualizer when one of the interaction terms is not of type BinnedAttribute, because at line 251 there is a cast Bins bins1 = ((BinnedAttribute) f1).getBins(); without a previous type test. This is odd because some tests are made earlier to handle NominalAttributes, but the end of the code seems to run only on BinnedAttributes (in particular, it needs boundaries, which are only defined for bins).
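The kind of type test that seems to be missing can be sketched as follows. Attr, BinnedAttr, NominalAttr and InteractionGuard are stand-in classes invented for this example; they only mimic the shape of the problem and are not MLTK's actual Attribute hierarchy or Visualizer code.

```java
/** Stand-in types for illustration only; these are not MLTK's classes. */
abstract class Attr {
    final String name;

    Attr(String name) {
        this.name = name;
    }
}

final class BinnedAttr extends Attr {
    final double[] boundaries;

    BinnedAttr(String name, double[] boundaries) {
        super(name);
        this.boundaries = boundaries;
    }
}

final class NominalAttr extends Attr {
    final String[] states;

    NominalAttr(String name, String[] states) {
        super(name);
        this.states = states;
    }
}

final class InteractionGuard {
    /** Returns the bin boundaries, or fails with a clear message instead of a ClassCastException. */
    static double[] boundariesOf(Attr f) {
        if (f instanceof BinnedAttr) {
            return ((BinnedAttr) f).boundaries; // cast is safe after the type test
        }
        throw new IllegalArgumentException("Attribute '" + f.name + "' is a "
                + f.getClass().getSimpleName() + ", but a binned attribute is required");
    }
}
```

Whether the right behavior for a non-binned term is to fail early, as here, or to take a separate code path is a design decision for the maintainers.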
I'm using MLTK from Clojure and deployment/installation of code dependent on it would be much nicer if it were in Maven Central.
If I run an elastic net learner a dozen times in a row, I get nine different results. I've confirmed that my input data is identical on each round (the hashcodes on my data strings and attribute strings are identical each round). Is there anything variable or non-deterministic about the elastic net process? Also, all of my work is done using BigDecimal, so any floating point issues would be within the regression. Is that what's happening here?
I have been trying to use MLTK for a regression problem, and I found there are many important details missing from the documentation:
That's as far as I have gotten so far... I finally got my first regression results, but it took me several hours to figure out how to use this properly... hopefully this feedback helps!
My guess is that it does not create a bug, but there is a typo in the function compare in DoublePairComparator in mltk.predictor.evaluation.AUC.
You wrote:
int cmp = Double.compare(o1.v1, o2.v1);
if (cmp == 0) {
cmp = Double.compare(o2.v2, o2.v2);
}
but the second comparison should be Double.compare(o1.v2, o2.v2).
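For reference, a self-contained sketch of the corrected comparator. The DoublePair class below is a minimal stand-in with the two fields v1 and v2 used in the quoted snippet; the real MLTK class may differ in other respects.

```java
import java.util.Comparator;

/** Minimal stand-in for the pair type used in the snippet above. */
final class DoublePair {
    final double v1;
    final double v2;

    DoublePair(double v1, double v2) {
        this.v1 = v1;
        this.v2 = v2;
    }
}

/** Orders pairs by v1, breaking ties on v2. */
final class DoublePairComparator implements Comparator<DoublePair> {
    @Override
    public int compare(DoublePair o1, DoublePair o2) {
        int cmp = Double.compare(o1.v1, o2.v1);
        if (cmp == 0) {
            // Compare o1 against o2 here; comparing o2.v2 with itself always yields 0.
            cmp = Double.compare(o1.v2, o2.v2);
        }
        return cmp;
    }
}
```

With the original typo, pairs that tie on v1 always compare as equal, so a stable sort leaves them in input order rather than ordering them by v2.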
I've been reading the docs trying to create an example of classification with GA2MLearner using command-line tools.
I checked in https://github.com/dfrankow/mltk/tree/master/examples with train_ga2m.sh. You should be able to check it out and run it. If I can get it to work, I'm happy to pass it back as an example, as requested in #17.
Several questions:
I am new to Java. Could you make a .jar file please?
On the Intelligible Models webpage you say that we should pass the option -R cal_housing.residual when building the GAM model (step 3), so that it can be used to detect the interactions in step 4.
However, this version of the code does not handle that option and does not save the residuals. Can you confirm that the residuals you mention are those stored in rTrain in your code?
I've fixed it and can submit a pull request if you want.
By the way, the option -T is not handled either, so the score on the test set is not computed.
Thanks a lot for sharing your work!