amplab / keystone
Simplifying robust end-to-end machine learning on Apache Spark.
Home Page: http://keystone-ml.org/
License: Apache License 2.0
With proper NTSC standard weights
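Assuming this item refers to RGB-to-grayscale conversion, the NTSC (ITU-R BT.601) luma weights are 0.299, 0.587, and 0.114 for red, green, and blue. A minimal sketch follows; the function names are illustrative and not taken from this repo.

```python
# NTSC / ITU-R BT.601 luma weights for red, green, and blue.
NTSC_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(pixel):
    """Convert one (r, g, b) pixel to a single luma value."""
    r, g, b = pixel
    wr, wg, wb = NTSC_WEIGHTS
    return wr * r + wg * g + wb * b


def grayscale_image(image):
    """Convert an image given as rows of (r, g, b) tuples."""
    return [[to_grayscale(p) for p in row] for row in image]
```

Because the weights sum to 1, a pure white pixel keeps its full intensity after conversion.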
That is this repo!
BSD Licensed. Permissions, labels, and milestones set up.
Should have accuracy >60% with 4k features.
Simplest image pipeline - should run single-node with no issues.
Compute accuracy of a classifier.
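As a rough illustration, accuracy is the fraction of predictions that match the true labels. The sketch below works on plain Python lists; the name `accuracy` is hypothetical, and a Spark version would presumably compute the same ratio over distributed collections.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions equal to the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty collection")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)
```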
No public features - but we can achieve the solve result!
Mean average precision.
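One common definition can be sketched as follows: rank the items of each class by classifier score, average the precision at the rank of every positive item, then average across classes. This is the non-interpolated form. The VOC2007 benchmark uses an 11-point interpolated variant, so treat this as an illustration rather than the exact evaluator; all names here are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: classifier scores, higher means more confident.
    labels: truthy for positive items, falsy for negative items.
    """
    ranked = sorted(zip(scores, labels), key=lambda sl: sl[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    # A class with no positives contributes 0.0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class):
    """per_class: list of (scores, labels) pairs, one pair per class."""
    aps = [average_precision(s, l) for s, l in per_class]
    return sum(aps) / len(aps)
```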
Jenkins should build the repo, and run all unit tests. It should also automatically build/test PRs.
May be unnecessary for Release 0.1
Should call VLFeat dense SIFT via JNI.
VOC Pipeline with 59% MAP
Requires
As well as a dataloader for VOC2007.
Bash scripts should be put in a uniform format, and we should settle on some common spark options.
e.g. setting eventLogger to "on" and locality.wait to some reasonable default.
We should also have instructions about how to launch a cluster with spark-ec2 scripts and execute pipelines with this repo.
Should call VGG fisher vector C++ code via JNI.
Including featurization. Stretch goal - no milestone.
Mel-frequency cepstral coefficients (MFCCs) - a common set of speech features.
Create a Makefile for the C components, and integrate with sbt.
First commit contains a reasonable clean build script.
Used for speech pipeline. May be unnecessary for release 0.1.
Stop-word removal -> N-Grams -> TFIDF -> Naive Bayes
Should work on 20 Newsgroups and RCV1.
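The first three stages of the pipeline above can be sketched on in-memory token lists. The stop-word list is a tiny illustrative set, the TF-IDF weighting (raw count times log(N / document frequency)) is only one of several common variants, and the Naive Bayes stage is omitted for brevity. None of these names come from this repo.

```python
import math
from collections import Counter

# Illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is"}


def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def ngrams(tokens, max_n=2):
    """All n-grams of order 1..max_n, each as a tuple of tokens."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs):
    """docs: list of term lists. Returns one {term: weight} dict per doc."""
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))  # count each term once per document
    n_docs = len(docs)
    weighted = []
    for terms in docs:
        tf = Counter(terms)
        weighted.append(
            {t: c * math.log(n_docs / doc_freq[t]) for t, c in tf.items()}
        )
    return weighted


raw = ["the cat sat", "the dog sat"]
features = tfidf([ngrams(remove_stop_words(d.split())) for d in raw])
```

Terms that occur in every document, such as the unigram `("sat",)` here, receive weight zero under this variant, while terms unique to one document receive the largest weight.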
Subtract feature means and optionally divide by SD.
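A minimal sketch of this scaler, assuming features arrive as rows of floats: fit per-column means (and optionally sample standard deviations) once, then apply them to any row. The names and the choice to leave zero-variance columns unscaled are assumptions, not the repo's behavior.

```python
import math


def fit_standard_scaler(rows, divide_by_sd=True):
    """Fit column means (and optionally SDs); return a transform function."""
    n = len(rows)
    d = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    if divide_by_sd:
        sds = []
        for j in range(d):
            # Sample variance (n - 1 denominator), guarded for n == 1.
            var = sum((r[j] - means[j]) ** 2 for r in rows) / max(n - 1, 1)
            sd = math.sqrt(var)
            # Assumption: constant columns are centered but not scaled.
            sds.append(sd if sd > 0 else 1.0)
    else:
        sds = [1.0] * d

    def transform(row):
        return [(x - m) / s for x, m, s in zip(row, means, sds)]

    return transform
```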
Pipelines executed from the command line should have a uniform CLI. We should use scopt
for argument parsing and be consistent with option names across pipelines.
Linear Discriminant Analysis - as opposed to the topic modeling kind.
Local Color Statistic
Call into MLlib
Zongheng can fill in details - I think this is a stretch for release 0.1, so I'm not going to assign a milestone.
These should also run single node
May be unnecessary for Release 0.1
Should support gaussian and cauchy random maps. Useful if there's a lazy interface as well.
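One way to read this item is as random Fourier features: project the input with a random matrix, add a random phase, and take a cosine. Gaussian weights approximate a Gaussian (RBF) kernel and Cauchy weights approximate a Laplacian kernel. The sketch below is an assumption about the intended technique; `make_random_features`, `scale`, and `seed` are hypothetical names.

```python
import math
import random


def make_random_features(input_dim, num_features,
                         distribution="gaussian", scale=1.0, seed=0):
    """Return a function mapping a vector to random cosine features."""
    rng = random.Random(seed)

    def draw():
        if distribution == "gaussian":
            return rng.gauss(0.0, 1.0)
        if distribution == "cauchy":
            # Inverse-CDF sampling of a standard Cauchy variable.
            return math.tan(math.pi * (rng.random() - 0.5))
        raise ValueError("unknown distribution: " + distribution)

    weights = [[scale * draw() for _ in range(input_dim)]
               for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    norm = math.sqrt(2.0 / num_features)

    def transform(x):
        return [
            norm * math.cos(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, phases)
        ]

    return transform
```

The weights are drawn once when the map is built, so the returned function can be applied to rows one at a time, which is one possible form a lazy interface could take.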
Should fit a GMM on a sample of 1m features locally - call out to VGG GMM library with JNI.
Implemented with Stupid Backoff. Should have a small demo that shows what the language model can do - e.g. evaluate P(text comes from modeled language).
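For reference, Stupid Backoff scores a word by its relative n-gram frequency when the n-gram was seen, and otherwise backs off to a shorter context with a fixed penalty (0.4 is the commonly cited value). The scores are not normalized probabilities. A minimal in-memory sketch with hypothetical names:

```python
from collections import Counter


def train_counts(tokens, max_order=3):
    """Count all n-grams of order 1..max_order, keyed by token tuples."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff_score(word, context, counts, total_tokens, alpha=0.4):
    """Score `word` given the preceding `context` tokens."""
    context = tuple(context)
    if not context:
        # Base case: unigram relative frequency.
        return counts[(word,)] / total_tokens
    ngram = context + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and apply the penalty.
    return alpha * stupid_backoff_score(
        word, context[1:], counts, total_tokens, alpha
    )


tokens = "the cat sat on the mat".split()
counts = train_counts(tokens, max_order=2)
seen = stupid_backoff_score("cat", ["the"], counts, len(tokens))
unseen = stupid_backoff_score("mat", ["cat"], counts, len(tokens))
```

Here `seen` uses the bigram count directly, while `unseen` falls back to the unigram frequency of "mat" multiplied by the backoff penalty.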
Transformer should downsample with PCA matrix.
First cut may or may not do label centering.
Ideally on the ACE2005 dataset.
Downsample a collection.
Named Entity Recognition. May be unnecessary for 0.1.
Whether this belongs in stats or elsewhere is certainly up for debate.