amplab / keystone
Simplifying robust end-to-end machine learning on Apache Spark.
Home Page: http://keystone-ml.org/
License: Apache License 2.0
May be unnecessary for Release 0.1
No public features - but we can achieve the solve result!
With proper NTSC standard weights
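A minimal sketch of what "NTSC standard weights" usually means, assuming this item is about converting RGB pixels to grayscale. The weights 0.299, 0.587 and 0.114 are the standard NTSC (ITU-R BT.601) luma coefficients. The function names and the nested-list image layout are hypothetical and are not taken from this repo.

```python
# Luma coefficients from the NTSC (ITU-R BT.601) standard, in R, G, B order.
NTSC_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(pixel):
    """Convert one (r, g, b) pixel to a single luminance value."""
    r, g, b = pixel
    wr, wg, wb = NTSC_WEIGHTS
    return wr * r + wg * g + wb * b


def grayscale_image(image):
    """Apply to_grayscale to every pixel of a row-major nested list."""
    return [[to_grayscale(p) for p in row] for row in image]
```

Because the three weights sum to 1, a pixel whose channels are all equal keeps its intensity.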
First cut may or may not do label centering.
Used for speech pipeline. May be unnecessary for release 0.1.
Pipelines executed from the command line should have a uniform CLI. We should use scopt
for argument parsing and be consistent with option names across pipelines.
May be unnecessary for Release 0.1
Should fit a GMM on a sample of 1m features locally - call out to VGG GMM library with JNI.
Mean average precision.
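For reference, a rough sketch of mean average precision over ranked results. It computes the plain (non-interpolated) average precision per query; the official PASCAL VOC2007 evaluation uses an 11-point interpolated variant instead, so numbers from this sketch are not directly comparable to VOC figures. The function names and input format are assumptions made for illustration.

```python
def average_precision(ranked_relevance):
    """Average precision for one query.

    ranked_relevance is a list of booleans ordered from the highest-scored
    result to the lowest; True marks a relevant result.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(queries):
    """Mean of the per-query (or per-class) average precisions."""
    if not queries:
        return 0.0
    return sum(average_precision(q) for q in queries) / len(queries)
```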
Should call VGG fisher vector C++ code via JNI.
Should have accuracy >60% with 4k features.
Local Color Statistic
That is this repo!
BSD Licensed. Permissions, labels, and milestones set up.
Mel-frequency cepstral coefficients (MFCCs) - a common set of speech features.
Should support Gaussian and Cauchy random maps. Useful if there's a lazy interface as well.
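One possible reading of "random maps" is random cosine features, where an input x is mapped to cos(Wx + b) with the entries of W drawn from a Gaussian or a Cauchy distribution and b drawn uniformly from [0, 2π). The sketch below assumes that reading. All names are hypothetical, and it uses plain Python lists rather than any Spark or linear-algebra API.

```python
import math
import random


def random_matrix(num_features, dim, distribution="gaussian", seed=0):
    """Draw a num_features x dim projection matrix W and a phase vector b."""
    rng = random.Random(seed)
    if distribution == "gaussian":
        draw = lambda: rng.gauss(0.0, 1.0)
    elif distribution == "cauchy":
        # Inverse-CDF sampling of a standard Cauchy variate.
        draw = lambda: math.tan(math.pi * (rng.random() - 0.5))
    else:
        raise ValueError("unknown distribution: " + distribution)
    W = [[draw() for _ in range(dim)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return W, b


def cosine_features(x, W, b):
    """Map one input vector x to cos(Wx + b)."""
    return [
        math.cos(sum(w * xi for w, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]
```

Fixing the seed makes the map reproducible, which matters if the same W has to be applied at training and test time.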
These should also run single node
Subtract feature means and optionally divide by SD.
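A small sketch of the fit-then-apply shape this describes: estimate per-feature means (and optionally standard deviations) from the data, then subtract and divide. The names `fit_scaler` and `transform` are hypothetical, and the use of the population standard deviation and the guard for constant features are assumptions, not decisions recorded here.

```python
import math


def fit_scaler(rows, scale=False):
    """Estimate per-feature means and, if scale is True, standard deviations."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    if not scale:
        return means, None
    stds = []
    for j in range(dims):
        # Population variance (divide by n); an assumption for this sketch.
        var = sum((r[j] - means[j]) ** 2 for r in rows) / n
        sd = math.sqrt(var)
        # Leave constant features unscaled instead of dividing by zero.
        stds.append(sd if sd > 0 else 1.0)
    return means, stds


def transform(row, means, stds=None):
    """Center one row and, when stds is given, divide by the SD."""
    centered = [x - m for x, m in zip(row, means)]
    if stds is None:
        return centered
    return [c / s for c, s in zip(centered, stds)]
```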
VOC Pipeline with 59% MAP
Requires
As well as a dataloader for VOC2007.
Named Entity Recognition. May be unnecessary for 0.1.
Stop-word removal -> N-Grams -> TFIDF -> Naive Bayes
Should work on 20 Newsgroups and RCV1.
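To make the first three stages concrete, here is a rough single-machine sketch of stop-word removal, n-gram extraction and TF-IDF weighting. The stop-word list is a tiny placeholder, the raw-count times log(N / df) weighting is only one of several common TF-IDF variants, and none of the names come from this repo. The Naive Bayes stage is left out.

```python
import math
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is"}


def remove_stop_words(tokens):
    """Drop tokens that appear in STOP_WORDS (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def ngrams(tokens, max_n):
    """All n-grams of order 1..max_n, as tuples."""
    return [
        tuple(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs):
    """docs is a list of n-gram lists; returns one {ngram: weight} dict per doc."""
    num_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each n-gram once per document
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append(
            {g: c * math.log(num_docs / doc_freq[g]) for g, c in tf.items()}
        )
    return weighted
```

With this weighting, an n-gram that occurs in every document gets weight 0, so only terms that distinguish documents contribute to the classifier's input.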
Whether this belongs in stats or elsewhere is certainly up for debate.
Create a Makefile for the C components, and integrate with sbt.
Should call VLFeat dense SIFT via JNI.
Bash scripts should be put in a uniform format, and we should settle on some common spark options.
e.g. enabling event logging (spark.eventLog.enabled) and setting spark.locality.wait to some reasonable default.
We should also have instructions about how to launch a cluster with spark-ec2 scripts and execute pipelines with this repo.
Simplest image pipeline - should run single-node with no issues.
Zongheng can fill in details - I think this is a stretch for release 0.1, so I'm not going to assign a milestone.
Transformer should downsample with PCA matrix.
Jenkins should build the repo, and run all unit tests. It should also automatically build/test PRs.
Call into MLlib
Ideally on the ACE2005 dataset.
Implemented with Stupid Backoff. Should have a small demo that shows what the language model can do - e.g. evaluate P(text comes from modeled language).
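As a rough illustration of Stupid Backoff scoring: if the full n-gram was seen, its score is its relative frequency given the context; otherwise the context is shortened by one word and the score is multiplied by a fixed factor (0.4 is the value suggested in the original Stupid Backoff paper). The scores are not normalized, so they rank text rather than give true probabilities. The function names and the in-memory Counter are assumptions for this sketch, not this repo's design.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor suggested in the original Stupid Backoff paper


def count_ngrams(tokens, max_order):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total_tokens, alpha=ALPHA):
    """Stupid Backoff score of word given the preceding context words."""
    context = tuple(context)
    if not context:
        # Base case: unigram relative frequency.
        return counts[(word,)] / total_tokens
    ngram = context + (word,)
    if counts[ngram] > 0:
        # Seen n-gram: relative frequency given the context.
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and discount.
    return alpha * score(word, context[1:], counts, total_tokens, alpha)
```

A demo could score a sentence by multiplying the per-word scores (or summing their logs) and comparing in-domain text against out-of-domain text.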
Compute accuracy of a classifier.
Including featurization. Stretch goal - no milestone.
Linear Discriminant Analysis - as opposed to the topic modeling kind.
First commit contains a reasonable clean build script.
Downsample a collection.