amplab / keystone

Simplifying robust end-to-end machine learning on Apache Spark.

Home Page: http://keystone-ml.org/

License: Apache License 2.0

Scala 96.19% Shell 0.89% Makefile 0.38% C++ 2.20% Python 0.11% R 0.22%

keystone's People

Contributors

concretevitamin, etrain, shivaram, stephentu, tomerk, vaishaal, vaishaal2, zhaozhang

keystone's Issues

Grayscale

With proper NTSC standard weights
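
As a rough illustration of what "NTSC standard weights" usually means, the sketch below uses the ITU-R BT.601 luma coefficients (0.299, 0.587, 0.114). The image layout (rows of (r, g, b) tuples) and the name to_grayscale are assumptions made for this example, not part of the repository.

```python
# NTSC (ITU-R BT.601) luma weights for the red, green, and blue channels.
NTSC_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(image):
    """Convert rows of (r, g, b) tuples into rows of luma values.

    The nested-list image layout is an assumption for this sketch.
    """
    wr, wg, wb = NTSC_WEIGHTS
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in image]


# A 2x2 test image: pure red, pure green, pure blue, and white.
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(image)
```

Green contributes most to the result because the weights follow perceived brightness rather than a plain channel average.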

MNIST Pipeline

Simplest image pipeline - should run single-node with no issues.

Accuracy

Compute accuracy of a classifier.
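
A minimal sketch of the metric, assuming predictions and labels arrive as two equal-length sequences of class ids. The function name and the error handling are illustrative choices, not the project's API.

```python
def accuracy(predictions, labels):
    """Return the fraction of predictions that equal the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


# Two of the four predictions match their labels.
score = accuracy([1, 0, 2, 2], [1, 1, 2, 0])
```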

mAP

Mean average precision.
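
One way to compute this is sketched below, assuming binary relevance labels and one list of scores per class. It uses the plain (non-interpolated) definition of average precision; benchmarks such as VOC2007 define an interpolated variant, which this sketch does not implement. All names are hypothetical.

```python
def average_precision(scores, labels):
    """Non-interpolated average precision for one class.

    scores: confidence per example; labels: 1 if relevant, else 0.
    """
    # Rank examples from most to least confident.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / hits if hits else 0.0


def mean_average_precision(per_class):
    """Average the per-class AP over a list of (scores, labels) pairs."""
    aps = [average_precision(scores, labels) for scores, labels in per_class]
    return sum(aps) / len(aps)


per_class = [
    ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),  # AP = (1/1 + 2/3) / 2
    ([0.2, 0.9], [1, 0]),                  # AP = 1/2
]
m_ap = mean_average_precision(per_class)
```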

SIFT

Should call VLFeat dense SIFT via JNI.

Standard VOC Pipeline

VOC pipeline that reaches 59% mAP.

Requires

  1. SIFT
  2. PCA
  3. GMM
  4. FisherVector

As well as a dataloader for VOC2007.

Clean up launch scripts

Bash scripts should be put in a uniform format, and we should settle on some common spark options.

For example, enable event logging (spark.eventLog.enabled) and set spark.locality.wait to a reasonable default.

We should also have instructions about how to launch a cluster with spark-ec2 scripts and execute pipelines with this repo.

MFCCs

Mel-frequency cepstral coefficients - a common set of speech features.

C build script

Create a Makefile for the C components, and integrate with sbt.

HMM

Used for the speech pipeline. May be unnecessary for release 0.1.

Standardize main() format and use scopt

Pipelines executed from the command line should have a uniform CLI. We should use scopt for argument parsing and be consistent with option names across pipelines.

LDA

Linear Discriminant Analysis - as opposed to the topic modeling kind.

LCS

Local Color Statistics

Random Features

Should support Gaussian and Cauchy random maps. A lazy interface would also be useful.
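
A sketch of one common reading of this request: cosine random features, where each output is cos(w . x + b) with w drawn from a Gaussian or Cauchy distribution. The cosine form, the gamma scale, and every function name here are assumptions for illustration; the issue itself only names the two distributions and the wish for a lazy interface.

```python
import math
import random


def make_random_map(num_inputs, num_features, distribution="gaussian",
                    gamma=1.0, seed=0):
    """Draw a weight matrix W and phase vector b for a random feature map."""
    rng = random.Random(seed)
    if distribution == "gaussian":
        draw = lambda: rng.gauss(0.0, 1.0)
    elif distribution == "cauchy":
        # Inverse-CDF sampling of a standard Cauchy variate.
        draw = lambda: math.tan(math.pi * (rng.random() - 0.5))
    else:
        raise ValueError("distribution must be 'gaussian' or 'cauchy'")
    W = [[gamma * draw() for _ in range(num_inputs)]
         for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return W, b


def apply_random_map(W, b, x):
    """Map one input vector x to its cosine random features."""
    return [math.cos(sum(w * xi for w, xi in zip(row, x)) + bj)
            for row, bj in zip(W, b)]


def lazy_features(W, b, rows):
    """Lazily yield features one row at a time instead of materializing all."""
    for x in rows:
        yield apply_random_map(W, b, x)


W, b = make_random_map(num_inputs=3, num_features=5, distribution="cauchy")
features = list(lazy_features(W, b, [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]))
```

The generator only computes a row's features when the consumer asks for it, which is one simple way to get the lazy behavior the issue mentions.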

GMM

Should fit a GMM on a sample of 1M features locally, calling out to the VGG GMM library via JNI.

Language Model Pipeline

Implemented with Stupid Backoff. Should have a small demo that shows what the language model can do - e.g. evaluate P(text comes from the modeled language).
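
For reference, Stupid Backoff scores a word by its relative n-gram frequency and, when the n-gram is unseen, backs off to a shorter context with a fixed penalty (0.4 in the original formulation). The sketch below assumes a tokenized corpus held in memory; the function names are hypothetical, and the scores are not normalized probabilities.

```python
from collections import Counter


def count_ngrams(tokens, max_order):
    """Count every n-gram of length 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(word, context, counts, total_tokens, alpha=0.4):
    """Score `word` given `context`, backing off to shorter contexts."""
    context = tuple(context)
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency, discounted per backoff step.
            return penalty * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest context word
        penalty *= alpha
    # Unigram base case: relative frequency in the corpus.
    return penalty * counts[(word,)] / total_tokens


tokens = "the cat sat on the mat".split()
counts = count_ngrams(tokens, max_order=2)
seen = stupid_backoff("cat", ["the"], counts, len(tokens))
backed_off = stupid_backoff("sat", ["the"], counts, len(tokens))
```

Here "the cat" is an observed bigram, so it is scored directly, while "the sat" is unseen and falls back to the penalized unigram frequency of "sat".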

PCA

The transformer should reduce dimensionality by projecting inputs with the PCA matrix.
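
A minimal sketch of the projection step, assuming the PCA matrix has already been fit and is stored as d rows by k columns, with one principal direction per column. Mean-centering is assumed to happen upstream, and pca_transform is a hypothetical name.

```python
def pca_transform(x, pca_matrix):
    """Project a d-dimensional vector onto the k columns of a d-by-k matrix."""
    d = len(pca_matrix)
    if len(x) != d:
        raise ValueError("input length must match the number of matrix rows")
    k = len(pca_matrix[0])
    return [sum(x[i] * pca_matrix[i][j] for i in range(d)) for j in range(k)]


# Keep the first two coordinates of a 3-dimensional input.
pca_matrix = [[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]]
reduced = pca_transform([3.0, 4.0, 5.0], pca_matrix)
```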

NER

Named Entity Recognition. May be unnecessary for 0.1.

FFT Transformer

Whether this belongs in stats or elsewhere is certainly up for debate.
