amplab / keystone

Simplifying robust end-to-end machine learning on Apache Spark.

Home Page: http://keystone-ml.org/

License: Apache License 2.0

Scala 96.19% Shell 0.89% Makefile 0.38% C++ 2.20% Python 0.11% R 0.22%

keystone's People

Contributors

concretevitamin, etrain, ngarneau, shivaram, stephentu, tomerk, vaishaal, vaishaal2, zhaozhang

keystone's Issues

Grayscale

With proper NTSC standard weights
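A minimal sketch of what such a conversion could look like, assuming "NTSC standard weights" refers to the usual luma coefficients 0.299, 0.587, and 0.114 for red, green, and blue. The function name and the nested-list image layout are illustrative assumptions, not part of the repository.

```python
# NTSC luma weights for the red, green, and blue channels (assumed to be
# the weights the issue refers to).
NTSC_WEIGHTS = (0.299, 0.587, 0.114)


def to_grayscale(image):
    """Convert an image given as rows of (r, g, b) tuples to rows of luma values.

    The nested-list layout is a hypothetical stand-in for a real image type.
    """
    wr, wg, wb = NTSC_WEIGHTS
    return [[wr * r + wg * g + wb * b for (r, g, b) in row] for row in image]
```

Because the three weights sum to 1, a pixel with equal channel values keeps that value after conversion.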

HMM

Used for speech pipeline. May be unnecessary for release 0.1.

Standardize main() format and use scopt

Pipelines executed from the command line should have a uniform CLI. We should use scopt for argument parsing and be consistent with option names across pipelines.

GMM

Should fit a GMM locally on a sample of 1M features, calling out to the VGG GMM library via JNI.

mAP

Mean average precision.
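As a rough illustration, mean average precision can be sketched as the mean of per-class (or per-query) average precision values. The sketch below uses plain, non-interpolated average precision and assumes every relevant item appears somewhere in the ranking. It is not the VOC2007 evaluation protocol, and the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Non-interpolated AP for one ranking.

    ranked_relevance is a list of booleans, best-scored item first,
    where True marks a relevant item.
    """
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            # Precision at this rank, counted only at relevant positions.
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-ranking average precision values."""
    aps = [average_precision(r) for r in rankings]
    return sum(aps) / len(aps) if aps else 0.0
```

For example, a ranking of relevant, irrelevant, relevant has precision 1 at rank 1 and 2/3 at rank 3, so its average precision is 5/6.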

LCS

Local Color Statistics

MFCCs

Mel-frequency cepstral coefficients - a common set of speech features.

Random Features

Should support Gaussian and Cauchy random maps. It would be useful to have a lazy interface as well.
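One possible reading of this issue is random cosine (Fourier) features, where a random projection is followed by a cosine. The sketch below works under that assumption: the projection weights are drawn from either a Gaussian or a Cauchy distribution. All names and parameters (make_random_features, gamma, seed) are hypothetical.

```python
import math
import random


def make_random_features(input_dim, num_features, distribution="gaussian",
                         gamma=1.0, seed=0):
    """Return a function mapping a length-input_dim vector to num_features values."""
    if distribution not in ("gaussian", "cauchy"):
        raise ValueError("distribution must be 'gaussian' or 'cauchy'")
    rng = random.Random(seed)

    def draw():
        if distribution == "gaussian":
            return gamma * rng.gauss(0.0, 1.0)
        # Standard Cauchy sample via the inverse CDF of a uniform variate.
        return gamma * math.tan(math.pi * (rng.random() - 0.5))

    weights = [[draw() for _ in range(input_dim)] for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def transform(x):
        # cos(w . x + b) for each random direction w and phase offset b.
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return transform
```

The weights are fixed once at construction time, so the same transform can be applied to training and test data.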

Standard VOC Pipeline

VOC pipeline with 59% mAP

Requires

  1. SIFT
  2. PCA
  3. GMM
  4. FisherVector

As well as a dataloader for VOC2007.

NER

Named Entity Recognition. May be unnecessary for 0.1.

FFT Transformer

Whether this belongs in stats or elsewhere is certainly up for debate.

C build script

Create a Makefile for the C components, and integrate with sbt.

SIFT

Should call VLFeat dense SIFT via JNI.

Clean up launch scripts

Bash scripts should be put in a uniform format, and we should settle on some common spark options.

e.g. enabling event logging (spark.eventLog.enabled) and setting spark.locality.wait to some reasonable default.

We should also have instructions about how to launch a cluster with spark-ec2 scripts and execute pipelines with this repo.

MNIST Pipeline

Simplest image pipeline - should run single-node with no issues.

PCA

Transformer should reduce the dimensionality of its input by projecting it with a PCA matrix.
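A minimal sketch of the projection step, assuming the PCA matrix has one row per input dimension and one column per retained component, and that any mean-centering happens upstream. The function name and the list-of-lists matrix layout are illustrative only.

```python
def pca_project(x, pca_matrix):
    """Project a length-d vector x onto the k columns of a d-by-k PCA matrix."""
    if len(x) != len(pca_matrix):
        raise ValueError("input dimension does not match PCA matrix rows")
    k = len(pca_matrix[0])
    # Output component j is the dot product of x with column j.
    return [sum(x[i] * pca_matrix[i][j] for i in range(len(x)))
            for j in range(k)]
```

With a matrix whose two columns pick out the first two coordinates, a three-dimensional input is reduced to its first two entries.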

Language Model Pipeline

Implemented with Stupid Backoff. Should have a small demo that shows what the language model can do - e.g. evaluate P(text comes from modeled language).
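For orientation, Stupid Backoff scoring can be sketched as follows: use the relative frequency of the longest n-gram that was seen, and multiply by a fixed factor each time the context is shortened. The sketch uses the commonly cited backoff factor of 0.4. The function names and the token-list input are assumptions, not the pipeline's actual interface, and the scores are not normalized probabilities.

```python
from collections import Counter


def train_ngram_counts(tokens, max_order=3):
    """Count every n-gram of order 1..max_order in a token list."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff_score(word, context, counts, total_tokens, alpha=0.4):
    """Score `word` given the preceding `context` words."""
    context = tuple(context)
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency, discounted by prior backoffs.
            return penalty * counts[ngram] / counts[context]
        # Unseen: drop the oldest context word and apply the backoff factor.
        context = context[1:]
        penalty *= alpha
    # Base case: unigram relative frequency.
    return penalty * counts[(word,)] / total_tokens
```

A demo along the lines the issue describes could multiply these scores over a sentence to compare how well different texts match the modeled language.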

Accuracy

Compute accuracy of a classifier.
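In its simplest form this is the fraction of predictions that match the true labels. A small sketch, with a hypothetical function name and plain lists standing in for whatever distributed collection the evaluator would actually consume:

```python
def accuracy(predictions, labels):
    """Fraction of predictions equal to the corresponding true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty set")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)
```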

LDA

Linear Discriminant Analysis - as opposed to the topic modeling kind.
