pattern's Introduction

Overview

Pattern is a Cascading framework and library for machine learning model scoring at scale.

Pattern can read PMML models as workflow specifications for generating Cascading flows which can run on Apache Hadoop.

Pattern is still under active development on the wip-1.0 branch, so all WIP releases are published under the files.concurrentinc.com domain. Once Pattern reaches 1.0, final releases will be published under files.cascading.org.

See the pattern-examples subdirectory for sample apps.

For more information, visit: http://www.cascading.org/pattern/

PMML

Pattern currently supports the following PMML model types:

  • General Regression
  • Regression
  • Clustering
  • Tree
  • Mining - ensembles of the above models like Random Forest

In progress are:

  • Neural Network
  • Support Vector Machine

Not all aspects of each of the above models are supported. To request support for a particular model or model parameter, report an issue.

These PMML model types translate or compose into:

Note: Hierarchical Clustering is also implemented, but the unit test for that algorithm currently excludes two data points. In regression tests with the Iris data set, we've isolated edge cases where the classifiers in R and Pattern do not agree. Then again, Iris data is often used to illustrate model behaviors with exactly such properties. Resolving the disagreement will take some digging into the numerical operations inside R.

Using

Pattern requires no installation beyond adding the necessary dependencies to your Maven, Ivy, or Gradle build.

To include the base core model libraries, use:

<dependency>
  <groupId>cascading</groupId>
  <artifactId>pattern-core</artifactId>
  <version>x.y.z</version>
</dependency>

To include the PMML parsing libraries and the PMMLPlanner, use:

<dependency>
  <groupId>cascading</groupId>
  <artifactId>pattern-pmml</artifactId>
  <version>x.y.z</version>
</dependency>
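
The snippets above use Maven syntax. For Ivy, the equivalent declarations would look roughly like the following sketch, which assumes the same coordinates (organisation cascading, modules pattern-core and pattern-pmml) resolve through your configured Ivy resolvers. As above, x.y.z is a placeholder for the release version.

<dependencies>
  <!-- base core model libraries -->
  <dependency org="cascading" name="pattern-core" rev="x.y.z"/>
  <!-- PMML parsing libraries and the PMMLPlanner -->
  <dependency org="cascading" name="pattern-pmml" rev="x.y.z"/>
</dependencies>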

Other sub-projects and artifacts are in place only to facilitate testing on various platforms. The dependencies above do not depend on Cascading's Hadoop or local modes; they are completely independent of the underlying platform.

Reporting Issues

The best way to report an issue is to add a new test to SimplePMMLPlatformTest along with the expected result set and submit a pull request on GitHub.

Failing that, feel free to open an issue on the Cascading/Pattern project site or post to the mailing list.

Developing

Running:

> gradle idea

from the root of the project will create all IntelliJ project and module files, and retrieve all dependencies.

Contributors

ceteri, cwensel, girish-a1
