h2oai / sparkling-water

Sparkling Water provides H2O functionality inside a Spark cluster.

Home Page: https://docs.h2o.ai/sparkling-water/3.3/latest-stable/doc/index.html

License: Apache License 2.0

Sparkling Water

Sparkling Water integrates H2O-3, a fast, scalable machine learning engine, with Apache Spark. It provides:

  • Utilities to publish Spark data structures (RDDs, DataFrames, Datasets) as H2O-3's frames and vice versa.
  • DSL to use Spark data structures as input for H2O's algorithms.
  • Basic building blocks to create ML applications utilizing Spark and H2O APIs.
  • Python interface enabling use of Sparkling Water directly from PySpark.

Getting Started

User Documentation

Read the documentation for Spark 3.5 (or 3.4, 3.3, 3.2, 3.1, 3.0, 2.4, 2.3)

Download Binaries

Download the latest version for Spark 3.5 (or 3.4, 3.3, 3.2, 3.1, 3.0, 2.4, 2.3)

Each Sparkling Water release is also published to Maven Central (more details below).


Try Sparkling Water!

Sparkling Water is distributed as a Spark application library that any Spark application can use. We also provide a zip distribution that bundles the library and shell scripts.

There are several ways of using Sparkling Water:

  • Sparkling Shell (Spark Shell with Sparkling Water included)
  • Sparkling Water driver (Spark Submit with Sparkling Water included)
  • Spark Shell with the Sparkling Water library included via the --jars or --packages option
  • Spark Submit with the Sparkling Water library included via the --jars or --packages option
  • PySpark with PySparkling

Run Sparkling shell

The Sparkling Shell wraps a regular Spark Shell and appends the Sparkling Water library to the classpath via the --jars option. The Sparkling Shell supports creation of an H2O-3 cloud and execution of H2O-3 algorithms.

  1. Either download or build Sparkling Water
  2. Configure the location of the Spark cluster:

    export SPARK_HOME="/path/to/spark/installation"
    export MASTER="local[*]"

    In this case, local[*] points to an embedded single-node cluster.

  3. Run Sparkling Shell:

    bin/sparkling-shell

    Sparkling Shell accepts common Spark Shell arguments. For example, to increase the memory allocated to each executor, set the spark.executor.memory parameter:

    bin/sparkling-shell --conf "spark.executor.memory=4g"

  4. Initialize H2OContext

    import ai.h2o.sparkling._
    val hc = H2OContext.getOrCreate()

    H2OContext starts H2O services on top of the Spark cluster and provides primitives for transformations between H2O-3 and Spark data structures.
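Steps 2 and 3 above can be wrapped in a small guard that catches a missing SPARK_HOME before launch. This is a minimal sketch, not part of the Sparkling Water distribution: `sparkling_shell_cmd` is a hypothetical helper, and it only prints the command so it can be inspected first.

```shell
# Hypothetical helper (not shipped with Sparkling Water): checks the
# environment from step 2 and prints the launch command from step 3.
sparkling_shell_cmd() {
  # Fail early if SPARK_HOME is unset or does not point to a directory.
  if [ -z "${SPARK_HOME:-}" ] || [ ! -d "$SPARK_HOME" ]; then
    echo "SPARK_HOME must point to a Spark installation" >&2
    return 1
  fi
  # Fall back to the embedded single-node cluster when MASTER is not set.
  export MASTER="${MASTER:-local[*]}"
  # Pass any extra Spark Shell arguments through unchanged.
  if [ "$#" -gt 0 ]; then
    echo "bin/sparkling-shell $*"
  else
    echo "bin/sparkling-shell"
  fi
}

# Example (run from the Sparkling Water directory):
#   sparkling_shell_cmd --conf "spark.executor.memory=4g"
```

The printed command is the same one shown in step 3. Run it from the directory where Sparkling Water was downloaded or built.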

Use Sparkling Water with PySpark

Sparkling Water can also be used directly from PySpark; this integration is called PySparkling.

See PySparkling README to learn about PySparkling.

Use Sparkling Water via Spark Packages

To see how Sparkling Water can be used as a Spark package, please see Use as Spark Package.

Use Sparkling Water in Windows environments

See Windows Tutorial to learn how to use Sparkling Water in Windows environments.

Sparkling Water examples

To see how to run examples for Sparkling Water, please see Running Examples.

Maven packages

Each Sparkling Water release is published to Maven Central with the following coordinates:

  • ai.h2o:sparkling-water-core_{{scala_version}}:{{version}} - Includes core of Sparkling Water
  • ai.h2o:sparkling-water-examples_{{scala_version}}:{{version}} - Includes example applications
  • ai.h2o:sparkling-water-repl_{{scala_version}}:{{version}} - Spark REPL integration into H2O Flow UI
  • ai.h2o:sparkling-water-ml_{{scala_version}}:{{version}} - Extends the Spark ML package with H2O-based transformations
  • ai.h2o:sparkling-water-scoring_{{scala_version}}:{{version}} - A library containing scoring logic and definition of Sparkling Water MOJO models.
  • ai.h2o:sparkling-water-scoring-package_{{scala_version}}:{{version}} - Lightweight Sparkling Water package including all dependencies required just for scoring with H2O-3 and DAI MOJO models.
  • ai.h2o:sparkling-water-package_{{scala_version}}:{{version}} - Sparkling Water package containing all dependencies required for model training and scoring. It is designed to be used as a Spark package via the --packages option.

    Note: {{version}} refers to a release version of Sparkling Water, and {{scala_version}} refers to the Scala base version.
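As an illustration of how the coordinate template is filled in, the sketch below builds a coordinate and prints a spark-shell invocation that passes it to --packages. `sw_coordinate` is a hypothetical helper, and both `2.12` and `X.Y.Z` are placeholders; substitute the Scala base version of your Spark build and a real release version from Maven Central.

```shell
# Hypothetical helper: fills in the coordinate template from the list above.
# $1 = artifact suffix (e.g. package), $2 = Scala base version, $3 = release version
sw_coordinate() {
  printf 'ai.h2o:sparkling-water-%s_%s:%s\n' "$1" "$2" "$3"
}

# Placeholder values only; X.Y.Z is not a real release version.
SCALA_VERSION="${SCALA_VERSION:-2.12}"
SW_VERSION="${SW_VERSION:-X.Y.Z}"

# Print (do not run) the spark-shell command that would pull the package.
echo "\$SPARK_HOME/bin/spark-shell --packages $(sw_coordinate package "$SCALA_VERSION" "$SW_VERSION")"
```

The same coordinate can be passed to spark-submit through its --packages option.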

The full list of published packages is available here.


Sparkling Water Backends

Sparkling Water supports two backend/deployment modes: internal and external. Sparkling Water applications are independent of the selected backend. The backend must be specified before the H2OContext is created.

For more details regarding the internal or external backend, please see Backends.
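One way to select the backend before the H2OContext exists is to pass it as a Spark configuration property at launch. The sketch below assumes the property name `spark.ext.h2o.backend.cluster.mode`, which comes from the Sparkling Water configuration reference rather than from this README, so verify it against the documentation for your release. `backend_conf` is a hypothetical helper.

```shell
# Hypothetical helper: prints the --conf argument for a backend mode.
# Assumption: the property is named spark.ext.h2o.backend.cluster.mode.
backend_conf() {
  case "$1" in
    internal|external)
      printf '%s\n' "--conf spark.ext.h2o.backend.cluster.mode=$1"
      ;;
    *)
      echo "backend must be 'internal' or 'external'" >&2
      return 1
      ;;
  esac
}

# Print (do not run) a Sparkling Shell launch with the internal backend.
echo "bin/sparkling-shell $(backend_conf internal)"
```

The external backend usually needs further settings beyond the mode itself; the Backends documentation describes them.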


FAQ

A list of all frequently asked questions is available at FAQ.


Development

Complete development documentation is available at Development Documentation.

Build Sparkling Water

To see how to build Sparkling Water, please see Build Sparkling Water.

Develop applications with Sparkling Water

An application using Sparkling Water is a regular Spark application that bundles the Sparkling Water library. See the Sparkling Water Droplet for an example application.

Contributing

Just drop us a PR! For inspiration, look at our list of issues, and feel free to create a new one.

Filing Bug Reports and Feature Requests

You can file a bug report or feature request directly in GitHub Issues.

Have Questions?

We also respond to questions tagged with sparkling-water and h2o on Stack Overflow.

Change Logs

Change logs are available at Change Logs.

