This project forked from logicalclocks/hopsworks

Hopsworks - Data-Intensive AI platform with a Feature Store

Home Page: https://www.logicalclocks.com/

License: GNU Affero General Public License v3.0

Java 59.69% HTML 14.39% CSS 3.42% JavaScript 14.15% Python 0.02% Shell 0.17% Ruby 8.04% Jupyter Notebook 0.11%

Hopsworks

Overview

Hopsworks is a platform for both the design and operation of data analytics and machine learning applications. You can design ML applications in Jupyter notebooks in Python and operate them in workflows orchestrated by Airflow, while running on HopsFS, the world's most scalable HDFS-compatible distributed hierarchical filesystem (peer-reviewed, 1.2M ops/sec on Spotify's Hadoop workload). HopsFS also solves the small-files problem of HDFS by storing small files on NVMe disks in the horizontally scalable metadata layer. Hopsworks is also a platform for data engineering, with support for Spark, Flink, and Kafka. As an on-premises platform, Hopsworks has unique support for project-based multi-tenancy, horizontally scalable ML pipelines, and managed GPUs-as-a-resource.

Multi-tenancy - Projects, Users, Datasets

Hopsworks provides Projects as a privacy-by-design sandbox for data, including sensitive data, and for managing collaborating teams, much like GitHub. Datasets can be shared between projects, much like Dropbox. Each project has its own Anaconda environment, so data scientists can manage Python dependencies themselves.

HopsML

HopsML is our framework for writing end-to-end machine learning workflows in Python. Airflow orchestrates workflows that combine ETL in PySpark or TensorFlow, a Feature Store, AutoML hyperparameter optimization over many hosts and GPUs in Keras/TensorFlow/PyTorch, and distributed training such as Collective AllReduce.

Jupyter notebooks can be used to write all parts of the pipeline, and TensorBoard to visualize experiment results during and after training. Models can be deployed in Kubernetes (built-in or external) and monitored in production using Kafka/Spark-Streaming. For more information see the docs.

Feature Store

The feature store is a central place to store curated features for machine learning pipelines in Hopsworks. A feature is a measurable property of a data sample. It could be, for example, an image pixel, a word from a piece of text, the age of a person, a coordinate emitted from a sensor, or an aggregate value such as the average number of purchases within the last hour. Features can come directly from tables or files, or they can be derived values computed from one or more data sources. For more information see the docs.

TLS security

Uniquely among Hadoop platforms, Hops supports X.509 certificates for authentication and authorization of users, services, and jobs, and TLS for in-flight encryption. At-rest encryption is also supported using ZFS-on-Linux.

HopsFS

HopsFS is a drop-in replacement for HDFS that adds distributed metadata and "small-files in metadata (NVMe disks)" support to HDFS.

Information

Documentation

Hopsworks documentation, including the user guide, development guide, feature store, Hops, and HopsML, is available at https://hopsworks.readthedocs.io.

Hopsworks REST API is documented with Swagger and hosted by SwaggerHub.

To build and deploy swagger on your own Hopsworks instance you can follow the instructions found in this guide.

Installing Hopsworks

Installation of Hopsworks and all its services is automated with the Karamel software. Instructions on how to install the entire platform are available here.

For a local single-node installation, to access Hopsworks just point your browser at:

  http://localhost:8080/hopsworks
  username: [email protected]
  password: admin

Admin email may differ on your installation. Please refer to your Karamel cluster definition to access/set the email.

Build instructions

Hopsworks consists of the backend module which is packaged in two files, hopsworks.ear and hopsworks-ca.war, and the front-end module which is packaged in a single .war file.

Build Requirements (for Ubuntu)

NodeJS server and bower, both required for building the front-end.

sudo apt install nodejs-legacy
sudo apt-get install npm
sudo npm cache clean
# You must have a version of bower > 1.54
sudo npm install bower -g
sudo npm install grunt -g
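
The bower version requirement in the comment above can be checked before building. The following is a minimal sketch, not part of the Hopsworks scripts: `version_gt` and `check_bower_newer_than` are hypothetical helper names, the comparison relies on GNU `sort -V` (available on Ubuntu), and it assumes `bower --version` prints only the version number.

```shell
# Hypothetical helper: succeeds when version $1 is strictly greater than $2.
# Uses GNU version sort, so 1.10.0 is treated as newer than 1.9.9.
version_gt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical helper: compare the installed bower against a minimum version.
check_bower_newer_than() {
  min="$1"
  if ! command -v bower >/dev/null 2>&1; then
    echo "bower is not installed" >&2
    return 1
  fi
  # Assumption: `bower --version` prints a bare version string.
  installed="$(bower --version)"
  if version_gt "$installed" "$min"; then
    echo "bower $installed is newer than $min"
  else
    echo "bower $installed is not newer than $min" >&2
    return 1
  fi
}
```

Call `check_bower_newer_than` with the minimum version written in dotted form. Note that a version sort reads `1.54` as major version 1, minor version 54, so pass the exact dotted version you intend.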

Build with Maven

mvn install

Maven uses yeoman-maven-plugin to build both the front-end and the backend. Maven first executes the Gruntfile in the yo directory, then builds the back-end in Java. The yeoman-maven-plugin copies the dist folder produced by grunt from the yo directory to the target folder of the backend.

You can also build Hopsworks without the frontend (for Java EE development and testing):

mvn install -P-web

Front-end Development

The JavaScript produced by the Maven build is obfuscated. For debugging JavaScript, we recommend that you use the following script to deploy changes to HTML or JavaScript to your Vagrant machine:

cd scripts
./js.sh

You should also add the chef recipe to the end of your Vagrantfile (or Karamel cluster definition):

 hopsworks::dev

For development

You can build Hopsworks without running grunt/bower using:

mvn install -P-dist

Then run your script to upload your JavaScript to snurran.sics.se:

cd scripts
./deploy.sh [yourName]

Testing Guide

The following steps must be taken to run Hopsworks integration tests:

Warning: These tests clean HDFS and drop the Hopsworks database, so they should only be run on a test machine.

First create a .env file by copying the .env.example file. Then edit the .env file by providing your specific configuration.

   cd hopsworks/hopsworks-IT/src/test/ruby/
   cp .env.example .env
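
A plain `cp` overwrites an existing `.env`, which can discard configuration you have already edited. A guarded copy could look like the sketch below; `init_env_file` is a hypothetical helper name and is not part of the Hopsworks repository.

```shell
# Hypothetical helper: create .env from .env.example in the given directory,
# but only when .env does not exist yet, so local edits are never overwritten.
init_env_file() {
  dir="${1:-.}"
  if [ -f "$dir/.env" ]; then
    echo "$dir/.env already exists; leaving it untouched"
  elif [ -f "$dir/.env.example" ]; then
    cp "$dir/.env.example" "$dir/.env"
    echo "created $dir/.env; edit it before running the tests"
  else
    echo "no .env.example found in $dir" >&2
    return 1
  fi
}
```

For example, `init_env_file hopsworks/hopsworks-IT/src/test/ruby` would create the file on first use and do nothing on later runs.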

Then export environment variables to match the server you are deploying to:

   GLASSFISH_HOST_NAME=localhost
   GLASSFISH_HTTP_PORT=8181
   GLASSFISH_ADMIN_PORT=4848
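
The settings above are written as plain assignments. For child processes such as `mvn` or `rspec` to see them, they have to be exported. A minimal sketch, assuming a bash-compatible shell and using the values above as defaults:

```shell
# Export the settings so that child processes can read them.
# Values already present in the caller's environment take precedence
# over the defaults taken from the listing above.
export GLASSFISH_HOST_NAME="${GLASSFISH_HOST_NAME:-localhost}"
export GLASSFISH_HTTP_PORT="${GLASSFISH_HTTP_PORT:-8181}"
export GLASSFISH_ADMIN_PORT="${GLASSFISH_ADMIN_PORT:-4848}"

# Print the effective values as a quick sanity check.
env | grep '^GLASSFISH_' | sort
```

Using `${VAR:-default}` keeps the script safe to re-run: a value you set by hand for a remote server is not replaced by the local default.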

Change the server login credentials in hopsworks-IT/pom.xml:

  <properties>
    ...
    <glassfish.admin>{username}</glassfish.admin>
    <glassfish.passwd>{password}</glassfish.passwd>
    ...
  </properties>

Export environment variables for the Selenium integration tests:

   HOPSWORKS_URL=http://localhost:8181/hopsworks
   HEADLESS=[true|false]
   BROWSER=[chrome|firefox]
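
The bracketed values above are alternatives, so each variable takes exactly one of them. The sketch below exports one possible combination and rejects anything outside the listed options before the tests start; `validate_selenium_env` is a hypothetical helper name, and the chosen values are only an example.

```shell
# Hypothetical helper: fail early if a Selenium setting has an unsupported value.
validate_selenium_env() {
  case "${HEADLESS:-}" in
    true|false) ;;
    *) echo "HEADLESS must be true or false" >&2; return 1 ;;
  esac
  case "${BROWSER:-}" in
    chrome|firefox) ;;
    *) echo "BROWSER must be chrome or firefox" >&2; return 1 ;;
  esac
  case "${HOPSWORKS_URL:-}" in
    http://*|https://*) ;;
    *) echo "HOPSWORKS_URL must be an http(s) URL" >&2; return 1 ;;
  esac
}

# Example combination; pick the browser and mode that suit your machine.
export HOPSWORKS_URL="http://localhost:8181/hopsworks"
export HEADLESS=true
export BROWSER=firefox

validate_selenium_env && echo "Selenium settings look valid"
```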

To compile, deploy and run the integration test:

   cd hopsworks/
   mvn clean install -Pjruby-tests

If you have already deployed hopsworks-ear and just want to run the integration test:

   cd hopsworks/hopsworks-IT/src/test/ruby/
   bundle install
   rspec --format html --out ../target/test-report.html

To run a single test:

   cd hopsworks/hopsworks-IT/src/test/ruby/
   rspec ./spec/session_spec.rb:60

To skip tests that need to run inside a VM:

   cd hopsworks/hopsworks-IT/src/test/ruby/
   rspec --format html --out ../target/test-report.html --tag ~vm:true

When the tests finish, the test report is opened in a browser if LAUNCH_BROWSER is set to true in .env.

Contributors

ermiasg, tkakantousis, evsav, kouzant, alorlea, robzor92, kerkinos, siroibaf, amor3, berthoug, o-alex, misdess, giannokostas, limmen, gayana06, gholamiali, kai-chi, maismail, moritzmeister, augustbonds, fil0x, juancroca, bcleenders
