spark-vlbfgs

This package is an implementation of the Vector-free L-BFGS solver and some scalable machine learning algorithms for Apache Spark.

Apache Spark MLlib provides scalable implementations of popular machine learning algorithms, which let users train models on big datasets and iterate fast. The existing implementations assume that the number of parameters is small enough to fit in the memory of a single machine. However, many applications, such as ads CTR prediction and deep neural networks, require solving problems with billions of parameters on huge amounts of data. This requirement far exceeds the capacity of the existing MLlib algorithms, many of which use L-BFGS as the underlying solver. To fill this gap, we developed Vector-free L-BFGS for MLlib. Vector-free L-BFGS avoids the expensive dot product operations in the two-loop recursion and greatly improves computation efficiency through a high degree of parallelism. It can solve optimization problems with billions of parameters in the Spark SQL framework, where the training data are often generated. The algorithm scales very well and enables a variety of MLlib algorithms to handle a massive number of parameters over large datasets.
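To make the idea of avoiding dot products in the two-loop recursion concrete, here is a minimal single-machine sketch in plain Scala. It follows the usual formulation of vector-free L-BFGS: the search direction is written as a linear combination of 2m+1 basis vectors (the m most recent steps s, the m most recent gradient differences y, and the current gradient g), and the recursion updates only the 2m+1 coefficients using a precomputed matrix of pairwise dot products. The object and method names are hypothetical and are not taken from spark-vlbfgs; the package's actual implementation may differ.

```scala
object VectorFreeTwoLoop {
  type Vec = Array[Double]

  def dot(a: Vec, b: Vec): Double = a.indices.map(i => a(i) * b(i)).sum

  /** Classical two-loop recursion on full-length vectors; returns -H * g. */
  def classicDirection(s: Array[Vec], y: Array[Vec], g: Vec): Vec = {
    val m = s.length
    val q = g.map(-_)
    val alpha = new Array[Double](m)
    for (i <- m - 1 to 0 by -1) {
      alpha(i) = dot(s(i), q) / dot(s(i), y(i))
      for (t <- q.indices) q(t) -= alpha(i) * y(i)(t)
    }
    // Initial Hessian scaling from the most recent (s, y) pair.
    val gamma = dot(s(m - 1), y(m - 1)) / dot(y(m - 1), y(m - 1))
    for (t <- q.indices) q(t) *= gamma
    for (i <- 0 until m) {
      val beta = dot(y(i), q) / dot(s(i), y(i))
      for (t <- q.indices) q(t) += (alpha(i) - beta) * s(i)(t)
    }
    q
  }

  /**
   * Vector-free variant. `b` is the (2m+1) x (2m+1) matrix of dot products of
   * the basis [s_0..s_{m-1}, y_0..y_{m-1}, g]. Only scalars are touched here;
   * the result holds the coefficients of the direction in that basis.
   */
  def vectorFreeCoefficients(b: Array[Array[Double]], m: Int): Array[Double] = {
    val n = 2 * m + 1
    val delta = new Array[Double](n)
    delta(n - 1) = -1.0 // start from -g
    val alpha = new Array[Double](m)
    // Dot product of the current direction with basis vector `col`.
    def proj(col: Int): Double = (0 until n).map(l => delta(l) * b(l)(col)).sum
    for (j <- m - 1 to 0 by -1) {
      alpha(j) = proj(j) / b(j)(m + j)
      delta(m + j) -= alpha(j)
    }
    val gamma = b(m - 1)(2 * m - 1) / b(2 * m - 1)(2 * m - 1)
    for (l <- 0 until n) delta(l) *= gamma
    for (j <- 0 until m) {
      val beta = proj(m + j) / b(j)(m + j)
      delta(j) += alpha(j) - beta
    }
    delta
  }

  /** Rebuilds the full direction as sum_l delta(l) * basis(l). */
  def combine(delta: Array[Double], basis: Array[Vec]): Vec =
    basis.head.indices.map(t => basis.indices.map(l => delta(l) * basis(l)(t)).sum).toArray
}
```

In a distributed setting, the dot-product matrix and the final linear combination are the only steps that touch the full-length vectors, and both parallelize naturally; the recursion itself runs on a small array of coefficients.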

Supported algorithms

spark-vlbfgs currently supports the following algorithms:

  • Logistic Regression
  • Linear Regression

with regularization:

  • L1
  • L2
  • Elastic Net
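As a rough illustration of how the three penalties relate, the sketch below computes an elastic net penalty using the convention documented for Apache Spark MLlib: regParam * (elasticNetParam * ||w||_1 + (1 - elasticNetParam) / 2 * ||w||_2^2). Setting elasticNetParam to 1 gives pure L1 and setting it to 0 gives pure L2. The object and function names are hypothetical, and whether spark-vlbfgs follows exactly the same parameterization is an assumption here.

```scala
object ElasticNetPenalty {
  /**
   * Elastic net penalty in the MLlib convention (assumed, not taken from
   * spark-vlbfgs): elasticNetParam = 1 is pure L1, elasticNetParam = 0 is pure L2.
   */
  def penalty(w: Array[Double], regParam: Double, elasticNetParam: Double): Double = {
    require(regParam >= 0.0, "regParam must be non-negative")
    require(elasticNetParam >= 0.0 && elasticNetParam <= 1.0,
      "elasticNetParam must be in [0, 1]")
    val l1 = w.map(x => math.abs(x)).sum
    val l2Squared = w.map(x => x * x).sum
    regParam * (elasticNetParam * l1 + (1.0 - elasticNetParam) / 2.0 * l2Squared)
  }
}
```

For example, with coefficients (3, -4) and regParam 0.5, the L1 norm is 7 and the squared L2 norm is 25, so the pure L1 penalty is 0.5 * 7 and the pure L2 penalty is 0.5 * 25 / 2.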

To be supported:

  • Softmax Regression
  • Multilayer Perceptron Classifier

Build and run spark-vlbfgs

spark-vlbfgs is built using Apache Maven. To build spark-vlbfgs and its example programs, run:

mvn clean package -DskipTests

By default, this project is built against Spark 2.0.0 with Scala 2.11. To build against other versions, pass Maven -D properties, for example:

mvn clean package -Dscala.binary.version=2.10 -Dspark.version=2.0.0

Then run the example:

spark-submit \
   --master yarn \
   --num-executors 10 \
   --executor-cores 2 \
   --class org.apache.spark.ml.example.VLORExample \
   /path/to/spark-vlbfgs-0.1-SNAPSHOT.jar [paramlist]

Example

You can train a logistic regression model with the spark-vlbfgs API, which is consistent with Apache Spark MLlib:

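import org.apache.spark.sql.Dataset

// Assumes `spark` is an active SparkSession and VLogisticRegression is imported.
// Load the training data in LIBSVM format.
val dataset: Dataset[_] = spark.read.format("libsvm").load("data/a9a")

// Configure the trainer: block sizes, partition counts, and regularization strength.
val trainer = new VLogisticRegression()
  .setColsPerBlock(100)
  .setRowsPerBlock(10)
  .setColPartitions(3)
  .setRowPartitions(3)
  .setRegParam(0.5)

// Fit the model and print the learned coefficients.
val model = trainer.fit(dataset)

println(s"Vector-free logistic regression coefficients: ${model.coefficients}")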

Talks

Reference

Contact & Acknowledgements

If you have any questions or encounter bugs, feel free to submit an issue or contact:

We are immensely grateful to Xiangrui Meng for the initial work and guidance during the design and development of spark-vlbfgs.

Contributors

  • weichenxu123
  • yanboliang
