
# Marlin

A distributed matrix operations library built on top of Spark. The master branch is currently at version 0.4-SNAPSHOT.

## Branches Notice

This branch (spark-marlin) is built on a custom version of Spark to achieve better performance for matrix operations; however, that custom Spark has not been published. If you use an official Spark release, please refer to the master branch or the spark-1.0.x branch.

## Prerequisites

As Marlin is built on top of Spark, you need to have Spark installed first. If you are not sure how to set up Spark, please refer to the guidelines here. Currently, Marlin is developed against the APIs of Spark 1.4.0.

## Compile Marlin

We currently use Maven to build the project; simply run `mvn package -DskipTests` to produce the jar package. You can also select a profile, e.g. `spark-1.3`, `spark-1.2`, or `hadoop-2.4`, to build Marlin for your environment.

Because of API changes in Breeze, we maintain a separate branch named spark-1.0.x that is compatible with Spark 1.0.x, while the master branch tracks the newest versions of Spark.

## Run Marlin

We provide examples in `edu.nju.pasalab.marlin.examples` that show how to use the APIs in the project. For example, to multiply two large matrices, submit the job with `spark-submit`:

```
$ ./bin/spark-submit \
  --class edu.nju.pasalab.marlin.examples.MatrixMultiply \
  --master <master-url> \
  --executor-memory <memory> \
  marlin_2.10-0.2-SNAPSHOT.jar \
  <matrix A rows> <matrix A columns> \
  <matrix B columns> <cores across the cluster>
```

Note: the pre-built Spark assembly jar does not include the netlib-java native component, which means you cannot use a native linear algebra library (e.g. BLAS) to accelerate the computation; instead, pure Java is used to perform the small block matrix multiplications on each worker. Our experiments show a significant performance difference between native BLAS and the pure Java implementation; here you can find more information about the performance comparison and how to load the native library.
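To make the note above concrete, this is roughly the computation each worker performs on a pair of small blocks when no native library is available: a plain triple-loop multiply. The sketch below is illustrative plain Scala (not Marlin's actual kernel); a native BLAS `dgemm` call replaces exactly this loop when netlib-java finds its native component.

```scala
object LocalGemm {
  // Naive C = A * B for row-major dense blocks. This is the work that
  // netlib-java delegates to native BLAS (dgemm) when it is available;
  // without it, each worker runs a pure-JVM loop like this one.
  def multiply(a: Array[Double], b: Array[Double],
               m: Int, k: Int, n: Int): Array[Double] = {
    val c = new Array[Double](m * n)
    var i = 0
    while (i < m) {
      var p = 0
      while (p < k) {
        val aip = a(i * k + p)
        var j = 0
        while (j < n) {
          c(i * n + j) += aip * b(p * n + j)
          j += 1
        }
        p += 1
      }
      i += 1
    }
    c
  }
}
```

The loop and `dgemm` compute the same result; the native routine is simply much faster on large blocks thanks to vectorization and cache blocking, which is why loading the native library matters.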

Note: this example uses `MTUtils.randomDenVecMatrix` to generate a distributed random matrix in memory, without reading data from files.

Note: `<cores across the cluster>` is the number of cores across the cluster you want to use.

## Matrix Operations API in Marlin

We have implemented a number of APIs so far; you can find the documentation on this page.

## Algorithms and Performance Evaluation

The details of the matrix multiplication algorithm are here.
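In outline, the distributed algorithm splits each operand into blocks, multiplies compatible block pairs locally, and sums the partial products that target the same result block. A minimal single-machine sketch of that blocking scheme (illustrative plain Scala; the block size and names are assumptions, not Marlin's API):

```scala
object BlockMultiply {
  type Block = Array[Array[Double]]

  // Multiply two n x n matrices by splitting them into bs x bs blocks,
  // computing local block products, and accumulating partial sums into
  // the target result block -- the same structure a Spark job
  // distributes across workers, with the accumulation done by a reduce.
  def multiply(a: Block, b: Block, bs: Int): Block = {
    val n = a.length
    val c = Array.fill(n, n)(0.0)
    for (bi <- 0 until n by bs; bj <- 0 until n by bs; bk <- 0 until n by bs) {
      // one A(bi, bk) x B(bk, bj) block product, accumulated into C(bi, bj)
      for (i <- bi until math.min(bi + bs, n);
           k <- bk until math.min(bk + bs, n);
           j <- bj until math.min(bj + bs, n)) {
        c(i)(j) += a(i)(k) * b(k)(j)
      }
    }
    c
  }
}
```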

### Performance Evaluation

We have done some performance evaluation of Marlin; it can be seen here.

## Contact

gurongwalker at gmail dot com

myasuka at live dot com


## Issues

### Roadmap

Do you have any plan to update the library to be based on Spark 1.6?

### Merge with Spark 2.0 or 1.6.2

Hello PasaLab,
Thanks for your amazing work.
Can you please update your code in order to work with Spark 2.0 or 1.6.2 at least?
Regards

### Performance Test

For matrix multiplication, how big can the matrices be at most with your current configuration (one 32 GB master plus sixteen 24 GB workers)? Looking forward to your reply!

### Is it possible to support a Complex Double data type?

Hi,
I am looking into the code to check whether it is feasible to support a Complex Double data type for matrix inverse and multiplication.

I see that you use a couple of external packages:

- BLAS: you use dspr, so I presume I can replace it with zspr?
- ARPACK: you use dsaupd and dseupd; I cannot find equivalent methods.
- Breeze: it supports a Complex data type, so that should be fine, I guess?

What is your assessment/advice for supporting a Complex Double data type?

many thanks
canal

### Inverse of a matrix

Hello,
Looking at the source code, there is a comment in DenseVecMatrix for the inverse method (line 570: "get the inverse of the triangular matrix"). But since LU decomposition is supported, we can invert a non-triangular square matrix, right?

Also, is the matrix inverse and multiplication 'out-of-core', meaning the calculation is not limited to the available physical memory? I have a fairly large matrix (1 million x 1 million, double precision).
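For context on the first question: LU factorization (PA = LU) applies to any invertible square matrix, so an LU-based inverse is not limited to triangular inputs. As a small illustration of the underlying algebra, here is a dense Gauss-Jordan inverse with partial pivoting in plain Scala (an editorial sketch, not Marlin's distributed implementation):

```scala
object DenseInverse {
  // Invert a small dense square matrix by Gauss-Jordan elimination with
  // partial pivoting. This performs the same algebra as LU-based
  // inversion, so it works for any invertible matrix, triangular or not.
  def inverse(m: Array[Array[Double]]): Array[Array[Double]] = {
    val n = m.length
    val a = m.map(_.clone())                             // working copy
    val inv = Array.tabulate(n, n)((i, j) => if (i == j) 1.0 else 0.0)
    for (col <- 0 until n) {
      // partial pivoting: pick the row with the largest entry in this column
      val p = (col until n).maxBy(r => math.abs(a(r)(col)))
      val tmpA = a(col); a(col) = a(p); a(p) = tmpA
      val tmpI = inv(col); inv(col) = inv(p); inv(p) = tmpI
      val piv = a(col)(col)
      require(math.abs(piv) > 1e-12, "matrix is singular")
      for (j <- 0 until n) { a(col)(j) /= piv; inv(col)(j) /= piv }
      // eliminate this column from every other row
      for (r <- 0 until n if r != col) {
        val f = a(r)(col)
        for (j <- 0 until n) {
          a(r)(j) -= f * a(col)(j)
          inv(r)(j) -= f * inv(col)(j)
        }
      }
    }
    inv
  }
}
```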

thank you for sharing the code,
canal

### How to use MKL with saury and spark without a root account?

After some trial and error, I finally got spark and saury working with MKL on my cluster without su or sudo (I don't have the root password). Here is the procedure:

Example environment: MKL, spark-1.0.2, saury

Packages needed and their download paths:

- blas: `wget http://www.netlib.org/blas/blas.tgz`
- cblas: `wget http://www.netlib.org/blas/blast-forum/cblas.tgz`
- netlib-java: `git clone https://github.com/fommil/netlib-java.git`

0\. Prepare `~/lib` and `~/include` directories in your home:

```
mkdir ~/lib
cd ~/lib
# symlink libblas.so.3 and liblapack.so.3 to libmkl_rt.so
ln -s /opt/intel/mkl/lib/intel64/libmkl_rt.so libblas.so.3
ln -s /opt/intel/mkl/lib/intel64/libmkl_rt.so liblapack.so.3
mkdir ~/include
export LD_LIBRARY_PATH=/home/***/lib
```

1\. Build netlib BLAS:

```
tar zxvf blas.tgz
cd BLAS/
make all
cp ./blas_LINUX.a ~/lib/blas.a
```

2\. Build netlib CBLAS:

```
tar zxvf cblas.tgz
cd CBLAS/
ln -s Makefile.LINUX Makefile.in   # required by CBLAS/README, but failed in my installation
# edit Makefile.in: set BLLIB, CBLIB, CBDIR (see CBLAS/README for details)
make all
cp CBLAS/lib/cblas.a ~/lib/
cd CBLAS/include/
cp * ~/include/                    # copies cblas_f77.h and cblas.h to ~/include/
```

3\. Build netlib-java to get netlib-native_system-linux-x86_64-natives-1.1.jar, jniloader.jar and native_system-java.jar:

```
cd netlib-java/
sed -i "s/1.2-SNAPSHOT/1.1/g" $(grep -rl 1.2-SNAPSHOT .)
mvn package        # build may fail, ignore it
cd native_system/
mvn package        # build may fail, ignore it
cd xbuilds/
mvn package        # build may fail, ignore it
cd linux-x86_64/
mvn package        # build may fail, ignore it
vi target/netlib-native/com_github_fommil_netlib_NativeSystemBLAS.c
```

At line 36, point the include at your local CBLAS header:

```
-- #include <cblas.h>
++ #include "/home/***/include/cblas.h"
```

```
cd ../../../netlib/JNI/
vi netlib-jni.c
```

At line 2, make the same change:

```
-- #include <cblas.h>
++ #include "/home/***/include/cblas.h"
```

```
cd -               # return to linux-x86_64/
vi pom.xml
```

At lines 78-79, link against MKL instead of the reference libraries:

```
-- -lblas
-- -llapack
++ -lmkl_rt
```

At lines 54 to 68, delete the 15 lines of the generator configuration (the XML tags of this fragment were stripped in the original; it is the com.github.fommil.netlib generator section covering blas, lapack and arpack):

```
com.github.fommil.netlib
generator

blas

lapack

arpack
```

Then finish the build:

```
mvn package        # this build should succeed
cd target/
ls                 # you should see netlib-native_system-linux-x86_64-natives.jar
cd lib/
ls                 # you should see jniloader.jar and native_system-java.jar
```

4\. Build spark-1.0.2 with the jars we just obtained.
Reference: http://apache-spark-user-list.1001560.n3.nabble.com/Native-library-can-not-be-loaded-when-using-Mllib-PCA-td7042.html

(1). Build the spark assembly once:

```
./make-distribution.sh -Pnetlib-lgpl
```

(2). Copy jniloader.jar, native_system-java-1.1.jar and netlib-native_system-linux-x86_64-1.1-natives.jar to $SPARK_HOME/lib_managed/natives.

(3). Copy netlib-native_system-linux-x86_64-1.1-natives.jar to ~/.ivy2/cache/com.github.fommil.netlib/netlib-native_system-linux-x86_64/jars to replace the existing one; make sure the name is consistent with the original one.

(4). Modify $SPARK_HOME/assembly/pom.xml to add a plugin under build/plugins. The XML was flattened in the original; reconstructed, it is the standard addjars-maven-plugin configuration pulling the jars from lib_managed/natives:

```xml
<plugin>
  <groupId>com.googlecode.addjars-maven-plugin</groupId>
  <artifactId>addjars-maven-plugin</artifactId>
  <version>1.0.5</version>
  <executions>
    <execution>
      <goals>
        <goal>add-jars</goal>
      </goals>
      <configuration>
        <resources>
          <resource>
            <directory>${basedir}/../lib_managed/natives</directory>
          </resource>
        </resources>
      </configuration>
    </execution>
  </executions>
</plugin>
```

(5). Rebuild spark.

Now you should have spark-1.0.2 calling MKL as its BLAS.
Enjoy it!

### Matrix 5000 x 5000 inverse

Hello,

I am trying to invert a 5000 x 5000 matrix on Google DataProc (code below); the code already works for a 1000 x 1000 matrix on my local PC.

However, something seems to go wrong when calling the inverse method: the job fails and I get this in the log. Any ideas?

LOG:

```
fourth
fifth
17/09/14 14:32:15 INFO org.apache.hadoop.mapred.FileInputFormat: Total input paths to process : 1
sixth
septh

[Stage 1:> (0 + 2) / 2]
[Stage 1:=============================> (1 + 1) / 2]

17/09/14 14:32:28 INFO com.github.fommil.jni.JniLoader: successfully loaded /tmp/jniloader3386225062470282445netlib-native_system-linux-x86_64.so
17/09/14 14:32:29 INFO com.github.fommil.jni.JniLoader: already loaded netlib-native_system-linux-x86_64.so
```

CODE:

```scala
def main(args: Array[String]) {
  System.out.println("first")
  val conf = new SparkConf()
  System.out.println("second")
  conf.set("spark.default.parallelism", "8")
  System.out.println("third")
  val sc = new SparkContext(conf)
  System.out.println("fourth")
  val SIZE = 5000
  System.out.println("fifth")
  val ma = sc.textFile("gs://sparkfilesjsaray/matr_5000.csv")
    .map(line => line.split(",").map(_.toDouble))
    .zipWithIndex()
    .map(line => (line._2, BDV(line._1)))
  System.out.println("sixth")
  val matrix = new DenseVecMatrix(ma, SIZE, SIZE)
  System.out.println("septh")
  val inverse = matrix.inverse()
  System.out.println("eight")
  inverse.saveToFileSystem("gs://sparkfilesjsaray/output5000.csv")
  System.out.println("nine")
  System.out.println("Done")
  System.out.println("first")
}
```

### Error when compiling from the master branch

Saw an error when compiling from the master branch:

```
MatrixSuite.scala:306: type mismatch;
 found   : Int(2)
 required: (Int, Int, Int)
[ERROR] var result = ma.multiply(denVecMat, 2)
                                            ^
one error found
```

I think it should be something like:

```scala
val result = ma.multiply(denVecMat, (2, 2, 2))
```

canal
