Introduction

This is the implementation of the paper "DAve-QN: A Distributed Averaged Quasi-Newton Method with Local Superlinear Convergence Rate", accepted at the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS-2020).

This is a high-performance implementation in C using MPI. To compare against the state of the art, we have also implemented GIANT, DAve-RPG and DANE, along with all the scripts needed to run them.

We also provide a MATLAB implementation of DAve-QN for further use.

Requirements

  • Intel MKL 11.1.2
  • MVAPICH2/2.1
  • mpicc 14.0.2

Compilation

First, set the environment variable MKLROOT. Its value depends on where MKL is installed. For a default installation on Linux systems:

$ export MKLROOT=/opt/intel/mkl/

Then compile the code using the provided makefile:

$ make

Tests

DAve-QN accepts multiple parameters as input. A typical test looks like this:

$ mpirun -np 3 dave_qn.o /path/to/mnist 60000 9994156 780 40 1 0.01 1

This runs dave_qn on the mnist dataset with 3 processes (2 workers and 1 master), as set by -np. The input parameters to dave_qn are described below:

mpirun -np [number of processors] dave_qn.o [path] [nrows] [nnz] [ncols] [iterations] [lambda] [gamma] [freq]

  • path: full path to the dataset
  • nrows: number of samples in the dataset
  • nnz: number of non-zeros in the dataset
  • ncols: number of columns in the dataset
  • iterations: number of iterations to run
  • lambda: regularization parameter
  • gamma: initial step size, used for better initialization; this should be very small, usually 1e-3 or 1e-4
  • freq: how often the objective function is computed
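The values for nrows, nnz and ncols have to match the dataset. As a minimal sketch, assuming the dataset is a text file in LIBSVM format (label followed by index:value pairs, with 1-based indices) and that ncols equals the largest feature index, the helper below counts these values and assembles the command line in the order listed above. The function names libsvm_stats and dave_qn_command are hypothetical and not part of this repository.

```python
import shlex


def libsvm_stats(lines):
    """Return (nrows, nnz, ncols) for an iterable of LIBSVM-format lines.

    Assumption: each non-blank line is "label idx:val idx:val ...",
    and ncols is taken to be the largest feature index seen.
    """
    nrows = nnz = ncols = 0
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        nrows += 1
        for pair in fields[1:]:  # fields[0] is the label
            index, _value = pair.split(":", 1)
            nnz += 1
            ncols = max(ncols, int(index))
    return nrows, nnz, ncols


def dave_qn_command(nprocs, path, nrows, nnz, ncols,
                    iterations, lam, gamma, freq):
    """Build the mpirun command line in the parameter order listed above."""
    args = ["mpirun", "-np", nprocs, "dave_qn.o", path,
            nrows, nnz, ncols, iterations, lam, gamma, freq]
    return " ".join(shlex.quote(str(a)) for a in args)
```

For a file on disk, pass the open file object to libsvm_stats, for example `with open(path) as f: nrows, nnz, ncols = libsvm_stats(f)`.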

A simple bash script, test.sh, is provided that runs mnist on 2 workers and 1 master. Before running it, split the mnist dataset into two parts and place them in a directory called dataset next to the code. The dataset folder should therefore contain mnist, mnist-0 and mnist-1, which are respectively the main dataset, the first split and the second split. Then you can run the code using:

$ sh test.sh

NOTE: you can find all the scripts for tests in the scripts directory.

MATLAB Implementation

We also provide a MATLAB implementation of DAve-QN, which you can find in the "MATLAB Code" directory. You will need LIBSVM.

How to split a dataset?

We provide a simple jar file that can split a dataset into an arbitrary number of parts. For example, to split the mnist dataset into 2 parts, put mnist in the dataset folder. Then run the following:

$ java -jar Split.jar /path/to/dataset mnist 60000 2

To be more precise, Split.jar accepts the following parameters:

$ java -jar Split.jar path filename nrows nparts 

where path indicates the directory that contains the main dataset, filename is the dataset name, nrows is the number of rows in the dataset and nparts is the number of parts.
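To illustrate what a split produces, here is a rough sketch in Python. It is not the algorithm used by Split.jar, whose internals are not documented here. It assumes the rows are divided into contiguous, nearly equal chunks, and it uses the part naming seen elsewhere in this README (mnist-0, mnist-1). The names part_names and split_rows are hypothetical.

```python
def part_names(filename, nparts):
    """Names of the split files, following the mnist-0, mnist-1 pattern."""
    return [f"{filename}-{i}" for i in range(nparts)]


def split_rows(rows, nparts):
    """Split rows into nparts contiguous chunks of nearly equal size.

    Assumption: contiguous chunks, with the first (len(rows) % nparts)
    chunks receiving one extra row. Split.jar may partition differently.
    """
    rows = list(rows)
    base, extra = divmod(len(rows), nparts)
    parts = []
    start = 0
    for i in range(nparts):
        size = base + (1 if i < extra else 0)
        parts.append(rows[start:start + size])
        start += size
    return parts
```

Each chunk returned by split_rows would then be written to the file with the matching name from part_names.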

NOTE: you can find mnistSplit.sh in the scripts folder and run sh mnistSplit.sh.

Output

The output of the code contains three columns. The first column is the time in milliseconds, the second column is the objective function value and the third column is the norm of the gradient.
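For post-processing, the three columns can be read with a few lines of Python. This is a minimal sketch that assumes the columns are separated by whitespace and that any other lines (for example MPI log messages) can be ignored. The function name parse_output is hypothetical.

```python
def parse_output(text):
    """Parse solver output into (time_ms, objective, grad_norm) tuples.

    Assumption: columns are whitespace-separated. Lines that do not
    consist of exactly three numbers are skipped.
    """
    records = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        try:
            time_ms, objective, grad_norm = (float(f) for f in fields)
        except ValueError:
            continue  # not a numeric row
        records.append((time_ms, objective, grad_norm))
    return records
```

The returned list can be passed directly to a plotting library to draw the objective value or gradient norm against time.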

Troubleshooting

If you get an error such as mpirun was unable to find the specified executable file..., you most likely have not compiled the code properly. Make sure that the “make” command completes without any errors.

If you get the error File not found!, one of the files needed for the dataset is missing. Make sure you put the dataset in the proper location and that you split it before running the code. Please refer to the How to split a dataset section.
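A quick way to see which file is missing is to compare the contents of the dataset folder against the expected names. The sketch below assumes the layout described for the mnist test (the main dataset plus one split per worker, named name-0, name-1, ...) and extends that pattern to any number of workers, which is an assumption. The function name missing_dataset_files is hypothetical.

```python
def missing_dataset_files(present, name, nworkers):
    """Return the expected dataset files that are not in `present`.

    `present` is a collection of file names found in the dataset folder.
    Assumption: the code needs `name` plus `name-0` .. `name-(nworkers-1)`.
    """
    expected = [name] + [f"{name}-{i}" for i in range(nworkers)]
    found = set(present)
    return [f for f in expected if f not in found]
```

For example, `missing_dataset_files(os.listdir("dataset"), "mnist", 2)` (after `import os`) returns an empty list when the folder is ready for the mnist test.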
