m0h3en / bndf

Structured Big data framework based on Apache Spark for storing and manipulating large scale multi-channel neurophysiological recording data

License: Apache License 2.0

Language: Scala 100.00%
Topics: hpc, big-data, computational-neuroscience

BNDF

Introduction

BNDF is a library for storing and processing large-scale single-unit, multi-unit, and multichannel array recording data in a distributed manner. The library is built on top of Apache Spark and Apache Hadoop. Large-scale raw data is stored using Apache Parquet, a columnar storage format, and Apache Hive, on top of the Hadoop Distributed File System (HDFS). Metadata for the raw data is constructed as nested JSON files and stored in MongoDB. BNDF's APIs can be used from Scala, Java, Python, and R, and partially from MATLAB.

Key Advantages

BNDF provides capabilities including, but not limited to:

  • Scalable data processing.
  • Efficient and fast data storage for experimenters.
  • Efficient and fast data processing for data analysts.
  • A major step toward a standardized data and metadata format.

File Format

Currently, BNDF supports MAT files as raw input data, subject to the conditions described in MAT File Library.

Getting Started

Building from Source

You will need to have Apache Maven 3.6.0 or later installed in order to compile BNDF.

$ git clone https://github.com/M0h3eN/bndf.git
$ cd bndf
$ mvn install 
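Because the build depends on a minimum Maven version, it can help to check the installed version before running mvn install. The following is a minimal sketch, not part of BNDF itself: the function names version_ge, maven_version, and check_maven are made up for illustration, and the comparison assumes a sort that supports version sorting (-V), as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Sketch only: these helper names are illustrative and not part of BNDF.

# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Uses `sort -V` (version sort); the smaller of the two versions sorts first.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# maven_version: print the version from the "Apache Maven X.Y.Z" line of `mvn -v`.
maven_version() {
    mvn -v 2>/dev/null | awk '/^Apache Maven/ {print $3; exit}'
}

# check_maven: fail early with a readable message if Maven is missing or too old.
check_maven() {
    local required="3.6.0"
    local found
    found="$(maven_version)"
    if [ -z "$found" ]; then
        echo "Apache Maven not found on PATH" >&2
        return 1
    fi
    if ! version_ge "$found" "$required"; then
        echo "Apache Maven $found is older than $required" >&2
        return 1
    fi
    echo "Apache Maven $found OK"
}
```

With these functions loaded, the build step could be guarded as check_maven && mvn install.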

Run and Deployment

BNDF can run on any cluster or single machine on which the required tools are installed and configured.

The BNDF executable JAR file takes two parameters, in the following order:

  • DATA_PATH
  • MONGO_URI

$ spark-submit \
    --class com.ipm.nslab.bndf.${BNDF_MODULE_NAME} \
    --master ${SPARK_MASTER_URL} \
    --deploy-mode ${DEPLOY_MODE} \
    --executor-memory ${SPARK_EXECUTOR_MEMORY}G \
    --total-executor-cores ${SPARK_EXECUTOR_CORES} \
    --driver-memory ${SPARK_DRIVER_MEMORY}G \
    PATH_TO_BNDF_JAR_FILE/bndf-${JAR_FILE_VERSION}.jar DATA_PATH MONGO_URI

Here ${SPARK_MASTER_URL} is the Spark master URL (or URLs), yarn, or mesos, and ${DEPLOY_MODE} is either client or cluster.

Detailed information on spark-submit's parameters is available in submitting-applications. For creating a private cluster, and for other runtime parameters not discussed here, see BndfCluster.
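To avoid retyping the full command, the invocation above could be wrapped in a small shell function. The sketch below is only an illustration and makes several assumptions: the function name run_bndf, the DRY_RUN and DEPLOY_MODE variables, and every default value (a standalone master on localhost, 4G executors, and so on) are made up for the example and should be replaced with values that match the actual cluster. The module name passed in is whatever BNDF module class is to be run.

```shell
#!/usr/bin/env bash
# Sketch only: run_bndf, DRY_RUN, DEPLOY_MODE and all defaults are assumptions.

# run_bndf JAR MODULE DATA_PATH MONGO_URI
# Cluster settings are read from environment variables named after the
# placeholders in the spark-submit command; defaults are illustrative.
run_bndf() {
    if [ "$#" -ne 4 ]; then
        echo "usage: run_bndf JAR MODULE DATA_PATH MONGO_URI" >&2
        return 2
    fi
    local jar="$1"
    local module="$2"
    local data_path="$3"
    local mongo_uri="$4"

    # Rebuild the positional parameters as the full spark-submit command line,
    # keeping DATA_PATH before MONGO_URI as the JAR expects.
    set -- spark-submit \
        --class "com.ipm.nslab.bndf.${module}" \
        --master "${SPARK_MASTER_URL:-spark://localhost:7077}" \
        --deploy-mode "${DEPLOY_MODE:-client}" \
        --executor-memory "${SPARK_EXECUTOR_MEMORY:-4}G" \
        --total-executor-cores "${SPARK_EXECUTOR_CORES:-4}" \
        --driver-memory "${SPARK_DRIVER_MEMORY:-2}G" \
        "$jar" "$data_path" "$mongo_uri"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 prints the assembled command without submitting anything, which is a cheap way to confirm the argument order before launching a job.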

Data Locality

It is very important that input data be placed on the same network as the cluster, or on a fast-access network that communicates with the cluster's network. There are two ways to achieve this:

  • Create a shared storage accessible by all the nodes in the cluster.
  • Copy input data directly into HDFS using the WebHDFS REST API or another protocol such as Mountable HDFS.

Otherwise, poor data locality can create major bottlenecks when processing data with Spark.
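As a rough illustration of the WebHDFS option, the sketch below uploads one local file using the two-step CREATE operation of the WebHDFS REST API: the NameNode first answers with a redirect, and the file body is then sent to the DataNode named in that redirect. The function names, the host and port, the paths, and the choice of overwrite=false are all assumptions made for the example. The user.name parameter only applies to clusters without Kerberos security.

```shell
#!/usr/bin/env bash
# Sketch only: function names, hosts, and paths are illustrative assumptions.

# webhdfs_create_url NAMENODE_HOST_PORT HDFS_PATH [USER]
# Print the WebHDFS URL for op=CREATE. HDFS_PATH must start with "/".
webhdfs_create_url() {
    local namenode="$1"
    local hdfs_path="$2"
    local user="${3:-}"
    local url="http://${namenode}/webhdfs/v1${hdfs_path}?op=CREATE&overwrite=false"
    if [ -n "$user" ]; then
        url="${url}&user.name=${user}"
    fi
    printf '%s\n' "$url"
}

# webhdfs_put LOCAL_FILE NAMENODE_HOST_PORT HDFS_PATH [USER]
webhdfs_put() {
    local file="$1"
    local create_url
    local location
    create_url="$(webhdfs_create_url "$2" "$3" "${4:-}")"

    # Step 1: ask the NameNode where to write; it replies with a redirect
    # whose Location header points at a DataNode.
    location="$(curl -s -i -X PUT "$create_url" \
        | tr -d '\r' \
        | awk 'tolower($1) == "location:" {print $2; exit}')"
    if [ -z "$location" ]; then
        echo "no redirect received from NameNode" >&2
        return 1
    fi

    # Step 2: send the file contents to the DataNode URL.
    curl -s -f -X PUT -T "$file" "$location"
}
```

A call might look like webhdfs_put session1.mat namenode.example:9870 /data/session1.mat alice, where every argument is a placeholder. On a machine that already has the Hadoop client configured, hdfs dfs -put session1.mat /data/ achieves the same result with less ceremony.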

Limitations, Current, and Future Work

BNDF is at an early stage of development and requires various extensions to become a fully functional framework. These are the best-known limitations:

  • Currently, BNDF only supports MAT files. I am working on adding commonly used file formats such as NWB, NIX, and Nio.
  • The functionality of the BNDF schema, or more generally its parser, is limited. Since most of the file formats commonly used in the neuroscience community are in some way based on the HDF5 file format, I decided to create a general parser based on HDF5 for converting commonly used file formats into a more distribution-friendly structure.
  • From the processing perspective, BNDF currently has only a distributed spike sorter module. Various extensions should be considered here, such as:
    1. The spike sorter module needs to be improved and generalized so that it can handle more up-to-date and complex spike sorting algorithms.
    2. Different distributed processing algorithms should be added to BNDF.
  • Currently, the usage of BNDF is restricted to spark-submit jobs, which are not trivial from the user's point of view. Several improvements may be considered in future work to make it easier to use, for example:
    1. Creation of a shell-based application for interacting easily with BNDF.
    2. Creation of a web-based application to access BNDF core functionality even more easily.

Documentation

BNDF documentation is available in bndf-doc.
