
MATLAB Interface for Apache Parquet

Introduction

Apache™ Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regardless of the choice of data processing framework, data model or programming language.

The MATLAB interface for Apache Parquet supports reading and writing Apache Parquet files from within MATLAB. Functionality includes:

  • Reading and writing local Parquet files
  • Access to Parquet file metadata
  • A MATLAB Datastore for reading Parquet files

Starting with R2019a, MATLAB ships with built-in Parquet support. On those releases, consider using it instead; see https://www.mathworks.com/help/releases/R2019b/matlab/parquet-files.html.

Requirements

MathWorks Products (http://www.mathworks.com)

  • Requires MATLAB release R2017b or newer

3rd Party Products:

To build the JAR file, make sure the following products are installed (or download and install them from the links provided):

Apache Hadoop installation and configuration

Linux/MacOS

Download the binaries from the official Apache Hadoop website and unpack them into a local folder.
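As a rough sketch for Linux/macOS, the download-and-unpack step could be scripted with the POSIX shell functions below. The URL follows the usual layout of the Apache release archive, but the version (2.8.3, chosen only to match the winutils build linked for Windows) and the destination folder are assumptions, and `hadoop_url` / `install_hadoop` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: fetch and unpack an Apache Hadoop binary release on Linux/macOS.
# Assumption: 2.8.3 is used only to match the Windows winutils build;
# this README does not mandate a particular Hadoop version.

# Print the Apache archive URL for a given Hadoop release.
hadoop_url() {
  version="$1"
  printf 'https://archive.apache.org/dist/hadoop/common/hadoop-%s/hadoop-%s.tar.gz\n' \
    "$version" "$version"
}

# Download a release and unpack it into a destination folder (needs network).
install_hadoop() {
  version="$1"
  dest="$2"
  mkdir -p "$dest" || return 1
  # Stream the tarball straight into tar; "-f -" reads from stdin.
  curl -fL "$(hadoop_url "$version")" | tar -xzf - -C "$dest"
}

# Example (hypothetical destination):
#   install_hadoop 2.8.3 "$HOME/opt"
#   # the unpacked installation is then $HOME/opt/hadoop-2.8.3
```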

Microsoft® Windows®

On Windows, download a compatible build of the winutils.exe utility from https://github.com/steveloughran/winutils/raw/master/hadoop-2.8.3/bin/winutils.exe. We recommend placing the executable at <repo_root>\Software\MATLAB\lib\hadoop\bin\winutils.exe.

Note that you must first create the lib\hadoop\bin folders manually.
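If you work in a POSIX shell on Windows (Git Bash, for example; this is an assumption, since the README does not prescribe a shell), creating the folders and fetching winutils.exe could be sketched as follows. `<repo_root>` is passed in as an argument, and `winutils_dir` / `prepare_winutils_dir` are hypothetical helpers, not part of the interface.

```shell
#!/bin/sh
# Sketch: prepare <repo_root>/Software/MATLAB/lib/hadoop/bin for winutils.exe.
# Assumption: run from a POSIX shell such as Git Bash; helper names are hypothetical.

# Print the recommended winutils.exe folder for a repository root.
winutils_dir() {
  printf '%s/Software/MATLAB/lib/hadoop/bin\n' "$1"
}

# Create the lib/hadoop/bin folders (like mkdir -p, no error if present)
# and print the resulting path.
prepare_winutils_dir() {
  dir="$(winutils_dir "$1")"
  mkdir -p "$dir" || return 1
  printf '%s\n' "$dir"
}

# Example (hypothetical repository location):
#   dir=$(prepare_winutils_dir /c/work/matlab-parquet)
#   curl -fL -o "$dir/winutils.exe" \
#     https://github.com/steveloughran/winutils/raw/master/hadoop-2.8.3/bin/winutils.exe
```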

More detailed information on Windows install can be found here.

Installation

Installing the interface requires building the support package (Jar file) and setting the HADOOP_HOME environment variable. Before proceeding:

  • Install a Java SDK and Maven.
  • Clone the repository, or download and unpack the latest source release.
  • Create or set the HADOOP_HOME environment variable. On Linux/macOS, point it to the local Apache™ Hadoop® installation folder. On Windows, point it to the folder that contains bin\winutils.exe (for the layout recommended above, <repo_root>\Software\MATLAB\lib\hadoop).

Download links for these products are provided in the 3rd Party Products section.
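A quick preflight check that the build tools are on your PATH might look like this. It assumes the Java SDK provides `java` and `javac` and that Maven provides `mvn`; `require_tools` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: report any required command that is not on PATH.
# Returns 0 when every tool is found, 1 otherwise (helper name is hypothetical).
require_tools() {
  rc=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf 'missing: %s\n' "$tool"
      rc=1
    fi
  done
  return "$rc"
}

# Example:
#   require_tools java javac mvn && echo "Java SDK and Maven found"
```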

To set the environment variable, follow the conventions of your operating system. Note that the variable must be set before starting MATLAB; changing it from within MATLAB will not take effect.
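On Linux/macOS, one way to satisfy this is to export HADOOP_HOME in your shell profile and launch MATLAB from a shell where it is set. The sketch below shows that in comments, plus a small hypothetical `check_hadoop_home` function; the Hadoop path is a placeholder. On Windows you would instead set the variable through the system environment-variable settings.

```shell
#!/bin/sh
# Sketch for Linux/macOS. Assumption: "$HOME/opt/hadoop-2.8.3" is a
# hypothetical unpack location; adjust it to your installation.
#
#   export HADOOP_HOME="$HOME/opt/hadoop-2.8.3"   # e.g. in ~/.bashrc or ~/.zshrc
#   matlab &                                       # MATLAB inherits HADOOP_HOME

# Return 0 if HADOOP_HOME is set and names an existing directory.
check_hadoop_home() {
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set" >&2
    return 1
  fi
  if [ ! -d "$HADOOP_HOME" ]; then
    echo "HADOOP_HOME does not exist: $HADOOP_HOME" >&2
    return 1
  fi
  return 0
}
```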

Build the Jar file

To install the interface, you must first build the Jar file.

cd <this_repo>
cd Software/Java
mvn clean package
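To confirm that the build produced a JAR before switching to MATLAB, you could list Maven's output folder. This assumes the pom.xml keeps Maven's default `target/` output directory; the exact JAR name is not given here, so the sketch simply globs for `*.jar`. `built_jars` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: print every JAR in <java_dir>/target; return 1 if none exists.
# Assumption: Maven's default output folder (target/) is used by the pom.xml.
built_jars() {
  java_dir="$1"   # e.g. <this_repo>/Software/Java
  found=1
  for jar in "$java_dir"/target/*.jar; do
    # An unmatched glob stays literal, so skip names that do not exist.
    [ -e "$jar" ] || continue
    printf '%s\n' "$jar"
    found=0
  done
  return "$found"
}

# Example:
#   built_jars "<this_repo>/Software/Java" || echo "no JAR found; check the mvn output"
```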

Install & Verify MATLAB package

Now you can open MATLAB and install the support package.

cd <this_repo>/Software
install

Restart MATLAB and verify the installation:

Windows

parquetwin('verify')

In case of issues, please refer to the following documentation. Otherwise, you're good to go.

Linux

parquettools('meta')

Usage

To write a variable to a Parquet file:

A = magic(5);
parquetwrite('m5.parquet', A);

and you can read the same file with

B = parquetread('m5.parquet');

A few unit tests can be run with

results = runParquetTests()

For more details, look at the Basic Usage document.

Documentation

See the documentation for more information.

License

The license for MATLAB interface for Parquet is available in the LICENSE.md file in this GitHub repository. This package uses certain third-party content which is licensed under separate license agreements. See the pom.xml file for third-party software downloaded at build time.

Enhancement Request

Provide suggestions for additional features or capabilities using the following link:
https://www.mathworks.com/products/reference-architectures/request-new-reference-architectures.html

Support

Email: [email protected]


Contributors

asollander, aelhelou, hosagrahara, dependabot[bot]
