This project forked from pytorch/torcharrow

High performance model preprocessing library on PyTorch

License: BSD 3-Clause "New" or "Revised" License

TorchArrow: a data processing library for PyTorch

This library currently does not have a stable release. The API and implementation may change. Future changes may not be backward compatible.

TorchArrow is a torch.Tensor-like Python DataFrame library for data preprocessing in PyTorch models, with two high-level features:

  • DataFrame library (like Pandas) with strong GPU or other hardware acceleration (under development) and PyTorch ecosystem integration.
  • Columnar memory layout based on Apache Arrow with strong variable-width and nested data support (such as string, list, map) and Arrow ecosystem integration.
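To make the second point concrete, here is a minimal, purely conceptual sketch of how an Arrow-style columnar layout can store variable-width strings: one contiguous byte buffer, one offsets buffer, and one validity marker per entry. The `StringColumn` class is hypothetical and written only for illustration; it is not TorchArrow's implementation, and it uses a plain list of booleans where Arrow uses a packed validity bitmap.

```python
from typing import List, Optional


class StringColumn:
    """Toy columnar string storage (illustrative only, not TorchArrow code)."""

    def __init__(self, values: List[Optional[str]]) -> None:
        self.data = bytearray()   # all UTF-8 bytes, stored back to back
        self.offsets = [0]        # offsets[i]..offsets[i + 1] delimits value i
        self.validity = []        # False marks a null entry
        for value in values:
            if value is not None:
                self.data.extend(value.encode("utf-8"))
            # A null or empty value adds no bytes, so its offset repeats.
            self.offsets.append(len(self.data))
            self.validity.append(value is not None)

    def __len__(self) -> int:
        return len(self.validity)

    def __getitem__(self, i: int) -> Optional[str]:
        # Only non-negative indices are handled in this sketch.
        if not self.validity[i]:
            return None
        start, end = self.offsets[i], self.offsets[i + 1]
        return self.data[start:end].decode("utf-8")


col = StringColumn(["torch", None, "arrow", ""])
print(col.offsets)  # [0, 5, 5, 10, 10]
print(col[2])       # arrow
```

Because every value lives in one buffer, a null and an empty string differ only in the validity marker, and slicing a value never requires a per-row Python object until it is read.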

Installation

You will need Python 3.7 or later. We also highly recommend installing into a Miniconda environment.

First, set up an environment. If you are using conda, create a conda environment:

conda create --name torcharrow python=3.7
conda activate torcharrow

Version Compatibility

The following table lists the corresponding torch and torcharrow versions and the Python versions each supports.

torch            torcharrow       python
main / nightly   main / nightly   >=3.7, <=3.10
1.13.0           0.2.0            >=3.7, <=3.10
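As a quick way to check an interpreter against the Python range in the table, a small sketch like the following may help. The helper name `is_supported_python` and the two constants are assumptions made for this example; they are not part of torcharrow.

```python
import sys

# Bounds taken from the compatibility table above (inclusive on both ends).
MIN_PYTHON = (3, 7)
MAX_PYTHON = (3, 10)


def is_supported_python(version=None):
    """Return True if the (major, minor) pair falls inside the listed range."""
    if version is None:
        version = sys.version_info
    major_minor = tuple(version[:2])
    return MIN_PYTHON <= major_minor <= MAX_PYTHON


print(is_supported_python((3, 9)))   # True
print(is_supported_python((3, 11)))  # False
```

Calling `is_supported_python()` with no argument checks the interpreter that is currently running.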

Colab

Follow the instructions in this Colab notebook

Nightly Binaries

Experimental nightly binaries for macOS (requires macOS SDK >= 10.15) and Linux (requires glibc >= 2.17) are available for Python 3.7, 3.8, and 3.9 and can be installed via pip wheels:

pip install --pre torcharrow -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
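Before running the command above, it can be useful to check whether the current machine matches the stated requirements. The sketch below is an assumption-laden helper, not an official tool: `nightly_wheel_likely_available` is a hypothetical name, it only checks the Python version and (on Linux) the glibc version reported by the standard `platform` module, and it does not attempt to verify the macOS SDK requirement.

```python
import platform
import sys

# Requirements as stated above for the experimental nightly wheels.
NIGHTLY_PYTHONS = {(3, 7), (3, 8), (3, 9)}
MIN_GLIBC = (2, 17)


def parse_version(text):
    """Turn '2.31' into (2, 31); non-numeric parts are ignored."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def nightly_wheel_likely_available(python_version, system,
                                   libc_name="", libc_version=""):
    """Rough pre-check; a True result is a hint, not a guarantee."""
    if tuple(python_version[:2]) not in NIGHTLY_PYTHONS:
        return False
    if system == "Linux":
        return libc_name == "glibc" and parse_version(libc_version) >= MIN_GLIBC
    # On macOS the SDK version is not checked here; only the OS family is.
    return system == "Darwin"


libc_name, libc_version = platform.libc_ver()
print(nightly_wheel_likely_available(
    sys.version_info, platform.system(), libc_name, libc_version))
```

If the check returns False, building from source as described in the next section is the remaining option.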

From Source

If you are installing from source, you will need Python 3.7 or later and a C++17 compiler.

Get the TorchArrow Source

git clone --recursive https://github.com/pytorch/torcharrow
cd torcharrow
# if you are updating an existing checkout
git submodule sync --recursive
git submodule update --init --recursive
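A build can fail in confusing ways when a submodule was never initialized. As a rough sketch, the helper below parses the output of `git submodule status --recursive`, where a leading `-` on a line marks a submodule that is not initialized. The function names and the sample paths and hashes are made up for illustration and do not refer to TorchArrow's actual submodules.

```python
import subprocess


def uninitialized_submodules(status_output):
    """Return paths whose status line starts with '-' (not initialized)."""
    paths = []
    for line in status_output.splitlines():
        if line.startswith("-"):
            # Line shape: "-<sha1> <path>"
            fields = line[1:].split()
            if len(fields) >= 2:
                paths.append(fields[1])
    return paths


def check_checkout(repo_dir="."):
    """Run git in repo_dir and list submodules that still need initializing."""
    result = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, check=True,
        stdout=subprocess.PIPE, universal_newlines=True,
    )
    return uninitialized_submodules(result.stdout)


# Hypothetical sample output, used here so the sketch runs without a checkout.
SAMPLE = (
    "-1111111111111111111111111111111111111111 third_party/example_a\n"
    " 2222222222222222222222222222222222222222 third_party/example_b (v1.0)\n"
)
print(uninitialized_submodules(SAMPLE))  # ['third_party/example_a']
```

Running `check_checkout()` from the repository root after the commands above should return an empty list when every submodule is in place.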

Install Dependencies

On macOS

Homebrew is required to install development tools on macOS.

# Install dependencies from Brew
brew install --formula ninja flex bison cmake ccache icu4c boost gflags glog libevent

# Build and install other dependencies
scripts/build_mac_dep.sh ranges_v3 fmt double_conversion folly re2

On Ubuntu (20.04 or later)

# Install dependencies from APT
apt install -y g++ cmake ccache ninja-build checkinstall \
    libssl-dev libboost-all-dev libdouble-conversion-dev libgoogle-glog-dev \
    libgflags-dev libevent-dev libre2-dev libfl-dev libbison-dev
# Build and install folly and fmt
scripts/setup-ubuntu.sh
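After installing the dependencies, a short check that the main build tools are actually on `PATH` can save a failed build. The sketch below is one possible way to do that; `missing_tools` and the `BUILD_TOOLS` list are assumptions chosen for this example (only tools that appear in both the macOS and Ubuntu package lists above), not something the build scripts provide.

```python
import shutil

# Command-line tools named in both dependency lists above.
# On Ubuntu the ninja-build package provides the `ninja` executable.
BUILD_TOOLS = ["cmake", "ninja", "ccache"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


missing = missing_tools(BUILD_TOOLS)
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("All listed build tools were found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.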

Install TorchArrow

For local development, you can build in debug mode:

DEBUG=1 python setup.py develop

Then run the unit tests with:

python -m unittest -v

To build and install TorchArrow in release mode:

python setup.py install
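Once either install command has finished, a simple smoke check is to ask Python whether the package can be located. The following sketch uses only the standard library; `is_importable` is a hypothetical helper name, and the check confirms that the module can be found, not that the native extension loads correctly.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


print("torcharrow importable:", is_importable("torcharrow"))
```

If this prints False inside the environment you built in, the most likely causes are a different active environment or a build that did not complete.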

License

TorchArrow is BSD licensed, as found in the LICENSE file.

Contributors

wenleix, facebook-github-bot, bearzx, damianr99, oswinc, vancexu, kgpai, scotts, mbasmanova, ylgh, andrewaikens87, tianshu-bao, nayef211, hanqiwu0704, kit1980, miiiira, laithsakka, kedar-parab, dracifer, ejguan, amyreese, syscl, youmeiz, waitingkuo, yellpine, thatch, stroxler, syoummer, pedroerp, parmeet
