AIStore

AIStore is a lightweight object storage system with the capability to linearly scale out with each added storage node and a special focus on petascale deep learning.


AIStore (AIS for short) is a built-from-scratch, lightweight storage stack tailored for AI apps. It's an elastic cluster that can grow and shrink at runtime and can be ad-hoc deployed, with or without Kubernetes, anywhere from a single Linux machine to a bare-metal cluster of any size.

AIS consistently shows balanced I/O distribution and linear scalability across arbitrary numbers of clustered nodes. The ability to scale linearly with each added disk was, and remains, one of the main incentives. Much of the initial design was also driven by the idea of offloading custom dataset transformations (often referred to as ETL). And finally, since AIS is a software system that aggregates Linux machines to provide storage for user data, there's requirement number one: reliability and data protection.

Features

  • Deploys anywhere. AIS clusters are immediately deployable on any commodity hardware, on any Linux machine(s).
  • Highly available control and data planes, end-to-end data protection, self-healing, n-way mirroring, erasure coding, and arbitrary number of extremely lightweight access points.
  • REST API. Comprehensive native HTTP-based API, as well as compliant Amazon S3 API to run unmodified S3 clients and apps (see the hedged boto3 sketch after this list).
  • Unified namespace across multiple remote backends including Amazon S3, Google Cloud, and Microsoft Azure.
  • Network of clusters. Any AIS cluster can attach any other AIS cluster, thus gaining immediate visibility and fast access to the respective hosted datasets.
  • Turn-key cache. Can be used as a standalone highly-available protected storage and/or LRU-based fast cache. Eviction watermarks, as well as numerous other management policies, are per-bucket configurable.
  • ETL offload. The capability to run I/O intensive custom data transformations close to data - offline (dataset to dataset) and inline (on-the-fly).
  • File datasets. AIS can be immediately populated from any file-based data source (local or remote, ad-hoc/on-demand or via asynchronous batch).
  • Small files. Sharding. To serialize small files, AIS supports TAR, TAR.GZ, ZIP, and MessagePack formats, and provides the entire spectrum of operations to make the corresponding sharding transparent to the apps.
  • Kubernetes. Provides for easy Kubernetes deployment via a separate GitHub repo and AIS/K8s Operator.
  • Command line management. Integrated powerful CLI for easy management and monitoring.
  • Access control. For security and fine-grained access control, AIS includes OAuth 2.0 compliant Authentication Server (AuthN). A single AuthN instance executes CLI requests over HTTPS and can serve multiple clusters.
  • Distributed shuffle extension for massively parallel resharding of very large datasets.
  • Batch jobs. APIs and CLI to start, stop, and monitor documented batch operations, such as prefetch, download, copy or transform datasets, and many more.
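To illustrate the S3-compliant API mentioned above, here is a minimal, hedged Python sketch that points a standard boto3 client at an AIS proxy. The endpoint URL, the /s3 path suffix, the placeholder credentials, and the bucket/object names are assumptions for illustration only; consult the AIS S3 compatibility documentation for the exact endpoint of your deployment.

    # Hedged sketch: the endpoint, credentials, and names below are illustrative assumptions.
    import boto3
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        endpoint_url="http://localhost:8080/s3",          # assumed AIS proxy + S3-compatibility path
        aws_access_key_id="dummy",                        # placeholder credentials
        aws_secret_access_key="dummy",
        region_name="us-east-1",                          # boto3 wants a region; AIS does not care
        config=Config(s3={"addressing_style": "path"}),   # path-style addressing for a non-AWS endpoint
    )

    # Unmodified S3 calls against the AIS cluster:
    print([b["Name"] for b in s3.list_buckets()["Buckets"]])
    s3.put_object(Bucket="my-ais-bucket", Key="hello.txt", Body=b"hello")
    print(s3.get_object(Bucket="my-ais-bucket", Key="hello.txt")["Body"].read())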

AIS runs natively on Kubernetes and features an open data format - thus, the freedom to copy or move your data out of AIS at any time using the familiar Linux tar(1), scp(1), rsync(1), and similar tools.

For developers and data scientists, AIS also provides a native Go API, a Python SDK, and PyTorch integration (see below).
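As a quick taste of the Python SDK, here is a minimal, hedged sketch of putting and getting a single object. The endpoint URL and the bucket/object names are placeholders, and the method names (bucket(), object(), put_content(), get()) reflect the SDK documentation at the time of writing; treat them as assumptions and check the Python SDK reference for your installed version.

    # Hedged sketch of the aistore Python SDK (pip install aistore); the endpoint
    # and all names below are illustrative assumptions.
    from aistore.sdk import Client

    client = Client("http://localhost:8080")   # assumed local AIS proxy endpoint

    bucket = client.bucket("my-ais-bucket")    # handle to an AIS bucket
    bucket.create()                            # may fail if the bucket already exists

    obj = bucket.object("hello.txt")
    obj.put_content(b"hello, AIStore")         # write a small object
    print(obj.get().read_all())                # read it back as bytes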

For the original AIStore white paper and design philosophy, an introduction to large-scale deep learning, and the most recently added features, please see the AIStore Overview (where you can also find six alternative ways to work with existing datasets). Videos and animated presentations can be found at videos.

Finally, getting started with AIS takes only a few minutes.


Deployment options

AIS deployment options, as well as intended (development vs. production vs. first-time) usages, are all summarized here.

Since the prerequisites boil down to, essentially, having Linux with a disk, the deployment options range from an all-in-one container to a petascale bare-metal cluster of any size, and from a single VM to multiple racks of high-end servers. But practical use cases require, of course, further consideration and may include:

  • Local playground: for AIS developers and development; Linux or Mac OS
  • Minimal production-ready deployment: uses a preinstalled Docker image and targets first-time users or researchers who want to start training their models on smaller datasets right away
  • Easy automated GCP/GKE deployment: for developers, first-time users, and AI researchers
  • Large-scale production deployment: requires Kubernetes and is provided via a separate repository, ais-k8s

Further, there's the capability referred to as global namespace: given HTTP(S) connectivity, AIS clusters can be easily interconnected to "see" each other's datasets. Hence, the idea is to start "small" and then gradually, incrementally build up high-performance shared capacity.

For detailed discussion on supported deployments, please refer to Getting Started.

For performance tuning and preparing AIS nodes for bare-metal deployment, see performance.

Installing from release binaries

Generally, AIStore (cluster) requires at least some sort of deployment procedure. There are standalone binaries, though, that can be built from source or, alternatively, installed directly from GitHub:

$ ./deploy/scripts/install_from_binaries.sh --help

The script installs aisloader and the CLI from the most recent (or the previous) GitHub release. For the CLI, it also enables auto-completions, which is strongly recommended.

PyTorch integration

AIS is one of the PyTorch Iterable Datapipes.

Specifically, the TorchData library provides AISFileLister and AISFileLoader to list and, respectively, load data from AIStore.
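Below is a minimal, hedged sketch of wiring these two DataPipes together. The AIS endpoint URL and bucket prefix are placeholders, and the keyword arguments reflect TorchData releases that ship the AIS DataPipes; treat the details as assumptions and consult the TorchData documentation for your version.

    # Hedged sketch: the endpoint and bucket prefix are illustrative assumptions.
    from torchdata.datapipes.iter import AISFileLister, AISFileLoader, IterableWrapper

    AIS_URL = "http://localhost:8080"                     # assumed AIS proxy endpoint

    # A remote-backend bucket prefix (e.g. s3:// or gs://) is assumed to work here as well.
    prefixes = IterableWrapper(["ais://my-ais-bucket/"])

    urls = AISFileLister(url=AIS_URL, source_datapipe=prefixes)   # enumerate object URLs
    files = AISFileLoader(url=AIS_URL, source_datapipe=urls)      # stream each object's bytes

    for obj_url, stream in files:
        print(obj_url, len(stream.read()))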

Further references and usage examples can be found in our technical blog at https://aiatscale.org/blog.

Since AIS natively supports a number of remote backends, you can also use (PyTorch + AIS) to iterate over Amazon S3 and Google Cloud buckets, and more.

Reuse

This repo includes SGL and Slab allocator intended to optimize memory usage, Streams and Stream Bundles to multiplex messages over long-lived HTTP connections, and a few other sub-packages providing rather generic functionality.

With a little effort, they could all be extracted and used outside of AIS.

Guides and References

License

MIT

Author

Alex Aizman (NVIDIA)

