mabel's Introduction

mabel is a Data Engineering platform designed to run in serverless environments.

mabel runs only when you need it and scales to zero when idle, which makes it efficient and well suited to platforms like Kubernetes, GCP Cloud Run, AWS Fargate and Knative.

Focus on What Matters

We've built mabel so that Data Analysts can write complex data engineering tasks quickly and easily, and get on with doing what they do best.

from mabel import Reader

data = Reader(dataset="test_data")
print(data.count())

Key Features

  • On-the-fly compression
  • Low memory requirements, even with terabytes of data
  • Indexing and partitioning of data for fast reads
  • Cursors for tracking reading position between processes
  • Partial SQL DQL (Data Query Language) support
  • Schema and data_expectations validation
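
To make a few of these features concrete, here is a minimal sketch of on-the-fly compression, low-memory streaming and a resumable cursor. It uses only the Python standard library and is not mabel's implementation: `gzip` stands in for the zstandard and LZ4 codecs, and the helper names `write_records` and `read_records`, as well as the "cursor is a record count" convention, are assumptions made for illustration.

```python
import gzip
import json
import os
import tempfile


def write_records(path, records):
    """Compress records as they are written, one JSON document per line."""
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")


def read_records(path, cursor=0):
    """Yield (next_cursor, record) pairs, skipping records before `cursor`.

    The file is decompressed as it is read, so only one line is held
    in memory at a time regardless of the file size.
    """
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for position, line in enumerate(handle):
            if position >= cursor:
                yield position + 1, json.loads(line)


# Hypothetical data set: five small records in a temporary folder.
path = os.path.join(tempfile.mkdtemp(), "records.jsonl.gz")
write_records(path, ({"id": i} for i in range(5)))

# A first process reads two records, then stops and keeps its cursor.
reader = read_records(path)
cursor, first = next(reader)
cursor, second = next(reader)
reader.close()

# A second process resumes from the saved cursor.
remaining = [record for _, record in read_records(path, cursor)]
print(cursor, remaining)
```

The cursor here is just an integer, so it can be passed between processes in any convenient way, such as a small file or a message.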

Installation

From PyPI (recommended)

pip install --upgrade mabel

From GitHub

pip install --upgrade git+https://github.com/mabel-dev/mabel

A preview release of mabel is available from PyPI:

pip install --upgrade mabelbeta

You may need to uninstall mabel manually before the preview version will install.

These versions are usually labelled with an a (signifying alpha status) in the library version. Alpha versions are more likely to have functional issues.
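
If you are unsure which build is installed, a check along the following lines may help. It is a sketch that assumes version strings follow the usual PEP 440 pattern, with an `a` segment such as `0.5.0a1` for alpha builds; the helper names `is_alpha` and `installed_version` are illustrative and are not part of mabel.

```python
import re
from importlib import metadata


def is_alpha(version):
    """Return True if a version string carries an alpha marker, e.g. 0.5.0a1."""
    return re.search(r"\d(a|alpha)\d*", version.lower()) is not None


def installed_version(package):
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The distribution name "mabel" is taken from the install commands above.
version = installed_version("mabel")
if version is None:
    print("mabel is not installed")
else:
    print(version, "(alpha)" if is_alpha(version) else "(release)")
```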

Guides

How to Read Data

Dependencies

  • orjson for JSON (de)serialization
  • bitarray for handling high-density boolean data
  • siphashc for non-cryptographic hashing
  • pydantic to define internal data models
  • zstandard for real-time on-disk compression
  • LZ4 for real-time in-memory compression
  • simdjson for fast JSON deserialization
  • cython for precompilation

Optional dependencies, which are usually required only for specific features, are listed in tests/requirements.txt.

Integrations

mabel comes with adapters for the following data services:

  • Google Cloud Storage
  • MinIO
  • AWS S3
  • Azure Blob Storage
  • Local Storage

mabel can be extended with adapters for other data services as required.
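
As a rough illustration of the adapter idea, the sketch below defines a small storage interface and a local-disk implementation. Every name in it (`StorageAdapter`, `LocalAdapter`, `list_blobs`, `read_lines`, `count_records`) is hypothetical and chosen for this example; mabel's actual adapter classes and method names may differ, so treat this as the general pattern and not as the library's API.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Iterator, List


class StorageAdapter(ABC):
    """Hypothetical interface; not mabel's actual adapter base class."""

    @abstractmethod
    def list_blobs(self, prefix: str) -> List[str]:
        """Return the names of all blobs found under a prefix."""

    @abstractmethod
    def read_lines(self, blob: str) -> Iterator[str]:
        """Yield the lines of a blob one at a time."""


class LocalAdapter(StorageAdapter):
    """Adapter backed by the local file system."""

    def list_blobs(self, prefix: str) -> List[str]:
        found = []
        for root, _dirs, files in os.walk(prefix):
            found.extend(os.path.join(root, name) for name in files)
        return sorted(found)

    def read_lines(self, blob: str) -> Iterator[str]:
        with open(blob, "r", encoding="utf-8") as handle:
            for line in handle:
                yield line.rstrip("\n")


def count_records(adapter: StorageAdapter, prefix: str) -> int:
    """Count lines across all blobs using only the adapter interface."""
    return sum(
        1 for blob in adapter.list_blobs(prefix) for _ in adapter.read_lines(blob)
    )


# Hypothetical data set: one file holding two JSON lines.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "part-0.jsonl"), "w", encoding="utf-8") as handle:
    handle.write('{"id": 1}\n{"id": 2}\n')

total = count_records(LocalAdapter(), folder)
print(total)
```

Because `count_records` depends only on the interface, supporting another service would mean writing one more subclass without touching the reading logic.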

Deployment and Execution

mabel supports running on a range of platforms, including:

  • Docker
  • Kubernetes
  • Windows (Notice 1)
  • Linux (Notice 2)

Notice 1: Some non-core features are not available on Windows.
Notice 2: Tested on Debian (WSL) and Ubuntu.

How Can I Contribute?

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

If you have a suggestion for an improvement or have found a bug, raise a ticket or start a discussion.

Want to help build mabel? See the contribution guidance.

License

Apache 2.0


Contributors

cclauss, dobb1n, fossabot, gva-jjoyce, joocer, snyk-bot, xb500
