gnk-delegate / spetlr

This project forked from spetlr-org/spetlr


A python SPark ETL libRary (SPETLR) for Databricks. https://discord.gg/p9bzqGybVW

Home Page: https://spetlr.com

License: MIT License


spetlr

A python ETL libRary (SPETLR) for Databricks powered by Apache SPark.

Visit SPETLR official webpage: https://spetlr.com/

NEWS

Support for LTS 9.1 is ending. See the issue for the discussion.

The TransformerNC class will be removed permanently. Follow the PR.


Description

SPETLR offers many tools for ETL work in Databricks. To help you decide whether SPETLR fits your needs, here are its core features:

  • ETL framework: A common ETL framework that enables reusable transformations in an object-oriented manner. Standardized structures facilitate cooperation in large teams.

  • Integration testing: A framework for creating test databases and tables before deploying to production in order to ensure reliable and stable data platforms. An additional layer of data abstraction allows full integration testing.

  • Handlers: Standard connectors with commonly used options reduce boilerplate.
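The object-oriented ETL idea in the first bullet can be sketched without Spark. This is a minimal illustration and not SPETLR's API. The classes `Reader`, `Step`, `Writer` and `Pipeline` are hypothetical, and lists of dictionaries stand in for Spark DataFrames so that the example runs with plain Python.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

# Hypothetical stand-in for a Spark DataFrame so the sketch runs without pyspark
Rows = List[Dict[str, object]]


class Reader(ABC):
    """Extract stage: produces rows from some source."""

    @abstractmethod
    def read(self) -> Rows: ...


class Step(ABC):
    """Transform stage: a reusable, independently testable transformation."""

    @abstractmethod
    def process(self, rows: Rows) -> Rows: ...


class Writer(ABC):
    """Load stage: persists rows to some target."""

    @abstractmethod
    def save(self, rows: Rows) -> None: ...


class Pipeline:
    """Runs one reader, then each step in order, then hands the result to every writer."""

    def __init__(self, reader: Reader, steps: List[Step], writers: List[Writer]):
        self.reader = reader
        self.steps = steps
        self.writers = writers

    def execute(self) -> Rows:
        rows = self.reader.read()
        for step in self.steps:
            rows = step.process(rows)
        for writer in self.writers:
            writer.save(rows)
        return rows


# Concrete, hypothetical implementations used in the example below
class StaticReader(Reader):
    def __init__(self, rows: Rows):
        self.rows = rows

    def read(self) -> Rows:
        return list(self.rows)


class DropNulls(Step):
    """Removes rows whose given column is missing or None."""

    def __init__(self, column: str):
        self.column = column

    def process(self, rows: Rows) -> Rows:
        return [r for r in rows if r.get(self.column) is not None]


class MemoryWriter(Writer):
    """Collects saved rows in memory instead of writing to a table."""

    def __init__(self):
        self.saved: Rows = []

    def save(self, rows: Rows) -> None:
        self.saved.extend(rows)


if __name__ == "__main__":
    sink = MemoryWriter()
    Pipeline(StaticReader([{"id": 1}, {"id": None}]), [DropNulls("id")], [sink]).execute()
    print(sink.saved)
```

Each transformation is a small class with a single `process` method. That makes it reusable across pipelines and testable in isolation, which is the kind of standardized structure the ETL framework bullet describes.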

For more information, visit SPETLR official webpage: https://spetlr.com/

Important Notes

This package cannot be run or tested without access to pyspark. However, installing pyspark through our installer caused problems when a different version of pyspark was needed. For that reason, pyspark was removed from the installer's dependencies and must be provided separately.
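Because the installer does not pull in pyspark, a script can check for it before importing SPETLR. The following is a minimal sketch that uses only the standard library. The helper `module_available` is hypothetical. `importlib.util.find_spec` locates a top-level package without importing it.

```python
import importlib.util


def module_available(name: str = "pyspark") -> bool:
    """Return True if the top-level module `name` can be imported here."""
    # find_spec only locates the package; it does not import it or start Spark
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if module_available("pyspark"):
        import pyspark

        print(f"pyspark {pyspark.__version__} found")
    else:
        print("pyspark not found: install a version matching your Databricks runtime")
```

A Databricks cluster already provides pyspark through its runtime. For local work, pick a pyspark version that matches the runtime you target.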

Installation

Install SPETLR from PyPI:

pip install spetlr

Development Notes

To prepare for development, please install these additional requirements:

  • Java 8
  • pip install -r test_requirements.txt

Then install the package locally:

python setup.py develop
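Java 8 reports its version in the legacy `1.8` form, whereas Java 9 and later print the major version first (for example `11.0.2`). The helpers below are hypothetical and not part of SPETLR. They show one way to confirm the Java prerequisite: run `java -version`, which writes its banner to standard error, and extract the major version.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the Java major version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Java 8 and earlier use the legacy "1.x" scheme, e.g. "1.8.0_292" -> 8
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def installed_java_major() -> Optional[int]:
    """Run `java -version` if java is on PATH and return its major version."""
    if shutil.which("java") is None:
        return None
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # The version banner goes to stderr; fall back to stdout just in case
    return parse_java_major(result.stderr or result.stdout)


if __name__ == "__main__":
    major = installed_java_major()
    print("Java not found" if major is None else f"Java {major} found (expected 8)")
```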

Testing

Local tests

After installing the development requirements, run the tests with:

pytest tests

These tests are located in the ./tests/local folder and only require a Python interpreter. Pull requests will not be accepted if these tests do not pass. If you add new features, please include corresponding tests.
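A test for a new feature can be a plain pytest-style function in the local test folder. By default, pytest collects functions named `test_*` from files named `test_*.py`. In this sketch the file name and `normalize_column_name` are both hypothetical. They only show the shape of a self-contained test that needs no cluster.

```python
# tests/local/test_normalize.py  (hypothetical file name)
import re


def normalize_column_name(name: str) -> str:
    """Hypothetical feature under test: turn a column name into snake_case."""
    # Replace runs of non-alphanumeric characters with a single underscore
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip())
    # Insert an underscore where a lowercase letter or digit meets an uppercase letter
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    return name.strip("_").lower()


def test_normalize_column_name():
    cases = {
        "CustomerId": "customer_id",
        "order date": "order_date",
        "  Total-Amount ": "total_amount",
    }
    for raw, expected in cases.items():
        assert normalize_column_name(raw) == expected
```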

Cluster tests

Tests in the ./tests/cluster folder are designed to run on a Databricks cluster. The pre-integration test uses an Azure resource deployment and can only be run by the spetlr-org admins.

To deploy the necessary Azure resources to your own Azure Tenant, run the following command:

.\.github\deploy\deploy.ps1 -uniqueRunId "yourUniqueId"

The value passed as uniqueRunId may contain only lowercase letters and digits, and it must not exceed 12 characters.
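The naming rule for uniqueRunId can be checked before running the deployment script. The following is a minimal sketch. `is_valid_unique_run_id` and `make_unique_run_id` are hypothetical helpers, and treating an empty string as invalid is an assumption.

```python
import re
import secrets
import string

# Rule from the deployment notes: lowercase letters and digits only, at most 12 characters.
# Requiring at least one character is an assumption.
_RUN_ID_PATTERN = re.compile(r"[a-z0-9]{1,12}")
_ALPHABET = string.ascii_lowercase + string.digits


def is_valid_unique_run_id(run_id: str) -> bool:
    """Return True if run_id satisfies the uniqueRunId naming rule."""
    return _RUN_ID_PATTERN.fullmatch(run_id) is not None


def make_unique_run_id(length: int = 12) -> str:
    """Generate a random id that satisfies the naming rule."""
    if not 1 <= length <= 12:
        raise ValueError("length must be between 1 and 12")
    return "".join(secrets.choice(_ALPHABET) for _ in range(length))


if __name__ == "__main__":
    run_id = make_unique_run_id()
    print(f'.\\.github\\deploy\\deploy.ps1 -uniqueRunId "{run_id}"')
```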

Afterward, execute the following commands:

.\.github\submit\build.ps1
.\.github\submit\submit_test_job.ps1


Contributing

Contributions to SPETLR are welcome. Any contribution is appreciated: not only new features, but any way you find to improve SPETLR.

If you have a suggestion that can enhance SPETLR, please fork the repository and create a pull request. Alternatively, you can open an issue with the "enhancement" tag.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/NewSPETLRFeature)
  3. Commit your Changes (git commit -m 'Add some SPETLRFeature')
  4. Push to the Branch (git push origin feature/NewSPETLRFeature)
  5. Open a Pull Request


Releases

Releases to PyPI are published by a GitHub Action, which must be triggered manually.


Contact

For any inquiries, please use the SPETLR Discord Server.

Contributors

mrmasterplan, laujohansson, tbtdg, christianhelle, jeppe-blixen-dg, lajdelegate, farbo, jeppeblixen, frederikgjensen, martinboge, andersbjernaa, gustavnk, lasseaj, radekbuczkowski, okami1, davidkallesen, gnk-delegate, perkops
