A lightweight framework to host your ETL data-pipeline
GitHub: https://github.com/elau1004/ETLite E-Mail: [email protected]
License: MIT License
We need a standalone SQLite DB, with the option to swap it out for a client-server DB such as PostgreSQL or Microsoft SQL Server Express.
This database shall hold the job definitions and provide the following functionality:
Use Case:
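As a rough illustration of the standalone SQLite option, the sketch below creates a control table idempotently, so that running the initialization a second time leaves existing rows untouched. The table and column names (`job`, `code`, `description`) and the `init_db` helper are assumptions made for this example, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: the real ETLite control tables are not specified here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS job (
    id          INTEGER PRIMARY KEY,
    code        TEXT NOT NULL UNIQUE,
    description TEXT NOT NULL
);
"""

def init_db(conn: sqlite3.Connection) -> None:
    """Create the control tables only if they do not exist yet."""
    conn.executescript(SCHEMA)
    conn.commit()

conn = sqlite3.connect(":memory:")  # a file path would be used in practice
init_db(conn)
conn.execute(
    "INSERT INTO job (code, description) VALUES (?, ?)", ("Demo", "Demo job")
)
init_db(conn)  # re-running must not drop or duplicate anything
job_count = conn.execute("SELECT COUNT(*) FROM job").fetchone()[0]
```

Going through the standard DB-API connection keeps the door open for swapping in a client-server database later, although the SQL dialect would still need checking per backend.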
The core engine shall execute a directed acyclic graph (DAG) of jobs. It needs to provide the best possible throughput.
The engine shall have the following features:
Use Case:
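One way to picture the DAG execution described above is the sketch below, which uses the standard library's `graphlib.TopologicalSorter` together with a thread pool so that a job starts as soon as all of its predecessors have finished. This is only an illustration under assumptions: `run_dag`, the job codes, and the choice of threads are hypothetical and not the framework's actual engine.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # Python 3.9+

def run_dag(dependencies, run_job, max_workers=4):
    """Run every job once, in dependency order, and return completion order.

    `dependencies` maps a job code to the set of job codes it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    finished = []
    pending = {}  # future -> job code
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every job whose predecessors are all done.
            for code in sorter.get_ready():
                pending[pool.submit(run_job, code)] = code
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any job failure
                code = pending.pop(future)
                finished.append(code)
                sorter.done(code)  # may unlock dependent jobs
    return finished

# Hypothetical job codes: "load" needs both extracts, "report" needs "load".
deps = {"load": {"extract_a", "extract_b"}, "report": {"load"}}
order = run_dag(deps, run_job=lambda code: None)
```

Independent jobs (the two extracts here) can run concurrently, which is where the throughput gain would come from.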
The public documentation needs to live on this project's wiki:
https://github.com/elau1004/ETLite/wiki
The front-page README.MD needs to be kept current.
After the profile result is persisted, we need to validate its shape to determine if the profile is within a configured threshold.
Thresholds can have the following limits:
If the profiled data fails any of the certification rules, an alert needs to be sent.
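The certification step could look roughly like the following sketch. Since the concrete limits are not listed here, it assumes each threshold is an optional lower and upper bound on a named metric; `Threshold`, `failed_rules`, and the metric names are hypothetical, and the `print` call merely stands in for a real alert channel.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Threshold:
    """Hypothetical limit on one profiled metric; None means unbounded."""
    metric: str
    lower: Optional[float] = None
    upper: Optional[float] = None

def failed_rules(profile: dict, thresholds: list) -> list:
    """Return one message per metric that is missing or outside its limits."""
    failures = []
    for rule in thresholds:
        value = profile.get(rule.metric)
        if value is None:
            failures.append(f"{rule.metric}: missing from profile")
        elif rule.lower is not None and value < rule.lower:
            failures.append(f"{rule.metric}: {value} below {rule.lower}")
        elif rule.upper is not None and value > rule.upper:
            failures.append(f"{rule.metric}: {value} above {rule.upper}")
    return failures

# Made-up profile values, for illustration only.
profile = {"row_count": 950, "null_ratio": 0.12}
rules = [
    Threshold("row_count", lower=900, upper=1100),
    Threshold("null_ratio", upper=0.05),
]
failures = failed_rules(profile, rules)
if failures:
    print("ALERT:", "; ".join(failures))  # stand-in for a real alert
```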
Use Case:
This framework needs to be packaged and published on the internet for the world to download and install. It should be installable using pip.
Use Case:
If all goes well, I, as a data engineer, should be able to install the package using:
pip install etlite
Each medium, especially REST API, should have a set of default workflow methods to be invoked by the framework. Returning a None value means the method is to be skipped.
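A minimal sketch of this convention follows, assuming the framework walks a fixed list of method names and treats a `None` return as "skip this step". The class names, the method names (`authenticate`, `build_request`, `parse_response`), and `run_workflow` are all hypothetical; the real workflow steps are not defined here.

```python
class RestApiJob:
    """Hypothetical base class: every default hook returns None (skipped)."""

    WORKFLOW = ("authenticate", "build_request", "parse_response")

    def authenticate(self):
        return None

    def build_request(self):
        return None

    def parse_response(self):
        return None

def run_workflow(job):
    """Call each workflow method in order and record which ones were skipped."""
    executed, skipped = [], []
    for name in job.WORKFLOW:
        result = getattr(job, name)()
        (skipped if result is None else executed).append(name)
    return executed, skipped

class DemoJob(RestApiJob):
    """A job that overrides only the step it needs."""

    def build_request(self):
        return {"url": "https://example.com/api", "params": {"q": "demo"}}

executed, skipped = run_workflow(DemoJob())
```

With this shape, a newly generated job runs without error even before any method is overridden, because every default step is simply skipped.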
We should have an interactive GUI front end to interface with the database. We should NOT implement this as a web application, because broad user reach is not required.
Use the following colors for status:
We should be able to invoke this framework from the command line, from a scheduler, or from a serverless trigger.
Use Case:
As a data engineer, for me to develop a new ETL, I would typically perform the following steps:
pipenv shell
Start my virtual environment.
pip install etlite
The above will install the framework we are building from PyPI.
etlite init
Once the Python framework is installed, we initialize a new default directory structure (which should be very similar to our project directory structure). If a directory already exists, it should warn and prompt before overwriting.
etlite init db
This step needs to be run once to initialize the Job Control DB, where we store our job configuration and the metrics collected as we run our jobs. If we change the config file to point to a different DB, a new set of DB objects shall be created there. Re-running this step should not corrupt your DB.
etlite new Code "Job Description"
This step registers a new unique job code and a description in the DB. In addition, a code template for the new job shall be created, which will require further modification. If the file already exists, it should warn and prompt before overwriting.
etlite list TABLE
This step lists the rows of a table in the ETL Control DB. The default is to list the rows in ascending ID order. The value for TABLE needs to be case-insensitive. This command can take option parameters that paginate the display. The default would be:
etlite run Code
This step runs the "jobCode" ETL. The default generated ETL script should run without error but do nothing until we enhance it. This command can take many option parameters, all of which will be validated, such as:
This command should also provide task-based options, where a task is a certain part of the job.
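The commands in the steps above could be wired up along the lines of the following sketch, which uses the standard library's `argparse` with one sub-parser per command. The choice of `argparse`, the `build_parser` helper, and the argument names are assumptions; pagination and run options are left out because their names and defaults are not specified here.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical `etlite` command-line parser."""
    parser = argparse.ArgumentParser(prog="etlite")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="create the directory structure or the DB")
    # `etlite init` sets up directories; `etlite init db` sets up the control DB.
    init.add_argument("target", nargs="?", choices=["db"])

    new = sub.add_parser("new", help="register a new job and create its template")
    new.add_argument("code")
    new.add_argument("description")

    lst = sub.add_parser("list", help="list the rows of a control table")
    lst.add_argument("table", type=str.lower)  # table name is case-insensitive

    run = sub.add_parser("run", help="run the ETL for a job code")
    run.add_argument("code")
    return parser

args = build_parser().parse_args(["list", "JOB"])
```

Normalizing the table name at parse time means the rest of the code only ever sees one spelling, whichever case the user typed.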
Each fresh set of data that is processed should be profiled. Each field should be aggregated with:
The profiled data results should be persisted into the database.
Ideally, we should collect these stats as we process the data rather than post-ingestion.
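Collecting the stats while the data flows through could be sketched as a generator that passes each row along unchanged and updates running aggregates on the side. The aggregates chosen here (count, nulls, minimum, maximum) are assumptions, since the actual list is not given; `FieldProfile` and `profile_stream` are hypothetical names.

```python
class FieldProfile:
    """Running statistics for one field; the chosen aggregates are assumptions."""

    def __init__(self):
        self.count = 0
        self.nulls = 0
        self.minimum = None
        self.maximum = None

    def update(self, value):
        self.count += 1
        if value is None:
            self.nulls += 1
            return
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value

def profile_stream(rows, profiles):
    """Yield each row unchanged while updating the per-field profiles."""
    for row in rows:
        for field, value in row.items():
            profiles.setdefault(field, FieldProfile()).update(value)
        yield row

# Made-up rows, for illustration only.
rows = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": None}, {"id": 3, "amount": 4.5}]
profiles = {}
loaded = list(profile_stream(rows, profiles))  # stand-in for the load step
```

Because the profile is built in the same pass as the load, no second scan of the ingested data is needed.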
Use Case: