Extract Transform Load - Abstraction

Description

Flow PHP ETL is a simple ETL (Extract, Transform, Load) abstraction designed to implement the Pipes and Filters architecture.

Typical Use Cases

  • Sync data from external systems (API)
  • File processing
  • Pushing data to external systems
  • Data migrations

Using this library makes sense when we need to move data from one place to another, doing some transformations in between.

For example, let's say we must periodically synchronize data from an external API, transform it into our internal data structure, filter out records that didn't change, and load the rest into the database in bulk.

This is a perfect scenario for ETL.
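The scenario above can be sketched in plain PHP as a chain of generators, where each stage acts as a filter and each generator hand-off acts as a pipe. This is a library-free illustration, not Flow PHP's API: the function names, the sample records and the $stored snapshot are all hypothetical.

```php
<?php

// Library-free sketch of the sync scenario: extract -> transform -> filter -> load.
// All function names, records and the $stored snapshot are hypothetical.

// Extract: pretend these records were fetched from an external API.
function extractFromApi(): Generator
{
    yield ['id' => 1, 'name' => 'Norbert', 'active' => true];
    yield ['id' => 2, 'name' => 'Anna', 'active' => false];
    yield ['id' => 3, 'name' => 'Tom', 'active' => true];
}

// Transform: map the external structure onto our internal one.
function toInternal(iterable $records): Generator
{
    foreach ($records as $record) {
        yield [
            'id' => (int) $record['id'],
            'name' => (string) $record['name'],
            'status' => $record['active'] ? 'active' : 'inactive',
        ];
    }
}

// Filter: skip records identical to what is already stored (strict array comparison).
function onlyChanged(iterable $records, array $stored): Generator
{
    foreach ($records as $record) {
        if (($stored[$record['id']] ?? null) !== $record) {
            yield $record;
        }
    }
}

// Load: gather the remaining records into a single bulk write (an array here).
function loadInBulk(\Traversable $records): array
{
    return iterator_to_array($records, false);
}

// What the database already holds, keyed by id.
$stored = [
    1 => ['id' => 1, 'name' => 'Norbert', 'status' => 'active'],
    2 => ['id' => 2, 'name' => 'Anna', 'status' => 'active'],
];

$batch = loadInBulk(onlyChanged(toInternal(extractFromApi()), $stored));

print_r(array_column($batch, 'id')); // id 2 changed, id 3 is new; id 1 is skipped
```

Because every stage is lazy, no stage pulls more than one record at a time from the previous one; only the final bulk load materializes the result.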

Features

  • Low memory consumption even when processing thousands of records
  • Type-safe Rows/Row/Entry abstractions
  • Filtering
  • Built-in comparison of Rows objects
  • Rich collection of Row Entries
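Low memory consumption typically comes from processing data batch by batch through generators, so only the current batch has to be held at once. The plain-PHP sketch below shows that idea in isolation; batchedExtract() and the batch sizes are hypothetical and are not meant to describe Flow PHP's internals.

```php
<?php

// Sketch: a generator that hands out fixed-size batches instead of one big array.
// batchedExtract() is a hypothetical name used only for illustration.
function batchedExtract(int $total, int $batchSize): Generator
{
    $batch = [];
    for ($id = 1; $id <= $total; $id++) {
        $batch[] = ['id' => $id];
        if (count($batch) === $batchSize) {
            yield $batch;   // hand one full batch downstream
            $batch = [];    // then start a fresh one, releasing the previous array
        }
    }
    if ($batch !== []) {
        yield $batch;       // remaining partial batch, if any
    }
}

$batches = 0;
$rows = 0;
$largestBatch = 0;

// The consumer only ever sees one batch per iteration.
foreach (batchedExtract(10_000, 1_000) as $batch) {
    $batches++;
    $rows += count($batch);
    $largestBatch = max($largestBatch, count($batch));
}

printf("%d rows in %d batches of at most %d rows\n", $rows, $batches, $largestBatch);
```

With these arguments the script prints "10000 rows in 10 batches of at most 1000 rows"; raising the total only adds batches, it does not grow the size of any single batch.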

Row Entries

Extensions

Extensions provide generic transformers and loaders that are not tied to any specific data source or storage.

Name            Transformer   Loader (write)
Transformers    ✅            🚫
Loaders         🚫            ✅

Adapters

Adapters connect ETL with existing data sources and storages, and sometimes also provide custom data entries.

Name            Extractor (read)   Loader (write)
Memory          ✅                 ✅
Doctrine - DB   ✅                 ✅
Elasticsearch   N/A                ✅
CSV             ✅                 ✅
JSON            ✅                 N/A
XML             ✅                 N/A
HTTP            ✅                 N/A
Excel           N/A                N/A
Logger          🚫                 ✅

  • ✅ - at least one implementation is available
  • 🚫 - implementation not possible
  • N/A - no implementation available yet

โ— If adapter that you are looking for is not available yet, and you are willing to work on one, feel free to create one as a standalone repository. Well designed and documented adapters can be pulled into flow-php organization that will give them maintenance and security support from the organization.

Installation

composer require flow-php/etl:1.x@dev

Usage

<?php

use Flow\ETL\ETL;
use Flow\ETL\Extractor;
use Flow\ETL\Loader;
use Flow\ETL\Row;
use Flow\ETL\Rows;
use Flow\ETL\Transformer;

require_once __DIR__ . '/../vendor/autoload.php';

// Extractor: yields batches of Rows; here a single batch with one row
// that keeps the raw "user" payload in an ArrayEntry.
$extractor = new class implements Extractor {
    public function extract(): Generator
    {
        yield new Rows(
            Row::create(
                new Row\Entry\ArrayEntry('user', ['id' => 1, 'name' => 'Norbert', 'roles' => ['DEVELOPER', 'ADMIN']])
            )
        );
    }
};

// Transformer: turns each row's "user" array into typed id/name/roles entries.
$transformer = new class implements Transformer {
    public function transform(Rows $rows): Rows
    {
        return $rows->map(function (Row $row): Row {
            $dataArray = $row->get('user')->value();

            return Row::create(
                new Row\Entry\IntegerEntry('id', $dataArray['id']),
                new Row\Entry\StringEntry('name', $dataArray['name']),
                new Row\Entry\ArrayEntry('roles', $dataArray['roles'])
            );
        });
    }
};

// Loader: receives every transformed batch; here it just dumps it as an array.
$loader = new class implements Loader {
    public function load(Rows $rows): void
    {
        var_dump($rows->toArray());
    }
};

// Wire the pipeline: extract -> transform -> load.
ETL::extract($extractor)
    ->transform($transformer)
    ->load($loader);

Error Handling

If any exception is thrown during the transform or load steps, the ETL process breaks. To change that behavior, set a custom ErrorHandler.

An ErrorHandler defines 3 behaviors using 2 methods:

  • ErrorHandler::throw(\Throwable $error, Rows $rows) : bool
  • ErrorHandler::skipRows(\Throwable $error, Rows $rows) : bool

If throw() returns true, ETL simply throws the error. If skipRows() returns true, ETL stops processing the given rows and moves on to the next batch. If both methods return false, ETL continues processing the Rows with the next transformers/loaders.
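The three behaviors can be illustrated with a small, self-contained sketch of a single pipeline step. The ErrorHandler interface below only mirrors the two documented methods and uses plain arrays in place of Flow\ETL\Rows; runStep() and the three handler classes are hypothetical, and passing the unchanged rows on in the "continue" case is an assumption made for the sketch.

```php
<?php

// Mirrors the two documented methods; arrays stand in for Flow\ETL\Rows (assumption).
interface ErrorHandler
{
    public function throw(\Throwable $error, array $rows): bool;
    public function skipRows(\Throwable $error, array $rows): bool;
}

// Hypothetical step runner: returns rows for the next step, or null to skip the batch.
function runStep(callable $step, array $rows, ErrorHandler $handler): ?array
{
    try {
        return $step($rows);
    } catch (\Throwable $error) {
        if ($handler->throw($error, $rows)) {
            throw $error;   // behavior 1: break the ETL process
        }
        if ($handler->skipRows($error, $rows)) {
            return null;    // behavior 2: drop this batch, move to the next one
        }
        return $rows;       // behavior 3: continue with the next step (unchanged rows, assumed)
    }
}

// Hypothetical handlers, one per behavior.
final class ThrowAll implements ErrorHandler
{
    public function throw(\Throwable $error, array $rows): bool { return true; }
    public function skipRows(\Throwable $error, array $rows): bool { return false; }
}

final class SkipFailedBatch implements ErrorHandler
{
    public function throw(\Throwable $error, array $rows): bool { return false; }
    public function skipRows(\Throwable $error, array $rows): bool { return true; }
}

final class Ignore implements ErrorHandler
{
    public function throw(\Throwable $error, array $rows): bool { return false; }
    public function skipRows(\Throwable $error, array $rows): bool { return false; }
}

// A step that always fails, to exercise each handler.
$failing = function (array $rows): array {
    throw new \RuntimeException('transformation failed');
};

$rows = [['id' => 1]];

var_dump(runStep($failing, $rows, new SkipFailedBatch()));  // NULL
var_dump(runStep($failing, $rows, new Ignore()) === $rows); // bool(true)

try {
    runStep($failing, $rows, new ThrowAll());
} catch (\RuntimeException $e) {
    echo $e->getMessage(), PHP_EOL; // transformation failed
}
```

In this sketch throw() is consulted before skipRows(), so a handler returning true from both methods always breaks the process.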

There are 3 built-in ErrorHandlers (look for more in adapters):

The error handler can be set directly on the ETL:

use Flow\ETL\ErrorHandler\IgnoreError;

ETL::extract($extractor)
    ->onError(new IgnoreError()) // register the handler before the steps it should cover
    ->transform($transformer)
    ->load($loader);

Development

To install dependencies, run the following command:

composer install

Run Tests

To execute the full test suite, run the following command:

composer build

pcov is recommended for code coverage; however, you can also use Xdebug by setting the XDEBUG_MODE=coverage environment variable.
