Flow PHP ETL is a simple ETL (Extract, Transform, Load) abstraction designed to implement the Filters & Pipes architecture.
- Sync data from external systems (API)
- File processing
- Pushing data to external systems
- Data migrations
Using this library makes sense when we need to move data from one place to another, doing some transformations in between.
For example, let's say we must periodically synchronize data from an external API, transform it into our internal data structure, filter out records that didn't change, and load the rest in bulk into the database.
This is a perfect scenario for ETL.
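The scenario above can be sketched in plain PHP generators, without the library, just to show the shape of such a pipeline. Everything in this sketch (`fetchFromApi`, `toInternal`, `onlyChanged`, `inBatches`, and the sample records) is hypothetical and is not part of Flow's API; it only illustrates how records can stream through transform and filter steps one at a time and reach the loader in batches.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for an external API: yields raw records one at a time.
function fetchFromApi(): Generator
{
    yield ['id' => '1', 'name' => ' Alice '];
    yield ['id' => '2', 'name' => 'Bob'];
    yield ['id' => '3', 'name' => 'Carol'];
}

// Transform: map each raw record to the internal structure.
function toInternal(iterable $records): Generator
{
    foreach ($records as $record) {
        yield ['id' => (int) $record['id'], 'name' => trim($record['name'])];
    }
}

// Filter: drop records identical to what is already stored.
function onlyChanged(iterable $records, array $stored): Generator
{
    foreach ($records as $record) {
        if (($stored[$record['id']] ?? null) !== $record) {
            yield $record;
        }
    }
}

// Group records into batches so the loader can write in bulk.
function inBatches(iterable $records, int $size): Generator
{
    $batch = [];
    foreach ($records as $record) {
        $batch[] = $record;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }
    if ($batch !== []) {
        yield $batch;
    }
}

// Hypothetical current database state, keyed by id.
$stored = [2 => ['id' => 2, 'name' => 'Bob']];

$loaded = [];
foreach (inBatches(onlyChanged(toInternal(fetchFromApi()), $stored), 2) as $batch) {
    // A real loader would issue one bulk write per batch here.
    $loaded[] = $batch;
}

var_dump($loaded);
```

Because every step is a generator, only the current record and the current batch are held in memory at any time. The library wraps the same idea in its Extractor, Transformer and Loader interfaces.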
- Low memory consumption even when processing thousands of records
- Type safe Rows/Row/Entry abstractions
- Filtering
- Built-in comparison of Rows objects
- Rich collection of Row Entries
  - ArrayEntry
  - BooleanEntry
  - CollectionEntry
  - DateTimeEntry
  - FloatEntry
  - IntegerEntry
  - NullEntry
  - ObjectEntry
  - StringEntry
  - StructureEntry
Extensions provide generic transformers and loaders that are not tied to any specific data source or storage.

Name | Transformer | Loader (write) |
---|---|---|
Transformers | ✅ | 🚫 |
Loaders | 🚫 | ✅ |

Adapters connect ETL with existing data sources and storages, and sometimes include custom data entries.

Name | Extractor (read) | Loader (write) |
---|---|---|
Memory | ✅ | ✅ |
Doctrine - DB | ✅ | ✅ |
Elasticsearch | N/A | ✅ |
CSV | ✅ | ✅ |
JSON | ✅ | N/A |
XML | ✅ | N/A |
HTTP | ✅ | N/A |
Excel | N/A | N/A |
Logger | 🚫 | ✅ |

- ✅ - at least one implementation is available
- 🚫 - implementation not possible
- N/A - no implementation available yet
If the adapter you are looking for is not available yet and you are willing to work on one, feel free to create it as a standalone repository.
Well designed and documented adapters can be pulled into the flow-php organization, which will then provide maintenance and security support for them.
composer require flow-php/etl:1.x@dev
```php
<?php

use Flow\ETL\ETL;
use Flow\ETL\Extractor;
use Flow\ETL\Loader;
use Flow\ETL\Row;
use Flow\ETL\Rows;
use Flow\ETL\Transformer;

require_once __DIR__ . '/../vendor/autoload.php';

$extractor = new class implements Extractor {
    public function extract(): Generator
    {
        yield new Rows(
            Row::create(
                new Row\Entry\ArrayEntry('user', ['id' => 1, 'name' => 'Norbret', 'roles' => ['DEVELOPER', 'ADMIN']])
            )
        );
    }
};

$transformer = new class implements Transformer {
    public function transform(Rows $rows): Rows
    {
        return $rows->map(function (Row $row): Row {
            $dataArray = $row->get('user')->value();

            return Row::create(
                new Row\Entry\IntegerEntry('id', $dataArray['id']),
                new Row\Entry\StringEntry('name', $dataArray['name']),
                new Row\Entry\ArrayEntry('roles', $dataArray['roles'])
            );
        });
    }
};

$loader = new class implements Loader {
    public function load(Rows $rows): void
    {
        var_dump($rows->toArray());
    }
};

ETL::extract($extractor)
    ->transform($transformer)
    ->load($loader);
```
If any exception is thrown in the transform/load steps, the ETL process will stop. To change that behavior, set a custom ErrorHandler.
ErrorHandler defines 3 behaviors using 2 methods.
ErrorHandler::throw(\Throwable $error, Rows $rows) : bool
ErrorHandler::skipRows(\Throwable $error, Rows $rows) : bool
If `throw` returns true, ETL will simply throw the error.
If `skipRows` returns true, ETL will stop processing the given rows and try to move on to the next batch.
If both methods return false, ETL will continue processing the Rows using the next transformers/loaders.
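The three outcomes above can be illustrated with a small, dependency-free sketch. It is not the library's implementation. `ErrorHandlerSketch` is a local stand-in that mirrors the two method names listed above but takes a plain array instead of `Rows`. `SkipInvalidBatches` and `applyStep` are hypothetical names. Checking `throw` before `skipRows` is an assumption based on the order in which the rules are stated.

```php
<?php

declare(strict_types=1);

// Local stand-in mirroring the two methods listed above.
// The library's interface receives a Rows object; an array keeps this sketch standalone.
interface ErrorHandlerSketch
{
    public function throw(\Throwable $error, array $rows): bool;

    public function skipRows(\Throwable $error, array $rows): bool;
}

// Hypothetical handler: skip batches failing with InvalidArgumentException, rethrow anything else.
final class SkipInvalidBatches implements ErrorHandlerSketch
{
    public function throw(\Throwable $error, array $rows): bool
    {
        return !($error instanceof \InvalidArgumentException);
    }

    public function skipRows(\Throwable $error, array $rows): bool
    {
        return $error instanceof \InvalidArgumentException;
    }
}

// Applies one step to a batch and consults the handler on failure.
// Returns null when the batch is skipped.
function applyStep(callable $step, array $rows, ErrorHandlerSketch $handler): ?array
{
    try {
        return $step($rows);
    } catch (\Throwable $error) {
        if ($handler->throw($error, $rows)) {
            throw $error; // behavior 1: break the whole process
        }
        if ($handler->skipRows($error, $rows)) {
            return null; // behavior 2: drop this batch, move to the next one
        }

        return $rows; // behavior 3: both false, continue with the same rows
    }
}

// A step that always fails with an exception the handler chooses to skip.
$failing = function (array $rows): array {
    throw new \InvalidArgumentException('bad batch');
};

$result = applyStep($failing, [['id' => 1]], new SkipInvalidBatches());

var_dump($result);
```

With this handler a batch that fails validation is dropped and processing can move on, while any other exception still stops the process.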
There are 3 built-in ErrorHandlers (look for more in adapters).
Error Handling can be set directly at ETL:
```php
ETL::extract($extractor)
    ->onError(new IgnoreError())
    ->transform($transformer)
    ->load($loader);
```
To install dependencies, run the following command:
composer install
To execute the full test suite, run the following command:
composer build
It's recommended to use pcov for code coverage; however, you can also use xdebug by setting the XDEBUG_MODE=coverage environment variable.