lanicon / sharpetl

This project forked from steelden/sharpetl

License: GNU Lesser General Public License v3.0

C# 100.00%

SharpETL

SharpETL is a simple and flexible .NET library designed to aid in the process of extracting, transforming, and loading unstructured data. It supports simple Python scripts to help with data extraction and manipulation.

Example

A very basic and straightforward configuration with three actions.

  1. Configure data source
ISource source1 = sourceFactory.CreateXlsSource(@"..\..\RandomData.xlsx", true);
  2. Configure processing script
string scriptText = @"
def OnElement(action, element):
  print '---OnElement'
  yield element          # just push the received element forward
def OnCompleted(action):
  print '===OnCompleted'
  yield
  return
";
IScript script1 = scriptFactory.CreatePythonScript("pythonScript1", scriptText);
  3. Configure generator action
IAction action1 = actionFactory.CreateSourceAction("sourceAction1", source1);
  4. Configure transform action
ISource source2 = sourceFactory.CreateNullSource("nullSource1");
IAction action2 = actionFactory.CreateScriptAction("scriptAction1", source2, script1);
  5. Configure output action
IAction action3 = actionFactory.CreateBinarySerializeAction("binaryAction1", @"Data.bin");
  6. Create processing engine
IEngine engine = engineFactory.Create(c => {
    c.UseDefaultServiceResolver();
    c.UseDefaultContextProvider();
    c.UseReactivePlanner();
    c.AddLink(action1, action2);
    c.AddLink(action2, action3);
});
  7. Run it!
engine.Run();
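
The script in step 2 uses generator callbacks: OnElement is called for each incoming element and yields whatever should be passed on, and OnCompleted is called once at the end of the stream. The stand-alone Python 3 sketch below illustrates that pattern outside of SharpETL. The driver function run_script_action and the sample data are hypothetical and are not part of the library, and how SharpETL itself consumes the yielded values is an assumption here.

```python
# Stand-alone Python 3 sketch of the generator-callback pattern.
# run_script_action is a hypothetical driver, NOT SharpETL's engine.

def OnElement(action, element):
    # Yield zero, one, or many elements per input.
    # Here: forward only elements that have a non-empty "name".
    if element.get("name"):
        yield element


def OnCompleted(action):
    # Called once after the last element; this version emits nothing.
    return
    yield  # unreachable, but makes this function a generator


def run_script_action(action, elements, on_element, on_completed):
    """Feed every element to on_element, then call on_completed once,
    collecting everything the callbacks yield (assumed behaviour)."""
    output = []
    for element in elements:
        output.extend(on_element(action, element))
    output.extend(on_completed(action))
    return output


rows = [{"name": "alpha"}, {"name": ""}, {"name": "beta"}]
result = run_script_action("scriptAction1", rows, OnElement, OnCompleted)
print(result)
```

Because each callback is a generator, a script can drop an element by yielding nothing, pass it through by yielding it once, or fan it out by yielding several times.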

Project Roadmap

  • Windows service.

    It would be very useful to have a Windows service application that could run a number of preconfigured IEngine instances.

  • Visual configuration tool.

    A UI is needed for creating and setting up configurations.
    It would greatly simplify the process of setting up an IEngine for non-programmers.

  • Documentation.

    The project really lacks documentation and comments throughout the code.
    Some demo projects would be nice too.

  • Tests.

    The project could use more tests,
    first of all tests of the IAction-derived classes.

  • XML configuration.

    Add IConfigurationData<T> for most actions.

  • Fluent configuration.

    Add a simplified fluent-style configuration.
    It should look like c.Fluently.FromXls().Debug().Join(...).ToBinary().

  • Messaging.

    Add support for sending and receiving IElement objects as messages.
    Potential candidates include ZeroMQ, MassTransit.

  • SQLite support.

    Because there is no freely available OLE DB driver for SQLite, support for accessing SQLite databases needs to be added manually.

  • Internal logging.

    Better internal logging is needed to help debug misbehaving configurations.

  • Multithreading.

    As of now, IEngine actions run in sequential, single-threaded mode because of debugging issues.
    Actions should be switched to multithreaded mode using .ObserveOn(Scheduler.NewThread) and tested under load.

  • Planners.

    EnumerablePlanner needs attention. Right now it is in an unusable state.
    Implement and experiment with a planner based on Disruptor-net.
    Together with ZeroMQ messaging, this could result in extremely fast data processing.

  • Code cleanup.

    All DataElement specific actions (SharpETL.Actions.Db) should be separated from SharpETL.Actions into a new project (SharpETL.DbActions).
