Exa.TrkX Data IO

A generic data reader for common ML routines.

Objective

Training multiple models through a pipeline in which each stage has different input and output data formats can be very tedious. For example, to investigate the performance of each stage, we have to deal with a different data format per stage, which usually requires a lot of hard-coding.

This package aims to simplify that process by introducing an additional data format definition file. Through this file, data can be read, processed, and assembled into a suitable dataframe for later use in a clean, readable way.

Features

  • Merge data into a single object across files from different locations.
  • Data processing API for multi-dimensional data.
  • Extensible API for custom data format.

TODO

  • Enhance data processing API.
  • Support shared data for all events.
  • Provide a data post-processing API to create missing columns automatically.

Preparation

Install

You can clone this project and install it with

pip3 install -e .

and walk through the examples, or install only the package with

pip3 install git+https://github.com/rlf23240/ExaTrkXDataIO

Testing

Before you start using this package, it is highly recommended that you look at the examples in the examples folder. To run them, you need to:

  • Install package using pip3 install -e .

  • Get data and place at least 10 events under examples/data. This example uses the particles/event{evt_id}-particles.csv and feature_store/{evt_id} files, which should be placed as follows:

    data

  • Read through examples/configs/reader/default.yaml and examples/read.py to see how the configuration file works.

  • Run examples/read.py.

Customization

EventFileParser

EventFileParser is responsible for loading data from a file and extracting the desired columns from it. To customize file parsing, inherit from EventFileParser and implement the following two methods:

  • load(self, path: Path) -> Any:

    Load your data from file and return it here.

  • extract(self, data: Any, tag: str) -> np.array:

    Extract a column from the data you previously loaded in the load method here.

Finally, declare your parser in the configuration file and you are good to go.
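The two methods above can be sketched as follows. This is only an illustration under stated assumptions: the EventFileParser class defined here is a local stand-in that mirrors the two method signatures described above, not the package's real base class (its import path is not shown in this README), and JsonColumnParser together with its JSON layout is a hypothetical example.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any

import numpy as np


class EventFileParser(ABC):
    """Local stand-in for the package's base class (assumed interface)."""

    @abstractmethod
    def load(self, path: Path) -> Any:
        """Read one file and return its contents in any convenient form."""

    @abstractmethod
    def extract(self, data: Any, tag: str) -> np.ndarray:
        """Return the column named `tag` from the object produced by load()."""


class JsonColumnParser(EventFileParser):
    """Hypothetical parser for a JSON file shaped like {"column": [values]}."""

    def load(self, path: Path) -> Any:
        # Parse the whole file once; extract() is then called per column.
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)

    def extract(self, data: Any, tag: str) -> np.ndarray:
        # Look up one column by name and hand it back as a NumPy array.
        return np.asarray(data[tag])


# Small demonstration with a temporary file (made-up values).
with tempfile.TemporaryDirectory() as tmp:
    event_file = Path(tmp) / "event0000.json"
    event_file.write_text(
        json.dumps({"hit_id": [1, 2, 3], "r": [31.5, 72.0, 115.8]}),
        encoding="utf-8",
    )

    parser = JsonColumnParser()
    loaded = parser.load(event_file)
    hit_id = parser.extract(loaded, "hit_id")
    radius = parser.extract(loaded, "r")

print(hit_id, radius)
```

Splitting the work this way means a file is read once in load, while extract can be called repeatedly for each column requested in the configuration.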

EventDataProcessor

EventDataProcessor is responsible for processing data into a form that fits into a column of the dataframe. For flexibility, the processing is broken into a series of steps, and you are free to define your own custom steps. To customize data processing, inherit from EventDataProcessor and implement the following method:

  • process(self, data: np.array, **kwargs) -> np.array:

    Process data and return your result here. An individual step does not have to return a 1-D array, but it is the user's responsibility to guarantee that the processing pipeline as a whole results in a 1-D array.

Finally, declare your processor in the configuration file and you are good to go.
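A minimal sketch of two custom processing steps is shown below. As before, the EventDataProcessor class here is a local stand-in that mirrors the process signature described above, not the package's real base class. SelectColumn, Scale, and the keyword arguments column and factor are hypothetical, and how the real package passes keyword arguments from the configuration file is an assumption.

```python
from abc import ABC, abstractmethod

import numpy as np


class EventDataProcessor(ABC):
    """Local stand-in for the package's base class (assumed interface)."""

    @abstractmethod
    def process(self, data: np.ndarray, **kwargs) -> np.ndarray:
        """Transform `data` and return the result."""


class SelectColumn(EventDataProcessor):
    """Hypothetical step: pick one column of a 2-D array."""

    def process(self, data: np.ndarray, **kwargs) -> np.ndarray:
        column = kwargs.get("column", 0)  # assumed keyword argument
        return np.asarray(data)[:, column]


class Scale(EventDataProcessor):
    """Hypothetical step: multiply every value by a constant factor."""

    def process(self, data: np.ndarray, **kwargs) -> np.ndarray:
        factor = kwargs.get("factor", 1.0)  # assumed keyword argument
        return np.asarray(data) * factor


# Chain the steps by hand: a 2-D input is reduced to a 1-D column,
# which is what a dataframe column ultimately needs.
edges = np.array([[0, 1], [1, 2], [2, 3]])
senders = SelectColumn().process(edges, column=0)
scaled = Scale().process(senders, factor=0.5)

print(senders, scaled)
```

Here the intermediate input is 2-D and only the final result is 1-D, which matches the rule that the pipeline as a whole, rather than each step, must produce a 1-D array.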
