A micro Python library that retrieves and organizes data from a YAML mapping.


Fluxify

A Python package that eases the process of retrieving, organizing and altering data.

Required packages

  • pandas
  • imperium
  • ijson

Installation

pip install fluxify

Main classes

fluxify.mapper.Mapper

This class is used to read and quickly process files whose data is small enough to be loaded into memory.

fluxify.lazy_mapper.LazyMapper

You've probably guessed it: this class is used to iterate over large data files, whether they are in CSV, JSON or XML format.

Usage

Retrieve data from a simple CSV file

id,brand,price,state,published_at
938,Xaomi,390.90,used,2020-01-03 12:32:29
04593,iPhone,1299.90,new,2020-01-02 09:48:12

Mapper implementation

from fluxify.mapper import Mapper
import yaml

# Could also be loaded from a file
yamlmapping = """
brand:
    col: 1
price:
    col: 2
state:
    col: 3
publish_date:
    col: 4
    transformations:
        - { transformer: 'date', in_format: '%Y-%m-%d %H:%M:%S', out_format: '%H:%M %d/%m/%Y' }
is_new:
    conditions:
        -
            condition: "subject['state'] == 'new'"
            returnOnSuccess: True
            returnOnFail: False
"""

# Parse the YAML text into a plain dict describing the mapping
mapping = yaml.safe_load(yamlmapping)
mapper = Mapper(_type='csv')
data = mapper.map('path/to/csvfile.csv', mapping)
print(data)

Output

[
    {
        'brand': 'Xaomi',
        'price': '390.90',
        'state': 'used',
        'publish_date': '12:32 03/01/2020',
        'is_new': False
    },
    {
        'brand': 'iPhone',
        'price': '1299.90',
        'state': 'new',
        'publish_date': '09:48 02/01/2020',
        'is_new': True
    }
]
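
The date transformer in the mapping above takes an in_format and an out_format written with strftime-style codes. The sketch below is not Fluxify's own code; it only shows, with the standard library, the conversion those two formats describe. The helper name reformat_date is hypothetical.

```python
from datetime import datetime


def reformat_date(value: str, in_format: str, out_format: str) -> str:
    """Parse `value` using `in_format`, then render it using `out_format`."""
    return datetime.strptime(value, in_format).strftime(out_format)


# Same formats as in the YAML mapping above
converted = reformat_date(
    "2020-01-03 12:32:29", "%Y-%m-%d %H:%M:%S", "%H:%M %d/%m/%Y"
)
print(converted)  # 12:32 03/01/2020
```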

LazyMapper implementation

The LazyMapper does not return all the mapped data at the end.
Instead, it maps the data in batches of a size you specify, so that memory is not exhausted.

from fluxify.lazy_mapper import LazyMapper
import yaml

# Could also be loaded from a file
yamlmapping = """
brand:
    col: 1
price:
    col: 2
state:
    col: 3
publish_date:
    col: 4
    transformations:
        - { transformer: 'date', in_format: '%Y-%m-%d %H:%M:%S', out_format: '%H:%M %d/%m/%Y' }
is_new:
    conditions:
        -
            condition: "subject['state'] == 'new'"
            returnOnSuccess: True
            returnOnFail: False
"""

# Parse the YAML text into a plain dict describing the mapping
mapping = yaml.safe_load(yamlmapping)
mapper = LazyMapper(_type='csv', error_tolerance=True, bulksize=500)

def some_callback(results):
    # Receives one batch of mapped items per call
    for item in results:
        pass # Perform some action

# Register the callback before starting the mapping
mapper.set_callback(some_callback)

mapper.map('path/to/csvfile.csv', mapping)

As you can see, in this example the mapper will call the callback function every time it accumulates 500 mapped items.
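
To make the batching behaviour concrete, here is a minimal pure-Python sketch of the general pattern: accumulate items, hand them to a callback once bulksize is reached, then start a new batch. It is not the LazyMapper implementation. The names iter_batches and map_lazily are hypothetical, and flushing a final partial batch is an assumption of this sketch.

```python
from typing import Callable, Iterable, Iterator, List


def iter_batches(items: Iterable[dict], bulksize: int) -> Iterator[List[dict]]:
    """Yield lists of at most `bulksize` items without loading everything."""
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == bulksize:
            yield batch
            batch = []
    if batch:
        # Assumption: leftover items are delivered as a smaller last batch
        yield batch


def map_lazily(items: Iterable[dict], bulksize: int,
               callback: Callable[[List[dict]], None]) -> None:
    """Call `callback` once per accumulated batch."""
    for batch in iter_batches(items, bulksize):
        callback(batch)


sizes = []
rows = ({"id": i} for i in range(1200))  # generator: rows are produced lazily
map_lazily(rows, 500, lambda results: sizes.append(len(results)))
print(sizes)  # [500, 500, 200]
```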

Mapping settings

The col key specifies the column number or attribute name from which the value is retrieved.
To use the whole input record as the retrieved value, set col to _all_.

The transformations key applies transformations to the retrieved value. Available transformers are listed below.

The conditions key applies conditions to the retrieved value and can alter it.
Conditions are written in Python syntax, but not all of Python's native functions may be used.
Available functions are listed below.
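
As an illustration of how a condition entry could be interpreted, the sketch below evaluates the expression against a subject record and picks returnOnSuccess or returnOnFail. How Fluxify actually evaluates conditions is not shown in this README, so the evaluate_condition helper and the use of eval with emptied builtins are assumptions made only for this example. Emptying the builtins limits what an expression can call, but it is not a security sandbox.

```python
def evaluate_condition(rule: dict, subject: dict):
    """Return rule['returnOnSuccess'] or rule['returnOnFail'] for `subject`."""
    # Assumption: the expression only sees `subject`, with no builtins
    restricted_globals = {"__builtins__": {}}
    passed = eval(rule["condition"], restricted_globals, {"subject": subject})
    return rule["returnOnSuccess"] if passed else rule["returnOnFail"]


# Same rule as the is_new entry in the mappings above
rule = {
    "condition": "subject['state'] == 'new'",
    "returnOnSuccess": True,
    "returnOnFail": False,
}

print(evaluate_condition(rule, {"state": "new"}))   # True
print(evaluate_condition(rule, {"state": "used"}))  # False
```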

The default key defines the value to use when a retrieved value is null.
Warning: if the default key is defined with a value, that value is applied before transformations and conditions.
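
The ordering described in the warning matters because a default value then goes through the same transformations as a real value. A minimal sketch of that order, assuming a hypothetical resolve helper and a single transformation passed as a plain function (neither is part of Fluxify):

```python
def resolve(value, spec: dict, transform):
    """Apply the default first, then the transformation."""
    if value is None and "default" in spec:
        value = spec["default"]
    # The transformation runs on whatever value is left, default included
    return transform(value)


spec = {"col": 3, "default": "unknown"}

print(resolve(None, spec, str.upper))    # UNKNOWN
print(resolve("used", spec, str.upper))  # USED
```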

Special cases for JSON and XML

XML
Set the multiple key to true if you want to retrieve data from multiple XML tags with the same name.
Use the index key together with multiple: true if you wish to retrieve only one value from several XML tags.

When retrieving an XML value, the default behaviour is to retrieve the .text value of the tag.
If you wish to change this, for example to retrieve a tag containing many other tags, use the raw key and set it to false.
This returns an object of type xml.etree.ElementTree.Element. You can later apply transformations to this object to alter, organize and retrieve the data.
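
For readers less familiar with xml.etree.ElementTree, the sketch below shows the three kinds of result discussed above using only the standard library: the .text of a single tag, the values of several tags with the same name (optionally narrowed by an index), and a whole Element that still contains its child tags. The sample document and variable names are made up for illustration, and no Fluxify code is involved.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<car>"
    "<option>GPS</option><option>ABS</option>"
    "<engine><power>110</power><fuel>diesel</fuel></engine>"
    "</car>"
)

# Text of the first matching tag
first_text = doc.find("option").text

# Text of every tag with the same name, then a single one picked by index
all_texts = [tag.text for tag in doc.findall("option")]
second_text = all_texts[1]

# The element itself, with its children still attached
engine = doc.find("engine")
engine_fields = {child.tag: child.text for child in engine}

print(first_text)     # GPS
print(all_texts)      # ['GPS', 'ABS']
print(second_text)    # ABS
print(engine_fields)  # {'power': '110', 'fuel': 'diesel'}
```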

JSON
Use the index key to retrieve a specific value from an array.
This only works if the retrieved value is an array.
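
A small sketch of what indexing into a retrieved JSON array amounts to. The pick helper is hypothetical, and raising TypeError for a non-array value is an assumption of this example, since the README does not say how Fluxify reacts in that case.

```python
import json

record = json.loads('{"brand": "iPhone", "photos": ["front.jpg", "back.jpg"]}')


def pick(value, index=None):
    """Return `value` unchanged, or one element of it when `index` is given."""
    if index is None:
        return value
    if not isinstance(value, list):
        # Assumption: an index on a non-array value is treated as an error
        raise TypeError("index can only be applied to an array value")
    return value[index]


print(pick(record["photos"], 1))  # back.jpg
print(pick(record["brand"]))      # iPhone
```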

Supported formats

Format    | CSV | JSON | XML | TXT
Supported | YES | YES  | YES | NO

Transformers

Fluxify has built-in transformers that can alter the data.

Function | Arguments | Description
number | stringvalue | Parses a string to an integer or float value.
split | delimiter, index | Splits a string into parts with a delimiter and returns the split result if the index argument is not defined.
date | in_format, out_format | Lets you format a date string to the desired format.
replace | search, new | Replaces the search value with the new value in the string.
boolean | No arguments | Parses a string to Boolean if the string contains [true
equipments_from_string | delimiter | Custom usage
options_from_string | delimiter | Custom usage
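
To give a feel for what the simpler transformers do, here are plain-Python equivalents of number, split and replace as described in the table. These are sketches, not Fluxify's source. In particular, returning a single part when index is given is inferred from the description of split and should be treated as an assumption.

```python
def number(value: str):
    """Parse a string to an int when possible, otherwise to a float."""
    try:
        return int(value)
    except ValueError:
        return float(value)


def split(value: str, delimiter: str, index=None):
    """Split on `delimiter`; return all parts, or one part if `index` is set."""
    parts = value.split(delimiter)
    # Assumption: a defined index selects a single part
    return parts if index is None else parts[index]


def replace(value: str, search: str, new: str) -> str:
    """Replace every occurrence of `search` with `new`."""
    return value.replace(search, new)


print(number("938"))              # 938
print(number("390.90"))           # 390.9
print(split("GPS;ABS", ";"))      # ['GPS', 'ABS']
print(split("GPS;ABS", ";", 1))   # ABS
print(replace("1299,90", ",", "."))  # 1299.90
```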

Exceptions

Fluxify has different exception classes for different error cases. They reside in the exceptions sub-package, fluxify.exceptions.

Class | Arguments | Description
ArgumentNotFoundException | message | Raised whenever an argument is not found.
InvalidArgumentException | message | Raised when a passed parameter/argument is invalid.
ConditionNotFoundException | message | Raised when the "condition" key is not defined in the mapping.
UnsupportedTransformerException | message | Raised when a transformer other than the ones defined above is used.
