
ityutin / df-and-order



License: MIT License

Python 100.00%
dataframes pandas data-transformation machine-learning


Python 3.7

🗄️ df-and-order

Yeah, it's just like Law & Order, but Dataframe & Order!

pip install df_and_order

With df-and-order, your interactions with dataframes become clean and predictable.

  • Tired of absolute file paths to data in shared notebooks in your repository?
  • Can't remember how your datasets were generated?
  • Want to have safe and reproducible data transformations?
  • Like declarative config-based solutions?

Good news for you!

How does it look in code?

Imagine a world where reading a dataframe takes just a few lines:

reader = MagicDfReader()
df = reader.read(df_id='user_activity_may_2020')

Maybe you are interested in some transformed version of that dataframe? No problem!

reader = MagicDfReader()
# ready to fit a model on!
model_input_df = reader.read(df_id='user_activity_may_2020', transform_id='model_input')

Wow. Is it really magic?

df-and-order works with YAML configs. Every config contains metadata about a dataset as well as all the desired transformations. Here's an example:

df_id: user_activity_may_2020  # here's the dataframe identifier
initial_df_format: csv
metadata:  # this section contains some useful information about the dataset
  author: Data Man
  data_collection_date: 2020-05-01
transforms:
  model_input:  # here's the transform identifier
    df_format: csv
    in_memory:  # the transformations are performed in memory on every call; permanent transforms are supported as well
    - module_path: df_and_order.steps.pd.DropColsTransformStep  # dotted path to the class that describes a transformation; this one drops columns
      params:  # init params for the transformation class
        cols:
        - redundant_col
    - module_path: df_and_order.steps.DatesTransformStep  # another transformation that converts str to datetime
      params:
        cols:
        - date_col

Okay, what exactly is a df-and-order transform?

A transformation changes the initial dataset in some way.

A transformation is made of one or many steps. Each step represents some operation. Here are examples of such operations:

  • dropping cols
  • adding cols
  • transforming existing cols
  • etc

df-and-order uses subclasses of DfTransformStepConfig to describe a step. It's possible, and highly recommended, to declare the init parameters for every step in the config. By following the Single Responsibility Principle, we get granular control over the entire transformation.

Just by looking at the config, you can tell how the transformed dataframe was created.
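The idea of small, single-purpose steps applied in order can be sketched as follows. To stay dependency-free, this example uses a list of dicts in place of a pandas dataframe, and the classes `DropColsStep` and `DatesStep` as well as the function `apply_steps` are hypothetical names invented for the illustration, not the library's own classes.

```python
from datetime import date


class DropColsStep:
    """Hypothetical step: removes the given columns from every row."""

    def __init__(self, cols):
        self.cols = set(cols)

    def transform(self, rows):
        return [{k: v for k, v in row.items() if k not in self.cols} for row in rows]


class DatesStep:
    """Hypothetical step: parses ISO date strings in the given columns."""

    def __init__(self, cols):
        self.cols = list(cols)

    def transform(self, rows):
        out = []
        for row in rows:
            new_row = dict(row)  # copy, so the input rows stay untouched
            for col in self.cols:
                new_row[col] = date.fromisoformat(row[col])
            out.append(new_row)
        return out


def apply_steps(rows, steps):
    """Run each step in order, feeding the output of one into the next."""
    for step in steps:
        rows = step.transform(rows)
    return rows


rows = [
    {"user": "a", "redundant_col": 1, "date_col": "2020-05-01"},
    {"user": "b", "redundant_col": 2, "date_col": "2020-05-02"},
]
steps = [DropColsStep(cols=["redundant_col"]), DatesStep(cols=["date_col"])]
result = apply_steps(rows, steps)
print(result)
```

Because each step does one thing and takes its parameters at construction time, the list of steps reads much like the `in_memory` section of the config.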

Take a look at the more detailed overview to find more exciting stuff.

I also wrote an article describing the benefits. Check it out! There are lemurs and stuff.

I hope the library helps somebody boost their productivity.
