
precon's Issues

Add pre-commit hooks for devs

I want to add some pre-commit hooks for developers.

  • Remove trailing whitespace

  • Flake8 linting

  • Check commit message subject length
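
As a rough sketch of the third check, the function below tests the subject line of a commit message against a length limit. The name `subject_is_short_enough` and the 50-character default are assumptions for illustration; the limit the project wants is not stated here.

```python
def subject_is_short_enough(message: str, max_len: int = 50) -> bool:
    """Return True if the first line of a commit message fits in max_len characters."""
    lines = message.splitlines()
    # An empty message has no subject; treat it as length zero.
    subject = lines[0] if lines else ""
    return len(subject) <= max_len


# The body after the blank line does not count towards the subject length.
ok = subject_is_short_enough("Add pre-commit hooks\n\nLonger body text goes here.")
too_long = subject_is_short_enough("x" * 51)
```

Git passes the path of the message file as the first argument to a `commit-msg` hook, so a small wrapper could read that file, call the function, and exit non-zero when it returns False.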

Change to applying _get_adjustments in round_and_adjust function

Change from the following:

    elif isinstance(obj, pd.core.frame.DataFrame):

        # Create an empty DataFrame to fill with adjustments
        adjustments = pd.DataFrame().reindex_like(obj)

        for index, row in iter_method(obj):
            # Create a selector based on the axis
            slice_ = axis_slice(index, axis)

            adjustments.loc[slice_] = _get_adjustments(row, decimals)

to this:

    elif isinstance(obj, pd.core.frame.DataFrame):

        # args must be a tuple, hence the trailing comma
        adjustments = obj.apply(_get_adjustments, args=(decimals,), axis=axis)

This should also allow for the removal of:

    iter_dict = {
        0: pd.DataFrame.iterrows,
        1: pd.DataFrame.iteritems,
    }
    iter_method = iter_dict.get(axis)

Slimming the function right down.
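
A minimal sketch of why the loop and the single `apply` call should agree. The real `_get_adjustments` is not shown in this issue, so `residuals` below is a hypothetical stand-in. Two things worth noting: `args` has to be a tuple, so it is written `(decimals,)` with a trailing comma, and in pandas `apply(..., axis=1)` hands each row to the function, so the axis mapping against the current `iter_dict` should be checked rather than assumed.

```python
import pandas as pd


def residuals(values: pd.Series, decimals: int) -> pd.Series:
    # Hypothetical stand-in for _get_adjustments: the rounding error per value.
    return values.round(decimals) - values


df = pd.DataFrame({"a": [1.234, 2.345], "b": [3.456, 4.567]})
decimals = 1

# Explicit loop over rows, mirroring the shape of the current implementation.
looped = pd.DataFrame(index=df.index, columns=df.columns, dtype=float)
for index, row in df.iterrows():
    looped.loc[index] = residuals(row, decimals)

# Single call; axis=1 passes each row to the function.
applied = df.apply(residuals, args=(decimals,), axis=1)
```

If the two frames match for both axes in the test suite, the `iter_dict` lookup can go.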

While taking care of this, remember to also do the following:

  • Ensure there is an empty line at EOF
  • Change the isinstance calls so that they no longer go through pd.core.series / pd.core.frame (e.g. pd.DataFrame instead of pd.core.frame.DataFrame)

Documentation

It would be useful to be able to view the docs for this project.

Currently, I think, you have to clone and build them yourself?

A solution would be to use GitHub Pages to serve the docs, as this works well with Sphinx.

Chain function produces incorrect indices if period missing

The chain function does not handle missing periods correctly but still produces a result.

import pandas as pd
from pandas import Timestamp
import precon

df_all_periods = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 101.093900),
        (Timestamp('2019-03-01'), 101.726900),
        (Timestamp('2019-04-01'), 100.478600),  # April 2019 value present
        (Timestamp('2019-05-01'), 100.647800),
        (Timestamp('2019-06-01'), 100.439100),
        (Timestamp('2019-07-01'), 102.181900),
        (Timestamp('2019-08-01'), 100.608800),
        (Timestamp('2019-09-01'), 102.067000),
        (Timestamp('2019-10-01'), 102.418300),
        (Timestamp('2019-11-01'), 102.769600),
        (Timestamp('2019-12-01'), 103.120900),
        (Timestamp('2020-01-01'), 103.519414),
        (Timestamp('2020-02-01'), 100.710500),
    ],
    columns=('period', 'index_value'),
).set_index('period')

df_period_missing = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 101.093900),
        (Timestamp('2019-03-01'), 101.726900),
        (Timestamp('2019-04-01'), None),  # April 2019 value missing
        (Timestamp('2019-05-01'), 100.647800),
        (Timestamp('2019-06-01'), 100.439100),
        (Timestamp('2019-07-01'), 102.181900),
        (Timestamp('2019-08-01'), 100.608800),
        (Timestamp('2019-09-01'), 102.067000),
        (Timestamp('2019-10-01'), 102.418300),
        (Timestamp('2019-11-01'), 102.769600),
        (Timestamp('2019-12-01'), 103.120900),
        (Timestamp('2020-01-01'), 103.519414),
        (Timestamp('2020-02-01'), 100.710500),
    ],
    columns=('period', 'index_value'),
).set_index('period')

expected = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 105.386833),
        (Timestamp('2019-03-01'), 106.046713),
        (Timestamp('2019-04-01'), 104.745404),
        (Timestamp('2019-05-01'), 104.921789),
        (Timestamp('2019-06-01'), 104.704227),
        (Timestamp('2019-07-01'), 106.521034),
        (Timestamp('2019-08-01'), 104.881133),
        (Timestamp('2019-09-01'), 106.401255),
        (Timestamp('2019-10-01'), 106.767473),
        (Timestamp('2019-11-01'), 107.133691),
        (Timestamp('2019-12-01'), 107.499909),
        (Timestamp('2020-01-01'), 107.915346),
        (Timestamp('2020-02-01'), 108.682084),
    ],
    columns=('period', 'index_value'),
).set_index('period')

df_all_periods['chained'] = precon.chain(df_all_periods)

df_period_missing['chained'] = precon.chain(df_period_missing)

pd.concat([df_all_periods, df_period_missing, expected], keys=['all_periods', 'period_missing', 'expected'], axis=1)

In the above example, expected is calculated as if all periods were present, using the equation unlinked index * linked base / 100, so the chained indices after the missing period are not affected. precon.chain runs without raising an error because it uses a backfill, after shifting the indices by one period, to fill in the first month.
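
The equation above can be sketched directly. This is not how precon.chain is implemented; `chain_sketch` is a hypothetical helper that assumes January is the link month and that the starting base is 100. It uses a subset of the values from the example, with April 2019 missing.

```python
import pandas as pd


def chain_sketch(unchained: pd.Series, link_month: int = 1) -> pd.Series:
    """Chain using: unlinked index * linked base / 100 (assumed January link)."""
    base = 100.0
    chained = []
    for period, value in unchained.items():
        linked = value * base / 100
        chained.append(linked)
        # The chained link-month value becomes the base for the following periods.
        # A missing link-month value leaves the previous base in place.
        if period.month == link_month and pd.notna(linked):
            base = linked
    return pd.Series(chained, index=unchained.index, name="chained")


periods = pd.to_datetime([
    "2018-01-01", "2018-12-01", "2019-01-01", "2019-02-01",
    "2019-04-01", "2019-05-01", "2020-01-01", "2020-02-01",
])
values = [100.0, 103.6168, 104.24648, 101.0939,
          float("nan"), 100.6478, 103.519414, 100.7105]

chained = chain_sketch(pd.Series(values, index=periods))
```

Because each value only depends on the most recent link-month base, the missing April value stays missing and the periods after it are unaffected.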

Add a fillna(0) to the weights in the aggregation method to stop Zero Division bug

Still totally unsure whether this will solve the issue a user is experiencing, but in some adapted code the lines were:

    zeros_and_nans = indices.isna() | indices.eq(0)
    weights = weights.mask(zeros_and_nans, 0).fillna(0)

Consider implementing it on its own line with a comment explaining why that fill is necessary. Also find out what edge case it solves and write a test for it.

    weights = weights.mask(indices.isna() | indices.eq(0), 0)
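
One difference between the two versions can be shown with made-up data: `mask` only zeroes the weights where the index is NaN or 0, so a weight that is itself NaN next to a valid index survives unless `fillna(0)` follows. Whether this is the edge case behind the reported bug is not established here; the values below are invented for illustration.

```python
import numpy as np
import pandas as pd

indices = pd.Series([100.0, np.nan, 0.0, 105.0])
weights = pd.Series([1.0, 2.0, 3.0, np.nan])  # last weight missing, index valid

zeros_and_nans = indices.isna() | indices.eq(0)

# Without the fill, the NaN weight in the last position is left in place.
masked = weights.mask(zeros_and_nans, 0)

# With the fill, every weight is a number.
filled = masked.fillna(0)
```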

dropna in jan_adjustment will remove all values in row

    adjusted = adjusted.dropna()

The above line in the function means if passing in a dataframe with the following format

    date        col1  col2
    2019-01-01  101   NaN
    ...         ...   ...
    2019-05-01  104   NaN
    2019-06-01  103   100
    ...         ...   ...
    2020-01-01  101   102

(i.e. the col2 time series starts later than col1), then jan_adjustment will drop the entire row for 2019-01-01, including the valid col1 value.

Not sure on the correct behaviour, but anecdotally removing the dropna seems to work well.
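
The behaviour is easy to reproduce outside jan_adjustment. With no arguments, `DataFrame.dropna` removes any row containing at least one NaN; `how="all"` only removes rows where every column is NaN. The frame below is a cut-down version of the table above, and which option (if any) is right for jan_adjustment is still an open question.

```python
import numpy as np
import pandas as pd

adjusted = pd.DataFrame(
    {
        "col1": [101.0, 104.0, 103.0, 101.0],
        "col2": [np.nan, np.nan, 100.0, 102.0],
    },
    index=pd.to_datetime(["2019-01-01", "2019-05-01", "2019-06-01", "2020-01-01"]),
)

# Default how="any": rows where col2 has not started yet are dropped entirely.
any_dropped = adjusted.dropna()

# how="all": only rows with no data in any column are dropped.
all_dropped = adjusted.dropna(how="all")
```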

Modify get_base_prices to only fill within year

This might need some generalisation later on, but replace what is there for now. Maybe this function can move to index_methods too? Move index_calculator there too?

precon/precon/imputation.py

Lines 143 to 152 in ea185fa

    def get_base_prices(
        prices: pd.DataFrame,
        base_period: int = 1,
        axis: pd._typing.Axis = 0,
        ffill: bool = True,
    ) -> pd.DataFrame:
        """Returns the prices at the base month in the same shape as prices.
        Default behaviour is to fill forward values, but can be changed to
        return NaN where not base_month by setting ffill=False.
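
One possible reading of "only fill within year" is sketched below. `base_prices_within_year` is a hypothetical helper, not the existing function: it assumes a DatetimeIndex on the rows, keeps only the base-month prices, and forward fills inside each calendar year so that a missing base price is not carried over from the previous year.

```python
import numpy as np
import pandas as pd


def base_prices_within_year(prices: pd.DataFrame, base_period: int = 1) -> pd.DataFrame:
    # Blank out everything that is not a base-month price.
    is_base = prices.index.month == base_period
    base_only = prices.copy()
    base_only.loc[~is_base, :] = np.nan
    # Forward fill separately for each calendar year.
    return base_only.groupby(base_only.index.year).ffill()


prices = pd.DataFrame(
    {"a": [10.0, 11.0, 12.0, 13.0], "b": [20.0, 21.0, np.nan, 23.0]},
    index=pd.to_datetime(["2019-01-01", "2019-02-01", "2020-01-01", "2020-02-01"]),
)

result = base_prices_within_year(prices)
```

Column b has no January 2020 price, so its 2020 rows stay NaN instead of inheriting the January 2019 value as an unrestricted forward fill would.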

Add new aggregation functionality

Add functionality to aggregate for a given MultiIndex level or set of levels, and extend that functionality to enable an aggregation up a hierarchical tree given by a set of MultiIndex levels.

Add tests and ensure docstrings are thorough.
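
A minimal sketch of aggregating to a single MultiIndex level, assuming the aggregation is a weighted arithmetic mean of the indices. The helper name `aggregate_level`, the level names and the data are all made up for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("food", "bread"), ("food", "milk"), ("fuel", "petrol")],
    names=["division", "item"],
)
indices = pd.Series([102.0, 98.0, 110.0], index=index)
weights = pd.Series([1.0, 3.0, 2.0], index=index)


def aggregate_level(indices: pd.Series, weights: pd.Series, level) -> pd.Series:
    """Weighted mean of indices within each group of the given level(s)."""
    weighted_sum = (indices * weights).groupby(level=level).sum()
    weight_sum = weights.groupby(level=level).sum()
    return weighted_sum / weight_sum


by_division = aggregate_level(indices, weights, "division")
```

Since `groupby(level=...)` also accepts a list of levels, the same helper could be applied repeatedly with progressively fewer levels to walk up a hierarchy, provided the weights are aggregated alongside the indices at each step.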
