precon's People

Contributors

mitches-got-glitches

Forkers

martinr-l

precon's Issues

Chain function produces incorrect indices if period missing

The chain does not handle missing periods correctly but still produces a result.

import pandas as pd
from pandas import Timestamp
import precon

df_all_periods = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 101.093900),
        (Timestamp('2019-03-01'), 101.726900),
        (Timestamp('2019-04-01'), 100.478600),  # April 2019 value present
        (Timestamp('2019-05-01'), 100.647800),
        (Timestamp('2019-06-01'), 100.439100),
        (Timestamp('2019-07-01'), 102.181900),
        (Timestamp('2019-08-01'), 100.608800),
        (Timestamp('2019-09-01'), 102.067000),
        (Timestamp('2019-10-01'), 102.418300),
        (Timestamp('2019-11-01'), 102.769600),
        (Timestamp('2019-12-01'), 103.120900),
        (Timestamp('2020-01-01'), 103.519414),
        (Timestamp('2020-02-01'), 100.710500),
    ],
    columns=('period', 'index_value'),
).set_index('period')

df_period_missing = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 101.093900),
        (Timestamp('2019-03-01'), 101.726900),
        (Timestamp('2019-04-01'), None),  # April 2019 value missing
        (Timestamp('2019-05-01'), 100.647800),
        (Timestamp('2019-06-01'), 100.439100),
        (Timestamp('2019-07-01'), 102.181900),
        (Timestamp('2019-08-01'), 100.608800),
        (Timestamp('2019-09-01'), 102.067000),
        (Timestamp('2019-10-01'), 102.418300),
        (Timestamp('2019-11-01'), 102.769600),
        (Timestamp('2019-12-01'), 103.120900),
        (Timestamp('2020-01-01'), 103.519414),
        (Timestamp('2020-02-01'), 100.710500),
    ],
    columns=('period', 'index_value'),
).set_index('period')

expected = pd.DataFrame.from_records([
        (Timestamp('2018-01-01'), 100.000000),
        (Timestamp('2018-02-01'), 100.527400),
        (Timestamp('2018-03-01'), 100.894000),
        (Timestamp('2018-04-01'), 100.689100),
        (Timestamp('2018-05-01'), 102.670400),
        (Timestamp('2018-06-01'), 100.811000),
        (Timestamp('2018-07-01'), 102.632500),
        (Timestamp('2018-08-01'), 103.133200),
        (Timestamp('2018-09-01'), 103.111400),
        (Timestamp('2018-10-01'), 103.417700),
        (Timestamp('2018-11-01'), 103.155800),
        (Timestamp('2018-12-01'), 103.616800),
        (Timestamp('2019-01-01'), 104.246480),
        (Timestamp('2019-02-01'), 105.386833),
        (Timestamp('2019-03-01'), 106.046713),
        (Timestamp('2019-04-01'), 104.745404),
        (Timestamp('2019-05-01'), 104.921789),
        (Timestamp('2019-06-01'), 104.704227),
        (Timestamp('2019-07-01'), 106.521034),
        (Timestamp('2019-08-01'), 104.881133),
        (Timestamp('2019-09-01'), 106.401255),
        (Timestamp('2019-10-01'), 106.767473),
        (Timestamp('2019-11-01'), 107.133691),
        (Timestamp('2019-12-01'), 107.499909),
        (Timestamp('2020-01-01'), 107.915346),
        (Timestamp('2020-02-01'), 108.682084),
    ],
    columns=('period', 'index_value'),
).set_index('period')

df_all_periods['chained'] = precon.chain(df_all_periods)

df_period_missing['chained'] = precon.chain(df_period_missing)

pd.concat([df_all_periods, df_period_missing, expected], keys=['all_periods', 'period_missing', 'expected'], axis=1)

In the above example, expected is calculated as if all periods were present, using the equation unlinked index * linked base / 100, so the chained indices after the missing period are not affected. precon.chain doesn't have an issue as it uses a backfill after shifting the indices by one period to fill in the first month.
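As a rough illustration of the equation above (not precon's implementation), the sketch below chains a short unlinked series. The function name chain_on_january is hypothetical, and linking on each January is an assumption inferred from the example figures, where February 2019 equals 101.0939 * 104.24648 / 100.

```python
import pandas as pd


def chain_on_january(unlinked: pd.Series) -> pd.Series:
    """Chain an unlinked monthly index, linking on each January (assumed)."""
    chained = {}
    linked_base = 100.0  # base for the first year in the series
    for period, value in unlinked.items():
        # chained = unlinked index * linked base / 100; NaN stays NaN
        chained[period] = value * linked_base / 100
        if period.month == 1 and pd.notna(value):
            # This January becomes the linked base for the following months
            linked_base = chained[period]
    return pd.Series(chained, name="chained")


# A cut-down version of the data above, with April 2019 missing
unlinked = pd.Series(
    [100.0, 103.6168, 104.24648, 101.0939, None, 100.6478],
    index=pd.to_datetime(
        [
            "2018-01-01",
            "2018-12-01",
            "2019-01-01",
            "2019-02-01",
            "2019-04-01",
            "2019-05-01",
        ]
    ),
)
chained = chain_on_january(unlinked)
print(chained)
```

Because the linked base is carried separately from the monthly values, the missing April stays missing while May is still chained against January 2019.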

Add a fillna(0) to the weights in the aggregation method to stop Zero Division bug

Still totally unsure whether this will solve the issue a user is experiencing, but in some adapted code the lines were:

    zeros_and_nans = indices.isna() | indices.eq(0)
    weights = weights.mask(zeros_and_nans, 0).fillna(0)

Consider implementing it on its own line with a comment explaining why that fill is necessary. Also find out what edge case it solves and write a test for it.

weights = weights.mask(indices.isna() | indices.eq(0), 0)
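To make the difference between the two forms concrete, here is a small sketch with made-up values. It only shows what pandas does in each case; whether this is the edge case behind the user's bug is still an open question.

```python
import pandas as pd

nan = float("nan")

# Made-up data: one NaN index, one zero index, and one NaN weight
indices = pd.Series([101.2, nan, 0.0, 99.5])
weights = pd.Series([2.0, 3.0, 1.0, nan])

zeros_and_nans = indices.isna() | indices.eq(0)

# mask() only zeroes weights where the *index* is NaN or zero
masked_only = weights.mask(zeros_and_nans, 0)

# fillna(0) additionally zeroes weights that were NaN themselves
masked_and_filled = masked_only.fillna(0)

print(masked_only.tolist())
print(masked_and_filled.tolist())
```

The last weight is NaN while its index is valid, so mask alone leaves it as NaN; only the extra fillna(0) turns it into a zero weight.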

Change to applying _get_adjustments in round_and_adjust function

Change from the following:

    elif isinstance(obj, pd.core.frame.DataFrame):

        # Create an empty DataFrame to fill with adjustments
        adjustments = pd.DataFrame().reindex_like(obj)

        for index, row in iter_method(obj):
            # Create a selector based on the axis
            slice_ = axis_slice(index, axis)

            adjustments.loc[slice_] = _get_adjustments(row, decimals)

to this:

    elif isinstance(obj, pd.core.frame.DataFrame):

        # args must be a tuple, hence the trailing comma
        adjustments = obj.apply(_get_adjustments, args=(decimals,), axis=axis)

This should also allow for the removal of:

    iter_dict = {
        0: pd.DataFrame.iterrows,
        1: pd.DataFrame.iteritems,
    }
    iter_method = iter_dict.get(axis)

Slimming the function right down.

While taking care of this, remember to also do the following:

  • Ensure there is an empty line at EOF
  • Change the isinstance calls to check against pd.Series and pd.DataFrame, removing the pd.core.series / pd.core.frame prefixes
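A minimal sketch of the apply-based version follows. Since _get_adjustments is not shown here, _demo_adjustments is a hypothetical stand-in that simply returns the rounding difference; the real function may do more. Note that args has to be a tuple, so a single extra argument is written as (decimals,).

```python
import pandas as pd


def _demo_adjustments(values: pd.Series, decimals: int) -> pd.Series:
    # Hypothetical stand-in for _get_adjustments: the change made by rounding
    return values.round(decimals) - values


obj = pd.DataFrame({"a": [1.24, 2.26], "b": [3.14, 4.16]})
decimals = 1

# axis=1 passes each row to the function; axis=0 would pass each column
adjustments = obj.apply(_demo_adjustments, args=(decimals,), axis=1)
print(adjustments)
```

One thing that may be worth checking: iter_dict maps axis 0 to iterrows, whereas apply with axis=0 passes columns, so the axis argument might need flipping rather than passing straight through.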

Modify get_base_prices to only fill within year

This might need some generalisation later on, but replace what is there for now. Maybe this function can move to index_methods? Move index_calculator there too?

precon/precon/imputation.py

Lines 143 to 152 in ea185fa

def get_base_prices(
    prices: pd.DataFrame,
    base_period: int = 1,
    axis: pd._typing.Axis = 0,
    ffill: bool = True,
) -> pd.DataFrame:
    """Returns the prices at the base month in the same shape as prices.
    Default behaviour is to fill forward values, but can be changed to
    return NaN where not base_month by setting ffill=False.
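One possible way to restrict the fill to within a year is sketched below. The function name base_prices_within_year is hypothetical, the axis and ffill arguments are left out for brevity, and it assumes prices has a DatetimeIndex on the rows.

```python
import pandas as pd


def base_prices_within_year(
    prices: pd.DataFrame,
    base_period: int = 1,
) -> pd.DataFrame:
    """Sketch: base-month prices, forward filled within each calendar year."""
    is_base = prices.index.month == base_period
    base_only = prices.copy()
    # Blank out everything that is not in the base month
    base_only.loc[~is_base] = float("nan")
    # Grouping by year stops a fill from crossing into the next year
    return base_only.groupby(prices.index.year).ffill()


dates = pd.date_range("2019-01-01", periods=15, freq="MS")
prices = pd.DataFrame({"item": [100.0 + i for i in range(15)]}, index=dates)
# Pretend the January 2020 base price is missing
prices.loc[pd.Timestamp("2020-01-01"), "item"] = float("nan")

result = base_prices_within_year(prices)
print(result)
```

With a plain forward fill the 2020 rows would inherit the January 2019 price; grouping by year leaves them as NaN instead.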

dropna in jan_adjustment will remove all values in row

adjusted = adjusted.dropna()

The above line in the function means if passing in a dataframe with the following format

date        col1  col2
2019-01-01  101   NaN
...         ...   ...
2019-05-01  104   NaN
2019-06-01  103   100
...         ...   ...
2020-01-01  101   102

(i.e. the col2 time series starts later than col1), then jan_adjustment will drop the entire row for 2019-01-01.

Not sure on the correct behaviour, but anecdotally removing the dropna seems to work well.
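The behaviour can be reproduced outside jan_adjustment with a small frame in the same shape as the table above. The how="all" variant is only shown for comparison; it is not something the function currently does.

```python
import pandas as pd

nan = float("nan")

adjusted = pd.DataFrame(
    {
        "col1": [101.0, 104.0, 103.0, 101.0],
        "col2": [nan, nan, 100.0, 102.0],
    },
    index=pd.to_datetime(
        ["2019-01-01", "2019-05-01", "2019-06-01", "2020-01-01"]
    ),
)

# Default how="any": a row goes if *any* column is NaN
dropped_any = adjusted.dropna()

# how="all": a row only goes if *every* column is NaN
dropped_all = adjusted.dropna(how="all")

print(dropped_any)
print(dropped_all)
```

The default dropna removes the 2019-01-01 and 2019-05-01 rows even though col1 has values there, which matches the behaviour described above.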

Add new aggregation functionality

Add functionality to aggregate for a given MultiIndex level or set of levels, and extend that functionality to enable an aggregation up a hierarchical tree given by a set of MultiIndex levels.
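A rough sketch of what aggregating by level could look like is given below. The function name aggregate_by_level, the level names, and the choice of a weighted mean are all assumptions made for illustration; the eventual API may differ.

```python
import pandas as pd


def aggregate_by_level(
    indices: pd.Series,
    weights: pd.Series,
    level,
) -> pd.Series:
    """Sketch: weighted mean of indices within each group of `level`."""
    weighted_sum = (indices * weights).groupby(level=level).sum()
    weight_sum = weights.groupby(level=level).sum()
    return weighted_sum / weight_sum


index = pd.MultiIndex.from_tuples(
    [
        ("food", "bakery", "bread"),
        ("food", "bakery", "cake"),
        ("food", "dairy", "milk"),
        ("clothing", "footwear", "shoes"),
    ],
    names=["division", "group", "item"],  # hypothetical level names
)
indices = pd.Series([102.0, 98.0, 100.0, 110.0], index=index)
weights = pd.Series([1.0, 3.0, 4.0, 2.0], index=index)

by_group = aggregate_by_level(indices, weights, ["division", "group"])
by_division = aggregate_by_level(indices, weights, "division")
print(by_group)
print(by_division)
```

Aggregating up a hierarchical tree could then be a loop over successively shorter lists of levels, always starting from the lowest-level indices and weights.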

Add tests and ensure docstrings are thorough.

Add pre-commit hooks for devs

I want to add some pre-commit hooks for developers.

  • Remove whitespace

  • Flake 8 linting

  • Check commit msg subject len
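For the commit message check, a local hook could be a small script along these lines. The 50 character limit and the function names are assumptions, not something already agreed; the sketch relies on a commit-msg hook being passed the path to the message file as its first argument.

```python
import sys

MAX_SUBJECT_LEN = 50  # assumed limit, adjust to taste


def check_subject(message: str, limit: int = MAX_SUBJECT_LEN) -> bool:
    """Return True if the first line is non-empty and within the limit."""
    lines = message.splitlines()
    subject = lines[0] if lines else ""
    return 0 < len(subject) <= limit


def main(argv) -> int:
    """Return 0 if the commit message file passes the check, 1 otherwise."""
    if len(argv) < 2:
        print("usage: check_commit_msg.py <commit-msg-file>")
        return 1
    with open(argv[1], encoding="utf-8") as handle:
        message = handle.read()
    if check_subject(message):
        return 0
    print(f"Commit subject must be 1 to {MAX_SUBJECT_LEN} characters long")
    return 1
```

When used as a hook, the script would end with sys.exit(main(sys.argv)) so that a non-zero return code blocks the commit.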

Documentation

It would be useful to be able to view the docs for this project.

Currently, I think, you have to clone and build them yourself?

A solution would be to use GitHub Pages to serve the docs, as this works well with Sphinx.
