onsbigdata / precon
Functions for price index economics.
License: MIT License
The chain does not handle missing periods correctly but still produces a result.
import pandas as pd
from pandas import Timestamp
import precon
df_all_periods = pd.DataFrame.from_records([
(Timestamp('2018-01-01'), 100.000000),
(Timestamp('2018-02-01'), 100.527400),
(Timestamp('2018-03-01'), 100.894000),
(Timestamp('2018-04-01'), 100.689100),
(Timestamp('2018-05-01'), 102.670400),
(Timestamp('2018-06-01'), 100.811000),
(Timestamp('2018-07-01'), 102.632500),
(Timestamp('2018-08-01'), 103.133200),
(Timestamp('2018-09-01'), 103.111400),
(Timestamp('2018-10-01'), 103.417700),
(Timestamp('2018-11-01'), 103.155800),
(Timestamp('2018-12-01'), 103.616800),
(Timestamp('2019-01-01'), 104.246480),
(Timestamp('2019-02-01'), 101.093900),
(Timestamp('2019-03-01'), 101.726900),
(Timestamp('2019-04-01'), 100.478600), # April 2019 value present
(Timestamp('2019-05-01'), 100.647800),
(Timestamp('2019-06-01'), 100.439100),
(Timestamp('2019-07-01'), 102.181900),
(Timestamp('2019-08-01'), 100.608800),
(Timestamp('2019-09-01'), 102.067000),
(Timestamp('2019-10-01'), 102.418300),
(Timestamp('2019-11-01'), 102.769600),
(Timestamp('2019-12-01'), 103.120900),
(Timestamp('2020-01-01'), 103.519414),
(Timestamp('2020-02-01'), 100.710500),
],
columns=('period', 'index_value'),
).set_index('period')
df_period_missing = pd.DataFrame.from_records([
(Timestamp('2018-01-01'), 100.000000),
(Timestamp('2018-02-01'), 100.527400),
(Timestamp('2018-03-01'), 100.894000),
(Timestamp('2018-04-01'), 100.689100),
(Timestamp('2018-05-01'), 102.670400),
(Timestamp('2018-06-01'), 100.811000),
(Timestamp('2018-07-01'), 102.632500),
(Timestamp('2018-08-01'), 103.133200),
(Timestamp('2018-09-01'), 103.111400),
(Timestamp('2018-10-01'), 103.417700),
(Timestamp('2018-11-01'), 103.155800),
(Timestamp('2018-12-01'), 103.616800),
(Timestamp('2019-01-01'), 104.246480),
(Timestamp('2019-02-01'), 101.093900),
(Timestamp('2019-03-01'), 101.726900),
(Timestamp('2019-04-01'), None), # April 2019 value missing
(Timestamp('2019-05-01'), 100.647800),
(Timestamp('2019-06-01'), 100.439100),
(Timestamp('2019-07-01'), 102.181900),
(Timestamp('2019-08-01'), 100.608800),
(Timestamp('2019-09-01'), 102.067000),
(Timestamp('2019-10-01'), 102.418300),
(Timestamp('2019-11-01'), 102.769600),
(Timestamp('2019-12-01'), 103.120900),
(Timestamp('2020-01-01'), 103.519414),
(Timestamp('2020-02-01'), 100.710500),
],
columns=('period', 'index_value'),
).set_index('period')
expected = pd.DataFrame.from_records([
(Timestamp('2018-01-01'), 100.000000),
(Timestamp('2018-02-01'), 100.527400),
(Timestamp('2018-03-01'), 100.894000),
(Timestamp('2018-04-01'), 100.689100),
(Timestamp('2018-05-01'), 102.670400),
(Timestamp('2018-06-01'), 100.811000),
(Timestamp('2018-07-01'), 102.632500),
(Timestamp('2018-08-01'), 103.133200),
(Timestamp('2018-09-01'), 103.111400),
(Timestamp('2018-10-01'), 103.417700),
(Timestamp('2018-11-01'), 103.155800),
(Timestamp('2018-12-01'), 103.616800),
(Timestamp('2019-01-01'), 104.246480),
(Timestamp('2019-02-01'), 105.386833),
(Timestamp('2019-03-01'), 106.046713),
(Timestamp('2019-04-01'), 104.745404),
(Timestamp('2019-05-01'), 104.921789),
(Timestamp('2019-06-01'), 104.704227),
(Timestamp('2019-07-01'), 106.521034),
(Timestamp('2019-08-01'), 104.881133),
(Timestamp('2019-09-01'), 106.401255),
(Timestamp('2019-10-01'), 106.767473),
(Timestamp('2019-11-01'), 107.133691),
(Timestamp('2019-12-01'), 107.499909),
(Timestamp('2020-01-01'), 107.915346),
(Timestamp('2020-02-01'), 108.682084),
],
columns=('period', 'index_value'),
).set_index('period')
df_all_periods['chained'] = precon.chain(df_all_periods)
df_period_missing['chained'] = precon.chain(df_period_missing)
pd.concat([df_all_periods, df_period_missing, expected], keys=['all_periods', 'period_missing', 'expected'], axis=1)
In the example above, expected is calculated as if all periods were present, but using the equation unlinked index * linked base / 100, so the chained indices after the missing period are not affected. precon.chain doesn't have an issue as it uses a backfill after shifting the indices by one period to fill in the first month.
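The equation can be sketched as follows. This is a minimal illustration, not precon.chain itself: the name chain_on_january is hypothetical, and it assumes monthly data where each January value is linked to the previous January and every other month is based on January of the same year, which is how the expected column appears to be built.

```python
import numpy as np
import pandas as pd


def chain_on_january(unlinked: pd.Series) -> pd.Series:
    """Hypothetical helper: chained = unlinked index * linked base / 100."""
    linked_base = 100.0  # assumed base for the first year
    chained = []
    for period, value in unlinked.items():
        chained.append(value * linked_base / 100)
        # A January value becomes the link for the months that follow it.
        if period.month == 1:
            linked_base = chained[-1]
    return pd.Series(chained, index=unlinked.index, name='chained')


# A shortened version of the series above, with April 2019 missing.
unlinked = pd.Series(
    [100.0, 103.6168, 104.24648, 101.0939, np.nan, 100.6478],
    index=pd.to_datetime([
        '2018-01-01', '2018-12-01', '2019-01-01',
        '2019-02-01', '2019-04-01', '2019-05-01',
    ]),
)
chained = chain_on_january(unlinked)
```

Because only January values feed the linked base, the missing April stays NaN and May is still chained against January 2019. A missing January would propagate NaN to every later period in this sketch.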
This is to abide by the numpy style convention.
Additional contributions functions were developed for the consumer prices faster indicators project. Pull these into precon.
period_on_period_contributions
contributions_level
contributions_up_hierarchy
Review existing contributions code and add some tests and documentation.
The ternary operator is unnecessary here - a simple conditional will do since it returns True or False anyway.
Line 82 in 46752ea
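As a generic illustration of the point (the function names and the condition are hypothetical, not the code on the referenced line), a comparison already evaluates to True or False, so wrapping it in a ternary adds nothing:

```python
def is_positive_ternary(value: float) -> bool:
    # Redundant: the comparison is already a bool.
    return True if value > 0 else False


def is_positive(value: float) -> bool:
    # Equivalent and simpler.
    return value > 0
```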
Still totally unsure whether this will solve the issue a user is experiencing, but in some adapted code the lines were:
zeros_and_nans = indices.isna() | indices.eq(0)
weights = weights.mask(zeros_and_nans, 0).fillna(0)
Consider implementing it on its own line with a comment explaining why that fill is necessary. Also find out what edge case it solves and write a test for it.
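A small sketch of what those two lines do, on made-up data (the values are illustrative only): any weight whose index is NaN or zero is set to zero, and any weight that is itself missing is also set to zero.

```python
import numpy as np
import pandas as pd

# Illustrative data: one missing index, one zero index, one missing weight.
indices = pd.Series([100.0, np.nan, 0.0, 102.0])
weights = pd.Series([0.25, 0.25, 0.25, np.nan])

# True wherever the index cannot contribute to an aggregate.
zeros_and_nans = indices.isna() | indices.eq(0)

# Zero the weights for unusable indices, then zero any missing weights.
masked_weights = weights.mask(zeros_and_nans, 0).fillna(0)
```

This could serve as a starting point for the test mentioned above, once the real edge case is known.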
Line 68 in cf0df3a
Change from the following:
elif isinstance(obj, pd.core.frame.DataFrame):
    # Create an empty DataFrame to fill with adjustments
    adjustments = pd.DataFrame().reindex_like(obj)
    for index, row in iter_method(obj):
        # Create a selector based on the axis
        slice_ = axis_slice(index, axis)
        adjustments.loc[slice_] = _get_adjustments(row, decimals)
to this:
elif isinstance(obj, pd.core.frame.DataFrame):
    # args must be a tuple, so a single argument needs a trailing comma
    adjustments = obj.apply(_get_adjustments, args=(decimals,), axis=axis)
This should also allow for the removal of:
iter_dict = {
    0: pd.DataFrame.iterrows,
    1: pd.DataFrame.iteritems,
}
iter_method = iter_dict.get(axis)
Slimming the function right down.
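A minimal sketch of the apply-based version on toy data. The body of _get_adjustments here is a hypothetical stand-in (a rounding residual), since the real helper is not shown in this issue.

```python
import pandas as pd


def _get_adjustments(values: pd.Series, decimals: int) -> pd.Series:
    # Hypothetical stand-in for the real helper: the rounding residual.
    return values.round(decimals) - values


obj = pd.DataFrame({'a': [1.26, 2.34], 'b': [3.41, 4.59]})

# args must be a tuple, hence the trailing comma in (1,).
adjustments = obj.apply(_get_adjustments, args=(1,), axis=1)
```

Note that DataFrame.apply with axis=1 passes each row to the function and axis=0 passes each column, so it is worth checking that this matches the axis convention of the existing loop before swapping it in.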
While taking care of this, remember to also do the following:
Similar to Matt's implementation here:
in_year_base = indices.resample('AS').first()
# Align base indices to full time series values
in_year_base = (
    in_year_base
    .reindex_like(indices, method='ffill')
)
This might need some generalisation later on, but replace what is there for now. Maybe this function can move too, to index_methods? Move index_calculator there too?
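A runnable sketch of the in-year base idea on toy data. Two substitutions are assumptions rather than the original code: 'YS' is used in place of 'AS', since recent pandas versions deprecate the 'AS' alias, and reindex(..., method='ffill') is used in place of reindex_like.

```python
import pandas as pd

# Toy monthly index series: 100.0, 101.0, ... from Jan 2018 to Mar 2019.
periods = pd.date_range('2018-01-01', '2019-03-01', freq='MS')
indices = pd.Series(range(100, 100 + len(periods)), index=periods, dtype=float)

# First value in each calendar year, labelled at the start of the year.
in_year_base = indices.resample('YS').first()

# Align the yearly base values to the full monthly time series.
in_year_base = in_year_base.reindex(indices.index, method='ffill')
```

Every month in 2018 then carries the January 2018 value as its base, and every month in 2019 carries the January 2019 value.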
Lines 143 to 152 in ea185fa
Line 80 in cf0df3a
Add a .fillna(base_prices) method to cover the NaNs created by the shift.
Be mindful that this is changing in impute_base_prices too, but it's covered there already with the .fillna(start_prices).
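To illustrate the effect, here is a sketch on toy data. It assumes the shift is a one-period shift of the base prices themselves, which may not match the referenced line exactly.

```python
import pandas as pd

periods = pd.date_range('2019-01-01', periods=3, freq='MS')
base_prices = pd.Series([10.0, 10.5, 11.0], index=periods)

# Shifting by one period leaves a NaN in the first position.
shifted = base_prices.shift(1)

# Fill that gap with the unshifted value for the same period.
filled = shifted.fillna(base_prices)
```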
Line 24 in 4e441a7
The above line in the function means that, when passing in a dataframe with the following format

| date | col1 | col2 |
|---|---|---|
| 2019-01-01 | 101 | NaN |
| ... | ... | ... |
| 2019-05-01 | 104 | NaN |
| 2019-06-01 | 103 | 100 |
| ... | ... | ... |
| 2020-01-01 | 101 | 102 |

(i.e. the col2 time series starts later than col1), jan_adjustment will drop the entire row for 2019-01-01.
Not sure on the correct behaviour, but anecdotally removing the dropna seems to work well.
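The underlying pandas behaviour can be reproduced on a cut-down version of the table (this only demonstrates DataFrame.dropna, not jan_adjustment itself): with default arguments, dropna removes any row containing at least one NaN, so valid col1 values are lost along with the missing col2 values.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': [101.0, 104.0, 103.0, 101.0],
        'col2': [np.nan, np.nan, 100.0, 102.0],
    },
    index=pd.to_datetime(['2019-01-01', '2019-05-01', '2019-06-01', '2020-01-01']),
)

# Default how='any': a NaN in col2 removes the whole row, including col1.
dropped = df.dropna()

# Dropping per column keeps the earlier col1 values.
col1_only = df['col1'].dropna()
```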
The axis to adjust across.
Add functionality to aggregate for a given MultiIndex level or set of levels, and extend that functionality to enable an aggregation up a hierarchical tree given by a set of MultiIndex levels.
Add tests and ensure docstrings are thorough.
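One possible shape for the single-level case, as a sketch only. The helper name aggregate_level, the level names, and the choice of a weighted arithmetic mean are all assumptions, not existing precon behaviour.

```python
import pandas as pd


def aggregate_level(indices: pd.Series, weights: pd.Series, level) -> pd.Series:
    """Hypothetical helper: weighted mean of indices within each group of `level`."""
    weighted_sum = (indices * weights).groupby(level=level).sum()
    return weighted_sum / weights.groupby(level=level).sum()


idx = pd.MultiIndex.from_tuples(
    [('food', 'bread'), ('food', 'milk'),
     ('clothing', 'shirts'), ('clothing', 'shoes')],
    names=['division', 'item'],
)
indices = pd.Series([100.0, 110.0, 90.0, 120.0], index=idx)
weights = pd.Series([1.0, 3.0, 2.0, 2.0], index=idx)

division_indices = aggregate_level(indices, weights, level='division')
```

Aggregating up a hierarchy could then repeat this step level by level, carrying the summed weights upwards each time.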
I think index_calculator may need to support multiple scenarios:
Consider a sensible way of implementing this - might need some tests first!
Create a generator to create random index data in a reproducible way.
Support the generation of hierarchical structure of indices.
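A possible starting point, sketched under assumptions: the function name generate_indices, its parameters, the level names, and the random-walk model are all hypothetical. Reproducibility comes from seeding NumPy's default_rng.

```python
import numpy as np
import pandas as pd


def generate_indices(n_periods=12, seed=0,
                     levels=(('food', 'clothing'), ('a', 'b'))):
    """Hypothetical generator of random index series with hierarchical columns."""
    rng = np.random.default_rng(seed)  # same seed, same data
    periods = pd.date_range('2018-01-01', periods=n_periods, freq='MS')
    columns = pd.MultiIndex.from_product(levels, names=['division', 'item'])
    # Month-on-month relatives close to 1; the first period is the base.
    relatives = rng.normal(loc=1.0, scale=0.01, size=(n_periods, len(columns)))
    relatives[0] = 1.0
    return pd.DataFrame(100 * relatives.cumprod(axis=0),
                        index=periods, columns=columns)


first = generate_indices(seed=42)
second = generate_indices(seed=42)
other = generate_indices(seed=7)
```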
I want to add some pre-commit hooks for developers.
Remove whitespace
Flake 8 linting
Check commit msg subject len
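For the subject length check, the core logic might look like the sketch below. The function name and the 50-character default are assumptions; wiring it into a hook is left out.

```python
def subject_is_short_enough(message: str, max_length: int = 50) -> bool:
    """Hypothetical check: is the first line of a commit message short enough?"""
    lines = message.splitlines()
    subject = lines[0] if lines else ''
    return len(subject) <= max_length
```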
There's a bug here, since base_period is a list rather than a single int. Change to the isin() method.
Line 191 in 4e441a7
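A sketch of the isin() approach on toy data. It assumes base_period holds month numbers that are compared against the months of a DatetimeIndex, which may differ from the referenced line.

```python
import pandas as pd

periods = pd.date_range('2019-01-01', periods=6, freq='MS')
base_period = [1, 4]  # assumed to be month numbers

# isin() tests each month for membership in the list.
is_base = periods.month.isin(base_period)
```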
Line 77 in 823c0c1
Add this after:
if adjustments is not None:
    base_prices = base_prices * adjustments
And update docstring.
It would be useful to be able to view the docs for this project.
Currently, I think, you have to clone and build them yourself?
A solution would be to use GitHub pages to serve the docs as this works well with sphinx.