Comments (7)
The proper check for year-average rate accumulation is as below. Thanks @zarak for spotting the 'mutation' error!
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""1999-01-31 81.7
1999-02-28 96.9
1999-03-31 106.0
1999-04-30 97.6
1999-05-31 100.2
1999-06-30 100.7
1999-07-31 100.0
1999-08-31 106.5
1999-09-30 100.5
1999-10-31 102.1
1999-11-30 100.5
1999-12-31 116.0
2000-01-31 83.3
2000-02-29 97.9
2000-03-31 105.9
2000-04-30 98.2
2000-05-31 99.6
2000-06-30 101.1
2000-07-31 101.4
2000-08-31 105.6
2000-09-30 100.1
2000-10-31 102.2
2000-11-30 101.4
2000-12-31 115.6"""), sep = ' ', header = None, names = ['date', 'X_rog'],
index_col = 'date', converters=dict(date=pd.to_datetime))
df = df / 100
df = df.cumprod()
z = df.resample('A').sum()
rate = (z / z.shift() * 100).round(1).dropna()
assert rate.loc['2000',].iloc[0,0] == 109.0
from parser-rosstat-kep.
Moved workfile to /issues/todo_df_check.py
Comparing my dfa yoy values with the cumprod of dfm rog for RETAIL_SALES:
import pandas as pd
from io import StringIO
df = pd.read_csv(StringIO("""1999-01-31 81.7
1999-02-28 96.9
1999-03-31 106.0
1999-04-30 97.6
1999-05-31 100.2
1999-06-30 100.7
1999-07-31 100.0
1999-08-31 106.5
1999-09-30 100.5
1999-10-31 102.1
1999-11-30 100.5
1999-12-31 116.0
2000-01-31 83.3
2000-02-29 97.9
2000-03-31 105.9
2000-04-30 98.2
2000-05-31 99.6
2000-06-30 101.1
2000-07-31 101.4
2000-08-31 105.6
2000-09-30 100.1
2000-10-31 102.2
2000-11-30 101.4
2000-12-31 115.6"""), sep = ' ', header = None, names = ['date', 'X_rog'],
index_col = 'date', converters=dict(date=pd.to_datetime))
df = df / 100
df.cumprod()  # result is not assigned, so df is unchanged and z sums the raw monthly rates
z = df.resample('A').sum()
rate = z.iloc[1,0] / z.iloc[0,0]
#1.0029784065524945
#dfa.RETAIL_SALES_yoy
#Out[41]:
#1999-12-31 94.2
#2000-12-31 109.0
89b76da It may be preferable to use fillna(0) instead of dropna().
Consider the following aggregated dataframe:
In [208]: aggregate_rates_to_annual_average(df1)
Out[208]:
            INDPRO_rog  INVESTMENT_rog  RETAIL_SALES_FOOD_rog  \
1999-12-31         NaN             NaN                    NaN
2000-12-31         NaN      117.364097             107.405475

            RETAIL_SALES_NONFOOD_rog  RETAIL_SALES_rog  WAGE_REAL_rog
1999-12-31                       NaN               NaN            NaN
2000-12-31                 110.50661        109.006228     120.181979
Using dropna() on this returns an empty dataframe because by default any row containing a NaN is dropped. We could use dropna(how='all'):
In [212]: aggregate_rates_to_annual_average(df1).dropna(how='all')
Out[212]:
            INDPRO_rog  INVESTMENT_rog  RETAIL_SALES_FOOD_rog  \
2000-12-31         NaN      117.364097             107.405475

            RETAIL_SALES_NONFOOD_rog  RETAIL_SALES_rog  WAGE_REAL_rog
2000-12-31                 110.50661        109.006228     120.181979
but the remaining NaN value will evaluate to False against the threshold unless it is dropped too.
In [213]: aggregate_rates_to_annual_average(df1).dropna(how='all') < 150
Out[213]:
            INDPRO_rog  INVESTMENT_rog  RETAIL_SALES_FOOD_rog  \
2000-12-31       False            True                   True

            RETAIL_SALES_NONFOOD_rog  RETAIL_SALES_rog  WAGE_REAL_rog
2000-12-31                      True              True           True
Or is evaluating to False here the expected outcome?
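To make the options concrete, here is a minimal sketch of how NaN behaves against a threshold. The one-row frame below is a hypothetical stand-in that copies two columns from the output above; it is not produced by aggregate_rates_to_annual_average(), and the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical one-row frame mirroring two columns of the aggregated output
rates = pd.DataFrame(
    {"INDPRO_rog": [np.nan], "RETAIL_SALES_rog": [109.006228]},
    index=pd.to_datetime(["2000-12-31"]),
)

# Default dropna() removes the only row because it contains a NaN
empty = rates.dropna()

# dropna(how='all') keeps the row, since not every value in it is NaN
kept = rates.dropna(how="all")

# Any comparison with NaN is False, so the missing value "fails" the check
below = rates < 150

# Option 1: fillna(0) makes the missing value pass an upper-bound check
filled_below = rates.fillna(0) < 150

# Option 2: keep NaN and report it separately as "not checked"
not_checked = rates.isna()
passed_or_skipped = below | not_checked
```

Note that fillna(0) only helps for an upper bound: the same filled value would fail a check such as `> 50`. Tracking missing values with isna() keeps "failed" and "not checked" apart, if that distinction matters for the validation.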
Some takeaways for this issue.
We had a task to check incoming dfa, dfq, dfm. This task has the following stages:
- df-level primitives, like the accum* functions
- set up test arguments for a single 'resolution' function
- get the result for the setup
- optionally calculate coverage
With this we did:
a) writing and testing the primitives
b) deciding what to check and preparing the variables to be checked
c) setting up a feed of tests, digestible by a resolution function
d) running a feed of checks to get a pass/fail result for the test suite
e) knowing the 'coverage' of checks: what was not tested?
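Steps c) to e) could look roughly like the sketch below. It is only an illustration: run_checks(), uncovered(), the feed layout and the OTHER_yoy column with its values are all hypothetical and not taken from the repository. The RETAIL_SALES_yoy values are the ones quoted earlier in this thread.

```python
import pandas as pd

def run_checks(df, feed):
    """Hypothetical resolution function: apply each check to its column.

    feed is a list of (column_name, check_function) pairs;
    returns a dict mapping column name to a pass/fail bool.
    """
    return {name: bool(check(df[name])) for name, check in feed}

def uncovered(df, feed):
    """List the columns of df that no check in the feed refers to."""
    checked = {name for name, _ in feed}
    return [col for col in df.columns if col not in checked]

# RETAIL_SALES_yoy values come from the thread; OTHER_yoy is made up
dfa = pd.DataFrame(
    {"RETAIL_SALES_yoy": [94.2, 109.0], "OTHER_yoy": [101.0, 102.0]},
    index=pd.to_datetime(["1999-12-31", "2000-12-31"]),
)

# Feed of checks: here a single upper-bound check on one variable
feed = [("RETAIL_SALES_yoy", lambda s: (s < 150).all())]

results = run_checks(dfa, feed)      # pass/fail per checked variable
not_tested = uncovered(dfa, feed)    # 'coverage': what was not tested
```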
My lessons learned are:
- primitives are testable with unit tests and simple input values
- it is good when you arrive at a common formula for the primitives (like accum(df1)-df2 < epsilon)
- primitives should be clearly separated from setup
- sacrifices had to be made: not everything could be tested
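As an illustration of the first two points, a primitive and its common-formula check can be tested with a few simple numbers. This is a sketch under assumptions: accum() here is a hypothetical stand-in for the accum* functions, and is_close() uses an absolute difference, which the formula above does not spell out.

```python
import pandas as pd

def accum(df_rog):
    """Hypothetical primitive: chain period-on-period rates (in percent)
    into a cumulative index, also in percent."""
    return (df_rog / 100).cumprod() * 100

def is_close(df1, df2, epsilon):
    """Common formula for checks: |df1 - df2| < epsilon for every cell."""
    return bool(((df1 - df2).abs() < epsilon).all().all())

# Simple input values: +10%, no change, -10%
monthly = pd.DataFrame({"X_rog": [110.0, 100.0, 90.0]})
# Expected cumulative index: 110, 110 * 1.0, 110 * 0.9
expected = pd.DataFrame({"X_rog": [110.0, 110.0, 99.0]})

primitive_ok = is_close(accum(monthly), expected, epsilon=1e-6)
# The unchanged input should not match the accumulated result
unchanged_ok = is_close(monthly, expected, epsilon=1e-6)
```

Because the primitive takes a plain dataframe and returns one, a test like this needs no parsing setup at all, which is the separation the list above argues for.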
In #66 df_transform() separating primitives from the transform job is also important, but setting the varnames was more definite: we know exactly what variables we transform.
@zarak, your comment is welcome!
We will leave it to rest for a while before integrating it into the parsing validation procedure.