mini-kep / parsers
Extract and upload data to database
@kkravchuk: note that some refactoring is needed for the tests, marked as TODO.
Can we also add a test for to_markdown.py, something very simple?
Regarding the scheduler, there are two options: https://devcenter.heroku.com/articles/scheduler and https://devcenter.heroku.com/articles/clock-processes-python
Plain scheduler. Runs strictly every 10 minutes, every hour, or every day. Essentially it is a heroku run ... command; in our case that is heroku run python <parser_name>. Question: should we then put the scheduler launch commands in one file, or spread them across files and create as many schedulers?
APScheduler. A process of type clock is created in Heroku; it runs a file with the schedule saying when to trigger which parser. The schedule is described with the Python library APScheduler. The advantage is that the schedule can be configured more flexibly.
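A sketch of the second option as a clock.py file, following the pattern in the Heroku clock-processes article: a BlockingScheduler started from a process declared as `clock: python clock.py` in the Procfile. It uses the APScheduler 3.x API. The parser names, intervals and the `python <parser_name>.py` command layout are assumptions, not taken from the repo:

```python
import subprocess
import sys

# Hypothetical schedule: parser script name -> run interval in minutes.
# Names and intervals are illustrative only.
SCHEDULE = {
    "cbr_usd": 24 * 60,
    "brent": 24 * 60,
    "rosstat_kep": 7 * 24 * 60,
}


def make_command(parser_name):
    """Command equivalent to `python <parser_name>.py` (assumed script layout)."""
    return [sys.executable, f"{parser_name}.py"]


def run_parser(parser_name):
    """Run one parser in a child process; a failing parser does not stop the clock."""
    subprocess.run(make_command(parser_name), check=False)


def main():
    # Imported here so the schedule table can be inspected without APScheduler installed.
    from apscheduler.schedulers.blocking import BlockingScheduler

    sched = BlockingScheduler()
    for name, minutes in SCHEDULE.items():
        sched.add_job(run_parser, "interval", minutes=minutes, args=[name], id=name)
    sched.start()  # blocks; Heroku keeps this running as the clock process


if __name__ == "__main__":
    main()
```

With this option the one-file-or-many question largely goes away, since a single clock process can hold the schedule for all parsers.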
Test class for upload_to_database():
Please remember the testing guidelines.
Time: 1-2h.
Need a setup script so we can easily use it as a dependency.
Please create it or review this PR: #22
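A minimal setup.py sketch; the package name, version and empty dependency list are assumptions to be filled in. With a file like this at the repo root, another project could install the package with `pip install git+https://github.com/mini-kep/parsers.git`:

```python
# setup.py: minimal sketch; name, version and dependencies are assumptions.
from setuptools import find_packages, setup

setup(
    name="parsers",
    version="0.1.0",
    description="Extract and upload data to database",
    packages=find_packages(),  # picks up every directory with an __init__.py
    install_requires=[
        # list runtime dependencies here, e.g. "requests"
    ],
)
```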
What pytest capabilities are there for running a specific group of tests?
Review http://pytest.readthedocs.io/en/reorganize-docs/new-docs/user/pytestmark.html and provide an example of how we can run:
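A sketch of selecting groups of tests with marks, including the module-level `pytestmark` variable the linked page describes. The mark names `getters` and `web` are hypothetical:

```python
# test_getters.py: sketch; the mark names "getters" and "web" are hypothetical.
import pytest

# Module-level mark: every test in this file gets the "getters" mark.
pytestmark = pytest.mark.getters


@pytest.mark.web
def test_download_needs_network():
    # in a real test this would call a live URL
    assert True


def test_parse_offline():
    assert float("52.5") == 52.5
```

With this file, `pytest -m getters` runs both tests, `pytest -m "getters and not web"` runs only the offline one, and `pytest -k offline` selects tests by name substring. Newer pytest versions warn about unregistered marks, so custom marks should also be listed under `markers` in pytest.ini or setup.cfg.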
We can now have a start_date.
parsers/parsers/tests/test_runner.py
Lines 129 to 233 in 268c1ab
The result of refactoring is usually parametrized tests.
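A sketch of what such a parametrized test can look like with `pytest.mark.parametrize`; `make_date` here is a hypothetical stand-in for the helper under test, not the repo's own function:

```python
import datetime

import pytest


def make_date(text):
    """Hypothetical helper under test: 'YYYY-MM-DD' -> datetime.date."""
    return datetime.datetime.strptime(text, "%Y-%m-%d").date()


# One parametrized test replaces several near-identical test functions.
@pytest.mark.parametrize("text, expected", [
    ("2017-01-01", datetime.date(2017, 1, 1)),
    ("1999-01-31", datetime.date(1999, 1, 31)),
    ("2017-09-30", datetime.date(2017, 9, 30)),
])
def test_make_date(text, expected):
    assert make_date(text) == expected
```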
================================== FAILURES ===================================
__________________ Test_make_date.test_on_none_returns_today __________________
self = <parsers.tests.test_helpers.Test_make_date object at 0x0000016A99BDA128>
def test_on_none_returns_today(self):
> assert DateHelper.make_date(None) == datetime.date.today()
E AssertionError: assert datetime.date(2017, 9, 29) == datetime.date(2017, 9, 30)
E + where datetime.date(2017, 9, 29) = <function DateHelper.make_date at 0x0000016A9881ABF8>(None)
E + where <function DateHelper.make_date at 0x0000016A9881ABF8> = DateHelper.make_date
E + and datetime.date(2017, 9, 30) = <built-in method today of type object at 0x00000000609F6720>()
E + where <built-in method today of type object at 0x00000000609F6720> = <class 'datetime.date'>.today
E + where <class 'datetime.date'> = datetime.date
tests\test_helpers.py:33: AssertionError
===================== 1 failed, 44 passed in 4.11 seconds =====================
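In the failure above, `make_date(None)` returned 2017-09-29 while `datetime.date.today()` returned 2017-09-30, so the test depends on the wall clock (and possibly on which clock or timezone `make_date` uses). A sketch of one way to make it deterministic, assuming `make_date` can accept a hypothetical `today` argument; this is not the current `DateHelper.make_date` signature:

```python
import datetime


def make_date(value=None, today=None):
    """Sketch of make_date with an injectable `today` (hypothetical parameter).

    None -> today's date; date -> unchanged; 'YYYY-MM-DD' string -> parsed date.
    """
    if value is None:
        return today if today is not None else datetime.date.today()
    if isinstance(value, datetime.date):
        return value
    return datetime.datetime.strptime(value, "%Y-%m-%d").date()


def test_on_none_returns_today():
    fixed = datetime.date(2017, 9, 30)
    # no wall-clock dependency: the expected value is fixed
    assert make_date(None, today=fixed) == fixed
```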
See the TODO in ParserBase.upload() and Dataset.upload().
Example:
'Uploaded 350 datapoints in 0.12 seconds in 1 attempt(s)'
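A sketch of an upload loop that produces a message in this format. `post` is a hypothetical injected callable standing in for the real upload request, and the default of three attempts is an assumption:

```python
import time


def upload_with_retries(datapoints, post, max_attempts=3):
    """Sketch: call `post(datapoints)` until it returns True, then report.

    `post` is hypothetical, e.g. a wrapper around an HTTP POST to the database API.
    """
    start = time.perf_counter()
    for attempt in range(1, max_attempts + 1):
        if post(datapoints):
            elapsed = time.perf_counter() - start
            return (f"Uploaded {len(datapoints)} datapoints "
                    f"in {elapsed:.2f} seconds in {attempt} attempt(s)")
    raise RuntimeError(f"Upload failed after {max_attempts} attempt(s)")
```

Injecting `post` also makes the message testable without a database, for example with a `post` that fails once and then succeeds.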
from parsers.runner import CBR_USD
# this will pass
assert CBR_USD('2017-01-01').upload()
# this will fail
assert CBR_USD().upload()
gen = Dataset.yield_dicts(start='2017-09-01')
Reuse a small part of the code from the previous tests and delete the rest:
https://github.com/mini-kep/parsers/blob/master/parsers/tests/test_runner.py#L121-L276
@kkravchuk, please note it should be based on https://github.com/epogrebnyak/data-fx-oil/blob/master/brent.py, which is newer than https://github.com/epogrebnyak/data-fx-oil/blob/master/eia.py.
Please do not use eia.py.
There is a format to deliver information about parser development status for kep and cbr-usd:
I want to make a configuration dictionary with these parameters, like
desc = dict(name='rosstat-kep',
freq='m',
text='Parse sections of '
'Short-term Economic Indicators (KEP) '
'monthly Rosstat publication')
The converter function should map this dictionary to a markdown string.
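A minimal sketch of such a converter, for example as the body of the to_markdown.py module mentioned earlier; the two-column table layout is an assumption:

```python
def to_markdown(desc):
    """Map a parser description dict to a two-column markdown table."""
    lines = ["| Parameter | Value |", "|---|---|"]
    for key, value in desc.items():
        lines.append(f"| {key} | {value} |")
    return "\n".join(lines)


desc = dict(name='rosstat-kep',
            freq='m',
            text='Parse sections of '
                 'Short-term Economic Indicators (KEP) '
                 'monthly Rosstat publication')
print(to_markdown(desc))
# | Parameter | Value |
# |---|---|
# | name | rosstat-kep |
# | freq | m |
# | text | Parse sections of Short-term Economic Indicators (KEP) monthly Rosstat publication |
```

Values containing `|` would break the table, so they would need escaping if they can occur.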
RosstatKEP_* datasets have quite a lot of float numbers, most with a long representation. Is there a common method to round/beautify values like 349.89999999999998?
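A sketch of one common approach: Python's built-in `round()` returns the float closest to the rounded decimal value, and since Python 3.1 `repr()` prints the shortest string that round-trips, so a rounded value displays as 349.9. The helper name `beautify` and the default of two decimals are assumptions:

```python
def beautify(value, ndigits=2):
    """Round floats to `ndigits` decimals; pass other values (None, int, str) through."""
    if isinstance(value, float):
        return round(value, ndigits)
    return value


print(beautify(349.89999999999998))      # 349.9
print(f"{349.89999999999998:.2f}")       # fixed-width display: 349.90
```

Rounding only when displaying (with a format spec) keeps the stored data at full precision; rounding before upload is simpler but discards digits.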
Lines 73 to 81 in 9316e18
Code tested: https://github.com/mini-kep/parsers/blob/master/parsers/getter/brent.py
Note that brent.py has dependency injection in def yield_brent_dicts(download_func=fetch):
This should be used in designing the test for yield_brent_dicts():
- need a mock function to test yield_brent_dicts().
The same should be done in the other getters.
Once done, delete one datapoint in #52.
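A sketch of such a test. The real parsing code of brent.py is not reproduced here, so `yield_brent_dicts` below is a simplified stand-in that assumes `download_func()` returns (date, price) pairs; the point is only that the test passes a mock instead of `fetch`, so no network access happens. The prices are made-up test values:

```python
def yield_brent_dicts(download_func):
    """Stand-in for brent.yield_brent_dicts(download_func=fetch).

    Assumes download_func() returns (date_string, price) pairs; the real
    fetch() output format in brent.py may differ.
    """
    for date, price in download_func():
        yield dict(date=date, freq='d', name='BRENT', value=price)


def fake_fetch():
    """Mock download: fixed made-up rows, no network access."""
    return [('2017-09-01', 50.0), ('2017-09-04', 51.5)]


def test_yield_brent_dicts_with_mock():
    result = list(yield_brent_dicts(download_func=fake_fetch))
    assert len(result) == 2
    assert result[0] == dict(date='2017-09-01', freq='d', name='BRENT', value=50.0)
```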
Upload scenarios cover a batch of jobs that the uploader should do. So far they are:
<entry><id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(6832)</id><title type="text"/><updated>2017-11-08T21:11:43Z</updated><author><name/></author><link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(6832)"/><category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme"/><content type="application/xml"><m:properties><d:Id m:type="Edm.Int32">6832</d:Id><d:NEW_DATE m:type="Edm.DateTime">2017-04-14T00:00:00</d:NEW_DATE><d:BC_1MONTH m:type="Edm.Double">0</d:BC_1MONTH><d:BC_3MONTH m:type="Edm.Double">0</d:BC_3MONTH><d:BC_6MONTH m:type="Edm.Double">0</d:BC_6MONTH><d:BC_1YEAR m:type="Edm.Double">0</d:BC_1YEAR><d:BC_2YEAR m:type="Edm.Double">0</d:BC_2YEAR><d:BC_3YEAR m:type="Edm.Double">0</d:BC_3YEAR><d:BC_5YEAR m:type="Edm.Double">0</d:BC_5YEAR><d:BC_7YEAR m:type="Edm.Double">0</d:BC_7YEAR><d:BC_10YEAR m:type="Edm.Double">0</d:BC_10YEAR><d:BC_20YEAR m:type="Edm.Double">0</d:BC_20YEAR><d:BC_30YEAR m:type="Edm.Double">0</d:BC_30YEAR><d:BC_30YEARDISPLAY m:type="Edm.Double">0</d:BC_30YEARDISPLAY></m:properties></content></entry>
This day should be empty.
Currently there is a class in runner.py that handles input parameters and uploading, and there are getter functions/classes in the getters submodule. Probably they can be merged into just one class.
We can add thin wrappers for string date handling and uploader functionality.
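A sketch of what one merged class could look like. All names here (`to_date`, `Parser`, `observation_start`, `get_datapoints`, `post_func`, `DummyParser`) are hypothetical; the getter becomes a method, and string dates and uploading are thin wrappers around it:

```python
import datetime


def to_date(value, default):
    """Thin wrapper for string dates: accepts 'YYYY-MM-DD', a date, or None."""
    if value is None:
        return default
    if isinstance(value, datetime.date):
        return value
    return datetime.datetime.strptime(value, "%Y-%m-%d").date()


class Parser:
    """Hypothetical single class replacing the runner.py class plus a getter."""

    observation_start = datetime.date(1999, 1, 1)  # hypothetical default

    def __init__(self, start=None, end=None):
        self.start = to_date(start, self.observation_start)
        self.end = to_date(end, datetime.date.today())

    def get_datapoints(self):
        """The former getter: subclasses download and parse data here."""
        raise NotImplementedError

    def upload(self, post_func):
        """Thin uploader wrapper; post_func is injected (e.g. an HTTP client call)."""
        return post_func(list(self.get_datapoints()))


class DummyParser(Parser):
    """Example subclass returning one made-up datapoint."""

    def get_datapoints(self):
        yield dict(date=self.start.isoformat(), freq='d', name='DUMMY', value=1.0)
```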
Line 105 in acd2a91
<entry xmlns="http://www.w3.org/2005/Atom">
<id>http://data.treasury.gov/Feed.svc/DailyTreasuryYieldCurveRateData(3458)</id>
<title type="text"></title><updated>2017-11-07T14:01:54Z</updated>
<author><name /></author>
<link rel="edit" title="DailyTreasuryYieldCurveRateDatum" href="DailyTreasuryYieldCurveRateData(3458)" />
<category term="TreasuryDataWarehouseModel.DailyTreasuryYieldCurveRateDatum" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" /><content type="application/xml"><m:properties xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata"><d:Id m:type="Edm.Int32" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">3458</d:Id><d:NEW_DATE m:type="Edm.DateTime" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">2010-10-11T00:00:00</d:NEW_DATE><d:BC_1MONTH m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_3MONTH m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_6MONTH m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_1YEAR m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_2YEAR m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_3YEAR m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_5YEAR m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_7YEAR m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_10YEAR m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_20YEAR m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_30YEAR m:type="Edm.Double" m:null="true" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" /><d:BC_30YEARDISPLAY m:type="Edm.Double" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices">0</d:BC_30YEARDISPLAY></m:properties></content></entry>
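The two Treasury feed entries above show two kinds of empty days: 2017-04-14 has 0 for every tenor, and 2010-10-11 has `m:null="true"` for every tenor with only BC_30YEARDISPLAY set to 0. A sketch of parsing one entry with the standard library and returning None for such days. It assumes BC_30YEARDISPLAY can be ignored and that 0 means "no data"; it also needs the `m:` and `d:` namespace declarations, which the first entry as quoted lacks (in the full feed they are presumably declared on an enclosing element):

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared in the 2010-10-11 entry above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "m": "http://schemas.microsoft.com/ado/2007/08/dataservices/metadata",
    "d": "http://schemas.microsoft.com/ado/2007/08/dataservices",
}
M_NULL = "{%s}null" % NS["m"]


def parse_entry(xml_text):
    """Return (date, {tenor: value}) for one feed entry, or None for an empty day.

    Assumptions: BC_30YEARDISPLAY is skipped, and 0 is treated as missing data.
    """
    props = ET.fromstring(xml_text).find("atom:content/m:properties", NS)
    date = props.find("d:NEW_DATE", NS).text[:10]
    values = {}
    for el in props:
        tenor = el.tag.split("}", 1)[1]  # strip the namespace URI
        if not tenor.startswith("BC_") or tenor == "BC_30YEARDISPLAY":
            continue
        if el.get(M_NULL) == "true" or el.text is None:
            continue
        if float(el.text) != 0:
            values[tenor] = float(el.text)
    return (date, values) if values else None
```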
Lines 3 to 4 in d6e9f17
See test coverage in https://codecov.io/gh/mini-kep/parsers/tree/master/parsers/getter
Must have the same logic for adding functions to the ParserBase class; see the repo README.md.
Following issue #1: we have hand-made summary tables in markdown about parser information:
- kep: https://github.com/mini-kep/parser-rosstat-kep#parcer-summary
- cbr-usd: https://github.com/ru-stat/parser-cbr-usd/blob/master/README.md
In <definitons.py> we have a way to keep parser information in a class and show it to the user as markdown. Some of the summary information about the parser is used as parameters for its invocation.
This issue will result in defining what information about the parsers is needed to run them and to present their summary to the user.
The current attributes are (from here):
class RosstatKEP(Parser):
name = 'rosstat-kep'
does_what = 'Parse sections of KEP Rosstat publication'
freqs = 'aqm'
all_varnames = ['CPI_rog', 'RUR_EUR_eop']
start_date = make_date('1999-01-31')
The issue should result in new parser summaries for all parsers listed here.
The information we specify for each parser class is used to a) bound the call to the parser (e.g. available variables, frequencies, date limits) and b) make the description more understandable.
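A sketch of how the attributes listed above could serve both purposes: `check_call()` for a) and `summary()` for b). The method names and messages are hypothetical, and `start_date` uses `datetime.date` directly instead of `make_date()`:

```python
import datetime


class Parser:
    """Base class: subclasses declare what they can deliver (attribute names from the issue)."""
    name = ''
    does_what = ''
    freqs = ''
    all_varnames = []
    start_date = datetime.date.min

    @classmethod
    def check_call(cls, freq, varname, start):
        """a) Bound a call to what the parser can actually serve."""
        if freq not in set(cls.freqs):  # set() so that '' or 'aq' are rejected
            raise ValueError(f"{cls.name}: frequency {freq!r} not in {cls.freqs!r}")
        if varname not in cls.all_varnames:
            raise ValueError(f"{cls.name}: unknown variable {varname!r}")
        if start < cls.start_date:
            raise ValueError(f"{cls.name}: no data before {cls.start_date}")

    @classmethod
    def summary(cls):
        """b) Describe the parser to the user as markdown."""
        return "\n".join([
            f"**{cls.name}**: {cls.does_what}",
            "",
            f"- Frequencies: {', '.join(cls.freqs)}",
            f"- Variables: {', '.join(cls.all_varnames)}",
            f"- Start date: {cls.start_date.isoformat()}",
        ])


class RosstatKEP(Parser):
    name = 'rosstat-kep'
    does_what = 'Parse sections of KEP Rosstat publication'
    freqs = 'aqm'
    all_varnames = ['CPI_rog', 'RUR_EUR_eop']
    start_date = datetime.date(1999, 1, 31)
```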
Make sure https://travis-ci.org/mini-kep/parser-template works
Finishing tests with FIXME
Housekeeping:
Bigger issues:
{'BRENT': dict(ru='Цена нефти Brent', en='Brent oil price')}
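The dictionary above maps a variable name to its labels in several languages. A minimal sketch of a lookup helper; the names `DESCRIPTIONS` and `describe()` and the fallback to the variable name are assumptions:

```python
# Variable name -> labels per language (structure copied from the issue).
DESCRIPTIONS = {'BRENT': dict(ru='Цена нефти Brent', en='Brent oil price')}


def describe(varname, lang='en'):
    """Hypothetical helper: label for a variable, falling back to the name itself."""
    return DESCRIPTIONS.get(varname, {}).get(lang, varname)
```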
There are no tests for:
Ideally, we need to keep tests simple and readable, and focused on real rather than imaginary risks in the program.
Also, it is best to keep in mind what tests are:
Is everyone comfortable with these ideas/definitions, @Andres-Unt, @muroslav2909, @JaroVojtek?
Access tokens must be kept outside the repo.
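A minimal sketch of reading a token from an environment variable instead of a file in the repo; the variable name `API_TOKEN` is hypothetical. On Heroku such variables are set with `heroku config:set`:

```python
import os


def get_token(var_name="API_TOKEN"):
    """Read an access token from the environment, never from a file in the repo.

    API_TOKEN is a hypothetical variable name.
    """
    token = os.environ.get(var_name)
    if not token:
        raise RuntimeError(f"Environment variable {var_name} is not set")
    return token
```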