
db's People

Contributors

epogrebnyak, eskarimov, lynxrv21, mwangikinuthia, perevedko, sppps, varnie, vasmitr, wintercomes, zarak


db's Issues

can we prevent CSV output to be rendered as html in browser?

When looking at the source of the CSV output I can see the original CSV file easily:

view-source:https://minikep-db.herokuapp.com/api/datapoints?name=USDRUR_CB&freq=d&start_date=2017-08-01&end_date=2017-10-01

...but in the browser it gets rendered as HTML. Can we prevent HTML rendering in the browser? What controls it? Some headers?
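Browser rendering is controlled by the Content-Type response header. A minimal Flask sketch (the route name is hypothetical): serving the body as text/csv instead of text/html stops the browser from interpreting it as a page.

```python
from flask import Flask, Response

app = Flask(__name__)

# Hypothetical route for illustration only.
@app.route('/api/datapoints.csv')
def datapoints_csv():
    csv_text = ",GDP_yoy\n2013-12-31,101.3\n"
    # text/csv (or text/plain) prevents the browser from rendering HTML
    return Response(csv_text, mimetype='text/csv')

client = app.test_client()
resp = client.get('/api/datapoints.csv')
```

With `mimetype='text/csv'` most browsers either show the raw text or offer a download, depending on their CSV handling.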

commented tests must relate to different functions in views.py

db/tests/test_views.py

Lines 189 to 217 in 3cec7d0

#TODO: these tests should relate to something else not covered in query.py
#class TestGetResponseDatapoints(TestCaseBase):
#
#    data_dicts = [{"date": "1999-01-31", "freq": "m", "name": "CPI_ALCOHOL_rog", "value": 109.7},
#                  {"date": "1999-01-31", "freq": "m", "name": "CPI_FOOD_rog", "value": 110.4},
#                  {"date": "1999-01-31", "freq": "m", "name": "CPI_NONFOOD_rog", "value": 106.2}]
#
#    def _make_sample_datapoints_list(self):
#        return [Datapoint(**params) for params in self.data_dicts]
#
#    def test_json_serialising_is_valid(self):
#        data = self._make_sample_datapoints_list()
#        response = get_datapoints_response(data, 'json')
#        parsed_json = json.loads(response.data)
#        self.assertEqual(self.data_dicts, parsed_json)
#
#    def test_csv_serialising_is_valid(self):
#        data = self._make_sample_datapoints_list()
#        response = get_datapoints_response(data, 'csv')
#        csv_string = str(response.data, 'utf-8')
#        self.assertEqual(
#            ',CPI_ALCOHOL_rog\n1999-01-31,109.7\n1999-01-31,110.4\n1999-01-31,106.2\n', csv_string)
#
#    def test_invalid_output_format_should_fail(self):
#        data = self._make_sample_datapoints_list()
#        with self.assertRaises(CustomError400):
#            get_datapoints_response(data, 'html')

buggy TokenHelper.__as_date

Hello everybody.

This link http://mini-kep.herokuapp.com/ru/series/GDP/a/yoy/12311111111111111111
produces:

Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

because TokenHelper.__as_date method is buggy.

Here's the relevant stacktrace:

127.0.0.1 - - [24/Oct/2017 16:26:24] "GET /ru/series/GDP/a/yoy/12311111111111111111 HTTP/1.1" 500 -
Traceback (most recent call last):
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/app.py", line 1997, in __call__
    return self.wsgi_app(environ, start_response)
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/app.py", line 1985, in wsgi_app
    response = self.handle_exception(e)
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/app.py", line 1540, in handle_exception
    reraise(exc_type, exc_value, tb)
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/home/varnie/thrash/my-virt-environments/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/varnie/thrash/projects/db/db/custom_api/views.py", line 22, in time_series_api_interface
    return custom_api.CustomGET(domain, varname, freq, inner_path).get_csv_response()
  File "/home/varnie/thrash/projects/db/db/custom_api/custom_api.py", line 180, in __init__
    ip = InnerPath(inner_path)
  File "/home/varnie/thrash/projects/db/db/custom_api/custom_api.py", line 141, in __init__
    self.dates = helper.get_dates_dict()
  File "/home/varnie/thrash/projects/db/db/custom_api/custom_api.py", line 73, in get_dates_dict
    result['start_date'] = self._as_date(start_year, month=1, day=1)
  File "/home/varnie/thrash/projects/db/db/custom_api/custom_api.py", line 107, in _as_date
    day=day).strftime('%Y-%m-%d')
OverflowError: Python int too large to convert to C long
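One way to fix this is to validate the year before handing it to datetime, which cannot represent years above datetime.MAXYEAR. A sketch of a guarded replacement (the function name is hypothetical; the view layer would map the ValueError to a 400 response instead of a 500):

```python
import datetime

def as_date_safe(year, month, day):
    # Guarded sketch of TokenHelper.__as_date: reject years that
    # datetime cannot represent instead of letting them overflow.
    if not (datetime.MINYEAR <= year <= datetime.MAXYEAR):
        raise ValueError('year out of range: {}'.format(year))
    return datetime.date(year, month, day).strftime('%Y-%m-%d')
```

With this guard, the URL above would raise a catchable ValueError for year 12311111111111111111 rather than an OverflowError deep inside strftime.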

Also, not sure whether it is relevant or not, but the following link:
https://minikep-db.herokuapp.com/api/datapoints?name=BRENT&freq=d&start_date=2017-01-01
also produces:

Internal Server Error

The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.

(I've taken this link from the README.MD: http://joxi.ru/J2b57NMiXJWWLm )

Thank you for your time.

pipeline discussion

This goes from parser 1 to database
====================================
[
    {
        "date": "2014-01-31",
        "freq": "m",
        "varname": "CPI_rog",
        "value": 100.6
    },

    ....

    {
        "date": "2015-12-31",
        "freq": "m",
        "varname": "CPI_rog",
        "value": 100.8
    },
    {
        "date": "2015-12-31",
        "freq": "m",
        "varname": "RUR_EUR_eop",
        "value": 79.7
    }
]

This goes from parser 2 to database
===================================
[
    {
        "date": "2014-03-31",
        "freq": "q",
        "varname": "CPI_rog",
        "value": 102.3
    }
]

User query
==========
{'end': '2015-12',
 'freq': 'm',
 'start': '2014-01',
 'varnames': ['CPI_rog', 'RUR_EUR_eop']}

App response to query: json with epoch timestamps
=================================================
This format is the default input to the user-side reader function pd.read_json():
{'CPI_rog': {'1391126400000': 100.6,
             '1393545600000': 100.7,
             ...
             '1448841600000': 100.8,
             '1451520000000': 100.8},
 'RUR_EUR_eop': {'1391126400000': 48.1,
                 '1393545600000': 49.35,
                 ...
                 '1448841600000': 70.39,
                 '1451520000000': 79.7}}

User's local dataframe
======================
            CPI_rog  RUR_EUR_eop
2014-01-31    100.6        48.10
2014-02-28    100.7        49.35
2014-03-31    101.0        49.05
2014-04-30    100.9        49.51
2014-05-31    100.9        47.27
2014-06-30    100.6        45.83
2014-07-31    100.5        47.90
2014-08-31    100.2        48.63
2014-09-30    100.7        49.95
2014-10-31    100.8        54.64
2014-11-30    101.3        61.41
2014-12-31    102.6        68.34
2015-01-31    103.9        78.11
2015-02-28    102.2        68.69
2015-03-31    101.2        63.37
2015-04-30    100.5        56.81
2015-05-31    100.4        58.01
2015-06-30    100.2        61.52
2015-07-31    100.8        64.65
2015-08-31    100.4        75.05
2015-09-30    100.6        74.58
2015-10-31    100.7        70.75
2015-11-30    100.8        70.39
2015-12-31    100.8        79.70
Identical to source data: True
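The round trip above can be sketched on the user side. This is a two-row slice of the response payload with made-up truncation, not the full dataset; pd.read_json parses the epoch-millisecond keys back into a date index by default:

```python
import io
import pandas as pd

# Hypothetical two-row slice of the app response above (epoch-ms keys).
payload = """{"CPI_rog": {"1391126400000": 100.6, "1448841600000": 100.8},
 "RUR_EUR_eop": {"1391126400000": 48.1, "1448841600000": 70.39}}"""

# read_json with the default orient='columns' rebuilds the dataframe;
# convert_axes (default True) turns the epoch keys into datetimes
df = pd.read_json(io.StringIO(payload))
```

The resulting frame has one column per variable name, matching the "User's local dataframe" listing above.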

unclear test - maybe parametrise?

db/tests/test_views.py

Lines 184 to 197 in 9108da4

# FIXME: ------------------------------------------------------------------
def test_get_names_on_random_freq_returns_sorted_list_of_names_for_given_random_freq(self):
    random_freq = self.query_random_freq_from_test_data()
    response = self.query_names_for_freq(freq=random_freq)
    result = json.loads(response.get_data().decode('utf-8'))
    # expected result
    expected_result = []
    for row in read_test_data():
        if row['freq'] == random_freq and row['name'] not in expected_result:
            expected_result.append(row['name'])
    expected_result = sorted(expected_result)
    # check
    assert result == expected_result
# ------------------------------------------------------------------------
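The expected-result loop could be factored into a helper, after which the test can be parametrized over concrete freq values with pytest.mark.parametrize instead of picking a random one. A sketch of the helper (sample rows are made up):

```python
def expected_names(rows, freq):
    # unique names for a frequency, sorted: equivalent to the loop in the test
    return sorted({row['name'] for row in rows if row['freq'] == freq})

sample = [dict(freq='m', name='CPI_rog'), dict(freq='a', name='GDP_yoy'),
          dict(freq='m', name='CPI_rog'), dict(freq='m', name='BRENT')]
```

Parametrizing over each frequency in the test data makes the test deterministic and its name shorter.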

change tests for 'custom_api' to 'decomposer'

_________________ ERROR collecting tests/test_custom_api.py __________________
ImportError while importing test module 'C:\Users\Евгений\Documents\GitHub\mini-kep-db\tests\test_custom_api.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests\test_custom_api.py:2: in <module>
    import db.custom_api.custom_api as custom_api
E   ModuleNotFoundError: No module named 'db.custom_api.custom_api'

new api/dataframe endpoint

Based on the functionality from #37 we can develop the following endpoints:

api/dataframe?freq=a
api/dataframe?freq=q
api/dataframe?freq=m
api/dataframe?freq=d

Each endpoint should provide a pandas-readable CSV with all variables at that frequency.

It should also accept name, start_date and end_date parameters.
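"Pandas-readable" can be pinned down with a client-side sketch. The CSV layout below (date index column plus one column per variable) is assumed, mirroring the api/frame example later in this page:

```python
import io
import pandas as pd

# Hypothetical payload in the shape api/dataframe?freq=a should return.
csv_text = ",GDP_yoy,CPI_rog\n2015-12-31,97.2,112.9\n2016-12-31,99.8,105.4\n"

# index_col=0 takes the unnamed first column as the index;
# parse_dates=True turns it into a DatetimeIndex
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
```

In production the io.StringIO would be replaced by the endpoint URL passed straight to pd.read_csv.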

API extensions

  • api/datapoints/varnames/{freq} lists all variable names available at frequency {freq} (assumed to be one of aqmwd)
  • variable descriptions, e.g. GDP -> Gross domestic product
  • use unit descriptions
  • the varname splitter goes to a helper function; we will need to use it locally

unclear test

db/tests/test_queries.py

Lines 54 to 60 in 346ae2c

def test_upsert_updates_value_for_existing_row(self):
    upsert(self.dp1_dict)
    dp1_updated_value = self.dp1_dict['value'] + 4.56
    dp1_dict_with_new_value = {k: v if k != "value" else dp1_updated_value
                               for k, v in self.dp1_dict.items()}
    upsert(dp1_dict_with_new_value)
    datapoint = select_datapoints(**self.dp1_search_param).first()
    self.assertEqual(datapoint.value, dp1_updated_value)

arrange variables in sections

For the purposes of listing variables in groups and easier browsing we need a grouping dictionary that lists variable nameheads.

Example:

{
    'GDP components': ['GDP', 'INVESTMENT'],
    'Prices': ['CPI', 'CPI_FOOD', 'CPI_NONFOOD', 'CPI_ALCOHOL'],
    'Foreign trade': ['EXPORT_GOODS', 'IMPORT_GOODS'],
    'Exchange rates': ['USDRUR_CB'],
    'Interest rates': []
}

Based on the nameheads and the variable name splitting method one can derive the list of actual variable names like GDP_yoy.
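The derivation step can be sketched as suffix expansion. The unit lists below are illustrative; in the real code they would come from the variable name splitting method:

```python
SECTIONS = {'GDP components': ['GDP', 'INVESTMENT']}
# assumed unit suffixes per namehead, for illustration only
UNITS = {'GDP': ['yoy'], 'INVESTMENT': ['yoy', 'rog']}

def expand(namehead):
    # derive actual variable names like GDP_yoy from a namehead
    return ['{}_{}'.format(namehead, unit) for unit in UNITS.get(namehead, [])]

names = [name for head in SECTIONS['GDP components'] for name in expand(head)]
```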

change datapoints parameter handler class

Original class:

db/db/api/utils.py

Lines 51 to 143 in 6536d1b

class DatapointParameters:
    """Parameter handler for api/datapoints endpoint."""

    def __init__(self, args):
        self.args = args
        self.name = self.get_name()
        if not self.name:
            raise CustomError400("<name> parameter is required")
        self.freq = self.get_freq()
        if not self.freq:
            raise CustomError400("<freq> parameter is required")

    def get_freq(self):
        freq = self.args.get('freq')
        self.validate_freq_exist(freq)
        return freq

    def get_name(self):
        freq = self.get_freq()
        name = self.args.get('name')
        self.validate_name_exist_for_given_freq(freq, name)
        return name

    def get_start(self):
        start_dt = self.get_dt('start_date')
        if start_dt:
            self.validate_start_is_not_in_future(start_dt)
        return start_dt

    def get_end(self):
        end_dt = self.get_dt('end_date')
        start_dt = self.get_start()
        if start_dt and end_dt:
            self.validate_end_date_after_start_date(start_dt, end_dt)
        return end_dt

    def get_dt(self, key: str):
        dt = None
        date_str = self.args.get(key)
        if date_str:
            dt = to_date(date_str)
        return dt

    def _get_boundary(self, direction):
        query = queries.get_boundary_date(self.freq, self.name, direction)
        return date_as_str(query)

    def get_min_date(self):
        return self._get_boundary(direction='start')

    def get_max_date(self):
        return self._get_boundary(direction='end')

    def get(self):
        """Return query parameters as dictionary."""
        return dict(name=self.name,
                    freq=self.freq,
                    start_date=self.get_start(),
                    end_date=self.get_end())

    @staticmethod
    def validate_freq_exist(freq):
        allowed = list(queries.select_unique_frequencies())
        if freq in allowed:
            return True
        else:
            raise CustomError400(message=f'Invalid frequency <{freq}>',
                                 payload={'allowed': allowed})

    @staticmethod
    def validate_name_exist_for_given_freq(freq, name):
        possible_names = queries.possible_names_values(freq)
        if name in possible_names:
            return True
        else:
            msg = f'No such name <{name}> for <{freq}> frequency.'
            raise CustomError400(message=msg,
                                 payload={"allowed": possible_names})

    @staticmethod
    def validate_start_is_not_in_future(start_date):
        current_date = datetime.date(datetime.utcnow())
        # TODO: test on date = today must pass
        if start_date > current_date:
            raise CustomError400('Start date cannot be in future')
        else:
            return True

    @staticmethod
    def validate_end_date_after_start_date(start_date, end_date):
        if end_date < start_date:
            raise CustomError400('End date must be after start date')
        else:
            return True

duplicate code for base testcase

db/tests/test_views.py

Lines 49 to 87 in 9108da4

class TestCaseBase(unittest.TestCase):
    """Base class for testing flask application.

    Use to compose setUp method:
        self._prepare_app()
        self._mount_blueprint()
        self._prepare_db()
        self._start_client()
    """

    def _prepare_db(self):
        data = read_test_data()
        for datapoint in data:
            datapoint['date'] = utils.to_date(datapoint['date'])
        fsa_db.session.bulk_insert_mappings(Datapoint, data)

    def _prepare_app(self):
        self.app = make_app()
        self.app_context = self.app.app_context()
        self.app_context.push()
        fsa_db.init_app(app=self.app)
        fsa_db.create_all()

    def _mount_blueprint(self):
        self.app.register_blueprint(api_module)

    def _start_client(self):
        self.client = self.app.test_client()

    def setUp(self):
        self._prepare_app()

    def tearDown(self):
        fsa_db.session.remove()
        fsa_db.drop_all()
        self.app_context.pop()

    def test_app_exists(self):
        self.assertTrue(current_app is not None)

db/tests/test_basic.py

Lines 54 to 85 in 9108da4

class TestCaseBase(unittest.TestCase):
    def prepare_db(self):
        data = read_test_data()
        for datapoint in data:
            datapoint['date'] = utils.to_date(datapoint['date'])
        fsa_db.session.bulk_insert_mappings(Datapoint, data)

    def prepare_app(self):
        self.app = make_app()
        self.app_context = self.app.app_context()
        self.app_context.push()
        self.client = self.app.test_client()
        fsa_db.init_app(app=self.app)
        fsa_db.create_all()

    def mount_blueprint(self):
        self.app.register_blueprint(api_module)
        self.app.register_blueprint(custom_api_bp)

    def start_client(self):
        self.client = self.app.test_client()

    def setUp(self):
        self.prepare_app()

    def tearDown(self):
        fsa_db.session.remove()
        fsa_db.drop_all()
        self.app_context.pop()

    def test_app_exists(self):
        self.assertTrue(current_app is not None)

select error handling function for views.py

This handler suppresses validation error messages and hurts testing (note it is registered for 400 but responds with 422):

# Return validation errors as JSON
@api.errorhandler(400)
def handle_validation_error(err):
    exc = err.exc
    return jsonify({'errors': exc.messages}), 422

#@api.errorhandler(CustomError400)
#def handle_invalid_usage(error):
#    """
#    Generate a json object of a custom error
#    """
#    response = jsonify(error.to_dict())
#    response.status_code = error.status_code
#    return response

'api/frame' serves csv with many variables

http://minikep-db.herokuapp.com/api/datapoints?freq=a&name=GDP_yoy&start_date=2013-12-31:

,GDP_yoy
2013-12-31,101.3
2014-12-31,100.7
2015-12-31,97.2
2016-12-31,99.8

http://minikep-db.herokuapp.com/api/datapoints?freq=a&name=CPI_rog&start_date=2013-12-31:

,CPI_rog
2013-12-31,106.5
2014-12-31,111.4
2015-12-31,112.9
2016-12-31,105.4

Need to implement:

http://minikep-db.herokuapp.com/api/frame?freq=a&names=GDP_yoy,CPI_rog&start_date=2013-12-31:

,GDP_yoy,CPI_rog
2013-12-31,101.3,106.5
2014-12-31,100.7,111.4
2015-12-31,97.2,112.9
2016-12-31,99.8,105.4

This concatenation is easier when the time index is the same; for daily data this may not be the case.

Todo:

  • pandas code to merge dataframes
  • example for daily data with different dates in time index
  • test based on this pandas code
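A minimal sketch of the first two todo items: two daily series with different time indexes, joined with pd.concat. Names and values are made up for illustration:

```python
import pandas as pd

# Hypothetical daily series as two separate api/datapoints responses
# might yield them; note the indexes only partly overlap.
brent = pd.Series([50.1, 50.5],
                  index=pd.to_datetime(['2017-01-02', '2017-01-03']),
                  name='BRENT')
usd = pd.Series([60.2, 60.4],
                index=pd.to_datetime(['2017-01-03', '2017-01-04']),
                name='USDRUR_CB')

# axis=1 with the default outer join keeps every date from both
# indexes; missing observations become NaN
frame = pd.concat([brent, usd], axis=1)
```

The same frame, written out with frame.to_csv(), is one candidate server-side output format.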

To discuss:

  1. ideas for server-side implementation

    • may use same pandas code, but it may be rather slow
    • may construct csv file
  2. how should the json output behave: provide a data structure readable by pandas into a dataframe,
    or provide a similar simple listing

variable text descriptions

The parsers should be able to emit variable descriptions in the form of a list of dictionaries:

[
    {'BRENT': dict(ru='Цена нефти Brent', en='Brent oil price')},
# ...
]

We need an api/title endpoint with POST/GET methods to store/retrieve this information from the database, plus a new model to store it.

api/title?name=BRENT should return dict(ru='Цена нефти Brent', en='Brent oil price')
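The parser output above can be flattened into a lookup table that a hypothetical api/title handler would query by name. A sketch:

```python
# Parser payload as specified above: a list of one-key dictionaries.
incoming = [
    {'BRENT': dict(ru='Цена нефти Brent', en='Brent oil price')},
]

# flatten to {name: descriptions} for O(1) lookup in the GET handler
titles = {name: text for entry in incoming for name, text in entry.items()}
```

In the real endpoint the flattened mapping would live in the new database model rather than in memory.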

POST on large datasets hangs application

2017-11-08T20:45:23.097873+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=POST path="/api/incoming" host=minikep-db.herokuapp.com request_id=b554c78d-d8b8-4461-af76-f2ee4e1c68dc fwd="5.227.7.58" dyno=web.1 connect=0ms service=30490ms status=503 bytes=0 protocol=https
2017-11-08T20:45:25.292033+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=GET path="/api/info?freq=d&name=UST_30YEARDISPLAY" host=minikep-db.herokuapp.com request_id=ebbdfee2-0164-43d6-84a6-bb93aa7e3e9a fwd="54.74.71.204" dyno=web.1 connect=0ms service=30000ms status=503 bytes=0 protocol=http
2017-11-08T20:45:25.285737+00:00 heroku[router]: at=error code=H12 desc="Request timeout" method=GET path="/api/datapoints?freq=d&name=UST_30YEARDISPLAY&format=json" host=minikep-db.herokuapp.com request_id=f908e353-4c58-4afa-af3b-f1df9f4f52e0 fwd="54.74.71.204" dyno=web.1 connect=0ms service=30001ms status=503 bytes=0 protocol=http

upload tests fail

db/tests/test_views.py

Lines 27 to 54 in 3cec7d0

# FIXME ----------------------------------------------
@pytest.mark.xfail
def test_on_no_auth_token_returns_forbidden_status_error_code(self):
    response = self.client.post('/api/datapoints')
    assert response.status_code in ERROR_CODES

@pytest.mark.xfail
def test_on_new_data_upload_successfull_with_code_200(self):
    _token_dict = dict(API_TOKEN=self.app.config['API_TOKEN'])
    _data = json.dumps(self.test_data)
    response = self.get_response(data=_data, headers=_token_dict)
    assert response.status_code == 200

@pytest.mark.xfail
def test_on_existing_data_upload_successfull_with_code_200(self):
    _token_dict = dict(API_TOKEN=self.app.config['API_TOKEN'])
    _data = json.dumps(self.test_data[0:10])
    response = self.get_response(data=_data, headers=_token_dict)
    assert response.status_code == 200

@pytest.mark.xfail
def test_on_broken_data_upload_returns_error_code(self):
    _token_dict = dict(API_TOKEN=self.app.config['API_TOKEN'])
    response = self.get_response(data="___broken_json_data__", headers=_token_dict)
    assert response.status_code in ERROR_CODES
# -----------------------------------------------------

specification for datapoint db

In this pipeline, we have the following situation:

  • the parser can deliver a list of dicts, where each dict is a datapoint
  • the database should have a POST method at api/incoming to write incoming json to the db
  • the POST operation should have some authentication
  • we simplify the rules for now: all data is upserted (newer data overwrites older)
  • the database has a GET method styled around the datapoint key (variable name, frequency, start date and end date)
  • the GET operation is a public API

@SuperVasya: please extend questions/descriptions.

In this issue we want:

  • a description of db methods (based on the above)
  • a description of incoming/outgoing data
  • a list of options to implement (e.g. flask + sqlalchemy + heroku + postgres)
  • ideally, how to test this (in words/pseudocode)
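The upsert rule above can be pinned down in pseudocode form. This is a toy in-memory model, not a proposed implementation; the datapoint key is (name, freq, date) and newer data overwrites older:

```python
storage = {}  # stand-in for the datapoints table

def upsert(datapoint):
    # the key identifies a unique observation; value is overwritten on conflict
    key = (datapoint['name'], datapoint['freq'], datapoint['date'])
    storage[key] = datapoint['value']

upsert({'name': 'CPI_rog', 'freq': 'm', 'date': '2015-12-31', 'value': 100.6})
upsert({'name': 'CPI_rog', 'freq': 'm', 'date': '2015-12-31', 'value': 100.8})
```

After the second call only one row exists, holding the newer value 100.8.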

tests in database

@SuperVasya - do we need to extend the tests in database, and what is the direction for this?

error: cannot use context manager

def _find_by(session_factory, condition=None):
    # FIXME: why not working with context manager?
    # DetachedInstanceError: Instance <Datapoint at 0x9074ef0> 
    # is not bound to a Session; attribute refresh operation cannot proceed
    with Session(session_factory) as session:
        query = session.query(Datapoint)
        if condition is not None:
            return query.filter_by(**condition).all()
        else:
            return query.all()    
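One common cause of DetachedInstanceError is that commit expires loaded attributes, so the instances cannot refresh them once the session context closes. Constructing the Session with expire_on_commit=False is one possible fix. A self-contained sketch against in-memory SQLite, with a minimal stand-in model (not the project's real Datapoint):

```python
from sqlalchemy import Column, Float, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Datapoint(Base):
    # minimal stand-in for the real model, illustration only
    __tablename__ = 'datapoint'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    value = Column(Float)

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Datapoint(name='BRENT', value=50.1))
    session.commit()

# expire_on_commit=False keeps attribute values loaded, so the
# returned instances stay readable after the session context exits
with Session(engine, expire_on_commit=False) as session:
    rows = session.query(Datapoint).filter_by(name='BRENT').all()
    session.commit()
```

Without expire_on_commit=False, the commit inside the block would expire the attributes and reading rows[0].value afterwards would raise the DetachedInstanceError from the FIXME.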

text descriptions of units of measurement

{
    'rog': dict(ru='% к пред. периоду', en='% change to previous period'),
    'yoy': dict(ru='% год к году', en='% change to 12 months earlier')
}

The dictionary below can be refactored into the one above.

UNIT_NAMES = {'bln_rub': 'млрд.руб.',
              'bln_usd': 'млрд.долл.',
              'gdp_percent': '% ВВП',
              'mln_rub': 'млн.руб.',
              'rub': 'руб.',
              'rog': '% к пред. периоду',
              'yoy': '% год к году',
              'ytd': 'период с начала года',
              'pct': '%',
              'bln_tkm': 'млрд. тонно-км'}
