Comments (9)
TODO-DOCS:
- edit module ReStructuredText docstring for better appearance (no hanging lines)
TODO-REFACTOR:
- extract usage of SPEC and UNITS from tables.py - write down how they are called here in issue comments
On separate branch or local file:
- refactor Scope, Definition and Specification classes based on proposal below
- write some asserts where it seems necessary/critical; asserts will become part of tests in the next step
NOT TODO YET:
- re-write existing parsing instruction for SPEC
from parser-rosstat-kep.
Refactoring proposal: we have too many append methods in Definition/Scope, Scopes are added to Specification. Better logic is (edited):
- init Scope
- append more start and end lines to Scope if necessary
- init Definition with Scope and reader
- append Indicators to Definition using append()
- make Specification a collection of Definitions (not Scopes)
Tentative call:
# change 1 - init of empty Definition (not too good from OOP
# perspective, but we also consider ease of use for writing a definition
# and a logic that Specification is collection of Definitions)
main = Definition(scope=None, reader=None)
main.append(varname="GDP",
            text=["Oбъем ВВП",
                  "Индекс физического объема произведенного ВВП, в %",
                  "Валовой внутренний продукт"],
            required_units=["bln_rub", "yoy"],
            desc="Валовый внутренний продукт (ВВП)",
            sample="1999 4823 901 1102 1373 1447")
SPEC = Specification(main)
sc = Scope(start="3.5. Индекс потребительских цен",
           end="4. Социальная сфера")
# change 2 - method name
sc.append(start="4.5. Индекс потребительских цен",
          end="5. Социальная сфера")
# change 3 - init and composition
d = Definition(sc, reader=None)
d.append("CPI",
         text="Индекс потребительских цен",
         required_units="rog",
         desc="Индекс потребительских цен (ИПЦ)")
d.append("CPI_NONFOOD",
         text=["непродовольственные товары",
               "непродовольст- венные товары"],
         required_units="rog",
         desc="ИПЦ (непродтовары)")
# change 4 - some methods will be deprecated
# change 5 - other changes needed?
SPEC.append(d)
# change 6 - is access from outside unaffected?
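To check that the tentative call hangs together, here is a minimal sketch of classes that would accept it. It is not the actual spec.py code: the `as_list` helper, the `bounds` and `indicators` attributes and the dict used to store an indicator are assumptions made for illustration. The `headers` property and `required()` method are modelled on how tables.py seems to use them (`pdef.headers`, `spec.required()`), and Specification is assumed to collect Definitions, as the proposal suggests.

```python
def as_list(x):
    """Accept a single string or a list of strings (hypothetical helper)."""
    return [x] if isinstance(x, str) else list(x)


class Scope:
    """Pairs of start and end lines that delimit a segment of the CSV file."""

    def __init__(self, start, end):
        self.bounds = []
        self.append(start, end)

    def append(self, start, end):
        # several pairs are allowed, section headers may differ between releases
        self.bounds.append((start, end))


class Definition:
    """Indicators to look for, optionally limited to a Scope."""

    def __init__(self, scope=None, reader=None):
        self.scope = scope
        self.reader = reader
        self.indicators = []

    def append(self, varname, text, required_units, desc=None, sample=None):
        self.indicators.append(dict(varname=varname,
                                    text=as_list(text),
                                    required_units=as_list(required_units),
                                    desc=desc,
                                    sample=sample))

    @property
    def headers(self):
        # maps header text to variable name
        return {t: ind["varname"]
                for ind in self.indicators
                for t in ind["text"]}

    def required(self):
        return [(ind["varname"], unit)
                for ind in self.indicators
                for unit in ind["required_units"]]


class Specification:
    """Main Definition plus a collection of scoped Definitions."""

    def __init__(self, main):
        self.main = main
        self.definitions = []

    def append(self, definition):
        self.definitions.append(definition)

    def required(self):
        result = []
        for definition in [self.main] + self.definitions:
            result.extend(definition.required())
        return result


main = Definition()
main.append("GDP", text="Валовой внутренний продукт",
            required_units=["bln_rub", "yoy"])
sc = Scope(start="3.5. Индекс потребительских цен", end="4. Социальная сфера")
sc.append(start="4.5. Индекс потребительских цен", end="5. Социальная сфера")
d = Definition(sc)
d.append("CPI", text="Индекс потребительских цен", required_units="rog")
spec = Specification(main)
spec.append(d)
```

With this layout a Scope never needs to know about indicators, and Specification only iterates over Definitions, which is the main point of the proposal.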
Note(EP): deprecated; we do not need a global default for units_mapper_dict here.
UNITS usage in rows.py
Import:
import kep.spec as spec
In:
class Row:
As a default parameter:
    def get_unit(self, units_mapper_dict=spec.UNITS):
        for k in units_mapper_dict.keys():
            if k in self.name:
                return units_mapper_dict[k]
        return False
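A minimal sketch of the same lookup without the global default, so the caller always passes the mapping explicitly. The stripped-down `Row` class and the contents of `units` are hypothetical and only illustrate the substring match; the real UNITS mapping lives in kep.spec.

```python
class Row:
    """Stripped-down stand-in for the Row class, for illustration only."""

    def __init__(self, name):
        self.name = name

    def get_unit(self, units_mapper_dict):
        """Return the unit whose key occurs in the row name, or False."""
        for key, unit in units_mapper_dict.items():
            if key in self.name:
                return unit
        return False


# hypothetical mapping, not the actual kep.spec.UNITS
units = {"млрд.рублей": "bln_rub",
         "в % к предыдущему периоду": "rog"}

header = Row("Объем ВВП, млрд.рублей")
data = Row("1999")
```

Here `header.get_unit(units)` finds a unit, while `data.get_unit(units)` returns False because no key occurs in the row name.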
UNITS usage in tables.py
Import:
from kep.spec import UNITS
In:
class Tables:
    def __init__(self, _rows, spec=SPEC, units=UNITS):
        ...
        self.units = units
        ...
Later used in yield_tables as self.units:
    def yield_tables(self):
        for csv_segment, pdef in self.to_parse:
            for t in self.extract_tables(csv_segment, pdef, self.units):
                yield t
Passed to extract_tables as units_dict:
    @staticmethod
    def extract_tables(csv_segment, pdef, units_dict):
        # yield tables from csv_segment
        tables = split_to_tables(csv_segment)
        # parse tables to obtain labels
        varnames_dict = pdef.headers
        tables = [t.set_label(varnames_dict, units_dict) for t in tables]
        ...
In:
class Table:
EP: method parse is deprecated (set_label and set_splitter are used instead).
In method parse as units_dict; however, I didn't find any call to it:
    def parse(self, varnames_dict, units_dict, funcname):
        self.set_label(varnames_dict, units_dict)
        self.set_splitter(funcname)
        return self
In method set_label as units_dict:
    def set_label(self, varnames_dict, units_dict):
        for row in self.headers:
            varname = row.get_varname(varnames_dict)
            if varname:
                self.varname = varname
                self.lines[row.name] = self.KNOWN
            unit = row.get_unit(units_dict)
            if unit:
                self.unit = unit
                self.lines[row.name] = self.KNOWN
        return self
SPEC usage in tables.py
Import:
from kep.spec import SPEC
In:
class Tables:
    ...
    def __init__(self, _rows, spec=SPEC, units=UNITS):
        self.rowstack = RowStack(_rows)
        self.spec = spec
        self.units = units
        self.required = [make_label(varname, unit)
                         for varname, unit in spec.required()]
        self.make_queue()

    def make_queue(self):
        ...
        self.to_parse = []
        for scope in self.spec.scopes:
            # find segment limits
            start, end = scope.get_bounds(self.rowstack.rows)
            # pop csv segment
            csv_segment = self.rowstack.pop(start, end)
            # get current parsing definition
            pdef = scope.get_parsing_definition()
            self.to_parse.append([csv_segment, pdef])
        csv_segment = self.rowstack.remaining_rows()
        pdef = self.spec.get_main_parsing_definition()
        self.to_parse.append([csv_segment, pdef])
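The queue logic above can be sketched on plain strings: each scope cuts its segment out of the rows, and whatever is left goes to the main definition. This is an illustration only. The functions `find_index` and `make_queue` below, the prefix matching, and the choice to leave the end line out of the popped segment are assumptions; the actual RowStack.pop may treat the boundaries differently.

```python
def find_index(rows, prefix, begin=0):
    """Return index of the first row starting with prefix, searching from begin."""
    for k in range(begin, len(rows)):
        if rows[k].startswith(prefix):
            return k
    raise ValueError("line not found: " + prefix)


def make_queue(rows, scopes, main_pdef):
    """Pair row segments with parsing definitions.

    rows      - list of strings
    scopes    - list of (start, end, pdef) tuples
    main_pdef - definition applied to rows not claimed by any scope
    """
    remaining = list(rows)
    to_parse = []
    for start, end, pdef in scopes:
        i = find_index(remaining, start)
        j = find_index(remaining, end, begin=i)
        # pop the segment: the end line itself stays in the remaining rows
        segment = remaining[i:j]
        remaining = remaining[:i] + remaining[j:]
        to_parse.append([segment, pdef])
    to_parse.append([remaining, main_pdef])
    return to_parse


rows = ["1. GDP", "gdp data",
        "3.5. CPI", "cpi data",
        "4. Social", "social data"]
queue = make_queue(rows, scopes=[("3.5.", "4.", "cpi_def")], main_pdef="main_def")
```

The scoped segments come first in the queue and the main definition is always last, the same order in which make_queue fills self.to_parse.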
Thanks for extracting the calls; it is a very useful perspective.
New spec.py classes are here:
- restore usage in tables.py - existing tests should pass
- agree on class structure and usage
- docstrings editing
- bring usage examples from issue #38 to module docstrings
- ParsingInstruction docstring
- other docstrings
- tests
- write some asserts in the file
- write or edit tests
- if tests are larger, explain constants, mocks, fixtures (will be an issue in tables.py)
- convert existing definitions to new format
- list of PROPOSALs marked in text
- make labels
- transformations layer diff GOV_ACCUM
- use sample data in required as test values
- short names for variables in FRED style, short=
Remaining classes:
- Definition
- Scope
- Specification
@bakakaldsas, most of the review for spec.py is done; can close after you approve
Remaining todos (not todo now):
- # TODO: add more definitions
- # TODO: transformations layer diff GOV_ACCUM
- # TODO: use sample in required
- # TODO: short names for variables in FRED style, short=
Finished review, proceeding to #52.
Job description saved in https://github.com/epogrebnyak/mini-kep/edit/master/todo_refactoring.md and also below:
Intent: document and test current stable version (found in master branch)
By module do:
- (optional) propose/discuss/commit/eye review changes to code
- write assert statements and edit/enhance tests
- edit module docstrings:
- module
- classes, public methods and functions used in other parts of the program
- other docstrings/comments where needed
- code examples for documentation where needed
- make list of refactoring / enhancements by writing TODO and FIXME in code
For docstrings we use https://google.github.io/styleguide/pyguide.html#Comments.
For testing we use py.test and write the following kinds of tests depending on the situation:
- unit tests:
  - a dumb test with no control values to make sure a method is callable (like for __repr__())
  - a unit test with simple control values for a small function / method, usually public ones
- behaviour tests with fixtures / mocks for larger functionalities
- end-to-end test on sample or real data
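The two kinds of unit tests can look roughly like this. The `Indicator` class and its `label()` format are hypothetical stand-ins, not code from the repository; the test functions use plain asserts, which py.test picks up from functions named `test_*`.

```python
class Indicator:
    """Hypothetical class under test, for illustration only."""

    def __init__(self, varname, unit):
        self.varname = varname
        self.unit = unit

    def label(self):
        return "{}_{}".format(self.varname, self.unit)

    def __repr__(self):
        return "Indicator({!r}, {!r})".format(self.varname, self.unit)


def test_repr_is_callable():
    # dumb test: no control value, only checks that the call does not fail
    assert repr(Indicator("GDP", "yoy"))


def test_label_with_control_value():
    # unit test with a simple control value for a small public method
    assert Indicator("GDP", "yoy").label() == "GDP_yoy"
```

Behaviour and end-to-end tests follow the same pattern, but build their inputs from fixtures or sample data files instead of inline constants.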
Finished work example: kep.files:
- code: https://github.com/epogrebnyak/mini-kep/blob/master/src/kep/files.py
- test: https://github.com/epogrebnyak/mini-kep/blob/master/src/kep/tests/test_files.py
- rst: https://github.com/epogrebnyak/mini-kep/blob/master/doc/kep.files.rst
- documentation: http://mini-kep-docs.s3-website-eu-west-1.amazonaws.com/kep.files.html
Must reference this #38 in #52