
Comments (9)

epogrebnyak commented on June 24, 2024

TODO-DOCS:

  • edit the module reStructuredText docstrings for better appearance (no hanging lines)

TODO-REFACTOR:

  • extract usages of SPEC and UNITS from tables.py and write down here, in issue comments, how they are called
    On a separate branch or in a local file:
  • refactor the Scope, Definition and Specification classes based on the proposal below
  • write some asserts where they seem necessary/critical; these asserts will become part of the tests in the next step

NOT TODO YET:

  • rewrite the existing parsing instructions for SPEC

from parser-rosstat-kep.

epogrebnyak commented on June 24, 2024

Refactoring proposal: we have too many append methods in Definition/Scope, and Scopes are added to Specification. A better logic is (edited):

  • init Scope
  • append more start and end lines to Scope if necessary
  • init Definition with Scope and reader
  • append Indicators to Definition using append()
  • make Specification a collection of Definitions (not Scopes)
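A minimal Python sketch of the proposed structure, just to check that the pieces fit together. Only the class names and the `append()` / `required()` method names come from this thread; the `Indicator` record, the attribute names (`markers`, `indicators`, `definitions`) and the str-or-list handling are assumptions for illustration, not the actual kep.spec code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


def as_list(x: Union[str, List[str]]) -> List[str]:
    """Accept either a single string or a list of strings."""
    return [x] if isinstance(x, str) else list(x)


@dataclass
class Indicator:
    """One variable to parse (hypothetical record type)."""
    varname: str
    text: List[str]
    required_units: List[str]
    desc: str = ""
    sample: str = ""


class Scope:
    """Start/end markers delimiting a segment of the source CSV file."""

    def __init__(self, start: str, end: str):
        self.markers: List[Tuple[str, str]] = []
        self.append(start, end)

    def append(self, start: str, end: str) -> None:
        # extra marker pairs, e.g. when section numbering differs between files
        self.markers.append((start, end))


class Definition:
    """Indicators parsed within one Scope; scope=None means the whole file."""

    def __init__(self, scope: Optional[Scope] = None, reader=None):
        self.scope = scope
        self.reader = reader
        self.indicators: List[Indicator] = []

    def append(self, varname, text, required_units, desc="", sample=""):
        self.indicators.append(Indicator(varname, as_list(text),
                                         as_list(required_units), desc, sample))

    def required(self) -> List[Tuple[str, str]]:
        # (varname, unit) pairs that must be found when parsing
        return [(ind.varname, unit)
                for ind in self.indicators for unit in ind.required_units]


class Specification:
    """A main Definition plus a collection of scoped Definitions."""

    def __init__(self, main: Definition):
        self.main = main
        self.definitions: List[Definition] = []

    def append(self, definition: Definition) -> None:
        self.definitions.append(definition)

    def required(self) -> List[Tuple[str, str]]:
        result = self.main.required()
        for definition in self.definitions:
            result.extend(definition.required())
        return result
```

In this sketch Scope only holds markers and knows nothing about indicators, so the many append methods collapse into one per class.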

Tentative call:

# change 1 - init of an empty Definition (not ideal from an OOP
# perspective, but we also consider ease of writing a definition
# and the logic that Specification is a collection of Definitions)

main = Definition(scope=None, reader=None)
main.append(varname="GDP",
            text=["Oбъем ВВП",  # "GDP volume"
                  "Индекс физического объема произведенного ВВП, в %",  # "GDP physical volume index, %"
                  "Валовой внутренний продукт"],  # "Gross domestic product"
            required_units=["bln_rub", "yoy"],
            desc="Валовый внутренний продукт (ВВП)",  # "Gross domestic product (GDP)"
            sample="1999	4823	901	1102	1373	1447")
SPEC = Specification(main)


sc = Scope(start="3.5. Индекс потребительских цен",  # "3.5. Consumer price index"
           end="4. Социальная сфера")  # "4. Social sphere"
# change 2 - method name
sc.append(start="4.5. Индекс потребительских цен",
          end="5. Социальная сфера")
# change 3 - init and composition
d = Definition(sc, reader=None)
d.append("CPI",
         text="Индекс потребительских цен",
         required_units="rog",
         desc="Индекс потребительских цен (ИПЦ)")  # "Consumer price index (CPI)"
d.append("CPI_NONFOOD",
         text=["непродовольственные товары",  # "non-food goods"
               "непродовольст- венные товары"],  # same, with a line-break hyphen as in the source
         required_units="rog",
         desc="ИПЦ (непродтовары)")  # "CPI (non-food goods)"
# change 4 - some methods will be deprecated
# change 5 - other changes needed?
# Specification is a collection of Definitions, so append d (not sc)
SPEC.append(d)
# change 6 - is access from outside unaffected?


bakakaldsas commented on June 24, 2024

Note (EP): deprecated; we do not need a global default for units_mapper_dict here.

UNITS usage in rows.py
Import:

import kep.spec as spec

In:

class Row:

As a default parameter:

    def get_unit(self, units_mapper_dict=spec.UNITS):
        for k in units_mapper_dict.keys():
            if k in self.name:
                return units_mapper_dict[k]
        return False
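Following the note above, a sketch of `get_unit()` with the units mapping as a required argument instead of a `spec.UNITS` default. `Row` here is a minimal stand-in (only `name` is used), and any mapping passed to it in examples is made up.

```python
from typing import Dict, Union


class Row:
    """Minimal stand-in for kep's Row: only the `name` attribute is used."""

    def __init__(self, name: str):
        self.name = name

    def get_unit(self, units_mapper_dict: Dict[str, str]) -> Union[str, bool]:
        # No module-level default: the caller must pass the mapping.
        # Returns the unit for the first key found as a substring of the
        # row name, or False if nothing matches (same contract as above).
        for key, unit in units_mapper_dict.items():
            if key in self.name:
                return unit
        return False
```

Callers such as Table.set_label() already pass units_dict explicitly, so dropping the default should not change their behaviour.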


bakakaldsas commented on June 24, 2024

UNITS usage in tables.py
Import:

from kep.spec import UNITS

In:

class Tables:
    def __init__(self, _rows, spec=SPEC, units=UNITS):
...
        self.units = units
...

Later used as self.units in:

    def yield_tables(self):
        for csv_segment, pdef in self.to_parse:
            for t in self.extract_tables(csv_segment, pdef, self.units):
                yield t

and passed on as units_dict to:

    @staticmethod
    def extract_tables(csv_segment, pdef, units_dict):
        # yield tables from csv_segment
        tables = split_to_tables(csv_segment)
        # parse tables to obtain labels
        varnames_dict = pdef.headers
        tables = [t.set_label(varnames_dict, units_dict) for t in tables]
...

In:

class Table:

EP: method parse is deprecated (set_label and set_splitter are used instead):

In method parse, as units_dict (however, I didn't find any call to parse):

    def parse(self, varnames_dict, units_dict, funcname):
        self.set_label(varnames_dict, units_dict)
        self.set_splitter(funcname)
        return self

In method set_label as units_dict:

    def set_label(self, varnames_dict, units_dict):
        for row in self.headers:
            varname = row.get_varname(varnames_dict)
            if varname:
                self.varname = varname
                self.lines[row.name] = self.KNOWN
            unit = row.get_unit(units_dict)
            if unit:
                self.unit = unit
                self.lines[row.name] = self.KNOWN
        return self
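To see how `set_label()` uses both dictionaries, here is a self-contained sketch with stub `Row` methods. The substring matching in `get_varname()` / `get_unit()` and the `KNOWN` / `UNKNOWN` marker values are assumptions made for illustration.

```python
class Row:
    """Stand-in for kep's Row; substring matching is an assumption."""

    def __init__(self, name):
        self.name = name

    def get_varname(self, varnames_dict):
        for key, varname in varnames_dict.items():
            if key in self.name:
                return varname
        return False

    def get_unit(self, units_dict):
        for key, unit in units_dict.items():
            if key in self.name:
                return unit
        return False


class Table:
    KNOWN = "+"    # hypothetical marker values
    UNKNOWN = "-"

    def __init__(self, headers):
        self.headers = headers
        self.varname = None
        self.unit = None
        # every header row starts as not yet recognised
        self.lines = {row.name: self.UNKNOWN for row in headers}

    def set_label(self, varnames_dict, units_dict):
        # same logic as Table.set_label above: a header row that yields a
        # varname or a unit labels the table and is marked as KNOWN
        for row in self.headers:
            varname = row.get_varname(varnames_dict)
            if varname:
                self.varname = varname
                self.lines[row.name] = self.KNOWN
            unit = row.get_unit(units_dict)
            if unit:
                self.unit = unit
                self.lines[row.name] = self.KNOWN
        return self
```

Note that if several header rows match, the last match wins for both varname and unit.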


bakakaldsas commented on June 24, 2024

SPEC usage in tables.py
Import:

from kep.spec import SPEC

In:

class Tables:
...
    def __init__(self, _rows, spec=SPEC, units=UNITS):
        self.rowstack = RowStack(_rows)
        self.spec = spec
        self.units = units
        self.required = [make_label(varname, unit)
                         for varname, unit in spec.required()]
        self.make_queue()

    def make_queue(self):
...
        self.to_parse = []
        for scope in self.spec.scopes:
            # find segment limits
            start, end = scope.get_bounds(self.rowstack.rows)
            # pop csv segment
            csv_segment = self.rowstack.pop(start, end)
            # get current parsing definition
            pdef = scope.get_parsing_definition()
            self.to_parse.append([csv_segment, pdef])
        csv_segment = self.rowstack.remaining_rows()
        pdef = self.spec.get_main_parsing_definition()
        self.to_parse.append([csv_segment, pdef])
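If Specification becomes a collection of Definitions (as proposed above), `make_queue()` could iterate over definitions and take each scope from its definition. A sketch under assumptions: `RowStack`, `Scope.get_bounds()` and exact-match markers are simplified stand-ins, and the end marker row is treated as exclusive (it stays with the main definition).

```python
from typing import List, Optional, Tuple


class RowStack:
    """Hypothetical stand-in for kep's RowStack over a list of row names."""

    def __init__(self, rows: List[str]):
        self.rows = list(rows)

    def pop(self, start: int, end: int) -> List[str]:
        # remove and return rows[start:end], keeping the rest in order
        segment = self.rows[start:end]
        del self.rows[start:end]
        return segment

    def remaining_rows(self) -> List[str]:
        return self.rows


class Scope:
    def __init__(self, markers: List[Tuple[str, str]]):
        self.markers = markers

    def get_bounds(self, rows: List[str]) -> Tuple[int, int]:
        # first (start, end) marker pair where both markers are present
        for start, end in self.markers:
            if start in rows and end in rows:
                return rows.index(start), rows.index(end)
        raise ValueError("scope markers not found")


class Definition:
    def __init__(self, scope: Optional[Scope], name: str):
        self.scope = scope
        self.name = name


def make_queue(rows: List[str], main: Definition,
               definitions: List[Definition]) -> list:
    """Pair each CSV segment with the Definition used to parse it."""
    rowstack = RowStack(rows)
    to_parse = []
    for pdef in definitions:
        start, end = pdef.scope.get_bounds(rowstack.rows)
        to_parse.append([rowstack.pop(start, end), pdef])
    # whatever is left over is parsed with the main definition
    to_parse.append([rowstack.remaining_rows(), main])
    return to_parse
```

Compared with the current code, `scope.get_parsing_definition()` is no longer needed: the definition itself is what gets queued.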


epogrebnyak commented on June 24, 2024

Thanks for extracting the calls, it is very useful perspective.


epogrebnyak commented on June 24, 2024

New spec.py classes are here:

  • restore usage in tables.py - existing tests should pass
  • agree on class structure and usage
  • edit docstrings
    • bring usage examples from issue #38 to module docstrings
    • ParsingInstruction docstring
    • other docstrings
  • tests
    • write some asserts in the file
    • write or edit tests
    • if tests are larger, explain constants, mocks and fixtures (will be an issue in tables.py)
  • convert existing definitions to the new format
  • list of PROPOSALs marked in text
    • make labels
    • transformations layer diff GOV_ACCUM
    • use sample data in required as test values
    • short names for variables in FRED style, short=
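For the PROPOSAL to use sample data as test values, a sketch that turns a `sample=` string into control values. The layout (year, annual value, then quarterly values) is assumed from the GDP example above, and the dict format of parser output is hypothetical.

```python
from typing import Dict, List, Union

GDP_SAMPLE = "1999\t4823\t901\t1102\t1373\t1447"

Expected = Dict[str, Union[int, float, List[float]]]


def sample_to_expected(sample: str) -> Expected:
    """Turn a tab-separated sample row into control values.

    Assumed layout: year, annual value, quarterly values.
    """
    year, annual, *quarters = sample.split("\t")
    return {"year": int(year),
            "annual": float(annual),
            "quarters": [float(q) for q in quarters]}


def matches_sample(parsed: Expected, sample: str) -> bool:
    """Compare parser output (hypothetical dict format) with the sample."""
    return parsed == sample_to_expected(sample)
```

For this GDP row the quarterly figures add up to the annual one (901 + 1102 + 1373 + 1447 = 4823), which gives a second, independent check for level variables.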

Remaining classes:

  • Definition
  • Scope
  • Specification


epogrebnyak commented on June 24, 2024

@bakakaldsas, most of the review for spec.py is done; we can close this after you approve.

Remaining todos (not todo now):

  • # TODO: add more definitions
  • # TODO: transformations layer diff GOV_ACCUM
  • # TODO: use sample in required
  • # TODO: short names for variables in FRED style, short=


epogrebnyak commented on June 24, 2024

Replaces #22 and #15.

Finished review, proceeding to #52.

Job description saved in https://github.com/epogrebnyak/mini-kep/edit/master/todo_refactoring.md and also below:

Intent: document and test the current stable version (found in the master branch)

For each module:

  • (optional) propose/discuss/commit/eye review changes to code
  • write assert statements and edit/enhance tests
  • edit module docstrings:
    • module
    • classes, public methods and functions used in other parts of the program
    • other docstrings/comments where needed
  • code examples for documentation where needed
  • make a list of refactorings / enhancements by writing TODO and FIXME comments in the code

For docstrings we use https://google.github.io/styleguide/pyguide.html#Comments.
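An example of a Google-style docstring (Args / Returns sections) for `make_label()`, which `Tables.__init__` calls above. The body is an assumption for illustration: it joins the variable name and unit with an underscore, which may differ from the actual kep function.

```python
def make_label(varname: str, unit: str, sep: str = "_") -> str:
    """Build a variable label from a variable name and a unit.

    Assumed behaviour for this example: the two parts are joined by `sep`.

    Args:
        varname: Variable name, e.g. "GDP".
        unit: Unit of measurement, e.g. "bln_rub".
        sep: Separator placed between the name and the unit.

    Returns:
        The combined label, e.g. "GDP_bln_rub".
    """
    return varname + sep + unit
```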

For testing we use py.test and write the following kinds of tests, depending on the situation:

  • unit tests:
    • dumb tests with no control values, to make sure a method is callable (e.g. __repr__()),
    • unit tests with simple control values for a small function / method, usually a public one
  • behaviour tests with fixtures / mocks for larger functionalities
  • end-to-end test on sample or real data
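A sketch of the first three kinds of tests in py.test style, using a minimal stand-in `Row` and a hypothetical helper `label_rows()`; the mock comes from the standard `unittest.mock` module. py.test collects plain `test_*` functions, so no pytest import is needed here. End-to-end tests need sample or real data and are not shown.

```python
from unittest import mock


class Row:
    """Minimal stand-in for kep's Row (hypothetical, for illustration)."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        return "Row({!r})".format(self.name)

    def get_unit(self, units_dict):
        for key, unit in units_dict.items():
            if key in self.name:
                return unit
        return False


def label_rows(rows, units_dict):
    """Hypothetical helper: collect the unit of each row."""
    return [row.get_unit(units_dict) for row in rows]


# 1. Dumb test: no control values, only checks that __repr__() is callable.
def test_repr_is_callable():
    assert repr(Row("abc"))


# 2. Unit test with simple control values for a small public method.
def test_get_unit_returns_mapped_unit():
    units = {"млрд.руб.": "bln_rub"}
    assert Row("ВВП, млрд.руб.").get_unit(units) == "bln_rub"
    assert Row("ВВП").get_unit(units) is False


# 3. Behaviour test: a mock stands in for a larger collaborator.
def test_label_rows_calls_get_unit_on_each_row():
    fake_row = mock.Mock()
    fake_row.get_unit.return_value = "rog"
    assert label_rows([fake_row, fake_row], {}) == ["rog", "rog"]
    assert fake_row.get_unit.call_count == 2
```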

Example of finished work: kep.files

Must reference this #38 in #52

