Giter Site home page Giter Site logo

openenergyplatform / oem2orm Goto Github PK

View Code? Open in Web Editor NEW
2.0 8.0 1.0 121 KB

Repository for the oem2orm tool. oem2orm creates SQL tables from open energy metadata strings. It also allows to upload and download metadata to and from a table on the OEP.

License: GNU General Public License v3.0

Python 100.00%
oep sqlalchemy-engine oemetadata database-migrations

oem2orm's Introduction

oem 2 orm

Create database tables (and schema) from oemetadata json file(s). This tool is part of the open-energy-metadata (OEM) integration into the OEP.

Installation

You can install pacakge using standard python installation: pip install oem2orm

or if you interested in CLI-version only you can install it using pipx (pipx must be installed): pipx install oem2orm see Pipx-Documentation for further information.

Usage

Read the Restrictions section and have a look at our tutorial section to get more information about the usage of oem2orm either as code module or CLI tool. The tutorials also provide information how to validate your oemetadata files.

Restrictions

To use this tool with the OEP API you need to be signed up to the OEP since you need to provide an API-Token.

For security reasons, tables can only be created in existing schemas and just in the schemas "model_draft" and "sandbox".

Keep in mind that f.e. the postgres types (column datatype) are not fully supported by the oedialct - work in progress.

Docs

Database connection

We use a global namedtuple called "DB" To store the sqlalchemy connection objects engine and metadata. The namedtuple is available wen import oem2orm in a script. To establish the namedtuple use the function setup_db_connection(). Now you can use DB.engine or DB.metadata. In the background the connection is established using oedialect and the http API of the oeplatform website.

oem2orm generator

The table objects (ORM) are generated on the fly from an oemetadata.json file. oemetadata is a metadata specification of the Open Energy Family. It includes about 50 fields that can be used to provide metadata for tabular data resources. A subset of these fields are grouped in the key "resources" (see out example) in the metadata. These fields describe the schema of the data table (like table name, columns, data types & table relations).

The method oem2orm provides to create data tables on the OEP. It is especially useful if you attempt to automate the table creation and already use python or already have a oemetadata file available. The alternatives are:

  1. manually describing the table object in JSON and then use the oep HTTP API directly to create a table.
  2. Use the User Interface of the oeplatform website to create a table and upload data.

Oemetadata format

Specification for the oemetadata

Oemetadata validation

The oemetadata specification is integrated into the open energy platform using a tool called omi (metadata integration). OMI provides functionality to run validation checks on the metadata up to the oemetadata version 1.6.0. oem2orm also provides a minimal oep compliance check that mocks the checks that are run on the oep website once the metadata is uploaded to a table.

Supported column data types

Currently oem2orm supports

    "bigint"
    "int":
    "integer"
    "varchar"
    "json"
    "text"
    "timestamp"
    "interval"
    "string"
    "float"
    "boolean"
    "date"
    "hstore"
    "decimal"
    "numeric"
    "double precision"

Spatial Types

"geometry point": Geometry("POINT",  spatial_index=False),
"geom": Geometry("GEOMETRY",  spatial_index=False),
"geometry": Geometry("GEOMETRY",  spatial_index=False),

We create columns with spatial datatypes using Geoalchemy2.

Database support

We only tested this tool with PostgreSQL & sqlalchemy version 1.3

oem2orm's People

Contributors

chrwm avatar henhuy avatar jh-rli avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Forkers

henhuy

oem2orm's Issues

Error while updating metadata on OEP is not forwarded

When updating metadata on the OEP via oep_oedialect_oem2orm.api_updateMdOnTable status_code of response is checked whether it is 200. If an error occurs (aka status_code != "200") the error gets logged, but no error is thrown.
Thus, a user cannot know whether updated worked or not.
I would suggest throwing a `MetadataError``with information from response if upload fails.

Handle single table objects, not list of tables

Requested by @jnnr:

Can we use the same function, but for single tables? This would allow us to not leave the loop here.

What I mean: Do not do this:

for table in tables:

and this within the function
metadata_files = [
str(file) for file in oem_folder_path.iterdir() if file.suffix == ".json"
]
for metadata_file in metadata_files:

Metadata to ORM feature

I requested a feature to convert the oemetadata JSON to ORM.
This makes the publication of data easy.

My collegue @henhuy developed this package md2orm and @jh-RLI pushed it to this repo.

Planned toolchain for Data Upload:

2020-03-09_MD2OEP-upload_0 2

  1. Create metadata from data
  2. Create ORM from metadata using MD2ORM
  3. Create ERM usind ERAlchemy
  4. Upload data to OEP using oedialect
  5. Upload metadata and add tags

Improve Docs

The documentation is still not complete.

  • Include requirements on OEM fields
  • Rework Usage

OEDialect and SQLAlchemy versions not working

Two errors in setup.py:

  1. OEDialect is set to v1.1 but must be 0.1.1 instead
  2. SQLAlchemy v1.3.14 is not working with latest OEDialect (>=1.3.16 there), thus this must be upgraded to v1.3.16 (latest version is not working - don't know which one latest working version)

Upload with oem2orm dtype `int` is interpreted as `bigint`

Using this tutorial and this metadata yields the table below on the OEP

grafik

Upload with example_df.to_sql(table_name, connection, schema=schema, if_exists='append', index=False) yields the error:

Out of range float values are not JSON compliant
Writing to umas_potentials failed!

and no data will be uploaded.

However, when I use https://modex.rl-institut.de/create_table/ to create and upload the table, this yields

grafik

and upload works fine.

The columns id and id_scal both contain integer values.

Final table: https://openenergy-platform.org/dataedit/view/model_draft/umas_potentials

Improve error messages

Some error messages are not shown to the user

  1. ce in

    except oedialect.engine.ConnectionException as ce:
    error_msg = f'Error when uploading table "{table.name}".'
    logging.error(error_msg)
    raise DatabaseError(error_msg) from ce

  2. null in type field results in internal server error,

Reproduce with:
Metadata meta_dat.json
Error msg:

    if item.split(" ")[-1] == "array":
       ^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'split'

try:
column_type = TYPES[field["type"]]
except KeyError:
raise MetadataError(
"Unknown column type", field, field["type"], metadata_file
)

Improve error and status messages when uploading tables

Uploading tables with oem2orm, e.g. using the script from the tutorial section, is not user-friendly due a lack of helpful error messages.

  • Inform the user which table is processed. Currently, when the table creation fails and throws a bad/unspecific error message, it is not clear which table is the reason (in case of uploading a bulk of tables)
  • Improve the error messages when violating the postgresql naming conventions for column headers and resources-name. Relates to #16 (comment)

How to clean release: Commits differ on master and develop post release

@henhuy, as you noticed, there is always (I assume since you merged master back to develop) a diff between master and develop after I do a release - I follow the Vincent Diressen workflow, but I think I missed something as sometimes the branches are in sync, but most of the time they differ by 2-3 commits, but actually contain the same content. Maybe this is because github creates additional merge commits? Or do i have to use another merge strategy.

Maybe you have an idea how to make it cleaner :)

git branching model

Parsing error when opening metadata

When using the package, a parsing error occurred when opening the metadata file in oep_oedialect_oem2orm.py

Changing the argument in line 169 from "r" to "rb" fixed the issue.

with open(metadata_file, "rb") as metadata_json:
        metadata = json.loads(metadata_json.read())
    tables_raw = jmespath.search("resources", metadata)

Update requirements.txt

When cloning the repo and creating and environment for development purposes from requirements.txt, I stumbled upon requirement problems for pandas and sqlalchemy while testing code.

@jh-RLI is it fair to fix both package versions to pandas==1.2.4, sqlalchemy==1.3.16 as noted here or does anything speaks against it? If only the latter is possible then a dev note would be nice.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.