Giter Site home page Giter Site logo

oca-conductor's Introduction

OCA conductor

Introduction

One of the fundamental concepts behind OCA is data harmonization. To harmonize data is to unify two or more datasets to be compliant to each other so that further processing can be performed on unified data. Harmonization may take place on the dataset level, where as a result all the datasets become unified to the same schema. Another type of harmonization is to achieve a compliant subset among schemas. The process to achieve harmonization is called a transformation. Transformation overlays are the recipe to how actually data (or schema) should be transformed.

OCA conductor serves dual purpose:

  • data validation where provided data record is verified against schema defined within OCA Bundle;
  • data transformation where injected datasets are transformed to the new state using transformation overlays.

OCA conductor is written in Rust thanks to which most C-based languages can be supported by exposing proper FFI-layer. We provide bindings for the following languages:

License

EUPL 1.2

We have distilled the most crucial license specifics to make your adoption seamless: see here for details.

Installation

Rust

[dependencies]
oca_conductor = "0.2.12"

Typescript and JavaScript (Node.JS based)

  • For oca-transformer
    npm i oca-data-transformer
    
  • For oca-validator
    npm i oca-data-validator
    

Python (validator)

Prerequisites:
Install Rust - https://www.rust-lang.org

Clone repository and go to bindings/ffi/validator directory

git clone https://github.com/THCLab/oca-conductor.git
cd bindings/ffi/validator

Run

cargo build --release --features validator
cargo run --bin uniffi-bindgen generate src/validator.udl -l python
cp target/release/libvalidator.so src/libuniffi_validator.so

All you need are validator.py and libuniffi_validator.so files from src directory.

Example usage:

from validator import *

oca_str = """
{
  "capture_base": {
    "attributes": {
      "n1": "Text",
      "n2": "Numeric"
    },
    "classification": "",
    "digest": "ElNWOR0fQbv_J6EL0pJlvCxEpbu4bg1AurHgr_0A7LKc",
    "flagged_attributes": [],
    "type": "spec/capture_base/1.0"
  },
  "overlays": [
    {
      "attribute_entry_codes": {
        "n1": ["op1", "op2"]
      },
      "capture_base": "ElNWOR0fQbv_J6EL0pJlvCxEpbu4bg1AurHgr_0A7LKc",
      "digest": "E4L-BukSBsqZoDDIJvw4_gGjAJs5It4UUfiA200lGup0",
      "type": "spec/overlays/entry_code/1.0"
    },
    {
      "attribute_conformance": {
        "n1": "M"
      },
      "capture_base": "ElNWOR0fQbv_J6EL0pJlvCxEpbu4bg1AurHgr_0A7LKc",
      "digest": "E4L-BukSBsqZoDDIJvw4_gGjAJs5It4UUfiA200lGup0",
      "type": "spec/overlays/conformance/1.0"
    }
  ]
}
"""

validator = Validator(oca_str)

validator.set_constraints(ConstraintsConfig(fail_on_additional_attributes=True))

try:
    validator.validate(
    """
    [
        {
            "n1": "op",
            "additional": 1
        },
        {
            "n2": 2
        }
    ]
    """
    )
except ValidationErrors.List as error_list:
    for error in error_list.errors:
        print(error)

# Result:
# ValidationError.Other(data_set='0', record='0', attribute_name='n1', message='\'n1\' value ("op") must be one of ["op1", "op2"]')
# ValidationError.Other(data_set='0', record='0', attribute_name='additional', message='unknown_attribute')
# ValidationError.Other(data_set='0', record='1', attribute_name='n1', message='missing_attribute')

Transformation overlays

Attribute Mapping Overlay

An Attribute Mapping Overlay defines attribute mappings between two distinct data models. The example below shows the schema manipulation of the source attribute name and the destination attribute name.

  {
    "attribute_mapping": {
      "firstName": "first_name",
      "surname": "last_name"
    },
    "capture_base": "EzMTX96ZkwRBzkUBI9c8jiXfHa6CfKJyrd6uH6vjIUZc",
    "digest": "EBfxGsWV4aQGRfRsVH_tDfQCFaBnNqA7cq8YW1Q3DdwA",
    "type": "spec/overlays/mapping/1.0"
  }

Entry Mapping Overlay

An Entry Code Mapping Overlay defines entry key mappings between two distinct code tables (aka lookup tables) or datasets. The example below shows the mapping between three letter country codes into two letter country codes.

  {
    "attribute_entry_codes_mapping": {
      "country_code": [
        "AF:AFG","AL:ALB","DZ:DZA","AS:ASM","AD:AND","AO:AGO","AI:AIA","AQ:ATA","AG:ATG","AR:ARG","AM:ARM","AW:ABW","AU:AUS","AT:AUT","AZ:AZE","BS:BHS","BH:BHR","BD:BGD","BB:BRB","BY:BLR","BE:BEL","BZ:BLZ","BJ:BEN","BM:BMU","BT:BTN","BO:BOL","BQ:BES","BA:BIH","BW:BWA","BV:BVT","BR:BRA","IO:IOT","BN:BRN","BG:BGR","BF:BFA","BI:BDI","CV:CPV","KH:KHM","CM:CMR","CA:CAN","KY:CYM","CF:CAF","TD:TCD","CL:CHL","CN:CHN","CX:CXR","CC:CCK","CO:COL","KM:COM","CD:COD","CG:COG","CK:COK","CR:CRI","HR:HRV","CU:CUB","CW:CUW","CY:CYP","CZ:CZE","CI:CIV","DK:DNK","DJ:DJI","DM:DMA","DO:DOM","EC:ECU","EG:EGY","SV:SLV","GQ:GNQ","ER:ERI","EE:EST","SZ:SWZ","ET:ETH","FK:FLK","FO:FRO","FJ:FJI","FI:FIN","FR:FRA","GF:GUF","PF:PYF","TF:ATF","GA:GAB","GM:GMB","GE:GEO","DE:DEU","GH:GHA","GI:GIB","GR:GRC","GL:GRL","GD:GRD","GP:GLP","GU:GUM","GT:GTM","GG:GGY","GN:GIN","GW:GNB","GY:GUY","HT:HTI","HM:HMD","VA:VAT","HN:HND","HK:HKG","HU:HUN","IS:ISL","IN:IND","ID:IDN","IR:IRN","IQ:IRQ","IE:IRL","IM:IMN","IL:ISR","IT:ITA","JM:JAM","JP:JPN","JE:JEY","JO:JOR","KZ:KAZ","KE:KEN","KI:KIR","KP:PRK","KR:KOR","KW:KWT","KG:KGZ","LA:LAO","LV:LVA","LB:LBN","LS:LSO","LR:LBR","LY:LBY","LI:LIE","LT:LTU","LU:LUX","MO:MAC","MG:MDG","MW:MWI","MY:MYS","MV:MDV","ML:MLI","MT:MLT","MH:MHL","MQ:MTQ","MR:MRT","MU:MUS","YT:MYT","MX:MEX","FM:FSM","MD:MDA","MC:MCO","MN:MNG","ME:MNE","MS:MSR","MA:MAR","MZ:MOZ","MM:MMR","NA:NAM","NR:NRU","NP:NPL","NL:NLD","NC:NCL","NZ:NZL","NI:NIC","NE:NER","NG:NGA","NU:NIU","NF:NFK","MP:MNP","NO:NOR","OM:OMN","PK:PAK","PW:PLW","PS:PSE","PA:PAN","PG:PNG","PY:PRY","PE:PER","PH:PHL","PN:PCN","PL:POL","PT:PRT","PR:PRI","QA:QAT","MK:MKD","RO:ROU","RU:RUS","RW:RWA","RE:REU","BL:BLM","SH:SHN","KN:KNA","LC:LCA","MF:MAF","PM:SPM","VC:VCT","WS:WSM","SM:SMR","ST:STP","SA:SAU","SN:SEN","RS:SRB","SC:SYC","SL:SLE","SG:SGP","SX:SXM","SK:SVK","SI:SVN","SB:SLB","SO:SOM","ZA:ZAF","GS:SGS","SS:SSD","ES:ESP","LK:LKA","SD:SDN","SR:SUR","SJ:SJM","SE:SWE","CH:CHE","SY:SYR","TW:TWN","TJ:TJK","TZ:TZA","TH:THA","TL:TLS","TG:TGO","TK:TKL","TO:TON","TT:TTO","TN:TUN","TR:TUR","TM:TKM","TC:TCA","TV:TUV","UG:UGA","UA:UKR","AE:ARE","GB:GBR","UM:UMI","US:USA","UY:URY","UZ:UZB","VU:VUT","VE:VEN","VN:VNM","VG:VGB","VI:VIR","WF:WLF","EH:ESH","YE:YEM","ZM:ZMB","ZW:ZWE","AX:ALA"
      ]
    },
    "capture_base": "EzMTX96ZkwRBzkUBI9c8jiXfHa6CfKJyrd6uH6vjIUZc",
    "digest": "EnBjuRoJs8TmfCQJdF8BdmmHwEUbCRrw45xwIFCS7qdY",
    "type": "spec/overlays/entry_code_mapping/1.0"
  }

Unit Mapping Overlay

A Unit Mapping Overlay defines target units when converting between different units of measurement. The example below demands measurement recalculation into mmol/L unit, where the source unit is known from the Unit Overlay of the OCA Bundle.

  {
    "attribute_units": {
      "blood_glucose": "mmol/L"
    },
    "capture_base": "EzMTX96ZkwRBzkUBI9c8jiXfHa6CfKJyrd6uH6vjIUZc",
    "digest": "EwpEtyZgBF0B9KXEufQQdeS2aN1GL4JIMcpVRffH3_pg",
    "metric_system": "nonSI",
    "type": "spec/overlays/unit/1.0"
  }

oca-conductor's People

Contributors

blelump avatar olichwiruk avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.