kislerdm / gbqschema_converter

Python library to convert Google BigQuery table schema to JSON Schema

Home Page: https://pypi.org/project/gbqschema-converter

License: MIT License

python3 google-cloud-platform google-bigquery jsonschema bigquery-schema

Google BigQuery Table Schema Converter

Python library to convert a Google BigQuery table schema into a draft-07 JSON Schema and vice versa.

The library includes two main modules:

gbqschema_converter
├── gbqschema_to_jsonschema.py
└── jsonschema_to_gbqschema.py

Each of those modules has two main functions:

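  • json_representation: produces JSON output (in jsonschema_to_gbqschema) or accepts JSON input (in gbqschema_to_jsonschema).
  • sdk_representation: produces a list of Google Python SDK SchemaField objects (in jsonschema_to_gbqschema) or accepts such a list as input (in gbqschema_to_jsonschema).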

Installation

python3 -m venv env && source ${PWD}/env/bin/activate
(env) pip install --no-cache-dir gbqschema_converter

Usage: CLI

Convert json-schema to GBQ table schema

(env) json2gbq -h
usage: json2gbq [-h] (-i INPUT | -f FILE)

Google BigQuery Table Schema Converter

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input object as string.
  -f FILE, --file FILE  Input object as file path.

Example: string input (-i)

Execution:

(env) json2gbq -i '{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element"
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
          "description": "Att 2"
        },
        "att_03": {
          "type": "string"
        },
        "att_04": {
          "type": "boolean"
        },
        "att_05": {
          "type": "string",
          "format": "date"
        },
        "att_06": {
          "type": "string",
          "format": "date-time"
        },
        "att_07": {
          "type": "string",
          "format": "time"
        }
      },
      "required": [
        "att_02"
      ]
    }
  }
}'

Output:

2020-04-08 21:42:51.700 [INFO ] [Google BigQuery Table Schema Converter] Output (5.52 ms elapsed):
[
  {
    "description": "Att 1",
    "name": "att_01",
    "type": "INTEGER",
    "mode": "NULLABLE"
  },
  {
    "description": "Att 2",
    "name": "att_02",
    "type": "NUMERIC",
    "mode": "REQUIRED"
  },
  {
    "name": "att_03",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "att_04",
    "type": "BOOLEAN",
    "mode": "NULLABLE"
  },
  {
    "name": "att_05",
    "type": "DATE",
    "mode": "NULLABLE"
  },
  {
    "name": "att_06",
    "type": "TIMESTAMP",
    "mode": "NULLABLE"
  },
  {
    "name": "att_07",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]

Example: file

Execution:

(env) json2gbq -f ${PWD}/data/jsonschema.json

Output:

2020-04-08 21:57:25.516 [INFO ] [Google BigQuery Table Schema Converter] Output (6.39 ms elapsed):
[
  {
    "description": "Att 1",
    "name": "att_01",
    "type": "INTEGER",
    "mode": "NULLABLE"
  },
  {
    "description": "Att 2",
    "name": "att_02",
    "type": "NUMERIC",
    "mode": "REQUIRED"
  },
  {
    "name": "att_03",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "att_04",
    "type": "BOOLEAN",
    "mode": "NULLABLE"
  },
  {
    "name": "att_05",
    "type": "DATE",
    "mode": "NULLABLE"
  },
  {
    "name": "att_06",
    "type": "TIMESTAMP",
    "mode": "NULLABLE"
  },
  {
    "name": "att_07",
    "type": "STRING",
    "mode": "NULLABLE"
  }
]
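The json2gbq outputs above suggest a simple per-property mapping: the JSON Schema type (plus format, for strings) picks the BigQuery type, and membership in "required" picks the mode. The sketch below is a hypothetical re-implementation of that observed mapping for illustration only; it is not taken from the library's source, the names to_gbq_fields and TYPE_MAP are made up, and it only handles the flat array/definitions/element layout used in these examples.

```python
# Hypothetical sketch, not the library's implementation: the mapping below is
# read off the json2gbq example output above.
from typing import Any, Dict, List, Optional, Tuple

# (JSON Schema "type", "format") -> BigQuery column type, as observed in the example.
TYPE_MAP: Dict[Tuple[str, Optional[str]], str] = {
    ("integer", None): "INTEGER",
    ("number", None): "NUMERIC",
    ("string", None): "STRING",
    ("boolean", None): "BOOLEAN",
    ("string", "date"): "DATE",
    ("string", "date-time"): "TIMESTAMP",
    ("string", "time"): "STRING",
}


def to_gbq_fields(jsonschema: Dict[str, Any]) -> List[Dict[str, str]]:
    """Map definitions.element.properties to BigQuery field dicts (flat schemas only)."""
    element = jsonschema["definitions"]["element"]
    required = set(element.get("required", []))
    fields = []
    for name, prop in element["properties"].items():
        field: Dict[str, str] = {}
        if "description" in prop:
            field["description"] = prop["description"]
        field["name"] = name
        # Unlisted type/format pairs raise KeyError in this sketch.
        field["type"] = TYPE_MAP[(prop["type"], prop.get("format"))]
        # Properties listed in "required" become REQUIRED; all others NULLABLE.
        field["mode"] = "REQUIRED" if name in required else "NULLABLE"
        fields.append(field)
    return fields


schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "array",
    "items": {"$ref": "#/definitions/element"},
    "definitions": {
        "element": {
            "type": "object",
            "properties": {
                "att_01": {"type": "integer", "description": "Att 1"},
                "att_02": {"type": "number", "description": "Att 2"},
                "att_05": {"type": "string", "format": "date"},
            },
            "required": ["att_02"],
        }
    },
}
print(to_gbq_fields(schema))
```

A lookup table keyed on (type, format) keeps the observed rules in one place; nested objects, arrays and REPEATED fields are deliberately out of scope here.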

Convert GBQ table schema to json-schema

(env) gbq2json -h
usage: gbq2json [-h] (-i INPUT | -f FILE)

Google BigQuery Table Schema Converter

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input object as string.
  -f FILE, --file FILE  Input object as file path.

Example: string input (-i)

Execution:

(env) gbq2json -i '[
  {
    "description": "Att 1",
    "name": "att_01",
    "type": "INTEGER",
    "mode": "NULLABLE"
  },
  {
    "description": "Att 2",
    "name": "att_02",
    "type": "NUMERIC",
    "mode": "REQUIRED"
  },
  {
    "name": "att_03",
    "type": "STRING",
    "mode": "NULLABLE"
  },
  {
    "name": "att_04",
    "type": "BOOLEAN",
    "mode": "NULLABLE"
  },
  {
    "name": "att_05",
    "type": "DATE",
    "mode": "NULLABLE"
  },
  {
    "name": "att_06",
    "type": "DATETIME",
    "mode": "NULLABLE"
  },
  {
    "name": "att_07",
    "type": "TIMESTAMP",
    "mode": "NULLABLE"
  }
]'

Output:

2020-04-08 21:51:05.370 [INFO ] [Google BigQuery Table Schema Converter] Output (1.08 ms elapsed):
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element"
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
          "description": "Att 2"
        },
        "att_03": {
          "type": "string"
        },
        "att_04": {
          "type": "boolean"
        },
        "att_05": {
          "type": "string",
          "format": "date"
        },
        "att_06": {
          "type": "string",
          "pattern": "^[0-9]{4}-((|0)[1-9]|1[0-2])-((|[0-2])[1-9]|3[0-1])(|T)((|[0-1])[0-9]|2[0-3]):((|[0-5])[0-9]):((|[0-5])[0-9])(|.[0-9]{1,6})$"
        },
        "att_07": {
          "type": "string",
          "format": "date-time"
        }
      },
      "additionalProperties": false,
      "required": [
        "att_02"
      ]
    }
  }
}
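The DATETIME column (att_06) is expressed as a regular expression rather than a "format". To see what that expression accepts, the sketch below runs the emitted pattern, copied verbatim, through Python's re module against a few made-up sample strings. Python's re stands in here for whatever regex engine a JSON Schema validator uses; for this pattern the two should behave alike, but that is an assumption.

```python
import re

# The "pattern" emitted for the DATETIME column att_06 above, copied verbatim.
DATETIME_PATTERN = r"^[0-9]{4}-((|0)[1-9]|1[0-2])-((|[0-2])[1-9]|3[0-1])(|T)((|[0-1])[0-9]|2[0-3]):((|[0-5])[0-9]):((|[0-5])[0-9])(|.[0-9]{1,6})$"

# Made-up sample values; print whether each one matches the pattern.
samples = [
    "2020-04-08T21:42:51",
    "2020-04-08T21:42:51.123456",
    "2020-04-08 21:42:51",
    "2020-04-10T21:42:51",
]
for value in samples:
    print(value, bool(re.match(DATETIME_PATTERN, value)))
```

In this check the "T" separator and an optional fractional part match, while a space separator does not. Day 10 does not match either: the day group (|[0-2])[1-9]|3[0-1] has no branch for 10 or 20, so values on those days may be rejected by a validator using this pattern.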

Example: file

Execution:

(env) gbq2json -f ${PWD}/data/gbqschema.json

Output:

2020-04-08 21:55:20.275 [INFO ] [Google BigQuery Table Schema Converter] Output (1.72 ms elapsed):
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element"
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
          "description": "Att 2"
        },
        "att_03": {
          "type": "string"
        },
        "att_04": {
          "type": "boolean"
        },
        "att_05": {
          "type": "string",
          "format": "date"
        },
        "att_06": {
          "type": "string",
          "pattern": "^[0-9]{4}-((|0)[1-9]|1[0-2])-((|[0-2])[1-9]|3[0-1])(|T)((|[0-1])[0-9]|2[0-3]):((|[0-5])[0-9]):((|[0-5])[0-9])(|.[0-9]{1,6})$"
        },
        "att_07": {
          "type": "string",
          "format": "date-time"
        }
      },
      "additionalProperties": false,
      "required": [
        "att_02"
      ]
    }
  }
}
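The gbq2json outputs follow the reverse rules: each BigQuery type maps to a fixed JSON Schema property, REQUIRED fields are collected into "required", and "additionalProperties" is set to false. The sketch below is a hypothetical re-implementation of that observed mapping for illustration only, not the library's code; the names to_jsonschema and PROPERTY_MAP are made up, and REPEATED or RECORD fields are not handled.

```python
# Hypothetical sketch, not the library's implementation: the mapping below is
# read off the gbq2json example output above.
from typing import Any, Dict, List

# DATETIME pattern copied verbatim from the example output.
DATETIME_PATTERN = r"^[0-9]{4}-((|0)[1-9]|1[0-2])-((|[0-2])[1-9]|3[0-1])(|T)((|[0-1])[0-9]|2[0-3]):((|[0-5])[0-9]):((|[0-5])[0-9])(|.[0-9]{1,6})$"

# BigQuery column type -> JSON Schema property, as observed in the example.
PROPERTY_MAP: Dict[str, Dict[str, str]] = {
    "INTEGER": {"type": "integer"},
    "NUMERIC": {"type": "number"},
    "STRING": {"type": "string"},
    "BOOLEAN": {"type": "boolean"},
    "DATE": {"type": "string", "format": "date"},
    "DATETIME": {"type": "string", "pattern": DATETIME_PATTERN},
    "TIMESTAMP": {"type": "string", "format": "date-time"},
}


def to_jsonschema(fields: List[Dict[str, str]]) -> Dict[str, Any]:
    """Build the array/definitions/element layout from BigQuery field dicts."""
    properties: Dict[str, Any] = {}
    required: List[str] = []
    for field in fields:
        # Copy so the shared PROPERTY_MAP entries are never mutated.
        prop = dict(PROPERTY_MAP[field["type"]])
        if "description" in field:
            prop["description"] = field["description"]
        properties[field["name"]] = prop
        # A missing mode means NULLABLE in BigQuery, so only REQUIRED is special.
        if field.get("mode") == "REQUIRED":
            required.append(field["name"])
    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "array",
        "items": {"$ref": "#/definitions/element"},
        "definitions": {
            "element": {
                "type": "object",
                "properties": properties,
                "additionalProperties": False,
                "required": required,
            }
        },
    }


fields = [
    {"description": "Att 1", "name": "att_01", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "att_02", "type": "NUMERIC", "mode": "REQUIRED"},
    {"name": "att_06", "type": "DATETIME", "mode": "NULLABLE"},
]
print(to_jsonschema(fields))
```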

Usage: Python program

Convert json-schema to GBQ table schema

Example: output as json

from gbqschema_converter.jsonschema_to_gbqschema import json_representation as converter

schema_in = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element",
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
        },
      },
      "required": [
        "att_02",
      ],
    },
  },
}

schema_out = converter(schema_in)
print(schema_out)

Output:

[{'description': 'Att 1', 'name': 'att_01', 'type': 'INTEGER', 'mode': 'NULLABLE'}, {'name': 'att_02', 'type': 'NUMERIC', 'mode': 'REQUIRED'}]

Example: output as list of SchemaField (SDK format)

from gbqschema_converter.jsonschema_to_gbqschema import sdk_representation as converter

schema_in = {
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "array",
  "items": {
    "$ref": "#/definitions/element",
  },
  "definitions": {
    "element": {
      "type": "object",
      "properties": {
        "att_01": {
          "type": "integer",
          "description": "Att 1"
        },
        "att_02": {
          "type": "number",
        },
      },
      "required": [
        "att_02",
      ],
    },
  },
}

schema_out = converter(schema_in)
print(schema_out)

Output:

[SchemaField('att_01', 'INTEGER', 'NULLABLE', 'Att 1', ()), SchemaField('att_02', 'NUMERIC', 'REQUIRED', None, ())]

Convert GBQ table schema to json-schema

Example: output as json

from gbqschema_converter.gbqschema_to_jsonschema import json_representation as converter

schema_in = [
    {
        'description': 'Att 1',
        'name': 'att_01',
        'type': 'INTEGER',
        'mode': 'NULLABLE'
    },
    {
        'name': 'att_02',
        'type': 'NUMERIC',
        'mode': 'REQUIRED'
    }
]

schema_out = converter(schema_in)
print(schema_out)

Output:

{'$schema': 'http://json-schema.org/draft-07/schema#', 'type': 'array', 'items': {'$ref': '#/definitions/element'}, 'definitions': {'element': {'type': 'object', 'properties': {'att_01': {'type': 'integer', 'description': 'Att 1'}, 'att_02': {'type': 'number'}}, 'additionalProperties': False, 'required': ['att_02']}}}
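print shows the Python repr of the result (single quotes, False), which is not valid JSON text. To get JSON, for example to save the schema to a file, the result can be serialized with the standard json module. In this sketch the printed dict is pasted in place of calling the converter, so it runs without the library installed.

```python
import json

# The dict printed above, copied verbatim (Python repr: single quotes, False).
schema_out = {'$schema': 'http://json-schema.org/draft-07/schema#', 'type': 'array', 'items': {'$ref': '#/definitions/element'}, 'definitions': {'element': {'type': 'object', 'properties': {'att_01': {'type': 'integer', 'description': 'Att 1'}, 'att_02': {'type': 'number'}}, 'additionalProperties': False, 'required': ['att_02']}}}

# json.dumps emits valid JSON text: double quotes and lowercase false.
schema_json = json.dumps(schema_out, indent=2)
print(schema_json)
```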

Example: output as list of SchemaField (SDK format)

from gbqschema_converter.gbqschema_to_jsonschema import sdk_representation as converter
from google.cloud.bigquery import SchemaField

schema_in = [
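    # Keyword arguments avoid relying on SchemaField's positional parameter order,
    # which may differ between google-cloud-bigquery versions.
    SchemaField('att_01', 'INTEGER', mode='NULLABLE', description='Att 1'),
    SchemaField('att_02', 'NUMERIC', mode='REQUIRED'),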
]

schema_out = converter(schema_in)
print(schema_out)

Output:

{'$schema': 'http://json-schema.org/draft-07/schema#', 'type': 'array', 'items': {'$ref': '#/definitions/element'}, 'definitions': {'element': {'type': 'object', 'properties': {'att_01': {'type': 'integer', 'description': 'Att 1'}, 'att_02': {'type': 'number'}}, 'additionalProperties': False, 'required': ['att_02']}}}

Contributors: kislerdm

gbqschema_converter's Issues

support mode : REPEATED

Hello, I was hoping to use this library to help validate JSON objects before uploading them to BigQuery; however, it doesn't seem to support REPEATED mode (arrays).

schema converter returns empty list

The following schema returns an empty list when converted with: schema_convert = module.json_representation(schema)

{
          "definitions": {},
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "title": "Add Account Schema",
          "required": [
            "AppName",
            "Provider",
            "AccountId"
          ],
          "properties": {
            "AppName": {
              "type": "string",
              "title": "Application Name",
              "pattern": "^[a-zA-Z]{1}[-a-zA-Z0-9_\\s]{0,39}$"
            },
            "Provider": {
              "type": "string",
              "title": "Cloud Service Provider",
              "enum": [
                "aws",
                "gcp",
                "azure"
              ]
            },
            "AccountId": {
              "type": "string",
              "title": "AWS Account Id",
              "pattern": "^[0-9]{12}$"
            },
            "Features": {
              "type": "object",
              "title": " Account Features",
              "required": [
                "Logs",
                "Encryption",
                "Backup"
              ],
              "properties": {
                "Logs": {
                  "type": "boolean",
                  "title": "Enable Logs"
                },
                "Encryption": {
                  "type": "boolean",
                  "title": "Enable Encryption"
                },
                "Backup": {
                  "type": "boolean",
                  "title": "Enable Backup"
                }
              },
              "additionalProperties": False
            }
          },
          "additionalProperties": False
        }
