datadotworld / datapackage-py

This project was forked from frictionlessdata/datapackage-py.

A Python library for working with Data Packages.

License: MIT License

Python 99.58% HTML 0.42%
dwstruct-t01-dist

DataPackage.py

Supports Python 2.7, 3.3, 3.4 and 3.5.

A model for working with Data Packages.

Install

pip install datapackage

Examples

Reading a Data Package and its resource

import datapackage

dp = datapackage.DataPackage('http://data.okfn.org/data/core/gdp/datapackage.json')

# Keep only Brazil's rows, converting the CSV strings to numbers.
brazil_gdp = [{'Year': int(row['Year']), 'Value': float(row['Value'])}
              for row in dp.resources[0].data if row['Country Code'] == 'BRA']

max_gdp = max(brazil_gdp, key=lambda x: x['Value'])
min_gdp = min(brazil_gdp, key=lambda x: x['Value'])
# How many times larger the peak is than the minimum (a ratio, not a percentage).
growth_ratio = max_gdp['Value'] / min_gdp['Value']

msg = (
    'The highest Brazilian GDP occurred in {max_gdp_year}, when it peaked at US$ '
    '{max_gdp:1,.0f}. This was {growth_ratio:1,.2f} times its '
    'minimum GDP in {min_gdp_year}.'
).format(max_gdp_year=max_gdp['Year'],
         max_gdp=max_gdp['Value'],
         growth_ratio=growth_ratio,
         min_gdp_year=min_gdp['Year'])

print(msg)
# The highest Brazilian GDP occurred in 2011, when it peaked at US$ 2,615,189,973,181. This was 172.44 times its minimum GDP in 1960.
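The difference between "how many times larger" and "how many percent more" is easy to blur, so here is the same arithmetic on a few made-up rows. The values are invented for illustration and are not taken from the GDP dataset; the sketch needs neither network access nor the datapackage library.

```python
# Hypothetical rows shaped like the GDP resource; the numbers are invented.
rows = [
    {'Country Code': 'BRA', 'Year': '1960', 'Value': '100.0'},
    {'Country Code': 'BRA', 'Year': '1990', 'Value': '250.0'},
    {'Country Code': 'BRA', 'Year': '2011', 'Value': '400.0'},
    {'Country Code': 'ARG', 'Year': '2011', 'Value': '900.0'},
]

# CSV values arrive as strings, so convert them before comparing.
brazil_gdp = [{'Year': int(row['Year']), 'Value': float(row['Value'])}
              for row in rows if row['Country Code'] == 'BRA']

max_gdp = max(brazil_gdp, key=lambda x: x['Value'])
min_gdp = min(brazil_gdp, key=lambda x: x['Value'])

# A ratio says "how many times"; a percent increase subtracts the baseline first.
growth_ratio = max_gdp['Value'] / min_gdp['Value']
percent_increase = (growth_ratio - 1) * 100

print('{0}: {1:.2f}x, {2:.2f}% more than {3}'.format(
    max_gdp['Year'], growth_ratio, percent_increase, min_gdp['Year']))
# 2011: 4.00x, 300.00% more than 1960
```

A value that is 4 times its baseline is 300% more than it, not 400% more.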

Validating a Data Package

import datapackage

dp = datapackage.DataPackage('http://data.okfn.org/data/core/gdp/datapackage.json')
try:
    dp.validate()
except datapackage.exceptions.ValidationError as e:
    # Handle the ValidationError
    pass

Retrieving all validation errors from a Data Package

import datapackage

# This descriptor has two errors:
#   * It has no "name", which is required;
#   * Its resource has no "data", "path" or "url".
descriptor = {
    'resources': [
        {},
    ]
}

dp = datapackage.DataPackage(descriptor)

for error in dp.iter_errors():
    # Handle each error; here we just print it
    print(error)
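To contrast validate(), which raises on the first problem, with iter_errors(), which yields every problem, here is a simplified stand-alone checker. It only knows the two rules listed in the comment of the example above and is not the library's schema-based validation; iter_descriptor_errors, validate_descriptor and the local ValidationError class are hypothetical names used only for this sketch.

```python
def iter_descriptor_errors(descriptor):
    """Yield a message for each problem found (hypothetical, two rules only)."""
    if 'name' not in descriptor:
        yield 'descriptor has no "name"'
    for index, resource in enumerate(descriptor.get('resources', [])):
        if not any(key in resource for key in ('data', 'path', 'url')):
            yield 'resource {0} has no "data", "path" or "url"'.format(index)


class ValidationError(Exception):
    """Local stand-in for datapackage.exceptions.ValidationError."""


def validate_descriptor(descriptor):
    # Fail fast: raise on the first error found, in the style of validate().
    for error in iter_descriptor_errors(descriptor):
        raise ValidationError(error)


descriptor = {'resources': [{}]}

# Collect everything, in the style of iter_errors().
for error in iter_descriptor_errors(descriptor):
    print(error)
# descriptor has no "name"
# resource 0 has no "data", "path" or "url"

try:
    validate_descriptor(descriptor)
except ValidationError as e:
    print('invalid: {0}'.format(e))
# invalid: descriptor has no "name"
```

Using a generator for the checks lets the fail-fast variant reuse them and stop after the first message.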

Creating a Data Package

import datapackage

dp = datapackage.DataPackage()
dp.descriptor['name'] = 'my_sleep_duration'
dp.descriptor['resources'] = [
    {'name': 'data'}
]

resource = dp.resources[0]
resource.descriptor['data'] = [
    7, 8, 5, 6, 9, 7, 8
]

with open('datapackage.json', 'w') as f:
    f.write(dp.to_json())
# {"name": "my_sleep_duration", "resources": [{"data": [7, 8, 5, 6, 9, 7, 8], "name": "data"}]}
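The file written above is plain JSON. As a point of comparison, this stdlib-only sketch builds the same descriptor as a dict and serializes it with json.dumps. Passing sort_keys=True is a choice made here so the key order is stable; it is an assumption that to_json() orders keys the same way.

```python
import json

# The same descriptor the example builds through DataPackage, as a plain dict.
descriptor = {
    'name': 'my_sleep_duration',
    'resources': [
        {'name': 'data', 'data': [7, 8, 5, 6, 9, 7, 8]},
    ],
}

# sort_keys=True makes the key order deterministic.
text = json.dumps(descriptor, sort_keys=True)
print(text)
# {"name": "my_sleep_duration", "resources": [{"data": [7, 8, 5, 6, 9, 7, 8], "name": "data"}]}

# Reading the JSON back yields an equal dict.
assert json.loads(text) == descriptor
```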

Using a schema that's not in the local cache

import datapackage
import datapackage.registry

# This constant points to the official registry URL
# You can use any URL or path that points to a registry CSV
registry_url = datapackage.registry.Registry.DEFAULT_REGISTRY_URL
registry = datapackage.registry.Registry(registry_url)

descriptor = {}  # The datapackage.json file
schema = registry.get('tabular')  # Change to your schema ID

dp = datapackage.DataPackage(descriptor, schema)
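A registry is a CSV that maps schema IDs such as 'tabular' to schema locations. The sketch below shows only that lookup step with the csv module. The column names (id, title, schema) and the example.com URLs are assumptions for illustration, not the official registry's contents, and the real Registry class also fetches and parses the schema itself.

```python
import csv
import io

# Hypothetical registry contents; column names and URLs are assumptions.
REGISTRY_CSV = u"""id,title,schema
base,Data Package,https://example.com/schemas/data-package.json
tabular,Tabular Data Package,https://example.com/schemas/tabular-data-package.json
"""


def load_registry(csv_text):
    """Map each schema id to its registry row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row['id']: row for row in reader}


def schema_location(registry, schema_id):
    """Return where the schema for schema_id lives; raises KeyError if unknown."""
    return registry[schema_id]['schema']


registry = load_registry(REGISTRY_CSV)
print(schema_location(registry, 'tabular'))
# https://example.com/schemas/tabular-data-package.json
```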

Push/pull Data Package to storage

The package provides push_datapackage and pull_datapackage utilities to push a Data Package to storage and pull it back.

This functionality requires a jsontableschema storage plugin to be installed. See the plugins section of the jsontableschema docs for more information. Let's imagine we have installed the jsontableschema-mystorage plugin (not a real name).

All parameters should be passed as keyword arguments. Then we can push a Data Package to the storage and pull it back:

from datapackage import push_datapackage, pull_datapackage

# Placeholder: backend-specific options, as documented by the plugin
mystorage_options = {}

# Push
push_datapackage(
    descriptor='descriptor_path',
    backend='mystorage', **mystorage_options)

# Pull
pull_datapackage(
    descriptor='descriptor_path', name='datapackage_name',
    backend='mystorage', **mystorage_options)

The backend options depend on the plugin: for example, an SQLAlchemy engine, or a BigQuery project and dataset name. See the documentation of the specific plugin for details.

Concrete examples are in the plugins section of the jsontableschema docs.
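The backend argument selects a storage plugin by name, and the remaining keyword arguments are handed to that plugin. The sketch below imitates that dispatch with a hypothetical in-memory backend so it runs without any plugin installed. MemoryStorage, BACKENDS, push_rows and pull_rows are invented names; the real plugin interface is not reproduced here.

```python
class MemoryStorage(object):
    """Hypothetical backend that keeps tables in a dict supplied by the caller."""

    def __init__(self, tables):
        self.tables = tables

    def write(self, name, rows):
        self.tables[name] = list(rows)

    def read(self, name):
        return list(self.tables[name])


# Backend name -> class; a real setup would locate an installed plugin instead.
BACKENDS = {'memory': MemoryStorage}


def push_rows(name, rows, backend, **backend_options):
    # Build the storage from the backend-specific keyword options.
    storage = BACKENDS[backend](**backend_options)
    storage.write(name, rows)


def pull_rows(name, backend, **backend_options):
    storage = BACKENDS[backend](**backend_options)
    return storage.read(name)


shared = {}  # Plays the role of the external database.
push_rows('sleep', [7, 8, 5], backend='memory', tables=shared)
print(pull_rows('sleep', backend='memory', tables=shared))
# [7, 8, 5]
```

Keeping the options as free-form keyword arguments lets each backend accept whatever it needs (an engine, a project name) without the dispatch code knowing about it.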

Developer notes

These notes are intended to help people who want to contribute to this package itself. If you just want to use it, you can safely ignore them.

Updating the local schemas cache

We cache the schemas from https://github.com/dataprotocols/schemas using git-subtree. To update it, use:

git subtree pull --prefix datapackage/schemas https://github.com/dataprotocols/schemas.git master --squash

Contributors

akariv, alexchandel, brew, bryonjacob, femtotrader, gthb, hozn, jhamrick, luizarmesto, mihi-tr, ndkv, psychemedia, pwalsh, rlafuente, roll, rufuspollock, sirex, vied12, vitorbaptista
