Data Package in R

Data Package is a standard format for describing the metadata of a collection of datasets. The datapkg package provides convenience functions for retrieving and parsing data packages in R. To install in R:

library(devtools)
install_github("hadley/readr")
install_github("ropenscilabs/jsonvalidate")
install_github("ropenscilabs/datapkg")

Reading data

The datapkg_read function retrieves and parses data packages from local or remote sources. A few example packages are available from the datasets and testsuite-py repositories. The path must point to the root of the data package, given as a directory on disk, a git remote, or a URL.

# Load client
library(datapkg)

# Clone via git
cities <- datapkg_read("git://github.com/datasets/world-cities")

# Same data but download over http
cities <- datapkg_read("https://raw.githubusercontent.com/datasets/world-cities/master")

The output object contains both the data and the metadata from the data package; the actual datasets are stored in the $data field.

# Package info
print(cities)

# Open actual data in RStudio Viewer
View(cities$data[[1]])

In the case of multiple datasets, each one is either referenced by index or, if available, by name (names are optional in data packages).

# Package with many datasets
euribor <- datapkg_read("https://raw.githubusercontent.com/datasets/euribor/master")

# List datasets in this package
names(euribor$data)
View(euribor$data[[1]])
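
The difference between access by index and access by name can be tried without downloading anything. The following is a minimal sketch in base R that uses a plain list of data frames as a stand-in for the $data field. The name pkg_data and the "dataset_" labels are hypothetical, and how datapkg itself represents an unnamed dataset (empty string, NA, or no names at all) is an assumption here, so the sketch handles all three cases.

```r
# Stand-in for the `$data` field: a list of data frames where only
# the first dataset has a name (hypothetical content, not datapkg output).
pkg_data <- list(mtcars = head(mtcars), head(iris))

# Access by position always works; access by name only when a name exists.
first   <- pkg_data[[1]]
by_name <- pkg_data[["mtcars"]]

# Build a label for every dataset, falling back to a positional label
# when the name is missing, empty, or when the list has no names at all.
labels <- names(pkg_data)
if (is.null(labels)) labels <- rep("", length(pkg_data))
unnamed <- is.na(labels) | labels == ""
labels[unnamed] <- paste0("dataset_", which(unnamed))

print(labels)
```

Positional labels keep code that loops over datasets working even when a package omits the optional names.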

Writing data

The package also has basic functionality to save a data frame into a data package and update the datapackage.json file accordingly.

# Create new data package
pkgdir <- tempfile()
datapkg_write(mtcars, path = pkgdir)
datapkg_write(iris, path = pkgdir)

# Read it back
mypkg <- datapkg_read(pkgdir)
print(mypkg$data$mtcars)

From here you can modify the datapackage.json file with other metadata.
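
One way to edit the descriptor is with the jsonlite package. The sketch below is an illustration under assumptions, not a datapkg feature: it creates a minimal stand-in datapackage.json by hand so that it runs without datapkg installed, and the title and description values are placeholders. With a real package you would point json_path at the file written by datapkg_write instead.

```r
library(jsonlite)

# Stand-in package directory with a minimal descriptor (hypothetical content).
pkgdir <- tempfile()
dir.create(pkgdir)
json_path <- file.path(pkgdir, "datapackage.json")
write_json(list(name = "example", resources = list()), json_path,
           auto_unbox = TRUE)

# Read the descriptor as a nested list; simplifyVector = FALSE keeps
# arrays as lists so the structure survives the round trip.
meta <- fromJSON(json_path, simplifyVector = FALSE)

# Add or change metadata fields.
meta$title <- "Example data package"
meta$description <- "Placeholder description added after writing the data."

# Write the descriptor back; auto_unbox keeps scalars as JSON scalars.
write_json(meta, json_path, auto_unbox = TRUE, pretty = TRUE)

# Re-read to confirm the change was saved.
updated <- fromJSON(json_path, simplifyVector = FALSE)
print(updated$title)
```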

Status

This package is a work in progress. Current open issues:

  • Make readr parse 0/1 values for booleans: PR#406
  • Support "year only" dates (%Y). It is not clear whether this actually constitutes a valid date: PR#407
  • R and readr require the caller to specify which strings are interpreted as missing values. The defaults are the empty string "" and NA. A similar property needs to be defined in the spec.
  • It is unclear what to do with parsing errors, or when the fields in datapackage.json do not match the CSV data. Examples: s-and-p-500 and currency-codes
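
The missing-value and 0/1 boolean issues above can be illustrated with a small, dependency-free sketch. It uses base R's read.csv rather than readr, so it shows the general idea and not how datapkg or readr behave internally; the CSV content and column names are made up for the example.

```r
# Made-up CSV: `active` uses 0/1 for booleans, and missing values
# appear both as an empty field and as the string NA.
csv_text <- "id,active,score\n1,1,3.5\n2,0,\n3,NA,4.0\n"

# State explicitly which strings count as missing values.
df <- read.csv(text = csv_text, na.strings = c("", "NA"),
               stringsAsFactors = FALSE)

# The 0/1 column is read as integer; convert it to logical by hand.
df$active <- as.logical(df$active)

str(df)
```

Passing the missing-value strings explicitly makes the parsing rule visible in the code, which is the kind of information the bullet above suggests the spec should carry.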

Features:

  • Writing data packages from data frames.

rOpenSci OKFN

Contributors

jeroen, tle
