datagovau's Introduction

datagovau

Project Status: Abandoned

This repository has been archived. The former README is now in README-NOT.md.

datagovau's People

Contributors

ellisp, hughparsonage, maelle


datagovau's Issues

Add kml support

Example of an attempt to fetch a KML file:

library(datagovau)
library(dplyr)

# search for datasets with "trees" in their name:
trees_md <- search_data("name:trees", limit = 1000)

# fails because get_data() doesn't yet handle kml:
brimbank <- trees_md %>%
  filter(name == "Brimbank Street Trees - Google kml") %>%
  get_data()
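One possible route for KML support (a sketch, not the package's plan): the sf package reads KML through GDAL's KML driver. The file path below is hypothetical.

```r
# Sketch: reading a downloaded KML with sf. Assumes the sf package is
# installed with a GDAL build that includes the KML driver; the file
# path is hypothetical.
library(sf)

brimbank <- st_read("brimbank_street_trees.kml")
# st_read() returns an sf data frame: KML placemark attributes become
# columns and the geometry is stored in a list-column
```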

characterise_data seems complicated...

... and draws on global variables, among other things. In combination with multi.sapply it looks more complicated than it needs to be, and could be refactored so that future maintainers find it easier to understand.

Currently this takes place in lines 23 to 51 of ./pkg/R/understanding_metadata.R

Cache metadata?

search_data is too slow for interactive use:

> system.time(search_data("name:fire"))
   user  system elapsed 
   0.06    0.00   12.43 

The performance could be improved by caching the metadata:

  • in each package release
  • after the first invocation in a session
  • if the user requests it

The pros and cons of the first are obvious: it would be more reliable and faster, but it won't return metadata created after the release. Frequent releases would limit that downside, and I think it's my preferred option.
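The second option (cache after the first invocation in a session) could look roughly like this; `.metadata_cache` and `fetch_all_metadata()` are hypothetical names, a sketch only:

```r
# Sketch of per-session caching. fetch_all_metadata() stands in for
# whatever slow call search_data() currently makes on every invocation.
.metadata_cache <- new.env(parent = emptyenv())

get_metadata <- function(refresh = FALSE) {
  if (refresh || !exists("metadata", envir = .metadata_cache)) {
    # the slow call happens at most once per session (or on refresh)
    .metadata_cache$metadata <- fetch_all_metadata()
  }
  .metadata_cache$metadata
}
```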

Dealing with APIs

Some of the data on data.gov.au is available by API. At the moment we don't support this, which seems unfair!
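If data.gov.au exposes the standard CKAN action API (it has historically been a CKAN site), a first step could be to call it directly; a sketch, with the endpoint assumed rather than confirmed:

```r
# Sketch: querying the CKAN package_search action directly.
# Assumes data.gov.au serves the standard CKAN action API.
library(httr)
library(jsonlite)

resp <- GET("https://data.gov.au/api/3/action/package_search",
            query = list(q = "trees", rows = 5))
stop_for_status(resp)
parsed <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))
datasets <- parsed$result$results  # one row per matching dataset
```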

Add GeoJSON support

For example:

library(datagovau)
library(dplyr)

trees_md <- search_data("name:trees", limit = 1000)

geelong <- trees_md %>%
  filter(name == "Geelong Trees GeoJSON") %>%
  get_data()
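As with KML, sf reads GeoJSON via GDAL, so support might be a thin wrapper; a sketch, with `geojson_url` a hypothetical name:

```r
# Sketch: downloading and reading a GeoJSON resource with sf.
# geojson_url is hypothetical; sf must be installed.
library(sf)

tf <- tempfile(fileext = ".geojson")
download.file(geojson_url, tf, mode = "wb")
geelong <- st_read(tf)  # returns an sf data frame with a geometry column
```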

downloads data but gets columns wrong

For example, the result mixes id and date into one column, and cause and location into another; the separate columns aren't recognised:

library(datagovau)
library(dplyr)

res <- search_data("name:fire", limit = 20)
res %>% filter(can_use == "yes") %>% slice(3) %>% get_data() %>% View()
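One way to diagnose this, assuming the underlying resource is a CSV at a hypothetical `csv_url`: readr records exactly where parsing went wrong:

```r
# Sketch: inspect readr's parsing diagnostics for a problem file.
# csv_url is hypothetical.
library(readr)

fire <- read_csv(csv_url)
problems(fire)  # rows and columns where parsing failed or guessed wrong
spec(fire)      # the column specification readr inferred
```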

'unexpected unzipping of files'

library(datagovau)
library(dplyr)
res <- search_data("name:water", limit = 20)
res %>% filter(can_use == "yes") %>% slice(2) %>% show_data

gives:

[1] "https://datagovau.s3.amazonaws.com/bioregionalassessments/NIC/MBC/DATA/RiskAndUncertainty/FiguresMBC_drawdown_time_series_figure/352a2f65-ddbf-4251-a401-c7070d2c9208.zip"
Working with .zip (shp) file... Evaluate returned object.
trying URL 'https://datagovau.s3.amazonaws.com/bioregionalassessments/NIC/MBC/DATA/RiskAndUncertainty/FiguresMBC_drawdown_time_series_figure/352a2f65-ddbf-4251-a401-c7070d2c9208.zip'
Content type 'binary/octet-stream' length 766946 bytes (748 KB)
downloaded 748 KB

 Error in show_data(.) : Unexpected unzipping of files. 
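Rather than erroring on unexpected archive contents, show_data could list what's in the zip and look for a shapefile; a sketch using base R's unzip(), with `zipfile` a hypothetical path:

```r
# Sketch: tolerant handling of a downloaded zip; zipfile is a
# hypothetical path to the archive.
contents <- unzip(zipfile, list = TRUE)  # data frame: Name, Length, Date
shp_files <- grep("\\.shp$", contents$Name, value = TRUE)
if (length(shp_files) == 0) {
  stop("No .shp found in archive. Contents: ",
       paste(contents$Name, collapse = ", "))
}
unzip(zipfile, exdir = tempdir())  # extract everything, then read the .shp
```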

document the columns returned by search_data

That is, what does each of these 30 columns mean:

> res <- search_data("name:water", limit = 20)
> names(res)
 [1] "cache_last_updated"        "package_id"                "webstore_last_updated"    
 [4] "id"                        "size"                      "state"                    
 [7] "last_modified"             "hash"                      "description"              
[10] "format"                    "mimetype_inner"            "url_type"                 
[13] "mimetype"                  "cache_url"                 "name"                     
[16] "created"                   "url"                       "webstore_url"             
[19] "position"                  "revision_id"               "resource_type"            
[22] "verified_date"             "verified"                  "resource_locator_protocol"
[25] "resource_locator_function" "Description"               "autoupdate"               
[28] "datastore_active"          "wms_layer"                 "can_use"     

Only can_use is created in R; the others all come from data.gov.au. We should document them (at least the most important ones) as a definition list in the help file for search_data.
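In roxygen2, that definition list could use \describe{} markup in the @return block; a sketch with illustrative (unverified) descriptions:

```r
#' @return A data frame of resource metadata, one row per resource. All
#'   columns except \code{can_use} come straight from data.gov.au.
#'   (Descriptions below are illustrative placeholders.)
#' \describe{
#'   \item{id}{Unique identifier of the resource.}
#'   \item{format}{File format reported for the resource, e.g. "CSV".}
#'   \item{last_modified}{When the resource was last changed.}
#'   \item{can_use}{Computed in R: whether get_data() can handle it.}
#' }
```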

Add json support

An example attempt to get a JSON file, which returns "sorry, can't work with this file yet":

library(datagovau)
library(dplyr)

# search for datasets with "trees" in their name:
trees_md <- search_data("name:trees")

# what datasets do we have?:
trees_md[ , "name"]

# get one:
wyndham <- trees_md %>% 
  filter(name == 'Wyndham Trees and latest inspection') %>%
  slice(1) %>%
  get_data()
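jsonlite could cover the basic case; a sketch, with `json_url` a hypothetical name:

```r
# Sketch: minimal JSON support via jsonlite; json_url is hypothetical.
library(jsonlite)

wyndham <- fromJSON(json_url, flatten = TRUE)
str(wyndham, max.level = 1)  # many open-data JSON files parse to a data frame
```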

WMS (web map service) support

Some of the data is available in WMS format, usually to give people a preview in the browser. Do we want to support this in R - or is it sort of missing the point?
