The ddhconnect from tonyfujs

Convert all machine names to UI names

Might it be helpful for users outside the curation team if we exposed only UI names rather than machine names?

download_resource() does not use the root_url parameter

download_resource() uses the URL from dkanr_setup() and not the URL passed as the root_url parameter. This is because it calls get_resource_url() and fix_download_url(), neither of which takes the URL as a parameter in the CRAN version.

write unit tests for search_catalog()

re-factor search_catalog() using dkanr core functions

Add `dkanr` as package dependency

get_resource_nid() not working

get_resource_nids() exists in dkanr. Delete existing get_resource_nid() from ddhconnect, and export the dkanr function to ddhconnect

Mapping using partial search sometimes maps to the wrong tid in map_tids()

Use of grep() to ignore formatting differences also introduces incorrect mappings due to partial matches. Ex: 'Albania' maps to the tid of 'Albanian' in map_tids().
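One way to keep the tolerance for formatting differences without partial matches is to normalize both sides and then compare for equality instead of calling grep(). A minimal sketch, assuming a lookup data frame with hypothetical columns list_value_name and tid (the real output of get_lovs() may be shaped differently, and the tids below are made up):

```r
# Hypothetical lookup table; real values would come from get_lovs()
lovs <- data.frame(
  list_value_name = c("Albania", "Albanian", "Public"),
  tid = c("101", "202", "303"),
  stringsAsFactors = FALSE
)

# Normalize case and surrounding whitespace so formatting differences are ignored
normalize <- function(x) tolower(trimws(x))

# Exact match on the normalized strings; returns NA when nothing matches
match_tid <- function(value, lookup) {
  idx <- match(normalize(value), normalize(lookup$list_value_name))
  lookup$tid[idx]
}

match_tid(" albania ", lovs)  # matches "Albania" only, never "Albanian"
```

Because match() compares whole strings, "Albania" can no longer pick up the tid of "Albanian", and an unknown value yields NA rather than a wrong tid.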

Make default workflow_status published

Add parameter in create_json_body() to make the default status as "published".

Order function parameters

Order function parameters so that credentials and root url are always at a consistent position.

add datastore API functionalities

Add test json data

Add ready to use json data to use in the request's body

Re-factor search_ddh() using dkanr core functions

Improve create_json_body

Create helper function that maps metadata from / to machine_names space to pretty_names space
Create a helper function that takes 2 columns from an Excel sheet, and creates the list programmatically (Clara)

Parse error from get_fields()

from Alp

id = 118383
updated_data = c(
"title" = "Togo - Firm Surveys for Comparing Personal Initiative Training to Traditional Business Training 2013-2016 TEST",
"workflow_status" = "published"
)
updated_data_json = create_json_body(values = updated_data, node_type = "dataset")
update_dataset(nid = id, body = updated_data_json)

error

Error: parse error: trailing garbage
                                      3{     "microdata": {         "d
                     (right here) ------^

happens in:

ddhconnect/R/get_fields.R

Line 23 in fc3ee71

out <- httr::content(out)

Re-factor get_metadata() using core dkanr functions

Write unit tests for get_metadata() functions

(Check the ckanr package for one approach of writing tests for APIs)

Add some logic to differentiate dataset / resource functions

create_dataset() vs create_resource()
update_dataset() vs update_resource()
Code is currently the same.

Add internal data mapping machine names to form names

Users adding data need a place to look up which machine names correspond to which DDH form names. We also need an explanation of the machine names that can be used in search_catalog() (such as nid).

Add add_file_to_resource() function

Link to DKAN documentation

Add authentication to function calling internal/xxxx endpoints

With the move to the cloud, these functions will break without authentication

Add function to download resources

New function for downloadable resources which will save files. Otherwise, users will have to locate the
URL string in field_link_api$und$url and declare their own file name. Also, might be good to print the citation (since we want to encourage use).

example
current

library(RCurl)
resource_metadata = get_metadata(nid = 94974)
download.file(url = resource_metadata$field_link_api$und[[1]]$url, destfile = "WDI.zip")

suggested

# extract the existing file name? 
resource_download(resource_nid)
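The "extract the existing file name" step could be sketched as follows. This is only an illustration: file_name_from_url() and download_resource_sketch() are hypothetical names, the example URL is made up, and the URL is assumed to have already been read from the resource metadata.

```r
# Derive a local file name from a resource URL (query string and fragment removed)
file_name_from_url <- function(url) {
  clean <- sub("[?#].*$", "", url)
  basename(clean)
}

# Hypothetical wrapper; `url` would come from the resource metadata
download_resource_sketch <- function(url, dest_dir = ".") {
  destfile <- file.path(dest_dir, file_name_from_url(url))
  # mode = "wb" avoids corrupting binary files such as zip archives on Windows
  utils::download.file(url = url, destfile = destfile, mode = "wb")
  invisible(destfile)
}

file_name_from_url("https://example.org/files/WDI_csv.zip?download=1")
```

Returning the destination path invisibly lets callers chain the result into unzip() or read functions without printing it at the console.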

Add file path as a parameter to download_resource()

Write unit tests for search_ddh() function

review get_fields()

Currently: Doesn't take credentials argument. Should be updated.

Add function to filter resource search results

Is there a machine name that indicates whether the resource is actually the data? It might help to have a function that filters the resource results by download/query tool to make the actual data more obvious. Identifying the data file/query tool/link is not clear when there are several resources (ex: WDI has ~70).

example
current

indicators_resources = get_resource_nid(nid = wdi_nid)
# need to locate the tid for dataset resource_type
tid_download = 986

for (resource_id in indicators_resources){
  resource_metadata = get_metadata(nid = resource_id)
  if (resource_metadata$field_wbddh_resource_type$und$tid == tid_download){
    print(resource_metadata$title)
  }
}

library(RCurl)
resource_metadata = get_metadata(nid = 94974)
download.file(url = resource_metadata$field_link_api$und$url, destfile = "WDI")

suggested

indicators_resources = get_resource_nid(nid = wdi_nid)
get_data(indicators_resources, access_type = c("download", "query tool"))

Add roxygen comments for get_datasets_count

Does this function need to stand independently? If so, it needs clarification for users about what can be passed in the datatype parameter (i.e. single strings or a combination of filters, etc.).

Error: Result 25524 is not a length 1 atomic vector

ddhconnect/R/get_datasets_list.R

Line 36 in fc3ee71

field_wbddh_data_type <- purrr::map_chr(out,
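purrr::map_chr() raises this error when an element is NULL or does not have length 1, so one plausible cause (not confirmed here) is a search result where the field is missing. A defensive extractor could look like the sketch below, written in base R with a hypothetical result structure:

```r
# Return one string per result, NA when the field is missing or empty
extract_chr <- function(results, field) {
  vapply(results, function(x) {
    value <- x[[field]]
    if (is.null(value) || length(value) == 0) {
      return(NA_character_)
    }
    as.character(value[[1]])
  }, character(1))
}

# Made-up results: the second one lacks the field entirely
results <- list(
  list(nid = "1", field_wbddh_data_type = "295"),
  list(nid = "2")
)
extract_chr(results, "field_wbddh_data_type")
```

The NA entries can then be reported or filtered explicitly instead of aborting the whole listing.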

Update resource format lookup table

The resource_json_format_lookup is missing some machine_names and the corresponding json_template values. Need to update the lookup to match the current form fields and values.

missing

format
geospatial api formats
harvest source
harvest source id
workflow status?
remove_file?

Add a build_json() function

Function that takes a list of named arguments, check whether the values are valid, and build a valid JSON string.
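A rough sketch of what such a function might look like, assuming jsonlite is available and that validity simply means "the field name is in a known set". The name build_json_sketch() and the allowed argument are hypothetical, and real validation would likely also check values against get_lovs():

```r
library(jsonlite)

# `allowed` is a hypothetical whitelist of valid machine names
build_json_sketch <- function(..., allowed) {
  args <- list(...)
  if (length(args) == 0 || is.null(names(args)) || any(names(args) == "")) {
    stop("All arguments must be named", call. = FALSE)
  }
  unknown <- setdiff(names(args), allowed)
  if (length(unknown) > 0) {
    stop("Unknown field(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }
  # auto_unbox = TRUE writes length-1 vectors as scalars rather than arrays
  jsonlite::toJSON(args, auto_unbox = TRUE)
}

build_json_sketch(title = "Test", workflow_status = "published",
                  allowed = c("title", "workflow_status"))
```

Failing early on unknown names gives the user a readable message before any request is sent to the server.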

review get_lovs()

Doesn't take credentials argument. To implement.

get_datasets_count

Badly conceived. Should rely on the lower-level search_ddh()

Remove default node type in create_json_body()

The default value for the node_type parameter is 'dataset'. If the node type is 'resource' and you update the node without passing a value for node_type, the node type is changed to 'dataset'.
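A sketch of how the parameter could be made mandatory and validated. The function below is a simplified stand-in, not the real create_json_body(): it returns a plain list rather than JSON, and its name is hypothetical.

```r
create_json_body_sketch <- function(values, node_type) {
  # No default: the caller has to state the node type explicitly
  if (missing(node_type)) {
    stop("`node_type` must be supplied: \"dataset\" or \"resource\"", call. = FALSE)
  }
  # match.arg() rejects anything other than the listed choices
  node_type <- match.arg(node_type, choices = c("dataset", "resource"))
  c(list(type = node_type), as.list(values))
}

body <- create_json_body_sketch(c(title = "Example"), node_type = "resource")
body$type
```

With no default, updating a resource can no longer silently turn it into a dataset; the call fails instead.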

fix search_catalog to account for 0 results

include if-statement which accounts for 0 returned results along with corresponding error message
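The guard could be as small as the sketch below. The shape of the parsed response (a list with a result element) and the function name are assumptions for illustration only.

```r
# `response` stands in for the parsed search result; its shape is an assumption
check_search_results <- function(response) {
  results <- response$result
  if (is.null(results) || length(results) == 0) {
    stop("search_catalog(): the query returned 0 results; check the filters.",
         call. = FALSE)
  }
  results
}

check_search_results(list(result = list(list(nid = "1"))))
```

Whether zero results should be an error, a warning, or an empty return value is a design choice; an empty data frame may be friendlier in scripts that loop over many queries.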

Add function or internal data for field names that take lovs

Need to add information about which fields take lovs and which do not. Cumbersome for user to locate all lov machine names in "get_lovs()"

Update DESCRIPTION with required packages

add readxl to imports in DESCRIPTION because it's used in map_metadata_excel()

ddhconnect/R/map_metadata_excel.R

Line 15 in 71ec88f

metadata_df <- read_xlsx(path)

Add wrapper function to attach file

dkanr::attach_file_to_node()

Update test json

current test files are not updated with latest json format
might need to consider adding a check?

Expose string values to the user instead of tid values for get_metadata() and search_catalog()

The creation of datasets and resources can still use tid values.

Update UI Names look ups

Hey @alpaziz and @seladore, I remember that at a team meeting you mentioned this UI lookup table. I found some fields that are out of date. I'm not sure whether you've had a chance to look at it. I'm adding some this week.

map_tids returns wrong tids

The map_tids() function breaks for some values. For instance:

metadata = c("field_topic" = "Poverty", "field_wbddh_data_class" = "Public")
map_tids(metadata)
           field_topic field_wbddh_data_class
                 "376"                  "378"

field_wbddh_data_class should have value "358"

It may be safer to build this function using inner_join()

Create a dataframe from the input vector
Do an inner_join with the output of get_lovs()
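The two steps above can be sketched as follows. To stay dependency-free, this uses base R's merge(), which performs an inner join by default, in place of dplyr's inner_join(). The lookup table is a hypothetical stand-in for get_lovs(): its column names are assumptions, and the "Official use only" row with tid "359" is invented for illustration.

```r
# Hypothetical stand-in for get_lovs(); column names are assumptions
lovs <- data.frame(
  machine_name = c("field_topic", "field_wbddh_data_class", "field_wbddh_data_class"),
  list_value_name = c("Poverty", "Public", "Official use only"),
  tid = c("376", "358", "359"),
  stringsAsFactors = FALSE
)

map_tids_join <- function(metadata, lookup) {
  # Step 1: build a data frame from the named input vector
  input <- data.frame(
    machine_name = names(metadata),
    list_value_name = unname(metadata),
    stringsAsFactors = FALSE
  )
  # Step 2: merge() with the default all = FALSE is an inner join
  joined <- merge(input, lookup, by = c("machine_name", "list_value_name"))
  unmatched <- setdiff(input$machine_name, joined$machine_name)
  if (length(unmatched) > 0) {
    warning("No tid found for: ", paste(unmatched, collapse = ", "))
  }
  stats::setNames(joined$tid, joined$machine_name)
}

metadata <- c("field_topic" = "Poverty", "field_wbddh_data_class" = "Public")
map_tids_join(metadata, lovs)
```

Joining on both the field name and the value means a value can only match within its own field, which rules out the cross-field mix-up described above.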

add get_resource_list()

Improve output format for `search_catalog()`

Currently, the output is formatted as list of lists, including the "und" formatting. We might want to consider the output being a cleaner, more standardized format especially since multiple results can be returned at once.

example
current

[[1]]$field_contact_email
[[1]]$field_contact_email$und
[[1]]$field_contact_email$und[[1]]
[[1]]$field_contact_email$und[[1]]$value
[1] "[email protected]"

[[1]]$field_contact_email$und[[1]]$format
NULL

recommended
named list format or dataframe?

$star
[1] "wars"
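As a rough illustration of the named-list option, the "und" wrapper could be unwrapped with a small helper. The function names and the sample node below are hypothetical, and only the first value of each field is kept, which would lose information for multi-valued fields.

```r
# Pull the first `value` out of a field$und[[1]]$value wrapper, if present
flatten_field <- function(field) {
  if (is.list(field) && length(field$und) > 0) {
    first <- field$und[[1]]
    if (is.list(first) && !is.null(first$value)) {
      return(first$value)
    }
    return(first)
  }
  field
}

# Apply the unwrapping to every field of a node
flatten_node <- function(node) lapply(node, flatten_field)

# Made-up node mimicking the nested structure shown above
node <- list(
  title = "Example dataset",
  field_contact_email = list(
    und = list(list(value = "user@example.org", format = NULL))
  )
)
flat <- flatten_node(node)
str(flat)
```

A list of such flattened nodes is also straightforward to bind into a data frame if that format turns out to be preferable.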

Display more informative errors/warnings

create_dataset() and create_resource() both do not display informative error/warnings when a field is populated incorrectly.

example
current

Error: Client error: (406) Not Acceptable - error: An illegal choice has been detected. Please contact the site administrator.

suggested

{"form_errors":{"field_topic][und":"An illegal choice has been detected. Please contact the site administrator.","field_wbddh_data_type][und":"An illegal choice has been detected. Please contact the site administrator."}}

Add caching for get_lovs()

Cache it as a data file after the first call, with an option to update. Can also do this for get_fields(), get_required_fields() and get_lov_fields()?
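A minimal sketch of the caching idea, using an in-session environment rather than a data file to keep the example short (saveRDS()/readRDS() could persist the same object between sessions). The fetch argument is a placeholder for whatever function performs the real API call, and all names here are hypothetical.

```r
# Package-level cache; an environment keeps state between calls
.lov_cache <- new.env(parent = emptyenv())

# `fetch` is a placeholder for the function doing the real API call
get_lovs_cached <- function(fetch, refresh = FALSE) {
  if (refresh || !exists("lovs", envir = .lov_cache, inherits = FALSE)) {
    assign("lovs", fetch(), envir = .lov_cache)
  }
  get("lovs", envir = .lov_cache, inherits = FALSE)
}

# Fake fetcher that counts how often the "API" is hit
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(tid = "376", stringsAsFactors = FALSE)
}

get_lovs_cached(fake_fetch)                  # first call fetches
get_lovs_cached(fake_fetch)                  # second call is served from the cache
get_lovs_cached(fake_fetch, refresh = TRUE)  # forced update fetches again
```

The same pattern would apply to get_fields(), get_required_fields() and get_lov_fields() by storing each result under its own key.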

Remove credentials parameter where it is not used

Examples of unused credentials parameter so far: get_metadata(), get_resource_nid(), search_catalog(), search_ddh(), get_datasets_list().

New function mapping strings to tids

Using search_catalog() with lov field names has a confusing workflow. Currently, the user must map their search string to the corresponding tid with get_lovs(). It might be simpler for the user if we automated looking up the string and returning the tid.

example:
current input
filters = c("field_wbddh_data_type" = "295")
suggested input
filters = c("field_wbddh_data_type" = "geospatial")
