colearendt / tidyjson
Tidy your JSON data in R with tidyjson
License: Other
Currently getting this from R CMD check:
checking R code for possible problems ... NOTE
json_structure: no visible binding for global variable ‘level’
json_structure_arrays: no visible binding for global variable ‘type’
json_structure_arrays: no visible binding for global variable
‘document.id’
json_structure_arrays: no visible binding for global variable
‘child.id’
json_structure_arrays: no visible binding for global variable ‘level’
json_structure_arrays: no visible binding for global variable
‘parent.id’
... 9 lines ...
‘parent.id’
json_structure_objects: no visible binding for global variable ‘index’
json_structure_objects: no visible binding for global variable ‘key’
read_json: no visible global function definition for ‘tail’
should_json_structure_expand_more: no visible binding for global
variable ‘level’
Undefined global functions or variables:
child.id document.id index key level parent.id tail type
Consider adding
importFrom("utils", "tail")
to your NAMESPACE file.
I believe this can be solved by avoiding non-standard evaluation and using the `_` versions of the dplyr functions instead.
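Another conventional fix for these NOTEs (a sketch, assuming the variable list above is complete) is to register the non-standard-evaluation names with utils::globalVariables, typically in a zzz.R file, instead of rewriting every verb:

```r
# Register the NSE variable names that R CMD check flags, so the
# "no visible binding for global variable" NOTEs go away. This is an
# alternative to switching to the `_` versions of the dplyr verbs.
registered <- utils::globalVariables(
  c("child.id", "document.id", "index", "key",
    "level", "parent.id", "type"),
  package = "tidyjson"
)
# The `tail` NOTE is separate: it is fixed by adding
# importFrom("utils", "tail") to the NAMESPACE file, as the check suggests.
```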
Currently it leaves the JSON as-is:
'{"a": 1, "b": [1, 2, 3]}' %>% spread_all
#> # A tbl_json: 1 x 2 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id a
#> <chr> <int> <dbl>
#> 1 {"a":1,"b":[1,2... 1 1
Perhaps instead it should strip these away:
'{"a": 1, "b": [1, 2, 3]}' %>% spread_all
#> # A tbl_json: 1 x 2 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id a
#> <chr> <int> <dbl>
#> 1 {"b":[1,2,3]} 1 1
This makes sense since they are already captured in the tbl_json object, and it will make it easier to see that the next steps should be enter_object and then gather_array.
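The proposed behavior can be sketched at the parsed-list level with a hypothetical helper (not part of tidyjson): drop the scalar keys that spread_all has already consumed, keeping only the objects and arrays still worth exploring.

```r
# Hypothetical sketch: remove the scalar keys spread_all would have
# consumed, so only unexplored objects and arrays remain in the JSON.
drop_spread_keys <- function(x) {
  stopifnot(is.list(x))
  is_scalar <- vapply(x, function(el) is.atomic(el) && length(el) == 1,
                      logical(1))
  x[!is_scalar]
}

parsed <- list(a = 1, b = list(1, 2, 3))  # parsed '{"a": 1, "b": [1, 2, 3]}'
drop_spread_keys(parsed)
# leaves only `b`, so the remaining JSON would print as {"b":[1,2,3]}
```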
Can wrap around toJSON, but with appropriate arguments.
This fails, and there is no left_join_ method:
new <- '[1, 2, 3]' %>% gather_array("num") %>%
left_join(data_frame(num = 1:3, letters = letters[1:3]), by = "num")
expect_is(new, "tbl_json")
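Until a left_join.tbl_json method exists, here is a sketch of what one would need to do, using a hand-rolled match()-based join so x's row order, and hence the "JSON" attribute alignment, is preserved. The tbl_json structure is simulated with a plain data frame; none of this is current tidyjson code.

```r
# Sketch: a left join that keeps the tbl_json class and its "JSON"
# attribute. match() preserves x's row order, so the JSON attribute
# stays aligned with the rows.
left_join_tbl_json <- function(x, y, by) {
  json <- attr(x, "JSON")
  extra <- y[match(x[[by]], y[[by]]), setdiff(names(y), by), drop = FALSE]
  out <- cbind(as.data.frame(x), extra)
  attr(out, "JSON") <- json
  class(out) <- c("tbl_json", "data.frame")
  out
}

# Simulated tbl_json: three array elements, each with its JSON fragment
x <- data.frame(document.id = 1L, num = 1:3)
attr(x, "JSON") <- list("1", "2", "3")
class(x) <- c("tbl_json", "data.frame")

y <- data.frame(num = 1:3, letter = letters[1:3], stringsAsFactors = FALSE)
new <- left_join_tbl_json(x, y, by = "num")
```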
"[[1, 2], [1, 2]]" %>% gather_array %>% gather_array
#> Error: found duplicated column name: array.index
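One way the default column name could avoid this collision (a hypothetical helper, not current tidyjson behavior) is to suffix the default until it is unique:

```r
# Hypothetical: pick a unique default column name so that a second
# gather_array doesn't collide with the first's "array.index".
unique_column_name <- function(existing, default = "array.index") {
  name <- default
  i <- 1L
  while (name %in% existing) {
    i <- i + 1L
    name <- paste0(default, ".", i)
  }
  name
}

unique_column_name(c("document.id", "array.index"))
# "array.index.2"
```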
Last time I submitted to CRAN they complained about these; import the functions from magrittr instead.
Use the companies
dataset and visualize the structure of the JSON documents.
Counts the total number of nodes in the JSON.
Perhaps also unify their code.
Every row should correspond to a "node" in the JSON document with a unique ID, and should identify the most recent key used to access the node, its type, its length, and its parent.
Is it possible to use fromJSON to turn the JSON into lists one layer at a time, so that the JSON remains a string as you slowly unwind it?
This may be much slower, but would lead to a more natural implementation where the JSON remains a column of the data frame formatted as a character string and more easily printed.
Filter works:
companies[1:5] %>% as.tbl_json %>% filter(document.id == 1) %>% attr("JSON") %>% length
#> [1] 1
but slice does not:
companies[1:5] %>% as.tbl_json %>% slice(1) %>% attr("JSON") %>% length
#> [1] 5
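A sketch of what a slice method would need to do: subset the "JSON" attribute in step with the rows (again using a simulated tbl_json, not the real class).

```r
# Sketch: slice rows and the "JSON" attribute together so they stay in sync.
slice_tbl_json <- function(x, i) {
  json <- attr(x, "JSON")
  out <- as.data.frame(x)[i, , drop = FALSE]
  attr(out, "JSON") <- json[i]
  class(out) <- c("tbl_json", "data.frame")
  out
}

x <- data.frame(document.id = 1:5)
attr(x, "JSON") <- as.list(paste0("{\"doc\":", 1:5, "}"))
class(x) <- c("tbl_json", "data.frame")

length(attr(slice_tbl_json(x, 1), "JSON"))
# 1, matching the single remaining row
```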
Causes this not to work:
'{"x": 1}' %>% spread_values(x = jstring("x"))
#> Error in UseMethod("as.tbl_json") :
#> no applicable method for 'as.tbl_json' applied to an object of class "function"
Yet this works:
'{"x": 1}' %>% spread_values(y = jstring("x"))
#> document.id y
#> 1 1 1
Root level should be 0
Deprecate gather_keys with the .Deprecated function; see the backwards compatibility section in http://r-pkgs.had.co.nz/release.html#undefined
Should not throw a warning:
json %>% gather_array %>% gather_array
Should throw a warning:
json %>% gather_array("special") %>% gather_array("special")
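The distinction could hinge on whether the user supplied the name explicitly. A hypothetical check (default names would instead go through silent de-duplication):

```r
# Hypothetical: warn only when an explicitly supplied column name collides
# with an existing column; auto-generated defaults stay silent.
check_column_name <- function(existing, name, explicit) {
  if (explicit && name %in% existing)
    warning("column '", name, "' already exists and will be replaced")
  invisible(name)
}
```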
Should use json_structure and create an igraph object. Initial version of code is in visualization vignette.
Should do the following:
Should be a subset of the logic in json_types, and work in places like gather_factory.
'{}' %>% json_structure
#> Error: wrong result size (2), expected 0 or 1
Should cover package internals as well as purrr vignette.
The following works:
'{"key": "value"}' %>% spread_values(key = jstring("key"))
#> document.id key
#> 1 1 value
but this does not:
'{"key": "value"}' %>% spread_values(key = jstring(key))
#> Error in as_function(.f, ...) : object 'key' not found
e.g.,
json %>%
at_depth(2, `%||%`, NA) %>%
map_df(. %$% tibble(name, email_address, number_of_employees, founded_year))
This should throw an error:
'{"key": "1"}' %>% spread_values(int = jnumber("key"))
#> document.id int
#> 1 1 1
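The stricter behavior could be a type check in the extractor. This is only a sketch of the idea, not the actual jnumber implementation:

```r
# Sketch: a jnumber-style extractor that refuses non-numeric JSON values
# instead of silently coercing strings like "1".
extract_number <- function(value) {
  if (is.null(value)) return(NA_real_)
  if (!is.numeric(value))
    stop("value is of type ", class(value)[1], ", not numeric")
  as.numeric(value)
}

extract_number(1)      # returns 1
# extract_number("1")  would throw an error rather than return 1
```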
Should work like:
'{"a": 1, "b": "x", "c": true}' %>% spread_all_values
sep argument used to separate key names when objects are nested.
Maybe it makes more sense for the tbl_json object to be a tidyjson object, like tbl_df has moved to tibble?
issues %>% gather_array %>% spread_all(recursive = FALSE)
#> Error in `[.data.frame`(z, , final_columns, drop = FALSE) :
#> undefined columns selected
Ideally, would:
- spread_values(key1) instead of spread_values("key1")
- spread_values would determine type automatically (converting NULLs to NAs of the appropriate type)
- spread_values(key1, key2) could be two top-level keys, or key2 nested under key1
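The automatic typing could work like this sketch: infer the column type from the non-NULL values and coerce NULLs to a matching NA (a hypothetical helper, not tidyjson code).

```r
# Sketch: infer a column's type from its non-NULL values and convert
# NULLs to an NA of the same type before collapsing to a vector.
collapse_values <- function(values) {
  nulls <- vapply(values, is.null, logical(1))
  if (all(nulls)) return(rep(NA, length(values)))
  proto <- values[!nulls][[1]]
  na <- if (is.character(proto)) NA_character_
        else if (is.numeric(proto)) NA_real_
        else NA
  values[nulls] <- list(na)
  unlist(values)
}

collapse_values(list("a", NULL, "b"))
# c("a", NA, "b")
```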
Should:
- spread_values versus spread_all
- is_json_X functions to check inputs
E.g., from purrr https://github.com/hadley/purrr/blob/master/R/along.R
#' These functions take the idea of \code{\link{seq_along}} and generalise
#' it to creating lists (\code{list_along}) and repeating values
#' (\code{rep_along}).
Set up a new travis CI integration
This will be very confusing to users:
> companies[1:5] %>% gather_keys %>% filter(is_json_object(.)) %>% gather_keys("key2")
#> # A tbl_json: 15 x 3 tibble with a "JSON" attribute
#> `attr(., "JSON")` document.id key key2
#> <chr> <int> <chr> <chr>
#> 1 "52cdef7e4bab8b... 1 _id $oid
#> 2 [[[150,22],"ass... 1 image available_sizes
#> 3 null 1 image attribution
#> 4 "52cdef7f4bab8b... 2 _id $oid
#> 5 [[[150,38],"ass... 2 image available_sizes
#> 6 null 2 image attribution
#> 7 "52cdef7d4bab8b... 3 _id $oid
#> 8 [[[150,36],"ass... 3 image available_sizes
#> 9 null 3 image attribution
#> 10 "52cdef7d4bab8b... 4 _id $oid
#> 11 ... 4 image available_sizes
#> 12 ... 4 image attribution
#> 13 ... 5 _id $oid
#> 14 ... 5 image available_sizes
#> 15 ... 5 image attribution
Nested arrays are difficult to work with. For example,
x <- '[[1, 2], 1]' %>% gather_array %>% json_types
x
#> document.id array.index type
#> 1 1 1 array
#> 2 1 2 number
At this point, there is no way to gather the next array unless we filter on type == 'array'.
x %>% gather_array("level2")
#> Error in gather_array(., "level2") : 1 records are not arrays
x %>% filter(type == "array") %>% gather_array("level2")
#> document.id array.index type level2
#> 1 1 1 array 1
#> 2 1 1 array 2
append_values_number works, but returns NA for the array, and recursive = TRUE doesn't work through the second-level array. Further, it could be that the types are mixed.
Can be something simple.
tbl_json objects should print like tbl_df objects, except they should have an additional column at the end, titled something like attr("JSON"), that shows the first N characters of the concise JSON representation of the JSON attribute.
Something like:
document.id key attr("JSON")
----------- --- ------------
1 "a" [1, 2, 3]
2 "b" true
3 "c" {"k1": "value", "k2": [1, 2], "k3...
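The truncation itself is simple; a base-R sketch of the display helper (the width and function name are assumptions):

```r
# Sketch: truncate each JSON string to at most `width` characters for
# display, replacing the tail of long strings with "...".
truncate_json <- function(json, width = 33) {
  long <- nchar(json) > width
  json[long] <- paste0(substr(json[long], 1L, width - 3L), "...")
  json
}

truncate_json('{"k1": "value", "k2": [1, 2], "k3": [4, 5, 6]}')
# '{"k1": "value", "k2": [1, 2], ...'
```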
In the purrr vignette, we currently have to do %>% head to shorten the display.
Recreate every key tidyjson verb with purrr.
Use purrr
in tidyjson internals wherever possible.
If spread_all
generates a name that already exists in the data frame, then throw a meaningful error about the name conflict.
Same as is done in https://github.com/hadley/purrr/blob/master/R/utils.R#L1
See if you can remove all subsequent calls to library(magrittr)
You can then get rid of rbind_tbl_json
in utils.R
Using this in the purrr vignette, so should export it. Need to make it clearly different from json_types.