tidyjson's People

Contributors

abresler, adgaudio, colearendt, hadley, ramiromagno, vats-div

tidyjson's Issues

Fix "no visible binding for global variable" notes in cmd_check()

The check currently produces this NOTE:

checking R code for possible problems ... NOTE
json_structure: no visible binding for global variable ‘level’
json_structure_arrays: no visible binding for global variable ‘type’
json_structure_arrays: no visible binding for global variable
  ‘document.id’
json_structure_arrays: no visible binding for global variable
  ‘child.id’
json_structure_arrays: no visible binding for global variable ‘level’
json_structure_arrays: no visible binding for global variable
  ‘parent.id’
... 9 lines ...
  ‘parent.id’
json_structure_objects: no visible binding for global variable ‘index’
json_structure_objects: no visible binding for global variable ‘key’
read_json: no visible global function definition for ‘tail’
should_json_structure_expand_more: no visible binding for global
  variable ‘level’
Undefined global functions or variables:
  child.id document.id index key level parent.id tail type
Consider adding
  importFrom("utils", "tail")
to your NAMESPACE file.

I believe this can be solved by avoiding non-standard evaluation, and using the _ version of dplyr functions instead.
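A minimal base R sketch of the idea, assuming two hypothetical helpers (`rows_at_level()` and `last_row()` are illustrative names, not tidyjson functions). Columns are referenced by string and `tail` is namespace-qualified, so no bare symbol is left for the check to flag:

```r
# Hypothetical helper (not part of tidyjson): refer to the column by its name
# as a string, so the code contains no bare `level` symbol.
rows_at_level <- function(df, value) {
  df[df[["level"]] == value, , drop = FALSE]
}

# Qualify `tail` with its namespace; the alternative is adding
# importFrom("utils", "tail") to NAMESPACE, as the NOTE suggests.
last_row <- function(df) {
  utils::tail(df, 1)
}

# Illustrative data only, shaped loosely like json_structure() output.
structure_df <- data.frame(
  level = c(0, 1, 1),
  type = c("object", "array", "number")
)

rows_at_level(structure_df, 1)
last_row(structure_df)
```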

Should spread_all discard scalar values from associated JSON?

Currently it leaves the JSON as is:

'{"a": 1, "b": [1, 2, 3]}' %>% spread_all
#> # A tbl_json: 1 x 2 tibble with a "JSON" attribute
#>    `attr(., "JSON")` document.id     a
#>                <chr>       <int> <dbl>
#> 1 {"a":1,"b":[1,2...           1     1

Perhaps instead it should strip these away:

'{"a": 1, "b": [1, 2, 3]}' %>% spread_all
#> # A tbl_json: 1 x 2 tibble with a "JSON" attribute
#>    `attr(., "JSON")` document.id     a
#>                <chr>       <int> <dbl>
#> 1 {"b":[1,2,3]}                1     1

This makes sense since they are already captured in the tbl_json object, and it will make it easier to see that the next step should be enter_object and then gather_array.
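As a rough sketch of the proposed stripping, assuming the jsonlite package is available and using a hypothetical helper `strip_scalars()` (not a tidyjson function), the top-level scalar values could be dropped like this:

```r
library(jsonlite)

# Hypothetical helper: keep only the top-level values that are arrays or
# objects, then serialise what is left back to a JSON string.
strip_scalars <- function(json) {
  # simplifyVector = FALSE keeps arrays and objects as R lists
  parsed <- fromJSON(json, simplifyVector = FALSE)
  remaining <- Filter(is.list, parsed)
  as.character(toJSON(remaining, auto_unbox = TRUE))
}

strip_scalars('{"a": 1, "b": [1, 2, 3]}')
```

This only handles a single top-level object; how nested scalars should be treated is left open here.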

Get dplyr::left_join (and others) to work

This fails, and there is no left_join_ method:

new <- '[1, 2, 3]' %>% gather_array("num") %>%
  left_join(data_frame(num = 1:3, letters = letters[1:3]), by = "num")

expect_is(new, "tbl_json")

Peel JSON one layer at a time?

Is it possible to use fromJSON to turn the JSON into lists one layer at a time, so that the JSON remains a string as you slowly unwind it?

This may be much slower, but would lead to a more natural implementation where the JSON remains a column of the data frame formatted as a character string and more easily printed.
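One way this could look, sketched with jsonlite and a hypothetical helper `peel_layer()` (not an existing tidyjson function): parse a single document, then serialise each child back to a string so that only one layer is unwound at a time.

```r
library(jsonlite)

# Hypothetical helper: unwind exactly one layer of a JSON document and return
# each child as a JSON string, so the remaining JSON stays character data.
peel_layer <- function(json) {
  parsed <- fromJSON(json, simplifyVector = FALSE)
  vapply(
    parsed,
    function(child) as.character(toJSON(child, auto_unbox = TRUE)),
    character(1)
  )
}

children <- peel_layer('{"a": 1, "b": {"c": [1, 2]}}')
children
```

The repeated parse and serialise round trip is the likely source of the slowdown mentioned above.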

dplyr::slice isn't filtering JSON appropriately

Filter works:

companies[1:5] %>% as.tbl_json %>% filter(document.id == 1) %>% attr("JSON") %>% length
#> [1] 1

but slice does not:

companies[1:5] %>% as.tbl_json %>% slice(1) %>% attr("JSON") %>% length
#> [1] 5
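For illustration, a base R sketch of the behaviour a fix would need, assuming a plain data frame carrying a "JSON" list attribute as a stand-in for a tbl_json. The helper name `slice_with_json()` is hypothetical:

```r
# Hypothetical helper: subset the rows and the "JSON" attribute together so
# the two stay aligned.
slice_with_json <- function(x, rows) {
  json <- attr(x, "JSON")
  out <- x[rows, , drop = FALSE]
  attr(out, "JSON") <- json[rows]
  out
}

# Stand-in for a tbl_json with five documents.
docs <- data.frame(document.id = 1:5)
attr(docs, "JSON") <- as.list(letters[1:5])

sliced <- slice_with_json(docs, 1)
length(attr(sliced, "JSON"))
```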

first argument to verbs should not be x

Because the first argument is named x, a column named x is matched to it instead of being created:

'{"x": 1}' %>% spread_values(x = jstring("x"))
#> Error in UseMethod("as.tbl_json") :
#>  no applicable method for 'as.tbl_json' applied to an object of class "function"

Yet this works:

'{"x": 1}' %>% spread_values(y = jstring("x"))
#>   document.id y
#> 1           1 1
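The underlying argument matching can be reproduced in base R. In this sketch `verb_x()` and `verb_dot()` are hypothetical stand-ins for a verb whose first argument is named `x` and one whose first argument is named `.x`:

```r
# First formal is `x`: a named argument `x = ...` is matched to it, and the
# piped-in value falls through to `...`.
verb_x <- function(x, ...) {
  list(first = x, dots = list(...))
}

# First formal is `.x`: `x = ...` no longer collides and lands in `...`.
verb_dot <- function(.x, ...) {
  list(first = .x, dots = list(...))
}

collides <- verb_x("json", x = "column spec")
works <- verb_dot("json", x = "column spec")

collides$first  # the column spec, not the JSON
works$first     # the JSON, as intended
```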

Create plot_json_graph

Should use json_structure and create an igraph object. Initial version of code is in visualization vignette.

Create json_schema

Should do the following:

  • Work like json_structure, but aggregate across many documents
  • Arrays should be collapsed into a union of their structures
  • Should keep a count of how often each structure appears
  • Should be able to visualize the result as a graph per the visualizing JSON vignette

Allow spread_values functions to work with unquoted paths

The following works:

'{"key": "value"}' %>% spread_values(key = jstring("key"))
#>   document.id   key
#> 1           1 value

but this does not:

'{"key": "value"}' %>% spread_values(key = jstring(key))
#> Error in as_function(.f, ...) : object 'key' not found
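A minimal sketch of how an unquoted key could be captured in base R, assuming a hypothetical helper `capture_key()`. How this would be wired into jstring is not shown and is left as an open design question:

```r
# Hypothetical helper: accept either a quoted or an unquoted key and return
# it as a character string, without evaluating the unquoted symbol.
capture_key <- function(path) {
  expr <- substitute(path)
  if (is.character(expr)) expr else deparse(expr)
}

capture_key("key")
capture_key(key)
```

One trade-off of this approach is that a variable holding a key name can no longer be passed directly, since the symbol itself is captured rather than its value.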

create spread_all to automatically spread all keys

Should work like:

'{"a": 1, "b": "x", "c": true}' %>% spread_all_values

  • Should not affect the state of the JSON object
  • Should work with nested objects
  • Should take a sep argument used to separate key names when objects are nested
  • Should just ignore arrays automatically
  • NULLs should be cast to NA
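The requirements above can be sketched in base R, assuming a parsed JSON document is represented as a nested list (objects as named lists, arrays as unnamed lists). The helper `flatten_scalars()` is hypothetical and only illustrates the naming and NULL handling:

```r
# Hypothetical helper: collect scalar values from nested objects, joining key
# names with `sep`, ignoring arrays, and casting NULL to NA.
flatten_scalars <- function(x, prefix = NULL, sep = ".") {
  out <- list()
  for (key in names(x)) {
    name <- if (is.null(prefix)) key else paste(prefix, key, sep = sep)
    value <- x[[key]]
    if (is.null(value)) {
      out[[name]] <- NA                                # NULL becomes NA
    } else if (is.list(value) && !is.null(names(value))) {
      out <- c(out, flatten_scalars(value, name, sep)) # nested object
    } else if (!is.list(value)) {
      out[[name]] <- value                             # scalar
    }
    # Unnamed lists (arrays) are skipped.
  }
  out
}

# Stand-in for the parsed form of
# {"a": 1, "b": {"c": "x", "d": null}, "e": [1, 2]}
doc <- list(a = 1, b = list(c = "x", d = NULL), e = list(1, 2))

flat <- flatten_scalars(doc)
names(flat)
```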

spread_all(recursive = FALSE) failing

issues %>% gather_array %>% spread_all(recursive = FALSE)
#> Error in `[.data.frame`(z, , final_columns, drop = FALSE) : 
#>   undefined columns selected

Increase the number of lines of JSON converted to strings in print.tbl_json

Only the first few rows have their JSON converted to strings; the rest print as "...", which will be very confusing to users:

companies[1:5] %>% gather_keys %>% filter(is_json_object(.)) %>% gather_keys("key2")
#> # A tbl_json: 15 x 3 tibble with a "JSON" attribute
#>     `attr(., "JSON")` document.id   key            key2
#>                 <chr>       <int> <chr>           <chr>
#> 1  "52cdef7e4bab8b...           1   _id            $oid
#> 2  [[[150,22],"ass...           1 image available_sizes
#> 3                null           1 image     attribution
#> 4  "52cdef7f4bab8b...           2   _id            $oid
#> 5  [[[150,38],"ass...           2 image available_sizes
#> 6                null           2 image     attribution
#> 7  "52cdef7d4bab8b...           3   _id            $oid
#> 8  [[[150,36],"ass...           3 image available_sizes
#> 9                null           3 image     attribution
#> 10 "52cdef7d4bab8b...           4   _id            $oid
#> 11                ...           4 image available_sizes
#> 12                ...           4 image     attribution
#> 13                ...           5   _id            $oid
#> 14                ...           5 image available_sizes
#> 15                ...           5 image     attribution

How to treat nested arrays?

Nested arrays are difficult to work with. For example,

x <- '[[1, 2], 1]' %>% gather_array %>% json_types
x
#>   document.id array.index   type
#> 1           1           1  array
#> 2           1           2 number

At this point, there is no way to gather the next array unless we filter on type == 'array'.

x %>% gather_array("level2")
#> Error in gather_array(., "level2") : 1 records are not arrays
x %>% filter(type == "array") %>% gather_array("level2")
#>   document.id array.index  type level2
#> 1           1           1 array      1
#> 2           1           1 array      2

append_values_number works, but returns NA for the array, and recursive = TRUE does not descend into the second-level array. The element types may also be mixed.

Print tbl_json objects with truncated JSON string

tbl_json objects should print like tbl_df objects, except they should have an additional column at the end, titled something like attr("JSON"), that shows the first N characters of the concise JSON representation of the JSON attribute.

Something like:

document.id key attr("JSON")
----------- --- ------------
1           "a" [1, 2, 3]
2           "b" true
3           "c" {"k1": "value", "k2": [1, 2], "k3...
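The truncation step could be sketched in base R as follows. The helper `truncate_json()` and its default width are assumptions for illustration, not part of tidyjson:

```r
# Hypothetical helper: keep strings of at most `n` characters unchanged and
# shorten longer ones to `n` characters, ending in "...".
truncate_json <- function(json, n = 20) {
  ifelse(
    nchar(json) > n,
    paste0(substr(json, 1, n - 3), "..."),
    json
  )
}

json_strings <- c(
  "[1, 2, 3]",
  "true",
  '{"k1": "value", "k2": [1, 2], "k3": true}'
)

shortened <- truncate_json(json_strings, n = 20)
shortened
```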
