
Comments (5)

colearendt commented on August 17, 2024

Thanks for reporting, and apologies for missing the SO post 😄

I think this is related to #95 and possibly others. enter_object() is the problem here, as it discards records that do not have that object (and in this case, an empty array is treated as "does not have").

There is a short-term solution: split up the pipeline and do the left_join() yourself. It's not ideal, but it is functional for now, until we have a bit more time to think on this one 😄

library(dplyr)
library(tidyjson)


people <- c('{"age": 32, "name": [{"first": "Bob",   "last": "Smith", "middles":[{"middle1":"John", "middle2":"Rick"}]}]}',
            '{"age": 54, "name": [{"first": "Susan", "last": "Doe", "middles":[]}]}',
            '{"age": 18, "name": [{"first": "Ann",   "last": "Jones", "middles":[]}]}')

# does not work as we want
people %>%
  spread_all() %>%
  enter_object("name") %>%
  gather_array() %>%
  spread_all() %>%
  select(-document.id, -array.index) %>%
  # enter_object() drops the documents whose "middles" array is empty
  enter_object("middles") %>%
  gather_array() %>%
  spread_all() %>%
  select(-array.index) %>%
  as_tibble()
#> # A tibble: 1 x 5
#>     age first last  middle1 middle2
#>   <dbl> <chr> <chr> <chr>   <chr>  
#> 1    32 Bob   Smith John    Rick

# prepare the JSON
prep <- people %>% 
  spread_all() %>% 
  enter_object("name") %>% 
  gather_array() %>% 
  spread_all() %>% 
  select(-array.index)

# extract the sub-object
middles <- prep %>% 
  enter_object("middles") %>%
  gather_array() %>% 
  spread_all() %>% 
  select("document.id", starts_with("middle")) %>%
  as_tibble()

# join together
prep %>% 
  as_tibble() %>%
  left_join(
    middles, by = c("document.id")
    )
#> # A tibble: 3 x 6
#>   document.id   age first last  middle1 middle2
#>         <int> <dbl> <chr> <chr> <chr>   <chr>  
#> 1           1    32 Bob   Smith John    Rick   
#> 2           2    54 Susan Doe   <NA>    <NA>   
#> 3           3    18 Ann   Jones <NA>    <NA>

from tidyjson.

colearendt commented on August 17, 2024

Here is another approach that gets close: going tall first and avoiding enter_object(). We'd really rather have a "list" column that you can then explore, rather than a character column, though.

library(dplyr)
library(tidyr)
library(tidyjson)

people <- c('{"age": 32, "name": [{"first": "Bob",   "last": "Smith", "middles":[{"middle1":"John", "middle2":"Rick"}]}]}',
            '{"age": 54, "name": [{"first": "Susan", "last": "Doe", "middles":[]}]}',
            '{"age": 18, "name": [{"first": "Ann",   "last": "Jones", "middles":[]}]}')
people %>%
  spread_all() %>%
  enter_object("name") %>%
  gather_array() %>%
  gather_object() %>%
  append_values_string() %>%
  as_tibble() %>%
  spread(name, string)
#> # A tibble: 3 x 6
#>   document.id   age array.index first last  middles                             
#>         <int> <dbl>       <int> <chr> <chr> <chr>                               
#> 1           1    32           1 Bob   Smith "list(middle1 = \"John\", middle2 =…
#> 2           2    54           1 Susan Doe   NA                                  
#> 3           3    18           1 Ann   Jones NA

Created on 2019-12-15 by the reprex package (v0.2.1)

colearendt commented on August 17, 2024

I had a few minutes, so I tried adding a simple append_values_list() that makes the latter a bit more plausible... enter_object() is a bit more complex to dig into and will need more thought. Column types are still a little weird here with list-cols.

devtools::install_github("colearendt/tidyjson@add_append_values")
library(dplyr)
library(tidyr)
library(purrr)
library(tidyjson)

people <- c('{"age": 32, "name": [{"first": "Bob",   "last": "Smith", "middles":[{"middle1":"John", "middle2":"Rick"}]}]}',
            '{"age": 54, "name": [{"first": "Susan", "last": "Doe", "middles":[]}]}',
            '{"age": 18, "name": [{"first": "Ann",   "last": "Jones", "middles":[]}]}')

structured <- people %>%
  spread_all() %>%
  enter_object("name") %>%
  gather_array() %>%
  gather_object() %>%
  append_values_list() %>% 
  as_tibble()

structured
#> # A tibble: 9 x 5
#>   document.id   age array.index name    list            
#>         <int> <dbl>       <int> <chr>   <list>          
#> 1           1    32           1 first   <chr [1]>       
#> 2           1    32           1 last    <chr [1]>       
#> 3           1    32           1 middles <named list [2]>
#> 4           2    54           1 first   <chr [1]>       
#> 5           2    54           1 last    <chr [1]>       
#> 6           2    54           1 middles <lgl [1]>       
#> 7           3    18           1 first   <chr [1]>       
#> 8           3    18           1 last    <chr [1]>       
#> 9           3    18           1 middles <lgl [1]>

spread_it_out <- structured %>% spread(name, list)
spread_it_out
#> # A tibble: 3 x 6
#>   document.id   age array.index first     last      middles         
#>         <int> <dbl>       <int> <list>    <list>    <list>          
#> 1           1    32           1 <chr [1]> <chr [1]> <named list [2]>
#> 2           2    54           1 <chr [1]> <chr [1]> <lgl [1]>       
#> 3           3    18           1 <chr [1]> <chr [1]> <lgl [1]>

spread_it_out %>% mutate(first = flatten_chr(first), last = flatten_chr(last))
#> # A tibble: 3 x 6
#>   document.id   age array.index first last  middles         
#>         <int> <dbl>       <int> <chr> <chr> <list>          
#> 1           1    32           1 Bob   Smith <named list [2]>
#> 2           2    54           1 Susan Doe   <lgl [1]>       
#> 3           3    18           1 Ann   Jones <lgl [1]>

Created on 2019-12-15 by the reprex package (v0.2.1)

msinjin commented on August 17, 2024

> Thanks for reporting, and apologies for missing the SO post 😄

All good. I really wasn't sure if it was a bug or a feature!

> I think this is related to #95 and possibly others. enter_object() is the problem here, as it discards records that do not have that object (and in this case, an empty array is treated as "does not have").

Interesting. I had considered issue #95 while researching a solution, so that makes sense.

> There is a short-term solution: split up the pipeline and do the left_join() yourself. It's not ideal, but it is functional for now, until we have a bit more time to think on this one 😄

I had come up with a similar left_join() workaround, which I included in the SO post. It gets the job done but doubles the execution time, which isn't great because I'm dealing with GBs of JSON.

I'll try out your other ideas and see how it goes.

Thanks for the quick answer!

colearendt commented on August 17, 2024

Here is another possible pattern, which I just added to the SO post.

library(tidyjson)
library(dplyr)
library(tidyr)

people <- c('{"age": 32, "name": [{"first": "Bob",   "last": "Smith", "middles":[{"middle1":"John", "middle2":"Rick"}]}]}',
            '{"age": 54, "name": [{"first": "Susan", "last": "Doe", "middles":[]}]}',
            '{"age": 18, "name": [{"first": "Ann",   "last": "Jones", "middles":[]}]}')

as_tbl_json(people) %>%
  spread_all() %>%
  enter_object("name") %>%
  gather_array("nameid") %>%
  gather_object("key") %>%
  {bind_rows(
    filter(., key != "middles") %>% append_values_string("value"),
    filter(., key == "middles") %>% gather_array("middleid") %>% 
      select(-key) %>%
      gather_object("key") %>%
      append_values_string("value")
  )} %>%
  # drop tbl_json
  as_tibble() %>%
  select(-document.id, -nameid, -middleid) %>%
  # could also use tidyr::pivot_wider
  tidyr::spread(key, value)
#> # A tibble: 3 x 5
#>     age first last  middle1 middle2
#>   <dbl> <chr> <chr> <chr>   <chr>  
#> 1    18 Ann   Jones <NA>    <NA>   
#> 2    32 Bob   Smith John    Rick   
#> 3    54 Susan Doe   <NA>    <NA>

Created on 2020-06-28 by the reprex package (v0.3.0)
