colearendt / cellist

Making beautiful music out of cell-lists, list-columns, nested lists, and the like

Home Page: https://colearendt.github.io/cellist/

cellist

The goal of cellist is to turn nested list columns into tidy data_frames.

Example

This is a basic example which shows you how to solve a common problem:

## basic example code

Key Functions

  • col_spec and related items (col_list, col_object, col_double, etc.)
  • guess_spec
  • spread_list
  • gather_list
  • inverse operations to rebuild the list?
  • mappings from json_schema objects to col_spec?
  • helpers to move from xml2 and jsonlite objects to nested lists

API

It should be possible to write something like col_list(), col_list_spread(), or col_list_gather(). It should also be possible to nest specs inside col_list, something like the following:

col_spec(
  list(
    d = col_double()
    , int = col_integer()
    , obj_raw = col_list()
    , obj_spread = col_list_spread(
      a = col_double()
      , name = col_character()
    )
    , obj_gather = col_list_gather(
      b = col_integer()
      , name = col_character()
    )
    , arr_raw = col_list()
    , arr_spread = col_list_spread(
      `1` = col_integer()
      , `2` = col_integer()
    )
    , arr_gather = col_list_gather(
      col_integer()
    )
  )
)

This API seems a little unwieldy, but it should be possible to pull the collector functionality out into a separate package and make it extensible (so the code is not defined separately for readr and tidylist)! These collectors use the name to look up fields by reference. This is not unlike readr, which also has a col_names parameter. The difference is that in this case, I think asked-for fields should be returned even if they are not present in the data.
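As a rough illustration of the "return asked-for fields even if not present" idea, here is a minimal base R sketch. The names sketch_collector, sketch_double, sketch_character and sketch_spread are hypothetical and exist only for this example; they are not the cellist or readr implementations.

```r
# Hypothetical collectors: each holds a cast function and the value to use
# when a requested field is absent. Not the cellist or readr API.
sketch_collector <- function(type, cast, missing_value) {
  structure(
    list(type = type, cast = cast, missing_value = missing_value),
    class = "sketch_collector"
  )
}
sketch_double <- function() sketch_collector("double", as.double, NA_real_)
sketch_character <- function() {
  sketch_collector("character", as.character, NA_character_)
}

# Spread a list of records into a data frame with one column per requested
# field. Fields are looked up by name; absent fields become NA.
sketch_spread <- function(lst, spec) {
  cols <- lapply(names(spec), function(field) {
    col <- spec[[field]]
    vapply(lst, function(row) {
      value <- row[[field]]
      if (is.null(value)) col$missing_value else col$cast(value)
    }, FUN.VALUE = col$missing_value)
  })
  names(cols) <- names(spec)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

rows <- list(
  list(a = 1, name = "x"),
  list(a = 2)  # "name" is absent in this record
)
out <- sketch_spread(rows, list(a = sketch_double(), name = sketch_character()))
```

Here the second record has no "name" field, yet the result still has a name column, filled with NA for that row. The sketch assumes each requested field holds a single scalar value.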

The real power comes in something like guess_spec that will generate a spec for you... you could also conceive of generating a spec from a JSON Schema / XML schema object!
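To make the idea concrete, a spec guesser could scan every record and report one type per field. The sketch below is an assumption about how that might look: sketch_guess_spec is a hypothetical name, and it returns plain type labels rather than real collector objects.

```r
# Hypothetical helper: guess a type label for every field seen across records.
# Returns a named list of labels, not real col_* collectors.
sketch_guess_spec <- function(lst) {
  fields <- unique(unlist(lapply(lst, names)))
  is_scalar <- function(v) is.atomic(v) && length(v) == 1
  spec <- lapply(fields, function(field) {
    values <- lapply(lst, function(row) row[[field]])
    values <- Filter(Negate(is.null), values)  # ignore records lacking the field
    # Anything that is not a single atomic value is left as a raw list.
    if (!all(vapply(values, is_scalar, logical(1)))) return("list")
    types <- unique(vapply(values, typeof, character(1)))
    # Simplification: mixed scalar types fall back to character.
    if (length(types) == 1) types else "character"
  })
  names(spec) <- fields
  spec
}

records <- list(
  list(d = 1.5, int = 2L, obj = list(a = 1)),
  list(d = 3, int = 4L, name = "z")
)
spec <- sketch_guess_spec(records)
```

For these records, d is guessed as "double", int as "integer", obj as "list" and name as "character". A fuller version would presumably recurse into nested objects to build nested specs.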

This also needs to be doable by integer reference, e.g. list(1, "a", "b") would grab the first object of a list, then the first "a" key, and then the first "b" key.
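A path lookup of this kind can be sketched in base R as follows. sketch_pluck is a hypothetical helper written for this example; it assumes that a missing step should yield NULL rather than an error.

```r
# Hypothetical helper: walk a nested list along a path that mixes integer
# positions and character keys. Returns NULL when any step is missing.
sketch_pluck <- function(x, path) {
  for (step in path) {
    if (is.null(x)) return(NULL)
    if (is.numeric(step)) {
      if (step > length(x)) return(NULL)
      x <- x[[step]]
    } else {
      # Take the first element whose name matches the key.
      hit <- match(step, names(x))
      if (is.na(hit)) return(NULL)
      x <- x[[hit]]
    }
  }
  x
}

nested <- list(
  list(a = list(b = "first", b = "second")),
  list(a = list(b = "other"))
)
result <- sketch_pluck(nested, list(1, "a", "b"))
```

With this input, result is "first": position 1, then the first "a" key, then the first "b" key. purrr::pluck(nested, 1, "a", "b") offers similar mixed position and name access, which may be a better fit given the goal of integrating purrr.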

Open Questions

  • How should we handle the existing list? Should we hold to "copy on reference", or should we modify the list following "do not repeat yourself" and pull the items out? Probably pull the items out... some items are not reversible
  • Maybe it is worth creating callbacks for naming the columns? This would apply at least to unnamed columns; named columns would be preserved? What about handling nested behavior, though? Maybe a callback for that too

To Do

  • Create tests to define what the internal functionality should be doing (since I cannot keep it straight otherwise...)
  • Integrate purrr more natively
  • Handle name conflicts
  • col_spec needs to be changed to col_types and parsed by col_spec_standardise, much like readr does

cellist's Issues

Basic implementation of `gather_list`

In tidyjson, this was called gather_array or gather_keys or gather_object, depending on the context. The basic idea is that you gather the objects in a single row into many rows.
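A minimal base R sketch of that gathering step is shown below. sketch_gather is a hypothetical name, and the output layout (id, index, key, value) is an assumption made for illustration, not a settled design.

```r
# Hypothetical sketch: one output row per element of each list cell.
# `ids` identifies the original rows; `cells` is the list column.
sketch_gather <- function(ids, cells) {
  lens <- lengths(cells)
  keys <- as.character(unlist(lapply(cells, function(cell) {
    # Unnamed cells (arrays) have no keys, so record NA for them.
    if (is.null(names(cell))) rep(NA_character_, length(cell)) else names(cell)
  })))
  out <- data.frame(
    id = rep(ids, times = lens),  # repeat each id once per element
    index = sequence(lens),       # position within the original cell
    key = keys,
    stringsAsFactors = FALSE
  )
  # Keep the gathered values as a list column so types are not coerced.
  out$value <- unlist(cells, recursive = FALSE, use.names = FALSE)
  out
}

cells <- list(
  list(x = 1L, y = 2L),    # an "object" with keys
  list(10L, 20L, 30L)      # an "array" without keys
)
out <- sketch_gather(ids = c("row1", "row2"), cells = cells)
```

The two input rows become five output rows: two for the keyed object and three for the unnamed array, which gets NA in the key column.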

Basic implementation of `enter_key`

The analogous operation in tidyjson is enter_object. It can be very useful for discarding the information that you do not care about. I wonder whether there should be a similar option for an array, i.e. "just take the first."

In any case, one of the difficult questions here is: what do you do with rows that do not have this key? I prefer to give that option to the user. They should be able to either keep or drop those rows (and we should handle NULL / NA nicely).
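One way to leave that choice to the user is an explicit argument, as in the sketch below. sketch_enter_key and its on_missing argument are hypothetical names, and the sketch assumes every cell is a list (or NULL).

```r
# Hypothetical sketch: step into one key of every cell and let the caller
# decide what happens to rows that lack the key.
sketch_enter_key <- function(cells, key, on_missing = c("keep", "drop")) {
  on_missing <- match.arg(on_missing)
  entered <- lapply(cells, function(cell) cell[[key]])
  has_key <- !vapply(entered, is.null, logical(1))
  if (on_missing == "drop") {
    # Drop rows without the key, remembering which rows survived.
    list(row = which(has_key), value = entered[has_key])
  } else {
    # Keep every row; rows without the key carry NULL.
    list(row = seq_along(cells), value = entered)
  }
}

cells <- list(
  list(meta = list(id = 1), junk = "a"),
  list(junk = "b"),                      # no "meta" key
  list(meta = list(id = 3))
)
kept <- sketch_enter_key(cells, "meta", on_missing = "keep")
dropped <- sketch_enter_key(cells, "meta", on_missing = "drop")
```

With on_missing = "keep" all three rows are returned and the second holds NULL; with "drop" only rows 1 and 3 remain. Returning the row positions alongside the values makes it possible to join the result back to the original data.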
