tibble issue discussions from tidyverse

Seems like we will have more exotic objects in tbl_dfs in the near future. This poses a printing challenge. And whatever RStudio is doing in View() seems like a good idea. Here are two views of a tbl_df that has a bunch of tweets in it, stored as S4 status objects from the twitteR package. Could the regular print method behave more like View() and show less, to reduce the risk of obscuring other variables? Somewhat related to a question I posed on R-help and SO earlier this year.

knit_print.trunc_mat is not declared as S3 method in NAMESPACE

Carried over from dplyr.

[[ method for tbl_df doesn't work with i, j

Issue migrated from dplyr tidyverse/dplyr#1525. Updated to show the different error message produced by tibble.

BTW you're going to get a lot of issues about nibble if my auto-correct has its way.

This is handy for inspecting list-columns.

library(tibble)
x <- data_frame(a = 1:3, b = lapply(a, seq_len))
x[[3, 2]]
#> Error in .subset2(x, i): subscript out of bounds
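For comparison, base R data frames resolve x[[i, j]] by picking the column first and then the element within it. A minimal sketch of that lookup order, using a hypothetical helper extract_cell() (an illustration only, not tibble's implementation) on a plain data.frame with a list-column:

```r
# Hypothetical helper, not part of tibble: select column j first,
# then element i of that column.
extract_cell <- function(x, i, j) {
  x[[j]][[i]]
}

# Plain data.frame with a list-column, mirroring the example above
df <- data.frame(a = 1:3)
df$b <- lapply(df$a, seq_len)  # elements: 1, 1:2, 1:3

# Third row of the second column, i.e. the integer vector 1:3
cell <- extract_cell(df, 3, 2)
```

A [[ method for tbl_df could follow the same order; whether it should also accept row names for i is a separate question.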

Rename options

dplyr.print_min
dplyr.print_max
dplyr.width

See #3.

x[i, ] gives wrong results

> tibble::as_data_frame(iris)[1:5, ]
Source: local data frame [5 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

Implement as_data_frame.default()

for tibble-unaware objects such as memisc::data.set.

as_data_frame.default <-
  function(x, ...) as_data_frame(as.data.frame(x, stringsAsFactors = FALSE, ...))

Track data_frame updates in dplyr

FYI Hadley did some refactoring of data_frame() functions in dplyr yesterday.

See commits starting with tidyverse/dplyr@3d254d6

FYI I have been wanting to split these functions out as well, so 👍 from me for this package!

Should enframe work more like bind_rows?

i.e.

x <- c(a = 1, b = 4, c = 10)
enframe(x)
enframe(x, .id = "name")

But I'm not sure how it would figure out the name of the first column. Maybe use the same principle as data_frame()?

enframe <- function(x, .name = deparse(substitute(x)), .id = NULL) {
  ...
}

(In the fullness of time, that would use lazyeval::expr_text() instead of deparse(substitute(x)).)
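One way the proposed signature could behave, sketched in base R. The function name enframe_sketch() and the choice to return a plain data.frame are assumptions for illustration; this is not how tibble implements enframe().

```r
# Hypothetical sketch of the proposed interface, base R only.
# .name defaults to the text of the expression passed as x;
# .id, if given, names a column holding names(x) (or positions).
enframe_sketch <- function(x, .name = deparse(substitute(x)), .id = NULL) {
  force(.name)  # evaluate the default before building the result
  cols <- list()
  if (!is.null(.id)) {
    ids <- names(x)
    if (is.null(ids)) ids <- seq_along(x)  # fall back to positions
    cols[[.id]] <- ids
  }
  cols[[.name]] <- unname(x)
  as.data.frame(cols, stringsAsFactors = FALSE)
}

x <- c(a = 1, b = 4, c = 10)
values_only <- enframe_sketch(x)               # one column named "x"
with_names  <- enframe_sketch(x, .id = "name") # columns "name" and "x"
```

The sketch only makes sense when x is passed as a simple symbol; for a longer expression, deparse() would yield a non-syntactic column name that as.data.frame() rewrites.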

Can create corrupt data frame with matrix indexing

bar <- data_frame(a = c("a", "b"))
foo <- bar[matrix(TRUE, nrow = 2, ncol = 1)]
foo

This is the root cause of tidyverse/dplyr#1798

Highlight significant digits

Moved from tidyverse/dplyr#897

I think the default display of tibbles could be improved if each column highlighted 3 (say) significant digits, by printing all other numbers in paler grey (in terminals that support colour). This makes tables of numbers easier to scan.

Change glimpse.tbl() to glimpse.tbl_df()

The implementation doesn't look like it works with non-data-frame sources.

Distinguish between "factor" and "ordered"

column type: <fctr> vs. <ord>.

Change dplyr to tibble

Documentation
Option names

Use tibble.width as default width in glimpse()

A default argument of NULL means the width is taken from the options.
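A minimal sketch of that lookup, assuming a hypothetical helper resolve_width() and assuming that the console width is the fallback when tibble.width is unset (the issue does not specify the fallback):

```r
# Hypothetical helper: NULL means "look the width up in the options".
resolve_width <- function(width = NULL) {
  if (is.null(width)) {
    # Assumed fallback: the console width when tibble.width is unset
    width <- getOption("tibble.width", default = getOption("width"))
  }
  width
}

old <- options(tibble.width = 60)
from_option <- resolve_width()    # picks up the option
explicit    <- resolve_width(40)  # an explicit argument wins
options(old)                      # restore the previous setting
```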

Remove add_rownames()

The original version should remain in dplyr only; functions with the new naming convention don't touch the class of the object.

Full test coverage

Provide rbind method

That uses dplyr::bind_rows().

Moved from tidyverse/dplyr#1385

add_rownames enhancement

Fixes tidyverse/dplyr#1564.

@zhilongjia: This sounds reasonable, but I think there should be two separate functions. Would you like to contribute to this package?

is.data_frame()

or is_data_frame()? Useful for testing.

option to clean/normalize data.frames?

Fixes tidyverse/dplyr#1587.

@r2evans: Would you like to contribute to this package?

type_sum() for data frames

All methods should return a string with four or fewer characters, suitable for succinctly displaying column types.

Not quite what the current implementation does.
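A sketch of what such a summary could look like in base R. The name type_abbrev() and the exact abbreviations are assumptions chosen to match the printed output elsewhere in these issues; it also separates ordered factors from plain factors.

```r
# Hypothetical stand-in for type_sum(): returns at most four characters.
type_abbrev <- function(x) {
  if (is.ordered(x)) return("ord")   # check ordered before factor
  if (is.factor(x)) return("fctr")
  switch(typeof(x),
    logical   = "lgl",
    integer   = "int",
    double    = "dbl",
    character = "chr",
    list      = "list",
    substr(typeof(x), 1, 4)          # fallback: truncate the type name
  )
}

df <- data.frame(n = 1:2, d = c(1.5, 2), s = c("a", "b"),
                 stringsAsFactors = FALSE)
df$f <- factor(c("u", "v"))
df$o <- factor(c("lo", "hi"), levels = c("lo", "hi"), ordered = TRUE)

abbrevs <- vapply(df, type_abbrev, character(1))
```

Classes that wrap an atomic type (dates, for example) would need their own branches; this sketch only looks at typeof().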

Idea: Limit height of trunc_mat() output

Contains table and extra information. The height of the table can be controlled precisely, but not the height of the extra information. (See #51 for updated output format.)

Specifically, the print_max option could be used as the limit here: the new interpretation would be that at most 20 lines are printed, no matter what.

CC @lionel-.

Formatting of S3 classes

Currently doesn't work, at least when the class wraps an atomic type.

> tibble::data_frame(hms = hms(1:3))
Source: local data frame [3 x 1]

    hms
  <dbl>
1     1
2     2
3     3

as_data_frame.tbl_df() should strip additional classes

Closes tidyverse/dplyr#1744.

CC @aphalo.

Column labels?

I'm not sure whether this should be in tibble or if I should implement it in a separate package, with a class inheriting from tbl_df, but I'd be interested in the possibility to associate (longer) labels to column names/variables. That's useful for example for survey data where variables usually have a short name and a longer label (for example the wording of the question). These labels can be stored in an attribute of the data_table, but it could be useful to have methods both for attaching these labels and for retrieving/using them. What do you think?

should [[i, ]] be an error?

Shouldn't [[i, ]] be an error?

library(tibble)
iris[[ , 1]]
#> Error in `[[.data.frame`(iris, , 1): argument "..1" is missing, with no default
iris[[1, ]]
#> Error in `[[.data.frame`(iris, 1, ): argument "..2" is missing, with no default
as_data_frame(iris)[[ ,1]]
#> Error in `[[.tbl_df`(as_data_frame(iris), , 1): argument "i" is missing, with no default

Why does this "work"?

as_data_frame(iris)[[1, ]]
#> [1] 5.1

The plot thickens

library(tibble)
mtcars[["Lotus Europa", "mpg"]]
#> [1] 30.4

Why the message about a column?

as_data_frame(mtcars)[["Lotus Europa", "mpg"]]
#> Error: Unknown column 'Lotus Europa'

Don't print number of rows if all rows printed

e.g.

> data_frame(x = 1:4)
Source: local data frame [4 x 1]

      x
  (int)
1     1
2     2
3     3
4     4

would be better as

> data_frame(x = 1:4)
      x
  (int)
1     1
2     2
3     3
4     4

This normally doesn't matter, but it's useful for books, where space is at a premium.

Don't print ... if number of rows is NA on input but certain after calling head()

For SQL sources.

Add a cbind method for tbl_df?

I recently needed to do the equivalent of cbind(foo = 1:3, bar) where bar was a tbl_df, where I wanted foo to end up as the first column/variable in the resulting tbl_df.

The dplyr solutions suggested to me involved mutate() + select(..., everything()) or bind_cols(), which seems to void a reason for pulling the tibbles out of that package into this one.

Having this would be a nice usability boon for the pkg.

Add and use "width" argument to wrap()

For consistent output.

Migrate news from dplyr 0.4.3+

tidyverse/dplyr#1595 (comment)

Export matrixToDataFrame()

S3 dispatch on matrix class is not reliable, e.g., for factor matrices.

For tidyverse/tidyr#131.

NA printing

Fiddling with example related to tidyverse/readr#295, I realized that tbls don't indicate NAs very well. If this is intentional and some sort of 'least of all evils', just close this.

library(tibble)
(x <- frame_data(
  ~country, ~code,
  "Belize", "BZ",
  "Namibia", "NA",
  "Narnia", NA_character_
))
#> Source: local data frame [3 x 2]
#> 
#>   country  code
#>     <chr> <chr>
#> 1  Belize    BZ
#> 2 Namibia    NA
#> 3  Narnia    NA
as.data.frame(x)
#>   country code
#> 1  Belize   BZ
#> 2 Namibia   NA
#> 3  Narnia <NA>
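One possible way to keep the two cases apart when formatting a character column, sketched with a hypothetical helper format_chr_na() that borrows the <NA> marker from base R's data frame printing:

```r
# Hypothetical formatter: mark missing strings as <NA> so that they
# cannot be confused with the literal string "NA".
format_chr_na <- function(x) {
  ifelse(is.na(x), "<NA>", x)
}

code  <- c("BZ", "NA", NA_character_)
shown <- format_chr_na(code)  # "BZ" "NA" "<NA>"
```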

Regression: Awkward output for zero-row tibbles

> data_frame(a=character())
Source: local data frame [0 x 1]

Variables
  not
  shown:
  a
  (chr).

expand_grid function as trimmed down version of expand.grid (similar to data_frame)

data_frame is a trimmed down version of data.frame. An analogous expand_grid to replace expand.grid would be great...
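A rough idea of what "trimmed down" could mean, sketched as a thin wrapper around base R. The wrapper name expand_grid_sketch() and the choice of defaults (no factor conversion, no out.attrs attribute) are assumptions, not a description of any existing tibble function.

```r
# Hypothetical wrapper: expand.grid() without string-to-factor
# conversion and without the "out.attrs" attribute.
expand_grid_sketch <- function(...) {
  expand.grid(..., KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
}

grid <- expand_grid_sketch(x = 1:3, y = c("a", "b"))
```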

Inconsistency: src() but make_tbl()

There's tbl() with different semantics, hence make_tbl(). Think about renaming src() to make_src().

Provide as_data_frame method for tables

So you can easily take table(x) and turn it into a "nice" data frame.
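Base R already gets most of the way there, which suggests what a method might wrap. A sketch under the assumption that the count column is called n (the issue does not name it):

```r
x <- c("a", "b", "a", "c", "a")

# table() with a named argument labels the dimension; responseName
# sets the name of the count column.
counts <- as.data.frame(
  table(value = x),
  responseName = "n",
  stringsAsFactors = FALSE
)
```

The result has one row per distinct value, with columns value and n.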

tibble() and zero-row data frames

> tibble(~a, ~b)
Error in dots[[i]] : subscript out of bounds

This should probably return an empty data frame.

CC @kevinushey

Disallow row names in tibble?

Completely disallowing row names for tibbles is not an option anymore because it would break existing code. One thing we could do is to forbid setting row names on tibbles, or at least give a warning.

Rethink inheritance from data.frame

A tibble is not quite a data frame: Some operators have slightly different semantics. There's as.data.frame() for coercion, which currently behaves oddly by stripping derived classes. (On the other hand, this is what as.data.frame.data.frame() does, too.)

Functions that check is.data.frame() would now fail. Generics dispatching over data.frame won't dispatch on tibbles anymore. This is easy to fix both by the caller (by calling as.data.frame()), and also by the implementer (by coercing or defining a tbl_df generic, or a default generic that calls as.data.frame()).
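The implementer-side fix could look roughly like this. Everything here is hypothetical: the class tbl_sketch stands in for a tibble that no longer inherits from data.frame, and summarise_shape() stands in for any third-party generic.

```r
# A third-party generic with only a data.frame method ...
summarise_shape <- function(x, ...) UseMethod("summarise_shape")
summarise_shape.data.frame <- function(x, ...) dim(x)

# ... plus a default method that coerces and re-dispatches.
summarise_shape.default <- function(x, ...) {
  summarise_shape(as.data.frame(x), ...)
}

# Stand-in for a tibble that does not inherit from data.frame
as.data.frame.tbl_sketch <- function(x, ...) {
  as.data.frame(unclass(x), stringsAsFactors = FALSE)
}
obj <- structure(list(a = 1:3, b = letters[1:3]), class = "tbl_sketch")

shape <- summarise_shape(obj)  # reaches the data.frame method via default
```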

No change for functions that use duck-typing and don't check/coerce input arguments.

Can create 1d array variables

data_frame(x1 = array(1), x2 = rnorm(1))
#> Source: local data frame [1 x 2]
#> 
#>      x1        x2
#>   <dbl>     <dbl>
#> 1     1 0.8534246

I think this should be an error to be consistent with

data_frame(x1 = matrix(1, 1, 1), x2 = rnorm(1))
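A sketch of a check that would treat both cases alike, assuming the rule is "anything with a dim attribute is rejected". The helper check_column() and its message are hypothetical.

```r
# Hypothetical validator: arrays and matrices carry a dim attribute,
# plain vectors do not.
check_column <- function(x, name) {
  if (!is.null(dim(x))) {
    stop("Column `", name, "` must be a plain vector, not an array or matrix.",
         call. = FALSE)
  }
  invisible(x)
}

# TRUE if the check raises an error for this input
rejects <- function(x) {
  tryCatch({
    check_column(x, "x1")
    FALSE
  }, error = function(e) TRUE)
}

array_rejected  <- rejects(array(1))
matrix_rejected <- rejects(matrix(1, 1, 1))
vector_rejected <- rejects(1)
```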

Supply pull requests for other packages

FR: Make a data frame from a (possibly named) vector or list

Here's something I do fairly often, mostly with a list, but sometimes with a vector: Initialize a data frame with that list or vector as a variable and, at the same time, promote its names to a proper variable. Or, perhaps, add a variable of row numbers. Why is it so important to add the names or row numbers? Because later you'll want to process with tidyr, i.e. with unnest() and/or spread().

I could point to some real uses if I need to really sell this. But hopefully this will just make sense. Or someone will tell me it's already easy to do? It is already easy, but perhaps worth making a function for.

library(tibble)

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

## wish it were easy to make the names a proper variable
data_frame(id = names(x), thing = x)
#> Source: local data frame [3 x 2]
#> 
#>      id    thing
#>   (chr)   (list)
#> 1 alpha <chr[1]>
#> 2  beta <chr[1]>
#> 3 gamma <chr[1]>

## where id can easily default to row number
data_frame(id = seq_along(x), thing = x)
#> Source: local data frame [3 x 2]
#> 
#>      id    thing
#>   (int)   (list)
#> 1     1 <chr[1]>
#> 2     2 <chr[1]>
#> 3     3 <chr[1]>

Why is src() in this package?

Tibble vs tbl_df

If you're new to tibble/dplyr, it's a bit confusing to understand the difference between tibbles and tbl_df. To help reduce this confusion we might:

Add ?tibble and explain the history & definition
Make obj_sum return "tibble" for tibbles (instead of "tbl_df")
