tidyverse / tibble
A modern re-imagining of the data frame
Home Page: https://tibble.tidyverse.org/
License: Other
Fixes tidyverse/dplyr#1523.
Seems like we will have more exotic objects in tbl_dfs in the near future. This poses a printing challenge. And whatever RStudio is doing in View() seems like a good idea. Here are two views of a tbl_df that has a bunch of tweets in it, stored as S4 status objects from the twitteR package. Could the regular print method behave more like View() and show less, to reduce the risk of obscuring other variables? Somewhat related to a question I posed on R-help and SO earlier this year.
Carried over from dplyr.
Issue migrated from dplyr tidyverse/dplyr#1525. Updated to show the different error message produced by tibble.
BTW you're going to get a lot of issues about nibble if my auto-correct has its way.
This is handy for inspecting list-columns.
library(tibble)
x <- data_frame(a = 1:3, b = lapply(a, seq_len))
x[[3, 2]]
#> Error in .subset2(x, i): subscript out of bounds
See #3.
> tibble::as_data_frame(iris)[1:5, ]
Source: local data frame [5 x 5]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
<dbl> <dbl> <dbl> <dbl> <fctr>
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
for tibble-unaware objects such as memisc::data.set.
as_data_frame.default <-
  function(x, ...) as_data_frame(as.data.frame(x, stringsAsFactors = FALSE, ...))
FYI Hadley did some refactoring of data_frame() functions in dplyr yesterday. See commits starting with tidyverse/dplyr@3d254d6
FYI I have been wanting to split these functions out as well, so 👍 from me for this package!
i.e.
x <- c(a = 1, b = 4, c = 10)
enframe(x)
enframe(x, .id = "name")
But I'm not sure how it would figure out the name of the first column. Maybe use the same principle as data_frame()?
enframe <- function(x, .name = deparse(substitute(x)), .id = NULL) {
...
}
(in the fullness of time that would use lazyeval::expr_text() instead of deparse(substitute(x)))
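A minimal base-R sketch of the signature proposed above, assuming a plain data.frame as the result. The name enframe_sketch is hypothetical and this is not the package's implementation; it only illustrates how deparse(substitute(x)) can supply the default column name and how .id can promote the names to a column.

```r
# Hypothetical helper: illustrates the proposed interface with base R only.
enframe_sketch <- function(x, .name = deparse(substitute(x)), .id = NULL) {
  # Evaluate the default early so the caller's expression is captured.
  force(.name)
  out <- data.frame(unname(x), stringsAsFactors = FALSE)
  names(out) <- .name
  if (!is.null(.id)) {
    ids <- names(x)
    # Fall back to positions when the input carries no names.
    if (is.null(ids)) ids <- seq_along(x)
    id_col <- setNames(data.frame(ids, stringsAsFactors = FALSE), .id)
    out <- cbind(id_col, out)
  }
  out
}

x <- c(a = 1, b = 4, c = 10)
res1 <- enframe_sketch(x)                # one column, named after the argument
res2 <- enframe_sketch(x, .id = "name")  # names promoted to a "name" column
print(res2)
```

Passing .name explicitly avoids surprises when the argument is a long expression rather than a simple variable.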
bar <- data_frame(a = c("a", "b"))
foo <- bar[matrix(TRUE, nrow = 2, ncol = 1)]
foo
This is the root cause of tidyverse/dplyr#1798
Moved from tidyverse/dplyr#897
I think the default display of tibbles could be improved if each column highlighted 3 (say) significant digits, by printing all other numbers in paler grey (in terminals that support colour). This makes tables of numbers easier to scan.
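One way to picture the idea, as a rough sketch only: take an already formatted number, keep the first three significant digits as they are, and wrap the rest in an ANSI grey escape sequence. The function name dim_insignificant and the choice of escape codes (90 for grey, 39 to reset the foreground) are assumptions for illustration, not anything tibble is known to do.

```r
# Hypothetical helper: dims everything after the first `sigfig` significant
# digits of each formatted number, using ANSI colour codes.
dim_insignificant <- function(s, sigfig = 3) {
  vapply(s, function(one) {
    chars <- strsplit(one, "")[[1]]
    is_digit <- chars %in% as.character(0:9)
    # Leading zeros do not count as significant digits.
    started <- cumsum(is_digit & chars != "0") > 0
    n_sig <- cumsum(is_digit & started)
    keep <- n_sig <= sigfig
    if (all(keep)) return(one)
    paste0(
      paste(chars[keep], collapse = ""),
      "\033[90m", paste(chars[!keep], collapse = ""), "\033[39m"
    )
  }, character(1), USE.NAMES = FALSE)
}

out <- dim_insignificant(c("1234.5", "0.00123456", "42"))
cat(out, sep = "\n")
```

A real implementation would also need to detect whether the terminal supports colour before emitting escape codes.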
The implementation doesn't look like it works with non-data-frame sources.
column type: <fctr> vs. <ord>.
A default argument of NULL means the width is taken from the options.
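The pattern can be sketched as follows, assuming the fallback is base R's "width" option (which option the package actually consults is not stated here). resolve_width is a hypothetical name.

```r
# Hypothetical helper: NULL means "look the width up in the options".
resolve_width <- function(width = NULL) {
  if (is.null(width)) getOption("width") else width
}

explicit <- resolve_width(40)   # caller-supplied width wins
fallback <- resolve_width()     # falls back to options("width")
```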
The original version should remain in dplyr only; functions with the new naming convention don't touch the class of the object.
That uses dplyr::bind_rows().
Moved from tidyverse/dplyr#1385
Fixes tidyverse/dplyr#1564.
@zhilongjia: This sounds reasonable, but I think there should be two separate functions. Would you like to contribute to this package?
or is_data_frame()? Useful for testing.
Fixes tidyverse/dplyr#1587.
@r2evans: Would you like to contribute to this package?
All methods should return a string with four or fewer characters, suitable for succinctly displaying column types.
Not quite what the current implementation does.
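A sketch of what such a contract could look like with S3, using a hypothetical generic named type_sum_sketch (not the package's own function). Classed objects fall back to the first four characters of their class name, which is an assumption made here for illustration.

```r
# Hypothetical generic: every method returns a label of at most four characters.
type_sum_sketch <- function(x) UseMethod("type_sum_sketch")

type_sum_sketch.default <- function(x) {
  # Classed objects: abbreviate the first class name.
  if (is.object(x)) return(substr(class(x)[[1]], 1, 4))
  switch(typeof(x),
    logical = "lgl",
    integer = "int",
    double = "dbl",
    character = "chr",
    list = "list",
    substr(typeof(x), 1, 4)
  )
}

# Ordered factors dispatch to their own method before the factor method.
type_sum_sketch.factor <- function(x) "fctr"
type_sum_sketch.ordered <- function(x) "ord"

cols <- list(1:3, c(1.5, 2), "a", factor("a"), factor("a", ordered = TRUE), Sys.Date())
labels <- vapply(cols, type_sum_sketch, character(1))
print(labels)
```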
Contains table and extra information. The height of the table can be controlled precisely, but not the height of the extra information. (See #51 for updated output format.)
Specifically, the print_max option could be used as limit here: The new interpretation would be that at most 20 lines are printed, no matter what.
CC @lionel-.
Currently doesn't work (at least) if the class wraps an atomic type.
> tibble::data_frame(hms = hms(1:3))
Source: local data frame [3 x 1]
hms
<dbl>
1 1
2 2
3 3
Closes tidyverse/dplyr#1744.
CC @aphalo.
I'm not sure whether this should be in tibble or if I should implement it in a separate package, with a class inheriting from tbl_df, but I'd be interested in the possibility to associate (longer) labels to column names/variables. That's useful for example for survey data where variables usually have a short name and a longer label (for example the wording of the question). These labels can be stored in an attribute of the data_table, but it could be useful to have methods both for attaching these labels and for retrieving/using them. What do you think?
Shouldn't [[i, ]] be an error?
library(tibble)
iris[[ , 1]]
#> Error in `[[.data.frame`(iris, , 1): argument "..1" is missing, with no default
iris[[1, ]]
#> Error in `[[.data.frame`(iris, 1, ): argument "..2" is missing, with no default
as_data_frame(iris)[[ ,1]]
#> Error in `[[.tbl_df`(as_data_frame(iris), , 1): argument "i" is missing, with no default
Why does this "work"?
as_data_frame(iris)[[1, ]]
#> [1] 5.1
The plot thickens
library(tibble)
mtcars[["Lotus Europa", "mpg"]]
#> [1] 30.4
Why the message about a column?
as_data_frame(mtcars)[["Lotus Europa", "mpg"]]
#> Error: Unknown column 'Lotus Europa'
e.g.
> data_frame(x = 1:4)
Source: local data frame [4 x 1]
x
(int)
1 1
2 2
3 3
4 4
would be better as
> data_frame(x = 1:4)
x
(int)
1 1
2 2
3 3
4 4
This normally doesn't matter, but it's useful for books where space is at a premium
For SQL sources.
I recently needed to do the equivalent of cbind(foo = 1:3, bar) where bar was a tbl_df, where I wanted foo to end up as the first column/variable in the resulting tbl_df.
The dplyr solutions suggested to me involved mutate() + select(..., everything()) or bind_cols(), which seems to void a reason for pulling the tibbles out of that package into this one.
Having this would be a nice usability boon for the pkg.
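A base-R sketch of the requested behaviour, under the assumption that the result should keep the class of the input. prepend_column is a hypothetical name used only for illustration.

```r
# Hypothetical helper: put new columns in front and keep the input's class.
prepend_column <- function(df, ...) {
  new_cols <- data.frame(..., stringsAsFactors = FALSE)
  stopifnot(nrow(new_cols) == nrow(df))
  out <- cbind(new_cols, df)
  # cbind() returns a plain data.frame; restore any derived class.
  class(out) <- class(df)
  out
}

bar <- data.frame(x = c("a", "b", "c"), y = c(TRUE, FALSE, TRUE))
res <- prepend_column(bar, foo = 1:3)
print(res)
```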
For consistent output.
S3 dispatch on matrix class is not reliable, e.g., for factor matrices.
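A small illustration of the problem, assuming a factor that has been given a dim attribute: it is a matrix structurally, but its class attribute is still "factor", so a method registered for matrix is never selected.

```r
# A factor with a dim attribute is structurally a matrix ...
f <- factor(c("a", "b", "a", "b"))
dim(f) <- c(2, 2)

structural <- is.matrix(f)          # checks for a dim attribute of length 2
by_class <- inherits(f, "matrix")   # checks the class vector used for dispatch
print(class(f))
```

Checking is.matrix() explicitly inside the function is therefore more dependable than relying on a matrix method.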
For tidyverse/tidyr#131.
Fiddling with example related to tidyverse/readr#295, I realized that tbls don't indicate NAs very well. If this is intentional and some sort of 'least of all evils', just close this.
library(tibble)
(x <- frame_data(
~country, ~code,
"Belize", "BZ",
"Namibia", "NA",
"Narnia", NA_character_
))
#> Source: local data frame [3 x 2]
#>
#> country code
#> <chr> <chr>
#> 1 Belize BZ
#> 2 Namibia NA
#> 3 Narnia NA
as.data.frame(x)
#> country code
#> 1 Belize BZ
#> 2 Namibia NA
#> 3 Narnia <NA>
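A minimal sketch of the distinction as.data.frame() draws in the output above: missing strings are rendered with a marker that cannot be confused with the literal string "NA". show_na is a hypothetical helper, not part of tibble.

```r
# Hypothetical helper: render missing values with an unambiguous marker.
show_na <- function(x, na = "<NA>") {
  out <- as.character(x)
  out[is.na(x)] <- na
  out
}

code <- c("BZ", "NA", NA_character_)
shown <- show_na(code)
print(shown)
```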
> data_frame(a=character())
Source: local data frame [0 x 1]
Variables
not
shown:
a
(chr).
data_frame is a trimmed down version of data.frame. An analogous expand_grid to replace expand.grid would be great...
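As a rough idea of what "trimmed down" could mean here, the sketch below wraps base expand.grid() so that strings stay character and the extra out.attrs attribute is dropped. expand_grid_sketch is a hypothetical name and the chosen defaults are assumptions.

```r
# Hypothetical wrapper: expand.grid() without factor conversion or out.attrs.
expand_grid_sketch <- function(...) {
  expand.grid(..., KEEP.OUT.ATTRS = FALSE, stringsAsFactors = FALSE)
}

grid <- expand_grid_sketch(x = 1:2, y = c("a", "b", "c"))
print(grid)
```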
There's tbl() with different semantics, hence make_tbl(). Think about renaming src() to make_src().
So you can easily take table(x) and turn it into a "nice" data frame.
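For comparison, this is roughly what base R already offers: as.data.frame() on a table, with factor conversion turned off and the count column renamed. The column name "n" is an arbitrary choice for this sketch.

```r
# Base R conversion of a contingency table to a data frame of counts.
x <- c("a", "b", "a")
counts <- as.data.frame(table(x), responseName = "n", stringsAsFactors = FALSE)
print(counts)
```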
> tibble(~a, ~b)
Error in dots[[i]] : subscript out of bounds
This should probably return an empty data frame.
CC @kevinushey
Completely disallowing row names for tibbles is not an option anymore; this would break existing code. One thing we could do is to forbid setting row names on tibbles, or at least give a warning.
A tibble is not quite a data frame: Some operators have slightly different semantics. There's as.data.frame() for coercion, which currently behaves oddly by stripping derived classes. (On the other hand, this is what as.data.frame.data.frame() does, too.)
Functions that check is.data.frame() would now fail. Generics dispatching over data.frame won't dispatch tibbles anymore. This is easy to fix both by the caller (by calling as.data.frame()), and also by the implementer (by coercing or defining a tbl_df generic, or a default generic that calls as.data.frame()).
No change for functions that use duck-typing and don't check/coerce input arguments.
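The implementer-side fix described above can be sketched with a default method that coerces before dispatching again. The class tbl_sketch and the generic n_cells are hypothetical stand-ins for a tabular class that does not inherit from data.frame.

```r
# Hypothetical tabular class that is NOT a data.frame subclass.
obj <- structure(list(a = 1:3, b = c("x", "y", "z")), class = "tbl_sketch")

# Coercion method, so as.data.frame() works on the class.
as.data.frame.tbl_sketch <- function(x, ...) {
  as.data.frame(unclass(x), ...)
}

# A generic that only knows about data.frame ...
n_cells <- function(x, ...) UseMethod("n_cells")
n_cells.data.frame <- function(x, ...) nrow(x) * ncol(x)

# ... plus a default method that coerces and dispatches again.
n_cells.default <- function(x, ...) n_cells(as.data.frame(x), ...)

cells <- n_cells(obj)
print(cells)
```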
data_frame(x1 = array(1), x2 = rnorm(1))
#> Source: local data frame [1 x 2]
#>
#> x1 x2
#> <dbl> <dbl>
#> 1 1 0.8534246
I think this should be an error to be consistent with
data_frame(x1 = matrix(1, 1, 1), x2 = rnorm(1))
Here's something I do fairly often, mostly with a list, but sometimes with a vector: Initialize a data frame with that list or vector as a variable and, at the same time, promote its names to a proper variable. Or, perhaps, add a variable of row numbers. Why is it so important to add the names or row numbers? Because later you'll want to process with tidyr, i.e. with unnest() and/or spread().
I could point to some real uses if I need to really sell this. But hopefully this will just make sense. Or someone will tell me it's already easy to do? It is already easy, but perhaps worth making a function for.
library(tibble)
x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')
## wish it were easy to make the names a proper variable
data_frame(id = names(x), thing = x)
#> Source: local data frame [3 x 2]
#>
#> id thing
#> (chr) (list)
#> 1 alpha <chr[1]>
#> 2 beta <chr[1]>
#> 3 gamma <chr[1]>
## where id can easily default to row number
data_frame(id = seq_along(x), thing = x)
#> Source: local data frame [3 x 2]
#>
#> id thing
#> (int) (list)
#> 1 1 <chr[1]>
#> 2 2 <chr[1]>
#> 3 3 <chr[1]>
If you're new to tibble/dplyr, it's a bit confusing to understand the difference between tibbles and tbl_df. To help reduce this confusion we might:
- ?tibble and an explanation of the history & definition
- obj_sum return "tibble" for tibbles (instead of "tbl_df")
Currently, it's exported because print.tbl_xxx() from dplyr and backends need to access it to print dimensions themselves. Perhaps printing dimensions, and the other information (source type, grouping, ...) should be the responsibility of tibble, too.
Fixes tidyverse/dplyr#1572.
Stripping spaces doesn't feel very consistent to me. Why strip only spaces and not other invisible white space characters? Why not strip other characters that are hard to type?
I think repair_names() would be better off if it focussed only on missing, blank, and duplicated column names.
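A sketch of the narrower behaviour suggested here, handling only missing, blank, and duplicated names and leaving everything else (including spaces) alone. repair_names_sketch and the "V" prefix are assumptions for illustration, not the package's function.

```r
# Hypothetical helper: fix only missing, blank, and duplicated names.
repair_names_sketch <- function(nms, prefix = "V") {
  bad <- is.na(nms) | nms == ""
  # Missing or blank names get a positional placeholder.
  nms[bad] <- paste0(prefix, which(bad))
  # Duplicates get a numeric suffix; unique names are untouched.
  make.unique(nms)
}

fixed <- repair_names_sketch(c("a", "", NA, "a", "two words"))
print(fixed)
```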
Not used anymore in dplyr, seems to use only basic Rcpp functionality.