tibble

Overview

A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not. Tibbles are data.frames that are lazy and surly: they do less (i.e. they don’t change variable names or types, and don’t do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code. Tibbles also have an enhanced print() method which makes them easier to use with large datasets containing complex objects.

If you are new to tibbles, the best place to start is the tibbles chapter in R for Data Science.

Installation

# The easiest way to get tibble is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just tibble:
install.packages("tibble")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/tibble")

Usage

library(tibble)

Create a tibble from an existing object with as_tibble():

data <- data.frame(a = 1:3, b = letters[1:3], c = Sys.Date() - 1:3)
data
#>   a b          c
#> 1 1 a 2023-10-07
#> 2 2 b 2023-10-06
#> 3 3 c 2023-10-05

as_tibble(data)
#> # A tibble: 3 × 3
#>       a b     c         
#>   <int> <chr> <date>    
#> 1     1 a     2023-10-07
#> 2     2 b     2023-10-06
#> 3     3 c     2023-10-05

This will work for reasonable inputs that are already data.frames, lists, matrices, or tables.

You can also create a new tibble from column vectors with tibble():

tibble(x = 1:5, y = 1, z = x^2 + y)
#> # A tibble: 5 × 3
#>       x     y     z
#>   <int> <dbl> <dbl>
#> 1     1     1     2
#> 2     2     1     5
#> 3     3     1    10
#> 4     4     1    17
#> 5     5     1    26

tibble() does much less than data.frame(): it never changes the type of the inputs (e.g. it keeps list columns as is), it never changes the names of variables, it only recycles inputs of length 1, and it never creates row.names(). You can read more about these features in vignette("tibble").
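
The differences listed above can be checked directly. The following is a minimal sketch, assuming a recent tibble release is installed; the column names and values are made up for illustration.

```r
library(tibble)

# data.frame() rewrites non-syntactic names by default; tibble() keeps them.
df <- data.frame(`my var` = 1:2)
tb <- tibble(`my var` = 1:2)
names(df)
names(tb)

# List columns are kept as is.
tb_list <- tibble(id = 1:2, payload = list(1:3, letters))
is.list(tb_list$payload)

# Only length-1 inputs are recycled; any other length mismatch is an error.
recycled <- tibble(x = 1:4, y = 0)
nrow(recycled)
mismatch <- tryCatch(tibble(x = 1:4, y = 1:2), error = function(e) "error")
mismatch
```

By contrast, data.frame(x = 1:4, y = 1:2) silently recycles y, which is the kind of behaviour tibble() is designed to surface as an error.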

You can define a tibble row-by-row with tribble():

tribble(
  ~x, ~y,  ~z,
  "a", 2,  3.6,
  "b", 1,  8.5
)
#> # A tibble: 2 × 3
#>   x         y     z
#>   <chr> <dbl> <dbl>
#> 1 a         2   3.6
#> 2 b         1   8.5

Related work

The tibble print method draws inspiration from data.table and frame. Like data.table::data.table(), tibble() doesn’t change column names and doesn’t use row names.


Code of Conduct

Please note that the tibble project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.

tibble's Issues

[[ method for tbl_df doesn't work with i, j

Issue migrated from dplyr tidyverse/dplyr#1525. Updated to show the different error message produced by tibble.

BTW you're going to get a lot of issues about nibble if my auto-correct has its way.

This is handy for inspecting list-columns.

library(tibble)
x <- data_frame(a = 1:3, b = lapply(a, seq_len))
x[[3, 2]]
#> Error in .subset2(x, i): subscript out of bounds

Implement as_data_frame.default()

for tibble-unaware objects such as memisc::data.set.

as_data_frame.default <-
  function(x, ...) as_data_frame(as.data.frame(x, stringsAsFactors = FALSE, ...))

Add a cbind method for tbl_df?

I recently needed to do the equivalent of cbind(foo = 1:3, bar) where bar was a tbl_df, where I wanted foo to end up as the first column/variable in the resulting tbl_df.

The dplyr solutions suggested to me involved mutate() + select(..., everything()) or bind_cols(), which seems to void a reason for pulling the tibbles out of that package into this one.

Having this would be a nice usability boon for the pkg.

Formatting of S3 classes

Currently doesn't work (at least) if class wraps atomic type.

> tibble::data_frame(hms = hms(1:3))
Source: local data frame [3 x 1]

    hms
  <dbl>
1     1
2     2
3     3

FR: Make a data frame from a (possibly named) vector or list

Here's something I do fairly often, mostly with a list, but sometimes with a vector: Initialize a data frame with that list or vector as a variable and, at the same time, promote its names to a proper variable. Or, perhaps, add a variable of row numbers. Why is it so important to add the names or row numbers? Because later you'll want to process with tidyr, i.e. with unnest() and/or spread().

I could point to some real uses if I need to really sell this. But hopefully this will just make sense. Or someone will tell me it's already easy to do? It is already easy, but perhaps worth making a function for.

library(tibble)

x <- list(alpha = 'horrible', beta = 'list', gamma = 'column')

## wish it were easy to make the names a proper variable
data_frame(id = names(x), thing = x)
#> Source: local data frame [3 x 2]
#> 
#>      id    thing
#>   (chr)   (list)
#> 1 alpha <chr[1]>
#> 2  beta <chr[1]>
#> 3 gamma <chr[1]>

## where id can easily default to row number
data_frame(id = seq_along(x), thing = x)
#> Source: local data frame [3 x 2]
#> 
#>      id    thing
#>   (int)   (list)
#> 1     1 <chr[1]>
#> 2     2 <chr[1]>
#> 3     3 <chr[1]>
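
For reference, later tibble releases ship enframe(), which covers this request. A minimal sketch, assuming a tibble version in which enframe() accepts lists as well as atomic vectors; the name and value arguments set the two column names.

```r
library(tibble)

x <- list(alpha = "horrible", beta = "list", gamma = "column")

# Names become a proper variable; the list itself becomes a list column.
by_name <- enframe(x, name = "id", value = "thing")
by_name

# Without names, the first column falls back to the position in the input.
by_pos <- enframe(unname(x), name = "id", value = "thing")
by_pos
```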

nicer printing of list columns

Seems like we will have more exotic objects in tbl_dfs in the near future. This poses a printing challenge. And whatever RStudio is doing in View() seems like a good idea. Here are two views of a tbl_df that has a bunch of tweets in it, stored as S4 status objects from the twitteR package. Could the regular print method behave more like View() and show less, to reduce the risk of obscuring other variables? Somewhat related to a question I posed on R-help and SO earlier this year.

[Two screenshots, taken 2016-03-03, showing the two views of the tbl_df.]

Should enframe work more like bind_rows?

i.e.

x <- c(a = 1, b = 4, c = 10)
enframe(x)
enframe(x, .id = "name")

But I'm not sure how it would figure out the name of the first column. Maybe use the same principle as data_frame()?

enframe <- function(x, .name = deparse(substitute(x)), .id = NULL) {
  ...
}

(in the fullness of time that would use lazyeval::expr_text() instead of deparse(substitute(x)))

Remove add_rownames()

The original version should remain in dplyr only; functions with the new naming convention don't touch the class of the object.

Idea: Limit height of trunc_mat() output

The output contains a table and extra information. The height of the table can be controlled precisely, but not the height of the extra information. (See #51 for updated output format.)

Specifically, the print_max option could be used as the limit here: the new interpretation would be that at most 20 lines are printed, no matter what.

CC @lionel-.

should [[i, ]] be an error?

Shouldn't [[i, ]] be an error?

library(tibble)
iris[[ , 1]]
#> Error in `[[.data.frame`(iris, , 1): argument "..1" is missing, with no default
iris[[1, ]]
#> Error in `[[.data.frame`(iris, 1, ): argument "..2" is missing, with no default
as_data_frame(iris)[[ ,1]]
#> Error in `[[.tbl_df`(as_data_frame(iris), , 1): argument "i" is missing, with no default

Why does this "work"?

as_data_frame(iris)[[1, ]]
#> [1] 5.1

The plot thickens

library(tibble)
mtcars[["Lotus Europa", "mpg"]]
#> [1] 30.4

Why the message about a column?

as_data_frame(mtcars)[["Lotus Europa", "mpg"]]
#> Error: Unknown column 'Lotus Europa'

x[i, ] gives wrong results

> tibble::as_data_frame(iris)[1:5, ]
Source: local data frame [5 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa

Can create 1d array variables

data_frame(x1 = array(1), x2 = rnorm(1))
#> Source: local data frame [1 x 2]
#> 
#>      x1        x2
#>   <dbl>     <dbl>
#> 1     1 0.8534246

I think this should be an error to be consistent with

data_frame(x1 = matrix(1, 1, 1), x2 = rnorm(1))

type_sum() for data frames

All methods should return a string of four or fewer characters, suitable for succinctly displaying column types.

Not quite what the current implementation does.

Repair names and whitespace

Stripping spaces doesn't feel very consistent to me. Why strip only spaces and not other invisible white space characters? Why not strip other characters that are hard to type?

I think repair_names() would be better off if it focussed only on missing, blank, and duplicated column names.
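
A narrower repair step along these lines could look like the following base R sketch. The helper repair_names_sketch() and its prefix argument are hypothetical, used only for illustration, and are not tibble's implementation.

```r
# Hypothetical helper, not tibble's repair_names(): it only touches names
# that are missing, blank, or duplicated, and leaves whitespace alone.
repair_names_sketch <- function(nms, prefix = "V") {
  bad <- is.na(nms) | nms == ""
  # Replace missing or blank names with the prefix plus the column position.
  nms[bad] <- paste0(prefix, which(bad))
  # Disambiguate duplicates by appending a numeric suffix.
  make.unique(nms, sep = "_")
}

repaired <- repair_names_sketch(c("a", "", NA, "a", " b "))
repaired
#> [1] "a"   "V2"  "V3"  "a_1" " b "
```

Note that " b " comes back unchanged: under this proposal, surrounding whitespace is left for the user to deal with.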

Rethink inheritance from data.frame

A tibble is not quite a data frame: Some operators have slightly different semantics. There's as.data.frame() for coercion, which currently behaves oddly by stripping derived classes. (On the other hand, this is what as.data.frame.data.frame() does, too.)

Functions that check is.data.frame() would now fail. Generics dispatching over data.frame won't dispatch tibbles anymore. This is easy to fix both by the caller (by calling as.data.frame()), and also by the implementer (by coercing or defining a tbl_df generic, or a default generic that calls as.data.frame()).

No change for functions that use duck-typing and don't check/coerce input arguments.

Tibble vs tbl_df

If you're new to tibble/dplyr, it's a bit confusing to understand the difference between tibbles and tbl_df. To help reduce this confusion we might:

  • Add ?tibble and explain the history and definition
  • Make obj_sum return "tibble" for tibbles (instead of "tbl_df")

Don't print number of rows if all rows printed

e.g.

> data_frame(x = 1:4)
Source: local data frame [4 x 1]

      x
  (int)
1     1
2     2
3     3
4     4

would be better as

> data_frame(x = 1:4)
      x
  (int)
1     1
2     2
3     3
4     4

This normally doesn't matter, but it's useful for books, where space is at a premium.

Rethink exporting dim_desc()

Currently, it's exported because print.tbl_xxx() from dplyr and backends need to access it to print dimensions themselves. Perhaps printing dimensions, and the other information (source type, grouping, ...) should be responsibility of tibble, too.

Disallow row names in tibble?

Completely disallowing row names for tibbles is not an option anymore, because this would break existing code. One thing we could do is to forbid setting row names on tibbles, or at least give a warning.

Columns labels?

I'm not sure whether this should be in tibble or if I should implement it in a separate package, with a class inheriting from tbl_df, but I'd be interested in the possibility to associate (longer) labels to column names/variables. That's useful for example for survey data where variables usually have a short name and a longer label (for example the wording of the question). These labels can be stored in an attribute of the data_table, but it could be useful to have methods both for attaching these labels and for retrieving/using them. What do you think?

NA printing

Fiddling with example related to tidyverse/readr#295, I realized that tbls don't indicate NAs very well. If this is intentional and some sort of 'least of all evils', just close this.

library(tibble)
(x <- frame_data(
  ~country, ~code,
  "Belize", "BZ",
  "Namibia", "NA",
  "Narnia", NA_character_
))
#> Source: local data frame [3 x 2]
#> 
#>   country  code
#>     <chr> <chr>
#> 1  Belize    BZ
#> 2 Namibia    NA
#> 3  Narnia    NA
as.data.frame(x)
#>   country code
#> 1  Belize   BZ
#> 2 Namibia   NA
#> 3  Narnia <NA>
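
The ambiguity can be reproduced without tibble. A small base R sketch, using the same codes as above; the shown vector is only an illustration of one way to mark missing values, in the style of print.data.frame().

```r
code <- c("BZ", "NA", NA_character_)

# is.na() separates the string "NA" from a genuinely missing value.
is.na(code)
#> [1] FALSE FALSE  TRUE

# Plain formatting renders both as NA, which is the problem reported here.
format(code)
#> [1] "BZ" "NA" "NA"

# Marking missing values explicitly keeps the two apart.
shown <- ifelse(is.na(code), "<NA>", code)
shown
#> [1] "BZ"   "NA"   "<NA>"
```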

Highlight significant digits

Moved from tidyverse/dplyr#897

I think the default display of tibbles could be improved if each column highlighted 3 (say) significant digits, by printing all other numbers in paler grey (in terminals that support colour). This makes tables of numbers easier to scan.
