
datapasta's Introduction

datapasta 3.1.1 'Leave to Simmer'



Introducing datapasta

datapasta is about reducing resistance associated with copying and pasting data to and from R. It is a response to the realisation that I often found myself using intermediate programs like Sublime to munge text into suitable formats. Addins and functions in datapasta support a wide variety of input and output situations, so it (probably) "just works". Hopefully tools in this package will remove such intermediate steps and associated frustrations from our data slinging workflows.

Prerequisites

  • Linux users will need to install either xsel or xclip. These applications provide an interface to X selections (clipboard-like).
    • For example: sudo apt-get install xsel - it's 72kb...
  • Windows and MacOS have nothing extra to do.

Installation

R Universe (preferred)

  1. Install from the R-universe repo:
install.packages(
   "datapasta",
   repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos")))
  2. Set the keyboard shortcuts using Tools -> Addins -> Browse Addins, then click Keyboard Shortcuts...

CRAN (outdated)

For now, no further versions of datapasta will be going to CRAN. There are some known bugs in the CRAN version that have been fixed in 3.1.1.

  1. install.packages("datapasta")

Usage

Use with RStudio

Getting data into source

At the moment this package contains these RStudio addins that paste data to the cursor:

  • tribble_paste which pastes a table as a nicely formatted call to tibble::tribble()
    • Recommend Ctrl + Shift + t as shortcut.
    • Table can be delimited with tab, comma, pipe or semicolon.
  • vector_paste which will paste delimited data as a vector definition, e.g. c("a", "b") etc.
    • Recommend Ctrl + Alt + Shift + v as shortcut.
  • vector_paste_vertical which will paste delimited data as a vertically formatted vector definition.
    • Recommend Ctrl + Shift + v as shortcut.
    • Example output:
c("Mint",
  "Fedora",
  "Debian",
  "Ubuntu",
  "OpenSUSE")
  • df_paste which pastes a table on the clipboard as a standard data.frame definition rather than a tribble call. This has certain advantages in the context of reproducible examples and educational posts. Many thanks to Jonathan Carroll for getting this rolling and coding the bulk of the feature.
    • Recommend Ctrl + Alt + Shift + d as shortcut.
  • dt_paste which is the same as df_paste, but for data.table.

Massaging data in source

There are two Addins that can help with creating and aligning data in your editor:

  • Fiddle Selection will perform magic on a selection. It can be used to:

    • Turn raw data delimited by any combination of commas, spaces, and newlines into a c() expression
    • Pivot a c() expr between horizontal and vertical layout.
    • Reflow messy tribble() and data.frame() exprs.
    • Recommend Ctrl + Shift + f as shortcut.
  • Toggle Vector Quotes will toggle a c() expr between all elements wrapped in "" and all bare unquoted form. Handy in combination with above to save mucho keystrokes.

    • Recommend Ctrl + Shift + q as shortcut.
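Here is a minimal base R sketch of the first Fiddle Selection behaviour: it turns raw text delimited by any mix of commas, spaces and newlines into a c() expression. `raw_to_c_expr()` is a hypothetical name and not datapasta's API. The sketch assumes that individual items contain no spaces or commas.

```r
# Hypothetical helper (not datapasta's API): split raw text on any run of
# commas and whitespace, then render the pieces as a quoted c() expression.
raw_to_c_expr <- function(text) {
  items <- strsplit(trimws(text), "[,[:space:]]+")[[1]]
  # A leading comma would leave an empty first piece; drop empties
  items <- items[nzchar(items)]
  # encodeString() adds the quotes and escapes any quotes or backslashes
  paste0("c(", paste(encodeString(items, quote = '"'), collapse = ", "), ")")
}

raw_to_c_expr("Mint Fedora Debian Ubuntu OpenSUSE")
raw_to_c_expr("Mint,\nFedora, Debian\nUbuntu OpenSUSE")
```

Treating spaces as delimiters has a trade-off: items that legitimately contain spaces get split apart.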

Getting Data out of an R session

There are two R functions available that accept R objects and output formatted text for pasting to a reprex or other application:

  • dpasta accepts tibbles, data.frames, and vectors. Data is output in a format that matches the input class. Formatted text is pasted at the cursor.

  • dmdclip accepts the same inputs as dpasta but places the formatted text on the clipboard, preceded by 4 spaces so that it can be pasted as a preformatted block to GitHub, Stack Overflow, etc.

Use with other editors

The only hard dependency of datapasta is readr for type guessing. All the above *paste functions can be called directly instead of as an addin, and will fall back to console output if the rstudioapi is not available.

On systems without access to the clipboard (or without clipr installed), datapasta can still be used to output R objects from an R session. dpasta is probably the only function you care about in this scenario.

Custom Installation

datapasta imports clipr and rstudioapi to make installation smooth and easy for most users. If you wish to avoid installing rstudioapi because you will never use it, you can use:

  • install.packages("datapasta", dependencies = "Depends").
  • Followed by install.packages("clipr") to enable clipboard features.

Pitfalls

  • tribble_paste works well with CSVs, Excel files, and HTML tables, but is currently brittle with respect to irregular table structures like merged cells or multi-line column headings. For some reason Wikipedia seems chock full of these. :(
  • Quoted CSV data, where the quoted fields contain commas, will not be parsed correctly.
  • Nested list columns have limited support with tribble_paste()/dpasta(). Nested lists of length 1 fail unless all are length 1 - It's complicated. You still get some output so it might be viable to fix and reflow with Fiddle Selection. Tread with caution.

Prior art

This package is made possible by mdlincoln's clipr, and Hadley's packages tibble and readr (for data-type guessing). I especially appreciate clipr's thoughtful approach to the clipboard on Linux, which pretty much every other R clipboard package just nope'd out on.

Future developments

I am interested in expanding the types of objects supported by output functions such as dpasta. I would also like Fiddle Selection to eventually pivot function calls and named vectors. Feel free to contribute your ideas to the open issues.

Bonus

0 to datapasta in 64 seconds via a video vignette:

Datapasta in 64 seconds

datapasta's People

Contributors

fishgal64, gadenbuie, harrismcgehee, jonocarroll, markdly, mdlincoln, milesmcbain, njtierney, sharlagelfand, sowla, wkapga


datapasta's Issues

Does datapasta work with RStudio Server?

Hi
This is cool. However, at work we use RStudio Server. When I try pasting with RStudio Server I just get the following:

Clipboard is not available. Is xsel or xclip installed? Is DISPLAY set?

Is this a technical limitation with remote execution? Will it only work on a local desktop?

Cheers

"invalid 'times' value" error on factor column?

It seems like datapasta is attempting to treat factors as Date times in a normal dataframe?
This is using datapasta 2.0.0

a1 = c("B1","B2","B3","B4","B5")
a2 = c("IT,GE,FB,AI","GE,AI","FB,IT,AI","GE,IT,FB","AI")
a12 = data.frame(a1,a2)
datapasta::tribble_paste(a12)
Error in strrep(" ", char_length - nchar(char_vec)) : 
  invalid 'times' value

Traceback:

> traceback()
12: strrep(" ", char_length - nchar(char_vec))
11: paste0(strrep(" ", char_length - nchar(char_vec)), char_vec)
10: pad_to(render_type(char_vec, char_type), char_length)
9: (function (char_vec, char_type, char_length) 
   {
       pad_to(render_type(char_vec, char_type), char_length)
   })(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]])
8: mapply(render_type_pad_to, col, input_table_types, col_widths)
7: paste0(mapply(render_type_pad_to, col, input_table_types, col_widths), 
       ",")
6: paste0(paste0(mapply(render_type_pad_to, col, input_table_types, 
       col_widths), ","), collapse = " ")
5: paste0(strrep(" ", oc$indent_context + oc$nspc), paste0(paste0(mapply(render_type_pad_to, 
       col, input_table_types, col_widths), ","), collapse = " "), 
       "\n", collapse = "")
4: FUN(X[[i]], ...)
3: lapply(X = as.data.frame(t(input_table), stringsAsFactors = FALSE), 
       FUN = function(col) {
           paste0(strrep(" ", oc$indent_context + oc$nspc), paste0(paste0(mapply(render_type_pad_to, 
               col, input_table_types, col_widths), ","), collapse = " "), 
               "\n", collapse = "")
       })
2: tribble_construct(input_table, oc = output_context)
1: datapasta::tribble_paste(a12)

Tribble output for list columns with length(x) = 1 values

I noticed that the tribble call is incorrectly formatted when a list column contains values with a length of one:

library(tidyverse)  
library(datapasta)

tibble_with_list <- 
    tibble(ID = 1:5) %>% 
    mutate(LIST = map(ID, ~rep(LETTERS[.x], times = .x)))

tibble_with_list
#> # A tibble: 5 x 2
#>      ID LIST     
#>   <int> <list>   
#> 1     1 <chr [1]>
#> 2     2 <chr [2]>
#> 3     3 <chr [3]>
#> 4     4 <chr [4]>
#> 5     5 <chr [5]>

dpasta(tibble_with_list)
#> tibble::tribble(
#>   ~ID,                      ~LIST,
#>    1L,                          A,
#>    2L,                c("B", "B"),
#>    3L,           c("C", "C", "C"),
#>    4L,      c("D", "D", "D", "D"),
#>    5L, c("E", "E", "E", "E", "E")
#>   )

tibble::tribble(
  ~ID,                      ~LIST,
   1L,                          A,
   2L,                c("B", "B"),
   3L,           c("C", "C", "C"),
   4L,      c("D", "D", "D", "D"),
   5L, c("E", "E", "E", "E", "E")
  )
#> Error in extract_frame_data_from_dots(...): object 'A' not found
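One possible direction for this bug is to deparse each list-column cell on its own. A length-1 character value is then emitted as "A" rather than a bare A. Below is a base R sketch; `render_list_cells()` is hypothetical. Note that tibble::tribble() only builds a list column when at least one cell is not a scalar. A column whose cells are all length 1 would therefore still come back atomic, which is the "it's complicated" case from the README pitfalls.

```r
# Hypothetical sketch: deparse every list-column cell on its own, so a
# length-1 character value is written as "A" rather than a bare A.
render_list_cells <- function(cells) {
  vapply(cells, function(cell) paste(deparse(cell), collapse = " "),
         character(1))
}

cells <- lapply(1:3, function(i) rep(LETTERS[i], times = i))
rendered <- render_list_cells(cells)
rendered

# Every rendered cell parses back to the original value
all(mapply(function(txt, value) identical(eval(parse(text = txt)), value),
           rendered, cells))
```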

df_paste preserves factors but drops unseen levels

example:

head(iris) %>% df_paste()
data.frame(
   Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4),
    Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9),
   Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7),
    Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.4),
        Species = as.factor(c("setosa", "setosa", "setosa", "setosa", "setosa",
                              "setosa"))
)

Part of me is happy with this since unused factor levels are usually annoying. However this may be unexpected for some people.
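If keeping unused levels were preferred, one option is to emit the level set explicitly. Below is a base R sketch with a hypothetical `factor_to_call()` helper. It does not handle ordered factors or contrasts.

```r
# Hypothetical sketch: write a factor as a factor() call with explicit
# levels, so levels absent from the pasted rows are kept.
factor_to_call <- function(f) {
  one_line <- function(x) paste(deparse(x, width.cutoff = 500L), collapse = " ")
  paste0("factor(", one_line(as.character(f)), ", levels = ", one_line(levels(f)), ")")
}

species <- head(iris)$Species
txt <- factor_to_call(species)
txt
# TRUE: versicolor and virginica survive the round trip
identical(eval(parse(text = txt)), species)
```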

Error: strrep invalid "times" value

The following data frame can not be pasted as tribble:

df <- data.frame(stringsAsFactors=FALSE,
          X1 = c("b", "c", "C", "g", "g", "G", "L", "L", "L", "L", "I", "I",
                 "I", "I", "l", "l", "l", "l", "O", "S", "V", "Z", "a", "d",
                 "g", "m", "A", "A", "A", "W", "M", "M", "M", "M", "A", "B", "B",
                 "D", "D", "G", "H", "H", "K", "K", "L", "L", "N", "N", "V"),
          X2 = c("6", "(", NA, "q", "9", "6", "I", "l", "1", "|", "l", "1",
                 "|", "L", "I", "L", "1", "|", "0", "5", "U", "2", "ci", "cl",
                 "cj", "rn", "fi", "fl", "fI", "VV", "IVI", "lvl", "Ivl", "lvI",
                 "/\\", "l3", "I3", "I)", "l)", "", "I-I", "l-l", "l<", "l{",
                 "l_", "I_", "l\\l", "I\\I", "\\/")
)

datapasta::tribble_paste(df)
#> Error in strrep(" ", char_length - nchar(char_vec)) : 
#>   invalid 'times' value

Interestingly enough, vector_paste_vertical works on both X1 and X2.
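The traceback shows strrep() receiving a negative count. That means a rendered cell ended up wider than the width computed for its column. The mismatch may come from measuring the raw values (factors, NA, strings with backslashes) but padding their quoted and escaped forms. Assuming that is the cause, a defensive sketch renders first and measures afterwards. `pad_column()` is hypothetical and is not datapasta's internal pad_to().

```r
# Hypothetical sketch (not datapasta's pad_to()): render cells first, then
# measure widths on the rendered text, so padding never goes negative.
pad_column <- function(x) {
  x <- as.character(x)  # factors -> their labels
  rendered <- ifelse(is.na(x), "NA", encodeString(x, quote = '"'))
  width <- max(nchar(rendered))
  # pmax() is a belt-and-braces guard against a negative count
  paste0(strrep(" ", pmax(0L, width - nchar(rendered))), rendered)
}

pad_column(c("b", NA, "\\/", ""))
pad_column(factor(c("IT,GE,FB,AI", "GE,AI")))
```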

tibble as a dependency

I know you don't actually depend on it for this code to execute, but for the output to be usable it would be nice to have tibble loaded and tribble() available. Maybe Suggests?

Inline replacement of rstudio selection when using addins.

The idea is that you could highlight an expression or variable and have its value converted to its string literal form inline, replacing the original selection.

Need to consider the ramifications for the more esoteric classes, e.g. nested lists potentially containing dates/times, S4 classes, etc.

One cool use case for this that I would use all the time is to pivot vectors from horizontal layout, e.g. c("a", "b", "c"), to vertical, e.g.:

c("a",
  "b",
  "c")

And vice-versa. Although now that I write that, I immediately want it for function arguments too, which will look quite different. Maybe this is a different package.
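The horizontal/vertical pivot for character vectors can be sketched in a few lines of base R. `format_c_call()` is a hypothetical helper, and it only handles character vectors, not function arguments.

```r
# Hypothetical helper: lay out a character vector as a c() call, either on
# one line or with one element per line (two-space continuation indent).
format_c_call <- function(x, vertical = FALSE) {
  items <- encodeString(x, quote = '"')
  sep <- if (vertical) ",\n  " else ", "
  paste0("c(", paste(items, collapse = sep), ")")
}

vertical_txt <- format_c_call(c("a", "b", "c"), vertical = TRUE)
cat(vertical_txt, "\n")
# Pivot back: parse the expression, then re-render it horizontally
format_c_call(eval(parse(text = vertical_txt)))
```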

Handle columns that do not start with Latin characters

Right now tribble_paste() and df_paste() can't handle columns that start with non-Latin characters; e.g. they produce

library(datapasta)
tribble_paste()
tibble::tribble(
            ~!!, ~2015, ~%, ~?a,
             1L,   "b", 3L, "D"
            )

df_paste()
data.frame(stringsAsFactors=FALSE,
          !! = c(1L),
        2015 = c("b"),
           % = c(3L),
          ?a = c("D")
)

but if you actually try to use those, it's not super successful: you get errors in both cases because columns that don't start with Latin characters need to be surrounded by backticks (I can't even run a reprex for it!).

If you think it's a good idea to add this in, I'm happy to take a stab at it (ok, I mean, I already started). I feel fairly confident in the tribble case since they can actually handle columns that don't start with Latin characters (by surrounding with ``), but less sure about the data.frame version since they can't. Here's my thought of how it'd look:

tribble_paste()
tibble::tribble(
                  ~`!!`, ~`2015`, ~`%`, ~`?a`,
                     1L,     "b",   3L,   "D"
                  )
#> # A tibble: 1 x 4
#>    `!!` `2015`   `%` `?a` 
#>   <int> <chr>  <int> <chr>
#> 1     1 b          3 D

df_paste()
data.frame(stringsAsFactors=FALSE,
                                          `!!` = c(1L),
                                        `2015` = c("b"),
                                           `%` = c(3L),
                                          `?a` = c("D"))
#>   X.. X2015 X. X.a
#> 1   1     b  3   D

For data.frame, the numeric case is OK (the name just gets an X prefix), but non-alphanumeric characters are replaced with ".". I think having them start with X is fine, but the "." business probably is not.

I'd be interested to hear your thoughts on a workaround or implementation for this data.frame case -- I use tribble_paste() a ton so selfishly only really care about that one, but the extension to df_paste() is natural. Maybe it could throw an informative error instead of letting you try to create the data.frame in the first example and then forcing you to deal with whatever regular R error results.

again, I'm happy to actually implement it (yay tidy tools skills 🎉)
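Both halves can be sketched in base R. For tribble(), non-syntactic names can be detected by comparing them against make.names() and then wrapped in backticks. For data.frame(), the existing `check.names = FALSE` argument keeps names verbatim instead of mangling them to X.., X2015 and so on. `quote_name()` is hypothetical and does not handle names that themselves contain backticks.

```r
# Hypothetical sketch: backtick-quote names that make.names() would change
# (e.g. "!!", "2015", "%", "?a", or reserved words such as "if").
quote_name <- function(nm) {
  ifelse(make.names(nm) == nm, nm, paste0("`", nm, "`"))
}

nms <- c("!!", "2015", "%", "?a", "ok_name")
quote_name(nms)

# check.names = FALSE stops data.frame() from rewriting the names
df <- data.frame(`!!` = 1L, `2015` = "b", `%` = 3L, `?a` = "D",
                 check.names = FALSE, stringsAsFactors = FALSE)
names(df)
```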

X11 support note

For Linux users, clipr will require either xclip or xsel to be installed, external to R. datapasta inherits this dependency, so you may wish to make a note of it in the README. clipr does throw its own error message noting this, but it can't hurt to mention it up front.

Error when pasting from Excel, when there are empty cells in the last column

I noticed that when you're trying to paste data from Excel, sometimes the "Text could not be parsed as table." error appears. This only happens when the data includes empty cells in the last column. (If there are empty cells in any other columns, it works well, with the empty cells converted to NAs.)

For example copy-pasting this table from Excel will give the error:

A B C
1 5 9
2 6  
3 7 10
4 8 11

The problem appears to be only from Excel. For instance if you copy-paste the above table from the browser, it works well. In Excel you can prevent the problem if you put a space instead of the empty cell, but obviously, ideally, it should work with the empty cell as well.
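One way to tolerate the missing cell rests on an assumption about the clipboard. Suppose Excel places the selection on the clipboard as tab-separated text, and the empty trailing cell shows up either as a trailing tab or not at all. In that case base R's read.delim() pads short rows (`fill = TRUE` is its default) and reads blank numeric fields as NA. The two strings below are assumed clipboard contents, not captured output.

```r
# Assumed clipboard contents for the table above (tab-separated).
# Row "2 6" has an empty C cell: kept as a trailing tab in one case,
# dropped entirely in the other.
with_tab    <- "A\tB\tC\n1\t5\t9\n2\t6\t\n3\t7\t10\n4\t8\t11"
without_tab <- "A\tB\tC\n1\t5\t9\n2\t6\n3\t7\t10\n4\t8\t11"

# read.delim() uses sep = "\t" and fill = TRUE by default,
# so both variants give NA for the missing cell
read.delim(text = with_tab)$C
read.delim(text = without_tab)$C
```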

Escape backslashes

Current datapasta has issues with text containing backslashes.

Copying
\\my-server\DATA\libraries
and pasting with "paste as vector" gives
c("\\my-server\DATA\libraries")
which on execution gives
Error: '\D' is an unrecognized escape in character string starting ""\\my-server\D"

I think it can be fixed by using deparse().

By the way, would you consider having a length-one array paste as a scalar instead of vector? Or is this against your design ideas?

i.e. such that copying bcdef and pasting would result in "bcdef" instead of c("bcdef").
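Both points can be checked in base R. deparse() escapes backslashes, so the literal it produces is valid R. For a length-one character vector it also returns a plain quoted string rather than a c() call, which gives the scalar behaviour asked about. The path is the one from the report.

```r
# The path as it sits on the clipboard: \\my-server\DATA\libraries
path <- "\\\\my-server\\DATA\\libraries"
cat(path, "\n")

# Wrapping the raw text in quotes, as in the report, gives code that won't parse
naive <- paste0('c("', path, '")')
inherits(try(parse(text = naive), silent = TRUE), "try-error")  # TRUE

# deparse() doubles each backslash, producing a valid literal; for a
# length-one vector it returns a plain quoted string rather than c(...)
safe <- deparse(path)
cat(safe, "\n")
identical(eval(parse(text = safe)), path)  # TRUE
```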

FR: Add data.table support

Following a discussion on RStudio Community,

a datatable_paste and datatable_construct could be a great addition to complement df_paste / df_construct and tribble_paste / tribble_construct.

Number of rows argument in tribble_paste

Working with beginners, it could be useful to include an argument for the number of rows to output, so you could keep only 5 or 10 rows of a larger dataset when reproducing an issue. It could default to the total number of rows in the data frame?

Read tables from Stack Overflow

People in Stack Overflow often paste the output of the print() from R to provide an example data frame.
This results in something like this:

to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356

When pasting this inside quotes in readr::read_table, it's OK:

readr::read_table("to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356")

Yet, when using the addin "Paste as tribble", it results in:

tibble::tribble(
   ~to..RealAge,
   "513 59.608",
    "513 84.18",
    "0   85.23",
   "119 74.764",
   "116 65.356"
  )

Is this a feature you want to add? Is it easy to do? Do you want me to try to make a PR?
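For whitespace-aligned print() output like this, base R's read.table() already parses the text using its default whitespace separator. Here is a sketch, assuming no values contain spaces:

```r
so_text <- "to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356"

# sep = "" (the default) splits on any run of whitespace
df <- read.table(text = so_text, header = TRUE)
str(df)
```

A printed frame may include row names. In that case each data line has one more field than the header, and read.table() uses the first column as row names.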

Handle empty lines before or after table without falling back to no separator.

Reported by @thijsfijen on Twitter.

In the case of an empty line immediately before or after the table, guessing the separator fails. Add test and fix for:


X	Location	Min	Max
Partly cloudy.	Brisbane	19	29
Partly cloudy.	Brisbane Airport	18	27
Possible shower.	Beaudesert	15	30
Partly cloudy.	Chermside	17	29
Shower or two. Possible storm.	Gatton	15	32
Possible shower.	Ipswich	15	30
Partly cloudy.	Logan Central	18	29
Mostly sunny.	Manly	20	26
Partly cloudy.	Mount Gravatt	17	28
Possible shower.	Oxley	17	30
Partly cloudy.	Redcliffe	19	27
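Below is a base R sketch of separator guessing that discards blank lines before checking field counts. `guess_sep()` is an assumption for illustration, not datapasta's guesser. Its candidate list (tab, comma, pipe, semicolon) follows the delimiters listed in the README.

```r
# Hypothetical separator guesser: ignore blank lines, then pick the first
# candidate that splits every remaining line into the same number (> 1) of fields.
guess_sep <- function(text, candidates = c("\t", ",", "|", ";")) {
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]
  lines <- lines[nzchar(trimws(lines))]
  for (sep in candidates) {
    counts <- lengths(strsplit(lines, sep, fixed = TRUE))
    if (all(counts > 1L) && length(unique(counts)) == 1L) return(sep)
  }
  NA_character_
}

# Leading and trailing empty lines, as in the report
weather <- "\nX\tLocation\tMin\tMax\nPartly cloudy.\tBrisbane\t19\t29\nPossible shower.\tBeaudesert\t15\t30\n\n"
guess_sep(weather)
```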

Vignette Example: Pasting a list as a horizontal vector

Hello,

I'm using datapasta 3.0.0 and I found an inconsistency while going through the package vignette.

The section "Pasting a list as a horizontal vector with vector_paste()" states that the "paste as vector" addin allows going from this:

Mint Fedora Debian Ubuntu OpenSUSE

to this:

c("Mint", "Fedora", "Debian", "Ubuntu", "OpenSUSE")

But when I select the space delimited text and use the "paste as vector" addin, I get this:

c("Mint Fedora Debian Ubuntu OpenSUSE")

Which is a vector, but not the one I was expecting. I'm not sure if this is the intended behavior. Text in the example doesn't say that spaces are valid separators, so maybe this is expected behavior and the example I referenced needs to be removed? The other 2 examples (separated by commas and newlines) paste into R as expected.

Peter

Line breaks in column headers not handled well

Copying "Double Dissolution Triggers" table here results in

tribble(
  ~Second.rejection.by,
  "the Senate	Bill",
  "18 June 2014	Clean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015	Fair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
  "18 April 2016	Building and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
  "18 April 2016	Building and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)

rather than

tribble(
  ~`Second rejection by the Senate`, ~Bill,
  "18 June 2014", "Clean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015", "Fair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
  "18 April 2016", "Building and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
  "18 April 2016", "Building and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)

Guess the presence/absence of header row for tribble_paste()

I saw someone using datapasta to make quick plots from selected pieces of an Excel workbook. It was a little annoying because they were just grabbing bare blocks of data which had the first row converted to a header when they pasted. They had to un-headerise the first row and make a new dummy one.

I reckon we can guess the first row is not a header if readr guesses types identically to the rest of the dataframe. Would fail in the special case of ALL character data. But, maybe some analysis of factors can avoid some cases of that too.
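The heuristic could look like this in base R, using type.convert() in place of readr for the type guess. `first_row_is_header()` is hypothetical. Integer and double are treated as the same type. An all-character body gives no signal, so the sketch falls back to assuming a header.

```r
# Hypothetical heuristic: the first row is treated as data (no header) when
# its guessed type matches the guessed type of the rest of every column.
first_row_is_header <- function(rows) {
  cells <- do.call(rbind, rows)  # character matrix, one row per input line
  guess <- function(x) {
    cls <- class(type.convert(x, as.is = TRUE))[1]
    if (cls == "integer") "numeric" else cls  # don't split 1 vs 1.5
  }
  first_types <- apply(cells[1, , drop = FALSE], 2, guess)
  body_types  <- apply(cells[-1, , drop = FALSE], 2, guess)
  # An all-character body gives no signal, so keep the usual header default
  if (all(body_types == "character")) return(TRUE)
  any(first_types != body_types)
}

first_row_is_header(list(c("1", "5"), c("2", "6"), c("3", "7")))  # FALSE
first_row_is_header(list(c("x", "y"), c("2", "6"), c("3", "7")))  # TRUE
```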

expression must be a one-length vector?

I'm trying to clean up an output from dput, and I'm getting an odd error with datapasta:

dframe <- structure(list(
  y = c(-0.551803287760631, -1.30494019324738, 0.00821236626893252,
        0.638916511414093, -0.816805971651003, 1.12037288852287),
  Reg_Date = structure(c(1420217760, 1420217760, 1420217880, 1420217880,
                         1420217880, 1420217880),
                       class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  Del_Date = structure(c(NA, NA, 1468065900, 1468065900, 1468065900, 1468065900),
                       class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  days = c(1042L, 1042L, 554L, 554L, 554L, 554L),
  Start_Date = structure(c(1420217880, 1420217880, 1420218180, 1420218180,
                           1420218180, 1420218180),
                         class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  Stop_Date = structure(c(NA, NA, 1468065900, 1468065900, 1468065900, 1468065900),
                        class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  group = c("A", "B", "C", "D", "E", "F")),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame"),
  .Names = c("y", "Reg_Date", "Del_Date", "days", "Start_Date", "Stop_Date", "group"))
datapasta::tribble_paste(dframe)
#> Warning in as.POSIXlt.POSIXct(x, tz): unknown timezone 'zone/tz/2017c.1.0/
#> zoneinfo/America/Chicago'
#> Error in switch(df_col_type, integer = 1, character = 2 + length(gregexpr(pattern = "(\"|')", : EXPR must be a length 1 vector

Even this example fails, which seems odd

datapasta::tribble_paste(iris)
#> Error in strrep(" ", char_length - nchar(char_vec)): invalid 'times' value

Both examples are on version 2.0.0

Handle quoted CSVs that contain commas in quotes.

The primitive parsing strategy does not currently respect things like:

name, age
"claus, santa",1746

Right now I do not have an easy way to handle it, since the delimiter has to be guessed.

Assuming the guessing algorithm can be tweaked to accurately guess "," as the separator, a switch could be used to call read.csv() instead of read.table(), which would handle the quotes.
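Once "," is known to be the separator, base R's read.csv() already honours double-quoted fields. Here is a sketch using the example above:

```r
csv_text <- 'name, age
"claus, santa",1746'

# Naive splitting on "," cuts the quoted field in two: 2 fields vs 3
lengths(strsplit(strsplit(csv_text, "\n", fixed = TRUE)[[1]], ",", fixed = TRUE))

# read.csv() respects the double quotes, giving one row with two columns
df <- read.csv(text = csv_text, stringsAsFactors = FALSE)
df[[1]]
df[[2]]
```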

Emacs front end for datapasta

Is this possible? From the manual and my poking around, it didn't seem to be. But maybe I missed something?

If it is not possible to use these functions outside RStudio, could I ask this as a feature request please?

Thank you!

_format() versions of _paste() functions

To help out non-RStudio users, output the formatted data structures to the clipboard while still returning the formatted output, as the _paste() functions do.

Pasting from Excel to tribble (locale Polish) decimal point ",", system delimiter ";"

Pasting from Excel to tribble (locale Polish: decimal point ",", system delimiter ";" ) turns the numeric column into text:
tribble(
~A, ~B, ~C, ~D,
3, "7,4", 5, 5,
5, "9", 8, 5,
10, "9", 3, 10,
2, "7", 9, 5,
10, "7", 2, 7,
10, "10", 2, 10,
1, "7", 4, 9
)
Can the addin be modified either to accept some settings or to read them from the system?
Great plugin! Big thanks for addressing copy-paste functionality!
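For reference, base R can already parse decimal-comma text. read.csv2() defaults to `sep = ";"` and `dec = ","`, and read.delim2() does the same for tab-separated text. Below is a sketch on a few rows of the data above; the clipboard text is an assumption.

```r
# Assumed clipboard text: ";"-separated with decimal commas (Polish locale)
pl_text <- "A;B;C;D
3;7,4;5;5
5;9;8;5
10;9;3;10"

# read.csv2() defaults to sep = ";" and dec = ","
df <- read.csv2(text = pl_text)
df$B

# read.delim2() is the tab-separated counterpart with dec = ","
read.delim2(text = gsub(";", "\t", pl_text, fixed = TRUE))$B
```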

pasting into a dataframe WITHOUT printing the tibble

Hello there! This is a really nice package. I wonder if there is a way to

  • copy data from excel
  • paste directly into a tibble

without showing the tibble in the R script. My copy-pasted data is pretty big and the resulting tibble is very large. I just want to store it directly in a proper tibble.

Can datapasta do that?
Thanks!

Odd string encoding?

This could be more of a base R issue, but even when I take data scraped with rvest and run datapasta::tribble_paste(foo4) to get a pasteable tibble, I get some really odd text with red dots on it in the value column.

I've tried it with and without specifying encoding in read_html (from rvest package), not sure what is going on here.
I've attached a photo for you so you can see:


sample_data <- tibble::tibble(
  id = c(390639,99472,361258,360716),
  name = c("pollyanna-eleanor-with-vanilla-beans","brickstone-apa","penrose-taproom-ipa","revolution-rev-pils"),
  link = c("https://www.ratebeer.com/beer/pollyanna-eleanor-with-vanilla-beans/390639/",
           "https://www.ratebeer.com/beer/brickstone-apa/99472/",
           "https://www.ratebeer.com/beer/penrose-taproom-ipa/361258/",
           "https://www.ratebeer.com/beer/revolution-rev-pils/360716/"
  )
)
##Right off the gate the URL's have 'unknown encoding'
Encoding(sample_data$link)
[1] "unknown" "unknown" "unknown" "unknown"

get_brewer_stats1 <- function(x){
  read_html(x) %>%
    html_nodes(xpath = '//*[@id="container"]/div[2]/div[2]/div[2]') %>%
    html_text()
}

foo4 <- sample_data %>%
  mutate(., value = map_chr(sample_data$link, get_brewer_stats1))
##in the terminal you can see the red blocks, and when you copy/paste...maybe
##because no encoding is specified when read_html
datapasta::tribble_paste(foo4)
pasted_data <- tibble::tribble(
     ~id,                                   ~name,                                                                         ~link,                                                                                           ~value,
  390639,  "pollyanna-eleanor-with-vanilla-beans",  "https://www.ratebeer.com/beer/pollyanna-eleanor-with-vanilla-beans/390639/",  "RATINGS: 4   MEAN: 3.83/5.0   WEIGHTED AVG: 3.39/5   IBU: 35   EST. CALORIES: 204   ABV: 6.8%",
   99472,                        "brickstone-apa",                         "https://www.ratebeer.com/beer/brickstone-apa/99472/",                           "RATINGS: 89   WEIGHTED AVG: 3.64/5   EST. CALORIES: 188   ABV: 6.25%",
  361258,                   "penrose-taproom-ipa",                   "https://www.ratebeer.com/beer/penrose-taproom-ipa/361258/",   "RATINGS: 8   MEAN: 3.7/5.0   WEIGHTED AVG: 3.45/5   IBU: 85   EST. CALORIES: 213   ABV: 7.1%",
  360716,                   "revolution-rev-pils",                   "https://www.ratebeer.com/beer/revolution-rev-pils/360716/",   "RATINGS: 34   MEAN: 3.47/5.0   WEIGHTED AVG: 3.42/5   IBU: 50   EST. CALORIES: 150   ABV: 5%"
  )
#Returns UTF-8
Encoding(pasted_data$value)
[1] "UTF-8" "UTF-8" "UTF-8" "UTF-8"

My session info is below.. Maybe it's because I'm on Debian?

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rlang_0.1.4.9000 selectr_0.3-1    rvest_0.3.2      xml2_1.1.1       clipr_0.4.0.9000 bindrcpp_0.2    
 [7] forcats_0.2.0    stringr_1.2.0    dplyr_0.7.4      purrr_0.2.4      readr_1.1.1      tidyr_0.7.2     
[13] tibble_1.3.4     ggplot2_2.2.1    tidyverse_1.2.0  magrittr_1.5    

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.3 reshape2_1.4.2   haven_1.1.0      lattice_0.20-35  colorspace_1.3-2 htmltools_0.3.6 
 [7] yaml_2.1.14      XML_3.98-1.9     foreign_0.8-69   glue_1.2.0.9000  withr_2.1.0.9000 modelr_0.1.1    
[13] readxl_1.0.0     bindr_0.1        plyr_1.8.4       munsell_0.4.3    gtable_0.2.0     cellranger_1.1.0
[19] devtools_1.13.4  evaluate_0.10.1  psych_1.7.8      memoise_1.1.0    knitr_1.17       callr_1.0.0.9000
[25] curl_3.0         parallel_3.4.2   broom_0.4.2      Rcpp_0.12.13     backports_1.1.1  scales_0.5.0    
[31] debugme_1.1.0    jsonlite_1.5     mnormt_1.5-5     hms_0.3          digest_0.6.12    stringi_1.1.5   
[37] processx_2.0.0.1 rprojroot_1.2    grid_3.4.2       cli_1.0.0        tools_3.4.2      lazyeval_0.2.1  
[43] crayon_1.3.4     whisker_0.3-2    pkgconfig_2.0.1  datapasta_2.0.0  reprex_0.1.1     lubridate_1.7.1 
[49] rmarkdown_1.7    assertthat_0.2.0 httr_1.3.1       rstudioapi_0.7   R6_2.2.2         nlme_3.1-131    
[55] compiler_3.4.2  

Please let me know if there is anything else you need from me. I'm running this inside a Docker container, so if you need access to it, I can definitely supply it. Thanks so much for your package.

Lines ending all in commas create separator guess failure

Data like this

a,b,c
1,2,
3,4,

Will cause sep guesser to incorrectly avoid choosing ",". When splitting and counting the columns the lines that end in a comma are counted one less, leading to a mismatch between header and row length.

The separator is only guessed on a sample, so the solution could be to inject NA at the end of rows that terminate in a comma. Injecting whitespace would also work.
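Here is a base R sketch of the proposed fix. strsplit() drops a trailing empty field, which is why those rows come up one short. Appending NA to lines that end with the separator restores the count. `pad_trailing_sep()` is hypothetical.

```r
lines <- c("a,b,c", "1,2,", "3,4,")

# strsplit() drops a trailing empty field, so these rows look one column short
lengths(strsplit(lines, ",", fixed = TRUE))   # 3 2 2

# Hypothetical fix: append NA to any line that ends with the separator
pad_trailing_sep <- function(lines, sep = ",") {
  ends_with_sep <- endsWith(lines, sep)
  lines[ends_with_sep] <- paste0(lines[ends_with_sep], "NA")
  lines
}
lengths(strsplit(pad_trailing_sep(lines), ",", fixed = TRUE))   # 3 3 3
```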

What to do about types?

I think automatic type guessing is a potentially destructive process and shouldn't be the default. Some way to enable it might be convenient for numeric data.

Some ways to provide this facility:

  • Use some kind of option to turn on type guessing, which is set via a function call.
  • Provide a helper method for character tibbles that can guess the type of column and mutate it.

Alternately to guessing type, we could create a helper function that implements a provided type spec, kind of like the col classes shorthand from readr.
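Here is a base R sketch of the type-spec option, loosely modelled on readr's compact col_types letters (c, i, d, l). `apply_col_spec()` is a hypothetical helper, not part of datapasta or readr.

```r
# Hypothetical helper (not datapasta or readr API): convert the columns of
# an all-character data frame using one spec letter per column, loosely
# following readr's compact codes: c = character, i = integer,
# d = double, l = logical.
apply_col_spec <- function(df, spec) {
  converters <- list(c = as.character, i = as.integer,
                     d = as.double, l = as.logical)
  codes <- strsplit(spec, "", fixed = TRUE)[[1]]
  stopifnot(length(codes) == ncol(df), all(codes %in% names(converters)))
  for (j in seq_along(df)) {
    df[[j]] <- converters[[codes[j]]](df[[j]])
  }
  df
}

raw <- data.frame(id = c("1", "2"), score = c("1.5", "2"),
                  flag = c("TRUE", "FALSE"), name = c("a", "b"),
                  stringsAsFactors = FALSE)
str(apply_col_spec(raw, "idlc"))
```

Nothing is converted unless the user asks for it, which sidesteps the concern that guessing is destructive.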
