milesmcbain / datapasta

On top of spaghetti, all covered in cheese....

Home Page: https://milesmcbain.github.io/datapasta/

License: Other

R 100.00%

r copypaste excel addin clipboard tibble

datapasta's Introduction

datapasta 3.1.1 'Leave to Simmer'

The Goods

Introducing datapasta

datapasta is about reducing resistance associated with copying and pasting data to and from R. It is a response to the realisation that I often found myself using intermediate programs like Sublime to munge text into suitable formats. Addins and functions in datapasta support a wide variety of input and output situations, so it (probably) "just works". Hopefully tools in this package will remove such intermediate steps and associated frustrations from our data slinging workflows.

Prerequisites

Linux users will need to install either xsel or xclip. These applications provide an interface to X selections (clipboard-like).
- For example: sudo apt-get install xsel - it's 72kb...
Windows and MacOS have nothing extra to do.
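A quick way to check from R whether either utility is on the PATH. This is a minimal base R sketch; the variable names are illustrative and not part of datapasta.

```r
# Sys.which() returns "" for any program it cannot find on the PATH.
clip_tools <- Sys.which(c("xsel", "xclip"))
have_clip_tool <- any(nzchar(clip_tools))

# The check only matters on Linux, so guard the message accordingly.
is_linux <- identical(unname(Sys.info()["sysname"]), "Linux")
if (is_linux && !have_clip_tool) {
  message("Neither xsel nor xclip was found; install one of them first.")
}
```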

Installation

R Universe (preferred)

Install from the R-universe repository:

install.packages(
   "datapasta", 
   repos = c(mm = "https://milesmcbain.r-universe.dev", getOption("repos")))

Set the keyboard shortcuts using Tools -> Addins -> Browse Addins, then click Keyboard Shortcuts...

CRAN (outdated)

For now, no further versions of datapasta will be going to CRAN. There are some known bugs in the CRAN version that have been fixed in 3.1.1.

install.packages("datapasta")

Usage

Use with RStudio

Getting data into source

At the moment this package contains these RStudio addins that paste data to the cursor:

tribble_paste which pastes a table as a nicely formatted call to tibble::tribble()
- Recommend Ctrl + Shift + t as shortcut.
- Table can be delimited with tab, comma, pipe or semicolon.
vector_paste which will paste delimited data as a vector definition, e.g. c("a", "b") etc.
- Recommend Ctrl + Alt + Shift + v as shortcut.
vector_paste_vertical which will paste delimited data as a vertically formatted vector definition.
- Recommend Ctrl + Shift + v as shortcut
- example output:

c("Mint",
  "Fedora",
  "Debian",
  "Ubuntu",
  "OpenSUSE")

df_paste which pastes a table on the clipboard as a standard data.frame definition rather than a tribble call. This has certain advantages in the context of reproducible examples and educational posts. Many thanks to Jonathan Carroll for getting this rolling and coding the bulk of the feature.
- Recommend Ctrl + Alt + Shift + d as shortcut.
dt_paste which is the same as df_paste, but for data.table.

Massaging data in source

There are two Addins that can help with creating and aligning data in your editor:

Fiddle Selection will perform magic on a selection. It can be used to:
- Turn raw data delimited by any combination of commas, spaces, and newlines into a c() expression
- Pivot a c() expr between horizontal and vertical layout.
- Reflow messy tribble() and data.frame() exprs.
- Recommend Ctrl + Shift + f as shortcut.
Toggle Vector Quotes will toggle a c() expr between all elements wrapped in "" and all bare unquoted form. Handy in combination with above to save mucho keystrokes.
- Recommend Ctrl + Shift + q as shortcut.

Getting Data out of an R session

There are two R functions available that accept R objects and output formatted text for pasting to a reprex or other application:

dpasta accepts tibbles, data.frames, and vectors. Data is output in a format that matches the input class. Formatted text is pasted at the cursor.
dmdclip accepts the same inputs as dpasta but places the formatted text on the clipboard, with each line preceded by 4 spaces so that it can be pasted as a preformatted block to Github, Stackoverflow etc.

Use with other editors

The only hard dependency of datapasta is readr for type guessing. All the above *paste functions can be called directly instead of as an addin, and will fall back to console output if the rstudioapi is not available.

On systems without access to the clipboard (or without clipr installed) datapasta can still be used to output R objects from an R session. dpasta is probably the only function you care about in this scenario.

Custom Installation

datapasta imports clipr and rstudioapi so as to make installation smooth and easy for most users. If you wish to avoid installing an rstudioapi you will never use, you can run:

install.packages("datapasta", dependencies = "Depends")

Follow this with install.packages("clipr") to enable clipboard features.

Pitfalls

tribble_paste works well with CSVs, excel files, and html tables, but is currently brittle with respect to irregular table structures like merged cells or multi-line column headings. For some reason Wikipedia seems chock full of these. :(
Quoted CSV data where the quotes contain commas will not be parsed correctly.
Nested list columns have limited support with tribble_paste()/dpasta(). Nested lists of length 1 fail unless all are length 1 - It's complicated. You still get some output so it might be viable to fix and reflow with Fiddle Selection. Tread with caution.

Prior art

This package is made possible by mdlincoln's clipr, and Hadley's packages tibble and readr (for data-type guessing). I especially appreciate clipr's thoughtful approach to the clipboard on Linux, which pretty much every other R clipboard package just nope'd out on.

Future developments

I am interested in expanding the types of objects supported by the output function dpasta. I would also like Fiddle Selection to eventually pivot function calls and named vectors. Feel free to contribute your ideas to the open issues.

Bonus

0 to datapasta in 64 seconds via a video vignette:


datapasta's Issues

Pasting from Excel to tribble (locale Polish) decimal point ",", system delimiter ";"

Pasting from Excel to tribble (locale Polish: decimal point ",", system delimiter ";" ) turns the numeric column into text:
tribble(
~A, ~B, ~C, ~D,
3, "7,4", 5, 5,
5, "9", 8, 5,
10, "9", 3, 10,
2, "7", 9, 5,
10, "7", 2, 7,
10, "10", 2, 10,
1, "7", 4, 9
)
Can the add-in be modified either to accept some settings or read them from the system?
Great plugin! Big thanks for addressing copy-paste functionality!
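For comparison, base R can parse this layout when the separator and decimal mark are given explicitly. The sketch below uses utils::read.table() on a hand-written string that imitates the clipboard contents; it is not how the add-in itself parses data.

```r
# Semicolon-delimited text with a comma as the decimal mark,
# imitating what Excel produces under a Polish locale.
clip_text <- "A;B;C;D\n3;7,4;5;5\n5;9;8;5\n10;9;3;10"

# `sep` and `dec` tell read.table() how to split fields and read decimals.
parsed <- utils::read.table(text = clip_text, sep = ";", dec = ",",
                            header = TRUE)

str(parsed$B)
```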

Determine reason for `dpasta` failure with this dataset

Google drive link to dataset: https://drive.google.com/open?id=0B7688WPR38x2N2J5WkxQQ3hzT2M

pasting into a dataframe WITHOUT printing the tibble

Hello there! This is a really nice package. I wonder if there is a way to

copy data from excel
paste directly into a tibble

without showing the tibble in the R script. Indeed, my copypasted data is pretty big and the resulting tibble is very large. I just want to store it in a proper tibble directly

Can datapasta do that?
Thanks!

Number of rows argument in tribble_paste

When working with beginners, it could be useful to have an argument for the number of rows to output, for example to paste only 5 or 10 rows from a larger dataset when creating an issue about a problem they are having. It could default to the full length of the data frame.

Try catch in tribble_paste catches xsel/xclip not found error on Linux

Instead of a message to install xsel or xclip when absent, the user gets a message that the clipboard failed to parse as a table.

df_paste preserves factors but drops unseen levels

example:

head(iris) %>% df_paste()
data.frame(
   Sepal.Length = c(5.1, 4.9, 4.7, 4.6, 5, 5.4),
    Sepal.Width = c(3.5, 3, 3.2, 3.1, 3.6, 3.9),
   Petal.Length = c(1.4, 1.4, 1.3, 1.5, 1.4, 1.7),
    Petal.Width = c(0.2, 0.2, 0.2, 0.2, 0.2, 0.4),
        Species = as.factor(c("setosa", "setosa", "setosa", "setosa", "setosa",
                              "setosa"))
)

Part of me is happy with this since unused factor levels are usually annoying. However this may be unexpected for some people.
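The difference can be seen in base R alone. The last line shows one way the levels could be kept, by spelling them out in a factor() call; this is only a sketch of the idea, not what df_paste() emits.

```r
original <- head(iris)$Species

# What the pasted definition amounts to: only the observed value survives.
rebuilt <- as.factor(as.character(original))

nlevels(original)
nlevels(rebuilt)

# Passing the levels explicitly keeps the unseen ones.
restored <- factor(as.character(original), levels = levels(original))
```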

Odd string encoding?

This could be more of a base R issue, but even when I get data with rvest and use datapasta::tribble_paste(foo4) to get a pasteable tibble, I get some really odd text with red dots on it in the values column.

I've tried it with and without specifying encoding in read_html (from rvest package), not sure what is going on here.
I've attached a photo for you so you can see:

sample_data <- tibble::tibble(
  id = c(390639,99472,361258,360716),
  name = c("pollyanna-eleanor-with-vanilla-beans","brickstone-apa","penrose-taproom-ipa","revolution-rev-pils"),
  link = c("https://www.ratebeer.com/beer/pollyanna-eleanor-with-vanilla-beans/390639/",
           "https://www.ratebeer.com/beer/brickstone-apa/99472/",
           "https://www.ratebeer.com/beer/penrose-taproom-ipa/361258/",
           "https://www.ratebeer.com/beer/revolution-rev-pils/360716/"
  )
)
##Right off the gate the URL's have 'unknown encoding'
Encoding(sample_data$link)
[1] "unknown" "unknown" "unknown" "unknown"

get_brewer_stats1 <- function(x){
  read_html(x) %>%
    html_nodes(xpath = '//*[@id="container"]/div[2]/div[2]/div[2]') %>%
    html_text()
}

foo4 <- sample_data %>%
  mutate(., value = map_chr(sample_data$link, get_brewer_stats1))
##in the terminal you can see the red blocks, and when you copy/paste...maybe
##because no encoding is specified when read_html
datapasta::tribble_paste(foo4)
pasted_data <- tibble::tribble(
     ~id,                                   ~name,                                                                         ~link,                                                                                           ~value,
  390639,  "pollyanna-eleanor-with-vanilla-beans",  "https://www.ratebeer.com/beer/pollyanna-eleanor-with-vanilla-beans/390639/",  "RATINGS: 4   MEAN: 3.83/5.0   WEIGHTED AVG: 3.39/5   IBU: 35   EST. CALORIES: 204   ABV: 6.8%",
   99472,                        "brickstone-apa",                         "https://www.ratebeer.com/beer/brickstone-apa/99472/",                           "RATINGS: 89   WEIGHTED AVG: 3.64/5   EST. CALORIES: 188   ABV: 6.25%",
  361258,                   "penrose-taproom-ipa",                   "https://www.ratebeer.com/beer/penrose-taproom-ipa/361258/",   "RATINGS: 8   MEAN: 3.7/5.0   WEIGHTED AVG: 3.45/5   IBU: 85   EST. CALORIES: 213   ABV: 7.1%",
  360716,                   "revolution-rev-pils",                   "https://www.ratebeer.com/beer/revolution-rev-pils/360716/",   "RATINGS: 34   MEAN: 3.47/5.0   WEIGHTED AVG: 3.42/5   IBU: 50   EST. CALORIES: 150   ABV: 5%"
  )
#Returns UTF-8
Encoding(pasted_data$value)
[1] "UTF-8" "UTF-8" "UTF-8" "UTF-8"

My session info is below.. Maybe it's because I'm on Debian?

sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rlang_0.1.4.9000 selectr_0.3-1    rvest_0.3.2      xml2_1.1.1       clipr_0.4.0.9000 bindrcpp_0.2    
 [7] forcats_0.2.0    stringr_1.2.0    dplyr_0.7.4      purrr_0.2.4      readr_1.1.1      tidyr_0.7.2     
[13] tibble_1.3.4     ggplot2_2.2.1    tidyverse_1.2.0  magrittr_1.5    

loaded via a namespace (and not attached):
 [1] tidyselect_0.2.3 reshape2_1.4.2   haven_1.1.0      lattice_0.20-35  colorspace_1.3-2 htmltools_0.3.6 
 [7] yaml_2.1.14      XML_3.98-1.9     foreign_0.8-69   glue_1.2.0.9000  withr_2.1.0.9000 modelr_0.1.1    
[13] readxl_1.0.0     bindr_0.1        plyr_1.8.4       munsell_0.4.3    gtable_0.2.0     cellranger_1.1.0
[19] devtools_1.13.4  evaluate_0.10.1  psych_1.7.8      memoise_1.1.0    knitr_1.17       callr_1.0.0.9000
[25] curl_3.0         parallel_3.4.2   broom_0.4.2      Rcpp_0.12.13     backports_1.1.1  scales_0.5.0    
[31] debugme_1.1.0    jsonlite_1.5     mnormt_1.5-5     hms_0.3          digest_0.6.12    stringi_1.1.5   
[37] processx_2.0.0.1 rprojroot_1.2    grid_3.4.2       cli_1.0.0        tools_3.4.2      lazyeval_0.2.1  
[43] crayon_1.3.4     whisker_0.3-2    pkgconfig_2.0.1  datapasta_2.0.0  reprex_0.1.1     lubridate_1.7.1 
[49] rmarkdown_1.7    assertthat_0.2.0 httr_1.3.1       rstudioapi_0.7   R6_2.2.2         nlme_3.1-131    
[55] compiler_3.4.2

Please let me know if there is anything else you need from me. I'm running this inside a docker container, so if you need access to it, I can definitely supply it. Thanks so much for your package.

Use clipr::clipr_available() to test for availability of clipboard

Give specific error message if clipboard not available.
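A minimal sketch of such a check. clipr::clipr_available() is the real clipr function named in the title; the wrapper clipboard_status() and its messages are hypothetical.

```r
# Hypothetical helper: report why the clipboard cannot be used, if it can't.
clipboard_status <- function() {
  if (!requireNamespace("clipr", quietly = TRUE)) {
    return("clipr is not installed, so clipboard features are unavailable.")
  }
  if (!clipr::clipr_available()) {
    return(paste("Clipboard is not available.",
                 "On Linux, check that xsel or xclip is installed",
                 "and that DISPLAY is set."))
  }
  "Clipboard is available."
}

message(clipboard_status())
```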

Unmatched single quotes lead to parsing errors

This data

a,b,c
this,is,testing
now,you're,testing

Will produce a tribble with no rows. Need to investigate further but the solution probably involves escaping ' and potentially ".
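As a point of comparison, and assuming the parser is built on something like read.table(), turning quote handling off lets the apostrophe through as ordinary text. This is a sketch of one possible direction, not the fix that was adopted.

```r
csv_text <- "a,b,c\nthis,is,testing\nnow,you're,testing"

# quote = "" disables quote processing, so ' is treated as a normal character.
parsed <- utils::read.table(text = csv_text, sep = ",", header = TRUE,
                            quote = "", stringsAsFactors = FALSE)

parsed
```

The trade-off is that genuinely quoted fields would then keep their quote characters.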

Error: strrep invalid "times" value

The following data frame can not be pasted as tribble:

df <- data.frame(stringsAsFactors=FALSE,
          X1 = c("b", "c", "C", "g", "g", "G", "L", "L", "L", "L", "I", "I",
                 "I", "I", "l", "l", "l", "l", "O", "S", "V", "Z", "a", "d",
                 "g", "m", "A", "A", "A", "W", "M", "M", "M", "M", "A", "B", "B",
                 "D", "D", "G", "H", "H", "K", "K", "L", "L", "N", "N", "V"),
          X2 = c("6", "(", NA, "q", "9", "6", "I", "l", "1", "|", "l", "1",
                 "|", "L", "I", "L", "1", "|", "0", "5", "U", "2", "ci", "cl",
                 "cj", "rn", "fi", "fl", "fI", "VV", "IVI", "lvl", "Ivl", "lvI",
                 "/\\", "l3", "I3", "I)", "l)", "(¬", "I-I", "l-l", "l<", "l{",
                 "l_", "I_", "l\\l", "I\\I", "\\/")
)

datapasta::tribble_paste(df)
#> Error in strrep(" ", char_length - nchar(char_vec)) : 
#>   invalid 'times' value

interestingly enough vector_paste_vertical works on both X1 and X2
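strrep() raises "invalid 'times' value" when it is given a negative count, so one plausible cause is a cell that is wider than the computed column width. I have not traced the real cause. The padding helper below is a hypothetical sketch that cannot hit that error, because the gap is clamped at zero and NA is rendered as text first.

```r
# Hypothetical helper: left-pad strings to a common width.
pad_left <- function(x, width = NULL) {
  x <- as.character(x)          # also turns factors into their labels
  x[is.na(x)] <- "NA"           # render missing values as the text NA
  if (is.null(width)) width <- max(nchar(x))
  gap <- pmax(0L, width - nchar(x))   # never negative
  paste0(strrep(" ", gap), x)
}

pad_left(c("a", "abc", NA))
pad_left("abc", width = 2)
```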

`could not find function "tribble_format"`

Hello.

Thank you for the package!

I seem to be missing tribble_format? (I am running the latest version with clipr installed, but outside RStudio).

Thank you.

Add test skips for clipboard operations in non-interactive sessions.

Avoids a problem on the CRAN end. Discussed in mdlincoln/clipr#30

Handle empty lines before or after table without falling back to no separator.

Reported by @thijsfijen on Twitter.

In the case of an empty line immediately before or after the table, guessing the separator fails. Add test and fix for:


X	Location	Min	Max
Partly cloudy.	Brisbane	19	29
Partly cloudy.	Brisbane Airport	18	27
Possible shower.	Beaudesert	15	30
Partly cloudy.	Chermside	17	29
Shower or two. Possible storm.	Gatton	15	32
Possible shower.	Ipswich	15	30
Partly cloudy.	Logan Central	18	29
Mostly sunny.	Manly	20	26
Partly cloudy.	Mount Gravatt	17	28
Possible shower.	Oxley	17	30
Partly cloudy.	Redcliffe	19	27

Strip whitespace from natural language lists.

It would be nice if:
Mint, Fedora, Debian, Ubuntu, OpenSUSE
would paste without the whitespace, instead of:
c("Mint", " Fedora", " Debian", " Ubuntu", " OpenSUSE")
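In base R the requested behaviour is a split followed by trimws(). A minimal sketch:

```r
raw <- "Mint, Fedora, Debian, Ubuntu, OpenSUSE"

# Split on commas, then drop the whitespace around each element.
elements <- trimws(strsplit(raw, ",", fixed = TRUE)[[1]])

elements
```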

remove dependency on R > 3.3.0 via backports?

Presumably the dependency on 3.3.0 is for the strrep function, which is available from the backports package. Is there other functionality required?

Investigate using lukas-rokka/readrGuess for delimiter guessing.

Graceful error handling when table parse fails.

A try catch maybe? That kind of structure would also be a way to try different separators.
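A rough sketch of that structure using base R only. The function name parse_with_fallback() and the "more than one column" success test are assumptions for illustration.

```r
# Hypothetical helper: try each separator in turn and return the first parse
# that yields more than one column, or NULL if none does.
parse_with_fallback <- function(text, seps = c("\t", ",", "|", ";")) {
  for (sep in seps) {
    result <- tryCatch(
      utils::read.table(text = text, sep = sep, header = TRUE,
                        stringsAsFactors = FALSE),
      error = function(e) NULL   # swallow the error and try the next one
    )
    if (!is.null(result) && ncol(result) > 1) {
      return(result)
    }
  }
  NULL
}

parsed <- parse_with_fallback("a;b\n1;2\n3;4")
```

Returning NULL lets the caller decide how to report the failure.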

Add function wrapper for inline output reprex hack

Implement a function wrapper for this:
https://twitter.com/MilesMcBain/status/820473995787051010

Add a base64 paste

Based on https://gist.github.com/noamross/0cb3708e72c4f18c5ab747a1609468b7

Make a slick gif.

You know you want to.

Escape backslashes

Current datapasta has issues with text containing backslashes.

Copying
\\my-server\DATA\libraries
and pasting with "paste as vector" gives
c("\\my-server\DATA\libraries")
which on execution gives
Error: '\D' is an unrecognized escape in character string starting ""\\my-server\D"

I think it can be fixed by using deparse().
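A short sketch of what deparse() does with this path, showing that the generated literal parses back to the original string:

```r
# The clipboard text \\my-server\DATA\libraries, written as an R string.
path <- "\\\\my-server\\DATA\\libraries"

# deparse() produces a quoted literal with every backslash escaped.
literal <- deparse(path)
cat(literal, "\n")

# Evaluating the literal gives back the original string.
round_trip <- eval(parse(text = literal))
```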

By the way, would you consider having a length-one array paste as a scalar instead of vector? Or is this against your design ideas?

i.e. such that copying bcdef and pasting would result in "bcdef" instead of c("bcdef").

Handle quoted CSVs that contain commas in quotes.

The primitive parsing strategy does not currently respect things like:

name, age
"claus, santa",1746

Right now I do not have an easy way to handle it, since the delimiter has to be guessed.

Assuming the guessing algorithm can be tweaked to accurately guess "," as the separator, a switch could be used to make a call to read.csv() instead of read.table(), which would handle the quotes.
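A sketch of the read.csv() behaviour this relies on, using the example above as inline text:

```r
csv_text <- 'name, age\n"claus, santa",1746'

# read.csv() respects double quotes, so the embedded comma stays in one field.
parsed <- utils::read.csv(text = csv_text, strip.white = TRUE,
                          stringsAsFactors = FALSE)

parsed
```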

Error when pasting from Excel, when there are empty cells in the last column

I noticed that when you're trying to paste data from Excel, sometimes the "Text could not be parsed as table." error appears. This only happens when the data includes empty cells in the last column. (If there are empty cells in any other columns, it works well, with the empty cells converted to NAs.)

For example copy-pasting this table from Excel will give the error:

A	B	C
1	5	9
2	6
3	7	10
4	8	11

The problem appears to be only from Excel. For instance if you copy-paste the above table from the browser, it works well. In Excel you can prevent the problem if you put a space instead of the empty cell, but obviously, ideally, it should work with the empty cell as well.
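Assuming the clipboard text really is one field short on that row, as the example shows, read.table() can be told to pad short rows with fill = TRUE. This is a sketch of one possible approach, not a description of the eventual fix.

```r
clip_text <- "A\tB\tC\n1\t5\t9\n2\t6\n3\t7\t10\n4\t8\t11"

# fill = TRUE pads rows that have fewer fields than the header with NA.
parsed <- utils::read.table(text = clip_text, sep = "\t", header = TRUE,
                            fill = TRUE)

parsed
```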

Lines ending all in commas create separator guess failure

Data like this

a,b,c
1,2,
3,4,

Will cause sep guesser to incorrectly avoid choosing ",". When splitting and counting the columns the lines that end in a comma are counted one less, leading to a mismatch between header and row length.

The separator is only guessed on a sample, so the solution could be to inject NA at the end of rows that terminate in a comma. Injecting whitespace would also work.
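The miscount is easy to reproduce with strsplit(), which drops a trailing empty field. The sketch below also shows the NA injection idea; the variable names are illustrative.

```r
lines <- c("a,b,c", "1,2,", "3,4,")

# strsplit() discards the empty string after a trailing separator,
# so the data rows appear to have one column fewer than the header.
naive_counts <- lengths(strsplit(lines, ",", fixed = TRUE))

# Injecting NA after a trailing comma restores the expected count.
padded <- sub(",$", ",NA", lines)
fixed_counts <- lengths(strsplit(padded, ",", fixed = TRUE))
```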

Tribble output for list columns with length(x) = 1 values

I noticed that the tribble call is incorrectly formatted when a list column contains values with a length of one:

library(tidyverse)  
library(datapasta)

tibble_with_list <- 
    tibble(ID = 1:5) %>% 
    mutate(LIST = map(ID, ~rep(LETTERS[.x], times = .x)))

tibble_with_list
#> # A tibble: 5 x 2
#>      ID LIST     
#>   <int> <list>   
#> 1     1 <chr [1]>
#> 2     2 <chr [2]>
#> 3     3 <chr [3]>
#> 4     4 <chr [4]>
#> 5     5 <chr [5]>

dpasta(tibble_with_list)
#> tibble::tribble(
#>   ~ID,                      ~LIST,
#>    1L,                          A,
#>    2L,                c("B", "B"),
#>    3L,           c("C", "C", "C"),
#>    4L,      c("D", "D", "D", "D"),
#>    5L, c("E", "E", "E", "E", "E")
#>   )

tibble::tribble(
  ~ID,                      ~LIST,
   1L,                          A,
   2L,                c("B", "B"),
   3L,           c("C", "C", "C"),
   4L,      c("D", "D", "D", "D"),
   5L, c("E", "E", "E", "E", "E")
  )
#> Error in extract_frame_data_from_dots(...): object 'A' not found
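For reference, deparse() renders a length-one character element with its quotes, which is what the first row is missing. This only sketches how each cell could be rendered; whether tribble() then needs the value wrapped in list() is not something the report establishes.

```r
list_col <- list("A", c("B", "B"), c("C", "C", "C"))

# Render each list element as the R code that would recreate it.
cells <- vapply(list_col,
                function(el) paste(deparse(el), collapse = ""),
                character(1))

cat(cells, sep = "\n")
```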

expression must be a one-length vector?

I'm trying to clean up an output from dput, and I'm getting an odd error with datapasta:

dframe <- structure(
  list(
    y = c(-0.551803287760631, -1.30494019324738, 0.00821236626893252,
          0.638916511414093, -0.816805971651003, 1.12037288852287),
    Reg_Date = structure(c(1420217760, 1420217760, 1420217880,
                           1420217880, 1420217880, 1420217880),
                         class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    Del_Date = structure(c(NA, NA, 1468065900,
                           1468065900, 1468065900, 1468065900),
                         class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    days = c(1042L, 1042L, 554L, 554L, 554L, 554L),
    Start_Date = structure(c(1420217880, 1420217880, 1420218180,
                             1420218180, 1420218180, 1420218180),
                           class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    Stop_Date = structure(c(NA, NA, 1468065900,
                            1468065900, 1468065900, 1468065900),
                          class = c("POSIXct", "POSIXt"), tzone = "UTC"),
    group = c("A", "B", "C", "D", "E", "F")
  ),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame"),
  .Names = c("y", "Reg_Date", "Del_Date", "days", "Start_Date", "Stop_Date", "group")
)
datapasta::tribble_paste(dframe)
#> Warning in as.POSIXlt.POSIXct(x, tz): unknown timezone 'zone/tz/2017c.1.0/
#> zoneinfo/America/Chicago'
#> Error in switch(df_col_type, integer = 1, character = 2 + length(gregexpr(pattern = "(\"|')", : EXPR must be a length 1 vector

Even this example fails, which seems odd

datapasta::tribble_paste(iris)
#> Error in strrep(" ", char_length - nchar(char_vec)): invalid 'times' value

Both examples are on version 2.0.0

Inline replacement of rstudio selection when using addins.

The idea is that you could highlight an expression or variable and have its output converted to its string literal form inline, replacing the original selection.

Need to consider the ramifications for the more esoteric classes e.g. nested lists potentially containing dates/times, s4 classes etc.

One cool usecase for this I would use all the time is to pivot vectors from horizontal layout e.g. c("a", "b", "c") to vertical e.g.:

c("a",
  "b",
  "c")

And vice-versa. Although now that I write that, I immediately want it for function arguments too, which will look quite different. Maybe this is a different package.
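The two layouts differ only in the string used to join the elements. A minimal sketch for character vectors, with hypothetical function names:

```r
# Hypothetical helpers: render a character vector as a c() call.
vec_horizontal <- function(x) {
  paste0("c(", paste(sprintf('"%s"', x), collapse = ", "), ")")
}

vec_vertical <- function(x) {
  # Two spaces of indent line the elements up under the first one.
  paste0("c(", paste(sprintf('"%s"', x), collapse = ",\n  "), ")")
}

cat(vec_horizontal(c("a", "b", "c")), "\n")
cat(vec_vertical(c("a", "b", "c")), "\n")
```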

What to do about types?

I think automatic type guessing is a potentially destructive process and shouldn't be the default. Some way to enable it might be convenient for numeric data.

Some ways to provide this facility:

Use some kind of option to turn on type guessing, which is set via a function call.
Provide a helper method for character tibbles that can guess the type of column and mutate it.

Alternately to guessing type, we could create a helper function that implements a provided type spec, kind of like the col classes shorthand from readr.
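A sketch of what such a type-spec helper might look like, borrowing the one-letter codes from readr's compact column specification (c, i, d, l). The function apply_col_spec() is hypothetical and covers only these four codes.

```r
# Hypothetical helper: convert the columns of an all-character data frame
# according to a compact spec string, one letter per column.
apply_col_spec <- function(df, spec) {
  codes <- strsplit(spec, "")[[1]]
  stopifnot(length(codes) == ncol(df))
  converters <- list(c = as.character, i = as.integer,
                     d = as.numeric, l = as.logical)
  df[] <- Map(function(col, code) converters[[code]](col), df, codes)
  df
}

chr_df <- data.frame(a = c("1", "2"), b = c("1.5", "2.5"), c = c("x", "y"),
                     stringsAsFactors = FALSE)
typed <- apply_col_spec(chr_df, "idc")
str(typed)
```

Because the caller states the types, nothing is guessed and nothing is converted destructively by default.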

Output newline after end of vector_paste_vertical

Works nicer for console output.

Reformat an existing tribble() call

If I've got a pre-existing tribble() call, does this package help me to format it nicely, i.e. line up the commas?

Add stringsAsFactors = FALSE to df_paste()

Read tables from Stack Overflow

People in Stack Overflow often paste the output of the print() from R to provide an example data frame.
This results in something like this:

to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356

When pasting this inside quotes in readr::read_table, it's OK:

readr::read_table("to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356")

Yet, when using the addin "Paste as tribble", it results in:

tibble::tribble(
   ~to..RealAge,
   "513 59.608",
    "513 84.18",
    "0   85.23",
   "119 74.764",
   "116 65.356"
  )

Is this a feature you want to add? Is it easy to do? Do you want me to try to make a PR?
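Base R's read.table() splits on runs of whitespace by default, so it handles this layout without any separator guessing. A sketch using the table above:

```r
so_text <- "to  RealAge
513 59.608
513 84.18
0   85.23
119 74.764
116 65.356"

# With the default sep = "", fields are separated by any amount of whitespace.
parsed <- utils::read.table(text = so_text, header = TRUE)

parsed
```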

FR: Add data.table support

Following a discussion in rstudio community,

a datatable_paste and datatable_construct could be a great addition to complement df_paste / df_construct and tribble_paste / tribble_construct

IDEA: Transform dput output from clipboard to tibble, dataframe or data.table

This is based on this question
https://community.rstudio.com/t/convert-dput-output-clipboard-to-a-data-frame-or-tibble-tribble/16050/6

The idea would be to copy a dput output and be able to use the *_paste family of functions.

Would it be in the scope of this 📦?
There seem to be other solutions already (read.so 📦, wrapr 📦).

Quote toggling doesn't work if the vector elements have spaces

Minimal breaking example:

c(Name,Location,Part of)
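One way to make the toggle tolerate spaces is to split on commas only and trim each element. The function below is a hypothetical sketch, not the addin's implementation, and it would still break on elements that themselves contain commas.

```r
# Hypothetical helper: toggle a c() expression between quoted and bare forms.
toggle_quotes <- function(expr) {
  inner <- sub("^c\\((.*)\\)$", "\\1", trimws(expr))
  elems <- trimws(strsplit(inner, ",", fixed = TRUE)[[1]])
  quoted <- grepl('^".*"$', elems)
  if (all(quoted)) {
    elems <- gsub('^"|"$', "", elems)                    # strip outer quotes
  } else {
    elems[!quoted] <- sprintf('"%s"', elems[!quoted])    # quote bare elements
  }
  paste0("c(", paste(elems, collapse = ", "), ")")
}

toggle_quotes("c(Name,Location,Part of)")
```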

Add warning for tribble_format encountering factors.

Cannot neatly represent factors in a tribble call. Currently prints as a character, but should do so with a warning.

Does datapasta work with RStudio Server?

Hi
This is cool. However, at work we use RStudio Server. When I try pasting with RStudio Server I just get the following:

Clipboard is not available. Is xsel or xclip installed? Is DISPLAY set?

Is this a technical limitation with remote execution? Will it only work on a local desktop?

Cheers

X11 support note

For Linux users, clipr will require either xclip or xsel to be installed, external to R. datapasta inherits this dependency, so you may wish to make a note of it in the README. clipr does throw its own error message noting this, but it can't hurt to mention it up front.

Update vignette for df_paste().

Determining the indent context of a selection does not work correctly

Uses the end of the selection. Should use the beginning. Leads to examples like this in vector_paste_vertical():

c("Mint",
                                         "Fedora",
                                         "Debian",
                                         "Ubuntu",
                                         "OpenSUSE")

Handle columns that do not start with Latin characters

Right now tribble_paste() and df_paste() can't handle columns that start with non-Latin characters, e.g. it does

library(datapasta)
tribble_paste()
tibble::tribble(
            ~!!, ~2015, ~%, ~?a,
             1L,   "b", 3L, "D"
            )

df_paste()
data.frame(stringsAsFactors=FALSE,
          !! = c(1L),
        2015 = c("b"),
           % = c(3L),
          ?a = c("D")
)

but if you actually try to use those, it's not super successful -- you get errors in both cases because columns that don't start with Latin characters need to be surrounded by `` (I can't even run a reprex for it!).

If you think it's a good idea to add this in, I'm happy to take a stab at it (ok, I mean, I already started). I feel fairly confident in the tribble case since they can actually handle columns that don't start with Latin characters (by surrounding with ``), but less sure about the data.frame version since they can't. Here's my thought of how it'd look:

tribble_paste()
tibble::tribble(
                  ~`!!`, ~`2015`, ~`%`, ~`?a`,
                     1L,     "b",   3L,   "D"
                  )
#> # A tibble: 1 x 4
#>    `!!` `2015`   `%` `?a` 
#>   <int> <chr>  <int> <chr>
#> 1     1 b          3 D

df_paste()
data.frame(stringsAsFactors=FALSE,
                                          `!!` = c(1L),
                                        `2015` = c("b"),
                                           `%` = c(3L),
                                          `?a` = c("D"))
#>   X.. X2015 X. X.a
#> 1   1     b  3   D

For data.frame, the numeric case is ok (just starts with X) but non-alpha-numeric characters are replaced with .. I think having them start with X is fine but probably the . business is not.
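The renaming shown above is what base R's make.names() does, and the same function can be used to decide which names need backticks. The helper backtick_names() is a hypothetical sketch of that test.

```r
col_names <- c("!!", "2015", "%", "?a", "ok")

# What data.frame() turns the names into when check.names = TRUE.
make.names(col_names)

# Hypothetical helper: wrap a name in backticks only if it is not syntactic.
backtick_names <- function(x) {
  is_syntactic <- x == make.names(x)
  ifelse(is_syntactic, x, paste0("`", x, "`"))
}

backtick_names(col_names)
```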

I'd be interested to hear your thoughts on a workaround or implementation for this data.frame case -- I use tribble_paste() a ton so selfishly only really care about that one, but the extension to df_paste() is natural. Maybe it could throw an informative error instead of letting you try to create the data.frame in the first example and then forcing you to deal with whatever regular R error results.

again, I'm happy to actually implement it (yay tidy tools skills 🎉)

tibble as a dependency

I know you don't actually depend on it for this code to execute, but for this to be usable, it would be nice to have tibble loaded and tribble() available. Maybe Suggests?

Vignette Example: Pasting a list as a horizontal vector

Hello,

I'm using datapasta 3.0.0 and I found an inconsistency while going through the package vignette.

The section "Pasting a list as a horizontal vector with vector_paste()" states that the "paste as vector" addin allows going from this:

Mint Fedora Debian Ubuntu OpenSUSE

to this:

c("Mint", "Fedora", "Debian", "Ubuntu", "OpenSUSE")

But when I select the space delimited text and use the "paste as vector" addin, I get this:

c("Mint Fedora Debian Ubuntu OpenSUSE")

Which is a vector, but not the one I was expecting. I'm not sure if this is the intended behavior. Text in the example doesn't say that spaces are valid separators, so maybe this is expected behavior and the example I referenced needs to be removed? The other 2 examples (separated by commas and newlines) paste into R as expected.

Peter
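For the space-delimited case reported above, a minimal base R sketch of the split the vignette example implies. This is only an illustration of the expected result; it does not show how vector_paste() chooses its separators.

```r
x <- "Mint Fedora Debian Ubuntu OpenSUSE"

# Split on runs of whitespace after trimming the ends.
parts <- strsplit(trimws(x), "[[:space:]]+")[[1]]

# dput() prints the vector as source code.
dput(parts)
#> c("Mint", "Fedora", "Debian", "Ubuntu", "OpenSUSE")
```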

Guess the presence/absence of header row for tribble_paste()

I saw someone using datapasta to make quick plots from selected pieces of an Excel workbook. It was a little annoying because they were just grabbing bare blocks of data, which had the first row converted to a header when they pasted. They had to un-headerise the first row and make a new dummy one.

I reckon we can guess the first row is not a header if readr guesses the same types for it as for the rest of the data frame. This would fail in the special case of ALL character data, but maybe some analysis of factors can avoid some cases of that too.
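A rough sketch of that heuristic, assuming the pasted block is already split into a character matrix with at least two rows. guess_has_header() is a hypothetical helper, not part of datapasta, and it swaps in base R's utils::type.convert() for readr's type guesser to stay dependency-free.

```r
# Hypothetical helper: TRUE if the first row looks like a header.
guess_has_header <- function(rows) {
  # Guessed class of a character vector (logical, integer, numeric, character, ...).
  col_class <- function(x) class(utils::type.convert(x, as.is = TRUE))

  first_types <- vapply(rows[1, ], col_class, character(1))
  rest_types  <- apply(rows[-1, , drop = FALSE], 2, col_class)

  # Same types in the first row and in the rest: probably data, not a header.
  !identical(unname(first_types), unname(rest_types))
}

with_header <- rbind(c("name", "n"), c("a", "1"), c("b", "2"))
bare_block  <- rbind(c("a", "1"), c("b", "2"), c("c", "3"))
all_char    <- rbind(c("name", "city"), c("a", "x"))

guess_has_header(with_header)  # TRUE
guess_has_header(bare_block)   # FALSE
guess_has_header(all_char)     # FALSE: the all-character failure case
```

The sketch can also misfire when a numeric column's first value is an integer and later values are decimals, so a real implementation would need a looser type comparison.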

Line breaks in column headers not handled well

Copying "Double Dissolution Triggers" table here results in

tribble(
  ~Second.rejection.by,
  "the Senate	Bill",
  "18 June 2014	Clean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015	Fair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
  "18 April 2016	Building and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
  "18 April 2016	Building and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)

rather than

tribble(
  ~`Second rejection by the Senate`, ~Bill,
  "18 June 2014", "Clean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015", "Fair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
  "18 April 2016", "Building and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
  "18 April 2016", "Building and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)

Emacs front end for datapasta

Is this possible? From the manual and my poking around, it didn't seem to be. But maybe I missed something?

If it is not possible to use these functions outside RStudio, could I ask this as a feature request please?

Thank you!

_format() versions of _paste() functions

To help out non-RStudio users, we output the formatted data structures to the clipboard, while still returning the formatted output as per the _paste() functions.

"invalid 'times' value" error on factor column?

It seems like datapasta is attempting to treat factors as Date times in a normal dataframe?
This is using datapasta 2.0.0

a1 = c("B1","B2","B3","B4","B5")
a2 = c("IT,GE,FB,AI","GE,AI","FB,IT,AI","GE,IT,FB","AI")
a12 = data.frame(a1,a2)
datapasta::tribble_paste(a12)
Error in strrep(" ", char_length - nchar(char_vec)) : 
  invalid 'times' value

Traceback:

> traceback()
12: strrep(" ", char_length - nchar(char_vec))
11: paste0(strrep(" ", char_length - nchar(char_vec)), char_vec)
10: pad_to(render_type(char_vec, char_type), char_length)
9: (function (char_vec, char_type, char_length) 
   {
       pad_to(render_type(char_vec, char_type), char_length)
   })(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]])
8: mapply(render_type_pad_to, col, input_table_types, col_widths)
7: paste0(mapply(render_type_pad_to, col, input_table_types, col_widths), 
       ",")
6: paste0(paste0(mapply(render_type_pad_to, col, input_table_types, 
       col_widths), ","), collapse = " ")
5: paste0(strrep(" ", oc$indent_context + oc$nspc), paste0(paste0(mapply(render_type_pad_to, 
       col, input_table_types, col_widths), ","), collapse = " "), 
       "\n", collapse = "")
4: FUN(X[[i]], ...)
3: lapply(X = as.data.frame(t(input_table), stringsAsFactors = FALSE), 
       FUN = function(col) {
           paste0(strrep(" ", oc$indent_context + oc$nspc), paste0(paste0(mapply(render_type_pad_to, 
               col, input_table_types, col_widths), ","), collapse = " "), 
               "\n", collapse = "")
       })
2: tribble_construct(input_table, oc = output_context)
1: datapasta::tribble_paste(a12)
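The innermost frame of the traceback is strrep(), which raises this error when its times argument is negative, i.e. when a rendered value is wider than the computed column width. The sketch below reproduces only the message. That factor columns cause the width mismatch in 2.0.0 is an assumption drawn from this report, and converting them to character first is a possible workaround, not a confirmed fix.

```r
# A negative padding width reproduces the error message from the report.
msg <- tryCatch(strrep(" ", 2L - nchar("IT,GE,FB,AI")),
                error = function(e) conditionMessage(e))
msg
#> [1] "invalid 'times' value"

# Possible workaround (assumption): turn factor columns into character.
a12 <- data.frame(a1 = c("B1", "B2"), a2 = c("IT,GE,FB,AI", "GE,AI"),
                  stringsAsFactors = TRUE)
is_fct <- vapply(a12, is.factor, logical(1))
a12[is_fct] <- lapply(a12[is_fct], as.character)
str(a12)
```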

Update the no-clipboard error message with install instructions for clipboard packages, like clipr.

Why NA-NA header in this blog post example

https://newsandnumbers.org/2019/01/12/quickly-import-and-export-data-from-r-with-datapasta/