pipeR's Introduction

pipeR

pipeR provides various styles of function chaining methods:

  • Pipe operator
  • Pipe object
  • pipeline function

Each of them represents a distinct pipeline model, but they share almost the same set of features. A value can be piped to the next expression

  • As the first unnamed argument of the function
  • As dot symbol (.) in the expression
  • As a named variable defined by a formula
  • For side-effect that carries over the input to the next
  • For assignment that saves an intermediate value

The syntax is designed to make the pipeline more readable and friendly to a wide variety of operations.

pipeR Tutorial is a highly recommended complete guide to pipeR.

This document is also translated into Japanese (日本語) by @hoxo_m.

Installation

Install the latest development version from GitHub:

devtools::install_github("renkun-ken/pipeR")

Install from CRAN:

install.packages("pipeR")

Getting started

The following code is an example written in the traditional approach. It performs a bootstrap on the mpg values of the built-in dataset mtcars and plots the density function estimated with a Gaussian kernel:

plot(density(sample(mtcars$mpg, size = 10000, replace = TRUE), 
  kernel = "gaussian"), col = "red", main="density of mpg (bootstrap)")

The code is deeply nested and can be hard to read and maintain. In the following examples, the traditional code is rewritten with the Pipe operator, the Pipe() function and the pipeline() function.

  • Operator-based pipeline
mtcars$mpg %>>%
  sample(size = 10000, replace = TRUE) %>>%
  density(kernel = "gaussian") %>>%
  plot(col = "red", main = "density of mpg (bootstrap)")
  • Object-based pipeline (Pipe())
Pipe(mtcars$mpg)$
  sample(size = 10000, replace = TRUE)$
  density(kernel = "gaussian")$
  plot(col = "red", main = "density of mpg (bootstrap)")
  • Argument-based pipeline
pipeline(mtcars$mpg,
  sample(size = 10000, replace = TRUE),
  density(kernel = "gaussian"),
  plot(col = "red", main = "density of mpg (bootstrap)"))
  • Expression-based pipeline
pipeline({
  mtcars$mpg
  sample(size = 10000, replace = TRUE)
  density(kernel = "gaussian")
  plot(col = "red", main = "density of mpg (bootstrap)")  
})

Usage

%>>%

The pipe operator %>>% pipes the left-hand side value forward to the right-hand side expression, which is evaluated according to its syntax.

Pipe to first-argument of function

Many R functions are pipe-friendly: they take some data as the first argument and transform it in a certain way. This arrangement allows operations to be streamlined by pipes: one data source can be put to the first argument of a function, get transformed, and then be put to the first argument of the next function. In this way, a chain of commands is connected, and it is called a pipeline.

On the right-hand side of %>>%, whenever a function name or call is supplied, the left-hand side value will always be put to the first unnamed argument to that function.

rnorm(100) %>>%
  plot
rnorm(100) %>>%
  plot(col="red")
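To illustrate the first-argument rule, here is a toy operator in base R. The name %first% is hypothetical and this is not pipeR's implementation; it only sketches the idea of rewriting f(args) as f(lhs, args) and evaluating the rebuilt call in the caller's environment.

```r
# Hypothetical toy operator illustrating first-argument piping (not pipeR's code).
`%first%` <- function(lhs, rhs) {
  rhs <- substitute(rhs)
  # A bare function name such as `sum` becomes the call `sum()`
  if (is.symbol(rhs)) rhs <- as.call(list(rhs))
  # Rebuild the call with the left-hand value inserted as the first argument
  call <- as.call(c(list(rhs[[1L]], lhs), as.list(rhs)[-1L]))
  eval(call, parent.frame())
}

1:4 %first% sum                              # same as sum(1:4)
c(3, 1, 2) %first% sort(decreasing = TRUE)   # same as sort(c(3, 1, 2), decreasing = TRUE)
```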

Sometimes the value on the left is needed at multiple places. One can use . to represent it anywhere in the function call.

rnorm(100) %>>%
  plot(col="red", main=length(.))
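One way to get both behaviours at once (value as first argument and . usable elsewhere) is to insert the symbol . as the first argument and evaluate the call where . is bound to the value. The operator %first_dot% below is a hypothetical sketch under that assumption, not pipeR's code.

```r
# Hypothetical toy operator (not pipeR's code): the value goes to the first
# argument AND is bound to `.` so it can be reused elsewhere in the call.
`%first_dot%` <- function(lhs, rhs) {
  rhs <- substitute(rhs)
  if (is.symbol(rhs)) rhs <- as.call(list(rhs))
  # Insert the symbol `.` (not the value itself) as the first argument
  call <- as.call(c(list(rhs[[1L]], quote(.)), as.list(rhs)[-1L]))
  # Evaluate in a child of the caller's environment where `.` holds the value
  env <- new.env(parent = parent.frame())
  assign(".", lhs, envir = env)
  eval(call, env)
}

c(4, 9) %first_dot% paste(length(.))   # "4 2" "9 2", i.e. paste(c(4, 9), 2)
```

Passing the symbol rather than the value also keeps any deparsed call (for example, a default plot label) short.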

There are situations where one calls a function in a namespace with ::. In this case, the call must end with ().

rnorm(100) %>>%
  stats::median()
  
rnorm(100) %>>%
  graphics::plot(col = "red")
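A plausible reason for the () requirement can be seen in the parse trees: without parentheses, stats::median is itself a call to the :: operator, so a pipe that inserts its input into "the call" would be modifying the :: call rather than a call to median(). The base-R inspection below only shows the parse trees; how pipeR dispatches internally is not shown.

```r
bare   <- quote(stats::median)     # a call to `::`, not a call to median()
called <- quote(stats::median())   # a call whose function part is stats::median

is.call(bare)      # TRUE
bare[[1L]]         # the `::` operator
called[[1L]]       # stats::median, the function to be called
length(called)     # 1: no arguments yet, so the piped value can be inserted
```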

Pipe to . in an expression

Not all functions are pipe-friendly in every case: you may find that some functions do not take the data produced by a pipeline as their first argument. In this case, you can enclose your expression in {} or () so that %>>% will use . to represent the value on the left.

mtcars %>>%
  { lm(mpg ~ cyl + wt, data = .) }
mtcars %>>%
  ( lm(mpg ~ cyl + wt, data = .) )
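The enclosed-expression rule amounts to evaluating the expression as written, with . bound to the left-hand value and no first-argument insertion. The operator %dot% below is a hypothetical base-R sketch of that idea, not pipeR's implementation.

```r
# Hypothetical sketch (not pipeR's code): evaluate the right-hand expression
# with `.` bound to the left-hand value; nothing is inserted as a first argument.
`%dot%` <- function(lhs, rhs) {
  eval(substitute(rhs), list(. = lhs), parent.frame())
}

1:10 %dot% (sum(.) / length(.))                  # 5.5
fit <- mtcars %dot% { lm(mpg ~ cyl + wt, data = .) }
coef(fit)
```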

Pipe by formula as lambda expression

Sometimes, it may look confusing to use . to represent the value being piped. For example,

mtcars %>>%
  (lm(mpg ~ ., data = .))

Although it works perfectly, it may look ambiguous if . has several meanings in one line of code.

%>>% accepts a lambda expression to direct its piping behavior. A lambda expression is a formula enclosed within (), for example (x ~ f(x)). It contains a user-defined symbol that represents the value being piped, and the expression to be evaluated.

mtcars %>>%
  (df ~ lm(mpg ~ ., data = df))
mtcars %>>%
  subset(select = c(mpg, wt, cyl)) %>>%
  (x ~ plot(mpg ~ ., data = x))
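A formula keeps both parts unevaluated, so a pipe can read the symbol from its left side and evaluate its right side with that symbol bound to the value. The operator %lambda% below is a hypothetical sketch of this technique, not pipeR's code. The parentheses matter because any %op% operator binds more tightly than ~ in R.

```r
# Hypothetical sketch (not pipeR's code): `x ~ expr` names the piped value.
`%lambda%` <- function(lhs, rhs) {
  stopifnot(inherits(rhs, "formula"), length(rhs) == 3L, is.symbol(rhs[[2L]]))
  # Bind the left-hand symbol of the formula to the piped value
  bindings <- setNames(list(lhs), as.character(rhs[[2L]]))
  # Evaluate the right-hand side; other names resolve in the formula's environment
  eval(rhs[[3L]], bindings, environment(rhs))
}

c(1, 2, 3) %lambda% (x ~ sum(x^2))                     # 14
fit <- mtcars %lambda% (df ~ lm(mpg ~ ., data = df))
```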

Pipe for side effect

In a pipeline, one may be interested not only in the final outcome but sometimes also in intermediate results. To print, plot or save intermediate results, these operations must be side effects so that they do not break the mainstream pipeline. For example, calling plot() to draw a scatter plot returns NULL, and calling plot() directly in the middle of a pipeline would break the pipeline by changing the subsequent input to NULL.

A one-sided formula that starts with ~ indicates that the right-hand side expression will only be evaluated for its side effect: its value will be ignored, and the input value will be returned instead.

mtcars %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
  (~ cat("rows:",nrow(.),"\n")) %>>%   # cat() returns NULL
  summary
mtcars %>>%
  subset(mpg >= quantile(mpg, 0.05) & mpg <= quantile(mpg, 0.95)) %>>%
  (~ plot(mpg ~ wt, data = .)) %>>%    # plot() returns NULL
  (lm(mpg ~ wt, data = .)) %>>%
  summary()

With ~, side-effect operations can be easily distinguished from the mainstream pipeline.
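The side-effect rule can be sketched in a few lines: evaluate the expression of the one-sided formula with . bound to the input, discard the result, and return the input. The operator %side% is a hypothetical name and this is not pipeR's implementation.

```r
# Hypothetical sketch (not pipeR's code) of the side-effect rule:
# evaluate the one-sided formula's expression, discard its value, return the input.
`%side%` <- function(lhs, rhs) {
  stopifnot(inherits(rhs, "formula"), length(rhs) == 2L)
  eval(rhs[[2L]], list(. = lhs), environment(rhs))
  lhs
}

out <- 1:3 %side% (~ cat("length:", length(.), "\n"))   # prints: length: 3
out                                                      # still 1:3
```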

An easier way to print an intermediate value is to use the (? expr) syntax, like asking a question.

mtcars %>>% 
  (? ncol(.)) %>>%
  summary
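Because ? is an ordinary (unary) operator in R's grammar, (? expr) parses into a call that a pipe can inspect before anything is evaluated. The operator %ask% below is a hypothetical sketch of the behaviour described above (print the question and its answer, pass the input through); pipeR's exact output format may differ.

```r
# Hypothetical sketch (not pipeR's code) of (? expr): print the expression and
# its value, then pass the input through unchanged.
`%ask%` <- function(lhs, rhs) {
  q <- substitute(rhs)
  # Drop the enclosing parentheses, if any
  if (is.call(q) && identical(q[[1L]], as.name("("))) q <- q[[2L]]
  stopifnot(is.call(q), identical(q[[1L]], as.name("?")), length(q) == 2L)
  cat(deparse(q), "\n")
  print(eval(q[[2L]], list(. = lhs), parent.frame()))
  lhs
}

res <- mtcars %ask% (? ncol(.))   # prints the question, then the number of columns
```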

Pipe with assignment

In addition to printing and plotting, one may need to save an intermediate value to the environment by assigning the value to a variable (symbol).

If one needs to assign the value to a symbol, just insert a step like (~ symbol); the input value of that step will then be assigned to symbol in the current environment.

mtcars %>>%
  (lm(formula = mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary
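Saving with (~ symbol) needs only the symbol from the formula and an assignment into the calling environment. The operator %save% below is a hypothetical base-R sketch of that behaviour, not pipeR's code.

```r
# Hypothetical sketch (not pipeR's code): (~ name) saves the input under `name`
# in the calling environment and passes the input through.
`%save%` <- function(lhs, rhs) {
  stopifnot(inherits(rhs, "formula"), length(rhs) == 2L, is.symbol(rhs[[2L]]))
  assign(as.character(rhs[[2L]]), lhs, envir = parent.frame())
  lhs
}

res <- 1:5 %save% (~ kept)
kept   # 1:5
```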

If the value to save is not the input itself but a transformation of it, one can use =, <-, or the more natural -> to write a lambda expression that specifies what to save (thanks to @yanlinlin82 for the suggestion).

mtcars %>>%
  (~ summ = summary(.)) %>>%  # side-effect assignment
  (lm(formula = mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary
mtcars %>>%
  (~ summary(.) -> summ)

mtcars %>>%
  (~ summ <- summary(.))
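These forms work with R's own parser because ~ binds more tightly than ->, <- and =. For instance, (~ summary(.) -> summ) reaches the pipe as the call summ <- (~ summary(.)). The operator %side_assign% below is a hypothetical sketch that handles only the -> form under that assumption; it is not pipeR's implementation.

```r
# Hypothetical sketch (not pipeR's code) handling (~ expr -> name).
# R parses `~ f(.) -> name` as `name <- ~f(.)` because `~` binds more tightly than `->`.
`%side_assign%` <- function(lhs, rhs) {
  e <- substitute(rhs)
  if (is.call(e) && identical(e[[1L]], as.name("("))) e <- e[[2L]]
  stopifnot(is.call(e), identical(e[[1L]], as.name("<-")), is.symbol(e[[2L]]))
  f <- e[[3L]]
  stopifnot(is.call(f), identical(f[[1L]], as.name("~")), length(f) == 2L)
  # Evaluate the transformation with `.` bound to the input ...
  value <- eval(f[[2L]], list(. = lhs), parent.frame())
  # ... save it in the caller's environment, and return the input unchanged
  assign(as.character(e[[2L]]), value, envir = parent.frame())
  lhs
}

out <- 1:10 %side_assign% (~ sum(.) -> total)
total   # 55
```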

An easier way to save an intermediate value that is to be piped further is to use the (symbol = expression) syntax:

mtcars %>>%
  (~ summ = summary(.)) %>>%  # side-effect assignment
  (lm_mtcars = lm(formula = mpg ~ wt + cyl, data = .)) %>>%  # continue piping
  summary

or (expression -> symbol) syntax:

mtcars %>>%
  (~ summary(.) -> summ) %>>%  # side-effect assignment
  (lm(formula = mpg ~ wt + cyl, data = .) -> lm_mtcars) %>>%  # continue piping
  summary
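The difference from the side-effect forms is only in what is returned: here the transformed value is both saved and piped on. The operator %keep% below is a hypothetical sketch of that rule (not pipeR's code); it relies on the fact that R parses both = and -> into two-argument assignment calls.

```r
# Hypothetical sketch (not pipeR's code) of (name = expr): evaluate expr with
# `.` bound to the input, save the result as `name`, and pipe the result on.
`%keep%` <- function(lhs, rhs) {
  e <- substitute(rhs)
  if (is.call(e) && identical(e[[1L]], as.name("("))) e <- e[[2L]]
  # `=`, `<-` and `->` all arrive here as a two-argument assignment call
  stopifnot(is.call(e), length(e) == 3L, is.symbol(e[[2L]]),
            as.character(e[[1L]]) %in% c("=", "<-"))
  value <- eval(e[[3L]], list(. = lhs), parent.frame())
  assign(as.character(e[[2L]]), value, envir = parent.frame())
  value
}

m <- 1:10 %keep% (total = sum(.))          # m and total are both 55
m_arrow <- 1:10 %keep% (mean(.) -> avg)    # m_arrow and avg are both 5.5
```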

Extract element from an object

x %>>% (y) means extracting the element named y from object x where y must be a valid symbol name and x can be a vector, list, environment or anything else for which [[]] is defined, or S4 object.

mtcars %>>%
  (lm(mpg ~ wt + cyl, data = .)) %>>%
  (~ lm_mtcars) %>>%
  summary %>>%
  (r.squared)
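Extraction by a bare name can be sketched with [[ ]] for vectors, lists and environments, and methods::slot() for S4 objects. The operator %get% below is hypothetical and only approximates the rule described above; it is not pipeR's implementation.

```r
# Hypothetical sketch (not pipeR's code) of x %>>% (name): extract an element
# by name with [[ ]], or a slot when the object is an S4 instance.
`%get%` <- function(lhs, rhs) {
  e <- substitute(rhs)
  if (is.call(e) && identical(e[[1L]], as.name("("))) e <- e[[2L]]
  stopifnot(is.symbol(e))
  name <- as.character(e)
  if (isS4(lhs)) methods::slot(lhs, name) else lhs[[name]]
}

list(a = 1, b = 2) %get% (b)                                 # 2
rs <- summary(lm(mpg ~ wt, data = mtcars)) %get% (r.squared)
```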

Compatibility

library(dplyr)
mtcars %>>%
  filter(mpg <= mean(mpg)) %>>%  
  select(mpg, wt, cyl) %>>%
  (~ plot(.)) %>>%
  (model = lm(mpg ~ wt + cyl, data = .)) %>>%
  (summ = summary(.)) %>>%
  (coefficients)
library(ggvis)
mtcars %>>%
  ggvis(~mpg, ~wt) %>>%
  layer_points()
library(rlist)
1:100 %>>%
  list.group(. %% 3) %>>%
  list.mapv(g ~ mean(g))

Pipe()

Pipe() creates a Pipe object that supports light-weight chaining without any external operator. Typically, start with Pipe() and end with $value or [] to extract the final value of the Pipe.

A Pipe object provides an internal function .(...) that works exactly the same way as x %>>% (...), and it has more features than %>>%.

NOTE: .() does not support assignment with = but supports ~, <- and ->.
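The $-chaining idea behind Pipe() can be sketched with an S3 method for $ that returns a function wrapping the next call. MiniPipe is a hypothetical, much smaller object than pipeR's Pipe (no .(), no printing, no subsetting); it only shows how $f(...) can call f(value, ...) and wrap the result again.

```r
# Hypothetical, minimal Pipe-like object (not pipeR's implementation):
# `$name` returns a function that calls `name(value, ...)` and wraps the result.
MiniPipe <- function(value) structure(list(value = value), class = "MiniPipe")

`$.MiniPipe` <- function(x, name) {
  value <- unclass(x)$value
  if (identical(name, "value")) return(value)   # $value ends the chain
  # Look the function up from the environment where `$` was used
  f <- get(name, mode = "function", envir = parent.frame())
  function(...) MiniPipe(f(value, ...))
}

MiniPipe(c(3, 1, 2))$sort()$rev()$value   # c(3, 2, 1)
```

In this sketch the function is resolved when $ is called, not when the returned function is later invoked.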

Piping

Pipe(rnorm(1000))$
  density(kernel = "cosine")$
  plot(col = "blue")
Pipe(mtcars)$
  .(mpg)$
  summary()
Pipe(mtcars)$
  .(~ summary(.) -> summ)$
  lm(formula = mpg ~ wt + cyl)$
  summary()$
  .(coefficients)

Subsetting and extracting

pmtcars <- Pipe(mtcars)
pmtcars[c("mpg","wt")]$
  lm(formula = mpg ~ wt)$
  summary()
pmtcars[["mpg"]]$mean()

Assigning values

plist <- Pipe(list(a=1,b=2))
plist$a <- 0
plist$b <- NULL

Side effect

Pipe(mtcars)$
  .(? ncol(.))$
  .(~ plot(mpg ~ ., data = .))$    # side effect: plot
  lm(formula = mpg ~ .)$
  .(~ lm_mtcars)$                  # side effect: assign
  summary()

Compatibility

  • Working with dplyr:
Pipe(mtcars)$
  filter(mpg >= mean(mpg))$
  select(mpg, wt, cyl)$
  lm(formula = mpg ~ wt + cyl)$
  summary()$
  .(coefficients)$
  value
  • Working with ggvis:
Pipe(mtcars)$
  ggvis(~ mpg, ~ wt)$
  layer_points()
  • Working with rlist:
Pipe(1:100)$
  list.group(. %% 3)$
  list.mapv(g ~ mean(g))$
  value

pipeline()

pipeline() provides argument-based and expression-based pipeline evaluation mechanisms. Its behavior depends on how its arguments are supplied. If only the first argument is supplied, it expects an expression enclosed in {} in which each line represents a pipeline step. If, instead, multiple arguments are supplied, it regards each argument as a pipeline step. For all pipeline steps, the expressions will be transformed to be connected by %>>% so that they behave exactly the same.

One notable difference is that in pipeline()'s arguments or expression, the special symbols that perform specially defined pipeline tasks (e.g. side effects) do not need to be enclosed within (), because the operator-precedence issues that arise with %>>% do not occur.

pipeline({
  mtcars
  lm(formula = mpg ~ cyl + wt)
  ~ lmodel
  summary
  ? .$r.squared
  coef
})
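The transformation described above can be sketched in base R: take the lines of the {} block, wrap the special ~ and ? steps in parentheses, and fold the steps left to right into nested %>>% calls. The helpers wrap_step() and to_chain() are hypothetical and only build the expression; pipeR's actual transformation may differ in detail.

```r
# Hypothetical sketch (not pipeR's code): turn the lines of a { } block into a
# single chain of %>>% calls, wrapping `~` and `?` steps in parentheses.
wrap_step <- function(e) {
  special <- is.call(e) && is.symbol(e[[1L]]) &&
    as.character(e[[1L]]) %in% c("~", "?")
  if (special) call("(", e) else e
}

to_chain <- function(block) {
  steps <- as.list(block)[-1L]          # drop the leading `{`
  Reduce(function(lhs, rhs) call("%>>%", lhs, rhs), lapply(steps, wrap_step))
}

block <- quote({
  mtcars
  lm(formula = mpg ~ cyl + wt)
  ~ lmodel
  summary
})
to_chain(block)   # mtcars %>>% lm(formula = mpg ~ cyl + wt) %>>% (~lmodel) %>>% summary
```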

Thanks @hoxo_m for the idea presented in this post.

License

This package is under MIT License.

pipeR's People

Contributors

briandiggs, gitter-badger, hoxo-m, renkun-ken

pipeR's Issues

Better debugging experience

It may be hard to debug code in which pipe operators are used. Is there a way to improve the debugging experience with the package?

Pipe does not work well with data.table using dplyr

Pipe(mtcars)$
  filter(mpg <= mean(mpg))$
  head(1)
$value : data.frame 
------
   mpg cyl disp  hp drat   wt  qsec vs am gear carb
1 18.7   8  360 175 3.15 3.44 17.02  0  0    3    2

However, when mtcars is converted to data.table, the dplyr functions no longer work.

Pipe(mtcars)$
  as.data.table()$
  filter(mpg <= mean(mpg))$
  head(1)
Error in `[.data.frame`(x, i, j) : object 'mpg' not found
Pipe(mtcars)$
  as.data.table()$
  mutate(mpg1 = mpg * 2)$
  head(1)
Error in `:=`(mpg1, mpg * 2) : 
  Check that is.data.table(DT) == TRUE. Otherwise, := and `:=`(...) are defined for use in j, once only and in particular ways. See help(":=").

But this works.

Pipe(mtcars)$
  as.data.table()$
  .(mutate(.,p = mpg))$
  head(1)
$value : data.table data.frame 
------
   mpg cyl disp  hp drat   wt  qsec vs am gear carb  p
1:  21   6  160 110  3.9 2.62 16.46  0  1    4    4 21

Meanwhile, %>>% works without any problem.

It seems that the dplyr functions somehow take the input as a data frame but then try to use data.table-specific functions to manipulate the object, which causes the errors.

Supported assignment syntax

The following syntax is supported:

Pipe with assignment

x %>>% (p <- f(.))
x %>>% (f(.) -> p)
# p <- f(x); p

Assignment as side-effect

x %>>% (~ p <- f(.))
x %>>% (~ f(.) -> p)
# p <- f(x); x

Using =

x %>>% (p = f(.))  # p <- f(x); p
x %>>% (~ p = f(.))  # p <- f(x); x

Subsetting for Pipe does not work with data.table in local environment

> library(pipeR)
> library(data.table)
data.table 1.9.2  For help type: help("data.table")
> z <- Pipe(data.table(x=1:3,y=rnorm(3),key="x"))
> z
$value : data.table data.frame 
------
   x          y
1: 1  1.2629543
2: 2 -0.3262334
3: 3  1.3297993
> local({i <- 1; z[J(i)]})
Error in eval(expr, envir, enclos) : object 'i' not found

Remove deprecated symbols

Since version 0.4, %:>% and %|>% are deprecated. To make the transition smooth, these operators are marked deprecated in documentation and give warnings when used.

They will eventually be removed in a future version.

Add environment-based pipe object

Pipe <- function(value = NULL) {
  # push(): call fun with the current value as its first argument
  push <- function(fun,...) {
    fun <- match.fun(fun)
    Pipe(fun(value,...))
  }
  # eval(): evaluate expr with . bound to the current value
  eval <- function(expr) {
    Pipe(base::eval(substitute(expr),list(.=value),parent.frame()))
  }
  # lambda(): evaluate expr with the symbol x bound to the current value
  lambda <- function(x, expr) {
    Pipe(base::eval(substitute(expr),
      setNames(list(value),as.character(substitute(x))),
      parent.frame()))
  }
  finish <- function() {
    invisible(value)
  }
  environment()
}

This allows the following code:

Pipe(sample(letters,6,replace = TRUE))$
    push(paste,collapse="")$
    push("==","rstats")$
    value

A benchmark shows:

`%>%` <- magrittr::`%>%`
microbenchmark::microbenchmark(a={
  sample(letters,6,replace = TRUE) %>%
    paste(collapse = "") %>%
    "=="("rstats")
},b={
  sample(letters,6,replace = TRUE) %>>%
    paste(collapse = "") %>>%
    "=="("rstats")
},c={
  Pipe(sample(letters,6,replace = TRUE))$
    push(paste,collapse="")$
    push("==","rstats")$
    value
})
Unit: microseconds
 expr     min      lq  median      uq     max neval
    a 262.740 266.640 268.898 274.440 405.604   100
    b  22.169  23.401  25.043  28.327  87.854   100
    c  20.527  22.169  24.017  26.685  43.517   100

Syntax to support for expression enclosed by parentheses

# element extracting
x %>>% (m)                  # x[["m"]]

# expression evaluation
x %>>% (fun(.))             # fun(x)
x %>>% (p ~ fun(p))         # fun(x)

# evaluation + assignment
x %>>% (fun(.) ~ y)         # y <- fun(x)    #### NOT TO SUPPORT
x %>>% (m ~ fun(m) ~ y)     # y <- fun(x)    #### NOT TO SUPPORT

# side effect
x %>>% (~ y)                # y <- x
x %>>% (~ fun(.))           # fun(x); x

# side effect: assignment
x %>>% (~ y)                # y <- x
x %>>% (~ fun(.) ~ y)       # y <- fun(x); x
x %>>% (~ m ~ fun(m) ~ y)   # y <- fun(x); x

# question (also side effect)
x %>>% (? fun(.))           # print(fun(x)); x

Multiple symbols to represent piped object

In R, . can be directly used in the name of a symbol, but it can also represent other things. For example, . in formula y ~ . represents variables other than y in a data frame. The following expression can be unambiguously evaluated but may be ambiguous to read:

df <- data.frame(x=rnorm(100),y=rnorm(100),z=rnorm(100))
df %>>% lm(z~., data=.)

. in the formula works correctly but may look ambiguous. A solution is to allow other symbols to represent the piped object too. If lambda expressions are not introduced, .. may be a simple choice.

pipeR does not work well with qplot()

Here is an example of the problem:

library(ggplot2)
mtcars %>>% qplot(mpg, wt, data = .)
Don't know how to automatically pick scale for object of type data.frame. Defaulting to continuous
Error: Aesthetics must either be length one, or the same length as the dataProblems:.

However, "%>%" in "magrittr" works fine:

library(magrittr)
mtcars %>% qplot(mpg, wt, data = .)

Two mechanisms to determine the function to call using $ with Pipe

Consider the following example:

> library(pipeR)
> f <- function(x) cat("global f")
> Pipe(0)$f()
global f
> pf <- Pipe(0)$f
> pf()
global f
> local({f <- function(x) cat("local f"); pf()})
global f
> local({f <- function(x) cat("local f"); Pipe(0)$f()})
local f
> local({f <- function(x) cat("local f"); pf2 <<- Pipe(0)$f})
> pf2()
local f

Function is determined when $ is called, rather than when the function is called. This design avoids potential confusion if the local environment accidentally contains a function having the same name.

A more careful consideration should be done to determine which design is more reasonable: the static calling mechanism (which function to call is determined when $ is called), or the dynamic calling mechanism (which function to call is determined only when the function is being called).

Anonymous function not supported

ld <- rnorm(100) %>% abs %>% log %>% function(.) mean(.,trim = 0.1)

produces the following error:

Error in eval(expr, envir, enclos) : 
  invalid formal argument list for "function" 
3 eval(expr, envir, enclos) 
2 eval(call) at pipeR.R#23
1 rnorm(100) %>% abs %>% log %>% function(.) mean(., trim = 0.1) 

Add the feature of composing functions

Composing functions should be allowed.

logdiff <- log %>% diff
lgplot <- log %>% diff %>% plot(col="red")
lgplot2 <- (function(x) x^2) %>% log %>% diff %>>% plot(.,col="red")

Does not work with functions that misuse non-standard evaluation

> rnorm(100) %>>%
+   arima0(order = c(1,0,1)) %>>%
+   predict(5)
Error in data - as.matrix(xreg) %*% coefs[-(1L:narma)] : 
  non-numeric argument to binary operator

arima0() uses non-standard evaluation in its implementation but seems to misuse it while arima() does not have the problem.

> rnorm(100) %>>%
+   arima(order = c(1,0,1)) %>>%
+   predict(5)
$pred
Time Series:
Start = 101 
End = 105 
Frequency = 1 
[1] 0.04664216 0.04791444 0.04895654 0.04981012 0.05050928

$se
Time Series:
Start = 101 
End = 105 
Frequency = 1 
[1] 1.070690 1.072904 1.074386 1.075380 1.076045

Avoid inline substitute in %>>%

Inline substitution leads to messy text in the output of functions that use their call, for example to build default labels.

For example,

> 1:10 %>>% plot

will produce a plot in which the x label is 1:10. If the pipeline gets longer, the x label can be a mess.
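The effect can be reproduced without pipeR. Functions like plot() build default labels from deparse(substitute(x)); if a pipe inlines the evaluated value into the call, the label becomes the deparsed value, whereas passing a symbol such as . keeps it short. label_of() is a hypothetical helper that mimics this labelling.

```r
# label_of() (hypothetical) mimics how plot() builds its default axis label.
label_of <- function(x) deparse(substitute(x))

label_of(1:10)                                    # "1:10", the code as written

# A pipe that inlines the evaluated value into the call leaks the value's deparse:
inlined <- as.call(list(as.name("label_of"), c(1.5, 2.5)))
eval(inlined)                                     # "c(1.5, 2.5)"

# A pipe that passes the symbol `.` instead keeps the label short:
eval(quote(label_of(.)), list(. = c(1.5, 2.5)))   # "."
```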

Original function name is lost using Pipe $

The current implementation of the Pipe $ operator builds the call directly from the function object rather than from its name, which results in verbose output whenever the result includes the call information.

For example,

> Pipe(mtcars)$lm(formula = mpg ~ .)
$value : lm 
------

Call:
(function (formula, data, subset, weights, na.action, method = "qr", 
    model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, 
    contrasts = NULL, offset, ...) 
{
    ret.x <- x
    ret.y <- y
    cl <- match.call()
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data", "subset", "weights", "na.action", 
        "offset"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- quote(stats::model.frame)
    mf <- eval(mf, parent.frame())
    if (method == "model.frame") 
        return(mf)
    else if (method != "qr") 
        warning(gettextf("method = '%s' is not supported. Using 'qr'", 
            method), domain = NA)
    mt <- attr(mf, "terms")
    y <- model.response(mf, "numeric")
    w <- as.vector(model.weights(mf))
    if (!is.null(w) && !is.numeric(w)) 
        stop("'weights' must be a numeric vector")
    offset <- as.vector(model.offset(mf))
    if (!is.null(offset)) {
        if (length(offset) != NROW(y)) 
            stop(gettextf("number of offsets is %d, should equal %d (number of observations)", 
                length(offset), NROW(y)), domain = NA)
    }
    if (is.empty.model(mt)) {
        x <- NULL
        z <- list(coefficients = if (is.matrix(y)) matrix(, 0, 
            3) else numeric(), residuals = y, fitted.values = 0 * 
            y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w != 
            0) else if (is.matrix(y)) nrow(y) else length(y))
        if (!is.null(offset)) {
            z$fitted.values <- offset
            z$residuals <- y - offset
        }
    }
    else {
        x <- model.matrix(mt, mf, contrasts)
        z <- if (is.null(w)) 
            lm.fit(x, y, offset = offset, singular.ok = singular.ok, 
                ...)
        else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, 
            ...)
    }
    class(z) <- c(if (is.matrix(y)) "mlm", "lm")
    z$na.action <- attr(mf, "na.action")
    z$offset <- offset
    z$contrasts <- attr(x, "contrasts")
    z$xlevels <- .getXlevels(mt, mf)
    z$call <- cl
    z$terms <- mt
    if (model) 
        z$model <- mf
    if (ret.x) 
        z$x <- x
    if (ret.y) 
        z$y <- y
    if (!qr) 
        z$qr <- NULL
    z
})(formula = mpg ~ ., data = value)

Coefficients:
(Intercept)          cyl         disp           hp         drat           wt         qsec  
   12.30337     -0.11144      0.01334     -0.02148      0.78711     -3.71530      0.82104  
         vs           am         gear         carb  
    0.31776      2.52023      0.65541     -0.19942  
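The difference comes from what sits at the head of the call. A call whose head is the function object deparses to the whole function body, while a call whose head is the function's name deparses compactly; both evaluate to the same result. add_one() below is a hypothetical function used only for the comparison, and building the call from the name is one possible fix rather than the one pipeR adopted.

```r
add_one <- function(x) {
  x + 1
}

by_object <- as.call(list(add_one, 1))              # head is the closure itself
by_name   <- as.call(list(as.name("add_one"), 1))   # head is the symbol add_one

deparse(by_name)                              # "add_one(1)"
length(deparse(by_object)) > 1                # TRUE: the whole body is printed
identical(eval(by_object), eval(by_name))     # TRUE: same result either way
```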

Naming

If you indeed think there is a need for different pipe implementations, I think you should choose names not conflicting with magrittr. There is already quite some usage of %>% around, and such naming conflict is a potential source of confusion and irritation. If the two should co-exist I think they should be easily differentiated...

Consider value piping for (name) expression

Consider modifying the rule as follows:

list(a=1,b=2) %>>% (a)

If a name is enclosed within (), as in (a), it means getting element a from the left-hand side. This allows the following usage:

mtcars %>>%
  (lm(mpg ~ wt + cyl, data = .)) %>>%
  (coefficients)
(Intercept)          wt         cyl 
  39.686261   -3.190972   -1.507795 

By contrast, {} is clearly distinguished from (): {} only evaluates the inner expression with . as the piped object and has no other functionality. Therefore

a <- 2
list(a=1) %>>% (a)    # 1
list(a=1) %>>% {a}    # 2

Remove naked function calling in free piping

The current version allows the free pipe operator %>>% to pipe an object to a naked function name, which makes it valid to use both %>% and %>>% with a naked function name.

However, this makes the two symbols interchangeable in this case, which is not useful. To keep the features of each symbol simple and clear, this feature should be removed in the next version.

Not compatible with `...`

A reproducible example:

library(pipeR)

fun1 <- function(x,...) {
  plot(x,col="red",...)
}

rnorm(100) %>% fun1(type="l")

fun2 <- function(n,...) {
  rnorm(n) %>% fun1(...)
}

fun2(10,type="l")

produces

Error in eval(expr, envir, enclos) : '...' used in an incorrect context
Called from: rnorm(n) %>% fun1(...)
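One likely cause of this error is that a call containing ... is being evaluated in an environment with no ... binding. A pipe that rebuilds the call and evaluates it in the caller's frame keeps the caller's ... visible. pipe_first(), add_all() and wrapper() below are hypothetical names used only for this sketch.

```r
# Hypothetical first-argument pipe helper (not pipeR's code). Evaluating the
# rebuilt call in the caller's frame keeps that frame's `...` visible.
pipe_first <- function(lhs, rhs) {
  rhs <- substitute(rhs)
  call <- as.call(c(list(rhs[[1L]], lhs), as.list(rhs)[-1L]))
  eval(call, parent.frame())
}

add_all <- function(x, ...) sum(x, ...)
wrapper <- function(v, ...) pipe_first(v, add_all(...))

wrapper(1, 2, 3)   # 6, i.e. add_all(1, 2, 3)
```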

Consider to deprecate fun() inside Pipe object

Since .() is introduced to handle dot-piping, lambda-piping as well as element extraction, fun() looks more redundant even though it is clearer to only handle dot and lambda piping. And fun() may override external function with the same name, which may lead to potential ambiguity.

Consider deprecating the use of fun() in a future version.

Consider syntax for assigning intermediate value to symbol

It's a common demand that an intermediate result be assigned to a symbol in the current environment (often global environment) for further use. This clearly is one type of side effect that the current environment is changed.

Currently, there is no easy syntax that supports assignment other than calling assign() manually, like

mtcars %>>%
  subset(mpg <= mean(mpg)) %>>%
  (~ assign("x", ., envir = .GlobalEnv)) %>>%
  plot

The code works but it is only easy for global environment or some named environment. For local environment, it does not work with parent.frame().

Consider a syntax that derives from side-effect syntax that performs assignment operation like this.

Consider value piping for Pipe

Consider supporting the following piping mechanism for Pipe(x)$y

  1. Look for any y in Pipe object environment ($value, $fun)
  2. If x in Pipe is S4, get slot y from x.
  3. If x in Pipe is list or vector, get element named y from x.
  4. Otherwise, look for function y in parent.frame()

This allows the following usage:

> Pipe(mtcars)$fun(lm(mpg~.,data=.))$summary()$fstatistic
$value : numeric 
------
   value    numdf    dendf 
13.93246 10.00000 21.00000 

Incorrect order in questionmark output

mtcars %>>% 
  subset(vs == 1, c(mpg, cyl, wt)) %>>%
  (? nrow(.)) %>>%
  (? data ~ ncol(data)) %>>%
  summary
? data ~ ncol(data)
? nrow(.)
[1] 14
[1] 3
      mpg             cyl              wt       
 Min.   :17.80   Min.   :4.000   Min.   :1.513  
 1st Qu.:21.40   1st Qu.:4.000   1st Qu.:2.001  
 Median :22.80   Median :4.000   Median :2.623  
 Mean   :24.56   Mean   :4.571   Mean   :2.611  
 3rd Qu.:29.62   3rd Qu.:5.500   3rd Qu.:3.209  
 Max.   :33.90   Max.   :6.000   Max.   :3.460  

Find better way to present the rules

Since all rules are carefully designed and all features carefully introduced, we need a better way to present the rules and features that is more natural and intuitive.

Option to not print Pipe

I am really enjoying this package and the Pipe function. One thing I noticed is that whenever I use Pipe, it explicitly prints the word Pipe in my R console. While this is a useful message, I want to be able to suppress it. Is that possible currently, or can an option be added?

Add the syntax only for side effect

Consider the following syntax:

x %>>% (~ expr)         # evaluate expr with . = x and return x
x %>>% ((m) ~ expr)     # evaluate expr with m = x and return x
mtcars %>>%
  (~ cat("Number of columns:",ncol(.),"\n")) %>>%
  (mpg) %>>%
  summary
Number of columns: 11 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.42   19.20   20.09   22.80   33.90 

or

mtcars %>>%
  ((x) ~ cat("Number of columns:",ncol(x),"\n")) %>>%
  (mpg) %>>%
  summary
Number of columns: 11 
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.42   19.20   20.09   22.80   33.90 

where (~ expr) or ((x) ~ expr) indicates that the output of this will be ignored and the input will be returned, thus only for side effect (only one side is stressed in the formula, also looks like expr is evaluated as a side branch)

Note that all syntax in () automatically applies to .() in Pipe, therefore,

Pipe(mtcars)$
  .(~ cat("Number of columns:",ncol(.),"\n"))$
  .(mpg)$
  summary()
Number of columns: 11 
$value : summaryDefault table 
------
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.42   19.20   20.09   22.80   33.90 

Add syntax for questioning

Consider the following syntax:

x %>>% (? expr)

which prints the expression and its value, then returns x. It acts just like asking for the value of expr and continues piping with the input x.

A demo:

> set.seed(100)
> rnorm(100) %>>%
+   (? mean(.)) %>>%
+   (? median(.)) %>>%
+   (? summary(.)) %>>%
+   plot(col="red")
? mean(.)
[1] 0.002912563
? median(.)
[1] -0.0594199
? summary(.)
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
-2.272000 -0.608800 -0.059420  0.002913  0.655900  2.582000 

Cannot perform function assignment with assignment operator

The following code works:

> set.seed(0)
> numbers <- 1:5
> letters %>>%
+   sample(length(numbers)) %>>%
+   (~ . -> names(numbers))
[1] "x" "g" "i" "n" "t"
> numbers
x g i n t 
1 2 3 4 5 

But the following does not work.

> letters %>>%
+   sample(length(numbers)) %>>%
+   (~ names(numbers) <- .)
Error in expr[[1L]] : object of type 'symbol' is not subsettable
> letters %>>%
+   sample(length(numbers)) %>>%
+   (names(numbers) <- .)
Error in expr[[1L]] : object of type 'symbol' is not subsettable
> letters %>>%
+   sample(length(numbers)) %>>%
+   (names(numbers) = .)
Error in expr[[1L]] : object of type 'symbol' is not subsettable

Performance

This issue addresses the performance of the operators defined in this package compared with both traditional approach and implementation in magrittr package.

library(microbenchmark)
library(pipeR)
library(magrittr)
make <- function(first,then,op,level) {
  levels <- paste(rep(then,level),collapse = op)
  code <- paste(first,levels,sep = op)
  parse(text=code)
}
test <- function(n,level,times) {
  pipeR1 <- make("rnorm(n)","c(rnorm(n))","%>>%",level)
  pipeR2 <- make("rnorm(n)","c(.,rnorm(n))","%:>%",level)
  pipeR3 <- make("rnorm(n)","(x~c(x,rnorm(n)))","%|>%",level)
  magrittr1 <- make("rnorm(n)","c(rnorm(n))","%>%",level)
  magrittr2 <- make("rnorm(n)","c(.,rnorm(n))","%>%",level)
  magrittr3 <- make("rnorm(n)","l(x -> c(x,rnorm(n)))","%>%",level)
  microbenchmark(null={
    x <- rnorm(n)
    for(i in 1:level) {
      x <- c(x,rnorm(n))
    }
    x
  },pipeR1=eval(pipeR1),
    pipeR2=eval(pipeR2),
    pipeR3=eval(pipeR3),
    magrittr1=eval(magrittr1),
    magrittr2=eval(magrittr2),
    magrittr3=eval(magrittr3),
    times=times)
}

Some results are listed below.

Add documentation for the lazy evaluation feature of Pipe

Pipe is lazily evaluated.

In ordinary cases where the Pipe object is directly printed out, the whole chain is evaluated immediately.

> Pipe(rnorm(100))$plot(col="red")

If the Pipe object is assigned to a symbol, for example,

> p <- Pipe(rnorm(100))$plot(col="red")

the chain of commands is not evaluated until p is printed or explicitly evaluated, for example:

> p
... some plot is produced ...

This lazy-evaluation feature of Pipe allows continuation of piping without evaluation. Consider working with ggvis.

> p <- Pipe(mtcars)$ggvis(~ mpg, ~ wt)
> p$layer_points()
$value : ggvis 
... a scatter plot is produced ...
> p$layer_bars()
$value : ggvis 
... a bar plot is produced ...

I(x) evaluates x first and passes the result to parentheses evaluation

I(x) provides a mechanism for meta piping, that is, evaluate the expression in I() first and put the result to ... in x %>>% (...).

> formula1 <- x ~ x + 2
> formula2 <- x ~ x * 2
> 1:10 %>>% I(if(mean(.) >= 5) formula1 else formula2)
 [1]  3  4  5  6  7  8  9 10 11 12

Therefore, the current implementation of I() is not enough. Use pipe_dot() in I().

Check how pipeR works with debugging facilities

Currently, the side-effect syntax can be used for debugging with browser().

Suppose the original code:

mtcars %>>%
  subset(mpg <= quantile(mpg, 0.95)) %>>%
  lm(formula = mpg ~ .) %>>%
  summary %>>%
  (r.squared)

To debug this code, just insert (~ browser()) into the pipeline after the line one needs to debug. In the browser environment, one only needs to type . to see the input of that line. For example,

mtcars %>>%
  subset(mpg <= quantile(mpg, 0.95)) %>>%
  lm(formula = mpg ~ .) %>>%
  (~ browser()) %>>%
  summary %>>%
  (r.squared)

The debugging looks like

Browse[1]> .

Call:
lm(formula = mpg ~ ., data = .)

Coefficients:
(Intercept)          cyl         disp           hp         drat           wt  
   29.13001     -0.84475      0.01751     -0.02851      0.26103     -3.70579  
       qsec           vs           am         gear         carb  
    0.15113      0.15242     -0.19900      1.05332      0.10792  

Browse[1]> Q
> 

One can also specify parameters of browser() to perform conditional browsing. For example,

mtcars %>>%
  subset(mpg <= quantile(mpg, 0.95)) %>>%
  lm(formula = mpg ~ wt + cyl) %>>%
  (~ browser(expr = summary(.)$r.squared >= 0.9)) %>>%
  plot()

To help identify which browser() call has been triggered, print something before it:

mtcars %>>%
  subset(mpg <= quantile(mpg, 0.95)) %>>%
  lm(formula = mpg ~ wt + cyl) %>>%
  (~ print("debugging")) %>>%
  (~ browser()) %>>%
  plot()

or

mtcars %>>%
  subset(mpg <= quantile(mpg, 0.95)) %>>%
  lm(formula = mpg ~ wt + cyl) %>>%
  (? ~ "debugging") %>>% # print expression only
  (~ browser()) %>>%
  plot()
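The debugging trick relies on two properties of the side-effect syntax: the expression is evaluated with . bound to the current value, and the value itself is passed on unchanged. A base-R sketch of those two properties follows; side_effect() is a hypothetical stand-in for (~ expr), not pipeR's implementation.

```r
# Hypothetical stand-in for pipeR's (~ expr) step: evaluate `expr` with `.`
# bound to `value`, discard the result, and return `value` unchanged.
side_effect <- function(value, expr) {
  expr <- substitute(expr)
  env <- new.env(parent = parent.frame())
  assign(".", value, envir = env)
  eval(expr, env)   # e.g. print(), message(), or browser()
  value
}

fit <- lm(mpg ~ wt + cyl, data = subset(mtcars, mpg <= quantile(mpg, 0.95)))
out <- side_effect(fit, print("debugging"))   # prints [1] "debugging"
identical(out, fit)                            # TRUE: the model passes through
```

If the expression were browser(), evaluation should pause inside env, where typing . shows the piped value, which is the behaviour the transcript above relies on.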

Define I() to support symbolic input

Consider the following code:

> p <- x ~ x + 1
> 1:10 %>>% I(p)
 [1]  2  3  4  5  6  7  8  9 10 11

Here, I(p) indicates that p should be evaluated first to obtain the formula, which is then evaluated as if it were written within (...).

For Pipe(), define I() within the environment.

> Pipe(1:10)$I(p)
$value : numeric 
------
 [1]  2  3  4  5  6  7  8  9 10 11
> q <- quote(. + 1)
> Pipe(1:10)$I(q)
$value : numeric 
------
 [1]  2  3  4  5  6  7  8  9 10 11
> m <- quote(a)
> Pipe(list(a=1,b=2))$I(m)
$value : numeric 
------
[1] 1
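The lookups above can be reproduced with base R's eval(): the variable is evaluated first to obtain a quoted expression, which is then evaluated with . bound to the piped value and, when that value is a named list, with its elements visible by name. apply_quoted() is a hypothetical helper illustrating this; it is not how pipeR defines I().

```r
# Hypothetical helper: evaluate a quoted symbol or call against a value.
apply_quoted <- function(value, obj) {
  stopifnot(is.symbol(obj) || is.call(obj))
  # Elements of a named list are visible by name, as in Pipe(list(a=1,b=2))$I(m).
  data <- if (is.list(value) && !is.null(names(value))) value else list()
  data[["."]] <- value              # `.` always refers to the whole value
  eval(obj, data, parent.frame())   # other names resolve in the caller
}

q <- quote(. + 1)
m <- quote(a)
apply_quoted(1:10, q)                # [1]  2  3  4  5  6  7  8  9 10 11
apply_quoted(list(a = 1, b = 2), m)  # [1] 1
```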

Incompatible with switch statement

library(magrittr)
centre <- function(x,type) {
  type %>%
    switch(mean=mean(x),median=median(x))
}
> centre(rnorm(100),"mean")
# Error in mean(x) : object 'x' not found

When the %>% operator is removed, the function works correctly. The same thing also happens with %>>%. It may be an issue with environment parenting.
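The suspected cause can be illustrated in base R without any pipe operator. The sketch below (centre_demo() is a hypothetical name) evaluates the same switch() call twice: once in the function's own frame, where x is visible, and once in an environment that only knows type, which reproduces the same "object 'x' not found" error. This only demonstrates the environment-parenting hypothesis; it does not show how magrittr or pipeR actually evaluate the call.

```r
# Illustrates the suspected environment-parenting problem (hypothetical demo).
centre_demo <- function(x, type) {
  call <- quote(switch(type, mean = mean(x), median = median(x)))

  # Evaluated in the function's own frame: both `type` and `x` are found.
  good <- eval(call)

  # Evaluated in an environment that knows `type` but whose parent is the base
  # environment rather than this function's frame: `x` cannot be found.
  detached <- new.env(parent = baseenv())
  assign("type", type, envir = detached)
  bad <- tryCatch(eval(call, detached), error = conditionMessage)

  list(good = good, bad = bad)
}

centre_demo(c(1, 2, 6), "mean")
# $good is 3; $bad is the message "object 'x' not found"
```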

Consider deprecating lambda expressions in the form "x -> expr"

The lambda expression in the form x -> expr is a legacy of previous versions, and it conflicts with assignment operations. For example,

z <- new.env()
z %>>% (.$a <- 1) # does not work

Consider deprecating this form of lambda expression and supporting only x ~ expr, which has more features, such as side-effect-only piping as suggested in #30.
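The conflict comes from R's parser: x -> expr is not a separate syntax but is parsed into the same `<-` call as expr <- x. The base-R snippet below compares the parsed forms; nothing here is specific to pipeR.

```r
# R rewrites rightward assignment into a leftward `<-` call at parse time.
lambda <- quote(x -> x + 1)
assign_call <- quote(.$a <- 1)

lambda                 # printed as: x + 1 <- x
lambda[[1]]            # `<-`
assign_call[[1]]       # `<-`

# Both are `<-`(target, value), so a pipe cannot tell the lambda's symbol
# (lambda[[3]], here x) from an ordinary value being assigned
# (assign_call[[3]], here 1).
lambda[[3]]
assign_call[[3]]

# A formula keeps its own `~` head, so x ~ expr is unambiguous.
formula_lambda <- quote(x ~ x + 1)
formula_lambda[[1]]    # `~`
```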

Does not work with update()

Working example:

fit1 <- lm(mpg~.,data=mtcars)
update(fit1,mpg~cyl)

Does not work with %>>%

library(pipeR)
fit2 <- mtcars %>>% lm(mpg~.,data=.)
update(fit2,mpg~cyl)

But it works when the data is given explicitly:

library(pipeR)
fit2 <- mtcars %>>% lm(mpg~.,data=mtcars)
update(fit2,mpg~cyl)
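A likely explanation is visible in the model's stored call. update() rebuilds fit$call and re-evaluates it in the caller's environment, so a call recorded as lm(..., data = .) needs an object named . to exist there. The base-R sketch below reproduces this without pipeR by evaluating the call in a temporary environment where . is bound to mtcars; that this mirrors what %>>% does is an assumption, not something the issue confirms. Passing data explicitly to update() is one workaround.

```r
# Simulate a pipe that evaluates the call with `.` bound to the input
# (an assumption about how %>>% evaluates; pipeR itself is not loaded here).
pipe_env <- new.env()
assign(".", mtcars, envir = pipe_env)
fit2 <- eval(quote(lm(mpg ~ ., data = .)), pipe_env)

fit2$call   # lm(formula = mpg ~ ., data = .)

# update() re-evaluates the modified call in the calling environment, where
# `.` does not exist, so this fails.
res <- tryCatch(update(fit2, mpg ~ cyl), error = function(e) e)
inherits(res, "error")   # TRUE

# Workaround: supply the data explicitly so the stored `data = .` is replaced.
fit3 <- update(fit2, mpg ~ cyl, data = mtcars)
coef(fit3)
```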

Easy lambda expression

A more F#-like pipeline operator %|>% may be defined so that the following code can be allowed:

rnorm(100) %|>% (x -> plot(x))

In the code above, %|>% interprets -> as the connector between a user-chosen symbol x and the expression it maps to; -> does not mean assignment here. Is this a good idea? One issue is that it may not be compatible with the formatR package, which by default rewrites every -> as <-, making code like this harder to read.
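A minimal sketch of how such an operator could be written in base R, assuming the (symbol -> expr) form from the proposal. The operator below is hypothetical and not part of pipeR; it relies on the fact that R parses sym -> expr as expr <- sym, so the symbol is the third element of the captured call.

```r
# Hypothetical operator (not part of pipeR): `lhs %|>% (sym -> expr)`.
# R parses `sym -> expr` as `expr <- sym`, so the symbol is the third element.
`%|>%` <- function(lhs, lambda) {
  lambda <- substitute(lambda)
  # Drop the surrounding parentheses, if any.
  if (is.call(lambda) && identical(lambda[[1]], as.name("("))) {
    lambda <- lambda[[2]]
  }
  if (!is.call(lambda) || !identical(lambda[[1]], as.name("<-")) ||
      !is.symbol(lambda[[3]])) {
    stop("expected a lambda of the form (symbol -> expression)")
  }
  env <- new.env(parent = parent.frame())
  assign(as.character(lambda[[3]]), lhs, envir = env)  # bind the symbol
  eval(lambda[[2]], env)                               # evaluate the body
}

1:10 %|>% (v -> sum(v))         # [1] 55
c(3, 1, 2) %|>% (x -> sort(x))  # [1] 1 2 3
```

Since the operator only ever sees the already-parsed `<-` call, it cannot distinguish a lambda from a genuine assignment written inside the parentheses, which is the same ambiguity discussed for %>>%.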

value vs magrittr?

Out of curiosity, are you familiar with the magrittr package? Since it already has a robust implementation for piping and is incorporated into Hadley's dplyr and ggvis packages, it seems that your development effort could be better spent rolling your additional ideas into that package. See here for a link to the package.

You could also take a look at how he handled lambdas and aliases.

Operator should also work with Pipe object

Currently, %>>% does not work seamlessly with a Pipe object: it does not operate directly on the Pipe's inner value. Consider making them compatible with each other so that they can work together, for example

> Pipe(1:3) %>>% c(4)
$value : numeric 
------
[1] 1 2 3 4
> Pipe(rnorm(10)) %>>% summary()
$value : summaryDefault table 
------
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
-2.0540 -0.9229  0.1231 -0.2102  0.5831  0.9954 

This would allow the resulting Pipe to keep supporting command chaining with $.

> z <- Pipe(1:5) %>>% (.^2)
> z
$value : numeric 
------
[1]  1  4  9 16 25
> z$mean()
$value : numeric 
------
[1] 11

However, mixing both piping styles in one pipeline is not recommended. Benchmark tests show that this feature may lower the performance of the operator by up to 40%, although the loss is insignificant in practical data-manipulation use.
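The compatibility idea boils down to having the operator check the class of its left-hand side. The base-R sketch below uses a hypothetical wrapper class and a hypothetical operator %then%, neither of which is pipeR's Pipe or %>>%: a wrapped input is unwrapped, the call is evaluated with the value as its first argument, and the result is wrapped again so it keeps its wrapper, as z does above.

```r
# Hypothetical wrapper standing in for a Pipe object (not pipeR's class).
wrap <- function(value) structure(list(value = value), class = "wrapped")

# Hypothetical operator: pipes `x` as the first argument of the call on the
# right. If `x` is wrapped, it is unwrapped first and the result re-wrapped.
`%then%` <- function(x, rhs) {
  rhs <- substitute(rhs)
  stopifnot(is.call(rhs))            # this sketch only handles f(...) calls
  wrapped <- inherits(x, "wrapped")
  input <- if (wrapped) x$value else x
  # Turn f(a, b) into f(.input, a, b).
  call <- as.call(c(list(rhs[[1]], quote(.input)), as.list(rhs)[-1]))
  result <- eval(call, list(.input = input), parent.frame())
  if (wrapped) wrap(result) else result
}

z <- wrap(1:3) %then% c(4L)
z$value            # [1] 1 2 3 4
class(z)           # "wrapped"
1:3 %then% c(4L)   # plain input stays plain: [1] 1 2 3 4
```

The class check runs on every call, so even this sketch adds some per-step work compared with an operator that never looks at its input's class.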
