---
title: "Introducing magrittr"
author: Stefan Milton Bache
date: November, 2014
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introducing magrittr}
  %\VignetteEngine{knitr::rmarkdown}
  %\usepackage[utf8]{inputenc}
---
```{r include = FALSE}
library(magrittr)
options(scipen = 3)
knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
```
# Abstract
The *magrittr* (to be pronounced with a sophisticated French accent) package has
two aims: decrease development time and improve readability and maintainability
of code. Or even shortr: make your code smokin' (puff puff)!

To achieve its humble aims, *magrittr* (remember the accent) provides a new
"pipe"-like operator, `%>%`, with which you may pipe a value forward into an
expression or function call; something along the lines of ` x %>% f `, rather
than ` f(x)`. This is not an unknown feature elsewhere; a prime example is the
`|>` operator used extensively in `F#` (to say the least) and indeed this
-- along with Unix pipes -- served as a motivation for developing the magrittr
package.

This vignette describes the main features of *magrittr* and demonstrates
some features which have been added since the initial release.

# Introduction and basics
At first encounter, you may wonder whether an operator such as `%>%` can really
be all that beneficial; but as you may notice, it semantically changes your
code in a way that makes it more intuitive to both read and write.
Consider the following example, in which the `mtcars` dataset shipped with
R is munged a little:
```{r}
library(magrittr)

car_data <-
  mtcars %>%
  subset(hp > 100) %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
  transform(kpl = mpg %>% multiply_by(0.4251)) %>%
  print
```
We start with a value, here `mtcars` (a `data.frame`). From there, we extract a
subset, aggregate the information based on the number of cylinders, and then
transform the dataset by adding a variable for kilometers per liter as a
supplement to miles per gallon. Finally we print the result before assigning
it. Note how the code is arranged in the logical order of how you think about
the task: data -> subset -> aggregate -> transform, which is also the order in
which the code will execute. It's like a recipe -- easy to read, easy to follow!

A horrific alternative would be to write:
```{r}
car_data <-
  transform(aggregate(. ~ cyl,
                      data = subset(mtcars, hp > 100),
                      FUN = function(x) round(mean(x), 2)),
            kpl = mpg*0.4251)
```
There is a lot more clutter with parentheses, and the mental task of deciphering
the code is more challenging---particularly if you did not write it yourself.
Note also how "building" a function on the fly for use in `aggregate` is very
simple in *magrittr*: rather than an actual value as the left-hand side in
the pipeline, just use the placeholder. This is also very useful in R's
`*apply` family of functions.
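
As a minimal sketch of that last point, the chunk below passes a dot-started pipeline to `sapply`. The list `samples` is made up for illustration and is not part of the example above.

```{r}
library(magrittr)

# Hypothetical input: a named list of numeric vectors.
samples <- list(a = c(1, 2, 3), b = c(10, 20, 30, 40))

# A pipeline starting with the dot is a unary function,
# so it can be handed directly to sapply().
piped <- sapply(samples, . %>% mean %>% round(1))

# The same computation with an explicit anonymous function.
explicit <- sapply(samples, function(x) round(mean(x), 1))

identical(piped, explicit)
```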
Granted, you may make the second example better, perhaps throw in a few
temporary variables (which is often avoided to some degree when using
*magrittr*), but one often sees cluttered lines like the ones presented.
And here is another selling point: suppose I want to quickly add another step
somewhere in the process. This is very easy to do in the pipeline version, but
a little more challenging in the "standard" example.
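
To make this concrete, here is a sketch in which one extra step is added to both versions. The extra filter on `wt` (weight in 1000 lbs) is an arbitrary choice for illustration, and the names `light_car_data` and `light_car_data_nested` are hypothetical.

```{r}
library(magrittr)

# Pipeline version: the new step is a single added line.
light_car_data <-
  mtcars %>%
  subset(hp > 100) %>%
  subset(wt < 4) %>%  # the added step
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2)) %>%
  transform(kpl = mpg %>% multiply_by(0.4251))

# Nested version: the new call has to be wrapped around the innermost one.
light_car_data_nested <-
  transform(aggregate(. ~ cyl,
                      data = subset(subset(mtcars, hp > 100), wt < 4),
                      FUN = function(x) round(mean(x), 2)),
            kpl = mpg * 0.4251)

all.equal(light_car_data, light_car_data_nested)
```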
The combined example shows a few neat features of the pipe (which it is not):

1. By default the left-hand side (LHS) will be *piped in* as the first argument of
the function appearing on the right-hand side (RHS). This is the case in the
`subset` and `transform` expressions.
2. `%>%` may be used in a nested fashion, e.g. it may appear in expressions within
arguments. This is illustrated in the `mpg` to `kpl` conversion.
3. When the LHS is needed at a position other than the first, one can use
the dot, `.`, as a placeholder. This is shown in the `aggregate` expression.
4. The dot in e.g. a formula is *not* confused with a placeholder, which is
utilized in the `aggregate` expression.
5. Whenever only *one* argument (the LHS) is needed, one can omit the empty
parentheses. This is shown in the call to `print` (which also returns its
argument). Here, `LHS %>% print()`, or even `LHS %>% print(.)` would also work.
6. A pipeline with a dot (`.`) as the LHS will create a unary function. This is
used to define the aggregator function.
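
Some of these rules can be checked on small vectors. The chunk below is a sketch only; the vector `x` and the calls chosen are made up for illustration, and each line compares a piped expression with its ordinary counterpart.

```{r}
library(magrittr)

x <- c(1, 4, 9)

# Rule 1: the LHS is passed as the first argument of the RHS call.
identical(x %>% head(2), head(x, 2))

# Rule 2: a pipe may appear inside an argument.
identical(x %>% head(2 %>% add(1)), head(x, 3))

# Rule 3: a dot used directly as an argument places the LHS there instead.
identical(2 %>% rep("a", times = .), rep("a", times = 2))

# Rule 5: empty parentheses may be omitted.
identical(x %>% sum, x %>% sum())
```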
One feature which was not demonstrated above is piping into *anonymous
functions*, or *lambdas*. This is possible using standard function definitions,
e.g.:
```{r, eval = FALSE}
car_data %>%
  (function(x) {
    if (nrow(x) > 2)
      rbind(head(x, 1), tail(x, 1))
    else x
  })
```
However, *magrittr* also allows a short-hand notation:
```{r}
car_data %>%
  {
    if (nrow(.) > 2)
      rbind(head(., 1), tail(., 1))
    else .
  }
```
Since all right-hand sides are really "body expressions" of unary functions,
this is only the natural extension of the simple right-hand side expressions.
Of course, longer and more complex functions can be made using this approach.
In the first example, the anonymous function is enclosed in parentheses.
Whenever you want to use a function- or call-generating statement as right-hand
side, parentheses are used to evaluate the right-hand side before piping takes
place.
Another, less useful example is:
```{r}
1:10 %>% (substitute(f(), list(f = sum)))
```
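A perhaps more typical case for the parentheses is a function-generating call. The sketch below uses a hypothetical function factory, `adder`, which is not part of *magrittr*.

```{r}
library(magrittr)

# Hypothetical function factory: returns a function that adds `n`.
adder <- function(n) function(x) x + n

# With parentheses, adder(10) is evaluated first, and the function it
# returns is then applied to the LHS.
1:3 %>% (adder(10))

# Without parentheses, the LHS is inserted as the first argument, i.e.
# adder(1:3, 10), which fails because adder() takes a single argument.
inherits(try(1:3 %>% adder(10), silent = TRUE), "try-error")
```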
# Additional pipe operators
*magrittr* also provides three related pipe operators. These are not as common
as `%>%` but they become useful in special cases.
The "tee" pipe, `%T>%`, works like `%>%`, except it returns the left-hand
side value, and not the result of the right-hand side operation. This is useful
when a step in a pipeline is used for its side-effect (printing, plotting,
logging, etc.). As an example (where the actual plot is omitted here):
```{r, fig.keep='none'}
rnorm(200) %>%
  matrix(ncol = 2) %T>%
  plot %>% # plot usually does not return anything.
  colSums
```
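The difference from `%>%` is easiest to see with a side-effect function whose return value is not useful. The following sketch assumes a hypothetical helper, `report`, which prints a message and returns `NULL`.

```{r}
library(magrittr)

# Hypothetical side-effect function: cat() returns NULL.
report <- function(v) cat("Received", length(v), "values\n")

# The tee pipe passes the original vector on to sum().
with_tee <- c(1, 2, 3) %T>% report %>% sum

# The ordinary pipe passes on the return value of report(), i.e. NULL.
without_tee <- c(1, 2, 3) %>% report %>% sum

c(with_tee = with_tee, without_tee = without_tee)
```

Here `with_tee` is the sum of the vector, whereas `without_tee` is `sum(NULL)`, which is zero.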
The "exposition" pipe, `%$%`, exposes the names within the left-hand side
object to the right-hand side expression. Essentially, it is a short-hand for
using the `with` function (and the same left-hand side objects are accepted).
This operator is handy when functions do not themselves have a data argument, as
for example `lm` and `aggregate` do. Here are a few examples as illustration:
```{r, eval = FALSE}
iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)

data.frame(z = rnorm(100)) %$%
  ts.plot(z)
```
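The correspondence with `with` can be checked directly. This is a small sketch; the list in the second part is made up for illustration.

```{r}
library(magrittr)

# The exposition pipe and with() evaluate the same expression
# with the columns of `iris` in scope.
piped <- iris %$% cor(Sepal.Length, Sepal.Width)
explicit <- with(iris, cor(Sepal.Length, Sepal.Width))
identical(piped, explicit)

# A plain list works as the LHS too, as it does for with().
list(a = 1:3, b = 4:6) %$% sum(a * b)
```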
Finally, the "assignment" pipe `%<>%` can be used as the first
pipe in a chain. The effect will be that the result of the pipeline is assigned
to the left-hand side object, rather than returning the result as usual. It is
essentially shorthand notation for expressions like `foo <- foo %>% bar %>% baz`,
which boils down to `foo %<>% bar %>% baz`. Another example is:
```{r, eval = FALSE}
iris$Sepal.Length %<>% sqrt
```
The `%<>%` operator can be used whenever `expr <- ...` makes sense, e.g.

* `x %<>% foo %>% bar`
* `x[1:10] %<>% foo %>% bar`
* `x$baz %<>% foo %>% bar`
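
A sketch of the three forms with concrete functions in place of `foo` and `bar`; the objects `x`, `y` and `d` are made up for illustration.

```{r}
library(magrittr)

x <- c(1, 4, 9, 16)
x %<>% sqrt                    # same as: x <- x %>% sqrt

y <- c(1, 4, 9, 16)
y[1:2] %<>% sqrt %>% add(1)    # same as: y[1:2] <- y[1:2] %>% sqrt %>% add(1)

d <- data.frame(v = c(1, 2, 3))
d$v %<>% multiply_by(2)        # same as: d$v <- d$v %>% multiply_by(2)

list(x = x, y = y, v = d$v)
```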
# Aliases
In addition to the `%>%`-operator, *magrittr* provides some aliases for other
operators which make operations such as addition or multiplication fit well
into the *magrittr*-syntax. As an example, consider:
```{r}
rnorm(1000) %>%
  multiply_by(5) %>%
  add(5) %>%
  {
    cat("Mean:", mean(.),
        "Variance:", var(.), "\n")
    head(.)
  }
```
which could be written in more compact form as:
```{r, results = 'hide'}
rnorm(1000) %>% `*`(5) %>% `+`(5) %>%
  {
    cat("Mean:", mean(.), "Variance:", var(.), "\n")
    head(.)
  }
```
To see a list of the aliases, execute e.g. `?multiply_by`.
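
A few more of the aliases in use, as a sketch: `extract2`, `divide_by`, `set_names` and `equals` are aliases exported by *magrittr*, while the list `scores` is made up for illustration.

```{r}
library(magrittr)

# Hypothetical input data.
scores <- list(math = c(90, 80), art = c(70, 60))

scaled <-
  scores %>%
  extract2("math") %>%              # same as .[["math"]]
  divide_by(10) %>%                 # same as . / 10
  set_names(c("first", "second"))   # same as `names<-`(., c("first", "second"))

scaled

# Comparison operators have aliases as well.
scaled %>% sum %>% equals(17)
```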
# Development
The *magrittr* package is also available in a development version at the
GitHub development page:
[github.com/tidyverse/magrittr](https://github.com/tidyverse/magrittr).