213 lines
5.7 KiB
Plaintext
213 lines
5.7 KiB
Plaintext
|
---
|
||
|
title: "Lazyeval: a new approach to NSE"
|
||
|
date: "`r Sys.Date()`"
|
||
|
output: rmarkdown::html_vignette
|
||
|
vignette: >
|
||
|
%\VignetteIndexEntry{Lazyeval: a new approach to NSE}
|
||
|
%\VignetteEngine{knitr::rmarkdown}
|
||
|
%\usepackage[utf8]{inputenc}
|
||
|
---
|
||
|
|
||
|
```{r, echo = FALSE}
|
||
|
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
|
||
|
rownames(mtcars) <- NULL
|
||
|
```
|
||
|
|
||
|
This document outlines my previous approach to non-standard evaluation (NSE). You should avoid it unless you are working with an older version of dplyr or tidyr.
|
||
|
|
||
|
There are three key ideas:
|
||
|
|
||
|
* Instead of using `substitute()`, use `lazyeval::lazy()` to capture both expression
|
||
|
and environment. (Or use `lazyeval::lazy_dots(...)` to capture promises in `...`)
|
||
|
|
||
|
* Every function that uses NSE should have a standard evaluation (SE) escape
|
||
|
hatch that does the actual computation. The SE-function name should end with
|
||
|
`_`.
|
||
|
|
||
|
* The SE-function has a flexible input specification to make it easy for people
|
||
|
to program with.
|
||
|
|
||
|
## `lazy()`
|
||
|
|
||
|
The key tool that makes this approach possible is `lazy()`, an equivalent to `substitute()` that captures both expression and environment associated with a function argument:
|
||
|
|
||
|
```{r}
|
||
|
library(lazyeval)
|
||
|
f <- function(x = a - b) {
|
||
|
lazy(x)
|
||
|
}
|
||
|
f()
|
||
|
f(a + b)
|
||
|
```
|
||
|
|
||
|
As a complement to `eval()`, the lazy package provides `lazy_eval()` that uses the environment associated with the lazy object:
|
||
|
|
||
|
```{r}
|
||
|
a <- 10
|
||
|
b <- 1
|
||
|
lazy_eval(f())
|
||
|
lazy_eval(f(a + b))
|
||
|
```
|
||
|
|
||
|
The second argument to lazy eval is a list or data frame where names should be looked up first:
|
||
|
|
||
|
```{r}
|
||
|
lazy_eval(f(), list(a = 1))
|
||
|
```
|
||
|
|
||
|
`lazy_eval()` also works with formulas, since they contain the same information as a lazy object: an expression (only the RHS is used by convention) and an environment:
|
||
|
|
||
|
```{r}
|
||
|
lazy_eval(~ a + b)
|
||
|
h <- function(i) {
|
||
|
~ 10 + i
|
||
|
}
|
||
|
lazy_eval(h(1))
|
||
|
```
|
||
|
|
||
|
## Standard evaluation
|
||
|
|
||
|
Whenever we need a function that does non-standard evaluation, always write the standard evaluation version first. For example, let's implement our own version of `subset()`:
|
||
|
|
||
|
```{r}
|
||
|
subset2_ <- function(df, condition) {
|
||
|
r <- lazy_eval(condition, df)
|
||
|
r <- r & !is.na(r)
|
||
|
df[r, , drop = FALSE]
|
||
|
}
|
||
|
|
||
|
subset2_(mtcars, lazy(mpg > 31))
|
||
|
```
|
||
|
|
||
|
`lazy_eval()` will always coerce it's first argument into a lazy object, so a variety of specifications will work:
|
||
|
|
||
|
```{r}
|
||
|
subset2_(mtcars, ~mpg > 31)
|
||
|
subset2_(mtcars, quote(mpg > 31))
|
||
|
subset2_(mtcars, "mpg > 31")
|
||
|
```
|
||
|
|
||
|
Note that quoted called and strings don't have environments associated with them, so `as.lazy()` defaults to using `baseenv()`. This will work if the expression is self-contained (i.e. doesn't contain any references to variables in the local environment), and will otherwise fail quickly and robustly.
|
||
|
|
||
|
## Non-standard evaluation
|
||
|
|
||
|
With the SE version in hand, writing the NSE version is easy. We just use `lazy()` to capture the unevaluated expression and corresponding environment:
|
||
|
|
||
|
```{r}
|
||
|
subset2 <- function(df, condition) {
|
||
|
subset2_(df, lazy(condition))
|
||
|
}
|
||
|
subset2(mtcars, mpg > 31)
|
||
|
```
|
||
|
|
||
|
This standard evaluation escape hatch is very important because it allows us to implement different NSE approaches. For example, we could create a subsetting function that finds all rows where a variable is above a threshold:
|
||
|
|
||
|
```{r}
|
||
|
above_threshold <- function(df, var, threshold) {
|
||
|
cond <- interp(~ var > x, var = lazy(var), x = threshold)
|
||
|
subset2_(df, cond)
|
||
|
}
|
||
|
above_threshold(mtcars, mpg, 31)
|
||
|
```
|
||
|
|
||
|
Here we're using `interp()` to modify a formula. We use the value of `threshold` and the expression in by `var`.
|
||
|
|
||
|
## Scoping
|
||
|
|
||
|
Because `lazy()` captures the environment associated with the function argument, we automatically avoid a subtle scoping bug present in `subset()`:
|
||
|
|
||
|
```{r}
|
||
|
x <- 31
|
||
|
f1 <- function(...) {
|
||
|
x <- 30
|
||
|
subset(mtcars, ...)
|
||
|
}
|
||
|
# Uses 30 instead of 31
|
||
|
f1(mpg > x)
|
||
|
|
||
|
f2 <- function(...) {
|
||
|
x <- 30
|
||
|
subset2(mtcars, ...)
|
||
|
}
|
||
|
# Correctly uses 31
|
||
|
f2(mpg > x)
|
||
|
```
|
||
|
|
||
|
`lazy()` has another advantage over `substitute()` - by default, it follows promises across function invocations. This simplifies the casual use of NSE.
|
||
|
|
||
|
```{r, eval = FALSE}
|
||
|
x <- 31
|
||
|
g1 <- function(comp) {
|
||
|
x <- 30
|
||
|
subset(mtcars, comp)
|
||
|
}
|
||
|
g1(mpg > x)
|
||
|
#> Error: object 'mpg' not found
|
||
|
```
|
||
|
|
||
|
```{r}
|
||
|
g2 <- function(comp) {
|
||
|
x <- 30
|
||
|
subset2(mtcars, comp)
|
||
|
}
|
||
|
g2(mpg > x)
|
||
|
```
|
||
|
|
||
|
Note that `g2()` doesn't have a standard-evaluation escape hatch, so it's not suitable for programming with in the same way that `subset2_()` is.
|
||
|
|
||
|
## Chained promises
|
||
|
|
||
|
Take the following example:
|
||
|
|
||
|
```{r}
|
||
|
library(lazyeval)
|
||
|
f1 <- function(x) lazy(x)
|
||
|
g1 <- function(y) f1(y)
|
||
|
|
||
|
g1(a + b)
|
||
|
```
|
||
|
|
||
|
`lazy()` returns `a + b` because it always tries to find the top-level promise.
|
||
|
|
||
|
In this case the process looks like this:
|
||
|
|
||
|
1. Find the object that `x` is bound to.
|
||
|
2. It's a promise, so find the expr it's bound to (`y`, a symbol) and the
|
||
|
environment in which it should be evaluated (the environment of `g()`).
|
||
|
3. Since `x` is bound to a symbol, look up its value: it's bound to a promise.
|
||
|
4. That promise has expression `a + b` and should be evaluated in the global
|
||
|
environment.
|
||
|
5. The expression is not a symbol, so stop.
|
||
|
|
||
|
Occasionally, you want to avoid this recursive behaviour, so you can use `follow_symbol = FALSE`:
|
||
|
|
||
|
```{r}
|
||
|
f2 <- function(x) lazy(x, .follow_symbols = FALSE)
|
||
|
g2 <- function(y) f2(y)
|
||
|
|
||
|
g2(a + b)
|
||
|
```
|
||
|
|
||
|
Either way, if you evaluate the lazy expression you'll get the same result:
|
||
|
|
||
|
```{r}
|
||
|
a <- 10
|
||
|
b <- 1
|
||
|
|
||
|
lazy_eval(g1(a + b))
|
||
|
lazy_eval(g2(a + b))
|
||
|
```
|
||
|
|
||
|
Note that the resolution of chained promises only works with unevaluated objects. This is because R deletes the information about the environment associated with a promise when it has been forced, so that the garbage collector is allowed to remove the environment from memory in case it is no longer used. `lazy()` will fail with an error in such situations.
|
||
|
|
||
|
```{r, error = TRUE, purl = FALSE}
|
||
|
var <- 0
|
||
|
|
||
|
f3 <- function(x) {
|
||
|
force(x)
|
||
|
lazy(x)
|
||
|
}
|
||
|
|
||
|
f3(var)
|
||
|
```
|