---
title: "Introduction to glue"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to glue}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: sentence
---
```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```
The glue package contains functions for string interpolation: gluing together character strings and R code.
```{r}
library(glue)
```
## Gluing and interpolating
`glue()` can be used to glue together pieces of text:
```{r}
glue("glue ", "some ", "text ", "together")
```
But glue's real power comes with `{}`: anything inside of `{}` is evaluated and pasted into the string.
This makes it easy to interpolate variables:
```{r}
name <- "glue"
glue("We are learning how to use the {name} R package.")
```
As well as more complex expressions:
```{r}
release_date <- as.Date("2017-06-13")
glue("Release was on a {format(release_date, '%A')}.")
```
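Interpolated expressions are not limited to length-one values.
As a small sketch (the vector `sides` is made up for illustration), `glue()` recycles the template across the elements of a vector and returns one string per element:

```{r}
library(glue)

# `sides` is an illustrative vector, not part of the examples above.
sides <- 1:3
areas <- glue("A square with side {sides} has area {sides^2}.")

areas
length(areas)
```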
## Control of line breaks
`glue()` honors the line breaks in its input:
```{r}
glue("
  A formatted string
  Can have multiple lines
    with additional indention preserved
  "
)
```
The example above demonstrates some other important facts about the pre-processing of the template string:

- An empty first or last line is automatically trimmed.
- Leading whitespace that is common across all lines is trimmed.

The elimination of common leading whitespace is advantageous, because you aren't forced to choose between indenting your code normally and getting the output you actually want.
This is easier to appreciate when you have `glue()` inside a function body (this example also shows an alternative way of styling the end of a `glue()` call):
```{r}
foo <- function() {
  glue("
    A formatted string
    Can have multiple lines
      with additional indention preserved")
}
foo()
```
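The same pre-processing can also be applied on its own with `trim()`, which glue exports.
The following is a minimal sketch with a made-up template (`raw_template` is an illustrative name) that makes both trimming rules visible on a plain string:

```{r}
library(glue)

# An illustrative template: an empty first line, a whitespace-only last
# line, and four spaces of indentation shared by every line.
raw_template <- "
    first line
      indented line
    "
trimmed <- trim(raw_template)

# writeLines() shows the result without R's string escapes.
writeLines(trimmed)
```

Only the two extra spaces in front of `indented line` should survive, since they are not common to all lines.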
On the other hand, what if you don't want a line break in the output, but you also like to limit the length of lines in your source code to, e.g., 80 characters?
The first option is to use `\\` to break the template string into multiple lines, without getting line breaks in the output:
```{r}
release_date <- as.Date("2017-06-13")
glue("
  The first version of the glue package was released on \\
  a {format(release_date, '%A')}.")
```
This comes up fairly often when an expression to evaluate inside `{}` takes up more characters than its result, e.g. `format(release_date, '%A')` versus `Tuesday`.
A second way to achieve the same result is to break the template into individual pieces, which are then concatenated.
```{r}
glue(
  "The first version of the glue package was released on ",
  "a {format(release_date, '%A')}."
)
```
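As a quick check, the two approaches can be compared directly.
This sketch uses its own variable names (`demo_date`, `continued`, `pieces`) so that it does not disturb the objects defined above; it compares the underlying strings rather than the printed output, so it does not depend on the locale used by `format()`:

```{r}
library(glue)

# `demo_date` mirrors the release date used above.
demo_date <- as.Date("2017-06-13")

# Option 1: continue the template line with a trailing `\\`.
continued <- glue("
  The first version of the glue package was released on \\
  a {format(demo_date, '%A')}.")

# Option 2: pass the template as separate pieces.
pieces <- glue(
  "The first version of the glue package was released on ",
  "a {format(demo_date, '%A')}."
)

# Both strings are the same, and neither contains a line break.
identical(as.character(continued), as.character(pieces))
grepl("\n", continued, fixed = TRUE)
```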
If you want an explicit newline at the start or end, include an extra empty line.
```{r}
# no leading or trailing newline
x <- glue("
  blah
  ")
unclass(x)

# both a leading and trailing newline
y <- glue("

  blah

  ")
unclass(y)
```
We use `unclass()` above to make it easier to see the absence and presence of the newlines, i.e. to reveal the literal `\n` escape sequences.
`glue()` and friends generally return a glue object, which is a character vector with the S3 class `"glue"`.
The `"glue"` class exists primarily for the sake of a print method, which displays the natural formatted result of a glue string.
Most of the time this is *exactly* what the user wants to see.
The example above happens to be an exception, where we really do want to see the underlying string representation.
Here's another example to drive home the difference between printing a glue object and looking at its string representation.
`as.character()` is another way to do this, and is arguably more expressive than `unclass()`.
```{r}
x <- glue('
  abc
  " }
  xyz')
class(x)
x
unclass(x)
as.character(x)
```
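Because a glue object is still a character vector, it can be handed to ordinary string functions.
A small sketch, using an illustrative object named `obj`:

```{r}
library(glue)

obj <- glue("first line\nsecond line")

# A glue object is a character vector that carries an extra S3 class.
inherits(obj, "glue")
is.character(obj)

# Character functions operate on the underlying string, newline included.
nchar(obj)

# as.character() drops the "glue" class and leaves a plain string.
plain <- as.character(obj)
class(plain)
```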
## Delimiters
By default, code to be evaluated goes inside `{}` in a glue string.
If you want a literal curly brace in your string, double it:
```{r}
glue("The name of the package is {name}, not {{name}}.")
```
Sometimes it's just more convenient to use different delimiters altogether, especially if the template text comes from elsewhere or is subject to external requirements.
Consider this example where we want to interpolate the function name into a code snippet that defines a function:
```{r}
fn_def <- "
  <<NAME>> <- function(x) {
    # imagine a function body here
  }"
glue(fn_def, NAME = "my_function", .open = "<<", .close = ">>")
```
In this glue string, `{` and `}` are part of the R code being generated: they delimit the function body.
If we forced ourselves to double them, the template would no longer look like normal R code.
Using alternative delimiters is a nice option in cases like this.
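To make the trade-off concrete, here is a small sketch that builds the same brace-heavy string both ways.
The JSON-like template and the variable `pkg` are made up for illustration:

```{r}
library(glue)

# Illustrative value; `pkg` is not defined elsewhere in this vignette.
pkg <- "glue"

# Default delimiters: every literal brace has to be doubled.
doubled <- glue('{{"package": "{pkg}"}}')

# Alternative delimiters: literal braces can be written as-is.
custom <- glue('{"package": "<<pkg>>"}', .open = "<<", .close = ">>")

doubled
identical(as.character(doubled), as.character(custom))
```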
## Where glue looks for values
By default, `glue()` evaluates the code inside `{}` in the caller environment:
```{r, eval = FALSE}
glue(..., .envir = parent.frame())
```
So, for a top-level `glue()` call, that means the global environment.
```{r}
x <- "the caller environment"
glue("By default, `glue()` evaluates code in {x}.")
```
But you can provide more narrowly scoped values by passing them to `glue()` in `name = value` form:
```{r}
x <- "the local environment"
glue(
  "`glue()` can access values from {x} or from {y}. {z}",
  y = "named arguments",
  z = "Woo!"
)
```
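The caller environment matters most when `glue()` is used inside a function, where it sees that function's local variables.
The sketch below uses a hypothetical helper, `describe()`, and then shows the `.envir` argument being set explicitly to an environment created just for the example:

```{r}
library(glue)

# `describe()` is a hypothetical helper used only for illustration.
describe <- function(item) {
  prefix <- "Found"
  # `prefix` and `item` live in this function's environment, which is
  # the caller environment of the glue() call below.
  glue("{prefix}: {item}")
}
describe("a local variable")

# Supplying an environment explicitly via `.envir`.
lookup <- new.env()
assign("item", "a value from a custom environment", envir = lookup)
glue("Found: {item}", .envir = lookup)
```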
If the relevant data lives in a data frame (or list or environment), use `glue_data()` instead:
```{r}
mini_mtcars <- head(cbind(model = rownames(mtcars), mtcars))
rownames(mini_mtcars) <- NULL
glue_data(mini_mtcars, "{model} has {hp} hp.")
```
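The same call works when the data is a list rather than a data frame.
A minimal sketch, with an illustrative list named `pkg_info`:

```{r}
library(glue)

# An illustrative list; the values echo the release date used earlier.
pkg_info <- list(pkg = "glue", year = 2017)
pkg_note <- glue_data(pkg_info, "The {pkg} package was first released in {year}.")
pkg_note
```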
`glue_data()` is very natural to use with the pipe:
```{r, eval = getRversion() >= "4.1.0"}
mini_mtcars |>
  glue_data("{model} gets {mpg} miles per gallon.")
```
Returning to `glue()`, recall that it defaults to evaluation in the caller environment.
This has happy implications inside a `dplyr::mutate()` pipeline.
The data-masking feature of `mutate()` means the columns of the target data frame are "in scope" for a `glue()` call:
```r
library(dplyr)
mini_mtcars |>
  mutate(note = glue("{model} gets {mpg} miles per gallon.")) |>
  select(note, cyl, disp)
#>                                            note cyl disp
#> 1           Mazda RX4 gets 21 miles per gallon.   6  160
#> 2       Mazda RX4 Wag gets 21 miles per gallon.   6  160
#> 3        Datsun 710 gets 22.8 miles per gallon.   4  108
#> 4    Hornet 4 Drive gets 21.4 miles per gallon.   6  258
#> 5 Hornet Sportabout gets 18.7 miles per gallon.   8  360
#> 6           Valiant gets 18.1 miles per gallon.   6  225
```
## SQL
glue has explicit support for constructing SQL statements.
Use backticks to quote identifiers.
Normal strings and numbers are quoted appropriately for your backend.
```{r}
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
colnames(iris) <- gsub("[.]", "_", tolower(colnames(iris)))
DBI::dbWriteTable(con, "iris", iris)
var <- "sepal_width"
tbl <- "iris"
num <- 2
val <- "setosa"
glue_sql("
  SELECT {`var`}
  FROM {`tbl`}
  WHERE {`tbl`}.sepal_length > {num}
    AND {`tbl`}.species = {val}
  ", .con = con)
```
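Quoting matters most when a value contains characters that are special in SQL.
The sketch below uses `DBI::ANSI()` as a stand-in connection, so identifiers are wrapped in double quotes rather than SQLite's backticks; the table `people`, the column name, and the value are all made up for illustration:

```{r}
library(glue)

# DBI::ANSI() is a dummy connection that follows ANSI SQL quoting rules.
# It is used here only so the example does not touch `con`.
ansi_con <- DBI::ANSI()

col_name <- "last_name"
tricky <- "O'Brien"

quoted_sql <- glue_sql(
  "SELECT {`col_name`} FROM people WHERE last_name = {tricky}",
  .con = ansi_con
)

# The identifier is quoted, and the embedded single quote is escaped.
quoted_sql
```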
`glue_sql()` can be used in conjunction with parameterized queries using `DBI::dbBind()` to provide protection against SQL injection attacks.
```{r}
sql <- glue_sql("
  SELECT {`var`}
  FROM {`tbl`}
  WHERE {`tbl`}.sepal_length > ?
  ", .con = con)
query <- DBI::dbSendQuery(con, sql)
DBI::dbBind(query, list(num))
DBI::dbFetch(query, n = 4)
DBI::dbClearResult(query)
```
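For one-off queries, the send, bind, fetch, and clear steps can often be collapsed into a single `DBI::dbGetQuery()` call with a `params` argument.
This is a sketch that assumes the backend supports `params` in `dbGetQuery()`, as RSQLite does; it uses a separate in-memory database and a made-up `measurements` table so that it does not interfere with `con`:

```{r}
library(glue)

# A separate in-memory database with a made-up table, for illustration only.
demo_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(
  demo_con, "measurements",
  data.frame(id = 1:5, value = c(1.5, 2.5, 3.5, 4.5, 5.5))
)

demo_tbl <- "measurements"
demo_sql <- glue_sql("
  SELECT id
  FROM {`demo_tbl`}
  WHERE value > ?
  ORDER BY id
  ", .con = demo_con)

# The threshold travels separately from the SQL text as a bound parameter.
demo_res <- DBI::dbGetQuery(demo_con, demo_sql, params = list(3))
demo_res

DBI::dbDisconnect(demo_con)
```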
`glue_sql()` can be used to build up more complex queries with interchangeable subqueries.
It returns `DBI::SQL()` objects which are properly protected from quoting.
```{r}
sub_query <- glue_sql("
  SELECT *
  FROM {`tbl`}
  ", .con = con)
glue_sql("
  SELECT s.{`var`}
  FROM ({sub_query}) AS s
  ", .con = con)
```
If you want to input multiple values for use in a SQL `IN` clause, put `*` at the end of the value, and the values will be collapsed and quoted appropriately.
```{r}
glue_sql("SELECT * FROM {`tbl`} WHERE sepal_length IN ({vals*})",
  vals = 1, .con = con)
glue_sql("SELECT * FROM {`tbl`} WHERE sepal_length IN ({vals*})",
  vals = 1:5, .con = con)
glue_sql("SELECT * FROM {`tbl`} WHERE species IN ({vals*})",
  vals = "setosa", .con = con)
glue_sql("SELECT * FROM {`tbl`} WHERE species IN ({vals*})",
  vals = c("setosa", "versicolor"), .con = con)
```
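A collapsed `IN` list can be sent to the database like any other statement.
The following sketch runs one against a throwaway in-memory table (`flowers`, a made-up name) built from the unmodified `datasets::iris`, and counts the matching rows:

```{r}
library(glue)

# A throwaway database holding only the species column of the original
# iris data; `datasets::iris` is used so the renamed copy above is ignored.
in_con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(
  in_con, "flowers",
  data.frame(species = as.character(datasets::iris$Species))
)

wanted <- c("setosa", "versicolor")
in_sql <- glue_sql(
  "SELECT COUNT(*) AS n FROM flowers WHERE species IN ({wanted*})",
  .con = in_con
)
in_sql

# iris has 50 rows per species, so two species should match 100 rows.
in_res <- DBI::dbGetQuery(in_con, in_sql)
in_res$n

DBI::dbDisconnect(in_con)
```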