%% LyX 2.2.1 created this file. For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass{article}
\usepackage[sc]{mathpazo}
\renewcommand{\sfdefault}{lmss}
\renewcommand{\ttdefault}{lmtt}
\usepackage[T1]{fontenc}
\usepackage{geometry}
\geometry{verbose,tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm}
\setcounter{secnumdepth}{2}
\setcounter{tocdepth}{2}
\usepackage{url}
\usepackage[authoryear]{natbib}
\usepackage[unicode=true,pdfusetitle,
bookmarks=true,bookmarksnumbered=true,bookmarksopen=true,bookmarksopenlevel=2,
breaklinks=false,pdfborder={0 0 1},backref=false,colorlinks=false]
{hyperref}
\hypersetup{
pdfstartview={XYZ null null 1}}
\usepackage{breakurl}
\makeatletter
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
\providecommand{\LyX}{\texorpdfstring%
{L\kern-.1667em\lower.25em\hbox{Y}\kern-.125emX\@}
{LyX}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
\renewcommand{\textfraction}{0.05}
\renewcommand{\topfraction}{0.8}
\renewcommand{\bottomfraction}{0.8}
\renewcommand{\floatpagefraction}{0.75}
\usepackage[buttonsize=1em]{animate}
\makeatother
\begin{document}
<<setup, include=FALSE, cache=FALSE>>=
library(knitr)
## set global chunk options
opts_chunk$set(fig.path='figure/manual-', cache.path='cache/manual-', fig.align='center', fig.show='hold', par=TRUE)
## I use = but I can replace it with <-; set code/output width to be 68
options(formatR.arrow=TRUE, width=68, digits=4)
## tune details of base graphics (https://yihui.org/knitr/hooks/)
knit_hooks$set(par=function(before, options, envir){
if (before && options$fig.show!='none') par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3)
})
@
\title{knitr: A General-Purpose Tool for Dynamic Report Generation in R}
\author{Yihui Xie}
\maketitle
The original paradigm of literate programming was brought forward
mainly for software development, or specifically, to mix source code
(for the computer) and documentation (for humans) together. Early systems
include \href{http://www.literateprogramming.com/web.pdf}{WEB} and
\href{http://www.cs.tufts.edu/~nr/noweb/}{Noweb}; Sweave \citep{leisch2002}
was derived from the latter, but it focuses less on documenting
software; instead, it is mainly used for reproducible data analysis
and generating statistical reports. The \textbf{knitr} package \citep{R-knitr}
follows in the footsteps of Sweave. For this manual, I assume readers
have some background knowledge of Sweave to understand the technical
details; for a reference of available options, hooks and demos, see
the package homepage \url{https://yihui.org/knitr/}.
\section{Hello World}
A natural question is why we should reinvent the wheel. The short answer
is that extending Sweave by hacking \textsf{SweaveDrivers.R} in the
\textbf{utils} package is a difficult job for me. Many features in
\textbf{knitr} work the way users would naturally expect. Figure
\ref{fig:cars-demo} is a simple demo of some features of \textbf{knitr}.
\begin{figure}
<<cars-demo,dev='tikz',fig.width=4,fig.height=2.8,out.width='.45\\textwidth',message=FALSE,cache=TRUE>>=
fit=lm(dist~speed,data=cars) # linear regression
par(mar=c(4, 4, 1, .1), mgp=c(2,1,0))
with(cars,plot(speed,dist,panel.last=abline(fit)))
text(10,100,'$Y = \\beta_0 + \\beta_1x + \\epsilon$')
library(ggplot2)
qplot(speed, dist, data=cars)+geom_smooth()
@
\caption{\label{fig:cars-demo}A simple demo of possible output in \textbf{knitr}:
(1) multiple plots per chunk; (2) no need to \emph{print()} objects
in \textbf{ggplot2}; (3) device size is $4\times2.8$ (inches) but
output size is adjusted to \texttt{.45\textbackslash{}textwidth} in
chunk options; (4) base graphics and \textbf{ggplot2} can sit side
by side; (5) use the \emph{tikz()} device in \textbf{tikzDevice} by
setting chunk option \texttt{dev='tikz'} (hence can write native \protect\LaTeX{}
expressions in R plots); (6) code highlighting.}
\end{figure}
I would have chosen to hide the R code if this were a real report,
but here I show the code just for the sake of demonstration. If we
type \emph{qplot()} in R, we get a plot, and the same thing happens
in \textbf{knitr}. If we draw two plots in the code, \textbf{knitr}
will show two plots, and we do not need to tell it in advance how many
plots the code produces. If we set \texttt{out.width='.49\textbackslash{}\textbackslash{}textwidth'}
in chunk options, each plot is scaled to that width in the final output
document. If we set \texttt{fig.align='center'}, the plots are centered. That's it. Many
enhancements and new features will be introduced later. If you are coming
from Sweave, you may want to read the transition page
first: \url{https://yihui.org/knitr/demo/sweave/}.
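The same round trip can also be driven from R code. The minimal sketch
below assumes only that \textbf{knitr} is installed: it passes a tiny
Rnw document to \emph{knit()} through the \texttt{text} argument, in
which case the result is returned as a character string instead of
being written to a file (the chunk label \texttt{demo} is arbitrary):
<<knit-text-demo, eval=FALSE>>=
library(knitr)
# a minimal Rnw document, one element per line
rnw <- c('Some text.', '<<demo, echo=FALSE>>=', '1 + 1', '@', 'More text.')
# with the text argument, knit() returns the output itself
out <- knit(text = rnw, quiet = TRUE)
cat(out)
@
Since \texttt{echo=FALSE} is set in the header, the printed result of
\texttt{1 + 1} should appear in \texttt{out} without its source code.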
\section{Design}
The flow of processing an input file is similar to Sweave, with two
major differences: \textbf{knitr} gives users more flexibility
to customize the processing, and it has many built-in options,
such as support for a wide range of graphics devices and caching.
Below is a brief description of the process:
\begin{enumerate}
\item \textbf{knitr} takes an input file and automatically determines an
appropriate set of \href{https://yihui.org/knitr/patterns/}{patterns}
to use if they are not provided in advance (e.g. \textsf{file.Rnw}
will use \texttt{all\_patterns\$rnw});
\item a set of output \href{https://yihui.org/knitr/hooks/}{hooks} will
also be set up automatically according to the filename extension (e.g.
use \LaTeX{} environments or HTML elements to wrap R results);
\item the input file is read in and split into pieces consisting of R code
chunks and normal text; the former will be executed one after the
other, and the latter may contain global chunk options or inline R
code;
\item for each chunk, the code is evaluated using the \textbf{evaluate}
package \citep{R-evaluate}, and the results may be filtered according
to chunk options (e.g. \texttt{echo=FALSE} will remove the R source
code)
\begin{enumerate}
\item if \texttt{cache=TRUE} for this chunk, \textbf{knitr} will first check
whether there are previously cached results under the cache directory before
actually evaluating the chunk; if cached results exist and this code
chunk has not been changed since the last run (MD5 sums are used to verify this),
the cached results will be (lazy-) loaded, otherwise a new cache will
be built; if a cached chunk depends on other chunks (see the \texttt{dependson}
\href{https://yihui.org/knitr/options/}{option}) and any one of these
chunks has changed, this chunk must be forcibly updated (its old cache
will be purged)
\item there are six types of possible output from \textbf{evaluate}, and
their classes are \texttt{character} (normal text output), \texttt{source}
(source code), \texttt{warning}, \texttt{message}, \texttt{error}
and \texttt{recordedplot}; an internal S3 generic function \emph{wrap()}
is used to deal with different types of output, using output hooks
defined in the object \texttt{knit\_hooks}
\item note that plots are recorded as R objects before they are actually saved
to files, so graphics devices will not be opened unless plots have
actually been produced in a chunk
\item a code chunk is evaluated in a separate empty environment with the
global environment as its parent, and all the objects in this environment
after the evaluation will be saved if \texttt{cache=TRUE}
\item chunk hooks can be run before and/or after a chunk
\end{enumerate}
\item for normal text, \textbf{knitr} will find inline R code (e.g. in
\texttt{\textbackslash{}Sexpr\{\}}) and evaluate it; the output is
wrapped by the \texttt{inline} hook.
\end{enumerate}
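Step 3 above can be illustrated in a few lines of base R. The sketch
below is not \textbf{knitr}'s actual parser: the helper \emph{split\_rnw()}
is hypothetical, and its two regular expressions are simplified forms
of the Rnw chunk header and terminator. It marks every line as code or
text and returns consecutive runs of the same kind as separate pieces:
<<split-rnw-sketch, eval=FALSE>>=
# hypothetical helper, not knitr's own parser: split the lines of an
# Rnw document into alternating pieces of text and code
split_rnw <- function(lines) {
  is_begin <- grepl('^\\s*<<(.*)>>=.*$', lines)  # chunk headers
  is_end <- grepl('^\\s*@\\s*$', lines)          # chunk terminators
  in_code <- logical(length(lines))
  state <- FALSE
  for (i in seq_along(lines)) {
    if (is_begin[i]) state <- TRUE
    in_code[i] <- state
    if (is_end[i]) state <- FALSE
  }
  # a new piece starts where the line type changes or a header appears
  run <- cumsum(c(TRUE, diff(as.integer(in_code)) != 0) | is_begin)
  pieces <- split(lines, run)
  names(pieces) <- ifelse(in_code[!duplicated(run)], 'code', 'text')
  pieces
}
doc <- c('Some text.', '<<a, echo=FALSE>>=', 'x <- 1', '@', 'More text.')
str(split_rnw(doc))
@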
The hooks play important roles in \textbf{knitr}: this package makes
almost everything accessible to the users. Consider the following
extremely simple example which may demonstrate this freedom:
<<simple-example>>=
1+1
@
There are two parts in the final output: the source code \texttt{1
+ 1} and the output \texttt{{[}1{]} 2}; the comment characters \texttt{\#\#}
are from the default chunk option \texttt{comment}. Users may define
a hook function for the source code like this to use the \texttt{lstlisting}
environment (from the \LaTeX{} package \textbf{listings}):
<<hook-source, eval=FALSE>>=
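knit_hooks$set(source = function(x, options) {
  # join the source lines; \end{lstlisting} must start on its own line
  code <- sub('\n+$', '', paste(x, collapse = '\n'))
  paste0('\\begin{lstlisting}\n', code, '\n\\end{lstlisting}\n')
})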
@
Similarly, we can put other types of output into other environments.
There is no need to hack \textsf{Sweave.sty} for \textbf{knitr},
and you can put the output in any environment. What is more, the
output hooks make \textbf{knitr} ready for other types of output,
a typical one being HTML (for which there are built-in hooks). The website
provides many examples demonstrating the flexibility of the output.
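The same idea applies to the other output hooks, such as
\texttt{output}, \texttt{warning}, \texttt{message} and \texttt{error}.
As a sketch, the hypothetical helper \emph{hook\_env()} below builds
a hook that wraps the text it receives in a \LaTeX{} environment of
our choice; the environment names are only examples, and any
environment used this way must be available in the preamble:
<<hook-env-sketch, eval=FALSE>>=
library(knitr)
# hypothetical helper: an output hook wrapping x in the environment env
hook_env <- function(env) {
  function(x, options) {
    body <- paste(x, collapse = '\n')
    # make sure the closing command starts on a new line
    if (!grepl('\n$', body)) body <- paste0(body, '\n')
    paste0('\\begin{', env, '}\n', body, '\\end{', env, '}\n')
  }
}
knit_hooks$set(output = hook_env('verbatim'),
               warning = hook_env('verbatim'))
@
Note that this replaces the default hooks for these types of output,
so their default formatting is no longer applied.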
\section{Features}
The \textbf{knitr} package borrowed features such as tikz graphics
and cache from \textbf{pgfSweave} and \textbf{cacheSweave} respectively,
but the implementations are different. New features like code reference
from an external R script as well as output customization are also
introduced. The feature of hook functions in Sweave is re-implemented
and hooks have new usage now. There are several other small features
which are motivated by my everyday use of Sweave. For example, a
progress bar is provided when knitting a file so we roughly know how
long we still need to wait; output from inline R code (e.g. \texttt{\textbackslash{}Sexpr\{x{[}1{]}\}})
is automatically formatted in \TeX{} math notation (like \Sexpr{123456789})
if the result is numeric. You may check out a number of manuals
dedicated to specific features such as graphics on the website: \url{https://yihui.org/knitr/demo/}.
\subsection{Code Decoration}
The \textbf{highr} package \citep{R-highr} is used to highlight R
code, and the \textbf{formatR} package \citep{R-formatR} is used
to reformat R code (like \texttt{keep.source=FALSE} in Sweave but
will also try to retain comments). For \LaTeX{} output, the \textbf{framed}
package is used to decorate code chunks with a light gray background.
If this \LaTeX{} package is not found in the system, a version will
be copied directly from \textbf{knitr}. The prompt characters are
removed by default because they mangle the R source code in the output
and make it difficult to copy R code. For the same reason, R output is
commented out (prefixed with \texttt{\#\#}) by default. It is easy to revert to the
output with prompts (set the option \texttt{prompt=TRUE}), and you will
quickly realize how inconvenient this is for readers who want to copy
and run the code in the output document:
<<stupid-prompts, prompt=TRUE, comment=NA, highlight=FALSE>>=
x=rnorm(5)
x
var(x)
@
The example below shows the effect of \texttt{tidy=TRUE/FALSE}:
<<tidy-no, eval=FALSE, tidy=FALSE>>=
## option tidy=FALSE
for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}
@
<<tidy-yes, eval=FALSE, tidy=TRUE>>=
## option tidy=TRUE
for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}
@
Note \texttt{=} is replaced by \texttt{<-} because \texttt{options('formatR.arrow')}
was set to \texttt{TRUE} in this document; see the documentation
of \emph{tidy\_source()} in \textbf{formatR} for details.
Many highlighting themes can be used in \textbf{knitr}, which are
borrowed from the \textbf{highlight} package by \href{http://www.andre-simon.de/}{Andre Simon}\footnote{not the R package mentioned before; for a preview of these themes,
see \url{http://www.andre-simon.de/dokuwiki/doku.php?id=theme_examples}}; it is also possible to use themes from \url{http://www.eclipsecolorthemes.org/}
by providing a theme id to \textbf{knitr}\footnote{many thanks to \href{https://github.com/ramnathv}{Ramnath Vaidyanathan}
for the work on themes}. See \texttt{?knit\_theme} for details.
\subsection{Graphics}
Graphics is an important part of reports, and several enhancements
have been made in \textbf{knitr}. For example, grid graphics may not
need to be explicitly printed as long as the same code can produce
plots in R (in some cases, however, they have to be printed, e.g.
in a loop, because you have to do so in an R terminal).
\subsubsection{Graphical Devices}
For a long time, a frequently requested feature for Sweave was the
support for other graphics devices, which has been implemented since
R 2.13.0. Instead of using logical options like \texttt{png} or \texttt{jpeg}
(this list can go on and on), \textbf{knitr} uses a single option
\texttt{dev} (like \texttt{grdevice} in Sweave) which has support
for more than 20 devices. For instance, \texttt{dev='png'} will use
the \emph{png()} device, and \texttt{dev='CairoJPEG'} uses the \emph{CairoJPEG()}
device in the \textbf{Cairo} package (it has to be installed first,
of course). If none of these devices is satisfactory, you can provide
the name of a customized device function, which must have been defined
before it is called.
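As a sketch, the hypothetical device \emph{my\_pdf()} below simply
calls \emph{pdf()} with a 10pt base font. It assumes that the device
is called with the output file name as its first argument and with
\texttt{width} and \texttt{height} in inches; since the file extension
of an unknown device cannot be guessed reliably, the chunk option
\texttt{fig.ext} is set as well:
<<custom-device-sketch, eval=FALSE>>=
# hypothetical custom device: pdf() with a smaller base font;
# any extra arguments passed to it are ignored
my_pdf <- function(filename, width, height, pointsize = 10, ...) {
  grDevices::pdf(filename, width = width, height = height,
                 pointsize = pointsize)
}
# then use dev='my_pdf', fig.ext='pdf' in the chunk header
@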
\subsubsection{Plot Recording}
As mentioned before, all the plots in a code chunk are first recorded
as R objects and then ``replayed'' inside a graphical device to
generate plot files. The \textbf{evaluate} package records plots
on a per-\emph{expression} basis; in other words, the source code is split
into individual complete expressions and \textbf{evaluate} will examine
possible plot changes in snapshots after each single expression has
been evaluated. For example, the code below consists of three expressions,
out of which two are related to drawing plots, therefore \textbf{evaluate}
will produce two plots by default:
<<low-level-plots, fig.keep='all', dev='tikz', fig.width=2.5, fig.height=2.5, out.width='.3\\textwidth', cache=TRUE>>=
par(mar=c(3,3,.1,.1))
plot(1:10, ann=FALSE,las=1)
text(5,9,'mass $\\rightarrow$ energy\n$E=mc^2$')
@
This makes a significant difference from traditional tools in R for
dynamic report generation, since low-level plotting changes can also
be recorded. The option \texttt{fig.keep} controls which plots to
keep in the output; \texttt{fig.keep='all'} will keep low-level changes
as separate plots; by default (\texttt{fig.keep='high'}), \textbf{knitr}
will merge low-level plot changes into the previous high-level plot,
like most graphics devices do. This feature may be useful for teaching
R graphics step by step. Note, however, that low-level plotting commands
in a single expression (a typical case is a loop) will not be recorded
cumulatively, but high-level plotting commands, regardless of where
they are, will always be recorded. For example, with \texttt{fig.keep='all'}
this chunk will only produce 2 plots instead of 21 plots because there
are 2 complete expressions:
<<low-plot-loop, eval=FALSE>>=
plot(0,0,type='n',ann=FALSE)
for(i in seq(0, 2*pi,length.out=20)) points(cos(i),sin(i))
@
But this will produce 20 plots as expected:
<<high-plot-loop, eval=FALSE>>=
for(i in seq(0, 2*pi,length.out=20)) {plot(cos(i),sin(i),xlim=c(-1,1),ylim=c(-1,1))}
@
As I showed at the beginning of this manual, it is straightforward
to let \textbf{knitr} keep all the plots in a chunk and insert them
into the output document, so we no longer need the \texttt{cat('\textbackslash{}\textbackslash{}includegraphics\{\}')}
trick.
We can discard all previous plots and keep the last one only by \texttt{fig.keep='last'},
or keep only the first plot by \texttt{fig.keep='first'}, or discard
all plots by \texttt{fig.keep='none'}.
\subsubsection{Plot Rearrangement}
The option \texttt{fig.show} can decide whether to hold all plots
while evaluating the code and ``flush'' all of them to the end of
a chunk (\texttt{fig.show='hold'}), or just insert them to the place
where they were created (by default \texttt{fig.show='asis'}). Here
is an example of \texttt{fig.show='asis'}:
<<fig-hold, fig.show='asis', dev='pdf', fig.width=6, fig.height=4, out.width='.35\\linewidth'>>=
contour(volcano) # contour lines
filled.contour(volcano) # fill contour plot with colors
@
Besides \texttt{hold} and \texttt{asis}, the option \texttt{fig.show}
can take a third value: \texttt{animate}, which makes it possible
to insert animations into the output document. In \LaTeX{}, the package
\textbf{animate} is used to put together image frames as an animation.
For animations to work, there must be more than one plot produced
in a chunk. The option \texttt{interval} controls the time interval
between animation frames; by default it is 1 second. Note you have
to add \texttt{\textbackslash{}usepackage\{animate\}} in the \LaTeX{}
preamble, because \textbf{knitr} will not add it automatically. Animations
in the PDF output can only be viewed in Adobe Reader.
As a simple demonstration, here is a \href{http://en.wikipedia.org/wiki/Mandelbrot_set}{Mandelbrot animation}
taken from the \textbf{animation} package \citep{R-animation}; note
the PNG device is used because PDF files are too large. You should
be able to see the animation immediately with Acrobat Reader since
it was set to play automatically:
<<animate-demo, fig.show='animate', dev='png', out.width='.45\\linewidth', interval=.5, aniopts='controls,loop,autoplay', cache=TRUE>>=
library(animation)
demo('Mandelbrot', echo = FALSE, package = 'animation')
@
\subsubsection{Plot Size}
The \texttt{fig.width} and \texttt{fig.height} options specify the
size of plots in the graphics device, and the real size in the output
document can be different (see \texttt{out.width} and \texttt{out.height}).
When there are multiple plots per chunk, it is possible to arrange
more than one plot per line in \LaTeX{} \textendash{} just specify
\texttt{out.width} to be less than half of the current line width,
e.g. \texttt{out.width='.49\textbackslash{}\textbackslash{}linewidth'}.
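A quick calculation shows why text in such plots can look smaller
than the body text: the plot is drawn at \texttt{fig.width} inches and
then scaled to \texttt{out.width}, so everything in it, fonts included,
is scaled by the ratio of the two widths. The helper below is
hypothetical, and the 6.5-inch line width is an assumed example value
(the real one depends on the paper size and margins):
<<plot-scale-sketch, eval=FALSE>>=
# hypothetical helper: font size of plot text after scaling
# fig_width:  device width in inches (fig.width)
# out_frac:   out.width as a fraction of the line width
# line_width: line width of the document in inches (assumed)
effective_pointsize <- function(pointsize, fig_width, out_frac,
                                line_width) {
  pointsize * out_frac * line_width / fig_width
}
# a 12pt font in a 4in-wide device shown at .45 of a 6.5in line
effective_pointsize(12, fig_width = 4, out_frac = .45, line_width = 6.5)
# 8.775, i.e. just under 9pt
@
Choosing \texttt{fig.width} close to the final printed width keeps this
ratio near 1, so plot text stays close to its nominal size.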
\subsubsection{The tikz Device}
Besides PDF, PNG and other traditional R graphical devices, \textbf{knitr}
has special support for tikz graphics via the \textbf{tikzDevice} package
\citep{R-tikzDevice}, which is similar to \textbf{pgfSweave}. If
we set the chunk option \texttt{dev='tikz'}, the \emph{tikz()} device
in \textbf{tikzDevice} will be used to save plots. Options \texttt{sanitize}
and \texttt{external} are related to the tikz device: see the documentation
of \emph{tikz()} for details. Note \texttt{external=TRUE} in \textbf{knitr}
has a different meaning from \textbf{pgfSweave} \textendash{} it means
\texttt{standAlone=TRUE} in \emph{tikz()}, and the tikz graphics output
will be compiled to PDF \emph{immediately} after it is created, so
the ``externalization'' does not depend on the \textbf{tikz} package;
to maintain consistency in (font) styles, \textbf{knitr} will read
the preamble of the input document and use it in the tikz device.
At the moment, I'm not sure if this is a faithful way to externalize
tikz graphics, but I have not seen any problems so far. The assumption
to make, however, is that you declare all the styles in the preamble;
\textbf{knitr} is unaware of \emph{local} style changes in the body
of the document.
Below is an example taken from StackOverflow\footnote{\url{http://stackoverflow.com/q/8190087/559676}};
we usually have to write R code like this to obtain a math expression
$\mathrm{d}\mathbf{x}_{t}=\alpha(\theta-\mathbf{x}_{t})\mathrm{d}t+4\,\mathrm{d}B_{t}$
in R graphics:
<<math-expr-R, eval=FALSE>>=
qplot(1:10, 1:10) + labs(title = substitute(paste(d *
bolditalic(x)[italic(t)] == alpha * (theta - bolditalic(x)[italic(t)]) *
d * italic(t) + lambda * d * italic(B)[italic(t)]), list(lambda = 4)))
@
With the tikz device, it is both straightforward and more beautiful:
<<math-expr-tikz, dev='tikz', fig.width=5, fig.height=3, out.width='.55\\linewidth', cache=TRUE, message=FALSE>>=
library(ggplot2)
qplot(1:10, 1:10) +
labs(title = sprintf('$\\mathrm{d}\\mathbf{x}_{t} = \\alpha(\\theta - \\mathbf{x}_{t})\\mathrm{d}t + %d\\,\\mathrm{d}B_{t}$', 4))
@
The advantage of tikz graphics is the consistency of styles\footnote{Users are encouraged to read the vignette of \textbf{tikzDevice},
which is the most beautiful vignette I have ever seen in R packages:
\url{http://cran.r-project.org/web/packages/tikzDevice/vignettes/tikzDevice.pdf}}; one disadvantage is that \LaTeX{} may not be able to handle
very large tikz files (it can run out of memory). For example, an R
plot with tens of thousands of graphical elements may fail to compile
in \LaTeX{} if we use the tikz device. In such cases, we can switch
to the PDF or PNG device, or reconsider our choice of plot type,
e.g., a scatter plot with millions of points is usually difficult
to read, and a contour plot or a hexagonal binning plot showing the 2D density
can be a better alternative (they are much smaller in size).
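The last suggestion can be sketched in base R: rather than drawing a
million points, we count how many fall into each cell of a grid and
draw the grid with \emph{image()}, so the plot contains a fixed
$50\times50$ grid of rectangles regardless of the sample size. The
helper \emph{bin2d()} and the simulated data are purely illustrative:
<<bin2d-sketch, eval=FALSE>>=
set.seed(123)
n <- 1e6
x <- rnorm(n)
y <- x + rnorm(n)
# hypothetical helper: point counts in each cell of an nbins x nbins grid
bin2d <- function(x, y, nbins = 50) {
  xb <- cut(x, breaks = nbins)
  yb <- cut(y, breaks = nbins)
  unclass(table(xb, yb))  # a plain integer matrix
}
counts <- bin2d(x, y)
image(counts, col = hcl.colors(20, 'YlOrRd', rev = TRUE),
      xlab = 'x', ylab = 'y', axes = FALSE)
@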
The graphics manual contains more detailed information and you can
check it out in the \href{https://yihui.org/knitr/demo/graphics/}{website}.
\subsection{Cache}
Caching is not a new idea \textendash{} both \textbf{cacheSweave}
and \textbf{weaver} have implemented it based on Sweave, with the
former using \textbf{filehash} and the latter using \textsf{.RData}
images; \textbf{cacheSweave} also supports lazy-loading of objects
based on \textbf{filehash}. The \textbf{knitr} package directly uses
internal base R functions to save (\emph{tools:::makeLazyLoadDB()})
and lazy-load objects (\emph{lazyLoad()}). These functions are either
undocumented or marked as internal, but as far as I understand, they
are the tools used to implement lazy-loading for packages. The \textbf{cacheSweave}
vignette has clearly explained lazy-loading, and roughly speaking,
lazy-loading means an object will not actually be loaded into memory
unless it is used somewhere. This is very useful for caching;
sometimes we read a large object and cache it, then take a subset
for analysis and this subset is also cached; in the future, the initial
large object will not be loaded into R if our computation is only
based on the subset.
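The two functions mentioned above can be tried directly. The sketch
below stores the objects of an environment in a lazy-load database and
loads them back lazily; since \emph{tools:::makeLazyLoadDB()} is
internal and undocumented, the arguments used here (an environment and
a file base name) are an assumption that may not hold in every R
version:
<<lazyload-sketch, eval=FALSE>>=
# objects that a (hypothetical) cached chunk has created
env <- new.env()
assign('big', rnorm(1e6), envir = env)
assign('small', 1:10, envir = env)
# writes demo-cache.rdb (the data) and demo-cache.rdx (the index)
base <- file.path(tempdir(), 'demo-cache')
tools:::makeLazyLoadDB(env, base)
# later: lazyLoad() reads the index only; objects stay on disk until used
env2 <- new.env()
lazyLoad(base, envir = env2)
sum(env2$small)  # 55; 'small' is loaded now, 'big' is never loaded here
@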
The paths of cache files are determined by the chunk option \texttt{cache.path};
by default all cache files are created under a directory \textsf{cache}
relative to the current working directory, and if the option value
contains a directory (e.g. \texttt{cache.path='cache/abc-'}), cache
files will be stored under that directory (automatically created if
it does not exist). The cache is invalidated and purged on any changes
to the code chunk, including both the R code and chunk options\footnote{One exception is the \texttt{include} option, which is not cached
because \texttt{include=TRUE/FALSE} does not affect code evaluation;
meanwhile, the value \texttt{getOption('width')} is also cached, so
if you change this option, the cache will also be invalidated (this
option affects the width of text output)}; this means previous cache files of this chunk are removed (filenames
are identified by the chunk label). Unlike \textbf{pgfSweave}, cache
files will never accumulate since old cache files will always be removed
in \textbf{knitr}. Unlike \textbf{weaver} or \textbf{cacheSweave},
\textbf{knitr} will try to preserve these side-effects:
\begin{enumerate}
\item printed results: any output of a cached chunk will be loaded
into the output document, although the chunk is not actually
evaluated. The reason is that \textbf{knitr} also caches the output of a
chunk as a character string. Note this means graphics output is also
cached since it is part of the output. It had been a pain for me for
a long time to have to lose output to gain caching;
\item loaded packages: after the evaluation of each cached chunk, the list
of packages used in the current R session is written to a file under
the cache path named \textsf{\_\_packages}; next time if a cached
chunk needs to be rebuilt, these packages will be loaded first. The
reasons for caching package names are that it can be slow to load some
packages, and that a package might be loaded in a previous cached chunk
which is not available to the next cached chunk when only the latter
needs to be rebuilt. Note this only applies to cached chunks, and
for uncached chunks, you must always use \emph{library()} to load
packages explicitly.
\end{enumerate}
Although \textbf{knitr} tries to keep some side-effects, there are
still other types of side-effects like setting \emph{par()} or \emph{options()}
which are not cached. Users should be aware of these special cases,
and make sure to clearly separate the code which is not meant to be
cached into other chunks which are not cached, e.g., set all global
options in the first chunk of a document and do not cache that chunk.
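The invalidation rule described earlier in this section (any change to
the code or to the chunk options except \texttt{include}, plus the
value of \texttt{getOption('width')}) amounts to comparing a hash of
these inputs with the one recorded when the cache was built. The
hypothetical \emph{chunk\_hash()} below computes such a key with
\emph{tools::md5sum()}; it only illustrates the rule and is not meant
to reproduce how \textbf{knitr} computes its own hashes:
<<cache-key-sketch, eval=FALSE>>=
# hypothetical helper: an MD5 key from the chunk code, its options
# (without include) and the current text width
chunk_hash <- function(code, options) {
  options$include <- NULL  # does not affect evaluation
  options <- options[order(names(options))]  # order should not matter
  bytes <- serialize(list(code, options, getOption('width')), NULL)
  f <- tempfile()
  on.exit(unlink(f))
  writeBin(bytes, f)
  unname(tools::md5sum(f))
}
chunk_hash('x <- rnorm(10)', list(cache = TRUE, echo = FALSE))
@
In terms of this sketch, a key that differs from the stored one means
the old cache files are removed and the chunk is evaluated again.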
Sometimes a cached chunk may need to use objects from other cached
chunks, which can cause a serious problem \textendash{} if objects
in previous chunks have changed, this chunk will not be aware of the
changes and will still use old cached results, unless there is a way
to detect such changes from other chunks. There is an option called
\texttt{dependson} in \textbf{cacheSweave} which does this job. We
can explicitly specify which other chunks this chunk depends on by
setting an option like \texttt{dependson='chunkA;chunkB'} or equivalently
\texttt{dependson=c('chunkA', 'chunkB')}. Each time the cache of a
chunk is rebuilt, all other chunks which depend on this chunk will
lose cache, hence their cache will be rebuilt as well.
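The cascade can be sketched as a small graph traversal. Here a
hypothetical list \texttt{deps} plays the role of the
\texttt{dependson} options (each element names the chunks that a chunk
depends on), and the hypothetical \emph{invalidated()} returns every
chunk whose cache has to be rebuilt when one chunk changes, assuming
the invalidation propagates along chains of dependencies:
<<dependson-sketch, eval=FALSE>>=
# chunk -> chunks it depends on (as given by dependson)
deps <- list(chunkA = character(0), chunkB = character(0),
             chunkC = c('chunkA', 'chunkB'), chunkD = 'chunkC')
# hypothetical helper: all chunks to rebuild after the given chunk changes
invalidated <- function(changed, deps) {
  out <- changed
  repeat {
    uses_out <- vapply(deps, function(d) any(d %in% out), logical(1))
    new <- setdiff(names(deps)[uses_out], out)
    if (length(new) == 0) break
    out <- c(out, new)
  }
  out
}
invalidated('chunkA', deps)  # "chunkA" "chunkC" "chunkD"
@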
Another way to specify the dependencies among chunks is to use the
chunk option \texttt{autodep} and the function \emph{dep\_auto()}.
This is an experimental feature borrowed from \textbf{weaver} which
frees us from setting chunk dependencies manually. The basic idea
is, if a later chunk uses any objects created in a previous chunk,
the later chunk is said to depend on the previous one. The function
\emph{findGlobals()} in the \textbf{codetools} package is used to
find out all global objects in a chunk, and according to its documentation,
the result is an approximation. Global objects roughly mean the ones
which are not created locally, e.g. in the expression \texttt{function()
\{y <- x\}}, \texttt{x} should be a global object, whereas \texttt{y}
is local. Meanwhile, we also need to save the list of objects created
in each cached chunk, so that we can compare them to the global objects
in later chunks. For example, if chunk A created an object \texttt{x}
and chunk B uses this object, chunk B must depend on A, i.e. whenever
A changes, B must also be updated. When \texttt{autodep=TRUE}, \textbf{knitr}
will write out the names of objects created in a cached chunk as well
as those global objects in two files named \textsf{\_\_objects} and
\textsf{\_\_globals} respectively; later we can use the function \emph{dep\_auto()}
to analyze the object names to figure out the dependencies automatically.
See \url{https://yihui.org/knitr/demo/cache/} for examples.
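The idea behind \texttt{autodep} can be sketched with
\emph{codetools::findGlobals()}: treat each chunk's code as a function
body to find the variables it uses but does not create, evaluate the
chunks in order to record which objects each one creates, and let a
chunk depend on every earlier chunk that created one of its global
variables. The helpers \emph{chunk\_globals()} and \emph{auto\_deps()}
are hypothetical and much simpler than \textbf{knitr}'s own code (for
example, they ignore objects that a chunk modifies rather than creates):
<<autodep-sketch, eval=FALSE>>=
library(codetools)
# global variables of some code, found by treating it as a function body
chunk_globals <- function(code) {
  f <- function() NULL
  body(f) <- as.call(c(as.name('{'), as.list(parse(text = code))))
  findGlobals(f, merge = FALSE)$variables
}
# hypothetical helper: for each chunk, the earlier chunks it depends on
auto_deps <- function(chunks) {
  env <- new.env(parent = globalenv())
  created <- list()
  deps <- list()
  for (lab in names(chunks)) {
    uses <- chunk_globals(chunks[[lab]])
    hit <- Filter(function(o) any(o %in% uses), created)
    deps[[lab]] <- as.character(names(hit))
    before <- ls(env)
    eval(parse(text = chunks[[lab]]), envir = env)
    created[[lab]] <- setdiff(ls(env), before)
  }
  deps
}
chunks <- list(A = 'x <- rnorm(10)', B = 'y <- 2', C = 'm <- mean(x)')
str(auto_deps(chunks))
@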
Yet another way to specify dependencies is \emph{dep\_prev()}: this
is a conservative approach which sets the dependencies so that a cached
chunk will depend on all its previous chunks, i.e. whenever a previous
chunk is updated, all later chunks will be updated accordingly.
\subsection{Code Externalization}
It can be more convenient to write R code in a separate file, rather
than mixing it into a \LaTeX{} document; for example, we can run
R code successively in a pure R script from one chunk to the other
without jumping through other texts. Since I prefer using \LyX{}
to write reports, Sweave is even more inconvenient because I have
to recompile the whole document each time, even if I only want to
know the results of a single chunk. Therefore \textbf{knitr} introduced
the feature of code externalization to a separate R script. Currently
the setting is like this: the R script also uses chunk labels (marked
in the form \texttt{\#\# -{}-{}-{}- chunk-label} by default); if the
code chunk in the input document is empty, \textbf{knitr} will match
its label with the label in the R script to input external R code.
For example, suppose this is a code chunk labelled as \texttt{Q1}
in an R script named \textsf{homework1-xie.R} which is under the same
directory as the Rnw document:
<<ext-r-code, eval=FALSE>>=
## ---- Q1 ---------------------
gcd = function(m, n) {
while ((r <- m %% n) != 0) {
m = n; n = r
}
n
}
@
In the Rnw document, we can first read the script using the function
\emph{read\_chunk()}:
<<read-chunk, eval=FALSE>>=
read_chunk('homework1-xie.R')
@
This is usually done in an early chunk, and we can use the chunk \texttt{Q1}
later in the Rnw document:
<<use-ext-chunk, echo=FALSE, comment=NA>>=
cat('<<Q1, echo=TRUE, tidy=TRUE>>=','@',sep='\n')
@
Different documents can read the same R script, so the R code can
be reused across different input documents.
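The label convention can be mimicked in base R, which may help to see
what \emph{read\_chunk()} does with a script. The function
\emph{split\_labelled()} below is a hypothetical simplification whose
regular expression only approximates the default
\texttt{\#\# -{}-{}-{}- label} marker; each line is assigned to the
nearest label above it, and lines before the first label are dropped:
<<read-labels-sketch, eval=FALSE>>=
# hypothetical simplification of read_chunk(): split script lines into
# named pieces at marker lines such as "## ---- label ----"
split_labelled <- function(lines) {
  rx <- '^##\\s*-{4,}\\s*(\\S+?)\\s*-*\\s*$'
  hits <- grepl(rx, lines, perl = TRUE)
  labels <- sub(rx, '\\1', lines[hits], perl = TRUE)
  group <- cumsum(hits)  # index of the latest label (0 before the first)
  keep <- !hits & group > 0
  code <- split(lines[keep], factor(group[keep], levels = seq_along(labels)))
  setNames(code, labels)
}
script <- c('## ---- Q1 ----', 'gcd(12, 18)', '## ---- Q2 ----', 'gcd(7, 5)')
split_labelled(script)
@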
\subsection{Evaluation of Chunk Options\label{subsec:conditional}}
By default \textbf{knitr} uses a new syntax to parse chunk options:
it treats them as function arguments instead of a text string to be
split to obtain option values. This gives the user much more power
than the old syntax; we can pass arbitrary R objects to chunk options
besides simple ones like \texttt{TRUE}/\texttt{FALSE}, numbers and
character strings. The page \url{https://yihui.org/knitr/demo/sweave/}
has given two examples to show the advantages of the new syntax. Here
we show yet another useful application.
Before \textbf{knitr} 0.3, there was a feature named ``conditional
evaluation''\footnote{request from \url{https://plus.google.com/u/0/116405544829727492615/posts/43WrRUffjzK}}.
The idea is, instead of setting chunk options \texttt{eval} and \texttt{echo}
to be \texttt{TRUE} or \texttt{FALSE} (constants), their values can
be controlled by global variables in the current R session. This enables
\textbf{knitr} to conditionally evaluate code chunks according to
variables. For example, here we assign \texttt{TRUE} to a variable
\texttt{dothis}:
<<cond-variable>>=
dothis=TRUE
@
In the next chunk, we set chunk options \texttt{eval=dothis} and \texttt{echo=!dothis},
both of which are valid R expressions since the variable \texttt{dothis} exists.
As we can see, the source code is hidden, but it was indeed evaluated:
<<cond-out1, eval=dothis, echo=!dothis>>=
print('you cannot see my source because !dothis is FALSE')
@
Then we set \texttt{eval=dothis} and \texttt{echo=dothis} for another
chunk:
<<cond-out2,eval=dothis,echo=dothis>>=
dothis
@
If we change the value of \texttt{dothis} to \texttt{FALSE}, neither
of the above chunks will be evaluated anymore. Therefore we can control
many chunks with a single variable, and present results selectively.
This old feature required \textbf{knitr} to treat \texttt{eval} and
\texttt{echo} specially, and we can easily see that it is no longer
necessary with the new syntax: \texttt{eval=dothis} will tell R to
find the variable \texttt{dothis} automatically just like we call
a function \texttt{foobar(eval = dothis)}. What is more, all options
will be evaluated as R expressions unless they are already constants
which do not need to be evaluated, so this old feature has been generalized
to all other options naturally.
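The mechanism can be sketched in a few lines: wrapping the option
string in \emph{alist()} lets R parse it like function arguments
without evaluating them, and each value is then evaluated in the
calling environment. The helper \emph{eval\_options()} is hypothetical
and ignores details such as the chunk label, which would have to be
separated from the options first:
<<eval-options-sketch, eval=FALSE>>=
# hypothetical helper: turn 'eval=dothis, echo=!dothis' into values
eval_options <- function(header, envir = parent.frame()) {
  # alist() keeps the arguments as unevaluated expressions
  args <- eval(parse(text = paste0('alist(', header, ')')))
  lapply(args, eval, envir = envir)
}
dothis <- TRUE
str(eval_options('eval=dothis, echo=!dothis, fig.width=2*3'))
@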
\subsection{Customization}
The \textbf{knitr} package is ready for customization. Both the patterns
and hooks can be customized; see the package website for details.
Here I show an example on how to save \textbf{rgl} plots \citep{R-rgl}
using a customized hook function. First we define a hook named \texttt{rgl}
using the function \emph{hook\_rgl()} in \textbf{rgl}:
<<rgl-demo>>=
library(rgl)
knit_hooks$set(rgl = hook_rgl)
head(hook_rgl) # the hook function is defined as this
@
Then we only have to set the chunk option \texttt{rgl=TRUE}:
<<fancy-rgl, rgl=TRUE, dev='png', fig.width=5, fig.height=5, out.width='2in', message=FALSE, warning=FALSE, cache=TRUE>>=
library(rgl)
demo('bivar', package='rgl', echo=FALSE)
par3d(zoom=.7)
@
Due to the flexibility of output hooks, \textbf{knitr} supports several
different output formats. The implementation is fairly easy, e.g.,
for \LaTeX{} we put R output in \texttt{verbatim} environments, and
in HTML, it is only a matter of putting output in \texttt{div} layers.
These are simply character string operations. Many demos in \url{https://yihui.org/knitr/demo/}
show this idea clearly. This manual does not cover all the features
of \textbf{knitr}, and users are encouraged to browse the website
to learn about more features.
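Besides output hooks, \textbf{knitr} also has chunk hooks: functions of
\texttt{before}, \texttt{options} and \texttt{envir} that run before
and after a chunk, where a character value returned by the hook is
written into the output. As an illustration, the hypothetical hook
below reports how long a chunk took; the option name \texttt{timeit}
and the message format are arbitrary choices:
<<timeit-hook-sketch, eval=FALSE>>=
library(knitr)
local({
  start <- NULL  # shared by the calls before and after a chunk
  knit_hooks$set(timeit = function(before, options, envir) {
    if (before) {
      start <<- proc.time()[['elapsed']]
      NULL
    } else {
      secs <- proc.time()[['elapsed']] - start
      sprintf('\n\\emph{Chunk %s took %.2f seconds.}\n',
              options$label, secs)
    }
  })
})
# the hook runs for chunks with timeit=TRUE in their headers
@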
\section{Editors}
You can use any text editor to write the source documents, but some
have built-in support for \textbf{knitr}. Both RStudio (\url{http://www.rstudio.org})
and \LyX{} (\url{http://www.lyx.org}) have full support for \textbf{knitr},
and you can compile the document to PDF with just one click. See \url{https://yihui.org/knitr/demo/rstudio/}
and \url{https://yihui.org/knitr/demo/lyx/} respectively. It is also
possible to support other editors like \href{https://yihui.org/knitr/demo/eclipse/}{Eclipse},
\href{https://yihui.org/knitr/demo/editors/}{Texmaker and WinEdt};
see the demo list in the website for configuration instructions.
\section*{About This Document}
This manual was written in \LyX{} and compiled with \textbf{knitr}
(version \Sexpr{packageVersion('knitr')}). The \LyX{} source and
the Rnw document exported from \LyX{} can be found under these directories:
<<source-location, eval=FALSE>>=
system.file('examples', 'knitr-manual.lyx', package='knitr') # lyx source
system.file('examples', 'knitr-manual.Rnw', package='knitr') # Rnw source
@
You can use the function \emph{knit()} to knit the Rnw document (remember
to put the two \textsf{.bib} files under the same directory), and
you need to make sure all the R packages used in this document are
installed:
<<required-packages, eval=FALSE>>=
install.packages(c('animation', 'rgl', 'tikzDevice', 'ggplot2'))
@
Feedback and comments on this manual and the package are always welcome.
Bug reports and feature requests can be sent to \url{https://github.com/yihui/knitr/issues},
and questions can be delivered to the \href{mailto:knitr@googlegroups.com}{mailing list}
\url{https://groups.google.com/group/knitr}.
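% this chunk is rebuilt whenever knitr is updated: its option version=packageVersion('knitr') then changes, and any change to chunk options invalidates the cache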
<<auto-bib, version=packageVersion('knitr'), echo=FALSE, cache=TRUE, message=FALSE, warning=FALSE>>=
# write all packages in the current session to a bib file
write_bib(c(.packages(), 'evaluate', 'formatR', 'highr'), file = 'knitr-packages.bib')
@
\bibliographystyle{jss}
\bibliography{knitr-manual,knitr-packages}
\end{document}