691 lines
33 KiB
Plaintext
691 lines
33 KiB
Plaintext
%% LyX 2.2.1 created this file. For more info, see http://www.lyx.org/.
|
|
%% Do not edit unless you really know what you are doing.
|
|
\documentclass{article}
|
|
\usepackage[sc]{mathpazo}
|
|
\renewcommand{\sfdefault}{lmss}
|
|
\renewcommand{\ttdefault}{lmtt}
|
|
\usepackage[T1]{fontenc}
|
|
\usepackage{geometry}
|
|
\geometry{verbose,tmargin=2.5cm,bmargin=2.5cm,lmargin=2.5cm,rmargin=2.5cm}
|
|
\setcounter{secnumdepth}{2}
|
|
\setcounter{tocdepth}{2}
|
|
\usepackage{url}
|
|
\usepackage[authoryear]{natbib}
|
|
\usepackage[unicode=true,pdfusetitle,
|
|
bookmarks=true,bookmarksnumbered=true,bookmarksopen=true,bookmarksopenlevel=2,
|
|
breaklinks=false,pdfborder={0 0 1},backref=false,colorlinks=false]
|
|
{hyperref}
|
|
\hypersetup{
|
|
pdfstartview={XYZ null null 1}}
|
|
\usepackage{breakurl}
|
|
|
|
\makeatletter
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
|
|
\providecommand{\LyX}{\texorpdfstring%
|
|
{L\kern-.1667em\lower.25em\hbox{Y}\kern-.125emX\@}
|
|
{LyX}}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% User specified LaTeX commands.
|
|
\renewcommand{\textfraction}{0.05}
|
|
\renewcommand{\topfraction}{0.8}
|
|
\renewcommand{\bottomfraction}{0.8}
|
|
\renewcommand{\floatpagefraction}{0.75}
|
|
|
|
\usepackage[buttonsize=1em]{animate}
|
|
|
|
\makeatother
|
|
|
|
\begin{document}
|
|
<<setup, include=FALSE, cache=FALSE>>=
|
|
library(knitr)
|
|
## set global chunk options
|
|
opts_chunk$set(fig.path='figure/manual-', cache.path='cache/manual-', fig.align='center', fig.show='hold', par=TRUE)
|
|
## I use = but I can replace it with <-; set code/output width to be 68
|
|
options(formatR.arrow=TRUE, width=68, digits=4)
|
|
## tune details of base graphics (https://yihui.org/knitr/hooks/)
|
|
knit_hooks$set(par=function(before, options, envir){
|
|
if (before && options$fig.show!='none') par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3)
|
|
})
|
|
@
|
|
|
|
\title{knitr: A General-Purpose Tool for Dynamic Report Generation in R}
|
|
|
|
\author{Yihui Xie}
|
|
|
|
\maketitle
|
|
The original paradigm of literate programming was brought forward
|
|
mainly for software development, or specifically, to mix source code
|
|
(for computer) and documentation (for human) together. Early systems
|
|
include \href{http://www.literateprogramming.com/web.pdf}{WEB} and
|
|
\href{http://www.cs.tufts.edu/~nr/noweb/}{Noweb}; Sweave \citep{leisch2002}
|
|
was derived from the latter, but it is less focused on documenting
|
|
software, instead it is mainly used for reproducible data analysis
|
|
and generating statistical reports. The \textbf{knitr} package \citep{R-knitr}
|
|
is following the steps of Sweave. For this manual, I assume readers
|
|
have some background knowledge of Sweave to understand the technical
|
|
details; for a reference of available options, hooks and demos, see
|
|
the package homepage \url{https://yihui.org/knitr/}.
|
|
|
|
\section{Hello World}
|
|
|
|
A natural question is why to reinvent the wheel. The short answer
|
|
is that extending Sweave by hacking \textsf{SweaveDrivers.R} in the
|
|
\textbf{utils} package is a difficult job to me. Many features in
|
|
\textbf{knitr} come naturally as users would have expected. Figure
|
|
\ref{fig:cars-demo} is a simple demo of some features of \textbf{knitr}.
|
|
|
|
\begin{figure}
|
|
<<cars-demo,dev='tikz',fig.width=4,fig.height=2.8,out.width='.45\\textwidth',message=FALSE,cache=TRUE>>=
|
|
fit=lm(dist~speed,data=cars) # linear regression
|
|
par(mar=c(4, 4, 1, .1), mgp=c(2,1,0))
|
|
with(cars,plot(speed,dist,panel.last=abline(fit)))
|
|
text(10,100,'$Y = \\beta_0 + \\beta_1x + \\epsilon$')
|
|
library(ggplot2)
|
|
qplot(speed, dist, data=cars)+geom_smooth()
|
|
@
|
|
|
|
\caption{\label{fig:cars-demo}A simple demo of possible output in \textbf{knitr}:
|
|
(1) multiple plots per chunk; (2) no need to \emph{print()} objects
|
|
in \textbf{ggplot2}; (3) device size is $4\times2.8$ (inches) but
|
|
output size is adjusted to \texttt{.45\textbackslash{}textwidth} in
|
|
chunk options; (4) base graphics and \textbf{ggplot2} can sit side
|
|
by side; (5) use the \emph{tikz()} device in \textbf{tikzDevice} by
|
|
setting chunk option \texttt{dev='tikz'} (hence can write native \protect\LaTeX{}
|
|
expressions in R plots); (6) code highlighting.}
|
|
\end{figure}
|
|
|
|
I would have chosen to hide the R code if this were a real report,
|
|
but here I show the code just for the sake of demonstration. If we
|
|
type \emph{qplot()} in R, we get a plot, and the same thing happens
|
|
in \textbf{knitr}. If we draw two plots in the code, \textbf{knitr}
|
|
will show two plots and we do not need to tell it how many plots are
|
|
there in the code in advance. If we set \texttt{out.width='.49\textbackslash{}\textbackslash{}textwidth'}
|
|
in chunk options, we get it in the final output document. If we say
|
|
\texttt{fig.align='center'}, the plots are centered. That's it. Many
|
|
enhancements and new features will be introduced later. If you come
|
|
from the Sweave land, you can take a look at the page of transition
|
|
first: \url{https://yihui.org/knitr/demo/sweave/}.
|
|
|
|
\section{Design}
|
|
|
|
The flow of processing an input file is similar to Sweave, and two
|
|
major differences are that \textbf{knitr} provides more flexibility
|
|
to the users to customize the processing, and has many built-in options
|
|
such as the support to a wide range of graphics devices and cache.
|
|
Below is a brief description of the process:
|
|
\begin{enumerate}
|
|
\item \textbf{knitr} takes an input file and automatically determines an
|
|
appropriate set of \href{https://yihui.org/knitr/patterns/}{patterns}
|
|
to use if they are not provided in advance (e.g. \textsf{file.Rnw}
|
|
will use \texttt{knit\_patterns\$get('rnw')});
|
|
\item a set of output \href{https://yihui.org/knitr/hooks/}{hooks} will
|
|
also be set up automatically according to the filename extension (e.g.
|
|
use \LaTeX{} environments or HTML elements to wrap up R results);
|
|
\item the input file is read in and split into pieces consisting of R code
|
|
chunks and normal texts; the former will be executed one after the
|
|
other, and the latter may contain global chunk options or inline R
|
|
code;
|
|
\item for each chunk, the code is evaluated using the \textbf{evaluate}
|
|
package \citep{R-evaluate}, and the results may be filtered according
|
|
to chunk options (e.g. \texttt{echo=FALSE} will remove the R source
|
|
code)
|
|
|
|
\begin{enumerate}
|
|
\item if \texttt{cache=TRUE} for this chunk, \textbf{knitr} will first check
|
|
if there are previously cached results under the cache directory before
|
|
really evaluating the chunk; if cached results exist and this code
|
|
chunk has not been changed since last run (use MD5 sum to verify),
|
|
the cached results will be (lazy-) loaded, otherwise new cache will
|
|
be built; if a cached chunk depends on other chunks (see the \texttt{dependson}
|
|
\href{https://yihui.org/knitr/options/}{option}) and any one of these
|
|
chunks has changed, this chunk must be forcibly updated (old cache
|
|
will be purged)
|
|
\item there are six types of possible output from \textbf{evaluate}, and
|
|
their classes are \texttt{character} (normal text output), \texttt{source}
|
|
(source code), \texttt{warning}, \texttt{message}, \texttt{error}
|
|
and \texttt{recordedplot}; an internal S3 generic function \emph{wrap()}
|
|
is used to deal with different types of output, using output hooks
|
|
defined in the object \texttt{knit\_hooks}
|
|
\item note plots are recorded as R objects before they are really saved
|
|
to files, so graphics devices will not be opened unless plots have
|
|
really been produced in a chunk
|
|
\item a code chunk is evaluated in a separate empty environment with the
|
|
global environment as its parent, and all the objects in this environment
|
|
after the evaluation will be saved if \texttt{cache=TRUE}
|
|
\item chunk hooks can be run before and/or after a chunk
|
|
\end{enumerate}
|
|
\item for normal texts, \textbf{knitr} will find inline R code (e.g. in
|
|
\texttt{\textbackslash{}Sexpr\{\}}) and evaluate it; the output is
|
|
wrapped by the \texttt{inline} hook;
|
|
\end{enumerate}
|
|
The hooks play important roles in \textbf{knitr}: this package makes
|
|
almost everything accessible to the users. Consider the following
|
|
extremely simple example which may demonstrate this freedom:
|
|
|
|
<<simple-example>>=
|
|
1+1
|
|
@
|
|
|
|
There are two parts in the final output: the source code \texttt{1
|
|
+ 1} and the output \texttt{{[}1{]} 2}; the comment characters \texttt{\#\#}
|
|
are from the default chunk option \texttt{comment}. Users may define
|
|
a hook function for the source code like this to use the \texttt{lstlisting}
|
|
environment:
|
|
|
|
<<hook-source, eval=FALSE>>=
|
|
knit_hooks$set(source = function(x, options) {
|
|
paste('\\begin{lstlisting}\n', x, '\\end{lstlisting}\n', sep = '')
|
|
})
|
|
@
|
|
|
|
Similarly we can put other types of output into other environments.
|
|
There is no need to hack at \textsf{Sweave.sty} for \textbf{knitr}
|
|
and you can put the output in any environments. What is more, the
|
|
output hooks make \textbf{knitr} ready for other types of output,
|
|
and a typical one is HTML (there are built-in hooks). The website
|
|
has provided many examples demonstrating the flexibility of the output.
|
|
|
|
\section{Features}
|
|
|
|
The \textbf{knitr} package borrowed features such as tikz graphics
|
|
and cache from \textbf{pgfSweave} and \textbf{cacheSweave} respectively,
|
|
but the implementations are different. New features like code reference
|
|
from an external R script as well as output customization are also
|
|
introduced. The feature of hook functions in Sweave is re-implemented
|
|
and hooks have new usage now. There are several other small features
|
|
which are motivated from my everyday use of Sweave. For example, a
|
|
progress bar is provided when knitting a file so we roughly know how
|
|
long we still need to wait; output from inline R code (e.g. \texttt{\textbackslash{}Sexpr\{x{[}1{]}\}})
|
|
is automatically formatted in \TeX{} math notation (like \Sexpr{123456789})
|
|
if the result is numeric. You may check out a number of specific manuals
|
|
dedicated to specific features such as graphics in the website: \url{https://yihui.org/knitr/demo/}.
|
|
|
|
\subsection{Code Decoration}
|
|
|
|
The \textbf{highr} package \citep{R-highr} is used to highlight R
|
|
code, and the \textbf{formatR} package \citep{R-formatR} is used
|
|
to reformat R code (like \texttt{keep.source=FALSE} in Sweave but
|
|
will also try to retain comments). For \LaTeX{} output, the \textbf{framed}
|
|
package is used to decorate code chunks with a light gray background.
|
|
If this \LaTeX{} package is not found in the system, a version will
|
|
be copied directly from \textbf{knitr}. The prompt characters are
|
|
removed by default because they mangle the R source code in the output
|
|
and make it difficult to copy R code. The R output is masked in comments
|
|
by default based on the same rationale. It is easy to revert to the
|
|
output with prompts (set option \texttt{prompt=TRUE}), and you will
|
|
quickly realize the inconvenience to the readers if they want to copy
|
|
and run the code in the output document:
|
|
|
|
<<stupid-prompts, prompt=TRUE, comment=NA, highlight=FALSE>>=
|
|
x=rnorm(5)
|
|
x
|
|
var(x)
|
|
@
|
|
|
|
The example below shows the effect of \texttt{tidy=TRUE/FALSE}:
|
|
|
|
<<tidy-no, eval=FALSE, tidy=FALSE>>=
|
|
## option tidy=FALSE
|
|
for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}
|
|
@
|
|
<<tidy-yes, eval=FALSE, tidy=TRUE>>=
|
|
## option tidy=TRUE
|
|
for(k in 1:10){j=cos(sin(k)*k^2)+3;print(j-5)}
|
|
@
|
|
|
|
Note \texttt{=} is replaced by \texttt{<-} because \texttt{options('formatR.arrow')}
|
|
was set to be \texttt{TRUE} in this document; see the documentation
|
|
of \emph{tidy.source()} in \textbf{formatR} for details.
|
|
|
|
Many highlighting themes can be used in \textbf{knitr}, which are
|
|
borrowed from the \textbf{highlight} package by \href{http://www.andre-simon.de/}{Andre Simon}\footnote{not the R package mentioned before; for a preview of these themes,
|
|
see \url{http://www.andre-simon.de/dokuwiki/doku.php?id=theme_examples}}; it is also possible to use themes from \url{http://www.eclipsecolorthemes.org/}
|
|
by providing a theme id to \textbf{knitr}\footnote{many thanks to \href{https://github.com/ramnathv}{Ramnath Vaidyanathan}
|
|
for the work on themes}. See \texttt{?knit\_theme} for details.
|
|
|
|
\subsection{Graphics}
|
|
|
|
Graphics is an important part of reports, and several enhancements
|
|
have been made in \textbf{knitr}. For example, grid graphics may not
|
|
need to be explicitly printed as long as the same code can produce
|
|
plots in R (in some cases, however, they have to be printed, e.g.
|
|
in a loop, because you have to do so in an R terminal).
|
|
|
|
\subsubsection{Graphical Devices}
|
|
|
|
Over a long time, a frequently requested feature for Sweave was the
|
|
support for other graphics devices, which has been implemented since
|
|
R 2.13.0. Instead of using logical options like \texttt{png} or \texttt{jpeg}
|
|
(this list can go on and on), \textbf{knitr} uses a single option
|
|
\texttt{dev} (like \texttt{grdevice} in Sweave) which has support
|
|
for more than 20 devices. For instance, \texttt{dev='png'} will use
|
|
the \emph{png()} device, and \texttt{dev='CairoJPEG'} uses the \emph{CairoJPEG()}
|
|
device in the \textbf{Cairo} package (it has to be installed first,
|
|
of course). If none of these devices is satisfactory, you can provide
|
|
the name of a customized device function, which must have been defined
|
|
before it is called.
|
|
|
|
\subsubsection{Plot Recording}
|
|
|
|
As mentioned before, all the plots in a code chunk are first recorded
|
|
as R objects and then ``replayed'' inside a graphical device to
|
|
generate plot files. The \textbf{evaluate} package will record plots
|
|
per \emph{expression} basis, in other words, the source code is split
|
|
into individual complete expressions and \textbf{evaluate} will examine
|
|
possible plot changes in snapshots after each single expression has
|
|
been evaluated. For example, the code below consists of three expressions,
|
|
out of which two are related to drawing plots, therefore \textbf{evaluate}
|
|
will produce two plots by default:
|
|
|
|
<<low-level-plots, fig.keep='all', dev='tikz', fig.width=2.5, fig.height=2.5, out.width='.3\\textwidth', cache=TRUE>>=
|
|
par(mar=c(3,3,.1,.1))
|
|
plot(1:10, ann=FALSE,las=1)
|
|
text(5,9,'mass $\\rightarrow$ energy\n$E=mc^2$')
|
|
@
|
|
|
|
This brings a significant difference with traditional tools in R for
|
|
dynamic report generation, since low-level plotting changes can also
|
|
be recorded. The option \texttt{fig.keep} controls which plots to
|
|
keep in the output; \texttt{fig.keep='all'} will keep low-level changes
|
|
as separate plots; by default (\texttt{fig.keep='high'}), \textbf{knitr}
|
|
will merge low-level plot changes into the previous high-level plot,
|
|
like most graphics devices do. This feature may be useful for teaching
|
|
R graphics step by step. Note, however, low-level plotting commands
|
|
in a single expression (a typical case is a loop) will not be recorded
|
|
accumulatively, but high-level plotting commands, regardless of where
|
|
they are, will always be recorded. For example, this chunk will only
|
|
produce 2 plots instead of 21 plots because there are 2 complete expressions:
|
|
|
|
<<low-plot-loop, eval=FALSE>>=
|
|
plot(0,0,type='n',ann=FALSE)
|
|
for(i in seq(0, 2*pi,length=20)) points(cos(i),sin(i))
|
|
@
|
|
|
|
But this will produce 20 plots as expected:
|
|
|
|
<<high-plot-loop, eval=FALSE>>=
|
|
for(i in seq(0, 2*pi,length=20)) {plot(cos(i),sin(i),xlim=c(-1,1),ylim=c(-1,1))}
|
|
@
|
|
|
|
As I showed in the beginning of this manual, it is straightforward
|
|
to let \textbf{knitr} keep all the plots in a chunk and insert them
|
|
into the output document, so we no longer need the \texttt{cat('\textbackslash{}\textbackslash{}includegraphics\{\}')}
|
|
trick.
|
|
|
|
We can discard all previous plots and keep the last one only by \texttt{fig.keep='last'},
|
|
or keep only the first plot by \texttt{fig.keep='first'}, or discard
|
|
all plots by \texttt{fig.keep='none'}.
|
|
|
|
\subsubsection{Plot Rearrangement}
|
|
|
|
The option \texttt{fig.show} can decide whether to hold all plots
|
|
while evaluating the code and ``flush'' all of them to the end of
|
|
a chunk (\texttt{fig.show='hold'}), or just insert them to the place
|
|
where they were created (by default \texttt{fig.show='asis'}). Here
|
|
is an example of \texttt{fig.show='asis'}:
|
|
|
|
<<fig-hold, fig.show='asis', dev='pdf', fig.width=6, fig.height=4, out.width='.35\\linewidth'>>=
|
|
contour(volcano) # contour lines
|
|
filled.contour(volcano) # fill contour plot with colors
|
|
@
|
|
|
|
Beside \texttt{hold} and \texttt{asis}, the option \texttt{fig.show}
|
|
can take a third value: \texttt{animate}, which makes it possible
|
|
to insert animations into the output document. In \LaTeX{}, the package
|
|
\textbf{animate} is used to put together image frames as an animation.
|
|
For animations to work, there must be more than one plot produced
|
|
in a chunk. The option \texttt{interval} controls the time interval
|
|
between animation frames; by default it is 1 second. Note you have
|
|
to add \texttt{\textbackslash{}usepackage\{animate\}} in the \LaTeX{}
|
|
preamble, because \textbf{knitr} will not add it automatically. Animations
|
|
in the PDF output can only be viewed in Adobe Reader.
|
|
|
|
As a simple demonstration, here is a \href{http://en.wikipedia.org/wiki/Mandelbrot_set}{Mandelbrot animation}
|
|
taken from the \textbf{animation} package \citep{R-animation}; note
|
|
the PNG device is used because PDF files are too large. You should
|
|
be able to see the animation immediately with Acrobat Reader since
|
|
it was set to play automatically:
|
|
|
|
<<animate-demo, fig.show='animate', dev='png', out.width='.45\\linewidth', interval=.5, aniopts='controls,loop,autoplay', cache=TRUE>>=
|
|
library(animation)
|
|
demo('Mandelbrot', echo = FALSE, package = 'animation')
|
|
@
|
|
|
|
\subsubsection{Plot Size}
|
|
|
|
The \texttt{fig.width} and \texttt{fig.height} options specify the
|
|
size of plots in the graphics device, and the real size in the output
|
|
document can be different (see \texttt{out.width} and \texttt{out.height}).
|
|
When there are multiple plots per chunk, it is possible to arrange
|
|
more than one plot per line in \LaTeX{} \textendash{} just specify
|
|
\texttt{out.width} to be less than half of the current line width,
|
|
e.g. \texttt{out.width='.49\textbackslash{}\textbackslash{}linewidth'}.
|
|
|
|
\subsubsection{The tikz Device}
|
|
|
|
Beside PDF, PNG and other traditional R graphical devices, \textbf{knitr}
|
|
has special support to tikz graphics via the \textbf{tikzDevice} package
|
|
\citep{R-tikzDevice}, which is similar to \textbf{pgfSweave}. If
|
|
we set the chunk option \texttt{dev='tikz'}, the \emph{tikz()} device
|
|
in \textbf{tikzDevice} will be used to save plots. Options \texttt{sanitize}
|
|
and \texttt{external} are related to the tikz device: see the documentation
|
|
of \emph{tikz()} for details. Note \texttt{external=TRUE} in \textbf{knitr}
|
|
has a different meaning with \textbf{pgfSweave} \textendash{} it means
|
|
\texttt{standAlone=TRUE} in \emph{tikz()}, and the tikz graphics output
|
|
will be compiled to PDF \emph{immediately} after it is created, so
|
|
the ``externalization'' does not depend on the \textbf{tikz} package;
|
|
to maintain consistency in (font) styles, \textbf{knitr} will read
|
|
the preamble of the input document and use it in the tikz device.
|
|
At the moment, I'm not sure if this is a faithful way to externalize
|
|
tikz graphics, but I have not seen any problems so far. The assumption
|
|
to make, however, is that you declare all the styles in the preamble;
|
|
\textbf{knitr} is agnostic of \emph{local} style changes in the body
|
|
of the document.
|
|
|
|
Below is an example taken from StackOverflow\footnote{\url{http://stackoverflow.com/q/8190087/559676}};
|
|
we usually have to write R code like this to obtain a math expression
|
|
$\mathrm{d}\mathbf{x}_{t}=\alpha[(\theta-\mathbf{x}_{t})\mathrm{d}t+4]\mathrm{d}B_{t}$
|
|
in R graphics:
|
|
|
|
<<math-expr-R, eval=FALSE>>=
|
|
qplot(1:10, 1:10) + opts(title = substitute(paste(d *
|
|
bolditalic(x)[italic(t)] == alpha * (theta - bolditalic(x)[italic(t)]) *
|
|
d * italic(t) + lambda * d * italic(B)[italic(t)]), list(lambda = 4)))
|
|
@
|
|
|
|
With the tikz device, it is both straightforward and more beautiful:
|
|
|
|
<<math-expr-tikz, dev='tikz', fig.width=5, fig.height=3, out.width='.55\\linewidth', cache=TRUE, message=FALSE>>=
|
|
library(ggplot2)
|
|
qplot(1:10, 1:10) +
|
|
labs(title = sprintf('$\\mathrm{d}\\mathbf{x}_{t} = \\alpha[(\\theta - \\mathbf{x}_{t})\\mathrm{d}t + %d]\\mathrm{d}B_{t}$', 4))
|
|
@
|
|
|
|
The advantage of tikz graphics is the consistency of styles\footnote{Users are encouraged to read the vignette of \textbf{tikzDevice},
|
|
which is the most beautiful vignette I have ever seen in R packages:
|
|
\url{http://cran.r-project.org/web/packages/tikzDevice/vignettes/tikzDevice.pdf}}, and one disadvantage is that \LaTeX{} may not be able to handle
|
|
too large tikz files (it can run out of memory). For example, an R
|
|
plot with tens of thousands of graphical elements may fail to compile
|
|
in \LaTeX{} if we use the tikz device. In such cases, we can switch
|
|
to the PDF or PNG device, or reconsider our decision on the type of
|
|
plots, e.g., a scatter plot with millions of points is usually difficult
|
|
to read, and a contour plot or a hexagon plot showing the 2D density
|
|
can be a better alternative (they are smaller in size).
|
|
|
|
The graphics manual contains more detailed information and you can
|
|
check it out in the \href{https://yihui.org/knitr/demo/graphics/}{website}.
|
|
|
|
\subsection{Cache}
|
|
|
|
The feature of cache is not a new idea \textendash{} both \textbf{cacheSweave}
|
|
and \textbf{weaver} have implemented it based on Sweave, with the
|
|
former using \textbf{filehash} and the latter using \textsf{.RData}
|
|
images; \textbf{cacheSweave} also supports lazy-loading of objects
|
|
based on \textbf{filehash}. The \textbf{knitr} package directly uses
|
|
internal base R functions to save (\emph{tools:::makeLazyLoadDB()})
|
|
and lazy-load objects (\emph{lazyLoad()}). These functions are either
|
|
undocumented or marked as internal, but as far as I understand, they
|
|
are the tools to implement lazy-loading for packages. The \textbf{cacheSweave}
|
|
vignette has clearly explained lazy-loading, and roughly speaking,
|
|
lazy-loading means an object will not be really loaded into memory
|
|
unless it is really used somewhere. This is very useful for cache;
|
|
sometimes we read a large object and cache it, then take a subset
|
|
for analysis and this subset is also cached; in the future, the initial
|
|
large object will not be loaded into R if our computation is only
|
|
based on the object of its subset.
|
|
|
|
The paths of cache files are determined by the chunk option \texttt{cache.path};
|
|
by default all cache files are created under a directory \textsf{cache}
|
|
relative to the current working directory, and if the option value
|
|
contains a directory (e.g. \texttt{cache.path='cache/abc-'}), cache
|
|
files will be stored under that directory (automatically created if
|
|
it does not exist). The cache is invalidated and purged on any changes
|
|
to the code chunk, including both the R code and chunk options\footnote{One exception is the \texttt{include} option, which is not cached
|
|
because \texttt{include=TRUE/FALSE} does not affect code evaluation;
|
|
meanwhile, the value \texttt{getOption('width')} is also cached, so
|
|
if you change this option, the cache will also be invalidated (this
|
|
option affects the width of text output)}; this means previous cache files of this chunk are removed (filenames
|
|
are identified by the chunk label). Unlike \textbf{pgfSweave}, cache
|
|
files will never accumulate since old cache files will always be removed
|
|
in \textbf{knitr}. Unlike \textbf{weaver} or \textbf{cacheSweave},
|
|
\textbf{knitr} will try to preserve these side-effects:
|
|
\begin{enumerate}
|
|
\item printed results: meaning that any output of a code chunk will be loaded
|
|
into the output document for a cached chunk, although it is not really
|
|
evaluated. The reason is \textbf{knitr} also cache the output of a
|
|
chunk as a character string. Note this means graphics output is also
|
|
cached since it is part of the output. It has been a pain for me for
|
|
a long time to have to lose output to gain cache;
|
|
\item loaded packages: after the evaluation of each cached chunk, the list
|
|
of packages used in the current R session is written to a file under
|
|
the cache path named \textsf{\_\_packages}; next time if a cached
|
|
chunk needs to be rebuilt, these packages will be loaded first. The
|
|
reasons for caching package names are, it can be slow to load some
|
|
packages, and a package might be loaded in a previous cached chunk
|
|
which is not available to the next cached chunk when only the latter
|
|
needs to be rebuilt. Note this only applies to cached chunks, and
|
|
for uncached chunks, you must always use \emph{library()} to load
|
|
packages explicitly;
|
|
\end{enumerate}
|
|
Although \textbf{knitr} tries to keep some side-effects, there are
|
|
still other types of side-effects like setting \emph{par()} or \emph{options()}
|
|
which are not cached. Users should be aware of these special cases,
|
|
and make sure to clearly separate the code which is not meant to be
|
|
cached to other chunks which are not cached, e.g., set all global
|
|
options in the first chunk of a document and do not cache that chunk.
|
|
|
|
Sometimes a cached chunk may need to use objects from other cached
|
|
chunks, which can bring a serious problem \textendash{} if objects
|
|
in previous chunks have changed, this chunk will not be aware of the
|
|
changes and will still use old cached results, unless there is a way
|
|
to detect such changes from other chunks. There is an option called
|
|
\texttt{dependson} in \textbf{cacheSweave} which does this job. We
|
|
can explicitly specify which other chunks this chunk depends on by
|
|
setting an option like \texttt{dependson='chunkA;chunkB'} or equivalently
|
|
\texttt{dependson=c('chunkA', 'chunkB')}. Each time the cache of a
|
|
chunk is rebuilt, all other chunks which depend on this chunk will
|
|
lose cache, hence their cache will be rebuilt as well.
|
|
|
|
Another way to specify the dependencies among chunks is to use the
|
|
chunk option \texttt{autodep} and the function \emph{dep\_auto()}.
|
|
This is an experimental feature borrowed from \textbf{weaver} which
|
|
frees us from setting chunk dependencies manually. The basic idea
|
|
is, if a latter chunk uses any objects created from a previous chunk,
|
|
the latter chunk is said to depend on the previous one. The function
|
|
\emph{findGlobals()} in the \textbf{codetools} package is used to
|
|
find out all global objects in a chunk, and according to its documentation,
|
|
the result is an approximation. Global objects roughly mean the ones
|
|
which are not created locally, e.g. in the expression \texttt{function()
|
|
\{y <- x\}}, \texttt{x} should be a global object, whereas \texttt{y}
|
|
is local. Meanwhile, we also need to save the list of objects created
|
|
in each cached chunk, so that we can compare them to the global objects
|
|
in latter chunks. For example, if chunk A created an object \texttt{x}
|
|
and chunk B uses this object, chunk B must depend on A, i.e. whenever
|
|
A changes, B must also be updated. When \texttt{autodep=TRUE}, \textbf{knitr}
|
|
will write out the names of objects created in a cached chunk as well
|
|
as those global objects in two files named \textsf{\_\_objects} and
|
|
\textsf{\_\_globals} respectively; later we can use the function \emph{dep\_auto()}
|
|
to analyze the object names to figure out the dependencies automatically.
|
|
See \url{https://yihui.org/knitr/demo/cache/} for examples.
|
|
|
|
Yet another way to specify dependencies is \emph{dep\_prev()}: this
|
|
is a conservative approach which sets the dependencies so that a cached
|
|
chunk will depend on all its previous chunks, i.e. whenever a previous
|
|
chunk is updated, all later chunks will be updated accordingly.
|
|
|
|
\subsection{Code Externalization}
|
|
|
|
It can be more convenient to write R code in a separate file, rather
|
|
than mixing it into a \LaTeX{} document; for example, we can run
|
|
R code successively in a pure R script from one chunk to the other
|
|
without jumping through other texts. Since I prefer using \LyX{}
|
|
to write reports, Sweave is even more inconvenient because I have
|
|
to recompile the whole document each time, even if I only want to
|
|
know the results of a single chunk. Therefore \textbf{knitr} introduced
|
|
the feature of code externalization to a separate R script. Currently
|
|
the setting is like this: the R script also uses chunk labels (marked
|
|
in the form \texttt{\#\# -{}-{}-{}- chunk-label} by default); if the
|
|
code chunk in the input document is empty, \textbf{knitr} will match
|
|
its label with the label in the R script to input external R code.
|
|
For example, suppose this is a code chunk labelled as \texttt{Q1}
|
|
in an R script named \textsf{homework1-xie.R} which is under the same
|
|
directory as the Rnw document:
|
|
|
|
<<ext-r-code, eval=FALSE>>=
|
|
## ---- Q1 ---------------------
|
|
gcd = function(m, n) {
|
|
while ((r <- m %% n) != 0) {
|
|
m = n; n = r
|
|
}
|
|
n
|
|
}
|
|
@
|
|
|
|
In the Rnw document, we can first read the script using the function
|
|
\emph{read\_chunk()}:
|
|
|
|
<<read-chunk, eval=FALSE>>=
|
|
read_chunk('homework1-xie.R')
|
|
@
|
|
|
|
This is usually done in an early chunk, and we can use the chunk \texttt{Q1}
|
|
later in the Rnw document:
|
|
|
|
<<use-ext-chunk, echo=FALSE, comment=NA>>=
|
|
cat('<<Q1, echo=TRUE, tidy=TRUE>>=','@',sep='\n')
|
|
@
|
|
|
|
Different documents can read the same R script, so the R code can
|
|
be reusable across different input documents.
|
|
|
|
\subsection{Evaluation of Chunk Options\label{subsec:conditional}}
|
|
|
|
By default \textbf{knitr} uses a new syntax to parse chunk options:
|
|
it treats them as function arguments instead of a text string to be
|
|
split to obtain option values. This gives the user much more power
|
|
than the old syntax; we can pass arbitrary R objects to chunk options
|
|
besides simple ones like \texttt{TRUE}/\texttt{FALSE}, numbers and
|
|
character strings. The page \url{https://yihui.org/knitr/demo/sweave/}
|
|
has given two examples to show the advantages of the new syntax. Here
|
|
we show yet another useful application.
|
|
|
|
Before \textbf{knitr} 0.3, there was a feature named ``conditional
|
|
evaluation''\footnote{request from \url{https://plus.google.com/u/0/116405544829727492615/posts/43WrRUffjzK}}.
|
|
The idea is, instead of setting chunk options \texttt{eval} and \texttt{echo}
|
|
to be \texttt{TRUE} or \texttt{FALSE} (constants), their values can
|
|
be controlled by global variables in the current R session. This enables
|
|
\textbf{knitr} to conditionally evaluate code chunks according to
|
|
variables. For example, here we assign \texttt{TRUE} to a variable
|
|
\texttt{dothis}:
|
|
|
|
<<cond-variable>>=
|
|
dothis=TRUE
|
|
@
|
|
|
|
In the next chunk, we set chunk options \texttt{eval=dothis} and \texttt{echo=!dothis},
|
|
both are valid R expressions since the variable \texttt{dothis} exists.
|
|
As we can see, the source code is hidden, but it was indeed evaluated:
|
|
|
|
<<cond-out1, eval=dothis, echo=!dothis>>=
|
|
print('you cannot see my source because !dothis is FALSE')
|
|
@
|
|
|
|
Then we set \texttt{eval=dothis} and \texttt{echo=dothis} for another
|
|
chunk:
|
|
|
|
<<cond-out2,eval=dothis,echo=dothis>>=
|
|
dothis
|
|
@
|
|
|
|
If we change the value of \texttt{dothis} to \texttt{FALSE}, neither
|
|
of the above chunks will be evaluated any more. Therefore we can control
|
|
many chunks with a single variable, and present results selectively.
|
|
|
|
This old feature requires \textbf{knitr} to treat \texttt{eval} and
|
|
\texttt{echo} specially, and we can easily see that it is no longer
|
|
necessary with the new syntax: \texttt{eval=dothis} will tell R to
|
|
find the variable \texttt{dothis} automatically just like we call
|
|
a function \texttt{foobar(eval = dothis)}. What is more, all options
|
|
will be evaluated as R expressions unless they are already constants
|
|
which do not need to be evaluated, so this old feature has been generalized
|
|
to all other options naturally.
|
|
|
|
\subsection{Customization}
|
|
|
|
The \textbf{knitr} package is ready for customization. Both the patterns
|
|
and hooks can be customized; see the package website for details.
|
|
Here I show an example on how to save \textbf{rgl} plots \citep{R-rgl}
|
|
using a customized hook function. First we define a hook named \texttt{rgl}
|
|
using the function \emph{hook\_rgl()} in \textbf{rgl}:
|
|
|
|
<<rgl-demo>>=
|
|
library(rgl)
|
|
knit_hooks$set(rgl = hook_rgl)
|
|
head(hook_rgl) # the hook function is defined as this
|
|
@
|
|
|
|
Then we only have to set the chunk option \texttt{rgl=TRUE}:
|
|
|
|
<<fancy-rgl, rgl=TRUE, dev='png', fig.width=5, fig.height=5, out.width='2in', message=FALSE, warning=FALSE, cache=TRUE>>=
|
|
library(rgl)
|
|
demo('bivar', package='rgl', echo=FALSE)
|
|
par3d(zoom=.7)
|
|
@
|
|
|
|
Due to the flexibility of output hooks, \textbf{knitr} supports several
|
|
different output formats. The implementation is fairly easy, e.g.,
|
|
for \LaTeX{} we put R output in \texttt{verbatim} environments, and
|
|
in HTML, it is only a matter of putting output in \texttt{div} layers.
|
|
These are simply character string operations. Many demos in \url{https://yihui.org/knitr/demo/}
|
|
show this idea clearly. This manual did not cover all the features
|
|
of \textbf{knitr}, and users are encouraged to thumb through the website
|
|
to know more possible features.
|
|
|
|
\section{Editors}
|
|
|
|
You can use any text editors to write the source documents, but some
|
|
have built-in support for \textbf{knitr}. Both RStudio (\url{http://www.rstudio.org})
|
|
and \LyX{} (\url{http://www.lyx.org}) have full support for \textbf{knitr},
|
|
and you can compile the document to PDF with just one click. See \url{https://yihui.org/knitr/demo/rstudio/}
|
|
and \url{https://yihui.org/knitr/demo/lyx/} respectively. It is also
|
|
possible to support other editors like \href{https://yihui.org/knitr/demo/eclipse/}{Eclipse},
|
|
\href{https://yihui.org/knitr/demo/editors/}{Texmaker and WinEdt};
|
|
see the demo list in the website for configuration instructions.
|
|
|
|
\section*{About This Document}
|
|
|
|
This manual was written in \LyX{} and compiled with \textbf{knitr}
|
|
(version \Sexpr{packageVersion('knitr')}). The \LyX{} source and
|
|
the Rnw document exported from \LyX{} can be found under these directories:
|
|
|
|
<<source-location, eval=FALSE>>=
|
|
system.file('examples', 'knitr-manual.lyx', package='knitr') # lyx source
|
|
system.file('examples', 'knitr-manual.Rnw', package='knitr') # Rnw source
|
|
@
|
|
|
|
You can use the function \emph{knit()} to knit the Rnw document (remember
|
|
to put the two \textsf{.bib} files under the same directory), and
|
|
you need to make sure all the R packages used in this document are
|
|
installed:
|
|
|
|
<<required-packages, eval=FALSE>>=
|
|
install.packages(c('animation', 'rgl', 'tikzDevice', 'ggplot2'))
|
|
@
|
|
|
|
Feedback and comments on this manual and the package are always welcome.
|
|
Bug reports and feature requests can be sent to \url{https://github.com/yihui/knitr/issues},
|
|
and questions can be delivered to the \href{mailto:knitr@googlegroups.com}{mailing list}
|
|
\url{https://groups.google.com/group/knitr}.
|
|
|
|
% when knitr is updated, this chunk will be updated; why?
|
|
<<auto-bib, version=packageVersion('knitr'), echo=FALSE, cache=TRUE, message=FALSE, warning=FALSE>>=
|
|
# write all packages in the current session to a bib file
|
|
write_bib(c(.packages(), 'evaluate', 'formatR', 'highr'), file = 'knitr-packages.bib')
|
|
@
|
|
|
|
\bibliographystyle{jss}
|
|
\bibliography{knitr-manual,knitr-packages}
|
|
|
|
\end{document}
|