\documentclass[a4paper]{article}
\usepackage[round,longnamesfirst]{natbib}
\usepackage{graphicx,keyval,thumbpdf,a4wide,makeidx,color,colordvi}
\usepackage{amsfonts,hyperref}
\usepackage[utf8]{inputenc}
\DeclareUnicodeCharacter{201C}{"}
\DeclareUnicodeCharacter{201D}{"}
\newcommand\R{\textsf{R}}
\newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}}
\newcommand{\sQuote}[1]{`{#1}'}
\newcommand{\dQuote}[1]{``{#1}''}
\newcommand{\file}[1]{\sQuote{\textsf{#1}}}
\newcommand{\data}[1]{\texttt{#1}}
\newcommand{\var}[1]{\textit{#1}}
\newcommand{\class}[1]{\textsf{#1}}
\newcommand{\proglang}[1]{\textsf{#1}}
%% \code without `-' ligatures
\def\nohyphenation{\hyphenchar\font=-1 \aftergroup\restorehyphenation}
\def\restorehyphenation{\hyphenchar\font=`-}
{\catcode`\-=\active%
\global\def\code{\bgroup%
\catcode`\-=\active \let-\codedash%
\Rd@code}}
\def\codedash{-\discretionary{}{}{}}
\def\Rd@code#1{\texttt{\nohyphenation#1}\egroup}
\newcommand{\codefun}[1]{\code{#1()}}
\newcommand{\codefunind}[1]{\codefun{#1}\index{\texttt{#1}}}
\newcommand{\codeind}[1]{\code{#1}\index{\texttt{#1}}}
\SweaveOpts{strip.white=true}
\definecolor{Blue}{rgb}{0,0,0.8}
\definecolor{Red}{rgb}{0.7,0,0}
\date{2009-02-17}
\title{A Generic Registry Infrastructure for \R}
\author{David Meyer}
%\VignetteIndexEntry{Registry}
%\VignetteDepends{registry}
%\VignetteKeywords{registry}
%\VignettePackage{registry}
\makeindex{}
\sloppy{}
\begin{document}
\maketitle
% \begin{abstract}
% This document introduces a generic registry infrastructure for \R,
% provided by the \pkg{registry} package.
% \end{abstract}
<<echo=FALSE>>=
options(width = 80)
library("registry")
@ %
\section{Introduction}
\label{sec:introduction}
More and more, \R~packages are offering dynamic functionality,
allowing users to extend a \dQuote{repository} of initial features or
data. For example, the \pkg{proxy} package \citep{registry:meyer+buchta:2008}
provides an enhanced
\codefun{dist} function for computing dissimilarity matrices,
which lets users choose among several proximity
measures stored in a registry. Each entry is composed of a small
workhorse function and some meta data including, e.g., a character vector
of aliases, literature references, the formula in plain text,
a function to coerce
between similarity and distance, and a type categorization
(binary, metric, etc.). Users can add new proximity measures to the
registry at run-time and immediately use them without recreating the
package, specifying one of the aliases defined in the meta data.
Similarly, the \pkg{relations} \citep{registry:hornik+meyer:2008}
and \pkg{CLUE} \citep{registry:hornik:2005,registry:hornik:2007}
packages use simple
registries internally to link some meta data to available functions,
used by the high-level consensus ranking and cluster ensemble
functions, respectively.
Such a registry, whether exposed to the user or not, is conceptually a
small in-memory data base where entries with a common field structure are
stored and retrieved and whose fields can be of mixed type.
At first sight, a data frame seems to be the
data structure of choice for an appropriate implementation.
Unfortunately, data frames are inconvenient to use
with factors, functions, or other recursive types such as lists
due to automatic coercions taking place behind the scenes. In fact, a
simpler, record-like structure such as a list with named components
(\dQuote{fields}) appears more practical. Also,
features known from \dQuote{real} data bases such as compound keys,
validity checking of new entries, and use of access rights are not
available by default and need to be \dQuote{reinvented} every time
they are needed.
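The difference can be seen in a few lines of base \R. The following
sketch is illustrative only; the names \code{euclid}, \code{msg} and
\code{rec} are hypothetical and not part of any package:
<<>>=
euclid <- function(x, y) sqrt(sum((x - y)^2))
## A function cannot be stored in a data frame column at all:
msg <- tryCatch(data.frame(name = "Euclidean", FUN = euclid),
                error = function(e) conditionMessage(e))
msg
## A vector of aliases silently turns one record into two rows:
nrow(data.frame(name = "Euclidean", aliases = c("euclidean", "L2")))
## A list with named components keeps every field as it is:
rec <- list(name = "Euclidean", aliases = c("euclidean", "L2"), FUN = euclid)
rec$FUN(c(0, 0), c(3, 4))
@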
The \pkg{registry} package provides a simple mechanism for defining
and manipulating user-extensible registry objects. A typical
use case in the context of an \R~package could include the following steps:
\begin{enumerate}
\item Create one or more registry objects inside the package's namespace.
\item Insert entries into the registry.
\item Possibly, \dQuote{seal} the entries and set access rights.
\item Possibly, export the registry object to the user level.
\item Browse and retrieve entries from the registry.
\end{enumerate}
In the following, we explain these steps in more detail:
first, how a registry can be set up; second, how entries
can be added, modified and retrieved; and third, how a registry can be
sealed and restricted through the definition of access rights.
\section{Creating Registries}
A registry is essentially a container (implemented in \R~as an
environment), along with some access functions. A new object of class
\code{registry} can simply be created using the \codefun{registry} function:
<<>>=
library(registry)
R <- registry()
print(R)
@
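To make the idea of \dQuote{an environment plus access functions}
concrete, the following is a minimal sketch in base \R. The function
\code{toy\_registry} and its components are hypothetical names chosen
for illustration; the sketch does not reproduce the actual
implementation of \pkg{registry}:
<<>>=
toy_registry <- function() {
    ## private storage, visible only to the closures below
    store <- new.env(parent = emptyenv())
    list(
        set_entry = function(key, ...) {
            if (exists(key, envir = store, inherits = FALSE))
                stop("entry already exists: ", key)
            assign(key, list(...), envir = store)
            invisible(TRUE)
        },
        get_entry = function(key) {
            if (exists(key, envir = store, inherits = FALSE))
                get(key, envir = store, inherits = FALSE)
            else
                NULL
        },
        n_of_entries = function() length(ls(store))
    )
}
toy <- toy_registry()
toy$set_entry("Smith", first = "Mary", address = "Vienna")
toy$get_entry("Smith")$address
toy$n_of_entries()
@
Calls of the form \code{R\$set\_entry(...)} used in the remainder of
this document follow the same calling pattern.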
Optional parameters of \codefun{registry} include the specification of an (additional) class
for the created registry object and the individual entries,
as well as the specification of some validity function checking new
entries to be added to the registry.
In the following, we will use the example of a simple address book,
whose entries include first and last name, address, age, home/cell
phone number, and a business/private classification.
Last and first name together form the search key. Age is an
optional integer in the range from 1 to 99.
Additionally, each entry must contain at least one phone number.
We start by creating two simple validity functions. The first one, to
be specified at field level later on, checks a given age:
<<>>=
checkAge <- function(x) stopifnot(is.na(x) || x > 0 && x < 100)
@
The second one, specified at registry level,
checks whether a given registry entry (list of named components)
contains at least one phone number:
<<>>=
checkPhone <- function(x) stopifnot(!is.na(x$mobile) || !is.na(x$home))
@
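Both functions can be tried out on their own before they are attached
to a registry. The sketch below repeats the two definitions so that it
is self-contained, and uses a hypothetical helper \code{rejected} to
report whether a check fails. It assumes that missing fields are
represented by \code{NA}, as \code{checkPhone} presupposes:
<<>>=
checkAge <- function(x) stopifnot(is.na(x) || x > 0 && x < 100)
checkPhone <- function(x) stopifnot(!is.na(x$mobile) || !is.na(x$home))
## hypothetical helper: TRUE if evaluating 'expr' signals an error
rejected <- function(expr)
    inherits(tryCatch(expr, error = function(e) e), "error")
rejected(checkAge(44L))    # valid age
rejected(checkAge(NA))     # age is optional
rejected(checkAge(999L))   # out of range
rejected(checkPhone(list(mobile = NA, home = "734 43 34")))
rejected(checkPhone(list(mobile = NA, home = NA)))
@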
Next, we create a registry of class \code{Addressbook} (inheriting
from \code{registry}), containing entries of class \code{Address} and
using \code{checkPhone} as the registry-level validity function.
<<>>=
R <- registry(registry_class = "Addressbook", entry_class = "Address",
validity_FUN = checkPhone)
@
The additional class for the registry allows, e.g., user-defined printing:
<<>>=
print.Addressbook <-
function(x, ...) {
writeLines(sprintf("An address book with %i entries.\n", length(x)))
invisible(x)
}
print(R)
@
At this stage, we are ready to set up the field information. First and last
names are mandatory character fields, uniquely identifying an entry
(key fields). Lookups should work with partial completion, ignoring case:
<<>>=
R$set_field("last", type = "character", is_key = TRUE, index_FUN = match_partial_ignorecase)
R$set_field("first", type = "character", is_key = TRUE, index_FUN = match_partial_ignorecase)
@
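What \dQuote{partial completion, ignoring case} means can be
illustrated with base \R's \codefun{pmatch}. The function
\code{partial\_ignorecase} below is a hypothetical stand-in written for
this illustration; it is an assumption about the intended behavior,
not the code of \code{match\_partial\_ignorecase}:
<<>>=
## TRUE if 'lookup' is an unambiguous prefix of a value in 'entry', ignoring case
partial_ignorecase <- function(lookup, entry)
    !is.na(pmatch(tolower(lookup), tolower(entry)))
partial_ignorecase("smi", "Smith")   # prefix, different case
partial_ignorecase("MAR", "Mary")    # prefix, different case
partial_ignorecase("ary", "Mary")    # not a prefix
@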
The address is also character, but optional:
<<>>=
R$set_field("address", type = "character")
@
At least one phone number (character) is required. This can be
achieved by making both fields optional, and using the validity
function specified at the registry level to check that at least one of them is not missing:
<<>>=
R$set_field("mobile", type = "character")
R$set_field("home", type = "character")
@
The age field is an optional integer with a defined range, checked by
the field-level validity function:
<<>>=
R$set_field("age", type = "integer", validity_FUN = checkAge)
@
Finally, the business/private category is defined by specifying the
possible alternatives (\code{Business} is set as default):
<<>>=
R$set_field("type", type = "character",
alternatives = c("Business", "Private"),
default = "Business")
@
The setup for a field can be retrieved using \codefun{get\_field}:
<<>>=
R$get_field("type")
@
\codefun{get\_fields} returns the complete list.
\section{Using Registries}
We now can start adding entries to the registry:
<<>>=
R$set_entry(last = "Smith", first = "Mary", address = "Vienna",
home = "734 43 34", type = "Private", age = 44L)
R$set_entry(last = "Smith", first = "Peter", address = "New York",
mobile = "878 78 87")
@
If all field values are specified in the order in which the fields were defined, the field names can be omitted:
<<>>=
R$set_entry("Myers", "John", "Washington", "52 32 34", "898 89 99",
33L, "Business")
@
Duplicate or invalid entries are not accepted:
<<>>=
TRY <- function(expr) tryCatch(expr, error = print)
TRY(R$set_entry(last = "Smith", first = "Mary"))
TRY(R$set_entry(last = "Miller", first = "Henry"))
TRY(R$set_entry(last = "Miller", first = "Henry", age = 12.5))
TRY(R$set_entry(last = "Miller", first = "Henry", age = 999L))
@
A single entry can be retrieved using \codefun{get\_entry}:
<<>>=
R$get_entry(last = "Smith", first = "mar")
@
Since returned entries inherit from \code{Address}, we can provide a
user-defined print method:
<<>>=
print.Address <- function(x, ...) {
    ## S3 print methods accept '...' and return their argument invisibly
    with(x, writeLines(sprintf("%s %s, %s; home: %s, mobile: %s; age: %i (%s)",
                               first, last, address, home, mobile, age, type)))
    invisible(x)
}
R$get_entry(last = "Smith", first = "mar")
@
Note that even though
the first name of Mary Smith is incompletely specified and in
lower case, the lookup is still successful
because of the partial matching indexing function. The \code{[[} operator
can be used as an alternative to \codefun{get\_entry}:
<<>>=
R[["Myers"]]
@
For Myers, the last name uniquely identifies the entry, so
the first name can be omitted. A key field can also hold several alternative values:
<<>>=
R$set_entry(last = "Frears", first = c("Joe", "Jonathan"),
address = "Washington", home = "721 42 34")
@
Either of them can be used for retrieval:
<<>>=
identical(R[["Frears", "Jonathan"]], R[["Frears", "Joe"]])
@
Unsuccessful lookups return \code{NULL}. Multiple entries can be retrieved
using the \codefun{get\_entries} accessor function. They are returned
in a list whose component names are generated from the key values:
<<>>=
R$get_entries("Smith")
@
Full-text search across all fields is provided by \codefun{grep\_entries}:
<<>>=
R$grep_entries("Priv")
@
A list of all entries can be obtained using either of:
<<eval=FALSE>>=
R$get_entries()
R[]
@
The summary method for registry objects returns a data frame:
<<>>=
summary(R)
@
Entries can also be modified using \codefun{modify\_entry}, specifying
key and new field values:
<<>>=
R[["Smith", "Peter"]]
R$modify_entry(last = "Smith", first = "Peter", age = 22L)
R[["Smith", "Peter"]]
@
Finally, entries can be removed using \codefun{delete\_entry}:
<<>>=
R$delete_entry(last = "Smith", first = "Peter")
R[["Smith", "Peter"]]
@
\section{Sealing Registries and Setting Access Rights}
Occasionally, developers might want to protect a registry that ships
with some package to prevent accidental deletions or
alterations. For this, \pkg{registry} offers two mechanisms: first, a
registry object can be \dQuote{sealed} to prevent modifications of
\emph{existing} data:
<<>>=
R$seal_entries()
TRY(R$delete_entry("Smith", "Mary"))
R$set_entry(last = "Slater", first = "Christian", address = "Boston",
mobile = "766 23 88")
R[["Slater"]]
@
Second, the access permissions for registries can be restricted:
<<>>=
R$get_permissions()
R$restrict_permissions(delete_entries = FALSE)
TRY(R$delete_entry("Slater"))
R$modify_entry(last = "Slater", first = "Christian", age = 44L)
R[["Slater"]]
@
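The effect of sealing, namely that existing entries become read-only
while new ones can still be added, can be mimicked in base \R. The
following toy sketch uses hypothetical names (\code{toy\_store} and its
components) and is not meant to mirror how \pkg{registry} implements
sealing internally:
<<>>=
toy_store <- function() {
    entries <- list()
    sealed <- character()      # keys that may no longer be changed
    list(
        set = function(key, value) {
            if (key %in% sealed) stop("entry is sealed: ", key)
            entries[[key]] <<- value
            invisible(TRUE)
        },
        delete = function(key) {
            if (key %in% sealed) stop("entry is sealed: ", key)
            entries[[key]] <<- NULL
            invisible(TRUE)
        },
        seal = function() {
            ## protect everything stored so far
            sealed <<- names(entries)
            invisible(TRUE)
        },
        keys = function() names(entries)
    )
}
s <- toy_store()
s$set("Smith", "Vienna")
s$seal()
tryCatch(s$delete("Smith"), error = function(e) conditionMessage(e))
s$set("Slater", "Boston")   # entries added after sealing are accepted
s$keys()
@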
\bibliographystyle{abbrvnat}
\bibliography{registry}
\end{document}