238 lines
8.5 KiB
Plaintext
Raw Normal View History

2025-01-12 00:52:51 +08:00
\documentclass{article}
%
\usepackage{myVignette}
\usepackage[authoryear,round]{natbib}
\bibliographystyle{plainnat}
\newcommand{\noFootnote}[1]{{\small (\textit{#1})}}
\newcommand{\myOp}[1]{{$\left\langle\ensuremath{#1}\right\rangle$}}
%% vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
%%\VignetteIndexEntry{Design Issues in Matrix package Development}
%%\VignetteDepends{Matrix,utils}
\SweaveOpts{engine=R,eps=FALSE,pdf=TRUE,width=5,height=3,strip.white=true,keep.source=TRUE}
% ^^^^^^^^^^^^^^^^
\title{Design Issues in Matrix package Development}
\author{Martin Maechler and Douglas Bates\\R Core Development Team
\\\email{maechler@stat.math.ethz.ch}, \email{bates@r-project.org}}
\date{Spring 2008; Aug~2022 ({\tiny typeset on \tiny\today})}
%
\begin{document}
\maketitle
\begin{abstract}
This is a (\textbf{currently very incomplete}) write-up of the many smaller and
larger design decisions we have made in organizing functionalities in the
Matrix package.
Classes: There's a rich hierarchy of matrix classes, which you can
visualize as a set of trees whose inner (and ``upper'') nodes are
\emph{virtual} classes and only the leaves are non-virtual ``actual'' classes.
Functions and Methods:
- setAs()
- others
\end{abstract}
%% Note: These are explained in '?RweaveLatex' :
<<preliminaries, echo=FALSE>>=
options(width=75)
library(utils) # for R_DEFAULT_PACKAGES=NULL
library(Matrix)
@
\section{The Matrix class structures}
\label{sec:classes}
Take Martin's DSC 2007 talk to depict the Matrix class hierarchy;
available from {\small
\url{https://stat.ethz.ch/~maechler/R/DSC-2007_MatrixClassHierarchies.pdf}} .
% ~/R/Meetings-Kurse-etc/2007-DSC/talk.tex Matrix-classes.Rnw
--- --- --- %% \hrule[1pt]{\textwidth}
From far, there are \textbf{three} separate class hierarchies, and every \pkg{Matrix} package
matrix has an actual (or ``factual'') class inside these three hierarchies:
% ~/R/Meetings-Kurse-etc/2007-DSC/Matrix-classes.Rnw
More formally, we have three (\ 3 \ ) main ``class classifications'' for our Matrices, i.e.,\\
three ``orthogonal'' partitions of ``Matrix space'', and every Matrix
object's class corresponds to an \emph{intersection} of these three partitions;
i.e., in R's S4 class system: We have three independent inheritance
schemes for every Matrix, and each such Matrix class is simply defined to
\texttt{contain} three \emph{virtual} classes (one from each partitioning
scheme), e.g,
The three partioning schemes are
\begin{enumerate}
\item Content \texttt{type}: Classes \code{dMatrix}, \code{lMatrix},
\code{nMatrix},
(\code{iMatrix}, \code{zMatrix}) for entries of type \textbf{d}ouble,
\textbf{l}ogical, patter\textbf{n} (and not yet \textbf{i}nteger and
complex) Matrices.
\code{nMatrix} only stores the
\emph{location} of non-zero matrix entries (where as logical Matrices
can also have \code{NA} entries!)
\item structure: general, triangular, symmetric, diagonal Matrices
\item sparsity: \code{denseMatrix}, \code{sparseMatrix}
\end{enumerate}
For example in the most used sparseMatrix class, \code{"dgCMatrix"},
the three initial letters \code{dgC} each codes for one of the three hierarchies:
\begin{description}
\item{d: } \textbf{d}ouble
\item{g: } \textbf{g}eneral
\item{C: } \textbf{C}sparseMatrix, where \textbf{C} is for \textbf{C}olumn-compressed.
\end{description}
Part of this is visible from printing \code{getClass("\emph{<classname>}")}:
<<dgC-ex>>=
getClass("dgCMatrix")
@
Another example is the \code{"nsTMatrix"} class, where \code{nsT} stands for
\begin{description}
\item{n: } \textbf{n} is for ``patter\textbf{n}'', boolean content where
only the \emph{locations} of the non-zeros need to be stored.
\item{t: } \textbf{t}riangular matrix; either \textbf{U}pper, or \textbf{L}ower.
\item{T: } \textbf{T}sparseMatrix, where \textbf{T} is for \textbf{T}riplet,
the simplest but least efficient way to store a sparse matrix.
\end{description}
From R itself, via \code{getClass(.)}:
<<dgC-ex>>=
getClass("ntTMatrix")
@
\subsection{Diagonal Matrices}
\label{ssec:diagMat}
The class of diagonal matrices is worth mentioning for several reasons.
First, we have wanted such a class, because \emph{multiplication}
methods are particularly simple with diagonal matrices.
The typical constructor is \Rfun{Diagonal} whereas the accessor
(as for traditional matrices), \Rfun{diag} simply returns the
\emph{vector} of diagonal entries:
<<diag-class>>=
(D4 <- Diagonal(4, 10*(1:4)))
str(D4)
diag(D4)
@
We can \emph{modify} the diagonal in the traditional way
(via method definition for \Rfun{diag<-}):
<<diag-2>>=
diag(D4) <- diag(D4) + 1:4
D4
@
Note that \textbf{unit-diagonal} matrices (the identity matrices of linear algebra)
with slot \code{diag = "U"} can have an empty \code{x} slot, very
analogously to the unit-diagonal triangular matrices:
<<unit-diag>>=
str(I3 <- Diagonal(3)) ## empty 'x' slot
getClass("diagonalMatrix") ## extending "sparseMatrix"
@
Originally, we had implemented diagonal matrices as \emph{dense} rather than sparse
matrices. After several years it became clear that this had not been
helpful really both from a user and programmer point of view.
So now, indeed the \code{"diagonalMatrix"} class does also extend
\code{"sparseMatrix"}, i.e., is a subclass of it.
However, we do \emph{not} store explicitly
where the non-zero entries are, and the class does \emph{not} extend any of
the typical sparse matrix classes, \code{"CsparseMatrix"},
\code{"TsparseMatrix"}, or \code{"RsparseMatrix"}.
Rather, the \code{diag()}onal (vector) is the basic part of such a matrix,
and this is simply the \code{x} slot unless the \code{diag} slot is \code{"U"},
the unit-diagonal case, which is the identity matrix.
Further note, e.g., from the \code{?$\,$Diagonal} help page, that we provide
(low level) utility function
\code{.sparseDiagonal()} with wrappers
\code{.symDiagonal()} and \code{.trDiagonal()} which will provide diagonal
matrices inheriting from \code{"CsparseMatrix"} which may be advantageous
in \emph{some cases}, but less efficient in others, see the help page.
\section{Matrix Transformations}
\label{sec:trafos}
\subsection{Coercions between Matrix classes}
\label{ssec:coerce}
You may need to transform Matrix objects into specific shape (triangular,
symmetric), content type (double, logical, \dots) or storage structure
(dense or sparse).
Every useR should use \code{as(x, <superclass>)} to this end, where
\code{<superclass>} is a \emph{virtual} Matrix super class, such as
\code{"triangularMatrix"} \code{"dMatrix"}, or \code{"sparseMatrix"}.
In other words, the user should \emph{not} coerce directly to a specific
desired class such as \code{"dtCMatrix"}, even though that may
occasionally work as well.
Here is a set of rules to which the Matrix developers and the users
should typically adhere:
\begin{description}
\item[Rule~1]: \code{as(M, "matrix")} should work for \textbf{all} Matrix
objects \code{M}.
\item[Rule~2]: \code{Matrix(x)} should also work for matrix like
objects \code{x} and always return a ``classed'' Matrix.
Applied to a \code{"matrix"} object \code{m}, \code{M. <- Matrix(m)} can be
considered a kind of inverse of \code{m <- as(M, "matrix")}.
For sparse matrices however, \code{M.} well be a
\code{CsparseMatrix}, and it is often ``more structured'' than \code{M},
e.g.,
<<Matrix-ex>>=
(M <- spMatrix(4,4, i=1:4, j=c(3:1,4), x=c(4,1,4,8))) # dgTMatrix
m <- as(M, "matrix")
(M. <- Matrix(m)) # dsCMatrix (i.e. *symmetric*)
@
\item[Rule~3]: All the following coercions to \emph{virtual} matrix
classes should work:\\
\begin{enumerate}
\item \code{as(m, "dMatrix")}
\item \code{as(m, "lMatrix")}
\item \code{as(m, "nMatrix")}
\item \code{as(m, "denseMatrix")}
\item \code{as(m, "sparseMatrix")}
\item \code{as(m, "generalMatrix")}
\end{enumerate}
whereas the next ones should work under some assumptions:
\begin{enumerate}
\item \code{as(m1, "triangularMatrix")} \\
should work when \code{m1} is a triangular matrix, i.e. the upper or
lower triangle of \code{m1} contains only zeros.
\item \code{as(m2, "symmetricMatrix")}
should work when \code{m2} is a symmetric matrix in the sense of
\code{isSymmetric(m2)} returning \code{TRUE}.
Note that this is typically equivalent to something like
\code{isTRUE(all.equal(m2, t(m2)))}, i.e., the lower and upper
triangle of the matrix have to be equal \emph{up to small
numeric fuzz}.
\end{enumerate}
\end{description}
\section{Session Info}
<<sessionInfo, results=tex>>=
toLatex(sessionInfo())
@
%not yet
%\bibliography{Matrix}
\end{document}