\documentclass[11pt]{article}
\usepackage{Sweave}
\usepackage{amsmath}
\newcommand{\code}[1]{\texttt{#1}}

\addtolength{\textwidth}{1in}
\addtolength{\oddsidemargin}{-.5in}
\setlength{\evensidemargin}{\oddsidemargin}

\SweaveOpts{keep.source=TRUE, fig=FALSE}
% Ross Ihaka suggestions
\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em}
\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em}
\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em}
\fvset{listparameters={\setlength{\topsep}{0pt}}}
\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}}

\SweaveOpts{prefix.string=adjcurve,width=6,height=4}
\setkeys{Gin}{width=\textwidth}
%\VignetteIndexEntry{Matrix exponentials}

<<init, echo=FALSE>>=
options(continue=" ", width=60)
options(SweaveHooks=list(fig=function() par(mar=c(4.1, 4.1, .3, 1.1))))
pdf.options(pointsize=8) #text in graph about the same as regular text
library(survival, quietly=TRUE)
library(Matrix, quietly=TRUE)
@
\title{Matrix exponentials and survival curves}
\author{Terry Therneau}
\date{May 2023}

\begin{document}
\maketitle

\section{Simple Cox models}

Define
\begin{align*}
  N(t) &= \sum_i N_i(t) \\
  Y(t) &= \sum_i Y_i(t) \\
  \lambda(t) &= dN(t)/Y(t)
\end{align*}
where $N_i(t)$ is the cumulative number of events up to time $t$ for subject
$i$, and $Y_i(t)$ is the 0/1 indicator that subject $i$ is at risk.

For a survival curve we almost uniformly use the Kaplan-Meier estimate, with
the Fleming-Harrington as a rare alternative:
\begin{align*}
  KM(t) &= \prod_{s \le t} (1- \lambda(s)) \\
  FH(t) &= \prod_{s \le t} e^{-\lambda(s)} \\
        &= \exp\Bigl(-\sum_{s\le t}\lambda(s)\Bigr)
\end{align*}
Since $\exp(-x) \approx 1-x$ for small $x$, we could view the KM as a
first-order Taylor series approximation to the FH.
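
The two estimates can be compared by hand on a small data set.
The chunk below is only a sketch: the eight observations are invented for
illustration, and the hazard increments are computed directly from the
definitions above rather than with \code{survfit}.

<<kmfh>>=
# hypothetical data: follow-up time and status (1 = event, 0 = censored)
futime <- c(1, 2, 2, 3, 4, 5, 6, 8)
fustat <- c(1, 1, 0, 1, 0, 1, 1, 0)

dtime  <- sort(unique(futime[fustat == 1]))      # distinct event times
nevent <- sapply(dtime, function(s) sum(futime == s & fustat == 1))  # dN(s)
nrisk  <- sapply(dtime, function(s) sum(futime >= s))                # Y(s)
lambda <- nevent / nrisk                         # hazard increment

km <- cumprod(1 - lambda)        # Kaplan-Meier
fh <- exp(-cumsum(lambda))       # Fleming-Harrington
round(cbind(time = dtime, nrisk, lambda, km, fh), 3)
@

Because $e^{-x} \ge 1-x$, the FH curve is never below the KM; the gap widens
as the risk set shrinks and the increments $\lambda(s)$ grow.
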

For survival curves based on a Cox model things get a bit more complicated.
For a fit with covariates $x$ and coefficient $\beta$, the predicted hazard for
a new subject with covariate vector $z$ will be
\begin{align*}
  \lambda(t; z) &= \frac{dN(t)}{\sum_i Y_i(t) e^{(x_i- z)'\beta}}
\end{align*}
and the Breslow estimate of survival is
\begin{equation*}
  S(t; z) = \prod_{s \le t} e^{-\lambda(s;z)}
\end{equation*}
This is parallel to the Fleming-Harrington form.
The KM equivalent is never used for a simple reason: for large
values of $z$ (assume wlog that $\beta$ is positive) $\lambda(t;z)$ may be
greater than 1, which in turn leads to negative values of $1-\lambda$,
i.e., negative values for $S(t)$.
This most often occurs near the end of the curve, when the number at risk is
small.
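
As a small worked illustration, with made-up numbers that are not taken from
any data set, suppose there is a single binary covariate with
$\beta = \log 4$, and an event time with one event and three subjects at risk,
all of whom have $x_i = 0$.
For a new subject with $z = 1$
\begin{align*}
  \lambda(t; z) &= \frac{1}{3\, e^{(0-1)\log 4}} = \frac{1}{3/4} = \frac{4}{3} \\
  1 - \lambda(t; z) &= -\frac{1}{3} \\
  e^{-\lambda(t; z)} &= e^{-4/3} \approx 0.264
\end{align*}
The product-limit factor is negative, while the Breslow factor stays
between 0 and 1.
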

\section{Multi-state models}

For a multi-state hazards model with $m$ states, there will be a family of
hazards, one for each possible transition from state $j$ to state $k$
($j \ne k$).
\begin{align*}
  \lambda_{jk}(t) &= \frac{dN_{jk}(t)}{\sum_i Y_{ij}(t)} \\
  \lambda_{jk}(t;z) &=\frac{dN_{jk}(t)}{\sum_i Y_{ij}(t) e^{(x_i- z)'\beta}}
\end{align*}
$N_{jk}$ counts the cumulative number of $j$ to $k$ transitions, and
$Y_{ij}(t)$ is 1 if subject $i$ is in state $j$ and at risk for a transition
(out of state $j$) at time $t$.
The first equation above defines the non-parametric hazard, and the second the
predicted hazard from a multi-state hazards model.
The coefficient vector $\beta$ will often be different for each $j$:$k$
transition, but we have omitted that from the notation for simplicity.

The $m$ by $m$ intensity matrix $A(t)$ is defined to have off diagonal elements
$A_{jk}(t) = \lambda_{jk}(t)$ and similarly for $A(t; z)$ based on the
multi-state hazards (Cox) model.
The diagonal element is defined such that each row sums to zero,
$A_{jj}(t) = -\sum_{k\ne j} \lambda_{jk}(t)$.
Two natural estimates of the probability in state are the Aalen-Johansen
and exponential estimates
\begin{align*}
  AJ(t) &= p(0) \prod_{s \le t} (I + A(s)) \\
  p(t) &= p(0) \prod_{s \le t} e^{A(s)}
\end{align*}
Here $p(0)$ is the probability distribution at the starting time, which is
very often $(1, 0, 0, \ldots, 0)$, i.e., everyone starts in state 1,
and $e^{A}$ denotes the matrix exponential.
For matrices, $\exp(A)\exp(B) \ne \exp(A+B)$ unless $AB= BA$, a condition that
will not hold for our models, so the second equation does not collapse to a sum.
For both estimates $\sum_j p_j(t) =1$ at all time points.
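
A minimal sketch of the two products for a three state model with two event
times follows.
The intensity matrices \code{A1} and \code{A2} are hypothetical values chosen
for illustration, not output from a fitted model; \code{expm} is the routine
from the Matrix package.

<<ajexp>>=
library(Matrix)  # already loaded above; repeated so the chunk stands alone
# hypothetical intensity matrices at two successive event times
A1 <- rbind(c(-0.3, 0.2, 0.1), c(0, -0.5, 0.5), c(0, 0, 0))
A2 <- rbind(c(-0.4, 0.4, 0.0), c(0, -1.2, 1.2), c(0, 0, 0))
p0 <- c(1, 0, 0)                       # everyone starts in state 1

aj <- p0 %*% (diag(3) + A1) %*% (diag(3) + A2)            # Aalen-Johansen
pe <- p0 %*% as.matrix(expm(A1)) %*% as.matrix(expm(A2))  # exponential
rbind(AJ = drop(aj), exponential = drop(pe))
c(sum(aj), sum(pe))                    # both sum to 1, up to rounding

# A1 and A2 do not commute, so the product is not exp(A1 + A2)
max(abs(A1 %*% A2 - A2 %*% A1))
@
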

For a multi-state Cox model, the first (Aalen-Johansen) formula has the same
flaw as before,
namely that for some values of $z$ it will lead to negative elements in $p(t)$.
This normally occurs for high risk subjects (high predicted hazards for one
or more transitions) and smaller risk sets; the flaw normally does
not arise in a study with significant censoring as the risk sets never become
small.
However, unlike the single endpoint Cox model, the Aalen-Johansen form \emph{is}
suggested as an estimate by many authors, e.g. the Cook and Lawless textbook,
and also appears as the default in some packages (mstate).
The survival package follows the Breslow pattern and uses the exponential
estimate.

Accurate and efficient computation of the matrix exponential is a long-standing
research topic.
The simple and direct definition
\begin{equation*}
  e^A = I + A + A^2/2! + A^3/3! + A^4/4! + \ldots
\end{equation*}
is neither efficient nor accurate.
A strong background is provided by the textbook of Higham
\cite{Higham08}; many of the methods discussed there and some newer refinements
are incorporated into the \code{expm} package in R.
One important special case is
an event time where all the transitions are from a single state.
The intensity matrix $A$ then will have only a single non-zero row,
say it is row $j$.
Then $B= \exp(A)$ will be equal to the identity matrix for all rows except
row $j$, and
\begin{align*}
  B_{jj} &= \exp(A_{jj}) \\
  B_{jk} &= (1-B_{jj}) \lambda_{jk}/\sum_{l \ne j} \lambda_{jl}, \quad k \ne j
\end{align*}
This occurs for any event time without ties, and also for competing risks.
It is so common that the survival package uses an internal routine
\code{survexpm} which
checks for the case, invoking the expm routine otherwise.
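
The closed form follows because, with only row $j$ non-zero,
$A^2 = A_{jj} A$ and hence $A^n = A_{jj}^{n-1} A$, so that the series sums to
$I + A\,(e^{A_{jj}} - 1)/A_{jj}$.
The chunk below is a sketch that checks this numerically.
The function name \code{exp1row} and the matrix \code{A0} are made up for
illustration; this is not the code of \code{survexpm}.

<<onerow>>=
library(Matrix)  # already loaded above; repeated so the chunk stands alone
# closed form for exp(A) when row j is the only non-zero row of A
# (hypothetical helper; assumes at least one non-zero rate in row j)
exp1row <- function(A, j) {
    B <- diag(nrow(A))
    B[j, j] <- exp(A[j, j])
    rate <- A[j, -j]                  # lambda_jk for k != j
    B[j, -j] <- (1 - B[j, j]) * rate / sum(rate)
    B
}

A0 <- rbind(c(-0.5, 0.3, 0.2), c(0, 0, 0), c(0, 0, 0))
exp1row(A0, 1)
# largest absolute difference from the general purpose routine
max(abs(exp1row(A0, 1) - as.matrix(expm(A0))))
@
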

One interesting aside is the issue of scaling.
The identity $\exp(\theta I + A) = \exp(\theta) \exp(A)$ means that it is easy
to pre-scale the diagonal of $A$, which can have an impact on downstream
computations.
Corollary 4.22 of \cite{Higham08} shows that for an intensity matrix $A$, the
optimal scaling is $\theta = \max_j |A_{jj}|$; the resulting matrix
$B = \theta I + A$ has all elements non-negative, and
$\exp(A) = e^{-\theta} \exp(B)$.
Hence $B^k$ is non-negative for all powers $k$ and there is no cancellation in
the simple Taylor series for expm.

An $A(t;z)$ matrix that actually occurs in one of our examples is

<<>>=
A <- rbind(c(-.2, .1, .1), c(0, -1.1, 1.1), c(0, 0,0))
expm(A)
B <- A + 1.1*diag(3)
exp(-1.1) * expm(B)  # verify the formula

# matrix powers need %*%; A^2 would square element by element
A2 <- A %*% A;  A3 <- A2 %*% A
B2 <- B %*% B;  B3 <- B2 %*% B

diag(3) + A   # the bad estimate
diag(3) + A + A2/2 + A3/6

exp(-1.1) *(diag(3)+ B)
exp(-1.1) *(diag(3)+ B + B2/2 + B3/6)
@

We see that the scaled version converges faster, but the first-order scaled
form $e^{-1.1}(I+B)$ is still not a particularly good approximation.
\end{document}