1186 lines
52 KiB
Plaintext
1186 lines
52 KiB
Plaintext
\documentclass{article}[11pt]
|
|
\usepackage{Sweave}
|
|
\usepackage{amsmath}
|
|
\addtolength{\textwidth}{1in}
|
|
\addtolength{\oddsidemargin}{-.5in}
|
|
\setlength{\evensidemargin}{\oddsidemargin}
|
|
|
|
\SweaveOpts{keep.source=TRUE, fig=FALSE}
|
|
% Ross Ihaka suggestions
|
|
\DefineVerbatimEnvironment{Sinput}{Verbatim} {xleftmargin=2em}
|
|
\DefineVerbatimEnvironment{Soutput}{Verbatim}{xleftmargin=2em}
|
|
\DefineVerbatimEnvironment{Scode}{Verbatim}{xleftmargin=2em}
|
|
\fvset{listparameters={\setlength{\topsep}{0pt}}}
|
|
\renewenvironment{Schunk}{\vspace{\topsep}}{\vspace{\topsep}}
|
|
|
|
\SweaveOpts{prefix.string=adjcurve,width=6,height=4}
|
|
\setkeys{Gin}{width=\textwidth}
|
|
%\VignetteIndexEntry{Adjusted Survival Curves}
|
|
|
|
<<init, echo=FALSE>>=
|
|
options(continue=" ", width=60)
|
|
options(SweaveHooks=list(fig=function() par(mar=c(4.1, 4.1, .3, 1.1))))
|
|
pdf.options(pointsize=8) #text in graph about the same as regular text
|
|
library(survival, quietly=TRUE)
|
|
fdata <- flchain[flchain$futime > 7,]
|
|
fdata$age2 <- cut(fdata$age, c(0,54, 59,64, 69,74,79, 89, 110),
|
|
labels = c(paste(c(50,55,60,65,70,75,80),
|
|
c(54,59,64,69,74,79,89), sep='-'), "90+"))
|
|
@
|
|
|
|
\title{Adjusted Survival Curves}
|
|
\author{Terry M Therneau, Cynthia S Crowson, Elizabeth J Atkinson}
|
|
\date{Jan 2015}
|
|
\newcommand{\myfig}[1]{\includegraphics[height=!, width=\textwidth]
|
|
{adjcurve-#1.pdf}}
|
|
|
|
\begin{document}
|
|
\maketitle
|
|
|
|
\section{Introduction}
|
|
Suppose we want to investigate to what extent some factor influences
|
|
survival, as an example we might compare the experience of
|
|
diabetic patients who are using metformin versus those on injected
|
|
insulin as their primary treatment modality.
|
|
There is some evidence that metformin has a positive influence,
|
|
particularly in cancers,
|
|
but the ascertainment is confounded by the fact that it is a
|
|
first line therapy:
|
|
the patients on metformin will on average be younger and have had
|
|
a diabetes diagnosis for a shorter amount of time than those
|
|
using insulin.
|
|
``Young people live longer'' is not a particularly novel observation.
|
|
|
|
|
|
The ideal way to test this is with a controlled clinical trial.
|
|
This is of course not always possible, and assessments using available
|
|
data that includes and adjusts for such confounders is also needed.
|
|
There is extensive literature --- and debate --- on this topic in the
|
|
areas of modeling and testing.
|
|
The subtopic of how to create honest survival curve estimates in the
|
|
presence of confounders is less well known, and is the focus
|
|
of this note.
|
|
|
|
Assume that we have an effect of interest, treatment say,
|
|
and a set of possible confounding variables.
|
|
Creation a pair of adjusted survival curves has two parts:
|
|
definition of a reference population for the confounders,
|
|
and then the computation of estimated curves for
|
|
that population. There are important choices in both steps.
|
|
The first, definition of a target, is often not explicitly stated but
|
|
can be critical.
|
|
If an outcome differs with age, myocardial infarction say, and two treatments
|
|
also had age dependent efficacy, then the comparison will depend greatly on
|
|
whether we are talking about a population of young, middle aged, or older
|
|
subjects.
|
|
|
|
The computational step has two main approaches.
|
|
The first, sometimes known as \emph{marginal} analysis, first reweights
|
|
the data such that each subgroup's weighted distribution matches that
|
|
of our population target.
|
|
An immediate consequence is that all subgroups will be
|
|
balanced with respect to the confounding variables.
|
|
We can then proceed with a simple analysis of survival using the
|
|
reformulated data, ignoring the confounders.
|
|
The second approach seeks to understand and model the effect of each
|
|
confounder, with this we can then correct for them.
|
|
From a comprehensive overall model we can obtain predicted survival curves
|
|
for any configuration of variables, and from these get predicted overall
|
|
curves for the reference population.
|
|
This is often called the \emph{conditional} approach since we are
|
|
using conditional survival curves given covariates $x$.
|
|
|
|
A third but more minor choice is division of the covariates $x$ into effects
|
|
of interest vs. confounders.
|
|
For instance, we might want to see separate curves for two treatments, each
|
|
adjusted for age and sex. The reference population will describe the age and
|
|
sex distribution. For simplicity we will use $x$ to describe all the
|
|
confounding variables and use $c$ for the control variable(s), e.g. treatment.
|
|
The set $c$ might be empty, producing a single overall curve, but this is
|
|
the uncommon case.
|
|
As shown below, our two methods differ essentially in the \emph{order}
|
|
in which the two necessary operations are done, balancing and survival
|
|
curve creation.
|
|
\begin{center}
|
|
\begin{tabular}{rccc}
|
|
Marginal: & balance data on $x$ & $\longrightarrow$ &
|
|
form survival curves for each $c$\\
|
|
Conditional: & predicted curves for $\{x,c\}$ subset
|
|
& $\longrightarrow$ & average the predictions for each $c$
|
|
\end{tabular}
|
|
\end{center}
|
|
We can think of them as ``balance and then model'' versus
|
|
``model then balance''.
|
|
An analysis might use a combinations of these, of course, balancing on
|
|
some factors and modeling others.
|
|
All analyses are marginal analyses with respect to important predictors
|
|
that are unknown to us, although in that case we have no assurance of
|
|
balance on those factors.
|
|
|
|
\begin{figure}[tb]
|
|
\myfig{flc1}
|
|
\caption{Survival of \Sexpr{nrow(flchain)} residents of Olmsted County,
|
|
broken into three cohorts based on FLC value.}
|
|
\label{flc1}
|
|
\end{figure}
|
|
|
|
\section{Free Light Chain}
|
|
Our example data set for this comparison uses a particular
|
|
assay of plasma immunoglobulins and is based on work of Dr Angela Dispenzieri
|
|
and her colleagues at the Mayo Clinic \cite{Dispenzieri12}.
|
|
In brief: plasma cells (PC) are responsible for the production of
|
|
immunoglobulins, but PC comprise only a small portion ($<1$\%) of the
|
|
total blood and marrow hematapoetic cell population in normal patients.
|
|
The normal human repertoire is estimated to contain over
|
|
$10^{8}$ unique immunoglobulins, conferring a broad range of immune
|
|
protection. In multiple myeloma, the most common form of plasma
|
|
cell malignancy, almost all of
|
|
the circulating antigen will be identical, the product of a single malignant
|
|
clone.
|
|
An electrophoresis examination of circulating immunoglobulins will
|
|
exhibit a ``spike'' corresponding to this unique molecule.
|
|
This anomaly is used both as a diagnostic method and in monitoring
|
|
the course of the disease under treatment.
|
|
|
|
The presence of a similar, albeit much smaller, spike in normal patients has
|
|
been a long term research interest of the Mayo Clinic
|
|
hematology research group \cite{Kyle93}. In 1995
|
|
Dr Robert Kyle undertook a population based study of this, and collected
|
|
serum samples on 19,261 of the 24,539 residents of Olmsted County, Minnesota,
|
|
aged 50 years or more \cite{Kyle06}.
|
|
In 2010 Dr. Angela Dispenzieri assayed a sub fraction of the immunoglobulins,
|
|
the free light chain (FLC), on 15,748 of these subjects who had sufficient
|
|
remaining sera from the original sample collection.
|
|
All studies took place under the oversight of the appropriate Institutional
|
|
Review Boards, which ensure rigorous safety and ethical standards in
|
|
research.
|
|
|
|
A subset of the Dispenzieri study is available in the survival
|
|
package as data set \texttt{flchain}.
|
|
Because the original study assayed nearly the entire population, there is
|
|
concern that some portions of the anonymized data could be linked to actual
|
|
subjects by a diligent searcher, and so only a subset of the study has been
|
|
made available as a measure to strengthen anonymity.
|
|
It was randomly selected from the whole within sex and age
|
|
group strata so as to preserve the age/sex structure.
|
|
The data set contains 3 subjects whose blood sample was obtained on the day
|
|
of their death. It is rather odd to think of a sample obtained on
|
|
the final day as ``predicting'' death, or indeed for any results obtained
|
|
during a patient's final mortality cascade.
|
|
There are also a few patients with no follow-up beyond the clinic visit
|
|
at which the assay occurred.
|
|
We have chosen in this analysis to exclude
|
|
the handful of subjects with less
|
|
than 7 days of follow-up, leaving \Sexpr{nrow(fdata)} observations.
|
|
|
|
|
|
\begin{table} \centering
|
|
\begin{tabular}{l|cccc}
|
|
& 50--59 & 60--69 & 70--79 & 80+ \\ \hline
|
|
|
|
<<echo=FALSE, fig=FALSE, results=tex>>=
|
|
group3 <- factor(1+ 1*(fdata$flc.grp >7) + 1*(fdata$flc.grp >9),
|
|
levels=1:3,
|
|
labels=c("FLC < 3.38", "3.38 - 4.71", "FLC > 4.71"))
|
|
age1 <- cut(fdata$age, c(49,59,69,79, 110))
|
|
levels(age1) <- c(paste(c(50,60,70), c(59,69,79), sep='-'), '80+')
|
|
temp1 <- table(group3, age1)
|
|
temp2 <- round(100* temp1/rowSums(temp1))
|
|
pfun <- function(x,y) {
|
|
paste(ifelse(x<1000, "\\phantom{0}", ""), x, " (",
|
|
ifelse(y<10, "\\phantom{0}", ""), y, ") ", sep="")
|
|
}
|
|
cat(paste(c("FLC $<$ 3.38", pfun(temp1[1,], temp2[1,])), collapse=" & "), "\\\\\n")
|
|
cat(paste(c("FLC 3.38--4.71", pfun(temp1[2,], temp2[2,])), collapse=" & "), "\\\\\n")
|
|
cat(paste(c("FLC $>$ 4.71", pfun(temp1[3,], temp2[3,])), collapse=" & "), "\n")
|
|
@
|
|
\end{tabular}
|
|
\caption{Comparison of the age distributions (percents) for each of the
|
|
three groups.}
|
|
\label{tflc1}
|
|
\end{table}
|
|
|
|
Figure \ref{flc1} shows the survival curves for three subgroups of
|
|
the patients: those whose total free light chain (FLC) is in the upper
|
|
10\% of all values found in the full study,
|
|
those in the 70--89th percentile, and the remainder.
|
|
There is a clear survival effect.
|
|
Average free light chain amounts rise with age, however,
|
|
at least in part because it is eliminated through the kidneys and renal
|
|
function declines with age.
|
|
Table \ref{tflc1} shows the age distribution for each of the three groups.
|
|
In the highest decile of FLC (group 3) over half the subjects are
|
|
age 70 or older compared to only 23\% in those below the 70th percentile.
|
|
How much of the survival difference is truly associated with FLC
|
|
and how much is simply an artifact of age?
|
|
(The cut points are arbitrary, but we have chosen to mimic the original study
|
|
and retain them. Division into three groups is a convenient number to
|
|
illustrate the methods in this vignette, but we do not make any claim that
|
|
such a categorization is optimal or even sensible statistical practice.)
|
|
The R code for figure 1 is shown below.
|
|
<<flc1, fig=TRUE, include=FALSE>>=
|
|
fdata <- flchain[flchain$futime >=7,]
|
|
fdata$age2 <- cut(fdata$age, c(0,54, 59,64, 69,74,79, 89, 110),
|
|
labels = c(paste(c(50,55,60,65,70,75,80),
|
|
c(54,59,64,69,74,79,89), sep='-'), "90+"))
|
|
fdata$group <- factor(1+ 1*(fdata$flc.grp >7) + 1*(fdata$flc.grp >9),
|
|
levels=1:3,
|
|
labels=c("FLC < 3.38", "3.38 - 4.71", "FLC > 4.71"))
|
|
|
|
sfit1 <- survfit(Surv(futime, death) ~ group, fdata)
|
|
plot(sfit1, mark.time=F, col=c(1,2,4), lty=1, lwd=2,
|
|
xscale=365.25, xlab="Years from Sample",
|
|
ylab="Survival")
|
|
text(c(11.1, 10.5, 7.5)*365.25, c(.88, .57, .4),
|
|
c("FLC < 3.38", "3.38 - 4.71", "FLC > 4.71"), col=c(1,2,4))
|
|
@
|
|
|
|
\section{Reference populations}
|
|
There are a few populations that are commonly used as the
|
|
reference group.
|
|
\begin{enumerate}
|
|
\item Empirical. The overall distribution of confounders $x$ in the
|
|
data set as a whole. For this study we would use the observed
|
|
age/sex distribution, ignoring FLC group. This is also
|
|
called the ``sample'' or ``data'' distribution.
|
|
\item External reference. The distribution from some external
|
|
study or standard.
|
|
\item Internal reference. A particular subset of the data is
|
|
chosen as the reference, and other subsets are then aligned with
|
|
it.
|
|
\end{enumerate}
|
|
Method 2 is common in epidemiology, using a reference population based
|
|
on a large external population such as the age/sex distribution of the
|
|
2000 United States census. Method 3 most often arises in
|
|
the case-control setting, where one group is small and precious
|
|
(a rare disease say) and the other group (the controls) from which we can
|
|
sample is much larger.
|
|
In each case the final result of the computation can be thought of as the
|
|
expected answer we ``would obtain'' in a study that was perfectly balanced with
|
|
respect to the list of confounders $x$.
|
|
Population 1 is the most frequent.
|
|
|
|
|
|
\section{Marginal approach}
|
|
\begin{table} \centering
|
|
\begin{tabular}{crrrrrrrr}
|
|
\multicolumn{3}{c}{Females} \\
|
|
& \multicolumn{8}{c}{Age} \\
|
|
FLC group & 50--54& 55--59& 60--64 & 65--69 & 70--74 & 75--79 & 80--89& 90+ \\
|
|
\hline
|
|
<<echo=F, results=tex>>=
|
|
|
|
tab1 <- with(fdata, table(group, age2, sex))
|
|
cat("Low&", paste(tab1[1,,1], collapse=" &"), "\\\\\n")
|
|
cat("Med&", paste(tab1[2,,1], collapse=" &"), "\\\\\n")
|
|
cat("High&", paste(tab1[3,,1], collapse=" &"), "\\\\\n")
|
|
@
|
|
\\ \multicolumn{3}{c}{Males} \\
|
|
% & 50--54& 55--59& 60--64 & 65--69 & 70--74 & 75--79 & 80--89& 90+ \\ \hline
|
|
|
|
<<echo=F, results=tex>>=
|
|
cat("Low&", paste(tab1[1,,2], collapse=" &"), "\\\\\n")
|
|
cat("Med&", paste(tab1[2,,2], collapse=" &"), "\\\\\n")
|
|
cat("High&", paste(tab1[3,,2], collapse=" &"), "\n")
|
|
@
|
|
\end{tabular}
|
|
\caption{Detailed age and sex distribution for the study population}
|
|
\label{tab2}
|
|
\end{table}
|
|
|
|
\subsection{Selection}
|
|
One approach for balancing is to select a subset of the data
|
|
such that its distribution matches
|
|
the referent for each level of $c$, i.e., for each curve that we wish to
|
|
obtain.
|
|
As an example we take a case-control like approach to the FLC data,
|
|
with FLC high
|
|
as the ``cases'' since it is the smallest group.
|
|
Table \ref{tab2} shows a detailed distribution of the data with
|
|
respect to age and sex.
|
|
The balanced subset has all \Sexpr{tab1[3,1,1]}
|
|
females aged 50--54 from the high FLC group,
|
|
a random sample of \Sexpr{tab1[3,1,1]} out of the \Sexpr{tab1[1,1,1]}
|
|
females in the age 50--54 low FLC group,
|
|
and \Sexpr{tab1[3,1,1]} out of \Sexpr{tab1[2,1,1]} for the middle FLC.
|
|
Continue this for all age/sex subsets.
|
|
We cannot \emph{quite} compute a true case-control estimate for this
|
|
data since there are not enough ``controls'' in the female 90+ category
|
|
to be able to select one unique control for each case,
|
|
and likewise in the male 80-89 and 90+ age groups.
|
|
To get around this we will sample with
|
|
replacement in these strata.
|
|
|
|
\begin{figure}[tb]
|
|
\myfig{flc2}
|
|
\caption{Survival curves from a case-control sample are shown as solid
|
|
lines, dashed lines are curves for the unweighted data set
|
|
(as found in figure \ref{flc1}).}
|
|
\label{flc2}
|
|
\end{figure}
|
|
<<flc2, echo=TRUE, fig=TRUE, include=FALSE>>=
|
|
temp <- with(fdata, table(group, age2, sex))
|
|
dd <- dim(temp)
|
|
|
|
# Select subjects
|
|
set.seed(1978)
|
|
select <- array(vector('list', length=prod(dd)), dim=dd)
|
|
for (j in 1:dd[2]) {
|
|
for (k in 1:dd[3]) {
|
|
n <- temp[3,j,k] # how many to select
|
|
for (i in 1:2) {
|
|
indx <- which(as.numeric(fdata$group)==i &
|
|
as.numeric(fdata$age2) ==j &
|
|
as.numeric(fdata$sex) ==k)
|
|
select[i,j,k] <- list(sample(indx, n, replace=(n> temp[i,j,k])))
|
|
}
|
|
indx <- which(as.numeric(fdata$group)==3 &
|
|
as.numeric(fdata$age2) ==j &
|
|
as.numeric(fdata$sex) ==k)
|
|
select[3,j,k] <- list(indx) #keep all the group 3 = high
|
|
}
|
|
}
|
|
|
|
data2 <- fdata[unlist(select),]
|
|
sfit2 <- survfit(Surv(futime, death) ~ group, data2)
|
|
plot(sfit2,col=c(1,2,4), lty=1, lwd=2,
|
|
xscale=365.25, xlab="Years from Sample",
|
|
ylab="Survival")
|
|
lines(sfit1, col=c(1,2,4), lty=2, lwd=1,
|
|
xscale=365.25)
|
|
legend(730, .4, levels(fdata$group), lty=1, col=c(1,2,4),
|
|
bty='n', lwd=2)
|
|
@
|
|
|
|
|
|
|
|
%\begin{table}[tb] \centering
|
|
% \begin{tabular}{ccccccc}
|
|
% &\multicolumn{2}{c}{FLC low} & \multicolumn{2}{c}{FLC med}&
|
|
% \multicolumn{2}{c}{FLC high} \\
|
|
% & Total & Subset & Total & Subset & Total & Subset \\ \hline
|
|
%<<echo=FALSE, results=tex>>=
|
|
%tab3 <- with(fdata, table(age2, group))
|
|
%tab3 <- round(100*scale(tab3, center=F, scale=colSums(tab3)))
|
|
%tab4 <- with(data2, table(age2, group))
|
|
%tab4 <- round(100*scale(tab4, center=F, scale=colSums(tab4)))
|
|
%tab5 <- cbind(tab3[,1], tab4[,1], tab3[,2], tab4[,2], tab3[,3], tab4[,3])
|
|
%pfun <- function(x) paste(ifelse(x<10, paste("\\phantom{0}", x), x),
|
|
% collapse=" &")
|
|
%dtemp <- dimnames(tab5)[[1]]
|
|
%for (j in 1:7)
|
|
% cat(dtemp[j], " &", pfun(tab5[j,]), "\\\\\n")
|
|
%cat(dtemp[8], " & ", pfun(tab5[8,]), "\n")
|
|
%@
|
|
%\end{tabular}
|
|
%\caption{Age distributions (\%) of the original data set along with that of
|
|
% the subset, for the three FLC groups.}
|
|
%\label{tflc2}
|
|
%\end{table}
|
|
|
|
The survival curves for the subset data are shown in figure \ref{flc2}.
|
|
The curve for the high risk group is unchanged, since by definition
|
|
all of those subjects were retained. We see that adjustment for
|
|
age and sex has reduced the apparent survival difference between
|
|
the groups by about half, but a clinically important effect for
|
|
high FLC values remains.
|
|
The curve for group 1 has moved more than that for group 2 since the
|
|
age/sex adjustment is more severe for that group.
|
|
|
|
<<echo=FALSE>>=
|
|
# I can't seem to put this all into an Sexpr
|
|
z1 <- with(fdata,table(age, sex, group))
|
|
z2<- apply(z1, 1:2, min)
|
|
ztemp <- 3*sum(z2)
|
|
z1b <- with(fdata, table(age>64, sex, group))
|
|
ztemp2 <- sum(apply(z1b, 1:2, min))
|
|
@
|
|
|
|
In actual practice, case-control designs arise when matching and
|
|
selection can occur \emph{before} data collection, leading to a
|
|
substantial decrease in the amount of data that needs to be
|
|
gathered and a consequent cost or time savings.
|
|
When a data set is already in hand it has two major disadvantages.
|
|
The first is that the approach wastes data; throwing away information in
|
|
order to achieve balance is always a bad idea.
|
|
Second is that though it returns an unbiased comparison, the result
|
|
is for a very odd reference population.
|
|
|
|
One advantage of matched subsets is that standard variance calculations
|
|
for the curves are correct; the values provided by the usual Kaplan-Meier
|
|
program need no further processing.
|
|
We can also use the usual statistical tests to check for
|
|
differences between the curves.
|
|
<<>>=
|
|
survdiff(Surv(futime, death) ~ group, data=data2)
|
|
@
|
|
|
|
\subsection{Reweighting}
|
|
\label{sect:logistic}
|
|
A more natural way to adjust the data distribution is by weighting.
|
|
Let $\pi(a,s)$, $a$ = age group, $s$ = sex be a target population age/sex
|
|
distribution for our graph,
|
|
and $p(a,s,i)$ the observed probability of each age/sex/group combination
|
|
in the data. Both $\pi$ and $p$ sum to 1.
|
|
Then if each observation in the data set is given a case weight of
|
|
\begin{equation}
|
|
w_{asi} = \frac{\pi(a,s)}{p(a,s,i)} \label{wt1}
|
|
\end{equation}
|
|
the weighted age/sex distribution for each of the groups will equal
|
|
the target distribution $\pi$.
|
|
An obvious advantage of this approach
|
|
is that the resulting curves represent a tangible and well defined group.
|
|
|
|
As an example, we will first adjust our curves to match the age/sex distribution
|
|
of the 2000 US population, a common reference target in epidemiology studies.
|
|
The \texttt{uspop2} data set is found in later releases of the survival
|
|
package in R. It is an array of counts with dimensions of age,
|
|
sex, and calendar year. We only want ages of 50 and over, and the population
|
|
data set has collapsed ages of 100 and over into a single category.
|
|
We create a table \texttt{tab100} of observed age/sex counts within group
|
|
for our own data, using the same upper age threshold.
|
|
New weights are the values $\pi/p$ = \texttt{pi.us/tab100}.
|
|
<<>>=
|
|
refpop <- uspop2[as.character(50:100),c("female", "male"), "2000"]
|
|
pi.us <- refpop/sum(refpop)
|
|
age100 <- factor(ifelse(fdata$age >100, 100, fdata$age), levels=50:100)
|
|
tab100 <- with(fdata, table(age100, sex, group))/ nrow(fdata)
|
|
us.wt <- rep(pi.us, 3)/ tab100 #new weights by age,sex, group
|
|
range(us.wt)
|
|
@
|
|
There are infinite weights! This is because the US population has coverage
|
|
at all ages, but our data set does not have representatives in every
|
|
age/sex/FLC group combination; there are for instance no 95 year old males in
|
|
in the data set.
|
|
Let us repeat the process, collapsing the US population from single years into
|
|
the 8 age groups used previously in table \ref{tab2}.
|
|
Merging the per age/sex/group weights found in the 3-dimensional array
|
|
\texttt{us.wt} into the data set as per-subject weights uses matrix
|
|
subscripts, a useful but less known feature of R.
|
|
<<>>=
|
|
temp <- as.numeric(cut(50:100, c(49, 54, 59, 64, 69, 74, 79, 89, 110)+.5))
|
|
pi.us<- tapply(refpop, list(temp[row(refpop)], col(refpop)), sum)/sum(refpop)
|
|
tab2 <- with(fdata, table(age2, sex, group))/ nrow(fdata)
|
|
us.wt <- rep(pi.us, 3)/ tab2
|
|
range(us.wt)
|
|
index <- with(fdata, cbind(as.numeric(age2), as.numeric(sex),
|
|
as.numeric(group)))
|
|
fdata$uswt <- us.wt[index]
|
|
sfit3a <-survfit(Surv(futime, death) ~ group, data=fdata, weights=uswt)
|
|
@
|
|
\begin{figure}[tb]
|
|
\myfig{flc3a}
|
|
\caption{Population totals for the US reference (red) and for the observed
|
|
data set (black).}
|
|
\label{flc3a}
|
|
\end{figure}
|
|
|
|
A more common choice is to use the overall age/sex distribution of the
|
|
sample itself as
|
|
our target distribution $\pi$, i.e., the empirical distribution.
|
|
Since FLC data set is population based and has excellent coverage
|
|
of the county, this will not differ greatly from the US population in this
|
|
case, as is displayed in figure \ref{flc3a}.
|
|
|
|
<<flc3a, fig=TRUE, include=FALSE >>=
|
|
tab1 <- with(fdata, table(age2, sex))/ nrow(fdata)
|
|
matplot(1:8, cbind(pi.us, tab1), pch="fmfm", col=c(2,2,1,1),
|
|
xlab="Age group", ylab="Fraction of population",
|
|
xaxt='n')
|
|
axis(1, 1:8, levels(fdata$age2))
|
|
|
|
tab2 <- with(fdata, table(age2, sex, group))/nrow(fdata)
|
|
tab3 <- with(fdata, table(group)) / nrow(fdata)
|
|
|
|
rwt <- rep(tab1,3)/tab2
|
|
fdata$rwt <- rwt[index] # add per subject weights to the data set
|
|
sfit3 <- survfit(Surv(futime, death) ~ group, data=fdata, weights=rwt)
|
|
|
|
temp <- rwt[,1,] #show female data
|
|
temp <- temp %*% diag(1/apply(temp,2,min))
|
|
round(temp, 1) #show female data
|
|
@
|
|
|
|
\begin{figure}[tb]
|
|
\myfig{flc3}
|
|
\caption{Survival curves for the three groups using reweighted data
|
|
are shown with solid lines, the original unweighted analysis
|
|
as dashed lines. The heavier solid line adjusts to the Olmsted
|
|
population and the lighter one to the US population.}
|
|
\label{flc3}
|
|
\end{figure}
|
|
|
|
<<flc3, fig=TRUE, include=FALSE>>=
|
|
plot(sfit3, mark.time=F, col=c(1,2,4), lty=1, lwd=2,
|
|
xscale=365.25, xlab="Years from Sample",
|
|
ylab="Survival")
|
|
lines(sfit3a, mark.time=F, col=c(1,2,4), lty=1, lwd=1,
|
|
xscale=365.25)
|
|
lines(sfit1, mark.time=F, col=c(1,2,4), lty=2, lwd=1,
|
|
xscale=365.25)
|
|
legend(730, .4, levels(fdata$group), lty=1, col=c(1,2,4),
|
|
bty='n', lwd=2)
|
|
@
|
|
|
|
The calculation of weights is shown above, and finishes with a table
|
|
of the weights for the females. The table was scaled so as
|
|
to have a minimum weight
|
|
of 1 in each column for simpler reading.
|
|
We see that for the low FLC group there are larger weights for
|
|
the older ages, whereas the high FLC group requires substantial
|
|
weights for the youngest ages in order to achieve balance.
|
|
The resulting survival curve is shown in figure \ref{flc3}. The
|
|
distance between the adjusted curves is similar to the results
|
|
from subset selection, which is as expected since both approaches
|
|
are correcting for the same bias, but results are now for an overall population
|
|
distribution that matches Olmsted County.
|
|
The curves estimate what the results would have looked like, had
|
|
each of the FLC groups contained the full distribution of ages.
|
|
|
|
Estimation based on reweighted data is a common theme in survey sampling.
|
|
Correct standard errors for the curves are readily computed using
|
|
methods from that literature, and are available in some software packages.
|
|
In R the \texttt{svykm} routine in the \texttt{survey} package handles both this
|
|
simple case and more complex sampling schemes.
|
|
Tests of the curves can be done using a weighted Cox model;
|
|
the robust variance produced by \texttt{coxph} is identical to the standard
|
|
Horvitz-Thompsen variance estimate used in survey sampling \cite{Binder92}.
|
|
The robust score test from \texttt{coxph} corresponds to a log-rank test
|
|
corrected for weighting.
|
|
(In the example below the svykm function is only run if the survey package
|
|
is already loaded, as the variance calculation is very slow for this large
|
|
data set.)
|
|
<<>>=
|
|
id <- 1:nrow(fdata)
|
|
cfit <- coxph(Surv(futime, death) ~ group, data=fdata, cluster=id,
|
|
weights=rwt)
|
|
summary(cfit)$robscore
|
|
|
|
if (exists("svykm")) { #true if the survey package is loaded
|
|
sdes <- svydesign(id = ~0, weights=~rwt, data=fdata)
|
|
dfit <- svykm(Surv(futime, death) ~ group, design=sdes, se=TRUE)
|
|
}
|
|
@
|
|
Note: including the \texttt{cluster} term in the coxph call causes it to
|
|
treat the weights as resampling values and thus use the proper
|
|
survey sampling style variance.
|
|
The default without that term would be to treat the case weights as
|
|
replication counts.
|
|
This same alternate variance estimate is also called for when there are
|
|
correlated observations; many users will be more familiar with the
|
|
cluster statement in that context.
|
|
|
|
\paragraph{Inverse probability weighting}
|
|
Notice that when using the overall population as the target distribution
|
|
$\pi$ we can use Bayes rule to rewrite the weights as
|
|
\begin{align*}
|
|
\frac{1}{w_{asi}} &= \frac{{\rm Pr}({\rm age}=a, {\rm sex} =s,
|
|
{\rm group}=i)}
|
|
{{\rm Pr}({\rm age}=a, {\rm sex} =s)} \\
|
|
&= {\rm Pr}({\rm group}=i | {\rm age}=a, {\rm sex} =s)
|
|
\end{align*}
|
|
|
|
This last is precisely the probability estimated by a logistic
|
|
regression model, leading to \emph{inverse probability weighting}
|
|
as an alternate label for this approach.
|
|
We can reproduce the weights calculated just above with three
|
|
logistic regression models.
|
|
<<ipw>>=
|
|
options(na.action="na.exclude")
|
|
gg <- as.numeric(fdata$group)
|
|
lfit1 <- glm(I(gg==1) ~ factor(age2) * sex, data=fdata,
|
|
family="binomial")
|
|
lfit2 <- glm(I(gg==2) ~ factor(age2) * sex, data=fdata,
|
|
family="binomial")
|
|
lfit3 <- glm(I(gg==3) ~ factor(age2) * sex, data=fdata,
|
|
family="binomial")
|
|
temp <- ifelse(gg==1, predict(lfit1, type='response'),
|
|
ifelse(gg==2, predict(lfit2, type='response'),
|
|
predict(lfit3, type='response')))
|
|
all.equal(1/temp, fdata$rwt)
|
|
@
|
|
If there were only 2 groups then only a single regression model is
|
|
needed since P(group 2) = 1 - P(group 1).
|
|
Note the setting of na.action, which causes the predicted vector to have
|
|
the same length as the original data even when there are missing values.
|
|
This simplifies merging the derived weights with the original data set.
|
|
|
|
An advantage of the regression framework is that one can easily
|
|
accommodate more variables by using a model with additive terms
|
|
and only a few selected interactions, and the model can contain continuous
|
|
as well as categorical predictors.
|
|
The disadvantage is that such models are often used without the necessary
|
|
work to check their validity.
|
|
For instance models with \texttt{age + sex} could have been used above.
|
|
This makes the assumption that the odds of being a member of group 1
|
|
is linear in age and with the same slope for males and females; ditto
|
|
for the models for group 2 and group 3.
|
|
How well does this work?
|
|
Since the goal of reweighting is to standardize the ages,
|
|
a reasonable check is to compute and plot the reweighted
|
|
age distribution for each flc group.
|
|
|
|
\begin{figure}[tb]
|
|
\myfig{flc4}
|
|
\caption{The re-weighted age distribution using logistic regression
|
|
with continuous age, for females, FLC groups 1--3.
|
|
The target distribution is shown as a ``+''.
|
|
The original unadjusted distribution is shown as dashed lines.}
|
|
\label{flc4}
|
|
\end{figure}
|
|
Figure \ref{flc4} shows the result.
|
|
The reweighted age distribution is not perfectly balanced, i.e., the
|
|
`1', `2' and `3' symbols do no exactly overlay one another,
|
|
but in this case the simple linear
|
|
model has done an excellent job.
|
|
We emphasize that whenever the reweighting is based on a simplified model
|
|
then such a check is obligatory.
|
|
It is quite common that a simple model is not sufficient and the resulting
|
|
weight adjustment is inadequate.
|
|
Like a duct tape auto repair, proceeding forward as though
|
|
the underlying problem has been addressed is then most unwise.
|
|
<<flc4, fig=T, include=F>>=
|
|
lfit1b <-glm(I(gg==1) ~ age + sex, data=fdata,
|
|
family="binomial")
|
|
lfit2b <- glm(I(gg==2) ~ age +sex, data=fdata,
|
|
family="binomial")
|
|
lfit3b <- glm(I(gg==3) ~ age + sex, data=fdata,
|
|
family="binomial")
|
|
|
|
# weights for each group using simple logistic
|
|
twt <- ifelse(gg==1, 1/predict(lfit1b, type="response"),
|
|
ifelse(gg==2, 1/predict(lfit2b, type="response"),
|
|
1/predict(lfit3b, type="response")))
|
|
tdata <- data.frame(fdata, lwt=twt)
|
|
|
|
#grouped plot for the females
|
|
temp <- tdata[tdata$sex=='F',]
|
|
temp$gg <- as.numeric(temp$group)
|
|
c1 <- with(temp[temp$gg==1,], tapply(lwt, age2, sum))
|
|
c2 <- with(temp[temp$gg==2,], tapply(lwt, age2, sum))
|
|
c3 <- with(temp[temp$gg==3,], tapply(lwt, age2, sum))
|
|
|
|
xtemp <- outer(1:8, c(-.1, 0, .1), "+") #avoid overplotting
|
|
ytemp <- 100* cbind(c1/sum(c1), c2/sum(c2), c3/sum(c3))
|
|
|
|
matplot(xtemp, ytemp, col=c(1,2,4),
|
|
xlab="Age group", ylab="Weighted frequency (%)", xaxt='n')
|
|
ztab <- table(fdata$age2)
|
|
points(1:8, 100*ztab/sum(ztab), pch='+', cex=1.5, lty=2)
|
|
# Add the unadjusted
|
|
temp <- tab2[,1,]
|
|
temp <- scale(temp, center=F, scale=colSums(temp))
|
|
matlines(1:8, 100*temp, pch='o', col=c(1,2,4), lty=2)
|
|
axis(1, 1:8, levels(fdata$age2))
|
|
@
|
|
|
|
\paragraph{Rescaled weights}
|
|
As the weights were defined in equation \ref{wt1}, the sum of weights
|
|
for each of the groups is \Sexpr{nrow(fdata)},
|
|
the number of observations in the
|
|
data set. Since the number of subjects in group 3 is one seventh
|
|
of that in group 1, the average weight in group 3 is much larger.
|
|
An alternative is to define weights in terms of the \emph{within}
|
|
group distribution rather than the overall distribution, leading to
|
|
the rescaled weights $w^*$
|
|
\begin{align}
|
|
w^* &= \frac{\pi(a,s)}{p(a,s|i)} \label{wt2} \\
|
|
&= \frac{{\rm P}({\rm group}=i)}
|
|
{{\rm P}({\rm group}=i | {\rm age}=a, {\rm sex}=s)} \label{wt2b}
|
|
\end{align}
|
|
Each group's weights are rescaled by the overall prevalence of the group.
|
|
In its simplest form, the weights in each group are scaled to add up to the
|
|
number of subjects in the group.
|
|
<<>>=
|
|
# compute new weights
|
|
wtscale <- table(fdata$group)/ tapply(fdata$rwt, fdata$group, sum)
|
|
wt2 <- c(fdata$rwt * wtscale[fdata$group])
|
|
c("rescaled cv"= sd(wt2)/mean(wt2), "rwt cv"=sd(fdata$rwt)/mean(fdata$rwt))
|
|
|
|
cfit2a <- coxph(Surv(futime, death) ~ group, cluster = id,
|
|
data=fdata, weights= rwt)
|
|
cfit2b <- coxph(Surv(futime, death) ~ group, cluster = id,
|
|
data=fdata, weights=wt2)
|
|
round(c(cfit2a$rscore, cfit2b$rscore),1)
|
|
@
|
|
The rescaling results in weights that are much less variable
|
|
across groups.
|
|
This operation has no impact on the individual survival curves or
|
|
their standard errors, since within group we have multiplied all weights
|
|
by a constant.
|
|
When comparing curves across groups, however, the rescaled weights
|
|
reduce the standard error of the test statistic.
|
|
This results in increased power for the robust score test, although in
|
|
this particular data set the improvement is not very large.
|
|
|
|
\section{Conditional method}
|
|
In the marginal approach we first balance the data set and then compute
|
|
results on the adjusted data.
|
|
In the conditional approach we first compute a predicted survival curve
|
|
for each subject that accounts for flc group, age and sex,
|
|
and then take a weighted average of the curves to get an overall estimate
|
|
for each flc group.
|
|
For both methods a central consideration is the population of interest,
|
|
which drives the weights. Modeling has not removed the question of
|
|
\emph{who} these curves should represent, it has simply changed the order of
|
|
operation between the weighting step and the survival curves step.
|
|
|
|
\subsection{Stratification}
|
|
Our first approach is to subset the data into homogeneous age/sex strata,
|
|
compute survival curves within each strata, and then combine results.
|
|
We will use the same age/sex combinations as before.
|
|
The interpretation of these groups is different, however.
|
|
In the marginal approach it was important to find age/sex groups for
|
|
which the probability of membership within each FLC group was constant
|
|
within the strata (independent of age and sex, within strata),
|
|
in this case it is important that the survival for each FLC group is
|
|
constant in each age/sex stratum. Homogeneity of
|
|
membership within each stratum and homogeneity of survival within each
|
|
stratum may lead to different partitions for some data sets.
|
|
|
|
Computing curves for all the combinations is easy.
|
|
<<strata>>=
|
|
allfit <- survfit(Surv(futime, death) ~ group +
|
|
age2 + sex, fdata)
|
|
temp <- summary(allfit)$table
|
|
temp[1:6, c(1,4)] #abbrev printout to fit page
|
|
@
|
|
The resultant survival object has 48 curves: 8 age groups * 2 sexes *
|
|
3 FLC groups.
|
|
To get a single curve for the first FLC group we need to take a weighted
|
|
average over the 16 age/sex combinations that apply to that group,
|
|
and similarly for the second and third FLC subset.
|
|
Combining the curves is a bit of a nuisance computationally
|
|
because each of them is reported on a different set of time points.
|
|
A solution is to use the
|
|
\texttt{summary} function for survfit objects along with the \texttt{times}
|
|
argument of that function.
|
|
This feature was originally designed to allow printout of curves at selected
|
|
time points (6 months, 1 year, \ldots), but can also be used to
|
|
select a common set of time points for averaging.
|
|
We will arbitrarily use 4 per year, which is sufficient to create a
|
|
visually smooth plot over the time span of interest.
|
|
By default \texttt{summary} does not return data for times beyond the
|
|
end of a curve,
|
|
i.e., when there are no subjects left at risk;
|
|
the \texttt{extend} argument causes
|
|
a full set of times to always be reported.
|
|
As seen in the printout above, the computed curves are in
|
|
sex within age within group order.
|
|
The overall curve is a weighted average chosen to match the original
|
|
age/sex distribution of the population.
|
|
|
|
<<flc5, fig=TRUE, include=FALSE>>=
|
|
xtime <- seq(0, 14, length=57)*365.25 #four points/year for 14 years
|
|
smat <- matrix(0, nrow=57, ncol=3) # survival curves
|
|
serr <- smat #matrix of standard errors
|
|
pi <- with(fdata, table(age2, sex))/nrow(fdata) #overall dist
|
|
for (i in 1:3) {
|
|
temp <- allfit[1:16 + (i-1)*16] #curves for group i
|
|
for (j in 1:16) {
|
|
stemp <- summary(temp[j], times=xtime, extend=T)
|
|
smat[,i] <- smat[,i] + pi[j]*stemp$surv
|
|
serr[,i] <- serr[,i] + pi[j]*stemp$std.err^2
|
|
}
|
|
}
|
|
serr <- sqrt(serr)
|
|
|
|
plot(sfit1, lty=2, col=c(1,2,4), xscale=365.25,
|
|
xlab="Years from sample", ylab="Survival")
|
|
matlines(xtime, smat, type='l', lwd=2, col=c(1,2,4),lty=1)
|
|
@
|
|
\begin{figure}[tb]
|
|
\myfig{flc5}
|
|
\caption{Estimated curves from a stratified model, along with those
|
|
from the uncorrected fit as dashed lines.}
|
|
\label{flc5}
|
|
\end{figure}
|
|
Figure \ref{flc5} shows the resulting averaged curves.
|
|
Overlaid are the curves for the unadjusted model.
|
|
Very careful comparison of these curves with the weighted estimate
|
|
shows that they have almost identical spread, with just a tiny amount
|
|
of downward shift.
|
|
|
|
There are two major disadvantages to the stratified curves.
|
|
The first is that when the original data set is small
|
|
or the number of confounders is
|
|
large, it is not always feasible to stratify into a
|
|
large enough set of groups that each will be homogeneous.
|
|
The second is a technical aspect of the standard error estimate.
|
|
Since the curves are formed from disjoint sets of observations they
|
|
are independent and the variance of the weighted average is then a weighted
|
|
sum of variances. However,
|
|
when a Kaplan-Meier curve drops to zero the usual standard error
|
|
estimate at that point involves 0/0 and becomes undefined,
|
|
leading to the NaN (not a number) value in R.
|
|
Thus the overall standard error becomes undefined if any of the
|
|
component curves falls to zero.
|
|
In the above example this happens at about the
|
|
half way point of the graph.
|
|
(Other software packages carry forward the se value from the last
|
|
no-zero point on the curve, but the statistical validity of this is
|
|
uncertain.)
|
|
|
|
To test for overall difference between the curves we can use a stratified
|
|
test statistic, which is a sum of the test statistics computed within each
|
|
subgroup. The most common choice is the stratified log-rank statistic
|
|
which is shown below.
|
|
The score test from a stratified Cox model would give the same result.
|
|
<<>>=
|
|
survdiff(Surv(futime, death) ~ group + strata(age2, sex), fdata)
|
|
@
|
|
|
|
\subsection{Modeling}
|
|
The other approach for conditional estimation
|
|
is to model the risks due to the confounders.
|
|
Though we have left it till last, this is usually the first
|
|
(and most often the only) approach used by most data analysts.
|
|
|
|
Let's start with the very simplest method: a stratified Cox
|
|
model.
|
|
<<flc8, echo=TRUE, eval=TRUE, fig=TRUE, include=FALSE>>=
|
|
cfit4a <- coxph(Surv(futime, death) ~ age + sex + strata(group),
|
|
data=fdata)
|
|
surv4a <- survfit(cfit4a)
|
|
plot(surv4a, col=c(1,2,4), mark.time=F, xscale=365.25,
|
|
xlab="Years post sample", ylab="Survival")
|
|
@
|
|
This is a very fast and easy way to produce a set of curves,
|
|
but it has three problems.
|
|
First is the assumption that this simple
|
|
model adequately accounts for the effects of age and sex on survival.
|
|
That is, it assumes that the effect of age on mortality is linear, the
|
|
sex difference is constant across all ages, and that the
|
|
coefficients for both are identical for the three FLC groups.
|
|
The second problem with this approach is that it
|
|
produces the predicted curve for a single hypothetical subject
|
|
of age \Sexpr{round(cfit4a[['means']][1], 1)} years and sex
|
|
\Sexpr{round(cfit4a[['means']][2],2)}, the means of the
|
|
covariates, under each of the 3 FLC scenarios.
|
|
However, we are interested in the adjusted survival of a \emph{cohort}
|
|
of subjects in each range of FLC, and
|
|
the survival of an ``average'' subject is not the average
|
|
survival of a cohort.
|
|
The third and most serious issue is that it is not clear exactly
|
|
what these ``adjusted'' curves represent ---
|
|
exactly who \emph{is} this subject with a sex of
|
|
\Sexpr{round(cfit4a[['means']][2],2)}?
|
|
Multiple authors have commented on this problem,
|
|
see Thomsen et al \cite{Thomsen91}, Nieto and Coresh \cite{Nieto96}
|
|
or chapter 10 of Therneau and Grambsh \cite{Therneau00} for examples.
|
|
Even worse is a Cox model that treated the FLC group as a covariate,
|
|
since that will impose an additional constraint of proportional
|
|
hazards across the 3 FLC groups.
|
|
|
|
\begin{figure}
|
|
\myfig{flc6}
|
|
\caption{Curves for the three groups, adjusted for age and sex via
|
|
a risk model. Dotted lines show the curves from marginal adjustment.
|
|
Solid curves are for the simple risk model \texttt{cfit4a}.}
|
|
\label{flc6}
|
|
\end{figure}
|
|
|
|
We can address this last problem problem by doing a proper average.
|
|
A Cox model fit can produce the predicted curves for any
|
|
age/sex combination. The key idea is to produce a predicted survival
|
|
curve for every subject of some hypothetical population, and then
|
|
take the average of these curves.
|
|
The most straightforward approach is to retrieve the predicted individual
|
|
curves for all
|
|
\Sexpr{nrow(fdata)} subjects in the data set, assuming each of the three
|
|
FLC strata one by one, and take a simple average for each strata.
|
|
For this particular data set that is a bit slow since it involves
|
|
\Sexpr{nrow(fdata)} curves. However there are only 98
|
|
unique age/sex pairs in the data,
|
|
it is sufficient to obtain the 98 * 3 FLC groups unique curves and
|
|
take a weighted average.
|
|
We will make use of the survexp function, which is designed for just this
|
|
purpose.
|
|
Start by creating a data set which has one row for each age/sex combination
|
|
along with its count.
|
|
Then replicate it into 3 copies, assigning one copy to each of the three
|
|
FLC strata.
|
|
|
|
<<flc6, echo=TRUE, fig=TRUE, include=FALSE>>=
|
|
tab4a <- with(fdata, table(age, sex))
|
|
uage <- as.numeric(dimnames(tab4a)[[1]])
|
|
tdata <- data.frame(age = uage[row(tab4a)],
|
|
sex = c("F","M")[col(tab4a)],
|
|
count= c(tab4a))
|
|
tdata3 <- tdata[rep(1:nrow(tdata), 3),] #three copies
|
|
tdata3$group <- factor(rep(1:3, each=nrow(tdata)),
|
|
labels=levels(fdata$group))
|
|
sfit4a <- survexp(~group, data=tdata3, weight = count,
|
|
ratetable=cfit4a)
|
|
|
|
plot(sfit4a, mark.time=F, col=c(1,2,4), lty=1, lwd=2,
|
|
xscale=365.25, xlab="Years from Sample",
|
|
ylab="Survival")
|
|
lines(sfit3, mark.time=F, col=c(1,2,4), lty=2, lwd=1,
|
|
xscale=365.25)
|
|
legend(730,.4, c("FLC low", "FLC med", "FLC high"), lty=1, col=c(1,2,4),
|
|
bty='n', lwd=2)
|
|
@
|
|
Figure \ref{flc6} shows the result. Comparing this to the prior 3
|
|
adjustments shown in figures \ref{flc3}, \ref{flc4}, and \ref{flc5}
|
|
we see that this result is different. Why?
|
|
Part of the reason is due to the fact that $E[f(X)] \ne f(E[X])$
|
|
for any non-linear operation $f$, so that averages of survival
|
|
curves and survival curves of averages will never be exactly
|
|
the same.
|
|
This may explain the small difference between the stratified and
|
|
the marginal approaches of figures \ref{flc3} and \ref{flc5},
|
|
which were based on the same subsets.
|
|
The Cox based result is systematically higher than the stratified
|
|
one, however, so something more is indicated.
|
|
|
|
Aside:
|
|
An alternate computational approach is to create the individual
|
|
survival curves using
|
|
the \texttt{survfit} function and then take averages.
|
|
<<>>=
|
|
tfit <- survfit(cfit4a, newdata=tdata, se.fit=FALSE)
|
|
curves <- vector('list', 3)
|
|
twt <- c(tab4a)/sum(tab4a)
|
|
for (i in 1:3) {
|
|
temp <- tfit[i,]
|
|
curves[[i]] <- list(time=temp$time, surv= c(temp$surv %*% twt))
|
|
}
|
|
@
|
|
The above code is a bit sneaky. I know that the result from the
|
|
survfit function contains a matrix \texttt{tfit\$surv}
|
|
of 104 columns, one for each row in the tdata data
|
|
frame, each column containing the curves for the three strata one after
|
|
the other. Sub setting \texttt{tfit} results in the matrix for a single
|
|
flc group.
|
|
Outside of R an approach like the above may be needed, however.
|
|
|
|
\begin{figure}
|
|
\myfig{flc6b}
|
|
\caption{Left panel: comparison of Cox model based adjustment (solid) with the
|
|
curves based on marginal adjustment (dashed).
|
|
The Cox model curves without (black) and with (red) an age*sex interaction
|
|
term overlay.
|
|
Right panel: plot of the predicted relative risks from a Cox model
|
|
\texttt{crate} versus population values from
|
|
the Minnesota rate table.}
|
|
\label{flc6b}
|
|
\end{figure}
|
|
|
|
So why are the modeling results so different than either reweighting
|
|
or stratification?
|
|
Suspicion first falls on the use of a simple linear model for
|
|
age and sex, so start by
|
|
fitting a slightly more refined model that allows
|
|
for a different slope for the two sexes, but is still linear in age.
|
|
In this particular data set an external check on the fit is also
|
|
available via the Minnesota death rate tables, which are included
|
|
with the survival package as \texttt{survexp.mn}. This is an
|
|
array that contains daily death rates by age, sex, and calendar year.
|
|
<<flc6b, fig=TRUE, include=FALSE>>=
|
|
par(mfrow=c(1,2))
|
|
cfit4b <- coxph(Surv(futime, death) ~ age*sex + strata(group),
|
|
fdata)
|
|
sfit4b <- survexp(~group, data=tdata3, ratetable=cfit4b, weights=count)
|
|
plot(sfit4b, fun='event', xscale=365.25,
|
|
xlab="Years from sample", ylab="Deaths")
|
|
lines(sfit3, mark.time=FALSE, fun='event', xscale=365.25, lty=2)
|
|
lines(sfit4a, fun='event', xscale=365.25, col=2)
|
|
|
|
temp <- median(fdata$sample.yr)
|
|
mrate <- survexp.mn[as.character(uage),, as.character(temp)]
|
|
crate <- predict(cfit4b, newdata=tdata, reference='sample', type='lp')
|
|
crate <- matrix(crate, ncol=2)[,2:1] # mrate has males then females, match it
|
|
# crate contains estimated log(hazards) relative to a baseline,
|
|
# and mrate absolute hazards, make both relative to a 70 year old
|
|
for (i in 1:2) {
|
|
mrate[,i] <- log(mrate[,i]/ mrate[21,2])
|
|
crate[,i] <- crate[,i] - crate[21,2]
|
|
}
|
|
matplot(mrate, crate, col=2:1, type='l')
|
|
abline(0, 1, lty=2, col=4)
|
|
@
|
|
The resulting curves are shown in the left panel of
|
|
figure \ref{flc6b} and reveal that addition of an interaction term
|
|
did not change the predictions, and that the Cox model result
|
|
for the highest risk group is distinctly different
|
|
predicted survival for the highest FLC group is distinctly different
|
|
when using model based prediction.
|
|
The right hand panel of the figure shows that though there are slight
|
|
differences with the Minnesota values, linearity of the age effect
|
|
is very well supported.
|
|
So where exactly does the model go wrong? Since this is such a
|
|
large data set we have the luxury of looking at subsets.
|
|
This would be a very large number of curves to plot ---
|
|
age by sex by FLC = 48 --- so an overlay of the observed and expected
|
|
curves by group would be too confusing.
|
|
Instead we will summarize each of the groups according to their
|
|
observed and predicted number of events.
|
|
<<fig=TRUE, include=FALSE>>=
|
|
obs <- with(fdata, tapply(death, list(age2, sex, group), sum))
|
|
pred<- with(fdata, tapply(predict(cfit4b, type='expected'),
|
|
list(age2, sex, group), sum))
|
|
excess <- matrix(obs/pred, nrow=8) #collapse 3 way array to 2
|
|
dimnames(excess) <- list(dimnames(obs)[[1]], c("low F", "low M",
|
|
"med F", "med M",
|
|
"high F", "high M"))
|
|
round(excess, 1)
|
|
@
|
|
The excess risks, defined as the observed/expected number of deaths,
|
|
are mostly modest ranging from .8 to 1.2.
|
|
The primary exception exception is the high FLC group for ages 50--59
|
|
which has values of 1.6 to 2.5;
|
|
the Cox model fit has greatly overestimated the survival for the
|
|
age 50--54 and 55--59 groups.
|
|
Since this is also the age category with the highest count in the data
|
|
set, this overestimation will have a large impact on the overall
|
|
curve for high FLC subset, which is exactly where the the deviation
|
|
in figure \ref{flc6b} is observed to lie.
|
|
There is also mild evidence for a linear trend in age for the low FLC females,
|
|
in the other direction.
|
|
Altogether this suggests that the model might need to have a different
|
|
age coefficient for each of the three FLC groups.
|
|
<<>>=
|
|
cfit5a <- coxph(Surv(futime, death) ~ strata(group):age +sex, fdata)
|
|
cfit5b <- coxph(Surv(futime, death) ~ strata(group):(age +sex), fdata)
|
|
cfit5c <- coxph(Surv(futime, death) ~ strata(group):(age *sex), fdata)
|
|
|
|
options(show.signif.stars=FALSE) # see footnote
|
|
anova(cfit4a, cfit5a, cfit5b, cfit5c)
|
|
temp <- coef(cfit5a)
|
|
names(temp) <- c("sex", "ageL", "ageM", "ageH")
|
|
round(temp,3)
|
|
@
|
|
The model with separate age coefficients for each FLC group gives a
|
|
major improvement in goodness of fit,
|
|
but adding separate sex coefficients per group
|
|
or further interactions does not add importantly
|
|
beyond that.
|
|
\footnote{There are certain TV shows that make one dumber just by
|
|
watching them; adding stars to the output has the same effect on
|
|
statisticians.}
|
|
|
|
\begin{figure}
|
|
\myfig{flc7}
|
|
\caption{Adjusted survival for the 3 FLC groups based on the
|
|
improved Cox model fit. Dashed lines show the predictions
|
|
from the marginal model.}
|
|
\label{flc7}
|
|
\end{figure}
|
|
|
|
A recheck of the observed/expected values now shows a much more
|
|
random pattern, though some excess remains in the upper
|
|
right corner. The updated survival curves are shown in
|
|
figure \ref{flc7} and now are
|
|
in closer concordance with the marginal fit.
|
|
<<flc7, include=FALSE, fig=TRUE>>=
|
|
pred5a <- with(fdata, tapply(predict(cfit5a, type='expected'),
|
|
list(age2, sex, group), sum))
|
|
excess5a <- matrix(obs/pred5a, nrow=8,
|
|
dimnames=dimnames(excess))
|
|
round(excess5a, 1)
|
|
|
|
sfit5 <- survexp(~group, data=tdata3, ratetable=cfit5a, weights=count)
|
|
plot(sfit3, fun='event', xscale=365.25, mark.time=FALSE, lty=2, col=c(1,2,4),
|
|
xlab="Years from sample", ylab="Deaths")
|
|
lines(sfit5, fun='event', xscale=365.25, col=c(1,2,4))
|
|
@
|
|
One problem with the model based estimate is that standard errors for the
|
|
curves are complex.
|
|
Standard errors of the individual curves for each age/sex/FLC combination
|
|
are a standard output of the survfit function, but the collection of curves is
|
|
correlated since they all depend on a common estimate of the model's coefficient
|
|
vector $\beta$.
|
|
Curves with disparate ages are anti-correlated (an increase in the age coefficient
|
|
of the model would raise one and lower the other) whereas those for close ages are
|
|
positively correlated.
|
|
A proper variance for the unweighted average has been derived by Gail and
|
|
Byar \cite{Gail86},
|
|
but this has not been implemented in any of the standard packages, nor extended
|
|
to the weighted case.
|
|
A bootstrap estimate would appear to be the most feasible.
|
|
|
|
\section{Conclusions}
|
|
When two populations need to be adjusted and one is much larger than the other,
|
|
the balanced subset method has been popular.
|
|
It is most often seen in the
|
|
context of a case-control study, with cases as the rarer group and a set
|
|
of matched controls selected from the larger one.
|
|
This method has the advantage that the usual standard error estimates from a
|
|
standard package are appropriate, so no further work is required.
|
|
However, in the general situation it leads to a correct answer but for the wrong
|
|
problem, i.e., not for a population in which we are interested.
|
|
|
|
|
|
The population reweighted estimate is flexible, has a readily available variance
|
|
in some statistical packages (but not all), and the result is directly
|
|
interpretable. It is the method we recommend in general.
|
|
The approach can be extended to a large number of balancing factors by using a
|
|
regression model to derive the weights. Exploration and checking of said model
|
|
for adequacy is an important step in this case.
|
|
The biggest downside to the method
|
|
arises when there is a subset which is rare in the data sample but frequent in
|
|
the adjusting population. In this case subjects in that subset will be assigned
|
|
large weights, and the resulting curves will have high variance.
|
|
|
|
The stratified method is closely related to reweighting (not shown).
|
|
It does not
|
|
do well if the sample size is small, however.
|
|
|
|
Risk set modeling is a very flexible method, but is also the one where it is
|
|
easiest to go wrong by using an inadequate model, and variance estimation is
|
|
also difficult.
|
|
To the extent that the fitted model is relevant, it
|
|
allows for interpolation and extrapolation to a reference population with
|
|
a different distribution of covariates than the one in the training data.
|
|
It may be applicable in cases such as rare subsets where population
|
|
reweighting is problematic, with the understanding that one is depending
|
|
heavily on extrapolation in this case, which is always dangerous.
|
|
|
|
\section{A note on type 3 tests}
|
|
One particular software package (not R) and its proponents are very fond
|
|
of something called ``type 3'' tests.
|
|
Said tests are closely tied to a particular reference population:
|
|
\begin{itemize}
|
|
\item For all continuous covariates in the model, the empirical
|
|
distribution is used as the reference.
|
|
\item For all categorical adjusters, a uniform distribution over the
|
|
categories is used.
|
|
\end{itemize}
|
|
|
|
Figure \ref{flc8} shows the fit from such a model. Not surprisingly, the
|
|
predicted death rate is very high: 1/4 of our population is over 80 years old!
|
|
The authors do not find such a prediction particularly useful
|
|
since we don't ever expect to see a population like this
|
|
(it's sort of like planning for the zombie apocalypse),
|
|
but for those enamored of type 3 tests this shows how to create the
|
|
corresponding curves.
|
|
|
|
<<flc8, include=FALSE, fig=TRUE>>=
|
|
# there is a spurious warning from the model below: R creates 3 unneeded
|
|
# columns in the X matrix
|
|
cfit6 <- coxph(Surv(futime, death) ~ strata(group):age2 + sex, fdata)
|
|
|
|
saspop <- with(fdata, expand.grid(age2= levels(age2), sex= levels(sex),
|
|
group = levels(group)))
|
|
|
|
sfit6 <- survexp(~group, data=saspop, ratetable=cfit6)
|
|
plot(sfit6, fun='event', xscale=365.25, mark.time=FALSE, lty=1, col=c(1,2,4),
|
|
xlab="Years from sample", ylab="Deaths")
|
|
lines(sfit5, fun='event', xscale=365.25, lty=2, col=c(1,2,4))
|
|
@
|
|
\begin{figure}
|
|
\myfig{flc8}
|
|
\caption{Adjusted survival for the 3 FLC groups based on a fit with
|
|
categorical age, and predicting for a uniform age/sex population.
|
|
Dashed lines show the predictions from the marginal model.}
|
|
\label{flc8}
|
|
\end{figure}
|
|
|
|
\newpage
|
|
\bibliographystyle{plain}
|
|
\bibliography{refer}
|
|
\end{document}
|
|
|
|
|