Abstract

In this paper, the authors describe a simple method for making longitudinal comparisons of alternative markers of a subsequent event. The method is based on the aggregate prediction gain from knowing whether or not a marker has occurred at any particular age. An attractive feature of the method is the exact decomposition of the measure into 2 components: 1) discriminatory ability, which is the difference in the mean time to the subsequent event for individuals for whom the marker has and has not occurred, and 2) prevalence factor, which is related to the proportion of individuals who are positive for the marker at a particular age. Development of the method was motivated by a study that evaluated proposed markers of the menopausal transition, where the markers are measures based on successive menstrual cycles and the subsequent event is the final menstrual period. Here, results from application of the method to 4 alternative proposed markers of the menopausal transition are compared with previous findings.

It is common in medical and epidemiologic event-history research to seek marker events that are predictive of future events. Examples include staging systems for development of aging processes, staging systems for diseases where transition to a higher stage is associated with increased risk of death, or surrogate markers for survival that allow for more rapid and less costly assessment of alternative treatments. The application that motivated our work concerned assessment of alternative markers of the onset of the menopausal transition, based on the length and variability of menstrual cycles, as indicators of this reproductive life stage and as predictors of age at the final menstrual period (FMP). Unlike the staging of puberty, the staging of reproductive aging has not been definitively established; both the number of stages (1–3) and the precise criteria for defining each stage (1–9) are still being debated. The 2001 Stages of Reproductive Aging Workshop (STRAW) recommendations (3) are the most widely known, and they were revised following our empirical comparative evaluation in 4 large cohorts (6). However, evaluation of the proposed criteria has been limited by the lack of a summary measure capable of comparing several key aspects of multiple proposed markers simultaneously across time (6). In this paper, we present a simple approach that provides an interpretable graphical summary of which proposed marker is more effective at any particular age. The summary measure reflects the differences in the frequency and distribution of age at occurrence of the markers, as well as their ability to predict time to FMP at various ages.

The problem has the following generic features: 1) longitudinal data are available on a sample of individuals from a population for whom the times of intermediate events (markers) are recorded; 2) it is assumed that the markers have not occurred prior to entry into the study; and 3) the ability of the markers to predict the time of a final event F is of interest. The marker m is determined to have occurred at some age $a_i$ for individual i, and it is then treated as having occurred for all subsequent ages. The predictive value of the marker m is related to both the frequency of its occurrence in a population and the extent to which the occurrence of m predicts the time to F, or equivalently the age at which F occurs. The utility of a marker also depends upon the proportion of individuals who exhibit the marker. In practice, a marker would also appropriately reflect the relevant underlying biologic event.

The final event F is assumed to occur after the marker events, and it may be recorded or censored in the data. We assume that this event occurs for everyone eventually. In situations where this does not hold, our methods can be applied to study the age of occurrence of the final event within a restricted time window: Specifically, a limiting age A is chosen, and the age of the final event is defined as v if it occurs at age v < A; otherwise it is defined as A (10, 11).
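The restriction to a time window can be illustrated with a short Python sketch. The function name, the limiting age of 55 years, and the ages below are hypothetical values chosen only for illustration; they do not come from the study data.

```python
def restricted_age(event_age, limit_age):
    """Age of the final event restricted to the window ending at limit_age (A):
    returns v if v < A, otherwise A."""
    return min(event_age, limit_age)


# Made-up ages at the final event, for illustration only.
ages_at_final_event = [48.2, 51.7, 56.3, 60.1]
A = 55.0  # hypothetical limiting age

restricted = [restricted_age(v, A) for v in ages_at_final_event]
restricted_mean = sum(restricted) / len(restricted)
print(restricted, restricted_mean)
```

With this definition every individual has a finite restricted age, so a mean is defined even if some individuals never experience the final event.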

Previous work has focused on situations where the marker is a continuous measure and the outcome is binary (12, 13) or the outcome is the time to an event and the marker is continuous and fixed over time (14) or varying over time (15). The latter papers extended the notion of receiver operating characteristic curves to the longitudinal assessment of continuous markers. Here we consider the situation where the outcome is the time to an event and the marker is binary and time-varying, taking the value 0 before the marker event occurs and 1 after it occurs. Comparisons of the sensitivity and specificity of markers, kappa statistics, or receiver operating characteristic curve analysis do not apply naturally in this setting, since the outcome is the time to an event and hence is not binary (3, 16). A more appropriate approach is to apply the methods of survival analysis, where the outcome is the time to the event and marker occurrence is treated as a time-varying covariate—for example, in a Cox proportional hazards model (4, 16). The regression coefficient of the marker in such an analysis provides a measure of marker effectiveness. In our comparisons below, we include a varying-coefficient Cox model (17). However, the size of the coefficient does not reflect the distribution of the occurrence of the marker over time, and differences in the values of the regression coefficients of different markers do not reflect the fact that some markers may tend to occur more frequently or later than others. None of these approaches facilitate comparison of multiple markers simultaneously. Our longitudinal approach takes into account the prevalence of each marker over time, as well as its discriminatory ability, and allows the simultaneous comparison of several markers.

PROPOSED LONGITUDINAL MEASURE OF MARKER EFFECTIVENESS AND ITS COMPONENTS

The notation used in our method is summarized in Table 1. In a population P of interest, let P(a) be the set of individuals eligible for marker assessment at age a. For each eligible individual, a marker m is determined to have occurred or not occurred by age a, based on information available at age a; in our application, the history of menstrual bleeding up to age a. We allow for the possibility that the marker m is not defined at age a for a proportion 1 − ρma of the eligible population, because of incomplete information at age a. For example, in our application, one of the markers of menopausal transition is based on the times of 10 consecutive menstrual bleeds prior to the age of interest. The marker is only defined for individuals at ages after the date on which 10 menstrual bleeds have been recorded. For marker m, let πma be the proportion of definable persons who are positive for the marker—that is, for whom the marker occurred at or before age a—and let 1 − πma be the proportion of definable individuals who are negative for the marker at age a.

Table 1.

Notation Used in the Authors’ Method for Longitudinally Comparing Alternative Markers of a Subsequent Event

Notation | Explanation
Marker m | A binary indicator of occurrence of an intermediate event predictive of the final event
Final event F | The final event being predicted by the markers
Age a | Age at which the effectiveness of a marker is being assessed
$\rho_{ma}$ | Proportion of the population for which marker m is defined at age a
$\hat\rho_{ma}$ | Estimate of $\rho_{ma}$ (the circumflex denotes an estimate)
$\pi_{ma}$ | Proportion of definable individuals who are positive for marker m at age a
$\mu_a$ | Average time from age a to the terminal event
$\mu_{ma}^{+}$ | Average time from age a to the terminal event, for individuals positive for marker m at age a
$\mu_{ma}^{-}$ | Average time from age a to the terminal event, for individuals negative for marker m at age a
$F_{ma}^{+}$, $F_{ma}^{-}$ | Distribution functions of time to F for individuals positive and negative for marker m at age a
$\delta_{ma}$ | Discriminatory ability, defined in equation 3; its estimate is defined in equation 6
$\gamma_{ma}$ | Prevalence factor, defined in equation 4; its estimate is defined in equation 7
$\varepsilon_{ma}$ | Overall measure of marker effectiveness, defined in equation 1; its estimate is defined in equation 5. It is the product of discriminatory ability and prevalence factor.
$n_a$ | No. of eligible individuals in the sample at age a

Let $\mu_a$ be the average time from age a to F for individuals eligible at age a. Similarly, let $\mu_{ma}^{+}$ and $\mu_{ma}^{-}$ be the average times from age a to F for eligible individuals for whom the marker m is defined and who are positive (+) and negative (−) for marker m at age a, respectively. Our measure of the effectiveness of marker m at age a is 

$$\varepsilon_{ma} = 2\rho_{ma}\,\pi_{ma}(1-\pi_{ma})\,(\mu_{ma}^{-}-\mu_{ma}^{+}), \tag{1}$$

and it is interpreted as the average prediction gain in the population from knowing that the marker is positive or negative at age a. The change in the predicted time to the final event from knowing that marker m is positive at age a is $\mu_a-\mu_{ma}^{+}$, the difference between the expected time to the final event and the expected time to the event for individuals positive for the marker. Similarly, the change in predicted time to the final event from knowing that the marker is negative at age a is $\mu_{ma}^{-}-\mu_a$. When the marker is not defined, the change in predicted time is 0, since no information is gained. (Statistically, there may be information about the expected time to F when the marker is not defined, but we assume that this information is not available or, if available, is not exploited in the analysis.) The average prediction gain in the eligible population from knowing that the marker is positive or negative at age a is obtained by summing the prediction gains, weighted by their respective proportions in the population, yielding 

$$\rho_{ma}\left[\pi_{ma}\,(\mu_a-\mu_{ma}^{+}) + (1-\pi_{ma})\,(\mu_{ma}^{-}-\mu_a)\right].$$

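Substituting $(\mu_a-\mu_{ma}^{+}) = (1-\pi_{ma})\times(\mu_{ma}^{-}-\mu_{ma}^{+})$ and $(\mu_{ma}^{-}-\mu_a) = \pi_{ma}\times(\mu_{ma}^{-}-\mu_{ma}^{+})$ in this expression leads (with simple algebra) to the effectiveness measure $\varepsilon_{ma}$ in equation 1.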
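For readers who want the intermediate steps, the algebra can be written out as below. This is a sketch that assumes the overall mean is the prevalence-weighted average of the two group means, $\mu_a = \pi_{ma}\mu_{ma}^{+} + (1-\pi_{ma})\mu_{ma}^{-}$; that identity is implied by the two substitutions but is not stated explicitly in the text.

```latex
\begin{align*}
\mu_a - \mu_{ma}^{+}
  &= \pi_{ma}\mu_{ma}^{+} + (1-\pi_{ma})\mu_{ma}^{-} - \mu_{ma}^{+}
   = (1-\pi_{ma})\,(\mu_{ma}^{-} - \mu_{ma}^{+}), \\
\mu_{ma}^{-} - \mu_a
  &= \mu_{ma}^{-} - \pi_{ma}\mu_{ma}^{+} - (1-\pi_{ma})\mu_{ma}^{-}
   = \pi_{ma}\,(\mu_{ma}^{-} - \mu_{ma}^{+}), \\
\rho_{ma}\bigl[\pi_{ma}(\mu_a - \mu_{ma}^{+}) + (1-\pi_{ma})(\mu_{ma}^{-} - \mu_a)\bigr]
  &= \rho_{ma}\bigl[\pi_{ma}(1-\pi_{ma}) + (1-\pi_{ma})\pi_{ma}\bigr](\mu_{ma}^{-} - \mu_{ma}^{+}) \\
  &= 2\rho_{ma}\,\pi_{ma}(1-\pi_{ma})\,(\mu_{ma}^{-} - \mu_{ma}^{+}).
\end{align*}
```

The factor of 2 arises because the positive and negative groups contribute equal amounts, $\pi_{ma}(1-\pi_{ma})$ each, to the weighted gain.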

The measure εma is a product of 2 components: 

$$\varepsilon_{ma} = \delta_{ma}\times\gamma_{ma}, \tag{2}$$
where we call the first component 
$$\delta_{ma} = \mu_{ma}^{-}-\mu_{ma}^{+} \tag{3}$$
the discriminatory ability of marker m at age a and we call the second component 
$$\gamma_{ma} = 2\rho_{ma}\,\pi_{ma}(1-\pi_{ma}) \tag{4}$$
the prevalence factor of the marker m at age a. The discriminatory ability measures the extent to which occurrence of the marker improves prediction of time to the final event. The prevalence factor reflects the proportion of individuals for whom the marker is defined and positive at age a. It reflects the intuition that, for a given level of discriminatory ability, a marker is more useful in the aggregate when it divides the population approximately equally (e.g., 40% or 60% of the population are positive) than when it divides the population unequally (e.g., 2% or 98% of the population are positive). Specifically, the prevalence factor attains a maximum value of one-half when 50% of individuals are positive and 50% are negative, which is also the distribution of marker prevalence with the highest variance; it declines as the proportion positive tends to 0 or 1.
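The decomposition can be checked numerically with a small Python sketch. It assumes the prevalence factor has the form 2ρπ(1 − π), which is consistent with the stated maximum of one-half, and that discriminatory ability is the mean time for marker-negative individuals minus the mean time for marker-positive individuals. The function names and all input values are made up for illustration.

```python
def discriminatory_ability(mu_neg, mu_pos):
    """Difference in mean time to the final event: negative minus positive group."""
    return mu_neg - mu_pos


def prevalence_factor(rho, pi):
    """Assumed form 2 * rho * pi * (1 - pi); equals 0.5 when rho = 1 and pi = 0.5."""
    return 2.0 * rho * pi * (1.0 - pi)


def effectiveness(rho, pi, mu_neg, mu_pos):
    """Product of discriminatory ability and prevalence factor."""
    return discriminatory_ability(mu_neg, mu_pos) * prevalence_factor(rho, pi)


def weighted_gain(rho, pi, mu_neg, mu_pos):
    """Average prediction gain computed directly as a weighted sum of the two gains."""
    mu_a = pi * mu_pos + (1.0 - pi) * mu_neg  # overall mean among definable individuals
    return rho * (pi * (mu_a - mu_pos) + (1.0 - pi) * (mu_neg - mu_a))


# Hypothetical values: marker defined for everyone, mean times of 6 and 2 years.
rho, mu_neg, mu_pos = 1.0, 6.0, 2.0
for pi in (0.02, 0.40, 0.50, 0.60, 0.98):
    print(pi, prevalence_factor(rho, pi), effectiveness(rho, pi, mu_neg, mu_pos))
```

With discriminatory ability held fixed, the printed effectiveness is largest at a prevalence of 0.5 and shrinks toward 0 as the prevalence approaches 0 or 1.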

The inclusion of this prevalence factor (equation 4) is a distinctive feature of our proposed measure (equation 1). Standard regression-based measures of marker effectiveness, such as the regression coefficient of marker incidence in a Cox proportional hazards model, measure discriminatory ability but do not reflect the prevalence of the marker at any given age. Inclusion of the prevalence factor redresses the situation where a regression coefficient is large for a variable that has little impact on aggregate prediction because it has very low (or high) prevalence.

Estimates of the quantities in equations 1, 3, and 4 can be obtained from a suitable, ideally random, sample of the population. Let $n_a$ be the number of eligible individuals in the sample at age a, $\hat\rho_{ma}$ be the proportion of these individuals for whom the marker is defined, $\hat\pi_{ma}$ be the proportion of definable individuals who are positive for the marker (i.e., for whom the marker occurred at or before age a), and $1-\hat\pi_{ma}$ be the proportion of definable individuals who are negative for the marker.

Let $\hat\mu_a$ be the estimated average time from age a to F for individuals who are eligible at age a. Similarly, let $\hat\mu_{ma}^{+}$ and $\hat\mu_{ma}^{-}$ be the estimated average times from age a to F for eligible individuals who are positive (+) and negative (−) for marker m at age a, respectively. If the time to F is not censored and hence recorded for all cases in the sample, these estimates could be the respective sample means, but if some times to F are censored, a method is needed to estimate the times for censored individuals, as is the case in our application. We then estimate the marker effectiveness of marker m at age a as 

$$\hat\varepsilon_{ma} = \hat\delta_{ma}\times\hat\gamma_{ma}, \tag{5}$$
the product of the estimated discriminatory ability 
$$\hat\delta_{ma} = \hat\mu_{ma}^{-}-\hat\mu_{ma}^{+} \tag{6}$$
and the estimated prevalence factor 
$$\hat\gamma_{ma} = 2\hat\rho_{ma}\,\hat\pi_{ma}(1-\hat\pi_{ma}). \tag{7}$$
We compute the standard errors of equations 5–7 by simple bootstrapping of individuals in the sample.
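A minimal Python sketch of the plug-in estimate and its bootstrap standard error is given below for the simplest case, in which no times to F are censored, so group means are sample means. The record layout, function names, and toy data are assumptions made for illustration, and the prevalence factor is taken to be 2ρπ(1 − π). Bootstrap replicates in which either marker group is empty are skipped here, which is a choice of this sketch rather than something specified in the text.

```python
import random
import statistics


def estimate_effectiveness(sample):
    """Plug-in estimate at one age.

    sample: list of (defined, positive, time_to_F) tuples, one per eligible
    individual; assumes uncensored times (hypothetical record layout).
    """
    defined = [s for s in sample if s[0]]
    pos = [s[2] for s in defined if s[1]]
    neg = [s[2] for s in defined if not s[1]]
    if not pos or not neg:
        return None  # group means are undefined if either group is empty
    rho_hat = len(defined) / len(sample)
    pi_hat = len(pos) / len(defined)
    delta_hat = statistics.mean(neg) - statistics.mean(pos)
    gamma_hat = 2.0 * rho_hat * pi_hat * (1.0 - pi_hat)
    return delta_hat * gamma_hat


def bootstrap_se(sample, n_boot=500, seed=0):
    """Standard error from resampling individuals with replacement."""
    rng = random.Random(seed)
    reps = []
    for _ in range(n_boot):
        resample = [rng.choice(sample) for _ in sample]
        est = estimate_effectiveness(resample)
        if est is not None:
            reps.append(est)
    return statistics.stdev(reps)


# Toy data: 3 positive, 3 negative, 2 individuals with the marker undefined.
toy = ([(True, True, 2.0)] * 3 + [(True, False, 6.0)] * 3
       + [(False, False, 4.0)] * 2)
est = estimate_effectiveness(toy)
se = bootstrap_se(toy)
print(est, se)
```

Resampling whole individuals keeps each person's marker status and time to the final event together, so the variability of all three estimated components is reflected in the standard error.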

APPLICATION: ASSESSMENT OF LATE MARKERS OF MENOPAUSAL TRANSITION

In 2001, the STRAW recommendations stipulated that reproductive life could be characterized as including 2 menopausal transition stages, early and late (5, 6). Entry into early menopausal transition was characterized by increasing levels of follicle-stimulating hormone and increasing variability in menstrual cycle length. Entry into late menopausal transition was characterized by continued elevation of follicle-stimulating hormone levels and the occurrence of skipped cycles or amenorrhea. The STRAW recommendations have gained acceptance, but debate remains as to how best to define bleeding markers of the onset of each stage (6).

We applied our methods to 3 menstrual bleeding markers, proposed for defining the onset of late menopausal transition, that were considered in the empirical evaluation of the STRAW recommendations (6). In these definitions, a segment is the time interval (in days) between 2 menstrual bleeds, defined precisely in the article by Harlow et al. (7). These markers reflect the notion that entrance into late menopausal transition is characterized by segments of increased or more variable length:

a) the first segment of at least 90 days (D90) (5),

b) the first segment of at least 60 days (D60) (8), and

c) the first instance of a running range of more than 42 days (RR10) (9). The running range is computed as the difference between the maximum and minimum lengths of 10 consecutive segments.

Note that point c requires data on 10 successive segments, and hence is not defined until data on 10 segments have been recorded. This motivates the definitions of $\rho_{ma}$ and its estimate $\hat\rho_{ma}$ given above.
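The three late-transition definitions can be sketched in Python for a single woman's sequence of segment lengths. The function names and the example sequence are hypothetical, and dating the running-range marker at the last segment of the first qualifying 10-segment window is an assumption of this sketch; the precise operational definitions are those in the cited articles.

```python
def first_long_segment(segments, threshold_days):
    """Index of the first segment of at least threshold_days, or None if none occurs."""
    for i, length in enumerate(segments):
        if length >= threshold_days:
            return i
    return None


def first_running_range(segments, window=10, threshold_days=42):
    """Index of the last segment in the first run of `window` consecutive
    segments whose range (max minus min) exceeds threshold_days.

    Returns None if no such run exists, including when fewer than `window`
    segments have been recorded (the marker is then not yet defined).
    """
    for end in range(window, len(segments) + 1):
        run = segments[end - window:end]
        if max(run) - min(run) > threshold_days:
            return end - 1
    return None


# Made-up segment lengths in days, for illustration only.
segments = [28, 27, 30, 26, 29, 31, 28, 25, 33, 29, 74, 30, 95]

d60 = first_long_segment(segments, 60)
d90 = first_long_segment(segments, 90)
rr10 = first_running_range(segments)
print(d60, d90, rr10)
```

In this toy sequence the 74-day segment triggers both D60 and RR10, while D90 is not reached until the later 95-day segment.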

We also include, for illustration, 1 proposed marker of early menopausal transition:

d) persistent ≥7-day difference in consecutive segment lengths (DIFF7p). This marker is defined as the first segment whose length is at least 7 days greater than that of the previous segment, when at least this magnitude of difference between consecutive cycles is observed again within the subsequent 10 segments (1).
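A rough Python sketch of the DIFF7p rule follows. It assumes that both the initial and the repeated difference are increases of at least 7 days relative to the immediately preceding segment, and that the repeat must occur within the next 10 segments; the cited definition may treat the repeated difference differently (for example, as an absolute difference), so this reading is an assumption. The function name and example sequences are made up.

```python
def first_persistent_diff(segments, min_diff=7, lookahead=10):
    """Index of the first segment at least min_diff days longer than its
    predecessor, provided another such increase occurs within the next
    `lookahead` segments; None if no segment qualifies."""

    def jump(i):
        # True if segment i is at least min_diff days longer than segment i - 1.
        return segments[i] - segments[i - 1] >= min_diff

    for i in range(1, len(segments)):
        if jump(i):
            upper = min(i + lookahead, len(segments) - 1)
            if any(jump(j) for j in range(i + 1, upper + 1)):
                return i
    return None


# Made-up segment lengths in days.
persistent = [28, 29, 37, 30, 28, 27, 36, 29]   # increases of 8 and 9 days
isolated = [28, 29, 37, 30, 28, 29]             # a single 8-day increase

print(first_persistent_diff(persistent), first_persistent_diff(isolated))
```

The persistence requirement means that an isolated long cycle does not trigger the marker.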

Our interest is in assessing empirically the relation between these events and the date of the FMP, defined retrospectively as the menstrual period that is followed by 12 months of amenorrhea. When assessing markers a–d, it is assumed that the first occurrence of these events did not take place before a woman was enrolled in the study. For methods of adjusting for left-truncation and censoring from late entry into a study, see the article by Cain et al. (18).

We compared markers using data from the TREMIN study (19), which prospectively recorded the menstrual cycles of students enrolled at the University of Minnesota from 1935 to 1939. This analysis includes records from 726 women who were enrolled by age 25 years, participated for at least 5 years, and were still participating and not using hormones at age 35 years, the baseline age for these analyses. (The data tape TRUST998FINAL was supplied by the TREMIN investigators in 1993.)

All untreated bleeding segments of each eligible woman observed from age 35 years through the FMP or censoring due to hysterectomy, withdrawal, or hormone use were included, with women contributing up to 322 segments. A nonparametric approach was applied to estimate the times to FMP for censored cases. Specifically, let $F_{ma}^{+}$ and $F_{ma}^{-}$ denote the distribution functions of the time to F for women positive and negative for the marker m at age a, respectively, and let $\hat F_{ma}^{+}$ and $\hat F_{ma}^{-}$ denote the Kaplan-Meier estimates of these distributions, computed using the set of women for whom the marker is defined at age a. Then 

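$$\mu_{ma}^{+} = \int_0^\infty t\,dF_{ma}^{+}(t), \qquad \mu_{ma}^{-} = \int_0^\infty t\,dF_{ma}^{-}(t),$$
and we have the estimates 
$$\hat\mu_{ma}^{+} = \int_0^\infty t\,d\hat F_{ma}^{+}(t)$$
and 
$$\hat\mu_{ma}^{-} = \int_0^\infty t\,d\hat F_{ma}^{-}(t).$$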

For these estimates to be well-defined, the longest times in the data set in the positive and negative marker groups have to be events, and not censored. If the last observation time happens to be a censoring time, we put the remaining probability mass on that time point. For these estimates to be consistent, the support of the censoring time needs to contain the support of age at FMP (20, 21). This is probably the case for this data set, since the TREMIN study has long follow-up, with the last observation time often being the FMP (19).
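The mean implied by a Kaplan-Meier estimate, with any leftover probability mass placed on the last observation time when that time is censored, can be sketched in plain Python as follows. This is an illustrative implementation under those stated conventions, not the authors' code; the function name and toy inputs are made up, and events tied with censorings at the same time are treated as occurring first.

```python
def km_mean(times, events):
    """Mean time to event from a Kaplan-Meier estimate.

    times: observed times (event or censoring); events: True if the event was
    observed, False if censored. If the largest time is censored, the remaining
    probability mass is placed on that time point.
    """
    data = sorted(zip(times, events))
    n = len(data)
    surv = 1.0   # estimated survival probability just before the current time
    mean = 0.0
    last_t = 0.0
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i          # individuals with observed time >= t
        deaths = 0
        while i < n and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            i += 1
        if deaths > 0:
            new_surv = surv * (1.0 - deaths / at_risk)
            mean += t * (surv - new_surv)   # probability mass assigned to time t
            surv = new_surv
        last_t = t
    mean += last_t * surv        # leftover mass, nonzero only if the tail is censored
    return mean


print(km_mean([2, 3, 5, 7], [True, False, True, True]))
print(km_mean([2, 4], [True, False]))
```

When no observation is censored the function reduces to the ordinary sample mean, which matches the uncensored case described earlier.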

Approximately half of the women had gaps in their menstrual histories, with a median of 2 gaps per woman. Gaps in the reported menstrual histories were multiply imputed using a hot deck method described elsewhere (22). Since results from imputed data differed little from results from unimputed data, we present only the latter here. Gaps of less than 4 years were ignored, and women with gaps of 4 years or more were censored at the gap.

The median ages at occurrence of DIFF7p, RR10, D60, and D90 were 41.5, 48.1, 48.1, and 50.0 years, respectively. Thus, RR10 and D60 occurred at similar times; D90 occurred about 2 years later, on average; and the marker of early transition, DIFF7p, occurred 6.5 years earlier (Figure 1). Harlow et al. (7) modeled the hazard function for the FMP using a varying-coefficient Cox model (17) that incorporated censored and uncensored women. The log relative hazard of having one's FMP as a function of age is plotted for each marker in Figure 2. At any given age, the log relative hazards were similar for each of the 3 late transition markers, but the log relative hazard for DIFF7p was significantly lower than that for the other markers, reflecting the expected finding that the early marker is less predictive of the FMP than the late markers.

Figure 1.

Distributions of age at marker event for 1 bleeding-related marker of early menopausal transition (DIFF7p) and 3 bleeding-related markers of late menopausal transition (RR10, D60, and D90). See text for detailed definitions of marker events. The box plots are based on Kaplan-Meier estimates of the distributions that take into account right-censoring.

Figure 2.

Log relative hazard (RH) of having one's final menstrual period, by age, for 1 bleeding-related marker of early menopausal transition (Diff7p (·−·−·)) and 3 bleeding-related markers of late menopausal transition (RR10 (· · ·), D60 (—), and D90 (- - -)). See text for detailed definitions of marker events.

In Figure 3, part A, the estimated marker effectiveness (equation 5) is plotted as a function of age for all 4 markers. From these data, we see that 1) the early marker DIFF7p is more effective than the late markers before age 45 years and less effective after age 45 years, confirming empirically the early-versus-late designation; 2) RR10 is similar to D60 before age 49 years and inferior to D60 after age 49 years; and 3) D90 is similar to D60 after age 52 years but inferior to D60 at earlier ages. Overall, D60 seems to be the best of the 3 late markers in this data set (6).

Figure 3.

Measures of A) marker effectiveness, B) prevalence factor, and C) discriminatory ability for 4 markers of the final menstrual period, by age. —, D60 (segment ≥60 days); - - -, D90 (segment ≥90 days); · · ·, RR10 (10-segment running range); ·−·−·, Diff7p. See text for detailed definitions of marker events.

Further insight into marker performance can be gained by plotting the components of marker effectiveness, prevalence factor and discriminatory ability, against age. Parts B and C of Figure 3 show these plots for the 4 markers. From these plots, we note that D90 has a very small prevalence factor before age 47 years, since the proportion of persons positive for this marker is small for these ages. The estimated discriminatory ability is actually higher for D90 than for D60 in this age range, but it is highly variable because of sampling error. Discriminatory ability tends to increase with age for D60 and decline with age for D90. The superiority of D60 as an overall measure between ages 42 and 52 years seems mainly attributable to the greater prevalence factor for D60 in this age range, since the proportion of women positive for D60 is higher for this less stringent criterion. Since the choice of 60 days and 90 days in these markers is somewhat arbitrary, it would be possible to extend this comparison to a wider set of markers (e.g., D45, D50, …, D90) to seek an optimal choice.

To conduct statistical tests and calculate confidence intervals for the differences in Figure 3A, we compute bootstrap standard errors for the difference in marker effectiveness between pairs of markers at each age. The associated 95% confidence intervals, given as the difference plus or minus 2 standard errors, are shown for the markers RR10 and D60 in Figure 4. Here the standard errors are computed using 500 bootstrap samples of women in the data set. From this figure, we see that confidence intervals for the differences in marker effectiveness include 0 at early ages but are positive for some ages beyond 50 years, suggesting an advantage for D60 for later ages.
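A paired bootstrap for the difference between two markers at a single age can be sketched as below. The sketch assumes uncensored times and markers that are defined for every woman, uses a prevalence factor of 2π(1 − π), and works on simulated toy data; the record layout, the marker labels "A" and "B", and the function names are hypothetical. Each bootstrap replicate resamples women and evaluates both markers on the same resample, so the correlation between the two estimates is carried into the standard error of the difference.

```python
import random
import statistics


def effectiveness(women, marker):
    """Plug-in effectiveness for one marker (uncensored times, marker always defined)."""
    pos = [w["time"] for w in women if w[marker]]
    neg = [w["time"] for w in women if not w[marker]]
    if not pos or not neg:
        return 0.0  # the prevalence factor is 0 when everyone is in one group
    pi_hat = len(pos) / len(women)
    delta_hat = statistics.mean(neg) - statistics.mean(pos)
    return delta_hat * 2.0 * pi_hat * (1.0 - pi_hat)


def bootstrap_difference(women, marker_a, marker_b, n_boot=500, seed=0):
    """Point estimate of the difference and an interval of plus or minus 2 standard errors."""
    rng = random.Random(seed)
    point = effectiveness(women, marker_a) - effectiveness(women, marker_b)
    reps = []
    for _ in range(n_boot):
        resample = [rng.choice(women) for _ in women]  # resample women, markers stay paired
        reps.append(effectiveness(resample, marker_a) - effectiveness(resample, marker_b))
    se = statistics.stdev(reps)
    return point, point - 2.0 * se, point + 2.0 * se


# Simulated toy data: marker "A" is positive within 5 years of the event,
# marker "B" only within 2 years.
data_rng = random.Random(1)
women = []
for _ in range(200):
    time = data_rng.uniform(0.0, 10.0)
    women.append({"time": time, "A": time < 5.0, "B": time < 2.0})

point, lower, upper = bootstrap_difference(women, "A", "B")
print(point, lower, upper)
```

An interval that excludes 0 at a given age would suggest a difference in effectiveness between the two markers at that age, in the same spirit as the confidence bands described here.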

Figure 4.

Difference in marker effectiveness between 2 bleeding-related markers of late menopausal transition (RR10 and D60), by age. The 95% confidence bands (lighter curves) were computed from 500 bootstrap samples. See text for detailed definitions of marker events.

The plot of discriminatory ability in Figure 3C shows the differences in predicted mean time to FMP for women who are negative and positive for the marker at each age. The predicted means for the positive and negative marker groups can also be plotted individually against age, as shown in Figure 5 for the D60 marker; this can be viewed as summarizing plots of the distributions of each group by age, as implemented by Lisabeth et al. (8). In interpreting Figure 5, note that 1) the time to FMP inevitably declines as women get older, 2) the composition of the positive and negative marker groups is changing over time, and 3) the plotted values do not account for the changes in the prevalence of the marker over age. Points 2 and 3 motivate our plot of the combined measure of marker effectiveness (Figure 3A).

Figure 5.

Predicted mean time to final menstrual period (FMP) for women positive or negative for the D60 marker of late menopausal transition, by age (—, positive for the marker on or before that age; - - -, negative for the marker at that age). See text for detailed definitions of marker events. Lighter curves, 95% confidence bands.

DISCUSSION

Here we have proposed a longitudinal measure of marker effectiveness (equation 1) for longitudinal data involving recurrent events, where interest lies in using this information to identify markers that predict a future event. In the motivating application, the recurrent events were menstrual cycles, the markers defined onset of the menopausal transition, and the future event was the FMP. The proposed method provides a simple graphical comparison for assessing the predictive value of alternative markers of menopausal transition as a function of age, the central question in our study, with associated measures of uncertainty. We believe that the proposed method would be useful in other settings—for example, in studies of aging processes or episodic events such as migraine headaches, epileptic seizures, or visits to the emergency room.

The proposed measure has a direct interpretation in terms of the expected gain in prediction over the population expectation. It combines the degree to which the fact that a marker has occurred by any age discriminates between individuals who are and are not more proximal to the subsequent or final event (equation 3) with a factor that reflects the prevalence of the marker at that age (equation 4). Coefficients in a regression model reflect the discriminatory component but neglect the prevalence factor.

The interpretability of equation 1 is also a reason for using the difference in means as the measure of discriminatory ability, rather than other measures that seem more natural for event histories, such as the difference in medians or the log hazard ratio. Limitations of classical epidemiologic measures of association, such as the hazard ratio or relative risk, were discussed in a recent set of articles based on a symposium on this topic (23–26). The mean value has some attraction from the perspective of causal inference, since the causal effect of knowledge of the marker can be defined conceptually for individuals, and then equation 1 is simply an average of individual causal effects over the population (27).

The focus on means implies that all individuals eventually experience the final event, and it also requires a method to (in effect) impute the times of final events that are censored in the sample. As we noted in the Introduction, the former limitation can be relaxed by studying the age of occurrence of the final event within a restricted time window (10, 11). The method we applied to impute final event times for censored cases makes the common assumption that censoring is noninformative. This assumption could be relaxed by including auxiliary information in a model for imputing these times (28). The proposed measure can be readily extended to adjust for covariates, as when the effectiveness of a marker needs to be assessed net of other characteristics. If these characteristics are not time-varying, they can be included as covariates in the model used to estimate discriminatory ability.

Note that the measure of discriminatory ability at any age uses only the “current status” information of whether the marker has or has not occurred by that age. Thus, for cases where the marker has occurred, it does not use information about the time that has elapsed since marker occurrence. If this information were available, it could be used as a covariate to improve the prediction of time to the terminal event for individuals positive for the marker. This does not affect our aggregate measure of marker effectiveness, which averages over predictions of individuals with different elapsed times since marker occurrence.

Abbreviations

FMP, final menstrual period
STRAW, Stages of Reproductive Aging Workshop

Author affiliations: Department of Biostatistics, School of Public Health, University of Michigan, Ann Arbor, Michigan (Roderick J. Little, Bin Nan); and Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan (Matheos Yosef, Siobán D. Harlow).

This work was supported by grant AG 021543 (Siobán D. Harlow, Principal Investigator) from the National Institute on Aging.

The authors appreciate useful discussions on this work with Drs. Kevin Cain and Michael Elliott.

The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute on Aging or the National Institutes of Health.

Conflict of interest: none declared.

References

1. Mitchell ES, Woods NF, Mariella A. Three stages of the menopausal transition from the Seattle Midlife Women's Health Study: toward a more precise definition. Menopause. 2000;7(5):334-349.
2. Gracia CR, Sammel MD, Freeman EW, et al. Defining menopause status: creation of a new definition to identify the early changes of the menopausal transition. Menopause. 2005;12(2):128-135.
3. Soules MR, Sherman S, Parrott E, et al. Executive summary: Stages of Reproductive Aging Workshop (STRAW). Fertil Steril. 2001;76(5):874-878.
4. Taylor SM, Kinney AM, Kline JK. Menopausal transition: predicting time to menopause for women 44 years or older from simple questions on menstrual variability. Menopause. 2004;11(1):40-48.
5. Brambilla DJ, McKinlay SM, Johannes CB. Defining the perimenopause for application in epidemiologic investigations. Am J Epidemiol. 1994;140(12):1091-1095.
6. Harlow SD, Crawford S, Dennerstein L, et al. Recommendations from a multi-study evaluation of proposed criteria for staging reproductive aging. Climacteric. 2007;10(2):112-119.
7. Harlow SD, Cain K, Crawford S, et al. Evaluation of four proposed bleeding criteria for the onset of late menopausal transition. J Clin Endocrinol Metab. 2006;91(9):3432-3438.
8. Lisabeth LD, Harlow SD, Gillespie B, et al. Staging reproductive aging: a comparison of proposed bleeding criteria for the menopausal transition. Menopause. 2004;11(2):186-197.
9. Taffe JR, Dennerstein L. Menstrual patterns leading to the final menstrual period. Menopause. 2002;9(1):32-40.
10. Chen PY, Tsiatis AA. Causal inference on the difference of the restricted mean lifetime between two groups. Biometrics. 2001;57(4):1030-1038.
11. Karrison K. Restricted mean life with adjustment for covariates. J Am Stat Assoc. 1987;82(400):1169-1176.
12. Huang Y, Sullivan Pepe M, Feng Z. Evaluating the predictiveness of a continuous marker. Biometrics. 2007;63(4):1181-1188.
13. Pepe MS, Feng Z, Huang Y, et al. Integrating the predictiveness of a marker with its performance as a classifier. Am J Epidemiol. 2008;167(3):362-368.
14. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics. 2000;56(2):337-344.
15. Zheng Y, Heagerty PJ. Prospective accuracy for longitudinal markers. Biometrics. 2007;63(2):332-341.
16. Cooper GS, Baird DD, Darden FR. Measures of menopausal status in relation to demographic, reproductive, and behavioral characteristics in a population-based study of women aged 35–49 years. Am J Epidemiol. 2001;153(12):1159-1165.
17. Nan B, Lin X, Lisabeth LD, et al. A varying-coefficient Cox model for the effect of age at a marker event on age at menopause. Biometrics. 2005;61(2):576-583.
18. Cain KC, Harlow SD, Little RJ, et al. Bias due to left truncation and left censoring in longitudinal studies of developmental processes. Am J Epidemiol. In press.
19. Treloar AE, Boynton RE, Behn BG, et al. Variation of the human menstrual cycle through reproductive life. Int J Fertil. 1967;12(1):77-126.
20. Susarla V, van Ryzin J. Large sample theory for an estimator of the mean survival time from censored samples. Ann Stat. 1980;8(5):1002-1016.
21. Stute W, Wang J-L. The strong law under random censorship. Ann Stat. 1993;21(3):1591-1607.
22. Little RJ, Yosef M, Cain KC, et al. A hot-deck multiple imputation procedure for gaps in longitudinal data on recurrent events. Stat Med. 2008;27(1):103-120.
23. Kaufman JS. Toward a more disproportionate epidemiology. Epidemiology. 2010;21(1):1-2.
24. Hernán MA. The hazards of hazard ratios. Epidemiology. 2010;21(1):13-15.
25. Langholz B. Case-control studies = odds ratios: blame the retrospective model. Epidemiology. 2010;21(1):10-12.
26. Poole C. On the origin of risk relativism. Epidemiology. 2010;21(1):3-9.
27. Angrist JD, Imbens GW, Rubin DB. Identification of causal effects using instrumental variables. J Am Stat Assoc. 1996;91(434):444-472.
28. Taylor JM, Hsu CH, Murray S. Survival estimation and testing via multiple imputation. Stat Probab Lett. 2002;58(3):221-232.

Author notes

Editor's note: An invited commentary on this article appears on page 1388, and the authors’ response appears on page 1391.