Ree Dawson, Philip W. Lavori, Efficient design and inference for multistage randomized trials of individualized treatment policies, Biostatistics, Volume 13, Issue 1, January 2012, Pages 142–152, https://doi.org/10.1093/biostatistics/kxr016
Abstract
Clinical demand for individualized “adaptive” treatment policies in diverse fields has spawned the development of clinical trial methodology for their experimental evaluation via multistage designs, building upon methods intended for the analysis of naturalistically observed strategies. Because there is often no need to parametrically smooth multistage trial data (in contrast to observational data for adaptive strategies), it is possible to establish direct connections among different methodological approaches. We show by algebraic proof that the maximum likelihood (ML) and optimal semiparametric (SP) estimators of the population mean of the outcome of a treatment policy, and of its standard error, are equal under certain experimental conditions. This result is used to develop a unified and efficient approach to design and inference for multistage trials of policies that adapt treatment according to discrete responses. We derive a sample size formula expressed in terms of a parametric version of the optimal SP population variance. Nonparametric (sample-based) ML estimation performed well in simulation studies, in terms of achieved power, for scenarios most likely to occur in real studies, even though sample sizes were based on the parametric formula. ML outperformed the SP estimator; differences in achieved power predominantly reflected differences in their estimates of the population mean (rather than in estimated standard errors). Neither methodology could mitigate the potential for overestimated sample sizes when strong nonlinearity was purposely simulated for certain discrete outcomes; however, such departures from linearity may not be an issue for many clinical contexts that make evaluation of competitive treatment policies meaningful.
1. INTRODUCTION
Increased interest in individualized treatment policies has shifted the focus of their methodological development from the analysis of “naturalistically” observed strategies (e.g. Murphy and others, 2001) to experimental evaluation of a preselected set of strategies via multistage designs (e.g. Lavori and Dawson, 2000). The candidate policies under study have been described as “adaptive” treatment strategies (ATS) or “dynamic” treatment regimes because treatment changes are tailored to the circumstances of the individual. The studies have been described as sequential, multiple assignment, randomized (SMAR) trials (Murphy, 2005) because successive courses of treatment are randomly and adaptively assigned over time, according to individual treatment and response history. The multiple randomization stages correspond to the sequential decision making formalized by an ATS.
The following ATS exemplifies those evaluated in the SMAR trial of antidepressants known as STAR*D (Rush and others, 2004): “Start on treatment A; switch to B if poor response or persistent side effects, otherwise, either continue on A or augment A with C, depending on degree of improvement; continue to monitor and switch to D or augment with F, respectively, according to degree of response.” As in STAR*D, the SMAR design to evaluate this and related ATS specifies that all subjects in the trial start on A, so that the first randomization is to possible options for B and C, nested within the response categories for treatment with A. Further randomization to options for D and F is similarly nested within previous treatment and response history. Other SMAR designs may start with nonadaptive randomization, e.g. to different choices for A.
Clinical equipoise successively guides the SMAR options for B, C, D, and F. That principle, coupled with standardization of clinical details such as dosing, reduces the usual explosive variation in treatment regimes found in observational settings. Accordingly, there is often no need to parametrically smooth SMAR trial data, a property that allows us to establish direct connections among different methodological approaches. This paper shows that the simplest estimators of the population mean of the outcome of an ATS and of its standard error, derived using probability calculus and “plug-in” method of moments (MOM) estimates, are equal under certain experimental conditions to the analogous estimators provided by optimal semiparametric (SP) theory, maximum likelihood (ML) theory, and Bayesian predictive inference. In particular, we assume that constrained randomization ensures that the observed allocation of subjects matches that intended by design and that the sample size is large enough to ensure “replete” data sets at the end of the experiment, in the sense of precluding random zeroes at intermediate randomization steps (Lavori and Dawson, 2007).
The equality of the optimal variance estimator with the others is not obvious by inspection, and full induction across randomization stages is required to derive the result algebraically. The different formulations of the standard error exemplify the methodological differences. The iterative probability calculus underlying the MOM, ML, and predictive estimators is carried out sequentially to reflect the influence of the intervening outcomes used for (nested) multistage randomization. The resulting variance estimator decomposes into stage-specific components (Lavori and Dawson, 2007), which quantify the inference “penalty” paid for not knowing a priori the joint outcome distribution (Dawson and Lavori, 2008). The efficient SP influence function used to obtain the optimal variance estimator is specified in terms of the marginal mean of the outcome measured at the end of the study (Murphy and others, 2001). The resulting variance estimator derives from the population marginal variance of the final outcome, typically used for determining the sample size for a single-stage trial, plus a sum of stage-specific variances of the inversely weighted final outcome.
In this paper, we exploit the marginal character of the SP approach to develop a regression-based formula suitable for sample size calculations, which minimizes reliance on unknown population parameters. We also derive a nonparametric counterpart for the SP efficiency gains provided by the optimal estimator, relative to the simpler marginal mean (MM) estimator defined by Murphy (2005) for SMAR trials. We consider the performance of ML and SP inference, in terms of achieved power, when using the regression-based sample size formula. The intent is to provide a unified and efficient approach to design and inference for SMAR trials of ATS that adapt treatment according to discrete responses.
2. DESIGN FRAMEWORK AND ESTIMATORS
Consider a K-stage trial. For stage k in 1,…,K, let $S_k$ be the status of the subject measured at the start of the kth stage and $A_k$ the treatment assigned by the kth randomization according to the values of the histories $\bar S_k = (S_1, S_2, \ldots, S_k)$ and $\bar A_{k-1} = (A_1, A_2, \ldots, A_{k-1})$, with $A_1$ a function of $S_1$. SMAR assignment to treatment options can be expressed in terms of (sequential) allocation to different decision rules, which determine treatment as a function of treatment and response history. We write $a_k = d_k(\bar S_k = \bar s_k, \bar A_{k-1} = \bar a_{k-1})$ for the decision rule $d_k$ at the kth stage, where $a_k$ and $s_k$ denote values for treatment and state; the randomization probabilities for $d_k$, denoted $\{p_k(d_k \mid \bar S_k, \bar A_{k-1})\}$, are known and experimentally fixed functions of prior state-treatment history. The strategies to be evaluated can be represented as sequences of decision rules with positive probability of assignment. Each sequence $d = \{d_1, d_2, \ldots, d_K\}$ corresponds to an ATS if the domain of each successive rule includes the state-treatment histories produced by the previous rules in the sequence. This condition ensures that the K-stage ATS is a well-defined policy for adaptively determining the “next” treatment. The introductory example consists of two decision rules $\{d_1, d_2\}$: $A = d_1(S_1 = 1)$, $A{+}C = d_1(S_1 = 2)$, and $B = d_1(S_1 = 3)$, where $S_1$ indicates response to A. The second decision rule is similarly defined, e.g. $a_1 = d_2(S_2 = 1, a_1)$, where $S_2$ indicates response measured after $a_1 = d_1(S_1)$.
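To fix ideas, the following sketch (ours, not from the paper) shows one way to encode an ATS as a sequence of stage-specific decision rules; the second-stage rule body is illustrative only, and all names are hypothetical.

```python
# Hypothetical encoding of an ATS d = {d1, d2} as decision rules that map
# (state history, treatment history) to the next treatment.
from typing import Callable, Sequence, Tuple

DecisionRule = Callable[[Tuple[int, ...], Tuple[str, ...]], str]

def d1(states: Tuple[int, ...], treatments: Tuple[str, ...]) -> str:
    # Stage 1: response to A (state 1, 2, or 3) determines continue A, augment with C, or switch to B.
    return {1: "A", 2: "A+C", 3: "B"}[states[0]]

def d2(states: Tuple[int, ...], treatments: Tuple[str, ...]) -> str:
    # Stage 2 (illustrative only; the paper does not fully specify it): continue if
    # responding; otherwise switch to D (if on B) or augment with F.
    s2, a1 = states[1], treatments[0]
    if s2 == 1:
        return a1
    return "D" if a1 == "B" else a1 + "+F"

d: Sequence[DecisionRule] = (d1, d2)   # the strategy d = {d1, d2}
```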
The SMAR design includes a primary outcome Y for evaluation purposes, obtained after the Kth stage of randomization. We judge the performance of an ATS d by $\mu_d$, the population mean of Y that would be observed if all subjects were treated according to d.
2.1 Estimator of the mean of an ATS

For a replete SMAR data set, the plug-in (MOM) estimator of $\mu_d$ is given by the G-computational formula

$$\hat\mu_d = \sum_{s_K} \hat\varphi_K(s_K)\, \hat m_K(s_K), \qquad (2.1)$$

where $\hat\varphi_K(s_K)$ is the product over stages of the observed proportions $\hat f_k(s_k)$ of subjects with state $s_k$ among those whose prior state-treatment history is consistent with d, and $\hat m_K(s_K)$ is the sample mean of Y among subjects with state history $s_K$ whose treatment assignments agree with d throughout. As noted in the Introduction, we assume throughout that constrained randomization ensures the observed allocation matches that intended by design and that the data set is replete. The optimal estimator is the solution to the efficient estimating equation $\sum_{i=1}^{n} U_{opt,i} = 0$, where n is the number of subjects and $U_{opt}$ is the efficient SP influence function, specified in terms of the randomization probabilities and the marginal means $\mu_k$ (Murphy and others, 2001).

The G-computational formula (2.1) can be used to provide consistent nonparametric estimates of the $\mu_k$ (given SMAR), in which case the solution to the estimated estimating equation is optimal (most efficient) (Murphy and others, 2001). With some calculation, the optimal estimator also reduces to (2.1), a result that holds even if the observed assignment proportions differ from the preset probabilities (despite $U_{opt}$ being defined in terms of randomization probabilities).

Because the ML estimates for means and proportions coincide with the plug-in estimates obtained by the MOM for the common distributions of interest here, (2.1) is also ML. It is also equal to the predictive estimator of $\mu_d$, assuming noninformative priors (Dawson and Lavori, 2008). We therefore refer to (2.1) unambiguously as the estimator of the ATS mean, denoted $\hat\mu_d$.
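As a concrete illustration of the plug-in character of (2.1), here is a minimal sketch (ours, not the authors' code) for a replete two-stage trial; it assumes a pandas data frame with one row per subject and columns S1, A1, S2, A2, Y, together with decision rules d1 and d2 as in the hypothetical sketch above. All names are illustrative.

```python
# Plug-in (MOM/ML) estimate of mu_d for a two-stage SMAR trial, following (2.1):
# mu_hat = sum over (s1, s2) of f1_hat(s1) * f2_hat(s2 | s1) * m_hat(s1, s2),
# where the proportions and stratum means are computed from subjects whose observed
# assignments agree with d. Assumes a replete data set (no empty strata).
import pandas as pd

def mu_hat_gcomp(df: pd.DataFrame, d1, d2) -> float:
    f1 = df["S1"].value_counts(normalize=True)               # f1_hat(s1), all subjects
    on_d1 = df[df["A1"] == df["S1"].map(lambda s: d1((s,), ()))]
    mu = 0.0
    for s1, p1 in f1.items():
        grp = on_d1[on_d1["S1"] == s1]
        f2 = grp["S2"].value_counts(normalize=True)           # f2_hat(s2 | s1), subjects on d1
        for s2, p2 in f2.items():
            a1 = d1((s1,), ())
            consistent = grp[(grp["S2"] == s2) & (grp["A2"] == d2((s1, s2), (a1,)))]
            mu += p1 * p2 * consistent["Y"].mean()            # m_hat(s1, s2)
    return mu
```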
2.2 Variance estimators of the estimator of the mean of an ATS
To obtain the ML variance of $\hat\mu_d$, we assume that (i) the final outcome Y has a stratified normal (continuous case) or Bernoulli (discrete case) distribution across strata indexed by the possible sequences $(s_K, a_K)$; (ii) the intermediate states $S_k$ are distributed conditionally, given $(s_{k-1}, a_{k-1})$, as multinomial random variables; and (iii) model parameters are distinct across state-treatment histories for a given stage k and across stages (reflecting SMAR allocation). Because the sequence of nested randomizations in a SMAR trial gives rise to a monotone pattern of missingness for each ATS, the likelihood for the parameters in (i) and (ii) can be factored into components, each of which is a complete-data problem; standard theory dictates that the information matrix, and hence the (asymptotic) ML covariance matrix of the parameters, is block diagonal, with each block corresponding to a complete-data component (Little and Rubin, 1987). The ML variance of $\hat\mu_d$ can be obtained from the block-diagonal covariance matrix (once calculated); however, a more tractable derivation uses iterated variance decomposition (Little and Rubin, 1987). The application to the SMAR setup factors the term of (2.1) for $s_K$ into $\varphi_K(s_K) m_K(s_K)$, with $\varphi_k(s_k) = \varphi_{k-1}(s_{k-1}) f_k(s_k)$, $k = 2, \ldots, K$. The iterated calculation produces the same estimator obtained using probability calculus coupled with MOM (Lavori and Dawson, 2007) or Bayesian predictive inference (Dawson and Lavori, 2008); we refer to the variance estimator of $\hat\mu_d$ provided by these three derivations as the ML variance estimator (2.3). It decomposes into a “naïve” variance estimate, which assumes the coefficients of $m_K(s_K)$ in (2.1) are known a priori, plus a “penalty” paid for estimating them; the penalty involves the sample variance of the $m_K \equiv m_K(s_K)$ together with the $\varphi_K' \equiv \varphi_K(s_K')$ (Dawson and Lavori, 2008). The penalty can be obtained by induction on k, with K = 1 being the usual multinomial calculation (Lavori and Dawson, 2007). For general K, the penalty decomposes into stage-specific components of penalty variance, with the kth-stage term reducible, up to multiplicative factors, to a simple stage-specific expression; see Appendix A in the supplementary materials available at Biostatistics online.
In Appendix A in the supplementary materials available at Biostatistics online, we use induction to show algebraically the equality of the variance estimators. The result holds asymptotically without restriction, but for finite samples it requires that the observed allocation of subjects matches that intended by design, as set out in Section 2.1 for the MM estimator of $\mu_d$. For analytic derivations, we assume that blocking or some other form of constrained randomization makes this distinction moot and use the notation $p_k(d_k \mid \bar S_k, \bar A_{k-1})$ interchangeably for expected and observed proportions under d.


The decomposition of the SP variance into the population marginal variance of Y plus stage-specific terms (see Section 1) provides a direct comparison of those stage-specific terms with the penalty component of the ML variance estimator in (2.3). Because the kth-stage penalty term restricts covariance uncertainty to $f_k$ and $f_k'$, the difference gives rise to K remainder terms that telescope to zero (proved inductively). We remark that the variance components of $V(U_{opt})$, and consequently of its sample estimate, are expressed in terms of squared deviations, a property that must be shared by the ML variance estimator for equality to hold; this motivates assumption (i) as the likelihood model for Y.
3. SP EFFICIENCY GAINS WITH THE OPTIMAL ESTIMATOR
The sample version of the optimal SP variance replaces population quantities with their observed counterparts, where $n_K \equiv n_K(s_K)$ is the number of subjects sequentially randomized to d through stage K and having state history $s_K$; under the conditions of Section 2, (2.5) reduces to a function of the observed stage-specific proportions and stratum means.

Consider the kth-stage term of (3.8) and let $\Delta_k$ denote the corresponding nonparametric measure of between-subgroup response heterogeneity at stage k. Suppose that $S_k$ is binary (achievable by introducing more stages), taking on values $s_k$ and $s_k'$. Then $\Delta_k$ can be sequentially defined in terms of stage-specific response heterogeneity: $\Delta_k = \delta_k + \delta_{k-1} + \cdots + \delta_2 + \delta_1$, where $\delta_1 = \Delta_1$ and $f_k' = f_k(s_{k-1}, s_k') = 1 - f_k(s_{k-1}, s_k) \equiv 1 - f_k$. The derivation follows by induction.

The SP efficiency gain can be reexpressed directly in terms of the $\delta_k$ when $p_k(d_k \mid \bar S_k, \bar A_{k-1}) \equiv p_k(d_k)$ for all k. The case K = 3, given in (3.9), suffices to explicate the general result concretely.

The SMAR randomization probabilities specified by the trialist govern SP efficiency gains in a simple way under the assumed restrictions. The strength of the relationship of state history to Y, as evidenced by the magnitudes of the $\delta_k$, has an impact as well, consistent with simulated results for two-stage trials (Wahed and Tsiatis, 2004). Differentiation of (3.9) shows that efficiency is maximized when each $S_k$ acts like a flip of a fair coin, thereby allowing sequential allocation of subjects to each possible state history. The smallest improvement occurs when $S_k$ is a degenerate binomial (all mass on one outcome) at each stage but the last; but then the study is not adaptive before the last stage and is equivalent to the cross-sectional K = 1 case.
4. OPTIMAL SP VARIANCE FOR SAMPLE SIZE CALCULATIONS
We base sample size calculations on $V(U_{opt})$ because of its marginal formulation. We further assume that $E_d[(Y-\mu_k)^2 \mid s_{d,k}] = V_d(Y \mid s_{d,k}) = \sigma_k^2(s_{d,k})$ is homogeneous across state histories at stage k, i.e. $\sigma_k^2(s_{d,k}) \equiv \sigma_{k,d}^2 \equiv \sigma_k^2$, in order to reexpress $V(U_{opt})$ in terms of familiar regression quantities. Applying iterated expectation to the kth-stage term in (2.4) gives $E_d[(1-p_k)P_{k-1}(Y-\mu_k)^2] = E_d[(1-p_k)P_{k-1}]\,\sigma_k^2$. Moreover, $E_d[(1-p_k)P_{k-1}] = (1-p_k)P_{k-1}$ if the kth-stage randomization probabilities are all equal to $p_k(d_k)$, as would occur in a “balanced” SMAR trial. In this case, $V(U_{opt}) = \sigma_Y^2 + \sum_{k=1}^{K} (1-p_k)P_{k-1}\sigma_k^2$, where $\sigma_{Y,d}^2 \equiv \sigma_Y^2$ is the marginal variance of $Y_d$. Let $R_T^2 = 1 - \sigma_K^2/\sigma_Y^2$ be the coefficient of determination for the regression of $Y_d$ on $S_{d,K}$, and let $R_k^2$ denote the (population) increment in the coefficient of determination when $S_{d,k}$ is added to the regression of $Y_d$ on $S_{d,k-1}$. Then $V(U_{opt})$ becomes the product of $\sigma_Y^2$ and a variance inflation factor (VIF) determined by the $p_k$ and the $R_k^2$, leading to the sample size formula (4.10), in which the usual one-sample requirement is inflated by the VIF; here $\alpha$ is the significance level, $1-\beta$ is the power to be achieved, and ES $= (\mu_d - \mu_0)/\sigma_Y$ is the standardized difference between $\mu_d$ and the null mean $\mu_0$. For a balanced SMAR trial in which $p_k(d_k \mid \bar S_k, \bar A_{k-1}) \equiv p_k(d_k)$ for all k, the sample size formula only requires that the unknown distribution of $(S_{d,K}, Y)$ be restricted in terms of the $V_d(Y \mid s_{d,k})$, assumed homogeneous across state values $S_{d,k} = s_{d,k}$. Homogeneity of variance is a simplifying assumption typical of power calculations for fixed-treatment trials, but sequential allocation makes the assumption unlikely to hold for all K stages. More subtly, the assumed equality of the $V_d(Y \mid s_{d,k})$ algebraically transforms the stratified (nonparametric) regression structure of $V(U_{opt})$ resulting from optimal SP theory into linear association, as characterized by the $R_k^2$ in the VIF. Although the requirement of homogeneous variances does not directly restrict conditional expectations, (4.10) or (4.11) may only partially account for any nonlinearity in the $E_d(Y \mid s_{d,k})$.
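As a check on the arithmetic, the VIF and n columns of Tables 1 and 3 are consistent with a generalized z-test calculation in which the usual one-sample requirement is inflated by the VIF; a minimal sketch under that reading (names are ours, not a transcription of (4.10) or (4.11)):

```python
# Sketch of a SMAR sample size calculation assuming the form
#   n = VIF * (z_{1-alpha/2} + z_{1-beta})^2 / ES^2,
# where VIF = V(U_opt) / sigma_Y^2.
import math
from scipy.stats import norm

def smar_sample_size(vif: float, es: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(vif * z ** 2 / es ** 2)

# Example: the tabulated VIF = 1.62 with ES = 0.2 gives n = 318, close to the
# n = 320 reported in Table 1 (small gaps likely reflect rounding of the VIF).
print(smar_sample_size(1.62, 0.2))
```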
We remark that our sample size formulae are conditional on the SMAR allocation and hence may underestimate the required number of subjects; calculations based on (4.10) suggest that the typical impact of such conditioning will be negligible when constrained randomization is used.
5. SIMULATION STUDIES
Central to the performance of the sample size formula in Section 4 is how well the parametric reexpression of $V(U_{opt})$, derived assuming homogeneity of variance, reflects the nonparametric inference carried out using the estimators in Section 2. Successive stratification may lead to one or more random zeroes at intermediate stages of randomization, even if the nominal level of power is achieved (in the frequency sense); as the sample size grows, the chance of this diminishes. We conducted simulations to understand the degree to which good performance of the sample size formula across repeated samples protects the trialist from an unlucky (nonreplete) SMAR realization. Because the formula may also fail to protect against near sampling zeroes (and thereby interfere with constrained randomization), we calculated the test statistic twice, using the ML and SP estimators.
The simulation setup is structured to explicate the relationship between “repleteness,” defined as the lack of random zeroes at any intermediate stage of the SMAR experiment, and calculated sample size. Data for the ATS in the Introduction, denoted d, are generated by the following scheme. The state space for symptom severity at each stage is {1, 2, 3}; these values determine whether to adaptively continue, augment, or switch medication, using the stage-specific options specified by d. As in the STAR*D antidepressant study, $S_{d,1} \equiv S_1$ is obtained after an initial trial on the medication A; baseline values are equiprobable. The values for $S_{d,2}$ evolve according to the transition matrix (TM) with rows (0.7, 0.2, 0.1), (0.5, 0.3, 0.2), and (0.1, 0.5, 0.4), where $\mathrm{TM}_{ij} = \Pr(j \mid i)$, consistent with “healthier” subjects having greater probability of better successive outcomes. The final outcome is generated as a regression on state history. For the continuous case, $Y_d = S_{d,2}^{\mathrm{T}}\beta + e$ with $e \sim N(0, \sigma_e^2)$, where $(\beta_1, \beta_2) = (1, 2)$ and the intercept $\beta_0 = 0.5$ is the coefficient for $S_0 \equiv 1$. For the discrete case, $Y_d$ is Bernoulli with probability $p = \mathrm{logit}^{-1}(S_{d,2}^{\mathrm{T}}\beta)$, $(\beta_1, \beta_2) = (1, 2)$; $\beta_0$ is −6.0, −4.5, or −3.0 to govern the degree of nonlinearity in expected Bernoulli outcomes.
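For concreteness, a sketch (ours, not the authors' code) of the data-generating scheme just described for the continuous case; the binary case replaces the normal error with a Bernoulli draw, as noted in the comments.

```python
# Simulated states and outcome for the ATS d of the Introduction (continuous case):
# S1 equiprobable on {1, 2, 3}; S2 drawn from the transition matrix TM, with
# TM[i][j] = Pr(S2 = j + 1 | S1 = i + 1); Y = 0.5 + 1*S1 + 2*S2 + e, e ~ N(0, sigma_e^2).
import numpy as np

TM = np.array([[0.7, 0.2, 0.1],
               [0.5, 0.3, 0.2],
               [0.1, 0.5, 0.4]])
BETA0, BETA1, BETA2 = 0.5, 1.0, 2.0

def generate_states_and_outcome(rng: np.random.Generator, n: int, sigma_e: float = 1.0):
    s1 = rng.integers(1, 4, size=n)                                   # equiprobable baseline severity
    s2 = np.array([rng.choice([1, 2, 3], p=TM[s - 1]) for s in s1])   # transition given S1
    y = BETA0 + BETA1 * s1 + BETA2 * s2 + rng.normal(0.0, sigma_e, size=n)
    # Binary case (sketch): y = rng.binomial(1, expit(b0 + BETA1*s1 + BETA2*s2)),
    # with expit from scipy.special and b0 in {-6.0, -4.5, -3.0}.
    return s1, s2, y
```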
The simulation setup is further structured to investigate SP efficiency gains (relative to the MM estimator) beyond the case of a balanced design required by the analytic derivations in Section 3. In the simulations, random assignment to d depends on prior state values in the following way: subjects who are (well, in partial remission, ill) continue on d with probability (1, 1/3, 1/2); accordingly, we use formula (4.11) to calculate sample sizes. Sequential blocking is used throughout to ensure whenever possible that observed and expected allocations agree; constrained randomization also protects against nonrepleteness. Additionally, simulated trials vary by whether they use a “safe” mechanism to guarantee positive sample sizes across state histories at both stages of the trial (Lavori and Dawson, 2007). Specifically, safe implies that once the number of subjects for a particular state history falls below a certain value (set here to 6), further randomization stops and subjects with those states continue on d thereafter. The safe mechanism is intended to reflect the effects of good practice in the sense that the trialist would ensure repleteness either through design or by monitoring subject accrual during the trial.
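A sketch (ours, with simplified bookkeeping) of the state-dependent randomization to d together with the safe stopping rule; the paper's sequential blocking is replaced here by independent Bernoulli draws, so this only illustrates the logic.

```python
# State-dependent assignment to d with the "safe" mechanism: whenever fewer than
# SAFE_MIN eligible subjects share a state value, randomization stops for that group
# and all of them continue on d. Real trials would block within groups; this sketch does not.
import numpy as np

SAFE_MIN = 6
P_CONTINUE = {1: 1.0, 2: 1 / 3, 3: 0.5}   # Pr(assigned to d | current state): well, partial remission, ill

def assign_to_d(rng: np.random.Generator, states: np.ndarray, eligible: np.ndarray,
                safe: bool = True) -> np.ndarray:
    assigned = np.zeros(len(states), dtype=bool)
    for s in np.unique(states[eligible]):
        idx = np.where(eligible & (states == s))[0]
        if safe and len(idx) < SAFE_MIN:
            assigned[idx] = True                               # too few: everyone continues on d
        else:
            assigned[idx] = rng.random(len(idx)) < P_CONTINUE[s]
    return assigned
```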
For purposes of inference for μd, we set the standardized effect size in the sample size formula to be either 0.2 or 0.4. The trialist might specify the larger ES value to ensure adequate precision for individual ATS means when planning a pilot SMAR trial. The inherent “cost” in successfully implementing a whole-treatment strategy makes it unlikely that the trialist would find effects smaller than 0.2 of practical relevance.
6. RESULTS
Table 1 summarizes 2000 replications of the setup for continuous $Y_d$ for every combination of ES = 0.2, 0.4 and $\sigma_e$ = 0.5, 1, 2. Throughout, the nominal level of power to be achieved was set to 0.80, with the level of the test = 0.05. The test statistic (the difference between the estimated mean and the null value, divided by the standard error) was compared to 1.96, as suggested by asymptotic normality of the ML and SP estimators of $\mu_d$.
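Achieved power in the tables corresponds to the rejection frequency of this test across the 2000 replications; in sketch form (names are ours):

```python
# Fraction of replications whose test statistic exceeds the two-sided 5% critical value.
# `estimates` and `std_errors` hold the per-replication estimate of mu_d and its
# standard error from either the ML or the optimal SP analysis.
import numpy as np

def achieved_power(estimates, std_errors, mu_null: float) -> float:
    z = (np.asarray(estimates) - mu_null) / np.asarray(std_errors)
    return float(np.mean(np.abs(z) > 1.96))
```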
Table 1. Performance of the sample size formula for nominal power = 0.80 using either ML estimation or optimal SP estimation when $Y_d$ is continuous. VIF is calculated from the regression of $Y_d$ on $S_{d,2}$

| σ_e | ES | Safe | VIF | n | Replete (%) | Power: ML | Power: optimal SP |
|-----|-----|------|------|-----|-------------|-----------|-------------------|
| 0.5 | 0.2 | No  | 1.62 | 320 | 99.3 | 0.798 | 0.737† |
| 0.5 | 0.2 | Yes | 1.62 | 320 | 100  | 0.798 | 0.756† |
| 0.5 | 0.4 | No  | 1.62 | 80  | 59.6 | 0.818 | 0.664† |
| 0.5 | 0.4 | Yes | 1.62 | 80  | 100  | 0.817 | 0.775  |
| 1.0 | 0.2 | No  | 2.05 | 404 | 99.9 | 0.803 | 0.768† |
| 1.0 | 0.2 | Yes | 2.05 | 404 | 100  | 0.801 | 0.766† |
| 1.0 | 0.4 | No  | 2.05 | 101 | 72.2 | 0.800 | 0.734† |
| 1.0 | 0.4 | Yes | 2.05 | 101 | 100  | 0.849 | 0.826  |
| 2.0 | 0.2 | No  | 2.97 | 587 | 100  | 0.801 | 0.795  |
| 2.0 | 0.2 | Yes | 2.97 | 587 | 100  | 0.792 | 0.784† |
| 2.0 | 0.4 | No  | 2.97 | 147 | 88.6 | 0.803 | 0.780  |
| 2.0 | 0.4 | Yes | 2.97 | 147 | 100  | 0.846 | 0.847  |
†Indicates that power differed significantly for ML and optimal SP estimation (at the 0.05 level).
The results show that when ES = 0.2, the calculated sample sizes ensure repleteness for almost all experiments. By contrast, when ES = 0.40, the proportion of replete experiments among the 2000 replications ranges from 60% to 89%. One could argue that for most SMAR trials, the primary interest will be to detect moderate-sized causal effects, thereby increasing the sample size beyond that provided by the generalized t-test formula in Section 4 when ES = 0.4. Nonetheless, the simulations serve to illustrate the relevance of repleteness to good planning of a SMAR experiment, beyond the usual sample size considerations.
More striking in Table 1 are the differences in power achieved by the ML and optimal SP estimators. ML estimation is mostly robust to even substantial failures of repleteness because of its use of sample quantities in (2.1) and (2.3) based on allocated proportions. In contrast, the SP reliance on assignment probabilities precludes the optimal estimator (and its standard error) from tuning to the sample at hand. This is true even with mostly replete replications, highlighting the influence of near sampling zeroes on achieved power with SP estimation. The expansion of Table 1 in Appendix B in the supplementary materials available at Biostatistics online shows that differences in power for the two approaches are influenced much more by their differences in estimates of $\mu_d$ than by differences in estimated standard errors. The cases n = 320, 404 show this to be true even for modest losses of power, when sample sizes for some strata are too small for sequentially blocked randomization to achieve the a priori assignment probabilities. We note that the efficiency gains for ML estimation are modest, with relative efficiency running from 0.95 to 1.0, for simulated trials without the safe option turned on but using constrained randomization.
It is not surprising that the optimal estimator may sometimes be underpowered when the simulated trials use the safe option, given that certain a priori randomization probabilities may be set to zero. In contrast, ML estimation ensures nominal power in these cases, albeit conservatively for some scenarios. This property suggests that ML estimation is a suitable choice for inference, prior to the execution of the trial and any knowledge of the stochastic process underlying intermediate states. More generally, its “self-tuning” property in the face of random and near sampling zeroes reminds us that the asymptotic ML variance estimator coincides with the finite-sample one obtained from the MOM.
Table 2 shows that repleteness and near sampling zeroes have at most moderate impact on the SP efficiency gains provided by the optimal estimator; such impact occurs because of the (inversely weighted) estimates of the $\mu_k$ in $U_{opt}$. In theory, efficiency gains for fixed $\sigma_e$ should not depend on n, and simulations with excessively large sample sizes show this to be the case. For the realistic values of n in Table 2, the relative efficiency for any given value of $\sigma_e$ depends on whether the sample size was geared to ES = 0.2 or ES = 0.4. Nonetheless, the results of the simulations confirm that the strength of the relationship of state history to $Y_d$, as evidenced by the $R_T^2$ values, governs the magnitude of efficiency gains.
Table 2. Relative efficiency of the optimal SP estimator to the MM SP estimator when $Y_d$ is continuous. $R_T^2$ and the sample size n are calculated as described in Section 4

| σ_e | ES | Safe | R_T^2 | n | Replete (%) | Relative efficiency (optimal vs. MM) |
|-----|-----|------|-------|-----|-------------|--------------------------------------|
| 0.5 | 0.2 | No  | 0.95 | 320 | 99.3 | 0.425 |
| 0.5 | 0.2 | Yes | 0.95 | 320 | 100  | 0.434 |
| 0.5 | 0.4 | No  | 0.95 | 80  | 59.6 | 0.404 |
| 0.5 | 0.4 | Yes | 0.95 | 80  | 100  | 0.626 |
| 1.0 | 0.2 | No  | 0.81 | 404 | 99.9 | 0.508 |
| 1.0 | 0.2 | Yes | 0.81 | 404 | 100  | 0.511 |
| 1.0 | 0.4 | No  | 0.81 | 101 | 72.2 | 0.460 |
| 1.0 | 0.4 | Yes | 0.81 | 101 | 100  | 0.600 |
| 2.0 | 0.2 | No  | 0.52 | 587 | 100  | 0.687 |
| 2.0 | 0.2 | Yes | 0.52 | 587 | 100  | 0.682 |
| 2.0 | 0.4 | No  | 0.52 | 147 | 88.6 | 0.607 |
| 2.0 | 0.4 | Yes | 0.52 | 147 | 100  | 0.678 |
Table 3, for the binary setup, shows that the sample size formula provides close to the nominal power of 0.80 (albeit smaller at times) for at most moderate nonlinearity in expected Bernoulli outcomes ($\beta_0$ = −6.0, −4.5), and is conservative otherwise. We attribute the excessive sample sizes for the case $\beta_0$ = −3.0 to the inability of the VIF to adequately account for strong nonlinearity rather than to a marked failure of sequential homogeneity of variance, given the good performance of the normal-model setup in the presence of this type of failure (Dawson and Lavori, 2010). However, strong departures from linearity may not be an issue for many realistic applications because of the impact on $\mu_d$, which is much higher for $\beta_0$ = −3.0: $\mu_d$ = 0.82 compared to $\mu_d$ = 0.44 and 0.63 for $\beta_0$ = −6.0 and −4.5, respectively. An ATS will tend to be moderately successful (or not) in populations with sufficient response heterogeneity to make sequential treatment adaptation clinically attractive, making values of $\mu_d$ such as 0.82 unlikely to occur.
Table 3. Performance of the sample size formula for nominal power = 0.80 using either ML estimation or optimal SP estimation when $Y_d$ is binary. VIF is calculated from the regression of $Y_d$ on $S_{d,2}$

| β_0 | ES | Safe | VIF | n | Replete (%) | Power: ML | Power: optimal SP |
|------|-----|------|------|-----|-------------|-----------|-------------------|
| −6.0 | 0.2 | No  | 3.12 | 616 | 100  | 0.768 | 0.757† |
| −6.0 | 0.2 | Yes | 3.12 | 616 | 100  | 0.762 | 0.751† |
| −6.0 | 0.4 | No  | 3.12 | 154 | 90.3 | 0.772 | 0.749  |
| −6.0 | 0.4 | Yes | 3.12 | 154 | 100  | 0.804 | 0.814  |
| −4.5 | 0.2 | No  | 3.33 | 657 | 100  | 0.812 | 0.797† |
| −4.5 | 0.2 | Yes | 3.33 | 657 | 100  | 0.807 | 0.791† |
| −4.5 | 0.4 | No  | 3.33 | 164 | 93.1 | 0.829 | 0.793† |
| −4.5 | 0.4 | Yes | 3.33 | 164 | 100  | 0.855 | 0.859  |
| −3.0 | 0.2 | No  | 3.94 | 779 | 100  | 0.942 | 0.931† |
| −3.0 | 0.2 | Yes | 3.94 | 779 | 100  | 0.945 | 0.931† |
| −3.0 | 0.4 | No  | 3.94 | 195 | 96.1 | 0.991 | 0.989  |
| −3.0 | 0.4 | Yes | 3.94 | 195 | 100  | 0.997 | 0.996  |
†Indicates that power differed significantly for ML and optimal SP estimation (at the 0.05 level).
The performance of SP and ML estimation is more similar for the binary case than for continuous Yd, although larger sample sizes (expected for discrete outcomes) promote significant differences in achieved power. The impact on achieved power due to differences in estimates of μd is sometimes canceled out by the impact due to differences in estimated standard errors. When repleteness held across replications, differences in standard error had modest impact. See Appendix B in the supplementary materials available at Biostatistics online.
7. DISCUSSION
Prior SMAR sample size formulae derived from SP theory specified estimators that use the known randomization probabilities (Murphy, 2005; Feng and Wahed, 2009) and did not use the most efficient influence function as the basis for the derivations. Building upon that work, we have developed theoretical connections that not only provide a more efficient basis for sample size calculations but also help to establish the advantage of using observed (ML) rather than expected (optimal SP) allocations for planned evaluation of ATS. The better performance of ML estimation in terms of achieved power parallels the superiority of model-based weights for studies with nonrandomized treatments (less bias) or missing data (better efficiency) (see, e.g. Rotnitzky and Robins, 1995). We note that the results obtained here may be specific to the sequential context in which randomizations are adaptively nested over time.
The sample and population formulations of SP variance in this paper elucidate the central role played by response heterogeneity in determining the magnitude of sequential uncertainty. Section 3 offers a nonparametric characterization of sample response heterogeneity in terms of stage-specific between-subgroup sums of squares, which captures the sequential effect of response heterogeneity on SP efficiency. The increments in regression-based coefficients of determination defined in Section 4 provide the parametric counterparts at the population level and describe the sequential effect of response heterogeneity on sample size requirements. Less apparent is the intrinsic role of response heterogeneity in estimators developed for SMAR data. The entire premise of an ATS relies on a strong relationship between outcome and the states on which decisions are based. Because the SMAR design mimics sequential decision making, the missingness intentionally created by sequential (nested) randomization is governed implicitly by variation in responses across states for any given strategy. In the absence of such variation, treatment assignment at any given stage reduces to a flip of a fair coin, making sequential adjustment for state history unnecessary. For certain estimators, such as the ML and optimal SP ones considered here, the adjustment for SMAR missingness that guarantees consistency also reaps the usual efficiency gains, as translated to the sequential context.
The sample size formulae we developed apply directly to inference for a single ATS but require extension for paired comparisons. In Dawson and Lavori (2010), we use the ML formulation of variance to derive an analytic approximation to (positive) between-strategy covariance created by sequential nested randomization and adjust sample sizes for pairwise comparisons accordingly. The adjusted sample size formulae are the basis of a method we establish for sizing a SMAR trial with the goal of fully powering all pairs of strategies deemed “distinct” (defined in terms of effect size).
The results in this paper emphasize the importance of running a “tight” trial, using sequentially constrained randomization in combination with some version of an a priori designated safe option. The trialist should also consider whether the calculated sample size will sufficiently protect against sparse data and whether a larger number of subjects might circumvent the need for a safe option, which effectively truncates the ATS under evaluation. The simulation setup provides one means to translate clinical judgments about intermediate response rates into the frequentist probability of experimental repleteness. The trialist can also use the simulation setup to “firm up” guesses for variance inflation factors when more than moderate nonlinearity is suspected.
SUPPLEMENTARY MATERIAL
Supplementary material is available at http://biostatistics.oxfordjournals.org.
FUNDING
The National Institute of Mental Health (R01-MH51481 to Stanford University).
Conflict of Interest: None declared.
