Borrowing of information across patient subgroups in a basket trial based on distributional discrepancy

Summary
Basket trials have emerged as a new class of efficient approaches in oncology to evaluate a new treatment in several patient subgroups simultaneously. In this article, we extend the key ideas to disease areas outside of oncology, developing a robust Bayesian methodology for randomized, placebo-controlled basket trials with a continuous endpoint to enable borrowing of information across subtrials with similar treatment effects. After adjusting for covariates, information from a complementary subtrial can be represented as a commensurate prior for the parameter that underpins the subtrial under consideration. We propose using distributional discrepancy to characterize the commensurability between subtrials for appropriate borrowing of information through a spike-and-slab prior, which is placed on the prior precision factor. When the basket trial has at least three subtrials, commensurate priors for point-to-point borrowing are combined into a marginal predictive prior, according to the weights transformed from the pairwise discrepancy measures. In this way, only information from subtrial(s) with the most commensurate treatment effect is leveraged. The marginal predictive prior is updated to a robust posterior by the contemporary subtrial data to inform decision making. Operating characteristics of the proposed methodology are evaluated through simulations motivated by a real basket trial in chronic diseases. The proposed methodology has advantages compared to other selected Bayesian analysis models, for (i) identifying the most commensurate source of information and (ii) gauging the degree of borrowing from specific subtrials. Numerical results also suggest that our methodology can improve the precision of estimates and, potentially, the statistical power for hypothesis testing.


Introduction
There has been an increasing interest in precision medicine (Mirnezami et al., 2012; Schork, 2015) over the past few decades. Rapid advances in genomics and biomarkers allow stratification of patients into subgroups that may benefit differently from new treatments. Unlike the one-size-fits-all concept in conventional paradigms of clinical drug development, the aim of precision medicine is to target the right treatments to the right patients at the right time. In the era of precision medicine, new trial designs have been developed, several of which are examples of master protocols (Woodcock and LaVange, 2017; Renfro and Mandrekar, 2018) to study multiple diseases or multiple agents, or sometimes both. One well-known class of master protocols is basket trials (Renfro and Sargent, 2017). In the simplest formulation, basket trials evaluate a single targeted agent in patients who share a common feature but may present various disease subtypes. It is administratively more efficient to plan a basket trial than a number of separate trials, one for each small subgroup. Moreover, advantages also include addressing multiple research questions simultaneously, for instance, which subgroup(s) of patients may benefit and to what extent.
When analysing early phase basket trials, one major concern is the potential heterogeneity of the treatment effect in various patient subgroups. Investigators are faced with the dilemma of discarding or incorporating data from other subgroups to reach a decisive conclusion about the treatment effect for a specific subgroup. The option of using concurrent information from other subgroups is intriguing, as it may lead to a considerable increase in the statistical power of the study. This should be balanced with the risk that the treatment effect in an important patient subgroup may be overlooked or missed. Conventional analysis strategies such as stand-alone analyses (also known as the approach of no borrowing) and complete pooling irrespective of subgroup labels have been criticised. Some authors propose using hierarchical models, as a compromise between the two limiting options, to enable borrowing of information across subgroups (Thall et al., 2003; Thall and Wathen, 2008; Berry et al., 2013). This well-established approach to borrowing is based upon the classical concept of exchangeability (Bernardo, 1996): it acknowledges that the magnitude of clinical benefit may differ, but nothing is known a priori to suppose patients of some subgroups benefit more than others. Neuenschwander et al. (2016) discuss a robust extension to the standard hierarchical models by including the possibility of non-exchangeability for each parameter (vector) that underpins the clinical benefit in a subgroup. Their approach permits an extreme subgroup not to be overly influenced by other subgroups in situations of data inconsistency.
Additional concerns about the subgroup effect are essential in precision medicine. Often, the targeted therapy is effective only in some subgroups, and certain subgroups show more similar clinical benefit between themselves than with others. Several variations of standard hierarchical modelling have been considered suitable for the context of basket trials to implement borrowing of information (Liu et al., 2017; Chu and Yuan, 2018). Modifications are motivated mainly by (i) justification of plausible clustering of similar subgroups, and (ii) quantification of the magnitude to which a subgroup-specific parameter should be shrunk towards the population mean. Most recently, more sophisticated methods in the framework of Bayesian model averaging (Madigan and Raftery, 1994; Draper, 1995) have been applied to analysing basket trials. Psioda et al. (2019) average over the complete model space, which is constituted by all models for possible configurations of the subgroups that may demonstrate the same or disparate efficacy. In a model that assumes identical treatment effect among certain subgroups, information is pooled across the corresponding subgroups under the assumption of interpatient exchangeability. The number of models to be included in the complete model space for averaging increases exponentially with the number of subgroups involved in the basket trial. Hobbs and Landin (2018) enumerate all possible subgroup pairs, wherein the parameters are considered to be either exchangeable or non-exchangeable. Instead of considering the prior probability of an individual model for a subgroup pair to be true, they use the product of the prior probabilities that the parameters of each subgroup pair are exchangeable or not. This can greatly reduce the computational complexity, compared with conventional Bayesian model averaging.
To date, analysis methods for basket trials have predominantly been proposed in oncology, where each subtrial testing the treatment in a specific subgroup is single-arm with a binary endpoint. In this paper, we propose methodology motivated by a randomised, placebo-controlled phase II basket trial, which is being undertaken in patients with several related conditions that share symptoms. Clinical efficacy attributed to the treatment in every subtrial is recorded on a continuous outcome. We build our work upon commensurate priors (Hobbs et al., 2011, 2012) for parameters of a general linear model. The commensurate priors approach can be seen as a special kind of hierarchical modelling, which is more advantageous for leveraging external information in situations of just a few relevant studies. Using it can facilitate the inferences with respect to all possible pairwise borrowing of information between K subgroups involved in a basket trial, according to a formal assessment of information commensurability.
Given any subtrial information external to the one of analysis interest, a commensurate prior can be specified. Specifically, it is a normal predictive prior centred at the external subtrial data parameter, with a precision parameter to capture the commensurability of the treatment effect that underpins the external and contemporary subtrials. We explore an empirical choice of the spike-and-slab prior (Mitchell and Beauchamp, 1988) to be placed on the precision parameter, which determines the degree of borrowing. For overcoming prior-data conflict, we propose using a distributional discrepancy measure to characterise the commensurability of information between any two subtrials. It quantifies the probability mass to be placed on the "spike" prior for strong borrowing, and that on the "slab" prior for discounting inconsistent information from an external subtrial, respectively. This discrepancy measure meanwhile discerns the external subtrials (when at least three subgroups are involved) according to their relative commensurability, and can therefore encourage differential borrowing of information for inferences of the model parameter specific to a subtrial. The proposed analysis methodology for information sharing is fundamentally different from the existing approaches. It avoids the limiting assumption about exchangeability for parameters of certain or all subgroups; instead, the distributional discrepancy measure informs the one-to-one borrowing from each external subgroup. In contrast to the Bayesian model averaging strategies, our approach presents great simplicity, particularly when the investigation involves a large number of subgroups.
The remainder of this paper is structured as follows. We describe the motivating example and decision criteria in Section 2. In Section 3, we present our analysis methodology and discuss how a discrepancy measure may help make appropriate use of basket trial data. In Section 4, we perform a simulation study to evaluate the operating characteristics of phase II basket trials that would have been analysed using the proposed methodology, and compare our Bayesian model with some alternative analysis models. We close with a discussion of our findings and future research arising in Section 5.

Motivating example and notation
We use a randomised, placebo-controlled phase II basket trial as a motivating example, which evaluates a new treatment for cognitive dysfunction in patients with primary biliary cholangitis (PBC) and Parkinson's disease (PD). This clinical trial is led by Newcastle University; at the time of writing, it has been funded but not yet opened to participants. Patients are to be recruited and stratified into three disjoint subgroups, representing different clinical conditions. The PBC and PD basket trial thus comprises three subtrials, which hereafter will be referred to as modules. A continuous outcome measuring cognitive performance will be used as the clinical endpoint in each module. Once the trial begins, patients of each module will be randomised to receive either the new treatment or a placebo.
Letting $k = 1, \dots, K$ index the modules, we suppose there are $n_k$ patients recruited in module $k$. The binary indicator with respect to the treatment assignment is denoted by $T_{ik}$ for patient $i$, $i = 1, \dots, n_k$. Specifically, $T_{ik} = 1$ if patient $i$ in module $k$ was allocated to treatment and $0$ if placebo. The trial data collected from patient $i$ of module $k$ comprise some covariates denoted by $z_{ik} = \{z_{ik1}, \dots, z_{ikp}\}$, and a post-treatment outcome of clinical interest, denoted by $y_{ik}$. The trial data can be fitted with a linear regression model given by:

$$y_{ik} = z_{ik}\gamma_k^{\top} + T_{ik}\theta_k + \epsilon_{ik}, \tag{1}$$

where $\gamma_k$ is a $(1 \times p)$ coefficient vector representing the main effects of the covariates and $\epsilon_{ik}$ is a zero-mean random error. We are interested in estimating the parameter $\theta_k$ that underpins the treatment effect in patients from a specific module, since within each module $E(y_{ik} \mid z_{ik}, T_{ik} = 0) = z_{ik}\gamma_k^{\top}$ and $E(y_{ik} \mid z_{ik}, T_{ik} = 1) = z_{ik}\gamma_k^{\top} + \theta_k$, which leads to an estimator of the treatment effect over a placebo, denoted by $\Delta(T_{ik}) = \theta_k$.
The principal aim of this basket trial is to estimate the treatment effect per module. A more accurate estimate for $\theta_k$ helps to support the decision as to whether a phase III trial of the treatment should go ahead, and in which patient subpopulation(s). Moreover, inferences based on evidence of the basket trial can inform the design of a future trial, such as computing the sample size to sufficiently power the trial. With respect to either continuing or halting the clinical evaluation, trial decisions per module can be framed as a decision between Go and No-go. For cases where the randomised basket trial is analysed in a Bayesian framework (Spiegelhalter et al., 1994), we follow Freedman et al. (1984) and consider testing the hypothesis of treatment effect that corresponds to a range of null $\theta_k$'s in each module:

$$H_{0k}: \theta_k \le 0 \quad \text{versus} \quad H_{1k}: \theta_k > 0.$$

In addition, a threshold $\delta_U$ will be needed to represent the magnitude of improvement required to conclude the new treatment is sufficiently promising to warrant further trials. It is not uncommon to set $\delta_U$ to a value greater than 0. Conclusions may be drawn by relating the inferences concerning the module-specific parameters $\theta_k$, say, the posterior distribution of $\theta_k$, to the threshold $\delta_U$. We now define a formal criterion as follows:
(i) A Go decision will be adopted for module $k$ if $\mathbb{P}(\theta_k > \delta_U \mid \text{data}) > \zeta$;
(ii) otherwise, a No-go will be taken.
This decision criterion consists of a probability threshold $\zeta \in [0, 1]$ to represent our confidence in the advantages of the new treatment over the control. We note choices such as $\zeta = 0.90$ may be appropriate for the context. In what follows, we move on to present a novel analysis methodology to enable robust borrowing of information between consistent modules of a basket trial.
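The Go/No-go criterion can be illustrated with a minimal sketch that estimates the posterior probability from MCMC draws of $\theta_k$. The function name `go_decision` and its arguments are hypothetical; the defaults $\delta_U = 0.25$ and $\zeta = 0.90$ are the values used later in the simulation study, and the criterion is assumed to be a strict inequality.

```python
from typing import Sequence


def go_decision(theta_draws: Sequence[float],
                delta_u: float = 0.25,
                zeta: float = 0.90) -> bool:
    """Return True for Go and False for No-go, for a single module."""
    if len(theta_draws) == 0:
        raise ValueError("at least one posterior draw is required")
    # Monte Carlo estimate of P(theta_k > delta_U | data)
    exceed = sum(1 for theta in theta_draws if theta > delta_u)
    posterior_prob = exceed / len(theta_draws)
    # Go only if the probability strictly exceeds the threshold zeta
    return posterior_prob > zeta
```

For example, if 95% of the posterior draws lie above 0.25 the module receives a Go, whereas exactly 90% would not clear a threshold of 0.90.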

Methods
In this section, we consider a formal incorporation of external module data into the analysis of a contemporary module in the Bayesian framework. Specifically, information from an external module will be used to formulate a commensurate predictive prior (CPP), first proposed by Hobbs et al. (2011, 2012), for the parameter that underpins the treatment effect in the module of the current analysis interest.
We suppose the patient-level trial data can be modelled using a linear regression in the form of (1), and $\theta_k$ that underpins the treatment effect specific to a module is a continuous location parameter. Let $x_k$ denote the data collected from module $k$. The information contained in module $k$ can be represented by a posterior distribution that is proportional to the product of the likelihood and an initial vague prior, denoted by $\pi_{0k}(\theta_k)$:

$$\pi_k(\theta_k \mid x_k) \propto L(x_k \mid \theta_k)\, \pi_{0k}(\theta_k).$$

We label the module of our current analysis interest with $k^\star$. For estimating each $\theta_{k^\star}$, there are $(K - 1)$ CPPs, each specified based upon $x_k$ from an external module $k \neq k^\star$, respectively. Following Hobbs et al. (2011), we introduce a precision parameter, denoted by $\nu_{kk^\star}$, which parameterises the consistency between an external parameter $\theta_k$ and the target parameter $\theta_{k^\star}$ explicitly. Each of the CPPs for our target parameter $\theta_{k^\star}$, conditional on the knowledge about information consistency, can be given by

$$\theta_{k^\star} \mid \theta_k, \nu_{kk^\star} \sim N\left(\theta_k, \nu_{kk^\star}^{-2}\right),$$

where $\theta_k$ is inferred from $x_k$, and the precision $\nu_{kk^\star}$ would control the degree of borrowing. This then leads to a CPP for $\theta_{k^\star}$, jointly with the unknown precision $\nu_{kk^\star}$:

$$\pi(\theta_{k^\star}, \nu_{kk^\star} \mid x_k) \propto g_k(\nu_{kk^\star}) \int \nu_{kk^\star}\, \Phi\big(\nu_{kk^\star}(\theta_{k^\star} - \theta_k)\big)\, \pi_k(\theta_k \mid x_k)\, d\theta_k, \tag{4}$$

where $\Phi(\cdot)$ is the standard normal probability density function, and $g_k(\nu_{kk^\star})$ is the prior for the one-to-one precision $\nu_{kk^\star}$ that characterises the commensurability between modules $k$ and $k^\star$. If the external module is consistent, i.e., given that $\nu_{kk^\star}^2$ is far greater than 0, the marginal CPP for $\theta_{k^\star}$ converges to the posterior $\pi_k(\theta_k \mid x_k)$ that is updated from $\pi_{0k}(\theta_k)$, so that information contained in module $k$ will be largely incorporated.
A spike-and-slab distribution (Mitchell and Beauchamp, 1988) has been found suitable as a prior for the normal precision (Hobbs et al., 2012, 2013). This is a discrete mixture prior with two components, which provides a means for robust borrowing. Specifically, it is defined as locally uniform between two limits, $0 \le B_1 < B_2$, except for some portion of probability mass concentrated at a point, $S > B_2$, such that

$$\Pr(\nu_{kk^\star} < u) = w_{kk^\star}\, \frac{u - B_1}{B_2 - B_1}, \quad B_1 \le u \le B_2,$$

and

$$\Pr(\nu_{kk^\star} > B_2) = \Pr(\nu_{kk^\star} = S) = 1 - w_{kk^\star},$$

where $w_{kk^\star}$ is the probability that $\nu_{kk^\star} \sim \text{Unif}(B_1, B_2)$. When there is an external module $k$ with a sufficiently consistent treatment effect, we expect strong borrowing of information from $x_k$ for estimating $\theta_{k^\star}$. This requires that the normal precision $\nu_{kk^\star}$ of the CPP take a large value, which is possible when the "spike" prior has a large probability mass, that is, when $w_{kk^\star}$ is sufficiently small. Otherwise, information should be down-weighted by allocating more probability mass to the "slab" prior. Instead of setting a prior for $w_{kk^\star}$, we propose linking it with a discrepancy measure, which quantifies the divergence between the probability densities, $\pi_k(\theta_k \mid x_k)$ and $\pi_{k^\star}(\theta_{k^\star} \mid x_{k^\star})$, arising from the same form of initial prior, $\pi_{0k}(\theta_k)$ and $\pi_{0k^\star}(\theta_{k^\star})$, by including the module-wise data $x_k$ and $x_{k^\star}$, respectively.
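To make the two-component prior concrete, the following minimal sketch draws one value of the precision parameter from a spike-and-slab distribution. The function name `sample_nu` is hypothetical; the default limits $B_1 = 0.001$, $B_2 = 10$ and spike location $S = 100$ are the settings reported in the simulation study, and `w_slab` plays the role of the "slab" probability.

```python
import random


def sample_nu(w_slab: float,
              b1: float = 0.001,
              b2: float = 10.0,
              spike: float = 100.0,
              rng=random) -> float:
    """Draw one precision value from a spike-and-slab prior."""
    if not 0.0 <= w_slab <= 1.0:
        raise ValueError("w_slab must be a probability in [0, 1]")
    # With probability w_slab the draw comes from the uniform "slab",
    # which corresponds to weak borrowing from the external module.
    if rng.random() < w_slab:
        return rng.uniform(b1, b2)
    # Otherwise all mass sits at the "spike", giving strong borrowing.
    return spike
```

Setting `w_slab` close to 0 returns the spike value almost always, mirroring the case of a highly commensurate external module.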
One viable option is the Hellinger distance (Dey and Birmiwal, 1994), which characterises discrepancy between two probability densities, say, $\pi_k(\theta_k \mid x_k)$ and $\pi_{k^\star}(\theta_{k^\star} \mid x_{k^\star})$:

$$d_{\phi_H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}) = \left\{ \frac{1}{2} \int \left( \sqrt{\pi_k(\theta \mid x_k)} - \sqrt{\pi_{k^\star}(\theta \mid x_{k^\star})} \right)^2 d\theta \right\}^{1/2} = \left\{ 1 - \int \sqrt{\pi_k(\theta \mid x_k)\, \pi_{k^\star}(\theta \mid x_{k^\star})}\, d\theta \right\}^{1/2}.$$

Derivable from the Cauchy-Schwarz inequality, the computed Hellinger distance $d_{\phi_H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}})$ will strictly fall into the interval $[0, 1]$, which is convenient for quantifying probability. We may then relate the "slab" prior probability $w_{kk^\star}$ to the computed Hellinger distance, simply by stipulating

$$w_{kk^\star} = d_{\phi_H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}).$$

In the extreme case that the evidence from the two modules is perfectly consistent, i.e., $d_{\phi_H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}) \to 0$, the whole probability mass will be concentrated at the "spike", $S$. In turn, this will result in a small normal variance $1/\nu_{kk^\star}^2$ of the CPP in (4) such that the information from an external module $k$ can be fully incorporated. As well as these reasons for preferring the Hellinger distance to other distributional discrepancy measures, it also has the desirable property of being invariant to any one-to-one transformation, for example, logarithmic, exponential, or inverse of square root, of both densities (Jeffreys, 1961). Imagine that the linear treatment-efficacy model in the form of (1) is established using a different parameterisation, where changes in the continuous outcome are explained by, say, the exponential of the model parameter $\theta_k$; this invariance property ensures

$$d_{\phi_H}\big(\pi_{\exp(\theta_k)}, \pi_{\exp(\theta_{k^\star})}\big) = d_{\phi_H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}}).$$

In other words, computing the Hellinger distance between the densities of $\theta_k$ and $\theta_{k^\star}$ truly informs the discrepancy between the treatment effect on the clinical endpoint across patient subgroups.
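As an illustration of how the distance behaves, the sketch below evaluates the Hellinger distance in closed form for two normal densities. Treating both module-wise posteriors as normal is an assumption made here for simplicity (in practice the posteriors would come from MCMC output), and the function name `hellinger_normal` is hypothetical.

```python
import math


def hellinger_normal(mu1: float, sd1: float, mu2: float, sd2: float) -> float:
    """Hellinger distance between N(mu1, sd1^2) and N(mu2, sd2^2)."""
    if sd1 <= 0 or sd2 <= 0:
        raise ValueError("standard deviations must be positive")
    var_sum = sd1 ** 2 + sd2 ** 2
    # Bhattacharyya coefficient: integral of sqrt(f * g) for two normals
    bc = math.sqrt(2.0 * sd1 * sd2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Guard against tiny negative values caused by rounding
    return math.sqrt(max(0.0, 1.0 - bc))
```

Identical densities give a distance of 0 (all prior mass on the "spike"), while well-separated densities give a distance close to 1 (nearly all mass on the "slab").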
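We now turn our attention to combining the $(K - 1)$ CPPs for obtaining a proper prior for our target parameter $\theta_{k^\star}$. As the CPPs are robust in that inconsistent information from any external module $k$ can be down-weighted through the "slab" prior, each is capable of describing well the parameter space of $\theta_{k^\star}$. Recall that if values of the precision $\nu_{kk^\star}$ are appropriately specified, we may obtain $(K - 1)$ normal predictive priors marginally on $\theta_{k^\star}$:

$$\pi_k(\theta_{k^\star} \mid x_k, \nu_{kk^\star}), \quad k \neq k^\star,$$

where each one separately may be represented as a $N(\lambda_k, \xi_k^2)$ distribution for ease of notation. For any $\theta_{k^\star}$, there exist $(K - 1)$ external modules as the sources from which the possible values can be drawn. We may thus see $\theta_{k^\star}$ as a weighted sum of random variables normally distributed with $N(\lambda_k, \xi_k^2)$. Specifically, the linear combination of $(K - 1)$ terms is stipulated as

$$\theta_{k^\star} = \sum_{k \neq k^\star} p_{kk^\star}\, \tilde{\theta}_k, \tag{8}$$

where we suppose the hypothetical random variables $\tilde{\theta}_k \sim N(\lambda_k, \xi_k^2)$ and each set of the weights $p_{k^\star} = (p_{1k^\star}, p_{2k^\star}, \dots)$, containing $(K - 1)$ elements, satisfies $\sum_{k \neq k^\star} p_{kk^\star} = 1$. This further gives

$$\theta_{k^\star} \mid x_{(-k^\star)} \sim N\left( \sum_{k \neq k^\star} p_{kk^\star} \lambda_k,\ \sum_{k \neq k^\star} p_{kk^\star}^2 \xi_k^2 \right), \tag{9}$$

in which $x_{(-k^\star)}$ denotes the entire trial data collected from all modules except module $k^\star$. With this, one is able to leverage information from multiple external modules. This marginal predictive prior, denoted by $\pi_{\text{MPP}}(\theta_{k^\star} \mid x_{(-k^\star)})$, is updated to the posterior, using the subtrial data from module $k^\star$ as follows:

$$\pi(\theta_{k^\star} \mid x_{k^\star}, x_{(-k^\star)}) \propto L(x_{k^\star} \mid \theta_{k^\star})\, \pi_{\text{MPP}}(\theta_{k^\star} \mid x_{(-k^\star)}).$$

This allows decision making for module $k^\star$.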
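The combination step can be sketched numerically as follows. The first function forms the mean and variance of the marginal predictive prior from the $(K - 1)$ normal components, treating the hypothetical variables in (8) as independent. The second applies a conjugate normal update, which is a simplifying assumption made here: it summarises the contemporary module's likelihood for the treatment effect by a normal with known mean and variance, whereas the paper fits the full regression model by MCMC. Both function names are hypothetical.

```python
from typing import Sequence, Tuple


def marginal_predictive_prior(means: Sequence[float],
                              variances: Sequence[float],
                              weights: Sequence[float]) -> Tuple[float, float]:
    """Mean and variance of a weighted sum of independent normal variables."""
    if not (len(means) == len(variances) == len(weights)):
        raise ValueError("inputs must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    mean = sum(p * m for p, m in zip(weights, means))
    var = sum(p * p * v for p, v in zip(weights, variances))
    return mean, var


def normal_update(prior_mean: float, prior_var: float,
                  data_mean: float, data_var: float) -> Tuple[float, float]:
    """Conjugate normal update (assumed normal likelihood summary)."""
    post_precision = 1.0 / prior_var + 1.0 / data_var
    post_mean = (prior_mean / prior_var + data_mean / data_var) / post_precision
    return post_mean, 1.0 / post_precision
```

With three identical external components and equal weights, the prior variance is one third of the component variance, and the update with an equally informative contemporary module divides the component variance by four.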
To determine the weight allocated to each external module, we expect $p_{kk^\star}$ to take a large value if information from module $k$ is consistent with module $k^\star$. Recall that we have used the Hellinger distance to describe the pairwise commensurability of information between modules. Choices of these weights $p_{kk^\star}$ may be guided by the computed pairwise Hellinger distances contained in a $K \times K$ symmetric matrix:

$$\begin{pmatrix} 0 & d_{12} & \cdots & d_{1K} \\ d_{21} & 0 & \cdots & d_{2K} \\ \vdots & \vdots & \ddots & \vdots \\ d_{K1} & d_{K2} & \cdots & 0 \end{pmatrix},$$

where for ease of notation we have replaced $d_{\phi_H}(\pi_{\theta_k}, \pi_{\theta_{k^\star}})$ with $d_{kk^\star}$. We can break the distance matrix into $K$ vectors: each column $k^\star = 1, \dots, K$ describes the information consistency between $\theta_k$ and $\theta_{k^\star}$, for $k \neq k^\star$. These vectors can then be normalised into a series of weights $p_{kk^\star} \in [0, 1]$. For this, we will simply consider an analogue of the normal distribution likelihood by stipulating

$$p_{kk^\star} = \frac{\exp(-d_{kk^\star}^2 / s_0)}{\sum_{q \neq k^\star} \exp(-d_{qk^\star}^2 / s_0)}, \tag{11}$$

where a pre-defined $s_0$ governs how much influence the Hellinger distance has on the weight to be computed. With a very large value $s_0 \to \infty$, nearly the same weight will be allocated to every external module, even if the pairwise Hellinger distances are very different from each other. In contrast, with $s_0 \to 0^+$, the weight corresponding to a Hellinger distance close to 0 tends to 1. Weights converted from the pairwise Hellinger distances following (11) can then be used to weight each $\tilde{\theta}_k$, as was stipulated in (8).
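The normalisation of distances into weights can be sketched as below. The Gaussian-type kernel $\exp(-d^2/s_0)$ is an assumption about the exact form of (11), chosen because it reproduces the limiting behaviour described above; the function name `borrowing_weights` is hypothetical, and the default $s_0 = 0.1$ is the value used in the simulation study.

```python
import math
from typing import List, Sequence


def borrowing_weights(distances: Sequence[float], s0: float = 0.1) -> List[float]:
    """Turn Hellinger distances to one target module into weights summing to 1."""
    if s0 <= 0:
        raise ValueError("s0 must be positive")
    if len(distances) == 0:
        raise ValueError("at least one external module is required")
    squared = [d * d for d in distances]
    # Subtracting the smallest squared distance leaves the weights unchanged
    # but avoids dividing by zero when s0 is very small.
    smallest = min(squared)
    kernels = [math.exp(-(sq - smallest) / s0) for sq in squared]
    total = sum(kernels)
    return [k / total for k in kernels]
```

For instance, `borrowing_weights([0.1, 0.9])` places nearly all of the weight on the first, more commensurate module, while equal distances yield equal weights.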
We would like to add one more note here. The stipulation of weights $p_{kk^\star}$ summing to 1 does not restrict the potential of full borrowing of information in situations where all the $(K - 1)$ external modules are perfectly consistent with module $k^\star$. In such a scenario, the Hellinger distance $d_{kk^\star} = 0$, suggesting that the CPPs marginally on $\theta_{k^\star}$, represented by $N(\lambda_k, \xi_k^2)$, have identical mean and variance to those of $\pi_{k^\star}(\theta_{k^\star} \mid x_{k^\star})$. Moreover, equal weights, i.e., $p_{kk^\star} = \frac{1}{K-1}$, will be allocated to the modules external to module $k^\star$, respectively. Following (9), a predictive prior $\pi_{\text{MPP}}(\theta_{k^\star} \mid x_{(-k^\star)})$ would be obtained with its mean as $\lambda_k$ and variance as $\frac{1}{K-1}\xi_k^2$. With the inclusion of $x_{k^\star}$, the posterior mean and variance become $\lambda_k$ and $\frac{1}{K}\xi_k^2$, respectively. This indicates the external module data $x_{(-k^\star)}$ have been fully incorporated, and our methodology converges to the approach of complete pooling in the case of perfect information consistency.
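The arithmetic behind this limiting case can be written out as a short worked derivation. It is a sketch that assumes every marginal CPP is $N(\lambda, \xi^2)$, that the hypothetical variables in (8) are independent, and that the data of the module under analysis contribute a normal likelihood with the same mean and variance; the shared symbols $\lambda$ and $\xi^2$ are shorthand introduced here.

```latex
\begin{align*}
\text{prior variance}
  &= \sum_{k \neq k^\star} \left(\frac{1}{K-1}\right)^{2} \xi^{2}
   = \frac{\xi^{2}}{K-1}, \\
\text{posterior precision}
  &= \frac{K-1}{\xi^{2}} + \frac{1}{\xi^{2}}
   = \frac{K}{\xi^{2}}, \\
\text{posterior variance}
  &= \frac{\xi^{2}}{K}, \qquad
\text{posterior mean} = \lambda .
\end{align*}
```

The posterior variance $\xi^2 / K$ is what one would obtain by pooling $K$ equally informative modules, which is why the limiting case coincides with complete pooling.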

Simulation study
In this section, we illustrate applications of the proposed analysis methodology, and compare it with alternative Bayesian models that may be used for analysing basket trials through a simulation study.
Our trial examples are hypothetical, but can represent the situation of a phase II basket trial, for which the analyses are performed to enable sharing of information. The main characteristics of the study we simulate are based on the motivating PBC and PD trial described in Section 2. For illustrative purposes, we assume six modules instead of three, as typically a fairly large number of patient subgroups would be examined; see, for example, Hyman et al. (2015) and Hyman et al. (2018) which report the results from basket trials with six and nine modules, respectively.

Basic settings
We simulate basket trials with $K = 6$ modules of unequal sample sizes: $n_k \in \{10, 14, 10, 20, 16, 20\}$ for $k = 1, \dots, 6$, respectively. Within each module, patients are allocated equally to receive the new treatment or a placebo. We simulate two covariates for each patient as $z_{ik1} \sim N(6, 0.1^2)$ and $z_{ik2} \sim N(4, 0.1^2)$, for $i = 1, \dots, 90$. In particular, $z_{ik1}$ is assumed to be the baseline measurement of the endpoint of clinical interest at the time of randomisation. We generate the trial data from a linear regression:

$$y_{ik} = \gamma_{1k} z_{ik1} + \gamma_{2k} z_{ik2} + T_{ik}\theta_k + \epsilon_{ik}, \quad \epsilon_{ik} \sim N(0, \sigma^2),$$

for $i = 1, \dots, 90$, $k = 1, \dots, 6$, where we set the "true" parameter values for the effects of covariates to $\gamma_{1k} = 1$ and $\gamma_{2k} = 1.3$, and $\sigma = 0.2$.
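The data-generating step for a single module might be sketched as follows. It assumes the regression has no intercept, as suggested by the listed "true" parameter values, and implements equal allocation by shuffling a balanced list of arm labels; the function name `simulate_module` and the tuple layout are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def simulate_module(n: int, theta: float,
                    gamma1: float = 1.0, gamma2: float = 1.3,
                    sigma: float = 0.2,
                    seed: Optional[int] = None) -> List[Tuple[float, float, float, int]]:
    """Simulate (y, z1, z2, T) for n patients of one module."""
    rng = random.Random(seed)
    # Equal allocation: half of the patients receive the new treatment
    arms = [1] * (n // 2) + [0] * (n - n // 2)
    rng.shuffle(arms)
    rows = []
    for t in arms:
        z1 = rng.gauss(6.0, 0.1)  # baseline measurement of the endpoint
        z2 = rng.gauss(4.0, 0.1)  # second covariate
        y = gamma1 * z1 + gamma2 * z2 + t * theta + rng.gauss(0.0, sigma)
        rows.append((y, z1, z2, t))
    return rows
```

A full trial replicate would call this once per module, for example with sample sizes 10, 14, 10, 20, 16 and 20 and the scenario-specific treatment effects.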
It is optional to borrow information across modules or not at all for estimating the module-specific parameters other than $\theta_k$. Throughout, we consider random effects for $\gamma_{1k}$ and $\gamma_{2k}$ (thus information is also shared for estimating these coefficients):

$$\gamma_{1k} \sim N(\chi_1, \varsigma_1^2), \qquad \gamma_{2k} \sim N(\chi_2, \varsigma_2^2),$$

where inverse gamma priors $\Gamma^{-1}(0.001, 0.001)$ are used for the variances $\varsigma_1^2$ and $\varsigma_2^2$ that account for the between-module heterogeneity, respectively. For $\chi_1$ and $\chi_2$, an uninformative normal prior $N(0, 10^2)$ is placed on each.
To implement our methodology for estimating $\theta_k$, we set $B_1 = 0.001$, $B_2 = 10$ and $S = 100$ for the spike-and-slab prior on each $\nu_{kk^\star}$. The "slab" prior is thus very uninformative, and the "spike" prior would essentially lead to a full incorporation of subtrial data from an external module in situations of perfect information commensurability. An initial vague prior $\pi_{0k}(\theta_k)$ is used for $\theta_k$, $k = 1, \dots, 6$; we use $N(0, 10^2)$ such that the prior mean, together with the 95% credible interval, is 0 (-19.560, 19.560), covering a wide range of possible $\theta_k$. To yield a large (small) weight $p_{kk^\star}$ corresponding to a small (large) Hellinger distance, we let $s_0 = 0.1$ to combine information from all available modules. Whilst $\theta_k$ could be estimated using various alternative Bayesian analysis models, we are interested in comparing the performance of our methodology with the following three:
1. Standard hierarchical model (HM) that assumes fully exchangeable parameters:

$$\theta_k \mid \mu, \tau \sim N(\mu, \tau^2), \quad k = 1, \dots, 6,$$

with $\mu \sim N(0, 1^2)$ and $\tau \sim HN(0.25)$. Here, $HN(z)$ is a half-normal distribution, defined as a $N(0, z^2)$ truncated to cover the interval $(0, \infty)$. The use of a half-normal prior is consistent with the findings reported by Cunanan et al. (2019).
2. Bayesian model with no borrowing of information. Trial data are stratified by modules for standalone analyses, setting each $\theta_k \sim N(0, 10^2)$. Since random effects for $\gamma_{1k}$ and $\gamma_{2k}$ cannot be estimated due to the stratification, we stipulate an uninformative prior $N(0, 10^2)$ on each for implementation.
3. EXNEX model by Neuenschwander et al. (2016), with equal prior probabilities of exchangeability (EX) and non-exchangeability (NEX). The EX distribution has the same parameter configuration as what was stipulated for the standard HM above, and the six NEX distributions are all set to be $N(0, 10^2)$.
Comparison is in terms of the precision of their posterior point estimates, more specifically, the posterior means, for $\theta_k$ that could be measured by an analogue of bias and mean squared error (MSE):

$$\text{Bias}(\theta_k) = \frac{1}{M} \sum_{m=1}^{M} \left( \hat{\theta}_k^{(m)} - \theta_k \right), \qquad \text{MSE}(\theta_k) = \frac{1}{M} \sum_{m=1}^{M} \left( \hat{\theta}_k^{(m)} - \theta_k \right)^2,$$

where $M$ represents the total number of replicates in the simulation study, and $\hat{\theta}_k^{(m)}$ denotes the posterior mean of $\theta_k$ for the $m$-th simulated basket trial. These metrics will be reported by module. We also compare these Bayesian analysis models with respect to the trial operating characteristics, such as the module-wise error rates. Corresponding to the frequentist type I error rate and statistical power, we will report proportions of the simulated trials with
• an erroneous Go decision in a module for the "true" $\theta_k \in (-\infty, 0]$, and
• a correct Go decision in a module for the "true" $\theta_k > 0$,
respectively.
Eight simulation scenarios are evaluated with the treatment effect $\theta_k$ varying between 0 and 1.38. Table 1 lists these simulation scenarios, which reflect varying degrees of heterogeneity on the treatment effect across modules. We note that all sets of the "true" values of $\theta_k$ are realisations from distinct multivariate normal distributions, where we stipulate a high pairwise correlation coefficient (0.8) for $\theta_k$ of the consistent modules and a low pairwise correlation coefficient (0.1) between $\theta_k$ of an extreme module and that of any other module. We highlight the modules with a 0 or low treatment effect ($0 < \theta_k \le 0.35$). By way of contrast, in the global null scenario we set all $\theta_k = 0$. Scenarios 7 and 8 can be seen as "mixed null" scenarios. In such scenarios, we may be particularly interested in the proportion of trials with an erroneous Go decision for modules with a zero treatment effect. We note that the computed module-wise error rates are subject to the decision criterion. For example, the error rates would differ by varying the value of the bound $\delta_U$, above which the new treatment is concluded to be superior to a placebo.
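The bias and MSE analogues defined at the start of this subsection reduce to simple averages over replicates. A minimal sketch, assuming the posterior means from the simulated trials for one module are already collected in a list (the function name `bias_and_mse` is hypothetical):

```python
from typing import Sequence, Tuple


def bias_and_mse(posterior_means: Sequence[float],
                 true_theta: float) -> Tuple[float, float]:
    """Average error and average squared error of posterior means for one module."""
    m = len(posterior_means)
    if m == 0:
        raise ValueError("at least one replicate is required")
    errors = [estimate - true_theta for estimate in posterior_means]
    bias = sum(errors) / m
    mse = sum(e * e for e in errors) / m
    return bias, mse
```

The same loop over replicates can also record the Go decisions per module, whose average gives the module-wise error rate or power.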
Each of the four Bayesian analysis models will be applied to analyse every simulated basket trial. Decision making on a Go versus No-go for every module is based on the entire posterior distribution for $\theta_k$. More specifically, a module is allocated a Go decision if the interval probability $\mathbb{P}(\theta_k > 0.25) > 0.90$, and a No-go otherwise. Results will be summarised by averaging across 1000 replicates of the hypothetical basket trial by module. The Bayesian analysis models are implemented in R version 3.4.4 using the R2OpenBUGS package based on two parallel chains, with each running the Gibbs sampler for 10,000 iterations that follow a burn-in of 3000 iterations. OpenBUGS code, together with R functions, to implement each of the Bayesian analysis models is available at https://github.com/BasketTrials/Bayesian-analysis-models.

Results
Figure 1 compares the performances of the posterior estimators based upon the four Bayesian models. It shows that the proposed analysis methodology produces smaller bias and MSE than the standard HM and EXNEX, across nearly all simulation scenarios. Point estimators based on the standard HM and EXNEX work well in scenarios 1, 5 and 6, as the small-to-moderate variability between $\theta_k$s can be addressed by setting $\tau \sim HN(0.25)$. The proposed analysis methodology, in contrast, distinguishes the heterogeneity more sensitively. Much smaller bias and MSE are yielded when estimating the low treatment effect, for example, in modules 1 and 3, regardless of the large treatment effect in module 4 of scenario 2. In situations where information from other modules should be largely discounted, referring to modules in scenarios 7 and 8, our methodology generates similar bias to the no borrowing approach but, overall, with a smaller MSE. This is because information from modules with a non-zero treatment effect, for example, in scenario 8, can be largely discounted to formulate the marginal predictive priors for $\theta_3$ and $\theta_4$. Not surprisingly, less borrowing of information in scenarios with extreme subgroups results in characteristics more comparable to those of the trials that would have been analysed using the no borrowing approach. This explains why our approach and the no borrowing approach outperform the standard HM and EXNEX in some scenarios.
We have also compared the Bayesian analysis models in terms of the average width of the posterior credible intervals for $\theta_k$ obtained based on each of the four analysis models. In contrast to the alternative Bayesian models, the proposed methodology yields posterior estimates with narrower credible intervals when there is at least one consistent external module in the same trial. Such results are summarised in Figure S1 of the Supplementary Materials. When using the proposed analysis methodology for borrowing of information, investigators may be interested in the weight eventually allocated to each external module for obtaining the marginal predictive prior. In Figure S2 of the Supplementary Materials, we comment with regard to scenarios 4 and 5 on the weight allocation based on the assessed pairwise commensurability.
Table 2 quantifies the impact of using different Bayesian models on the error rate control under the null hypothesis. For the global null scenario, where we have simulated the trial data setting "true" $\theta_k = 0$, all four Bayesian analysis models control the error rate well. Nevertheless, the approaches that enable borrowing of information, i.e., standard HM, EXNEX and the proposed methodology, have smaller chances of adopting an incorrect Go, when compared with the approach of no borrowing, since incorporating consistent information from external modules reassures the investigator that $\mathbb{P}(\theta_k > 0.25) > 0.90$ is highly unlikely. Our approach produces slightly higher error rates than standard HM and EXNEX, as for some simulated trials information from modules with a similar low treatment effect may be shared, leading to a higher chance of rejecting the null $\theta_k$'s. In scenario 8 where some modules have a large treatment effect, we observe substantial inflation of the error rate for trials analysed using the standard HM approach and moderate inflation for trials using EXNEX. Due to the [...] We note that a difference in the sample sizes of modules 3 and 4 (for all scenarios) leads to disparate performances of the same approach in the same scenario with null $\theta_k$'s. More explicitly, when reacting to a prior-data conflict, a larger sample size of module 4 provides more evidence to evaluate the plausibility of down-weighting; estimation of $\theta_4$ may thus have more chances to avoid being overwhelmed by external information.
What may be more interesting to investigators is the potential increase in statistical power to demonstrate the treatment effect by incorporating information from external modules. Figure 2 visualises the comparison of the Bayesian analysis models in terms of correctly declaring a clinical benefit in module k. Across nearly all the modules of the simulated basket trials, in all scenarios except 6 and 7, the Bayesian approaches of borrowing show substantial advantages over the approach of no borrowing. When comparing the Bayesian approaches of borrowing, we may be particularly interested in the chance that a module with a low treatment effect is concluded with a correct Go decision. For example, for modules 1, 4 and 6 in scenario 3 as well as module 4 in scenario 4, our approach behaves very similarly to EXNEX, while standard HM implements strong borrowing of information because there is more than one module with a large treatment effect.
For all modules of the simulated trials in scenarios 5 and 6, our approach outperforms both standard HM and EXNEX by leveraging the consistent information from all external modules. Given our criterion that only P(θ_k > 0.25) > 0.90 will result in a Go decision, we note that scenario 6, under which the trial data are simulated setting θ_k = 0.30, is a particularly hard scenario. Using the approach of no borrowing seems to yield a slightly higher proportion of basket trials with a correct Go decision, compared with standard HM and EXNEX. However, this does not mean the no borrowing approach is more powerful. Instead, standard HM and EXNEX produce estimates of θ_k with smaller posterior variances than the approach of no borrowing, as we may observe from Panel (ii) of Figure 1 and Figure S1 of the Supplementary Materials. Nevertheless, it is possible that the interval probability computed using the no borrowing approach with a diffuse prior for θ_k has a comparable or even slightly higher chance of exceeding the level γ = 0.90. The proposed approach, however, can implement effective borrowing of information in such situations due to the use of the Hellinger distance, which characterises commensurability of the trial data across modules. When the consistent "true" θ_k's increase from 0.30 (scenario 6) to 0.45 (scenario 5), we see more advantages of implementing Bayesian models that permit borrowing of information over no borrowing, whilst our approach still appears to be superior to its alternatives.
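To make the Go criterion concrete, the following minimal Python sketch evaluates P(θ_k > 0.25) under a normal approximation to the posterior for θ_k. The normal approximation, the function names and the posterior summaries used below are illustrative assumptions, not outputs of the analysis models compared here. The sketch shows why a posterior centred near 0.30 needs a small posterior standard deviation before the interval probability can exceed γ = 0.90.

```python
from statistics import NormalDist


def go_probability(post_mean, post_sd, threshold=0.25):
    """P(theta_k > threshold), assuming a normal posterior (an assumption)."""
    return 1.0 - NormalDist(mu=post_mean, sigma=post_sd).cdf(threshold)


def go_decision(post_mean, post_sd, threshold=0.25, gamma=0.90):
    """Return True when the interval probability exceeds the level gamma."""
    return go_probability(post_mean, post_sd, threshold) > gamma


# Hypothetical posterior summaries, chosen only for illustration.
wide = go_probability(0.30, 0.10)    # posterior centred at 0.30, fairly wide
narrow = go_probability(0.30, 0.03)  # same centre, much narrower
print(round(wide, 3), go_decision(0.30, 0.10))
print(round(narrow, 3), go_decision(0.30, 0.03))
print(go_decision(0.45, 0.10))       # larger effect, same width as `wide`
```

Under these assumptions, a posterior mean of 0.30 leads to a Go decision only when the posterior standard deviation is below roughly 0.039, whereas a posterior mean of 0.45 clears the threshold even with a standard deviation of 0.10.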
In scenario 7, due to the prior specification of τ ∼ HN(0.25) being capable of accounting for the variability across modules, both standard HM and EXNEX may shrink θ_5 and θ_6 excessively towards the population mean. This in turn dilutes the treatment effect in the corresponding modules. Consequently, it appears to be better to implement the approach of no borrowing rather than standard HM or EXNEX: at least the former would not miss the chance to identify the treatment effect. Our approach presents slightly higher power than the no borrowing approach, as there is some consistent information to be incorporated from an external module. In scenario 8, our approach performs similarly to the no borrowing approach for module 1, as information from all other modules is largely discounted, but slightly better than its comparators for modules 2, 5 and 6 due to incorporation of consistent information from external modules. Standard HM and EXNEX improve the statistical power for module 1, in the presence of strong treatment effects in modules 2, 5 and 6. However, this sacrifices the type I error rate control for modules 3 and 4.

Discussion
The paradigm shift towards precision medicine opens new avenues for novel trial designs and analysis methodologies to deliver more tailored healthcare to patients. Basket trials have emerged as a new class of efficient approaches to drug development in the era of precision medicine, offering a framework to evaluate the treatment effect together with its heterogeneity in various patient subgroups. In this paper, we have presented a commensurate prior approach for basket trials to make good use of the trial data collected from multiple modules without requiring a priori clustering of similar subgroups. By including an information discrepancy measure, it can discern the degree of borrowing from other subgroups to improve inferences in one that is of local analysis interest. In particular, the Hellinger distance plays a dual role in our methodology: (i) it gauges the maximum amount of information that could be leveraged from an external module, i.e., one other than module k, when estimating the target parameter θ_k; (ii) when there are K ≥ 3 modules, it determines the weight allocation to external modules so that their relative importance is properly reflected.
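As a small illustration of the discrepancy measure, the sketch below computes the Hellinger distance between two univariate normal densities using the standard closed form. Treating each module's distribution for θ_k as normal, the function name and the standard deviation of 0.15 are assumptions made only for this example; the means are borrowed from the scenario 4 values of θ_2, θ_3 and θ_4.

```python
import math


def hellinger_normal(mu1, sd1, mu2, sd2):
    """Hellinger distance between N(mu1, sd1^2) and N(mu2, sd2^2).

    Uses the closed form H^2 = 1 - BC, where BC is the Bhattacharyya
    coefficient of the two normal densities. The result lies in [0, 1].
    """
    var_sum = sd1 ** 2 + sd2 ** 2
    bc = math.sqrt(2.0 * sd1 * sd2 / var_sum) * math.exp(
        -((mu1 - mu2) ** 2) / (4.0 * var_sum)
    )
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 1.0 - bc))


# Hypothetical module-wise summaries: similar effects versus disparate effects.
d_close = hellinger_normal(1.02, 0.15, 1.17, 0.15)
d_far = hellinger_normal(0.13, 0.15, 1.17, 0.15)
print(round(d_close, 3), round(d_far, 3))
```

A distance near 0 indicates commensurate information and a distance near 1 indicates strong discrepancy, so in this example the first pair would be a candidate for borrowing while the second would be largely discounted.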
The Bayesian analysis methodology in Section 3 has been developed assuming the basket trial generates continuous response data. However, it could be easily generalised to analyse other types of data that can be fitted using a generalised linear model for non-Gaussian error distributions. For example, it would be readily applicable to phase II basket trials that use binary endpoints: after fitting the patient-level data per module with a logistic regression model, our approach may be used to stipulate commensurate predictive priors, informed by the pairwise Hellinger distance, for the module-specific parameters, so that information is borrowed from the most consistent modules. For down-weighting in cases of data conflict suggested by the Hellinger distance, we did not delve into calibration of the "slab" prior but simply used a very uninformative uniform prior for the normal precision. As this could be quite relevant to the investigation by Mutsvari et al. (2016) about choosing the diffuse component of a mixture prior for robust inferences, we believe further research on how this choice affects the performance of our approach is worthwhile. The exploration may be closely linked with the users' stipulation of the probability weight, which is based upon the Hellinger distance, to be attributed to the "slab" prior.
In our simulation study, we have considered imbalanced module sizes. Simulation results show that our methodology can down-weight inconsistent information from a module that has a larger sample size. For illustrative purposes, we have supposed an equal randomisation ratio between treatment groups within a module. Investigators can pragmatically determine the randomisation ratio as well as the module-wise sample size for a basket trial that may base decision making on our analysis methodology. Potentially, more dosage groups of the same treatment in each module can be considered. Also, many have shown great interest in sequential basket trials (Simon et al., 2016; Cunanan et al., 2017; Hobbs and Landin, 2018) with interim look(s) incorporated to allow, say, terminating enrollment of patients in ineffective subgroups. We note the proposed Bayesian approach can be implemented with any number of analyses following a flexible timescale for interim decision making. There is no requirement of a minimum sample size per module to carry out an interim look, due to the use of an initial vague prior π_0k(θ_k) for the subgroup-specific model parameter for computing the pairwise Hellinger distance.
Throughout, we have restricted our focus to basket trials where the subtrials use the same endpoint across patient subgroups. However, in many disease areas multiple endpoints (FDA, 2017) may often arise, as concluding on the clinical benefit could involve various dimensions. One common situation is to continue monitoring toxicity in addition to the assessment of efficacy (Bryant and Day, 1995; Tournoux et al., 2007). In this regard, our approach could be extended in several ways. For instance, in cases where the set of multiple endpoints remains the same across subgroups, it would be straightforward to establish a joint probability model and derive the pairwise Hellinger distance between multivariate probability densities (Pardo, 2005). Suitable alternatives include separating the discussion about borrowing of information by endpoint. A unified utility function may then be adopted for trial decision making based on evidence on multiple endpoints. In another, more complex setting where the efficacy endpoint, for example, could be distinct but correlated across subgroups, one might need to translate the subtrial data onto a common scale in order to adapt the presented approach. Ideas could be drawn from Zheng et al. (2019), where incorporation of external data recorded on a different measurement scale has been discussed in the context of phase I clinical trials.

A. ADDITIONAL SIMULATION RESULTS
In addition to the bias and MSE comparisons in Section 4, we report the width of the 95% posterior credible intervals (CIs) for θ_k yielded by the Bayesian analysis models to give a general sense of how precise the posterior estimates are. Summarising across the 1000 simulated basket trials by module, Figure S1 represents the median width of the 95% posterior CIs by the height of the bars, together with the 10th and 90th percentiles by the endpoints of the error bars. We observe that the approach of no borrowing results in the widest posterior CIs across all scenarios. In contrast, the proposed approach of borrowing based on distributional discrepancy generates the narrowest posterior module-wise intervals when there exists at least one module with a commensurate treatment effect. In scenarios of information consistency across all modules, such as scenarios 5 and 6, the proposed approach outperforms the competing Bayesian models by producing the narrowest posterior CIs, together with the nearly unbiased posterior means shown in Figure 1 of the main paper. In some cases, such as modules 1 and 6 in scenario 2 and module 1 in scenario 8, the proposed approach leads to posterior CIs of comparable widths to those of EXNEX or standard HM, but much longer error bars. This suggests that our approach may be highly sensitive to the disparate treatment effect on subgroups demonstrated by trial data.
It would be interesting to see how well the proposed approach can differentiate borrowing of information from other subgroups when the treatment effect is disparate across patient subgroups. Specifically, we expect it can (i) identify the most consistent subgroup(s) and, accordingly, allocate the largest weight(s) in a scenario of heterogeneous treatment effect, and (ii) weight the subgroups equally in a scenario of consistent treatment effect.
Figure S2 gives an overview of the weight allocation, which is a function of the computed pairwise commensurability, for scenarios 4 (some subgroups are more similar between themselves than with others) and 5 (equally similar). The subfigures show our approach can correctly identify the most consistent subgroups. Referring to Subfigure S2(a) for the weight allocation to external modules given their pairwise commensurability with module 3, the medians suggest the order is: module 2, module 5, module 6, module 1 and module 4, which is consistent with the assessed pairwise commensurability of information. Nevertheless, summarising across the 1000 simulated basket trials, we see the interquartile range of the weight allocated to module 2 is much larger than that of the pairwise commensurability for module 2 and module 3 in the same plot. This is because the pairwise commensurability concerns only the information between any two modules, whilst the normalised weights are computed from all the pairwise commensurability measurements under our stipulation. In other words, the weight allocated to each module reflects the relative importance of the corresponding subgroup for offering strength to borrow. When the weights allocated to other modules are variable, the weight to module 2 will inevitably be affected. Referring to Subfigure S2(b) for the weight allocation in situations of equally consistent treatment effect, we observe that the weight allocation is highly balanced. With K = 6 modules, the weights allocated to external modules are concentrated at 20%, which is 1/(K − 1). When all the subgroups demonstrate the same treatment effect, meaning that all external subgroups are equally important, equal weights would be assigned correspondingly, as is shown in Subfigure S2(b).
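The two observations above, equal weights of 1/(K − 1) under equal commensurability and the dependence of one module's weight on all the others, can be reproduced with a toy normalisation. The sketch below is only illustrative: the mapping from distance to similarity (1 − distance) and the function name are assumptions made here, and the transformation actually used by the methodology is the one defined in the main paper.

```python
def normalised_weights(distances):
    """Map pairwise discrepancies in [0, 1] to weights that sum to one.

    Assumption for illustration only: similarity is taken as 1 - distance.
    """
    similarities = [1.0 - d for d in distances]
    total = sum(similarities)
    if total == 0.0:
        # Every external module is maximally discrepant: use equal weights.
        return [1.0 / len(distances)] * len(distances)
    return [s / total for s in similarities]


K = 6  # number of modules, so K - 1 external modules per target module

# Equal discrepancy with every external module gives weights of 1 / (K - 1).
equal = normalised_weights([0.3] * (K - 1))
print(equal)

# The first module keeps the same distance (0.2) in both hypothetical trials,
# yet its weight changes because the other distances change.
first = normalised_weights([0.2, 0.9, 0.5, 0.6, 0.8])
second = normalised_weights([0.2, 0.9, 0.3, 0.3, 0.8])
print(round(first[0], 3), round(second[0], 3))
```

Because the weights are normalised over all external modules, variability in any one pairwise measure propagates to every weight, which is consistent with the wider spread of the module 2 weight noted above.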

B. LEVERAGING INFORMATION FROM THE MOST CONSISTENT SUBGROUP(S)
We believe it would be helpful to show how the Hellinger distance can pick up the most consistent subgroup(s) by visualising the marginal prior for a target parameter θ_k, π_MPP(θ_k | x_−k), through some hypothetical data examples.
We will use the 'true' θ_k in scenarios 4 and 5 to simulate two datasets, representing situations where (i) some subgroups demonstrate more similar treatment effect between themselves than with others, and (ii) all subgroups demonstrate equally similar treatment effect. In particular, we expect our approach to effectively identify two cliques given the dataset simulated from setting θ_k = (0.59, 1.02, 1.17, 0.13, 0.95, 0.75), potentially with θ_2, θ_3 and θ_5 constituting a clique.
Figure S3: Update of the marginal predictive priors for θ_k to their marginal posterior, using the proposed methodology for borrowing of information from consistent modules. Subfigures (a) and (b) correspond to the situations of (i) varying treatment effect and (ii) equally similar treatment effect, respectively.
Figure S3 visualises the marginal predictive priors for the target parameters θ_k and how these π_MPP(θ_k | x_−k) would be updated to the posteriors. As we can see, the marginal predictive priors for θ_2, θ_3 and θ_5 appear to be more informative than those for θ_1 and θ_4 within Subfigure S3(a), because there are consistent subgroups for information sharing. The marginal prior for θ_4 is centred around θ_k = 0.59, suggesting that module 1 has been picked as (relatively) the most consistent subgroup while information from other modules will be largely discounted. Due to the use of a spike-and-slab prior, the posterior for θ_4 is bimodal when there is a prior-data conflict. Comparing Subfigure S3(b) with Subfigure S3(a), we observe that more informative marginal priors have been formed due to information commensurability, and each posterior becomes more informative by including data from all external, consistent modules.

Figure 1: Bias and mean squared error of the module-wise estimators for θ k based on the four Bayesian analysis models, respectively.

Figure S1: Summaries of the width of the 95% posterior credible interval for θ_k, estimated using each of the Bayesian analysis models.

Figure S2: Boxplots of the weight allocation to all external modules (q ≠ k) for leveraging only the most consistent information in the module k of current analysis interest. Subfigure (a) visualises the simulation results of scenario 4, and subfigure (b) those of scenario 5. The dashed line indicates the weight of 20%.

Table 1: Simulation scenarios with specification of the 'true' treatment effect θ_k to compare the Bayesian analysis models. Figures in bold indicate a zero or low treatment effect.