A Bayesian space-time model for clustering areal units based on their disease trends.

Population-level disease risk across a set of non-overlapping areal units varies in space and time, and a large research literature has developed methodology for identifying clusters of areal units exhibiting elevated risks. However, almost no research has extended the clustering paradigm to identify groups of areal units exhibiting similar temporal disease trends. We present a novel Bayesian hierarchical mixture model for achieving this goal, with inference based on a Metropolis-coupled Markov chain Monte Carlo ((MC)$^3$) algorithm. The effectiveness of the (MC)$^3$ algorithm compared to a standard Markov chain Monte Carlo implementation is demonstrated in a simulation study, and the methodology is motivated by two important case studies in the United Kingdom. The first concerns the impact on measles susceptibility of the discredited paper linking the measles, mumps, and rubella vaccination to an increased risk of Autism and investigates whether all areas in the Scotland were equally affected. The second concerns respiratory hospitalizations and investigates over a 10 year period which parts of Glasgow have shown increased, decreased, and no change in risk.

Appendix A -Additional data description for the two case studies

A.1 Measles susceptibility case study
The study region is Scotland, and a map with the main towns and cities is presented in Figure   1. Scotland has a population of 5.4 million people, and for this study has been partitioned into K = 1235 non-overlapping intermediate zones (IZ), which are a key geography designed by the Scottish Government for distributing small-area statistics. The spatial pattern in the proportion susceptible (un-vaccinated) {θ kt } in 2004 is displayed in Figure 2, which shows that the less densely populated regions in the north and west of Scotland appear to have higher susceptibility rates compared to the more densely populated regions, such as Glasgow and Edinburgh. One of the reasons for this could be that general practitioners (doctors) and health centres are not as accessible in these remote rural regions, and so in combination with the negative press towards the MMR vaccine, parents in those areas may have been less likely to have their children vaccinated.

A.2 Respiratory hospitalisation case study
The study region for the respiratory hospitalisation case study is the Greater Glasgow and Clyde Health Board, which is displayed as the shaded region in Figure 4. The region has been partitioned into K = 271 non-overlapping intermediate zones, and the spatial pattern in the SMR,θ kt , in 2005 is displayed in Figure 5. The spatial pattern shows elevated SMR values in the east of the city and just south of the river Clyde (the white line running south-east), which are some of the poorest parts of the study region. The temporal trends are displayed in panel (b) of where γ Cmi is the complete vector of parameters comprising the S trend functions. Then the (MC) 3 algorithm is defined as follows. i. Sample a proposal valueψ Cmi from a proposal distribution q(ψ Cmi |ψ Cmi ).
ii. Acceptψ Cmi as the next value in the chain, that is set ψ Cmi+1 =ψ Cmi with probability min(1, α 1 ), where and p(·) denotes the full conditional distribution ofψ Cmi or ψ Cmi .
(b) To couple the chains randomly select two of the chains, sayΨ Cji andΨ C ki .
i. The proposal values for a Metropolis type move areΨ Cji = Ψ C ki andΨ C ki = Ψ Cji .
ii. Swap the chains with probability min(1, α 2 ), where Note that for parameters that can be Gibbs sampled the temperatures do not apply, and the acceptance rate is always equal to one. From the sampling scheme it is clear that increasing the temperature level T m increases the acceptance probability α 1 . The acceptance rates for accepting a new proposed value,ψ Cmi , are controlled by updating the standard deviation of the proposal distribution q(·), such that an acceptance rate of 40-50% is achieved. Through simulation testing we found that swapping chains with an acceptance rate of 20-30% was sufficient to remove the multimodality issue, which is similar to the results of Atchadé and others (2011). In order to obtain such an acceptance rate geometric spacing of the temperatures was used, that is T m+1 = c · T m , where T 1 = 1 and 0 < c < 1. The constant c is updated during the algorithm to ensure the spacing between the temperatures of the chains provides an acceptance rate of 20-30%, such that, if the acceptance rate is greater than 30%, the difference between the temperatures will widen, and hence shrink if the acceptance rate of swapping chains is less than 20%. Choosing the number of chains M is an extra choice to be made in this algorithm and can be application specific (Altekar and others, 2004), and here M = 4 appeared to work well following exploratory runs of the algorithm for both the simulated and real data sets. Tuning of the proposal distributions to have the acceptance rates discussed above is automatically undertaken in the algorithm, meaning the user does not need to manually tune the algorithm. Table 1 displays the computation times for fitting the mixture model proposed in the main paper to data of various sizes, which illustrates the scalability of our approach. Run times are provided for both the MCMC and the (MC) 3 algorithms, and in all cases the models are run with S = 4 temporal trends on regular spatio-temporal grids. Inference in all cases was based on 100, 000 iterations, and the (MC) 3 algorithm had M = 4 parallel chains (as used in the simulation and real data studies) resulting in 400, 000 iterations in total. The timings were carried out using an Apple iMac computer with a 3.2 GHz Intel Core i5 processor and 16 GB MHz DDR3 memory.

Appendix C -Computation time and scalability
The table displays run times for data sizes between 1,000 (10 × 10 spatial grid with 10 time periods) and 50,000 (50 × 50 spatial grid with 20 time periods) data points, and the maximum run time for 100,000 iterations is under 18 hours. The Table shows that run times increase with increasing numbers of data points as expected, and that the (MC) 3 algorithm is around 3 times slower than the MCMC algorithm despite generating 4 times as many iterations (it is based on 4 parallel chains). Appendix D -Data generation and additional results for the simulation study

D.1 -Data generation
The data generated in the simulation study come from the following Poisson log-linear variant of the general model (3.1) in the main paper.
where E kt = 100 ∀ k and t. The random effects φ are generated from the Leroux CAR model outlined by (3.2) in the main paper, with ρ = 0.99 and τ 2 = 0.001. The temporal trend functions f s (t|γ s ) used to generate the data are presented in Figure 2 in the main paper, and show constant, linearly increasing, linearly decreasing and change point forms.

D.2 -Accuracy of the trend estimation
The trends f s (t|γ S ) estimated in the simulation study are represented by: 1. Constant trend -β 1 .
Tables 2 (scenarios (i) and (ii)) and 3 (scenarios (iii) and (iv)) display the bias, root mean square error (RMSE), and coverage probabilities of the 95% credible intervals for the parameters (β 1 , γ 1 , . . . , γ 4 ) associated with the 4 temporal trends, from using both the (MC) 3 and MCMC algorithms. Note, that in allocation mechanism (B) the only trend present in the data is the linearly decreasing one, and hence (γ 1 , γ 3 , γ 4 ) do not have true values for the computation of the metrics described above. Unsurprisingly, the results mirror those observed in the classification percentages presented in the main paper, and have the following general trends. where as for the MCMC algorithm this is only true for scenario (iv). For scenarios (i) to (iii) the MCMC algorithm produces coverage probabilities between 11% and 89%, which is due to the relatively poor parameter estimation as measured by RMSE.

D.3 -Results from not including a true trend in the model
The simulation study presented above has fitted model (3.1) with all temporal trends that generated the data included as possible trends in the model. However, this does not illustrate Table 3: Bias, root mean square error (RMSE) and coverage probabilities of the 95% credible intervals from the parameters of the temporal trends and the intercept term for scenarios (iii) and (iv) and allocation mechanisms (A) to (C) using the (MC) 3 (left) and MCMC (right) algorithms. what happens if an important trend that is in the data is excluded in the model fitting. Therefore we repeated the simulation study by fitting model (3.1) without the linearly decreasing trend using the (MC) 3 estimation algorithm.
The results are displayed in Table 4   The results from omitting the linearly decreasing trend when fitting the model. Section (a) summarises which trends the areas with a true decreasing trend were assigned to. Section (b) summarises the overall correct classification percentages for areas whose true trend was included in the model.   the increasing trend have positive δ k estimates.