-
PDF
- Split View
-
Views
-
Cite
Cite
Jon Wakefield, Disease mapping and spatial regression with count data, Biostatistics, Volume 8, Issue 2, April 2007, Pages 158–183, https://doi.org/10.1093/biostatistics/kxl008
Close -
Share
Abstract
In this paper, we provide critical reviews of methods suggested for the analysis of aggregate count data in the context of disease mapping and spatial regression. We introduce a new method for picking prior distributions, and propose a number of refinements of previously used models. We also consider ecological bias, mutual standardization, and choice of both spatial model and prior specification. We analyze male lip cancer incidence data collected in Scotland over the period 1975–1980, and outline a number of problems with previous analyses of these data. In disease mapping studies, hierarchical models can provide robust estimation of area-level risk parameters, though care is required in the choice of covariate model, and it is important to assess the sensitivity of estimates to the spatial model chosen, and to the prior specifications on the variance parameters. Spatial ecological regression is a far more hazardous enterprise for two reasons. First, there is always the possibility of ecological bias, and this can only be alleviated by the inclusion of individual-level data. For the Scottish data, we show that the previously used mean model has limited interpretation from an individual perspective. Second, when residual spatial dependence is modeled, and if the exposure has spatial structure, then estimates of exposure association parameters will change when compared with those obtained from the independence across space model, and the data alone cannot choose the form and extent of spatial correlation that is appropriate.
1. INTRODUCTION
In this paper, we consider the analysis of population and health counts, aggregated over a set of disjoint geographical areas; recent reviews of methods for spatial epidemiological data in general may be found in Lawson and others (1999), Elliott and others (2000), and Waller and Gotway (2004). We critically review a number of approaches for the analysis of spatially aggregated count data, propose a number of refinements to currently used models, and describe a procedure for prior choice. We consider two distinct aims: “disease mapping” to obtain relative risk estimates for each study area and “spatial regression” to estimate the association between relative risk and potential risk factors. In general, counts in areas that are geographically close will display residual spatial dependence; “residual” here acknowledges that known confounders have been included in the analysis model. In a disease mapping context, this dependence may be exploited in estimation of risk summaries, by smoothing across “neighboring” areas. In a regression context, the dependence must be acknowledged since conventional statistical analysis techniques are inappropriate for dependent data.
Disease mapping has a long history in epidemiology (Walter, 2000) as part of the classic triad of person/place/time. A number of statistical reviews are available (see for example, Smans and Esteve, 1992, Clayton and Bernardinelli, 1992, Mollié, 1996; Wakefield and others, 2000). There are numerous examples of both cancer atlases (see for example, Kemp and others, 1985; Devesa and others, 1999) and mapping studies for specific cancer sites; for example, Toledano and others (2001) report spatiotemporal trends for testicular cancer, and Jarup and others (2002) carry out mapping for prostate cancer. Similarly, numerous ecological correlation studies have been reported. For example, the contribution of environmental factors to cancer risk has been summarized by Boffetta and Nyberg (2003), who report evidence from ecological studies including three that considered mesothelioma and lung cancer in relation to asbestos, eight that investigated lung cancer and proximity to various industries that produce air pollution, and six which assessed the association between nitrates in drinking water and stomach cancer. Exposures may be directly measured in air, water, or soil, or indirect surrogates such as distance from a point source of risk such as an incinerator (e.g. Elliott and others, 1996) or a foundry (e.g. Lawson and Williams, 1994), or a line source such as a road.
As motivating example, we examine incidence rates of lip cancer in males in 56 counties of Scotland, registered in 1975–1980. These data were originally reported by Kemp and others (1985). The data consist of the observed and expected numbers of cases (based on the age population in each county), a covariate measuring the proportion of the population engaged in agriculture, fishing, or forestry (AFF) (exposure to sunlight is a risk factor for lip cancer, and the AFF variable is related to exposure to sunlight), and the standardized morbidity ratio (SMR), which is the ratio of the observed to expected cases. The AFF variable was read from Figure 3.6 of Kemp and others (1985) by Clayton and Bernardinelli (1992) and only takes one of six values. The data include the centroids of each area under the Great Britain National Grid projection system. This is a “conformal” projection that preserves local shape when moving from three-dimensional to two-dimensional coordinates for the purposes of mapping. Waller and Gotway (2004) provide details on this and other projections. The data, along with a figure displaying the labeled centroids of each county, are available as supplementary material at Biostatistics online (http://www.biostatistics.oxfordjournals.org).
The structure of this paper is as follows: In Sections 2 and 3, we describe and critique models for disease mapping and spatial regression, respectively, illustrating their use with the Scottish data. In Section 4, we analyze a more comprehensive version of these data, obtained from the original source. We conclude with a discussion in Section 5.
2. DISEASE MAPPING
2.1 Motivation
Disease mapping is often carried out to investigate the geographical distribution of disease burden. Area-specific estimates of risk may inform public health resource allocation by estimating the disease burden in specific areas, and the informal comparison of risk maps with exposure maps may provide clues to etiology or generate hypotheses. An additional use is to provide a context within which specific studies may be placed; for example, surveillance will be greatly helped if we have a knowledge of the variability in residual spatial risk and the nature of that variability (spatial versus nonspatial), in order to know the “null” distribution, that is, the distribution in the absence of a “hot spot.” In a similar vein, regression will be aided if we have a “prior” on the magnitude and forms of the nonspatial and spatial background variability.
2.2 Drawbacks of simple approaches
We assume that the study region is partitioned into n nonoverlapping areas. Though the total burden of disease is of interest, control for confounding allows the “residual” geographical distribution of risk to be investigated and modeled. Consider a confounder with K strata and let Nik and Yik be the population size and number of cases in area i, stratum k,i = 1,…,n,k = 1,…,K. A starting model is E[Yik|pik] = Nikpik, where pik is the probability of disease in area i and confounder stratum k. For small-area studies in particular, the estimation of n × K probabilities is not feasible, and for a rare disease, it is usual to assume that the effect of being in area i is to multiplicatively change “reference” stratum-specific risks, pk, by a constant, i.e.

The assumption Yik|pik∼Poisson(Nikpik) leads to

where Ei = ∑k = 1KNikpk are the “expected numbers” of cases in area i, based on the confounder-specific populations. This procedure is known as indirect standardization.
The SMR is given by SMRi = Yi/Ei, and is the maximum likelihood estimator (MLE) of the area-specific relative risk, θi in Model (2.2), i = 1,…,n. Figure 1(a) shows the SMRs for the Scottish lip cancer data, and indicates a large spread with an increasing trend in the south–north direction. The variance of the estimator is var(SMRi) = SMRi/Ei, which will be large if Ei is small. For the Scottish data, the expected numbers are highly variable, with range 1.1–88.7. This variability suggests that the extreme SMRs may be based on small expected numbers; many of the large, sparsely populated rural areas in the north have high SMRs.
Raw and smoothed estimates in 56 counties of Scotland. On this and subsequent maps, the Shetland Isles has been moved south by 100 km in order to use space more efficiently.
Raw and smoothed estimates in 56 counties of Scotland. On this and subsequent maps, the Shetland Isles has been moved south by 100 km in order to use space more efficiently.
Maps showing p-values of excesses of 1 are even less informative than maps of SMRs. Although they account for sample size, they do not show the extent of the risk, and areas with large populations may provide statistically significant SMRs, even for a small excess of 1.
The above considerations led to methods being developed to “smooth” the SMRs using random-effects models that use the data from all the areas to provide more reliable estimates in each of the constituent areas. We first describe models that do not use spatial information (Tsutakawa and others, 1985), (Manton and others, 1989) before turning to models that allow both spatial and nonspatial variability (Clayton and Kaldor, 1987), (Besag and others, 1991).
2.3 Nonspatial models
We begin by describing a simple Poisson–Gamma two-stage model that offers analytic tractability and ease of estimation, and is useful for exploratory analyses, for example, to decide on the form of the area-level risk exposure model. At the first stage, we assume that the likelihood is given by

where μi = μ(xi,β) describes a regression model in area-level covariates xi. At the second stage, we assume that across the map the deviations of the relative risks from the mean, μi, are modeled by

a gamma distribution with mean 1 and variance 1/α. The marginal distribution of Yi|β,α is negative binomial with mean and variance

so that the variance increases as a quadratic function of the mean, and the scale parameter α can accommodate “overdispersion.” This form is substantively more reasonable than the naive Poisson model; it is important to consider extra-Poisson variability resulting from unmeasured confounders, data anomalies in numerator and denominator, and model misspecification (Wakefield and Elliott, 1999).
A Bayesian approach to inference would consider the posterior distribution p(θ1,…,θn,β,α|y) and carry out inference via the marginal distributions p(θi|y). Unfortunately, the latter are unavailable in closed form, but an “empirical Bayes” approach obtains estimates and then proceeds as if these are known, i.e. considers . Estimates usually correspond to the MLEs from ∏i = 1nPr(Yi|β,α). If the aim is to gain clues to unexplained variability, θi may be examined; here we report the relative risk, RRi = θiμi, where relative is with respect to Ei. Using Bayes theorem the conditional posterior is yielding empirical Bayes estimates

a weighted combination of the estimate , and the SMR in area i. The “weight”

on the observed SMR increases with Ei so that for areas with large populations the data dominate. If α is large, then the random effects have a tight spread, and there is more shrinkage, since SMRs that are far from unity are inconsistent with the total collection of estimates. This behavior illustrates both the potential benefits and hazards of smoothing; the estimates will be less variable than the SMRs, but an outlying estimate that is not based on a large expected number will be shrunk, and we may miss an important excess. Conlon and Louis (1999) provide a discussion of the inherent bias due to shrinkage of random-effects estimators.
The above model is subtly different from the alternative Yi|RRi∼indPoisson(Ei×RRi) with RRi∼indGa(α☆μi,α☆) which has mean E[RRi] = μi and variance var(RRi) = μi/α☆. The subtlety is that this model implies first two moments of E[Yi|β,α☆] = Eiμi and var(Yi|β,α☆) = μi(1 + Ei/α☆), so that the mean coincides with (2.5), but the variance differs. For this model, empirical Bayes estimates differ from (2.6) and are given by

where
One would be concerned if the two models gave significantly different estimates, revealing a general issue: there are multiple choices for the manner in which random effects may be incorporated, and the data will often be too sparse to decide between competing options.
A Poisson lognormal nonspatial random-effect model is given by

where Vi are area-specific random effects that capture the residual unexplained log relative risk in area i, i = 1,…,n. The marginal distribution of this model is not available in closed form though the variance agrees with (2.5); the addition of spatial random effects is straightforward, however. Empirical Bayes is not so convenient for this model, and so we resort to a fully Bayesian approach, for which prior distributions are required.
2.4 Prior choice for nonspatial model
For a rare disease, a log-linear link is a natural choice: logμ(xi,β) = β0 + ∑j = 1Jβjxij, where xij is the value of the jth covariate in area i. For regression parameters β = (β0,β1,…,βJ), an improper prior p(β)∝1 is often used, but may lead to an improper posterior (an example with a linear link is given in Section 3.8). If there are many covariates, or there is high dependence among the elements of x, then more informative priors will be beneficial. In this case, it is convenient to specify lognormal priors for positive parameters exp(βj), since one may specify two quantiles of the distribution, and directly solve for the two parameters of the lognormal. Denote by LN(μ,σ) the lognormal distribution for a generic parameter θ with E[logθ] = μ and var(logθ) = σ2, and let θ1 and θ2 be the q1 and q2 quantiles of this prior. Then it is straightforward to show that

As an example, suppose that for the ecological relative risk eβ1 we believe there is a 50% chance that the relative risk is less than 1, and a 95% chance that it is less than 5; with q1 = 0.5,θ1 = 1.0 and q2 = 0.95,θ2 = 5.0, we obtain μ = 0 and σ = log5/1.645 = 0.98.
It is not straightforward to specify a prior for σv, which represents the standard deviation of the log residual relative risks, a difficult parameter to interpret epidemiologically. The choice of a gamma distribution, Ga(a,b), for the precision τv = 1/σv2, is convenient since it produces a marginal distribution for the residual relative risks in closed form. Specifically, the two-stage model

produces a marginal distribution for Vi which is t2a(0,b/a), a Student's t distribution with 2a degrees of freedom, location zero, and scale b/a; this is equivalent to the residual relative risks following a log t distribution. To determine a and b, we specify the range exp(±R) within which the residual relative risks lie with probability q, and use the relationship
, where tq2a is the qth quantile of a Student's t random variable with 2a degrees of freedom, to give b = R2a/(tq/22a)2. For example, if we assume a priori that the residual relative risks follow a log t distribution with 2 degrees of freedom, with 95% of these risks falling in the interval (0.5, 2.0), we obtain the prior, τv∼Ga(1,0.0260), an exponential distribution. In terms of σv, this results in (2.5%, 97.5%) quantiles of (0.084, 1.01) with posterior median 0.19.
It is important to assess whether the prior allows all reasonable levels of variability in the residual relative risks, in particular small values should not be excluded. As pointed out by Kelsall and Wakefield (1999), the prior Ga(0.001,0.001), which has previously been suggested, should be avoided for this reason; this choice corresponds to relative risks which follow a log t distribution with 0.002 degrees of freedom.
2.5 Nonspatial analysis of the Scottish lip cancer data
Figure 2 shows relative risk estimates from a variety of models, with the SMRs on the left, referenced as position 0. At position 1, the empirical Bayes estimates obtained without the use of the covariate AFF are displayed. The weights on the SMR, (2.7), range between 0.45 and 0.99, with median 0.83. For these data, the residual variability is large, from (2.4) the standard deviation of the random effects is
, and is estimated as 0.73, with 90% interval for residual relative risks (0.16, 2.4).
Relative risk estimates for Scottish lip cancer data: 0 denotes the SMRs; 1 the empirical Bayes nonspatial estimates without the use of AFF; 2 the empirical Bayes nonspatial estimates with a log-linear model in AFF; 3 the empirical Bayes nonspatial estimates with a log-linear cubic model in AFF; 4 the fully Bayes nonspatial estimates with a log-linear cubic model in AFF; 5 the joint model with a log-linear cubic model in AFF; 6 the initial ICAR model with a log-linear cubic model in AFF; and 7 the refined ICAR model with a log-linear cubic model in AFF. Plotting symbol is county number.
Relative risk estimates for Scottish lip cancer data: 0 denotes the SMRs; 1 the empirical Bayes nonspatial estimates without the use of AFF; 2 the empirical Bayes nonspatial estimates with a log-linear model in AFF; 3 the empirical Bayes nonspatial estimates with a log-linear cubic model in AFF; 4 the fully Bayes nonspatial estimates with a log-linear cubic model in AFF; 5 the joint model with a log-linear cubic model in AFF; 6 the initial ICAR model with a log-linear cubic model in AFF; and 7 the refined ICAR model with a log-linear cubic model in AFF. Plotting symbol is county number.
In position 2, empirical Bayes estimates using a log-linear model in AFF, logμi = β0 + β1xi, are displayed. Four of the counties (4, 6, 14, and 32) have proportion in AFF equal to 0.24 (the highest value) and we see that the estimates for these counties are all moved upward relative to the no covariate model (position 1). The latter is worrying, and we see the reason in Figure 3; the log-linear model (dashed line) does not fit the data well for large values of AFF. This suggests that we use a more flexible model; exploratory work suggests the cubic form
Plot of Y/E versus proportion in AFF, x, with plotting symbol county number. Solid line corresponds to a model with identity link and linear in x; dashed line to a log link and linear in x; and dotted line to a log link and cubic in x.
Plot of Y/E versus proportion in AFF, x, with plotting symbol county number. Solid line corresponds to a model with identity link and linear in x; dashed line to a log link and linear in x; and dotted line to a log link and cubic in x.

Figure 3 shows that this cubic model provides a better fit to the data (dotted line), and in particular flattens off for larger values of x. With the linear and cubic covariate models, the standard deviations of the random effects are 0.58 and 0.53, respectively. We might expect the standard deviation to be reduced in size when we add an important covariate but this does not have to happen; for an explanation see Price and others (1996). In position 3 of Figure 2, estimates under the cubic model are plotted, and we see that for counties 4, 6, 14, and 32 the estimates appear more reasonable. This illustrates the importance of deciding how much local smoothing is appropriate. A similar issue is relevant to the extent and nature of spatial smoothing.
We now report a Bayesian version of the normal model, (2.8), with log-linear cubic model (2.10) using Markov chain Monte Carlo (MCMC). The covariates are centered in (2.10) to reduce dependence in the posterior distribution, thereby reducing the dependence in the Markov chain. Flat priors were placed on β0,β1,β2,β3 and the previously discussed gamma prior, Ga(1,0.0260), was assumed for σv − 2.
We see that the estimates under the empirical Bayes gamma and Bayesian lognormal model, at positions 3 and 4, respectively, each with cubic mean model, are very similar. This illustrates that the most important aspect is not the inferential method or the choice of gamma or lognormal random effects, but a judicious choice of the covariate model. We define the random variables exp(±1.96 × σv) as the endpoints of a 95% interval for the residual relative risks. Posterior mean estimates of these endpoints are (0.35, 2.96), showing that the posterior interval is considerably wider than the prior interval of (0.5, 2.0). A 95% posterior interval for σv is (0.40, 0.73) with median 0.55.
2.6 Spatial models
In general, we might expect residual relative risks in areas that are “close” to be more similar than in areas that are not “close,” and we would like to exploit this information in order to provide more reliable relative risk estimates in each area. This is analogous to the use of a covariate x, in that areas with similar x values are likely to have similar relative risks. Unfortunately, the modeling of spatial dependence is much more difficult since spatial location is acting as a surrogate for unobserved covariates; we need to choose an appropriate spatial model, but do not directly observe the covariates whose effect we are trying to mimic.
We first consider the model

with

where Si = (Si1,Si2) denotes spatial location, represented as the centroid of area i, and g(Si,γ) is a regression model that we may include to capture large-scale spatial trend. The random effects Vi|σv2∼iidN(0,σv2) represent nonspatial contributions to the overdispersion, and Ui the spatial contributions. We describe two forms for the latter.
We may assume that U = (U1,…,Un) arise from a zero mean multivariate normal distribution with variances var(Ui) = σu2 and correlations corr(Ui,Uj) = ρdij, where dij is the distance between the centroids of areas i and j and ρ is a parameter that determines the extent of the correlation. This model is “isotropic” since it assumes that the correlation is the same in all spatial directions. We refer to this as the “joint” model, since we have specified the joint distribution for U.
To define a “conditional” model, we need to specify a rule for determining the “neighbors” of each area. A number of authors have taken areas i and j to be neighbors if they share a “common boundary.” This is reasonable if all regions are of similar size and arranged in a regular pattern (as is the case for pixels in image analysis where these models originated), but is not particularly attractive otherwise. Various other neighborhood/weighting schemes are possible, for example Cressie and Chan (1989) assumed that the neighborhood structure was a function of the distance between area centroids. For area i, we let ∂i denote the indices of the set of neighbors of area i. Besag and others (1991) suggested a model that included both a nonspatial and a spatial random effect and assigned the spatial random effects an intrinsic conditional autoregressive (ICAR) prior. Under this specification where mi is the number of neighbors of area i and is the mean of the spatial random effects of these neighbors. The parameter ωu2 is a conditional variance and its magnitude determines the amount of spatial variation. The variance parameters σv2 and ωu2 are on different scales, σv is on the log relative risk scale while ωu is on the log relative risk scale “conditional” on Uj,j∈∂i. Hence, they are not comparable, in contrast to the joint model in which σu is on the same scale as σv. Note that if ωu2 is “small,” then although the residual is strongly dependent on the neighboring value, the overall contribution to the residual relative risk is small. This is a little counterintuitive but stems from spatial models having two aspects: the “extent” and “total amount” of spatial dependence, and in the ICAR model there is only a single parameter controlling both aspects. In the joint model, the extent of spatial dependence is determined by ρ and the total amount by σu2. A nonspatial random effect should always be included along with ICAR random effects since this model cannot take a limiting form that allows nonspatial variability; in the joint model with Ui only, this is achieved as ρ→0. If the majority of the variability is nonspatial, inference for the ICAR model might incorrectly suggest that spatial dependence was present. Leroux and others (1999) showed via a simulation study that if the data were truly independent, a model with ICAR random effects and no nonspatial random effects produced a serious overestimation of ωu2, which led to very poor efficiency in the estimation of regression coefficients. In terms of implementation, both models require MCMC, but the conditional model needs far less computation than the joint model, for which n×n matrix inversions are typically necessary at each iteration.
Unfortunately, both the joint and conditional models suffer from a level of arbitrariness in their specification because in the current context the areas are not regular in shape or equal in size. For both models, normality of random effects can be replaced by other choices such as Laplacian and Student's t distributions; see the discussion of Besag and others (1991) and Best and others (1999). The simple correlation structure described for the joint model can be extended to more complex forms, the Matérn class for example; see Matérn (1986), and the discussion of Diggle and others (1998). For the Scottish data, in common with many applications, the data are spatially sparse, and little can be learned about even a single parameter, hence, we do not proceed to more complex forms here. Prior specification on the variances and spatial parameters also requires careful thought, as we discuss in Section 2.7. Various other residual spatial models have been proposed (see for example, Cressie and Chan, 1989, Besag and others, 1991, Clayton and others, 1993, , diggle:tawn:moyeed:98, Leroux and others, 1999, Best and others, 2000, Mugglin and others, 2000, Knorr-Held and Raßer, 2000, Kelsall and Wakefield, 2002, Fernández and Green, 2002, Christensen and Waagepetersen, 2002; Green and Richardson, 2002). Richardson (2003) provides an excellent review of this literature. A number of comparisons between spatial models have been carried out (see for example Lawson and others, 2000; Best and others, 2005).
We have concentrated on Bayesian methods. A number of frequentist approaches are possible, though have not been extensively investigated. Thurston and others (2000) describe a negative binomial additive model that potentially offers a useful alternative to the models described here; the negative binomial aspect would allow for overdispersion, while the generalized additive model allows flexible modeling of latitude and longitude to model large-scale spatial variability. Recent work on generalized linear models with splines may also be applicable, see for example Lin and Zhang (1999) and Gu and Ma (2005). Allowing for small-scale residual spatial dependence in these models would be desirable, however.
2.7 Prior choices for spatial models
Previously, priors have been specified for each of the variance components separately, but it is more practical to represent beliefs about the total variability. Proper priors are required for the parameters of the spatial model, see Berger and others (2001) for a discussion in the context of the joint model.
For the joint model in which a multivariate normal distribution is assigned to U, we have Vi∼iidN(0,σv2) and, independently, Ui∼iidN(0,σu2) so that the residual relative risk eVi + Ui is lognormal with parameters 0 and σv2 + σu2. If we specify inverse gamma priors for σv2 and σu2, the implied prior for σv2 + σu2 is not inverse gamma so that we cannot easily control the total residual relative risk. We write the total precision as τT = (σv2 + σu2) − 1, and as in Section 2.4 specify τT∼Ga(a,b) so that marginally we have a log t distribution for the total residual relative risks. We let p = σu2/(σu2 + σv2) represent the proportion of the total residual variation that is attributable to the spatial component, and assign a beta prior, Be(c,d), to p, and transform from (σT2,p) to (σv2,σu2) via

This prior allows us to control the amount of total residual variability, a quantity for which prior knowledge is available, and induces positive dependence in the joint prior for (σv2,σu2).
Rather than consider the parameter ρ, we specify a lognormal prior, using (2.9), for the distance at which the correlations fall to a half, d1/2 = log2/logρ. For example, if we believe there is a 5% chance that the correlation falls to a half in less than 4 km, and a 95% chance that it falls to a half in less than 125 km we obtain d1/2∼LN(3.11,1.052).
Given its conditional interpretation, it is not straightforward to specify a prior for the ICAR parameter ωu2. Specifying an ICAR model for the spatial effects does not define a proper n-dimensional joint distribution, and none of the marginal distributions for Ui exists. Rather

where Q is the n×n matrix with Qij = − 1/ωu2, if areas i and j≠i are neighbors, Qij = 0 otherwise, and Qii = mi/ωu2.
For prior specification, we follow an approximate strategy and consider the n − 1 random variables Z = (Z1,…,Zn − 1), where Zi = Ui − Un, i = 1,…,n − 1. Hence, Z = AU, where A = [I| − 1], I is the (n − 1)×(n − 1) identity matrix, and − 1 is an (n − 1)×1 vector of − 1‘s. The joint distribution of Z exists, and is an (n − 1)-dimensional normal distribution with mean 0 and precision matrix with a generalized inverse of A, and 0 the (n − 1)×1 vector of 0′s; Besag and Kooperberg (1995) give further details, see Lemma 3.1 and Corollary 3.2. The marginal variance for Zi is var(Zi) = aiωu2, where the constants ai are determined by the neighborhood structure, and are the diagonal elements of . We let represent the average marginal variance, and specify a prior for , which induces a prior for ωu2. Once the calibration between ωu2 and has been carried out, we specify priors for and p, as described for the joint model, and then take σv2 = (1 − p)τT − 1 and . This procedure is approximate in a number of ways. We have considered Z rather than U, and Ui is not marginally normally distributed, but we have found it more useful than previously available prescriptions (see for example, Bernardinelli and others, 1995).
2.8 Spatial models for the Scottish lip cancer data
We assign improper flat priors to each element of β, and for the joint spatial model assume that τT∼Ga(1,0.0260), p∼Be(1,1), and d1/2∼LN(3.11,1.052). Figure 4 shows smoothed marginal densities based on samples from these priors, including induced quantities of interest such as the residual relative risk, exp(Ui + Vi), and ρ10, the correlation at a distance of 10 km. The induced dependence between σv and σu is apparent.
Priors for the joint spatial model. First row: univariate and joint marginals for σv and σu. Second row: the residual relative risk exp(Vi + Ui) margin, the distance at which correlations fall to a half, d1/2, and the correlation between areas whose centroids are 10 km apart, ρ10.
Priors for the joint spatial model. First row: univariate and joint marginals for σv and σu. Second row: the residual relative risk exp(Vi + Ui) margin, the distance at which correlations fall to a half, d1/2, and the correlation between areas whose centroids are 10 km apart, ρ10.
For the ICAR model, the same priors were assumed and we set , where for the Scottish geography with a common boundary neighborhood scheme. This neighborhood scheme is not particularly appealing for Scotland because of the irregularity of the areas. Following other authors (e.g. Thomas and others 2000), we initially assume that the three islands, which have no common boundary neighbors, only have a nonspatial random effect.
Positions 5–7 of Figure 2 show estimates from spatial models, each with the cubic model in AFF. In the nonspatial model, we have shrinkage to the overall regression model, but for the spatial model we have, in addition, local smoothing so that estimates can move away from the regression model.
A striking feature of Figure 2 is the differences in the estimates for areas 8 and 11 under the joint spatial (position 5) and the ICAR (position 6) models. The explanation is that for the three islands without neighbors under the ICAR formulation, there are only nonspatial contributions to the relative risk. Table 1 reports posterior summaries for the parameters of the random effects distributions, and shows that the majority of the total variability is spatial for these data. Hence, we see large shrinkage for the three islands since we are assuming a “common” nonspatial model across islands and non-islands, resulting in too much shrinkage for the islands. There are a number of possibilities for refining the model. One is to assume Vi∼iidN(0,τT − 1) for the islands so that we have the same total variability as non-islands, but with all this variability assumed to be nonspatial. Given our parameterization of the prior, it is straightforward to fit this model. The resulting estimates are shown in position 7, and differ little from those in position 6, which is reassuring. This model may be useful in general circumstances in which there are areas which have no neighbors due to physical boundaries. Further possibilities include defining neighbors for the islands as the nearest points of the mainland (or the nearest island), or assuming a distinct nonspatial distribution for the islands. However, with only three islands this option is not feasible here.
Figure 1(b) shows relative risk estimates under the joint model; the smoothness compared to the SMRs in Figure 1(a) is apparent. Under a Bayesian sampling-based approach, it is straightforward to carry out inference for functions of interest. As an illustration, Figure 5(a)–(c) shows the posterior probabilities that the relative risk in each area exceeds the values 2, 3, and 4. We see a number of areas with high probabilities, suggesting that these be examined more closely to discover the characteristics of the individuals, or health hazards that are present, in these areas to explain the excesses. Such plots are also useful for reflecting the uncertainties inherent in smoothed maps.
Posterior probabilities of exceeding different thresholds under the joint model.
Posterior probabilities of exceeding different thresholds under the joint model.
In Table 1, we examine the sensitivity of estimates of the nonspatial and spatial contributions of residual relative risk, by comparing the original gamma prior for τT to the alternative Ga(1,0.1399). This prior gives relative risks that follow a log t distribution with 2 degrees of freedom, and fall within the range (0.2, 5) with probability 0.95. We see that across these prior scenarios the majority of the residual variability, 82–86%, is explained by the spatial component. Overall, there is little sensitivity of the parameters in Table 1 to the priors considered, though the joint and ICAR models can give quite different estimates in particular areas. Interval estimates for d1/2 are very wide, reflecting the lack of information on the strength of the residual spatial variability. For example, for the prior choice in row 1 of Table 1, a 95% interval for d1/2 is (32, 243) km.
Sensitivity of spatial model parameters to prior choice for τT = (σu2 + σv2)−1, p is the proportion of the total variability that is spatial
| Spatial model | Prior specification | σv | Posterior median | ||
| σu | p | d1/2 (km) | |||
| Joint | τT ∼ Ga(1, 0.0260) | 0.23 | 0.48 | 0.82 | 79 |
| Joint | τT ∼ Ga(1, 0.1399) | 0.24 | 0.49 | 0.82 | 80 |
| ICAR | τT ∼ Ga(1, 0.0260) | 0.23 | 0.53 | 0.85 | — |
| ICAR | τT ∼ Ga(1, 0.1399) | 0.22 | 0.54 | 0.86 | — |
| Spatial model | Prior specification | σv | Posterior median | ||
| σu | p | d1/2 (km) | |||
| Joint | τT ∼ Ga(1, 0.0260) | 0.23 | 0.48 | 0.82 | 79 |
| Joint | τT ∼ Ga(1, 0.1399) | 0.24 | 0.49 | 0.82 | 80 |
| ICAR | τT ∼ Ga(1, 0.0260) | 0.23 | 0.53 | 0.85 | — |
| ICAR | τT ∼ Ga(1, 0.1399) | 0.22 | 0.54 | 0.86 | — |
Sensitivity of spatial model parameters to prior choice for τT = (σu2 + σv2)−1, p is the proportion of the total variability that is spatial
| Spatial model | Prior specification | σv | Posterior median | ||
| σu | p | d1/2 (km) | |||
| Joint | τT ∼ Ga(1, 0.0260) | 0.23 | 0.48 | 0.82 | 79 |
| Joint | τT ∼ Ga(1, 0.1399) | 0.24 | 0.49 | 0.82 | 80 |
| ICAR | τT ∼ Ga(1, 0.0260) | 0.23 | 0.53 | 0.85 | — |
| ICAR | τT ∼ Ga(1, 0.1399) | 0.22 | 0.54 | 0.86 | — |
| Spatial model | Prior specification | σv | Posterior median | ||
| σu | p | d1/2 (km) | |||
| Joint | τT ∼ Ga(1, 0.0260) | 0.23 | 0.48 | 0.82 | 79 |
| Joint | τT ∼ Ga(1, 0.1399) | 0.24 | 0.49 | 0.82 | 80 |
| ICAR | τT ∼ Ga(1, 0.0260) | 0.23 | 0.53 | 0.85 | — |
| ICAR | τT ∼ Ga(1, 0.1399) | 0.22 | 0.54 | 0.86 | — |
2.9 Conclusions
The preferred model here would be one which includes a cubic model in AFF and a spatial component, since the association with AFF is strong and there is significant residual spatial dependence. A full analysis would examine the sensitivity of the relative risk estimates to the prior specifications. There is a large amount of residual variability for these data, which is not surprising since we have no information on risk factors such as smoking, alcohol, and diet. Although it is important to consider models that include residual spatial dependence for small-area studies, empirical Bayes nonspatial models are very useful for exploratory purposes, particularly for choosing an appropriate mean model. Estimates from these models, along with the SMRs, provide baseline estimates which may be compared with those from spatial models.
3. SPATIAL REGRESSION
Spatial regression differs from disease mapping in that the aim is to estimate the association between risk and covariates, rather than to provide area-specific relative risk estimates. This distinction has important implications for modeling both the mean function and the residual variability.
3.1 Drawbacks of approaches under independence
In the usual implementation of regression models, the standard errors are calculated under the assumption that the response data are independent, after control for known covariates. In a spatial context, and particularly when the areas are small, one would expect “residual” dependency between counts in areas that are geographically close, due to unmeasured risk factors or errors in the data that have spatial structure.
3.2 Nonspatial models
We begin by fitting models with no spatial dependency, in order to see the subsequent effect of including such dependency. A naive starting point is the Poisson regression model Yi|β∼Poisson(Eiμi), where μi = μ(xi,β), for i = 1,…,n. As discussed in Section 2.3, this model will almost always be inappropriate for spatial count data, since the Poisson model does not have a variance parameter. An easy way of extending this model is to use quasi-likelihood (McCullagh and Nelder, 1989) and specify the first two moments

where κ represents the level of non-Poisson variability; fitting is straightforward with identical point estimates to the Poisson model, and standard errors multiplied by κ1/2. This model does not assume a distribution for the data, and so is not helpful in the context of disease mapping where prediction is required. The parametric disease mapping models described earlier, in particular the Poisson models with gamma, (2.4), and lognormal, (2.8), random-effects models may also be used in a regression setting.
3.3 Nonspatial regression models for the Scottish lip cancer data
The exposure variable AFF only takes one of six values since it was read from a map key, and hence AFFi is measured with error and, in theory, an errors-in-variables model could be built based on the widths of the cut-points. In common with the other authors who have examined these data, including Clayton and Kaldor (1987), Clayton and Bernardinelli (1992), Breslow and Clayton (1993), Yasui and Lele (1997), Leroux and others (1999), Conlon and Louis (1999), Leroux (2000), Lee and Nelder (2001), Banerjee and others (2004), and Waller and Gotway (2004), we begin by assuming the log-linear model logμ(xi,β) = β0 + β1xi. We provide a discussion of the appropriateness of this model in Section 3.7, where we also provide a more careful interpretation of the coefficients of the model; the ecological interpretation is that exp(β1) is the multiplicative change in risk between an area with all individuals in AFF and another area with no individuals in AFF.
Table 2 gives estimates from a number of regression models, in all cases d1/2∼LN(3.11,1.052). The simple Poisson regression model gives an estimate of 7.4 so that the area-level multiplicative difference in risk between areas with proportion in AFF 0.24 and 0 (the range of the observed data) is exp(7.4×0.24) = 5.9. Model (3.1) gives , a considerable amount of overdispersion, giving a more than doubling in the standard error. The negative binomial parametric version of this model with first two moments (2.5) gives a slight reduction in the size of the coefficient and a very similar standard error.
Estimates and standard errors for ecological regression coefficient, β1, in log-linear model in AFF
| Model | Further specifications | Estimate | Standard Error |
| Poisson | — | 7.4 | 0.60 |
| Quasi-likelihood | — | 7.4 | 1.3 |
| Quasi-likelihood | Northings | 5.5 | 1.2 |
| Quasi-likelihood | Eastings and northings | 5.6 | 1.2 |
| Negative binomial | — | 7.2 | 1.3 |
| Poisson lognormal | σ−2v ∼ Ga(1, 0.0260) | 6.8 | 1.5 |
| Poisson lognormal | σ−2v ∼ Ga(1, 0.0260), β1 ∼ N(0, 4.21) | 6.1 | 1.4 |
| Poisson joint | τT ∼ Ga(1, 0.0260) | 3.4 | 1.3 |
| Poisson joint | τT ∼ Ga(1, 0.1399) | 3.4 | 1.3 |
| Poisson joint | τT ∼ Ga(1, 0.0260) | 3.5 | 1.3 |
| Poisson ICAR | τT ∼ Ga(1, 0.0260) | 4.9 | 1.3 |
| Poisson ICAR | τT ∼ Ga(1, 0.1399) | 4.9 | 1.4 |
| Model | Further specifications | Estimate | Standard Error |
| Poisson | — | 7.4 | 0.60 |
| Quasi-likelihood | — | 7.4 | 1.3 |
| Quasi-likelihood | Northings | 5.5 | 1.2 |
| Quasi-likelihood | Eastings and northings | 5.6 | 1.2 |
| Negative binomial | — | 7.2 | 1.3 |
| Poisson lognormal | σ−2v ∼ Ga(1, 0.0260) | 6.8 | 1.5 |
| Poisson lognormal | σ−2v ∼ Ga(1, 0.0260), β1 ∼ N(0, 4.21) | 6.1 | 1.4 |
| Poisson joint | τT ∼ Ga(1, 0.0260) | 3.4 | 1.3 |
| Poisson joint | τT ∼ Ga(1, 0.1399) | 3.4 | 1.3 |
| Poisson joint | τT ∼ Ga(1, 0.0260) | 3.5 | 1.3 |
| Poisson ICAR | τT ∼ Ga(1, 0.0260) | 4.9 | 1.3 |
| Poisson ICAR | τT ∼ Ga(1, 0.1399) | 4.9 | 1.4 |
Estimates and standard errors for ecological regression coefficient, β1, in log-linear model in AFF
| Model | Further specifications | Estimate | Standard Error |
| Poisson | — | 7.4 | 0.60 |
| Quasi-likelihood | — | 7.4 | 1.3 |
| Quasi-likelihood | Northings | 5.5 | 1.2 |
| Quasi-likelihood | Eastings and northings | 5.6 | 1.2 |
| Negative binomial | — | 7.2 | 1.3 |
| Poisson lognormal | σ−2v ∼ Ga(1, 0.0260) | 6.8 | 1.5 |
| Poisson lognormal | σ−2v ∼ Ga(1, 0.0260), β1 ∼ N(0, 4.21) | 6.1 | 1.4 |
| Poisson joint | τT ∼ Ga(1, 0.0260) | 3.4 | 1.3 |
| Poisson joint | τT ∼ Ga(1, 0.1399) | 3.4 | 1.3 |
| Poisson joint | τT ∼ Ga(1, 0.0260) | 3.5 | 1.3 |
| Poisson ICAR | τT ∼ Ga(1, 0.0260) | 4.9 | 1.3 |
| Poisson ICAR | τT ∼ Ga(1, 0.1399) | 4.9 | 1.4 |
| Model | Further specifications | Estimate | Standard Error |
| Poisson | — | 7.4 | 0.60 |
| Quasi-likelihood | — | 7.4 | 1.3 |
| Quasi-likelihood | Northings | 5.5 | 1.2 |
| Quasi-likelihood | Eastings and northings | 5.6 | 1.2 |
| Negative binomial | — | 7.2 | 1.3 |
| Poisson lognormal | σ−2v ∼ Ga(1, 0.0260) | 6.8 | 1.5 |
| Poisson lognormal | σ−2v ∼ Ga(1, 0.0260), β1 ∼ N(0, 4.21) | 6.1 | 1.4 |
| Poisson joint | τT ∼ Ga(1, 0.0260) | 3.4 | 1.3 |
| Poisson joint | τT ∼ Ga(1, 0.1399) | 3.4 | 1.3 |
| Poisson joint | τT ∼ Ga(1, 0.0260) | 3.5 | 1.3 |
| Poisson ICAR | τT ∼ Ga(1, 0.0260) | 4.9 | 1.3 |
| Poisson ICAR | τT ∼ Ga(1, 0.1399) | 4.9 | 1.4 |
A plot of AFF versus eastings and northings shows that there is spatial structure in the exposure, with low values in the heavily populated urban areas in the Glasgow region and a general increase further north (Figure 6). Leroux and others (1999) and Leroux (2000) include northings (latitude) in the log-linear model. When we include the northing projection centroids in the model, the AFF coefficient is reduced because of the south–north trend in exposure; the inclusion of eastings gave a far smaller change since there is little east–west gradient in the exposure surface (Table 2). The decision of whether to include a large-scale trend in the risk surface is a difficult one in a regression setting since exposures of interest will often have spatial structure and so estimates of relative risks of interest will in general change. Non-adjustment for the trend will attribute any such trend to the exposure estimate, and may also invalidate the assumption of stationarity of spatial models such as the joint specification. Ideally the choice will be based on epidemiological grounds; if it believed that the trend is due to plausible unmeasured factors such as socioeconomic status, then the trend should be included, though it is clearly preferable to have a direct measure of the variable responsible for the trend.
We now turn to the Poisson lognormal model, (2.8). We choose improper flat priors on β0 and β1, and a Ga(1,0.0260) prior on σv − 2. This yields a posterior mean and standard deviation of 6.8 and 1.5, showing reasonable agreement with the comparable quasi-likelihood/negative binomial models. To illustrate the use of a proper prior on β1, suppose we believe that the 50 and 95% points of the relative risk between areas with 10 and 0% in AFF are 1 and 2. This yields a normal prior for β1 of N(0,4.21), and a posterior mean and standard deviation of 6.1 and 1.4, illustrating the effect of the prior in reducing both the estimate of the association and the standard error.
3.4 Spatial models
Unfortunately, there are currently no simple ways of fitting frequentist fixed-effects, nonlinear models with spatially dependent residuals, and so we concentrate on random-effects models and the form given in (2.11) and (2.12), with no large-scale trend. It would be desirable to be able to perform sandwich estimation in a spatial regression setting, but unfortunately the non-lattice nature of the data does not easily allow any concept of replication across space, as was used by Heagerty and Lumley (2000) in the case of lattice data.
3.5 Spatial regression models for the Scottish lip cancer data
We specify identical priors for σv2, σu2, d1/2, and ωu2 as in the disease mapping analyses. Improper flat priors were placed on β0 and β1.
Table 2 shows that the estimated relative risks are greatly attenuated under each of the spatial models. The explanation is spatial dependence in AFF and in the disease counts. The choice of the level of spatial smoothing is a difficult one without knowing the true spatial dependence model. Here, the use of an informative prior distribution based on another study region might be useful. It is interesting to see that the standard error is reduced when spatial dependence is acknowledged; perhaps in conflict with intuition, Wakefield (2003) provides more discussion of this issue. Under the first joint model in Table 2, the prior and posterior 5, 50, and 95% quantiles of the distance at which correlations fall to a half are (4, 22, 125) km and (35, 72, 199) km, respectively, again illustrating that there is relatively little information in the data on the range of the correlation. Spatial variation accounts for 77% of the total, with posterior means for σv and σu of 0.26 and 0.52.
3.6 Ecological bias
There is a vast literature describing sources of ecological bias (see for example, Richardson and others, 1987, Piantadosi and others, 1988, Greenland and Morgenstern, 1989, Greenland, 1992, Greenland and Robins, 1994, Morgenstern, 1998, Wakefield, 2003). Ecological bias occurs because of within-area variability in exposures and confounders, and there are a number of distinct consequences of this variability; we discuss pure specification bias, confounding, and standardization.
Pure specification bias.
So-called pure specification bias arises because a nonlinear risk model changes its form under aggregation. To illustrate, we specify a model at the level of the individual and then aggregate to determine the implied ecological form. Let Yij denote the individual binary disease outcome with Yij = 0/1 representing noncase/case, and xij the exposure of individual j in area i, i = 1,…,n, j = 1,…,Ni. For simplicity, we assume a univariate exposure and no confounders and let p(α,x) denote the risk for an individual with exposure x as a function of parameters α. The individual outcome, Yij, is Bernoulli with probability of disease pij, written as Yij|xij∼indBern(pij). The implied aggregate (average) risk is

where pij = p(α,xij); (3.2) clearly shows that to avoid ecological pure specification bias, we require the average of the individual risks, rather than the risk associated with the average exposure, which would be used in a naive model. If Ni is large, then an alternative derivation is to assume that exposures xij are drawn independently from a distribution fi(x|ϕi), where ϕi denote the parameters of this distribution. It then follows that, marginally, Yij|ϕi is Bernoulli with probability of disease

for a continuous exposure, and

for a K-level discrete exposure. In a disease mapping context, Knorr-Held and Besag (1998, p 2050) considered a discrete exposure with fi(x|ϕi) the distribution of a generalized Bernoulli random variable with ϕi = (υi1,…,υiK) and υik representing the probability of falling in exposure category k in area i. In this case, we obtain pi = ∑k = 1Kpikυik as the risk associated with an individual randomly selected from area i and with no knowledge of their exposure. In an ecological regression context, Richardson and others (1987) and Plummer and Clayton (1996) considered (3.3) with fi(x|ϕi) a normal distribution with . As an illustration of the problems that arise due to pure specification bias, we present a simple example using the normal model. For a rare outcome, a common disease model is pij = p(α,xij) = eα0 + α1xij and (3.3) assumes the closed form

This may be compared with the naive ecological model where we have used β0,β1 in the ecological model to emphasize that in this model we are not estimating the individual-level parameters α0,α1. To gain intuition as to the extent of the bias, we observe that in (3.5) the within-area variance si2 is acting like a confounder, and there is no pure specification bias if the exposure is constant within each area, or if the variance is independent of the mean exposure in the area. The expression (3.5) also allows us to characterize the direction of bias. For example, if α1 > 0 and the within-area variance increases with the mean, then overestimation will occur. In general, there is no pure specification bias if the disease model is linear in x or if all the moments of the within-area distribution of exposure are independent of the mean. Bias will also be small if α1 is close to zero, since the risk model is then approximately linear.
Confounding.
Between-area confounding is analogous to conventional confounding, since the area is the unit of analysis, and so the implications are relatively straightforward to understand. Within-area confounding is more complex. In an ecological study, we need to control for the complete within-area “distribution” of exposures and confounders. We illustrate in the simplest situation in which we have a binary exposure, x1, a binary confounder, x2, and assume the individual-level risk model: p(α,x) = exp(α0 + α1x1 + α2x2 + α3x1x2). Then Lasserre and others (2000) show that the aggregate form is

where xix1x2 is the proportion of individuals in area i in exposure/confounder stratum x1,x2. Letting xi1 + and xi + 1 represent the proportion of individuals in area i who have x1 = 1 and x2 = 1, respectively (the marginal prevalences), we may rewrite the average risk (3.6) as

showing that the marginal prevalances alone, which often constitute all the available information in ecological data, are not sufficient to characterize the joint distribution unless x1 and x2 are independent, in which case x2 is not a within-area confounder.
Standardization.
If the response is standardized with respect to confounders, then the exposure must be also; this is known as mutual standardization. Rosenbaum and Rubin (1984) provide a discussion in the context of direct standardization. To illustrate the problem in the context of ecological studies, consider a continuous exposure x and suppose the individual-level model is given by pk(α,γ,xik) = exp(α0 + α1xik + γk), for k = 1,…,K stratum levels with associated relative risks eγk, with γ = (γ1,…,γK) and γ1 = 0; we will refer to the confounder as age. For an individual randomly selected in stratum k, the aggregate risk is

where fik(x) is the distribution of the exposure in area i and confounder stratum k, with parameters ϕik. If the population in area i stratum k is Nik, with Yik cases, then summing over stratum

If we assume a common exposure distribution across stratum, fi(x|ϕi), so that age does not correspond to a within-area confounder, then (3.7) simplifies to

where Ei = ∑k = 1KNikeγk. This mean is often used in conjuction with a Poisson model; we have standardized for the confounder via indirect standardization, but for this to be valid we need to assume that the exposure is constant across age groups. The correct mean model is given by (3.7), and requires the exposure distribution by age, which will rarely be available.
The above discussion makes it clear that to prevent ecological bias, we need individual-level data to control for the within-area distribution of confounders and exposures. Prentice and Sheppard (1995) described a very powerful method for reducing ecological bias based on subsamples of individual exposure–confounder data, but not individual disease outcomes, within each area. Haneuse and Wakefield (2005), Haneuse and Wakefield (2006) described a hybrid design in which case–control and ecological data are combined.
3.7 Ecological bias in the Scottish lip cancer data
We build a model from the level of the individual in order to aid interpretation of an ecological association. Let Yij be an indicator of lip cancer with Yij|pij∼Bern(pij), and let the risk for individual j in area i, who is in age group kij∈{1,…,K} be given by

with xij an indicator of whether individual j in area i works in AFF; the second form illustrates that we have a linear model in the exposure xij. The parameters exp(γk) are the risks associated with being in age stratum k, k = 1,…,K, while exp(α1) is the parameter of primary interest and is the multiplicative change in risk for an exposed individual when compared to an unexposed individual, which is assumed the same across all age groups so that there is no interaction between exposure and age.
We now consider the effect of aggregation. Suppose the number of individuals in stratum k who are unexposed (exposed) in area i is Ni0k (Ni1k), and Yi0k (Yi1k) of these are cases. For a rare disease, Yixk|α,γ∼indPoisson(Nixkeα0 + α1x + γk), and so Yi0k + Yi1k|α,γ∼indPoisson(Ni0keα0 + γk + Ni1keα0 + α1 + γk). We now assume that the proportions exposed, xi, and unexposed, 1 − xi, in area i are constant across age groups, that is Ni0k = (1 − xi)Nik, Ni1k = xiNik, where Nik is the population in confounder stratum k in area i. Summing over k: Yi|α,γ∼indPoisson([1 − xi]eα0∑k = 1KNikeγk + xieα0 + α1∑k = 1KNikeγk) to give

where Ei(γ) = ∑k = 1KNikeγk.
The addition of random effects to this model is straightforward; if we begin with the individual-level model pij = exp(α0 + α1Zij + γkij + Vi + Ui), we obtain

The above model is an example of inference over a collapsed margin. Byers and Besag (2000) considered such a situation with a Poisson model, but were apparently unaware that the likelihood is available in closed form, and instead introduced auxiliary variables within an MCMC scheme.
From an individual-level perspective, we see that there are a number of problems with the analyses reported in Section 3.5, and with those carried out previously. The most obvious is that Model (3.9) is linear, rather than log-linear, because the original individual-level model, (3.8), is linear and linearity is preserved under aggregation. The correct interpretation of the parameters in the log-linear model that was fitted in Section 3.5 is that exp(β1) is the relative risk associated with the “contextual” effect of the proportion of individuals who are exposed to AFF in each area; i.e. it is not individual occupation in AFF that is relevant to individual risk, but the proportion of individuals in the area who are employed in AFF, a model that does not make substantive sense. The case of a binary covariate has been extensively studied in the social sciences literature, see Wakefield (2004) for further details.
In order for Model (3.9) to be relevant, we also need to assume that the proportion in AFF is constant across age groups, which is unlikely to hold, but unfortunately we do not have data on the proportion in AFF by age group. Finally, if the expected numbers are age standardized using a priori internal standardization, rather than external standardization in which reference rates are based on data from another region, then the subsequent estimation of the AFF association will be distorted if age is a confounder, since some of the effect of AFF will already have been absorbed into the age effects. The disease and population counts are required by county and age stratum in order to fit a model in which age and AFF are simultaneously estimated.
3.8 Individual-level models for the Scottish lip cancer data
Table 3 contains estimated relative risks under a number of different models. A quasi-likelihood version of (3.9) is given by
Estimates and standard errors for individual relative risk, exp(α1), in linear model
| Model | Relative risk | Standard Error |
| Quasi-likelihood | 23 | 7.0 |
| Poisson lognormal | 21 | 7.5 |
| Poisson joint | 6.4 | 3.3 |
| Poisson ICAR | 9.9 | 3.3 |
| Model | Relative risk | Standard Error |
| Quasi-likelihood | 23 | 7.0 |
| Poisson lognormal | 21 | 7.5 |
| Poisson joint | 6.4 | 3.3 |
| Poisson ICAR | 9.9 | 3.3 |
Estimates and standard errors for individual relative risk, exp(α1), in linear model
| Model | Relative risk | Standard Error |
| Quasi-likelihood | 23 | 7.0 |
| Poisson lognormal | 21 | 7.5 |
| Poisson joint | 6.4 | 3.3 |
| Poisson ICAR | 9.9 | 3.3 |
| Model | Relative risk | Standard Error |
| Quasi-likelihood | 23 | 7.0 |
| Poisson lognormal | 21 | 7.5 |
| Poisson joint | 6.4 | 3.3 |
| Poisson ICAR | 9.9 | 3.3 |

where xi1 = Ei(1 − xi) and xi2 = Eixi, a linear model. Fitting this model gave with an estimate of the relative risk of 23 and standard error 7.0. The fit corresponding to this model is shown as the solid line in Figure 3, and appears reasonable.
Turning to a Bayesian approach, perhaps surprisingly, the use of an improper flat prior for α1 leads to an improper posterior. Consider the model Yi|α∼indPoisson(Ei{[1 − xi]eα0 + xieα0 + α1}). Assigning an improper uniform prior to α0, we integrate this parameter from the model to give

the likelihood contribution of which tends to the constant

as α1→ − ∞, showing that a proper prior is required. The constant (3.11) is nonzero unless xi = 1 in any area with yi≠0. The reason for the impropriety is that α1 = − ∞ corresponds to a relative risk of zero, so that exposed individuals cannot get the disease, which is not inconsistent with the observed data unless all individuals in area i are exposed, xi = 1, and yi≠0 in that area, since then clearly the cases are due to exposure. A similar argument holds as α1→∞, with replacement of 1 − xi by xi in (3.11) providing the limiting constant. Figure 7 illustrates this behavior for the Scottish lip cancer example, for which xi = 0 in five areas.
Log likelihood for α1 for the Scottish data; the horizontal line is the constant to which this function tends to as α1→ − ∞.
Log likelihood for α1 for the Scottish data; the horizontal line is the constant to which this function tends to as α1→ − ∞.
We now describe Bayesian analyses for which we have assumed that the relative risk was less than 1 with probability 0.5 and less than 50 with probability 0.95 to give the lognormal prior LN(0,2.382) for the relative risk eα1. Priors for the other parameters were the initial choices described above. The nonspatial Poisson lognormal model (2.8) gave posterior mean and standard deviation of 21 and 7.5, in close agreement with quasi-likelihood.
We fit joint and ICAR versions of (3.10) with the priors used previously. The posterior mean and standard deviation of the relative risk from the joint analysis are 6.4 and 3.3, while under the ICAR model the posterior mean and standard deviation are 9.9 and 3.3.
3.9 Conclusions
There are a number of problems with the spatial regressions described here. There are likely to be missing confounders as we have no information on lifestyle characteristics of the individuals, for example diet, smoking, and alcohol. A priori internal standardization was carried out, which may distort the regression coefficient. Finally, the assumption of constant proportions of individuals in AFF across age groups is highly dubious. However, the coefficients observed are very large, and we may conclude that there is an association between lip cancer incidence and occupations that lead to exposure to sunlight; placing a numerical value on the relative risk would be hazardous, however!
In general, an appropriate individual model should be aggregated to give the ecological model, so that the assumptions and aggregate data required for accurate inference can be clarified. The models representing spatial trends in risk, at both short and non-short scales, should also be carefully chosen, in order to decide on the manner in which spatial dependence is modeled.
4. POSTSCRIPT: ANALYSES WITH AUGMENTED DATA
In previous sections, we used the Scottish lip cancer data that are routinely available and have been analyzed by numerous authors. We now analyze data provided by Michel Smans of the International Agency for Research on Cancer. These data consist of disease counts Yik and populations Nik by county i and age group k, i = 1,…,56, k = 1,…,10. The first nine age strata were collapsed from the original 18 5-year strata due to sparsity of cases, so that the age categories are [0, 44), [45, 49), …, [80, 85), 85+. Two county-level covariates were also provided: the proportion in AFF and the proportion of all economically active households in Social class IV (partly skilled) and V (unskilled); these variables were obtained from General Register Office (1983). There were some small differences in the expected numbers constructed from the full data when compared to the widely used data, due to rounding errors. The new AFF x variable did not always agree with the widely used data, even accounting for measurement error, apparently due to the misreading of Figure 3.6 in the cancer atlas. For example, four values of 0.16 in the widely used data were actually 0.01.
With respect to disease mapping, the proportionality assumption (2.1) was assessed by splitting the data into two sets of age categories ( < 65, ≥ 65) and calculating SMRs for each. This resulted in very similar SMRs in each age category across areas, so proportionality appears reasonable for these two age groups. This could be examined more formally using regression modeling with interactions between age and county.
Fitting a quasi-likelihood model with the original expected numbers, the new AFF variable and the naive log-linear ecological regression model gave an estimate with standard error 1.15, compared to 7.4 with standard error 1.3 for the original data. Hence, we see attenuation when using the mis-measured x variable. The reduced standard error was due to a reduction in κ of 4.9–3.8, perhaps due to induced model misspecification when the mis-measured x is used.
To assess whether a priori internal standardization distorts the estimate of the AFF association, we carry out simultaneous quasi-likelihood estimation of age and AFF using the naive mean model E[Yik|β,γ] = Nikexp(β0 + β1xi + γk), and assuming var(Yik|β,γ) = κ×E[Yik|β,γ]. The estimate differs from the expected numbers estimate of 8.9, showing that age is a confounder, leading to bias due to a lack of mutual standardization.
We also analyzed the age-stratified data using the more appropriate individual model (3.9), and the new AFF variable with the joint spatial model and the same priors as previously, with flat priors on γk, k = 2,…,10. The posterior mean for the relative risk was 17 with posterior standard deviation 6.3, and 95% interval (7.2, 32), which is considerably larger than in Table 3. With an ICAR spatial model, the corresponding figures were 19 and 6.0 with 95% interval (9.5, 33). For each pair of analyses in Sections 3 and 4, the ICAR model gives less-attenuated relative risk estimates than the joint model, perhaps because the smoothing is more local under the ICAR model, and so less of the south–north trend in risk is being absorbed into the spatial residuals.
The social class variable may be seen, at least potentially, as an ecological surrogate for lifestyle characteristics such as smoking, diet, and alcohol consumption. Perhaps surprisingly, the inclusion of the social class variable did not lead to a significant change in the coefficient associated with AFF. This is only an attempt to assess between-area confounding, however. To examine within-area confounding would require the cross-classification of AFF and social class within each area, which is unavailable.
The final aspect of these data that we cannot access is within-area confounding due to age, since we do not have information on the proportion in AFF within each age stratum; if these were available then we would not be susceptible to ecological bias, at least due to this source, since we could completely characterize the within-area distribution of exposure and confounder.
5. DISCUSSION
Throughout this paper, we have stressed that although the models applied in disease mapping and spatial regression studies have similar features, the enterprises have very different aims and modeling strategies should reflect this. Disease mapping is an exercise in prediction and, therefore, the form of the regression model can be very flexible, and need not reflect any causal mechanism. The use of the raw SMR is statistically valid (as Ei increases, the SMRi tends to the correct area-level relative risk), but residual spatial dependence is potentially useful since it may be exploited to smooth estimates in neighboring areas. Area-level estimates should be carefully examined, however, to see that appropriate amounts of smoothing (in both regression and spatial models) have been used. Inappropriate smoothing may be reflected in unexpected changes in estimates when compared to the SMRs, as we saw with the Scottish data. Prior choice for the variance parameters is important, particularly if the number of areas is not large. In spatial regression, the aim is to estimate causal parameters, and the form of the mean model is of vital importance. Building an aggregate model from the level of the individual is an important step toward understanding potential sources of ecological bias. Valid inference in spatial regression also requires acknowledgment of residual spatial dependence. Careful modeling of residual spatial dependence, at both small and large scales, is required, however, since regression coefficients of interest will often be sensitive to the form of the dependence assumed. Whether to include large-scale trends should be decided on epidemiological considerations concerning likely unmeasured confounders. Short-scale dependence models should ideally be informed by priors based on previous studies of the disease under study. We have discussed prior choices in some detail, but we stress that each study must be considered separately, and we would not recommend uncritical use of the specific prior choices used here.
All fitting was carried out using the freely available R and WinBUGS packages; code to implement each of the models described here may be found at http://faculty.washington.edu/jonno/cv.html. The WinBUGS software is a very flexible piece of freely available computer software that uses MCMC algorithms to generate dependent samples from the posterior distribution of a user-specified model. The GeoBUGS software has a library of many common spatial models; see Thomas and others (2000) for further details. Rue and Held (2005) provide details of alternative MCMC algorithms.
Multilevel models provide a means for modeling dependencies in data in order to provide appropriate standard errors and to allow smoothing, but they cannot control for confounding. As argued in Wakefield (2003), in spatial regression studies more effort should be placed on confounding/within-area modeling than spatial dependence, as the latter will be of secondary importance. This was illustrated in the Scottish lip cancer data, where previous analyses had investigated a variety of spatial models, but with an incorrect mean function. In principle, residual spatial dependence is a problem for individual-level studies also, though the collection of individual-level confounder variables is likely to reduce the extent of shared unmeasured variables, and hence residual dependence.
We have seen that for disease mapping, great care is required in modeling both covariates and the large-term spatial trend. Splines are appealing in disease mapping studies, to model the effects of both covariates and spatial coordinates, as they provide flexible modeling and, at least for nonspatial models, can easily be incorporated into an empirical Bayes procedure. Computationally simple frequentist fitting is ideal for exploratory analyses, though incorporating residual spatial dependence is currently not straightforward.
Inferentially, we have described frequentist methods for inference in nonspatial models, since these are computationally convenient, and Bayesian methods for spatial models; for spatial regression fitting, an appropriate mean model is more important than the choice of any particular inferential paradigm.
I would like to thank Carol Gotway for supplying the Great Britain National Grid coordinates for the Scottish data, Michel Smans for the original Scottish data, and the editor for detailed comments on the paper. This work was supported by grant R01 CA095994 from the National Institutes of Health. Conflict of Interest: None declared.







