George Gerogiannis, Mark Tranmer, Duncan Lee, Thomas Valente, A Bayesian Spatio-Network Model for Multiple Adolescent Adverse Health Behaviours, Journal of the Royal Statistical Society Series C: Applied Statistics, Volume 71, Issue 2, March 2022, Pages 271–287, https://doi.org/10.1111/rssc.12531
Abstract
The use of alcohol, cigarettes and marijuana among adolescents is a major public health concern, and a number of epidemiological studies have been conducted to understand the drivers of these individual health behaviours. However, there is no literature that jointly models these health behaviours with the aim of understanding the relative importance of individual factors, friendship effects and spatial effects in determining the prevalence of alcohol, cigarette and marijuana use among adolescents. To address this gap in the literature, we propose a novel multivariate spatio-network model for jointly modelling all three of these behaviours, with inference conducted in a Bayesian setting using Markov chain Monte Carlo simulation. The model is motivated by survey data from five schools in Los Angeles, California, and the results indicate the important roles that individual factors and friendship networks play in driving the uptake of these health behaviours.
1 INTRODUCTION
The consumption of alcohol, cigarettes and marijuana is among the largest public health concerns around the world. There is a strong consensus in the scientific community that alcohol consumption can cause various forms of cancer, with the National Toxicology Program (NTP) of the US Department of Health and Human Services listing the consumption of alcohol as a known human carcinogen based on evidence from studies in humans (National Toxicology Program, 2016). Similarly, the epidemiological literature on the negative effects of cigarettes has an incredibly long history (Doll & Hill, 1950, 1954), and smoking causes cancer, heart disease, stroke, lung diseases, diabetes and chronic obstructive pulmonary disease (COPD) (United States Surgeon General, 2014). In contrast, the study of the health effects of marijuana is a relatively new field of research, and a review of studies suggests that marijuana users are significantly more likely than non-users to develop long-lasting mental health disorders such as schizophrenia (Volkow et al., 2016).
Thus it is of interest to understand the factors that drive people to partake in these inter-related adverse health behaviours, and a number of different factors have been identified. One of the most salient and consistent predictors of an individual's behaviours is peer influence (Leung et al., 2014), and many theories have been put forward to explain this mechanism. For example, Bandura and Walters (1977) propose a social learning theory, in which individuals acquire new behaviours through the observation and imitation of their peers. Individual factors have also been strongly associated with adverse health behaviours, such as educational attainment (Gilman et al., 2008) and stressful life events (Simantov et al., 2000). Finally, spatial effects have also been observed (Lee & Lawson, 2016; Pearce et al., 2009), with rates of adverse health behaviours being higher in some neighbourhoods than others. This then begs the natural question as to which of these factors has the greatest influence on adolescent health behaviours, which is the gap in the literature that this paper addresses. The identification of the drivers governing adolescent health behaviours is crucial in a public health context, because such information can be used in the design of intervention strategies. There are many examples of such interventions and policies aimed at reducing an individual's likelihood of partaking in adverse health behaviours (Lantz et al., 2000), many of which can be classified as neighbourhood (Lewit et al., 1997) or social network (Brown et al., 2019; Starkey et al., 2009; Valente, 2012) strategies.
The modelling approaches used to quantify the effects of these drivers on adverse health behaviours are typically based around generalised linear models (GLMs) and extensions thereof. For example, Castillo et al. (2017) used a GLM to study whether regional targeting of interventions that aim to reduce the frequency as well as volume of drinking may be effective, while Fujimoto and Valente (2015) sought to provide insight into how adolescent health behaviour is predicated. In contrast, multiple membership multiple classification (MMMC, Browne et al., 2001) approaches extend GLMs by explicitly modelling network effects via random effects representing peer influence in the model. Most notably, Tranmer et al. (2014) and Lorant and Tranmer (2019) use a univariate class of this model for investigating adolescent behaviours, and focus on estimating the relative share of variation of risky and protective health behaviours at different levels of the population structure: individuals, friendship networks and schools.
These existing studies are limited, however, in that they have not fully accounted for the combined effects of individual factors, network effects and spatial location, thus making them unable to discern the relative importance of these factors. Additionally, they typically consider only a single health behaviour, and thus are not able to quantify the similarities in these behaviours and their drivers. For example, Castillo et al. (2017) ignore the spatial dependence that may be present in the data, while Lorant and Tranmer (2019) only consider one health behaviour at a time. This study thus proposes a novel spatio-network modelling approach for inter-related adolescent health behaviours, which for the first time enables the relative importance of individual factors, friendship effects and spatial effects on multiple health behaviours to be quantified. The statistical novelty in our approach is the fusion of network models (Browne et al., 2001) with spatial areal unit models (Leroux et al., 2000), and we provide software via an R package to allow others to implement our methods on their own data. The software in fact provides a suite of models that allow the three components (individual factors, friendship network effects and spatial effects) to be included or excluded, allowing a direct comparison of how well each of these factors explains adolescent health behaviours.
This work is organised as follows: Section 2 describes the study that motivates the methodological development, while Section 3 outlines the novel Bayesian spatial network models proposed. Section 4 presents the application of the models to the motivating study data, while Section 5 provides a concluding discussion.
2 MOTIVATING STUDY
The data in this study come from the Social Networking Survey (Valente et al., 2013), which is based in Los Angeles, California (see left panel of Figure 1). The study surveyed 1068 10th grade adolescents (15–16 year olds) from this area who belong to one of five schools in the region, with between 158 and 284 adolescents belonging to each school. Further details of the study are available from Valente et al. (2013).
2.1 Health behaviour data
The health behaviour data are binary (no/yes) self-reported measures of alcohol, cigarette and marijuana consumption for each 10th grade student in 2010. More specifically, they record whether an individual has: (i) consumed at least one drink of alcohol in the past 30 days; (ii) smoked at least one cigarette in the past 30 days; and (iii) ever tried marijuana. The numbers responding no/yes for each adverse health behaviour are summarised in Table 1, where the results are presented separately for each school. The table shows that over all the schools the highest prevalence was for drinking alcohol (29%), with marijuana use (27%) being the least common adverse health behaviour. There is greater variation in prevalence between schools than between behaviours, with school 2 having an average prevalence over all three responses of 37% compared with 19% for school 4. Finally, the table shows that there were no missing values in these adverse health behaviour responses.

Maps of California (left) and the spatial configuration of the Zip Codes (right). In the latter, the lines denote the neighbour relationships between two Zip Codes assumed when fitting the model.
| | School 1 | School 2 | School 3 | School 4 | School 5 | All schools |
|---|---|---|---|---|---|---|
| Sample size | n = 284 | n = 185 | n = 158 | n = 256 | n = 185 | n = 1068 |
| Responses | | | | | | |
| Alcohol consumption (no/yes) | 203 / 81 | 119 / 66 | 98 / 60 | 215 / 41 | 120 / 65 | 755 / 313 |
| Cigarette consumption (no/yes) | 208 / 76 | 115 / 70 | 109 / 49 | 197 / 59 | 133 / 52 | 762 / 306 |
| Marijuana consumption (no/yes) | 211 / 73 | 117 / 68 | 109 / 49 | 211 / 45 | 133 / 52 | 781 / 287 |
| Gender | | | | | | |
| Female | 157 | 84 | 84 | 124 | 108 | 557 |
| Male | 127 | 101 | 74 | 132 | 77 | 511 |
| Exam grades | | | | | | |
| Mostly A's | 48 | 17 | 10 | 38 | 17 | 130 |
| Mostly A's and B's | 68 | 35 | 42 | 77 | 53 | 275 |
| Mostly B's | 15 | 14 | 12 | 18 | 9 | 68 |
| Mostly B's and C's | 79 | 50 | 54 | 61 | 55 | 299 |
| Mostly C's or lower | 74 | 69 | 40 | 62 | 51 | 296 |
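The prevalence figures quoted in Section 2.1 can be re-derived from the 'yes' counts in Table 1. The following is a minimal sketch for checking that arithmetic; the variable and function names are illustrative and are not part of the study software.

```python
# 'Yes' counts per school (Schools 1-5) and school sizes, copied from Table 1.
school_sizes = [284, 185, 158, 256, 185]
yes_counts = {
    "alcohol": [81, 66, 60, 41, 65],
    "cigarettes": [76, 70, 49, 59, 52],
    "marijuana": [73, 68, 49, 45, 52],
}


def overall_prevalence(behaviour):
    """Proportion of all surveyed adolescents answering 'yes' to one behaviour."""
    return sum(yes_counts[behaviour]) / sum(school_sizes)


def school_average_prevalence(school_index):
    """Mean 'yes' proportion over the three behaviours for one school (0-based index)."""
    n = school_sizes[school_index]
    total_yes = sum(counts[school_index] for counts in yes_counts.values())
    return total_yes / (len(yes_counts) * n)


for behaviour in yes_counts:
    print(behaviour, round(overall_prevalence(behaviour), 2))
print("school 2:", round(school_average_prevalence(1), 2))
print("school 4:", round(school_average_prevalence(3), 2))
```

Rounded to two decimal places, these reproduce the 29%, 29% and 27% overall prevalences and the 37% and 19% school averages reported in the text.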
2.2 Covariate data
In addition to data on the school that each adolescent attends, the survey also provides data on possible individual-level factors that may affect the prevalence of each adverse health behaviour. Following an exploratory model building exercise using simple Bernoulli logistic regression models, two covariates exhibited significant effects for all three health behaviours and are hence included in our analysis. These covariates are summarised in Table 1, and the first is the gender of the individual which is a well-known factor that influences an individual's health behaviour (Institute of Medicine et al., 2001). The table shows there are more females (52%) than males (48%) in the survey, although the difference is not large. The second covariate we consider is an individual's educational attainment because existing studies (Escobedo & Peddicord, 1996; Hu et al., 2006) have shown this can affect adverse health behaviours. The survey contained a categorical variable denoting the modal grade achieved in each individual's previous year of exams, and the table shows that overall 56% of adolescents achieved mostly B's and C's or lower in their exams, while 12% obtained mostly A's.
2.3 Friendship network structure
The friendship structure between the adolescents is captured in the survey by two questions: (i) Please think of your seven best friends in 10th grade; and (ii) Are there other people in the 10th grade who you consider a close friend? For the latter question, each individual was allowed to nominate up to 12 friends, which when combined with the first question resulted in up to 19 nominations in total. The friendship network structure for adolescents in school 1 is displayed in Figure 2 as a directed graph, and similar figures for the remaining schools are not shown for brevity. In the figure, each vertex represents an individual, while a line from individual A to individual B denotes that A nominated B as a friend. Note this is a directed graph because if A nominates B as a friend it does not necessarily follow that B also nominates A. The three panels of Figure 2 relate to the three adverse health behaviours, and in each case, a red vertex is an individual who does partake in the adverse health behaviour while a blue vertex denotes somebody who does not. Visually the figure shows that blue vertices are predominantly connected to blue vertices, while red vertices are mainly connected to other red vertices. This appears to be the case for all three adverse health behaviours, and suggests that adolescents are more likely to partake in these behaviours if they are friends with somebody else who also partakes in the same behaviour.

The directed friendship network structure present in school 1. The individuals (vertices) are coloured by their alcohol (left), cigarette (middle) and marijuana (right) response—red for ‘yes’ and blue for ‘no’. A directed connection (line) between two vertices (individuals) denotes that one nominated the other as a friend.
To assess the presence of such friendship network effects more formally, the empirical probability of engaging in a specific health behaviour given that the individual has nominated at least one other individual that engages in that same health behaviour was computed. In the case of consuming at least one drink of alcohol in the past 30 days, these probabilities were 0.74, 0.62, 0.68, 0.60 and 0.8 across the five schools. In the case of smoking at least one cigarette in the past 30 days, these probabilities were 0.70, 0.57, 0.53, 0.68 and 0.79 across the five schools. Finally, in the case of ever trying marijuana, these probabilities were 0.73, 0.63, 0.65, 0.71 and 0.75 across the five schools. These conditional probabilities are much higher than the marginal probabilities of partaking in each adverse health behaviour, which across all five schools are: alcohol—0.29, cigarette—0.29 and marijuana—0.27. This suggests the presence of friendship (peer) network effects in the data, which we account for in our proposed model outlined in Section 3.
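The conditional probability described above can be computed directly from the nomination lists. The sketch below uses a small hypothetical network rather than the survey data, and the function name `conditional_prevalence` is an illustrative choice.

```python
def conditional_prevalence(nominations, behaviour):
    """Estimate P(engages | nominated at least one friend who engages).

    nominations: dict mapping each individual to the set of friends they nominated.
    behaviour: dict mapping each individual to 0 (no) or 1 (yes).
    """
    exposed = [
        person
        for person, friends in nominations.items()
        if any(behaviour[friend] == 1 for friend in friends)
    ]
    if not exposed:
        return float("nan")
    return sum(behaviour[person] for person in exposed) / len(exposed)


# Hypothetical toy data: five adolescents and their directed nominations.
nominations = {"a": {"b", "c"}, "b": {"a"}, "c": {"d"}, "d": set(), "e": {"b"}}
behaviour = {"a": 1, "b": 1, "c": 0, "d": 0, "e": 0}

marginal = sum(behaviour.values()) / len(behaviour)
conditional = conditional_prevalence(nominations, behaviour)
print(marginal, conditional)
```

In this toy example the conditional probability (2/3) exceeds the marginal probability (2/5), which is the same qualitative pattern reported for the five schools.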
2.4 Spatial zip code structure
The adolescents surveyed collectively reside in S = 33 non-overlapping administrative areas known as Zip Codes, which contain very unequal numbers of survey responders. For example, the two Zip Codes with the largest number of adolescents surveyed are El Monte (91732—378 individuals) and South El Monte (91733—271 individuals), while there are 19 instances in which only 1 adolescent surveyed resides in the Zip Code. The spatial configuration of the S = 33 Zip Codes is displayed in the right panel of Figure 1, which shows that while most of the Zip Codes are grouped together in the middle of the region, there are a small number of isolated Zip Codes that are not close to the remaining ones.
The spatial closeness between each pair of Zip Codes is encoded in the model described in the next section by a binary S × S neighbourhood matrix denoted $\mathbf{A} = (a_{ij})$, where the ijth element $a_{ij} = 1$ if Zip Codes (i, j) share a common border and $a_{ij} = 0$ otherwise (and $a_{ii} = 0$ for all i). This border sharing specification is the most commonly used neighbourhood matrix in spatial areal unit modelling (see, e.g. Bivand et al., 2013; Jack et al., 2019), because of its sparsity and simplicity of construction (e.g. it does not have a tuning parameter as the k-nearest neighbours rule does). However, Zip Codes that are isolated have no neighbours under this definition, which means the conditional autoregressive prior outlined in the next section for capturing the spatial correlation has improper full conditional distributions for these Zip Codes. Thus, we make the commonly used adjustment to A for each isolated Zip Code i to rectify this problem, which is to make it a neighbour of the Zip Code j that is geographically closest (e.g. set $a_{ij} = a_{ji} = 1$). The final neighbourhood structure assumed when fitting the model is displayed by the connecting lines in the right panel of Figure 1, which shows that under this specification the Zip Codes comprise a single connected graph.
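The adjustment for isolated Zip Codes can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: "geographically closest" is taken here to mean the smallest Euclidean distance between hypothetical centroid coordinates, and the function name `connect_isolated_units` is invented for the example.

```python
import numpy as np


def connect_isolated_units(adjacency, centroids):
    """Return a copy of a binary adjacency matrix in which every unit that has
    no neighbours is linked symmetrically to its closest unit (by centroid)."""
    A = np.array(adjacency, dtype=int)
    xy = np.asarray(centroids, dtype=float)
    for i in np.flatnonzero(A.sum(axis=1) == 0):
        dist = np.linalg.norm(xy - xy[i], axis=1)
        dist[i] = np.inf  # a unit is never its own neighbour
        j = int(np.argmin(dist))
        A[i, j] = A[j, i] = 1
    return A


# Hypothetical example: units 0-1-2 form a chain, unit 3 shares no border.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
centroids = [(0, 0), (1, 0), (2, 0), (5, 0)]
A_fixed = connect_isolated_units(adjacency, centroids)
print(A_fixed)
```

The rule guarantees that every unit has at least one neighbour. It does not by itself guarantee a single connected graph, so connectivity should still be checked, for example by inspecting a map such as the right panel of Figure 1.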
We assessed in an exploratory manner whether there are likely to be spatial effects in the data, that is, whether adolescents in different Zip Codes have differing propensities for partaking in adverse health behaviours. We did this by computing the empirical probability of engaging in a specific health behaviour given that the individual is from a certain Zip Code. However, as previously discussed, the distribution of individuals across the 33 Zip Codes is highly skewed, with a minimum, first quartile, median, third quartile and maximum of 1, 1, 1, 6 and 378 individuals, respectively. Thus, for a meaningful comparison, we only consider the five Zip Codes containing the most individuals, which are Temple City (91780, 31 individuals), El Monte (91731, 154 individuals), Rosemead (91770, 167 individuals), South El Monte (91733, 271 individuals) and El Monte (91732, 378 individuals). For these Zip Codes, the observed probabilities of consuming at least one drink of alcohol in the past 30 days are 0.16, 0.27, 0.15, 0.33 and 0.28, respectively. The corresponding probabilities of smoking at least one cigarette in the past 30 days are 0.13, 0.36, 0.17, 0.33 and 0.29, respectively, while for marijuana the probabilities are 0.16, 0.27, 0.15, 0.33 and 0.28. As these empirical probabilities show some variation by Zip Code, spatial effects are a plausible component to include in the model.
3 METHODOLOGY
We propose a novel spatio-network Bayesian hierarchical model for multiple adverse health behaviours, which simultaneously estimates the effects of covariate factors, friendship influence and spatial location.
3.1 Data likelihood model
Let $Y_{i_s r}$ denote the binary ($1$ - yes; $0$ - no) adverse health behaviour for the rth response for the ith individual who lives in the sth spatial unit, where r = 1, …, R (= 3), s = 1, …, S (= 33) and $i_s$ indexes the individuals living in spatial unit s. The probability that individual i from spatial unit s partakes in adverse health behaviour r is denoted by $\theta_{i_s r}$, which is modelled on the logit scale by three separate components. The first is a p × 1 vector of covariates $\mathbf{x}_{i_s}$, which is accompanied by a p × 1 vector of fixed effect regression parameters $\boldsymbol{\beta}_r$ that vary by health behaviour r. The prior for these fixed effect parameters is a zero-mean multivariate normal distribution whose covariance matrix is a large multiple of I, specified independently for each outcome r, where I is the p × p identity matrix. This specification is chosen to be weakly informative, thus allowing the parameter estimates to be largely informed by the data. The other two components in the systematic part of the model are a friendship network effect and a spatial effect, and these two different levels in the model are described below. The friendship network effect captures correlations between the three health behaviours and between individuals (friends), while the spatial effect captures correlations between neighbouring Zip Codes.
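In symbols, the three-component structure described above might be sketched as follows. The notation is an assumption made for illustration: $Y_{i_s r}$ is the binary response, $\theta_{i_s r}$ its probability, $\mathbf{x}_{i_s}$ the covariate vector and $\boldsymbol{\beta}_r$ the behaviour-specific regression parameters, while the network and spatial terms are left as placeholders for the components defined in Sections 3.2 and 3.3.

```latex
\begin{aligned}
Y_{i_s r} &\sim \text{Bernoulli}(\theta_{i_s r}), \\
\operatorname{logit}(\theta_{i_s r}) &= \mathbf{x}_{i_s}^{\top}\boldsymbol{\beta}_r
  + \text{(friendship network effect for individual } i_s \text{ and behaviour } r)
  + \text{(spatial effect for Zip Code } s \text{ and behaviour } r).
\end{aligned}
```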
3.2 Friendship network effects
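The friendship network enters the model through a weight matrix $\mathbf{W}$ whose element $w_{i_s j} = 1/|\text{net}(i_s)|$ if individual j belongs to $\text{net}(i_s)$, the set of friends nominated by individual $i_s$, and $w_{i_s j} = 0$ otherwise, where $|\text{net}(i_s)|$ is the cardinality of the set $\text{net}(i_s)$. Thus, the only non-zero entries in this matrix relate to friendships that one individual has with another individual. Note, this matrix is not necessarily symmetric because it represents a directed rather than an undirected graph. The values of the non-zero elements in this matrix are the reciprocal of the number of friends each individual nominates, which ensures that the matrix is row standardised (each row sums to 1).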
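A row-standardised weight matrix of this kind can be built from nomination lists as in the sketch below. The data are hypothetical and the function name `row_standardised_weights` is illustrative. An individual who nominates nobody is given a row of zeros here, which is a choice made for the example rather than a statement about how the model treats such individuals.

```python
import numpy as np


def row_standardised_weights(nominations, n_individuals):
    """Build W with W[i, j] = 1 / (number of friends i nominated) if i nominated j,
    and 0 otherwise."""
    W = np.zeros((n_individuals, n_individuals))
    for i, friends in nominations.items():
        friends = set(friends)  # ignore repeated nominations of the same friend
        for j in friends:
            W[i, j] = 1.0 / len(friends)
    return W


# Hypothetical directed nominations among four individuals.
nominations = {0: [1, 2], 1: [0], 2: [], 3: [0, 1, 2]}
W = row_standardised_weights(nominations, 4)
print(W)
print(W.sum(axis=1))
```

Every individual with at least one nomination has a row summing to 1, and W is not symmetric because individual 3 nominates others without being nominated in return.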
Between health behaviour correlation is allowed for in each of the two sets of friendship network random effects via the R × R covariance matrix Σ, which is assigned a weakly informative inverse Wishart prior distribution so that its estimation is mainly informed by the data. We assume for convenience that both sets of random effects share the same covariance matrix Σ.
3.3 Spatial effects
Section 2.4 suggests there may be spatial effects in the data, which we model using spatially correlated random effects. These random effects are assigned a conditional autoregressive (CAR) prior distribution, which is the most common approach to modelling spatial correlation in areal unit data (see for example Banerjee et al., 2004). As our adverse health behaviour response is multivariate, a multivariate CAR model could be adopted to model both spatial and between health behaviour correlations. Multivariate CAR type models are an active research area, and numerous different approaches have been proposed including Gelfand and Vounatsou (2003), Jin et al. (2007), Martinez-Beneito (2013) and MacNab (2016). However, in this paper, we model the correlations between the three adverse health behaviours via the friendship network component of the model as described above, and thus modelling these correlations a second time may cause parameter identifiability issues in the model. In fact, as we show in the next section, the friendship network effects are a much more important driver of adverse health behaviours than the spatial effect, so the between behaviour correlations are more prominently captured in that component of the model.
Therefore, instead we specify independent CAR models for each adverse health behaviour r, which are specified as a prior distribution for a vector of spatial random effects $\boldsymbol{\phi}_r = (\phi_{1r}, \ldots, \phi_{Sr})$ for the S = 33 Zip Codes. Spatial correlation is induced into these random effects through the spatial neighbourhood matrix A described in Section 2.4, which is defined by the commonly used border sharing rule. We note here that as the spatial correlation structure is defined by A, all inferences about this part of the model are conditional on the choice of A. Following a comparative study of different CAR priors by Lee (2011), we use the CAR prior proposed by Leroux et al. (2000) due to its consistent superior performance. This prior has the joint distribution $\boldsymbol{\phi}_r \sim \text{N}\left(\mathbf{0}, \tau_r^2\left[\rho_r(\text{diag}(\mathbf{A}\mathbf{1}) - \mathbf{A}) + (1 - \rho_r)\mathbf{I}\right]^{-1}\right)$ for each response r, where 1 is an S × 1 vector of ones, I is the S × S identity matrix and diag(A1) is a diagonal matrix with diagonal elements obtained by the matrix product A1. Thus the joint distribution of the spatial random effects for all three health behaviours is a zero-mean multivariate normal distribution, whose covariance matrix is block diagonal with three S × S blocks given by $\tau_r^2\left[\rho_r(\text{diag}(\mathbf{A}\mathbf{1}) - \mathbf{A}) + (1 - \rho_r)\mathbf{I}\right]^{-1}$ for r = 1, 2, 3.
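The covariance structure of the Leroux CAR prior can be explored numerically. The sketch below builds the matrix $\rho(\text{diag}(\mathbf{A}\mathbf{1}) - \mathbf{A}) + (1 - \rho)\mathbf{I}$ for a small hypothetical neighbourhood matrix; the symbols `rho` (spatial dependence) and `tau` (spatial standard deviation) and the function names are assumptions made for illustration, not the interface of the study software.

```python
import numpy as np


def leroux_precision(A, rho):
    """Leroux CAR precision matrix up to the factor 1 / tau^2:
    rho * (diag(A 1) - A) + (1 - rho) * I."""
    A = np.asarray(A, dtype=float)
    S = A.shape[0]
    return rho * (np.diag(A.sum(axis=1)) - A) + (1.0 - rho) * np.eye(S)


def leroux_covariance(A, rho, tau):
    """Covariance tau^2 * Q^{-1}; Q is only invertible for rho < 1."""
    return tau**2 * np.linalg.inv(leroux_precision(A, rho))


# Hypothetical chain of three areal units: 0 - 1 - 2.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])

print(leroux_precision(A, 0.0))        # identity: independent random effects
print(leroux_covariance(A, 0.5, 1.0))  # neighbouring units positively correlated
print(leroux_precision(A, 1.0))        # intrinsic CAR: rows sum to zero (singular)
```

Setting `rho = 0` gives independent effects, intermediate values give correlation that decays with distance in the graph, and `rho = 1` gives the singular intrinsic CAR precision.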
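The full conditional distribution implied by this prior is

$$\phi_{sr} \mid \boldsymbol{\phi}_{-sr} \sim \text{N}\left(\frac{\rho_r \sum_{j=1}^{S} a_{sj}\phi_{jr}}{\rho_r \sum_{j=1}^{S} a_{sj} + 1 - \rho_r},\; \frac{\tau_r^2}{\rho_r \sum_{j=1}^{S} a_{sj} + 1 - \rho_r}\right),$$

where $a_{sj}$ is the (s, j)th element of the neighbourhood matrix A. Here $\boldsymbol{\phi}_{-sr}$ denotes the vector of S − 1 spatial random effects for outcome r excluding $\phi_{sr}$. In this prior, the spatial dependence parameter $\rho_r$ is assigned a non-informative prior on the unit interval, and if $\rho_r = 1$, the model simplifies to the intrinsic CAR prior for strong spatial correlation proposed by Besag et al. (1991) because the conditional expectation is the mean of the random effects in neighbouring areas. In contrast, if $\rho_r = 0$, it is trivial to see that the random effects are independent. Finally, a weakly informative (large variance) half normal prior centred on zero is specified for the spatial standard deviation $\tau_r$ as suggested by Gelman (2006), which again lets the data play the dominant role in the estimation of its value.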
3.4 Inference and software
Posterior inference from the model was obtained using Markov chain Monte Carlo (MCMC) simulation, including both Gibbs sampling and Metropolis-Hastings steps. Software to implement the MCMC algorithm was written in C++ using the R package Rcpp (Eddelbuettel & François, 2011), and is then made user friendly for future users by providing an R (R Core Team, 2013) wrapper function. The software is freely available to download from https://github.com/GNG3/modelSoftwareMVBSN.
The model described above, as well as simplifications of it, are fitted to the study data outlined in Section 2, and the results are presented in the next section. In all cases inference is based on two parallel Markov chains, each run for 400,000 iterations. The first 200,000 iterations were discarded as burn-in, and the remaining 200,000 were thinned by 20, both to respect RAM limitations and to reduce the autocorrelation of the retained samples. Convergence of the Markov chains was assessed by examining trace plots of the posterior samples for a selection of parameters, the Geweke diagnostic (Geweke, 1992) and the Gelman–Rubin statistic (Gelman & Rubin, 1992), and in all cases, the samples appear to have converged. Therefore, final inference is based on 20,000 MCMC samples for each parameter, 10,000 from each chain.
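The sample-size bookkeeping above, together with a basic version of the Gelman–Rubin statistic, can be sketched as follows. The chains here are hypothetical independent normal draws standing in for real MCMC output, and `gelman_rubin` implements the simple (non-split) potential scale reduction factor rather than whatever variant the authors' software may use.

```python
import numpy as np


def retained_draws(n_iterations, n_burn_in, thin):
    """Number of draws kept per chain after discarding burn-in and thinning."""
    return (n_iterations - n_burn_in) // thin


def gelman_rubin(chains):
    """Basic potential scale reduction factor for an (m chains x n draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    pooled = (n - 1) / n * within + between / n     # pooled variance estimate
    return np.sqrt(pooled / within)


per_chain = retained_draws(400_000, 200_000, 20)
total = 2 * per_chain

rng = np.random.default_rng(0)
toy_chains = rng.normal(size=(2, per_chain))  # two hypothetical well-mixed chains
print(per_chain, total, gelman_rubin(toy_chains))
```

Values of the statistic close to 1 are consistent with convergence, whereas chains exploring different regions of the parameter space give values well above 1.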
4 RESULTS FROM THE STUDY
We fit eight different models to the survey data, which allows us to examine the relative importance of covariate effects, friendship network effects and spatial effects in explaining an adolescent's propensity to partake in adverse health behaviours. These eight models contain all possible combinations of the three different model components, ranging from the null model, which contains only an intercept term, through to the full model given by Equation (1). A summary of the components included in each model is given in Table 2 for ease of reference. The covariates used in these models include the categorical variables gender, exam grades and school, which are summarised in Section 2.2.
4.1 Model comparison
The overall fit of each model is summarised in Table 2, which displays the deviance information criterion (DIC, Spiegelhalter et al., 2002) and the effective number of independent parameters. A comparison of the single component models (covariates only, space only and network only) to the null (intercept only) model shows that the inclusion of the friendship network component leads to the greatest reduction in DIC, as the DIC goes from 3821 to 2904, a reduction of 917. The sole inclusion of covariates has about 40% of this impact with a DIC reduction of 358, while the sole inclusion of a spatial component leads to a DIC reduction of just 23. Adding the covariates, and then additionally the spatial component, to the friendship network model improves the fit to the data but only marginally, with the DIC reductions compared to the network only model being only 13 and 21, respectively. Thus, while the full model with all three components has the lowest DIC value, the impact of adding the covariates and the spatial component is small once the friendship network effects are included.
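| Model | Covariates | Space | Network | DIC | Effective number of independent parameters |
|---|---|---|---|---|---|
| Null (intercept only) | – | – | – | 3821 | 3.0 |
| Covariates only | ✓ | – | – | 3463 | 30.1 |
| Space only | – | ✓ | – | 3798 | 22.1 |
| Network only | – | – | ✓ | 2904 | 427.5 |
| Covariates and space | ✓ | ✓ | – | 3463 | 29.9 |
| Covariates and network | ✓ | – | ✓ | 2891 | 408.6 |
| Space and network | – | ✓ | ✓ | 2907 | 417.7 |
| Full (all three components) | ✓ | ✓ | ✓ | 2883 | 413.1 |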
The results in Table 2 show that the effective number of independent parameters went down for the full model compared to the network only model and the space and network model, despite the full model being the most complex in terms of its parameterisation. The reason for this is that in the full model the variation in the data is jointly modelled by all three components, whereas in the network only model, for example, only the network component is included. Thus, in the full model, the network component has to model less of the variation in the data than in the network only model, because the covariates and, to a much lesser degree, the spatial effect model some of this variation. This results in a reduction in the effective number of independent parameters for the network component in the full model compared to the network only model, due to a reduction in the variation in the network random effects, which in turn causes the reduced effective number of independent parameters overall. The remainder of this section presents the results relating to the full model, so that the effects of all three components can be observed.
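The DIC reductions quoted in Section 4.1 follow from simple differences between rows of Table 2. The sketch below reproduces them; keying each model by the set of components it includes is a convenience chosen for this example.

```python
# DIC values copied from Table 2, keyed by the components each model includes.
dic = {
    frozenset(): 3821,
    frozenset({"covariates"}): 3463,
    frozenset({"space"}): 3798,
    frozenset({"network"}): 2904,
    frozenset({"covariates", "space"}): 3463,
    frozenset({"covariates", "network"}): 2891,
    frozenset({"space", "network"}): 2907,
    frozenset({"covariates", "space", "network"}): 2883,
}


def dic_reduction(reference, model):
    """Drop in DIC when moving from `reference` to `model` (positive = better fit)."""
    return dic[frozenset(reference)] - dic[frozenset(model)]


for component in ("network", "covariates", "space"):
    print(component, "vs null:", dic_reduction([], [component]))
print("covariates + network vs network:",
      dic_reduction(["network"], ["covariates", "network"]))
print("full vs network:",
      dic_reduction(["network"], ["covariates", "space", "network"]))
best = min(dic, key=dic.get)
print("lowest DIC:", sorted(best))
```

Lower DIC indicates a better trade-off between fit and complexity, so the friendship network component accounts for most of the improvement over the null model.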
4.2 Model fit
In order to confirm that the model fits the data adequately, we simulate 1,000 trivariate samples from the posterior predictive distribution , where y denotes the observed data. As both are binary this posterior predictive check involves computing the probability that the observed data matches the simulated data generated from the posterior predictive distribution. Averaging over all individuals i, spatial units s, health behaviour r and posterior predictive samples j, the posterior predictive probability , suggesting that the model fits the data relatively well as it generates simulated data that are similar to the real data. The corresponding health behaviour-specific values are , and , suggesting that the model fits the marijuana response slightly better than the other two.
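A posterior predictive check of this kind reduces to counting how often replicated binary outcomes agree with the observed ones. The sketch below assumes the replicated data are already available as an array (here a tiny hypothetical one); drawing them from the fitted model is not shown, and the function name `predictive_agreement` is illustrative.

```python
import numpy as np


def predictive_agreement(y_observed, y_replicated):
    """Proportion of entries where replicated binary data equal the observed data.

    y_observed: array of shape (n_individuals, n_behaviours) with 0/1 entries.
    y_replicated: array of shape (n_samples, n_individuals, n_behaviours).
    Returns the overall agreement and one value per behaviour.
    """
    y_observed = np.asarray(y_observed)
    matches = np.asarray(y_replicated) == y_observed  # broadcasts over samples
    return matches.mean(), matches.mean(axis=(0, 1))


# Hypothetical data: 2 individuals, 3 behaviours, 2 predictive samples.
y_obs = np.array([[1, 0, 0], [0, 0, 1]])
y_rep = np.array([
    [[1, 0, 0], [0, 0, 1]],  # sample 1 matches the observed data exactly
    [[1, 1, 0], [0, 0, 0]],  # sample 2 differs in two entries
])
overall, per_behaviour = predictive_agreement(y_obs, y_rep)
print(overall, per_behaviour)
```

The overall value averages over individuals, behaviours and samples, while the per-behaviour values correspond to the health behaviour-specific summaries described in the text.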
Additionally, Table 3 provides the posterior means and 95% credible intervals for the between health behaviour correlations from the friendship network component of the model, which allows us to examine the appropriateness of modelling all three health behaviours jointly. These correlations are captured in Σ, and for example, the correlation between alcohol and cigarettes is computed by . The table shows that the correlations are very high and close to one for each pair of adverse health behaviours, with posterior means ranging between 0.955 (alcohol and marijuana) and 0.975 (cigarettes and marijuana). These strong correlations thus support the use of a joint modelling approach for our adverse health behaviours. Finally, the posterior samples of these network correlation parameters yield , suggesting that the correlation between the cigarette and marijuana responses is likely to be greater than both the correlations between the alcohol and cigarette responses and the alcohol and marijuana responses.
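Each correlation is obtained from posterior draws of Σ as $\Sigma_{ab} / \sqrt{\Sigma_{aa}\Sigma_{bb}}$ for behaviours a and b. The following sketch summarises such draws; it uses hypothetical covariance matrices rather than the fitted model output, and the function names and the index order (0 = alcohol, 1 = cigarettes, 2 = marijuana) are assumptions made for illustration.

```python
import numpy as np


def correlation_draws(sigma_samples, a, b):
    """Correlation between behaviours a and b for each posterior draw of Sigma.

    sigma_samples: array of shape (n_samples, R, R).
    """
    s = np.asarray(sigma_samples, dtype=float)
    return s[:, a, b] / np.sqrt(s[:, a, a] * s[:, b, b])


def correlation_summary(sigma_samples, a, b):
    """Posterior mean and 95% credible interval of the correlation."""
    corr = correlation_draws(sigma_samples, a, b)
    lower, upper = np.quantile(corr, [0.025, 0.975])
    return corr.mean(), lower, upper


def prob_largest(sigma_samples, pair, others):
    """Posterior probability that the correlation for `pair` exceeds all `others`."""
    target = correlation_draws(sigma_samples, *pair)
    exceeds = [target > correlation_draws(sigma_samples, *o) for o in others]
    return np.mean(np.all(exceeds, axis=0))


# Hypothetical draws: the same covariance matrix repeated four times.
sigma = np.array([[4.0, 2.0, 1.0], [2.0, 4.0, 3.0], [1.0, 3.0, 4.0]])
draws = np.repeat(sigma[None, :, :], 4, axis=0)
print(correlation_summary(draws, 0, 1))
print(prob_largest(draws, (1, 2), [(0, 1), (0, 2)]))
```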
Estimates and 95% credible intervals for the between health behaviour correlations
| Adverse health behaviours | Estimated correlation |
|---|---|
| Alcohol and cigarettes | 0.956 (0.808, 0.996) |
| Alcohol and marijuana | 0.955 (0.792, 0.997) |
| Cigarettes and marijuana | 0.975 (0.890, 0.998) |
4.3 Covariate effects
Table 4 displays the estimated covariate effects (posterior means) and 95% credible intervals for each adverse health behaviour, with all results presented as odds ratios relative to the baseline level of the factor (the first level listed for each factor in the table, denoted by a ‘–’). The table shows that, in comparison to females (the baseline level), males had significantly reduced odds of consuming alcohol in the past 30 days, with an estimated odds ratio of 0.57. In contrast, the 95% credible intervals for the male covariate relating to the cigarette and marijuana responses both contain 1, and thus show no statistically significant gender effect.
Table 4. Summary of the covariate effects as odds ratios and selected other parameters from model
Parameter | Alcohol | Cigarettes | Marijuana |
---|---|---|---|
Covariates | |||
Female | – | – | – |
Male | 0.57 (0.39, 0.83) | 1.31 (0.86, 1.99) | 0.91 (0.56, 1.46) |
A's | – | – | – |
A's and B's | 2.49 (1.10, 6.12) | 1.41 (0.61, 3.34) | 2.48 (0.86, 7.82) |
B's | 2.66 (0.91, 7.85) | 0.68 (0.19, 2.21) | 2.35 (0.59, 9.55) |
B's and C's | 4.16 (1.86, 10.01) | 2.60 (1.16, 6.07) | 5.73 (2.09, 17.04) |
C's or lower | 8.05 (3.61, 19.74) | 5.46 (2.47, 12.46) | 14.66 (5.47, 45.05) |
School 1 | – | – | – |
School 2 | 0.96 (0.35, 2.41) | 1.12 (0.35, 3.38) | 1.07 (0.28, 3.79) |
School 3 | 1.42 (0.48, 3.85) | 1.02 (0.27, 3.33) | 1.09 (0.24, 4.40) |
School 4 | 0.38 (0.12, 1.05) | 0.78 (0.20, 2.76) | 0.47 (0.10, 2.05) |
School 5 | 1.07 (0.29, 3.56) | 0.79 (0.15, 3.49) | 0.80 (0.11, 4.79) |
Space | |||
Standard deviation | 0 (0, ) | 0 (0, ) | 0 (0, ) |
Dependence | 0.419 (0.02, 0.92) | 0.415 (0.02, 0.92) | 0.417 (0.02, 0.92) |
Network | |||
Variance | 6.21 (3.29, 11.01) | 10.32 (5.49, 19.59) | 14.92 (7.12, 27.76) |
In contrast, the effect of exam performance is much more consistent than that of gender, with decreasing exam performance being associated with higher odds of partaking in each adverse health behaviour. Here the baseline level is mostly A's, and the estimated odds ratios for the lower grade categories exceed 1 in all but one case (B's for cigarettes). The effects are significant for all three behaviours in the two lowest grade categories. For example, adolescents who scored mostly C's or below have significant odds ratios of 8.05 (alcohol), 5.46 (cigarettes) and 14.66 (marijuana), when compared to the baseline mostly A's category. Finally, the table shows that after accounting for all the other components in the model, there are no statistically significant school effects for schools 2, 3, 4 and 5 when compared to the reference level school 1, with all 95% credible intervals across the three responses containing the null odds ratio of 1.
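The step from posterior samples of a regression coefficient to an odds ratio with a 95% credible interval can be illustrated with a short sketch. This is a hedged Python example, not the authors' code: it assumes the samples are log-odds coefficients held in a plain list, uses a simple linear-interpolation quantile, and summarises the exponentiated draws by their mean (whether the paper averages before or after exponentiating is not stated here). All names and toy values are hypothetical.

```python
import math


def empirical_quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def odds_ratio_summary(log_odds_draws, level=0.95):
    """Posterior mean and equal-tailed credible interval on the odds ratio scale."""
    odds_ratios = sorted(math.exp(b) for b in log_odds_draws)
    tail = (1 - level) / 2
    lower = empirical_quantile(odds_ratios, tail)
    upper = empirical_quantile(odds_ratios, 1 - tail)
    return {
        "mean": sum(odds_ratios) / len(odds_ratios),
        "lower": lower,
        "upper": upper,
        # An interval that excludes 1 is what the text calls significant.
        "excludes_one": lower > 1 or upper < 1,
    }


# Made-up log-odds draws for illustration only.
toy_draws = [math.log(x) for x in [0.4, 0.5, 0.6, 0.7, 0.8]]
summary = odds_ratio_summary(toy_draws)
```

With real MCMC output the list would contain thousands of draws, and the same function could be applied to each covariate and behaviour in turn.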
4.4 Peer network effects
Table 4 provides the posterior means and 95% credible intervals for the variances relating to the network random effects in the model, which quantify the variation among the individual friendship network effects. The posterior means for alcohol, cigarettes and marijuana are respectively 6.21, 10.32 and 14.92, suggesting that the greatest level of variation is for marijuana. This finding is confirmed by the posterior probability that , suggesting a very clear size ordering among these variances with the variance relating to the marijuana response being the largest.
The isolation random effects for not nominating a friend are denoted by , and , and on the odds ratio scale their estimates and 95% credible intervals are given by: alcohol: 2.31 (1.34, 4.06); cigarettes: 3.32 (1.88, 6.11); and marijuana: 4.57 (2.39, 9.39). All these estimates and 95% credible intervals are greater than one, suggesting that being isolated from others (i.e. not nominating a friend) increases the likelihood of drinking alcohol, smoking cigarettes and having used marijuana. The posterior mean marijuana isolation effect is the largest of the three but not significantly so, as all three 95% credible intervals overlap. Nevertheless, there is relatively strong evidence of a clear size ordering in these isolation effects, because the model produced the following posterior probabilities: , and . Thus, it appears that isolation has the largest effect on the marijuana response.
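Posterior probabilities of a size ordering, such as those referred to above, are typically estimated as the share of joint MCMC draws in which the ordering holds. The sketch below is an assumption about how such a quantity could be computed in Python, not a description of the authors' code; the function name and the toy draws are hypothetical.

```python
def prob_strict_ordering(draws_a, draws_b, draws_c):
    """Share of joint posterior draws in which a < b < c holds."""
    n = len(draws_a)
    if n == 0 or len(draws_b) != n or len(draws_c) != n:
        raise ValueError("need three non-empty draw lists of equal length")
    hits = sum(1 for a, b, c in zip(draws_a, draws_b, draws_c) if a < b < c)
    return hits / n


# Made-up draws; position k in each list comes from the same MCMC iteration.
alcohol = [0.8, 0.9, 1.0, 0.7]
cigarettes = [1.2, 0.85, 1.3, 1.1]
marijuana = [1.5, 1.4, 1.2, 1.6]

p_ordered = prob_strict_ordering(alcohol, cigarettes, marijuana)
```

Because the comparison is made within each iteration, the estimate uses the joint posterior. This is why an ordering can be well supported even when the marginal 95% credible intervals overlap.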
Figure 3 displays the 95% credible intervals for the individual friendship random effects and the isolation effect relating to each response, namely (alcohol), (cigarettes) and (marijuana). The effects are ordered by posterior mean on the horizontal axis, and those in black are not significantly different from zero at the 5% level. The intervals in green are significantly different from zero at the 5% level and contain only negative values. There was only one case of this for each of the three responses, all attributable to the same individual. Thus, holding everything else equal, having nominated this individual as a friend was observed to decrease the likelihood of drinking alcohol, smoking cigarettes and having used marijuana. The intervals in red are significantly different from zero at the 5% level and contain only positive values. There were 38, 41 and 43 instances of this relating to the alcohol, cigarette and marijuana responses, respectively, and nominating these individuals as friends increases one's propensity to drink alcohol, smoke cigarettes and use marijuana.

Figure 3. The 95% credible intervals for the network effects relating to each response, (alcohol, top), (cigarettes, middle) and (marijuana, bottom). These effects are ordered by size. Non-significant effects, whose 95% credible intervals contain 0, are shown in black; effects whose intervals lie entirely below 0 are shown in green; and effects whose intervals lie entirely above 0 are shown in red.
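The colour coding used in the figure amounts to a simple rule on the interval end points. A minimal Python sketch of that rule is given below; the function name and the toy intervals are hypothetical, and treating an end point exactly equal to 0 as containing 0 is an assumption.

```python
from collections import Counter


def classify_interval(lower, upper):
    """Colour for a 95% credible interval, following the figure's convention."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    if upper < 0:
        return "green"  # interval lies entirely below zero
    if lower > 0:
        return "red"    # interval lies entirely above zero
    return "black"      # interval contains zero


# Made-up intervals for illustration only.
toy_intervals = [(-2.1, -0.3), (-0.8, 1.4), (0.2, 3.0), (0.5, 2.2), (-1.0, 0.0)]
colours = [classify_interval(lo, hi) for lo, hi in toy_intervals]
counts = Counter(colours)
```

Counting the "red" and "green" labels per response would give tallies of the kind reported in the text.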
4.5 Spatial effects
The spatial standard deviation () and dependence () parameters are displayed in Table 4. The table shows that the spatial effect is essentially non-existent, as the posterior means for the conditional standard deviations are almost zero for all three adverse health behaviours, and the upper limit of the 95% credible interval is also very close to zero. This is further emphasised by the posterior means for the sets of spatial random effects , which range between and . This lack of a spatial effect after covariate and friendship network effects have been accounted for confirms the overall model fit results from Table 2, which shows that the DIC values for models and are almost identical.
5 DISCUSSION
This paper has proposed a novel spatio-network model for binary multiple health behaviour data, which jointly captures the potential effects of covariate factors, spatial location, friendship network effects and within-individual correlations between outcomes. The model is fitted in a Bayesian paradigm using MCMC simulation, and software in the form of R code is available through GitHub at https://github.com/GNG3/modelSoftwareMVBSN for the purpose of reproducible research. The main advantage of our model over existing alternatives is its flexibility in being able to capture this wide range of drivers of adolescent health behaviours, whereas existing models only account for a subset of them. This has allowed us to examine which of these drivers are the most important for explaining an adolescent's propensity to drink alcohol or smoke cigarettes or marijuana, and in our California-based study we have obtained a number of interesting findings.
Our main finding is that peer effects play a large role in determining whether adolescents partake in adverse health behaviours, as their addition to the null model leads to the greatest reduction in DIC when compared to just adding either the covariates or spatial component. Furthermore, once friendship effects are included in the model, there is only a small improvement in model fit when incorporating the covariates and/or spatial component. In future work, it would be interesting to see whether the relative importance of the three components is mirrored in terms of their out-of-sample predictive ability, for example using a cross-validation type approach. For the covariates, the only consistently significant effect on the participation in adverse health behaviours was school exam performance, with students having poorer exam results observed to be more likely to partake in these behaviours. In contrast, the spatial effect was essentially non-existent.
Our second main finding is that the effect that a friend has on an adolescent is strongly correlated across the three binary responses, with estimated pairwise correlations ranging between 0.955 and 0.975. These correlations support the notion of co-occurrence of risky behaviours in adolescents found by Hale and Viner (2016). Among the three pairs of jointly modelled friendship random effects, the results show that the pair relating to the cigarette and marijuana responses is the most correlated, although the correlations for the remaining pairs are only slightly smaller.
There are a number of avenues of future work that naturally result from this paper. The first of these is the development of a comprehensive R package that allows users to fit a range of univariate and multivariate spatio-network models, which will greatly increase the usability and hence influence of the general model class proposed here, allowing social scientists and other researchers to apply these models to their own data. Second, as the focus of the present study was binary health behaviours, the model developed was based on a binary logistic regression structure. However, spatio-network data are not necessarily binary, and we thus intend to extend the model class to include continuous and count based responses, with exam grade point averages being a natural example of a continuous response that may exhibit peer and spatial effects. A further avenue of future work is to extend the spatial model if, unlike the present study, the data suggest that space has a sizeable effect on the outcome being modelled. Such extensions could be to examine the effects of changing the specification of the neighbourhood matrix to see what impact this has on the results, as well as allowing for between outcome correlations via a multivariate CAR type model. Finally, a spatio-network interaction involving the sets of spatial and friendship network random effects could be explored, as it may be of interest to study whether friendship effects differ depending on the Zip Code in which an individual lives.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the helpful comments of the editorial team and two reviewers, which have improved the content of and motivation for this work. This publication was supported by the University of Glasgow's Lord Kelvin/Adam Smith (LKAS) PhD Scholarship.
REFERENCES
Author notes
Funding information: University of Glasgow's Lord Kelvin/Adam Smith (LKAS) PhD Scholarship