Multivariate Bayesian spatio-temporal P-spline models to analyze crimes against women

Summary Univariate spatio-temporal models for areal count data have received great attention in recent years for estimating risks. However, models for studying multivariate responses are less commonly used mainly due to the computational burden. In this article, multivariate spatio-temporal P-spline models are proposed to study different forms of violence against women. Modeling distinct crimes jointly improves the precision of estimates over univariate models and allows to compute correlations among them. The correlation between the spatial and the temporal patterns may suggest connections among the different crimes that will certainly benefit a thorough comprehension of this problem that affects millions of women around the world. The models are fitted using integrated nested Laplace approximations and are used to analyze four distinct crimes against women at district level in the Indian state of Maharashtra during the period 2001–2013.


INTRODUCTION
The statistical toolkit for the analysis of spatial and spatio-temporal areal count data has been enriched during recent years with relevant advances in model proposals, algorithms for inference, and the realm of applications (see e.g., Lawson and others, 2016;Martínez-Beneito and Botella-Rocamora, 2019). One of the goals of these techniques is smoothing standardized incidence/mortality ratios or crude rates to uncover geographical patterns and temporal trends of the phenomenon under study. These techniques have experienced a great success to study incidence and mortality of different diseases (mainly cancer), and very recently, they have proven their utility to study crimes against women (see Vicente and others, 2020a).
where E itj denotes the expected number of cases computed using indirect standardization as E itj = n it m j . Here, n it is the female population at risk in area i and time t, and m j is the overall rate of crime j in the total area of study (state of Maharashtra in this article) and the whole period. In this article, we propose P-spline models to smooth the spatial and temporal patterns. That is where α j is the intercept for the jth crime, x 1i and x 2i are the longitude and latitude of the centroid of the ith area; x 3t indicates the time point t, and f j (x 1i , x 2i ) and f j (x 3t ) are a smooth spatial surface and a smooth temporal trend respectively for the jth crime that are well approximated using P-splines. Finally, δ itj is the spatio-temporal interaction term for crime j. The smooth surface for each crime is specified as f j (x 1 , x 2 ) = B s ψ (j) , where f j (x 1 , x 2 ) = (f j (x 11 , x 21 ), . . . , f j (x 1I , x 2I )) , and B s = B 2 B 1 is a two-dimensional B-spline basis of dimension I × k 1 k 2 arising from the row-wise Kronecker product, denoted by , of the marginal B-spline bases for longitude, B 1 , and latitude B 2 (see Eilers and others, 2006). More precisely, the row-wise Kronecker product is defined as B 2 B 1 = (B 2 ⊗ 1 k 1 ) (1 k 2 ⊗ B 1 ), where 1 k 1 and 1 k 2 are column vectors of ones of lengths k 1 and k 2 , respectively; k 1 and k 2 are the number of columns of B 1 and B 2 , respectively, and the symbol is the element-wise matrix product. Note that k 1 and k 2 are the number of columns of the marginal bases B 1 and B 2 , respectively, and depend on the number of knots and the degree of the polynomials used to construct these bases. Finally, the vectors ψ (j) = (ψ j 1 , . . . , ψ j k 1 k 2 ) , j = 1, . . . , J denote the set of coefficients of the spatial B-splines for each crime. To achieve smoothness, the following prior distribution with Gaussian Kernel is marginally considered for the coefficients of the two-dimensional B-splines p(ψ (j) |λ 1 , λ 2 ) ∝ exp − 1 2 ψ (j) P s ψ (j) . (2.2) Here, P s = λ 1 (I k 2 ⊗ P 1 ) + λ 2 (P 2 ⊗ I k 1 ), P 1 = D 1 D 1 , P 2 = D 2 D 2 , I k 1 and I k 2 are k 1 × k 1 and k 2 × k 2 identity matrices, λ 1 and λ 2 control the amount of smoothing in longitude and latitude, and D 1 and D 2 are difference matrices (of order 1 or 2) of dimension k 1 ×k 1 and k 2 ×k 2 , respectively. An in-depth explanation of this penalty can be found in Goicoa and others (2016). Similarly, the smooth temporal trend for each crime is specified as f j (x 3 ) = B 3 γ (j) , where now f j (x 3 ) = (f j (x 31 ), . . . , f j (x 3T )) , B 3 is the temporal B-spline basis of dimension T × k 3 , and k 3 depends on the number of knots and the degree of the polynomials in the basis. The vectors γ (j) = (γ j 1 , . . . , γ j k 3 ) , j = 1, . . . , J represent the coefficients of the temporal B-spline basis for each crime. Here, the following prior distribution with Gaussian kernel is marginally considered where P 3 = λ 3 D 3 D 3 , λ 3 controls the temporal smoothing and D 3 is a difference matrix (of order 1 or 2) of dimension k 3 × k 3 . According to Lang and Brezger (2004), we would like to emphasize that the prior distributions in (2.2) and (2.3) are the Bayesian analogues to the difference penalty on the coefficients proposed by Eilers and Marx (1996) to achieve smoothness. This is clear as the matrices P l = D l D l , l = 1, 2, 3 are the structure matrices of a first (RW1) or second (RW2) order random walk (see Rue and Held, 2005, pp. 95 and 110). Note that (2.2) is the combination of random walk priors on the B-spline coefficients of longitude and latitude (see Belitz and Lang, 2008). For simplicity, we will refer to RW1 or RW2 prior distributions on the coefficients of the spatial and temporal B-splines.
To take account of the potential relationships between the different crimes, correlations among the P-spline coefficients (spatial or temporal) of the different crimes are introduced in the model. Then, the following J × J covariance matrices between the sets of spatial and temporal P-splines coefficients are assumed Cov(ψ (j) , ψ (k) )) j,k=1,...,J ,γ = (Cov(γ (j) , γ (k) )) j,k=1,...,J .
The diagonal elements of ψ are ( ψ ) jj = σ 2 ψj , j = 1, . . . , J , and the off-diagonal elements are given by ( ψ ) jk = ρ ψjk σ ψj σ ψk , where ρ ψjk is the correlation between the coefficients of the spatial P-splines for crimes j and k. Similarly, the diagonal elements of γ are ( γ ) jj = σ 2 γ j , j = 1, . . . , J , and the off-diagonal elements are expressed as ( γ ) jk = ρ γ jk σ γ j σ γ k , where now, ρ γ jk represents the correlation between the coefficients of the temporal P-splines for crimes j and k. Finally, IT ) is the spatio-temporal interaction for the jth crime, j = 1 . . . , J . For the interaction term, the following prior distribution is assumed where M j is a precision matrix that can take different forms according to the four types of interactions defined by Knorr-Held (2000). For the Type I interaction, M j = τ j I IT , where I IT is the IT × IT identity matrix, and all the elements of the interaction are independent. In Type II interaction, where I I is the I × I identity matrix, and Q T is the precision matrix of a random walk of first or second order (see Rue and Held, 2005, pp. 95 and 110). This means that the elements of δ (j) are structured in time but not in space. In Type III interactions, the elements of δ (j) are structured in space but not in time.
Then, M j = τ j (I T ⊗ Q I ), where now I T is the T × T identity matrix, and Q I is the spatial neighborhood matrix defined as Q I (im) = −1 if areas i and m are neighbors and 0 otherwise, and the ith diagonal element Q I (ii) is the number of neighbors of the ith area. Finally, in Type IV interaction, the elements of δ (j) are structured in space and time, and M j = τ j (Q T ⊗ Q I ). Note that unlike the main spatial and temporal terms, the spatio-temporal interaction is not modelled with P-splines but with conditional autoregressive (CAR) priors. The interaction effect can be seen as a residual term which generally captures a small portion of variability, and hence using three-dimensional P-splines for this term could unnecessarily complicate the model. Similarly, and to simplify the proposal, we do not consider any covariance estructure between the interaction terms of the different crimes, that is (Cov(δ (j) , δ (k) )) j,k=1,...,J = I J , where I J is a J × J identity matrix. Our approach is somehow based on the multivariate proposal of Aitchison and Ho (1989) to capture overdispersion in multivariate count data, though these authors do not model spatio-temporal data.
For notation purposes and to incorporate the dependencies between different crimes in the model, we stack the set of spatial and temporal P-splines' coefficients for each crime one under the other as = (ψ (1) , . . . , ψ (J ) ) and = (γ (1) , . . . , γ (J ) ) . Similarly, the interaction terms are rearranged one under the other as = (δ (1) , . . . , δ (J ) ) . Using this notation, our multivariate P-spline model (2.1) can be expressed in matrix form as where α = (α 1 , . . . α J ) , R = (R 1 , . . . , R J ) , R j = (R 11,j , . . . , R I 1,j , . . . , R 1T ,j , . . . , R IT ,j ), j = 1, . . . , J , and 1 I and 1 T are column vectors of ones of length I and T , respectively. Then, once the between-crime dependencies are incorporated into the model, the resulting prior distributions with Gaussian kernel for the coefficients of the spatial P-splines , the coefficients of the temporal P-splines , and the interaction terms are (2.5) Priors (2.5) have two goals: on one hand the matrices P s and P 3 are introduced to produce smooth estimates of the spatial patterns and the temporal trends, respectively, and on the other hand the matrices ψ and γ cope with the between-crime dependencies. Similarly, the within-crimes remaining spatio-temporal variability is captured through matrices M j , and no between-crime covariance structure is considered for this term. The priors (2.5) depend on a set of hyperparameters given by the elements of ψ and λ 1 and λ 2 (for ), the elements of γ and λ 3 (for ), and τ 1 , . . . , τ J (for ). It is worth remarking that the λ 3 parameter in the precision matrix of the temporal P-splines is not identifiable as it can be subsumed in the matrix −1 γ (or σ 2 3 = 1/λ 3 is subsumed in γ ). Similarly, only the ratio λ 1 /λ 2 (or λ 2 /λ 1 ) is identifiable as one of such parameters can be subsumed in the matrix matrix −1 ψ . This is clear if one express the precision matrix of the spatial P-splines as P s = λ 2 λ 1 λ 2 (I k 2 ⊗ P 1 ) + (P 2 ⊗ I k 1 ) . Consequently, we set λ 2 and λ 3 equal to 1.

MODEL FITTING, HYPERPRIOR DISTRIBUTIONS, AND IDENTIFIABILITY ISSUES
In this article, the models are fitted using INLA (Rue and others, 2009) with the R-INLA package (Lindgren and Rue, 2015). INLA has become very popular as it avoids long computing time in comparison to MCMC techniques. INLA relies on integrated nested Laplace approximations and numerical integration for approximate Bayesian inference. It is particularly designed for latent Gaussian models where the latent field is Gaussian, the response is non-Gaussian, and the posterior marginals cannot be obtained in closed form. One of the main advantages of INLA is that many of the models in the literature are already included in the package R-INLA, and others can be implemented by means of generic functions with some extra-programming work. Our multivariate proposal based on P-splines is not directly available in R-INLA and here we implement it using the "rgeneric" construction in R-INLA. In particular, we build our model based on the function "inla.rgeneric.IMCAR.model" from the R package INLAMSM (https://CRAN.R-project.org/package=INLAMSM) developed by Palmí-Perales and others (2021) and designed for fitting multivariate extensions of intrinsic conditional autoregressive models (Besag, 1974) (see Section B of the Supplementary material available at Biostatistics online, for details about the implementation of our model). To fit the model in R-INLA we first define the likelihood (Poisson in our proposal as we model count data). The second stage is to define the latent Gaussian fields, which in our case are given by (α , , , ) . And the third stage is to set the hyperparameters that control the precision matrices in the latent fields. In our model the set of hyperparameters are the elements of ψ , γ , λ 1 , and the precision parameters for the interaction terms (τ 1 , . . . , τ J ).
The choice of prior distributions for the hyperparameters is an important issue in Bayesian statistics and some ideas are discussed here. Regarding the covariance matrices of the coefficients of the spatial and temporal P-splines for the different crimes, a Wishart distribution is considered to obtain valid covariance structures (Kuismin and Sillanpää, 2016). In particular, ψ ∼ Wishart(υ, σ 2 ψ I J ), and γ ∼ Wishart(υ, σ 2 γ I J ), where J indicates the number of crimes, υ represents the degrees of freedom, and σ 2 ψ and σ 2 γ are fixed values. The prior mean is υσ 2 I J and the greater the value of σ 2 , the less informative the prior is (see Chung and others, 2015, for a justification of this prior). The choice of priors on the covariance matrices is a challenge because some parameters are not identifiable and we need to set the degrees of freedom equal to υ = 2J + 1 to make the prior more informative (see Palmí-Perales and others, 2021). Regarding the smoothing/precision parameters λ's in the precision matrices of the P-spline coefficients, a uniform distribution in the interval (0, 100) is considered for 1/ √ λ 1 . As it has been said previously, the parameters λ 2 and λ 3 are set equal to one as they are subsumed in the matrices −1 ψ and −1 γ respectively. Also improper uniform distributions in (0, ∞) are considered for the standard deviations σ δ j = 1/ √ τ j of the spatio-temporal interaction (Ugarte and others, 2017). As pointed out by one reviewer, we do not have a formal proof about the posterior propriety of our model. Here we follow the ideas in Gelman (2006), where a noninformative uniform prior density on standard deviation parameters is recommended. Finally, we would like to remark that details on how to implement these prior distributions in R-INLA through our multivariate "rgeneric" construction can be found in Section B of the Supplementary material available at Biostatistics online. Our models incorporate crime-specific intercepts, P-splines for space, P-splines for time, and spatiotemporal interactions of the type proposed by Knorr-Held (2000). As the crime-specific spatial surfaces (spatial P-splines) and the crime-specific temporal trends (temporal P-splines) also include an intercept, identifiability issues arise, and constraints are necessary. All identifiability constraints would yield to the same functions up to additive constants. A possibility is to put sum to zero constraints on the coefficients As pointed out by one reviewer, these sum to zero constraints seem similar to those used in Markov random field models. Though there are some connections between Markov random fields and splines (see e.g., Belitz and Lang, 2008), the reason why these constraints look similar is because the coefficients of the P-splines are modeled as Markov random fields. In particular, the coefficients of the temporal P-splines are modelled with random walks of first or second order. Similarly, the prior for the coefficients of the spatial P-splines combine random walks for longitude and latitude. Though these constraints identify the model, according to Wood and others (2013) linear dependence with the intercept leads to excessively wide credible intervals, and it is recommended to make the smooth functions orthogonal to the intercept, that is, to center the smooth functions Also, the interaction effects overlap with the main spatial and temporal P-spline terms and additional constraints are required (see Goicoa and others, 2018). In particular, for the Type II interaction, the one selected for our real data, the required constraints are T t=1 δ itj = 0, ∀i = i, . . . , I , ∀j = 1, . . . , J if we consider the precision matrix of a first order random walk. Additionally, the λ 3 parameter in the precision matrix of the temporal P-splines is not identifiable as it can be subsumed in the matrix −1 γ (or σ 2 t = 1/λ 3 is subsumed in γ ). Similarly, the parameters σ γ j , j = 1, . . . , J are not identifiable, so inferences on these quantities should be precluded. Though we have set the parameter λ 3 equal to one, this is arbitrary and the parameters cannot be interpreted as standard deviations of the P-spline coefficients. On the contrary, the correlation parameters ρ γ jk are identifiable and can be perfectly interpreted (see Section C of the Supplementary material available at Biostatistics online). The same comments apply to the λ 2 parameter in the precision matrix P s of the spatial P-splines. Though it is set equal to one, the parameters σ ψj , j = 1, . . . , J loose their meaning of standard deviations. However, the correlation parameters ρ ψjk are identifiable. . Namely, rape, assault or criminal force to woman with intent to outrage her modesty, cruelty by husband or relatives of husband, and kidnapping and abduction. In the rest of the article, we will refer to these crimes by the shorter names rape, assault, cruelty, and kidnapping. A succinct description of these crimes is provided in Section D of the Supplementary material available at Biostatistics online. For a complete definition of the crimes, the reader is referred to the IPC.

Descriptive analysis
Maharashtra is located in the middle west of the Indian peninsula, see Figure 1, and according to the 2011 census (see https://www.census2011.co.in), it is the second most highly populated state in the country (112 374 333 inhabitants), surpassed only by Uttar Pradesh. It is also the third largest state of India with a total of 307 713 km 2 . The overall literacy rate is 82.34%, about 8 percentage points over the overall literacy rate in India (74.04%). Similar to all Indian states, male literacy rate (88.38%) is greater than female literacy rate (75.87%), though these figures are well above the Indian male (82.14%) and female (65.46%) literacy rates. Sex ratio (number of females per 1000 males) is 922, slightly lower than the sex ratio in whole India (933). The majority religion (79.83%) is Hindu.
In recent years, the number of crimes against women in Maharashtra has grown tremendously. According to National Crime Record Bureau, NCRB, (https://ncrb.gov.in/crime-in-india), in the study period the reported incidence of crimes against women has doubled, from 12 524 in 2001 to 24 895 in 2013. In terms of rates, the increase is even worse as the overall rate of crimes against women in 2013 (44.9) is about 3.5 higher than in 2001 (12.9).
One of the most endemic form of violence against women in different countries of the world is the abuse of women by their intimate partner. The problem is even more serious as it has one of the highest rates of underreporting (see Visaria, 1999;Heise and others, 1994, and the references therein). This is also the case of India, where rapes and assaults are also underreported because of shame or fear to incomprehension and social rejection (Koss, 1992). Unfortunately, the state of Maharashtra is not an exception. Figure 2 displays a circular barplot with the crude rates (per 100000 women between the age of 15 and 49 years) of the four different crimes considered in this paper by district. Clearly, cruelty by husband or relatives is the crime against women with the highest incidence, followed by assault. During the 13 years of the study period, a total of 21 049 rapes, 47 351 assaults, 12 727 kidnappings, and 88 905 cases of cruelty were committed in this state. Interestingly, differences are found among the districts. In general, the lowest rates of crimes correspond to the most western districts with the exception of Greater Bombay, whereas districts in the central and northeastern part of the state have the highest rates of crimes. Some descriptive statistics regarding the number of cases of the four crimes in the first and last year of the study period are shown in Table 1. Figure 3 displays the evolution of the standardized incidence ratio (SIR) for the four crimes. A SIR over one indicates that the number of cases for one crime in a particular year is greater than expected in comparison with the whole study period. A close inspection of this figure reveals rather flat trends for rape, assault, and cruelty until 2012. Then, a strong rise is observed in 2013 for rape and assault. Regarding kidnapping, a steady growth in SIR is observed along the period, with a pronounced increase in the last year. According to some authors, this increase may be attributed to an improvement in the victim support system (Raj and McDougal, 2014).
To identify any potential relationship between the crimes, we compute Pearson correlations between the overall spatial patterns of each crime. That is, we compute the SIR for each district and crime in the study period and calculate the corresponding correlations. The greatest correlations were found between rape and assault (0.79), rape and kidnapping (0.72), and assault and kidnapping (0.81). The correlation of these three crimes with cruelty are low. Similarly, we compute Pearson correlations between the overall temporal trends shown in Figure 3. Again, the highest correlations are found between rape and assault (0.99), rape and kidnapping (0.87), and assault and kidnapping (0.90). The high correlations between  these three crimes may be explained because all of them are related in a greater or less extent to sexual offences. However, the statistical analysis has to confirm these exploratory findings.

Model fitting using INLA
4.2.1. Model fitting and model selection The multivariate P-spline models proposed in Section 2 are implemented to analyze the incidence of the four crimes in Maharashtra. To overcome identifiability issues, the smooth functions are centered as this proposal provides narrower credible intervals. To fit the models, cubic B-splines with first and second order penalties are used for the spatial and temporal dimensions. For longitude and latitude, 10 equidistant internal knots are considered, and for the temporal dimension, 5 internal knots are chosen (see Ruppert, 2002). The final dimensions of the spatial and temporal Bspline bases are 34 × 144 and 13 × 7, respectively. Previous to fit the multivariate P-spline models, a set of univariate P-spline models for each crime has been run. In particular, different priors combining first-and second-order random walks for the spatial and temporal coefficients have been considered. The four types of interactions defined by Knorr-Held (2000) have also been examined. According to different model selection criteria, the Deviance Information Criterion (DIC) (Spiegelhalter and others, 2002) and the Watanabe Akaike Information Criterion (WAIC) (Watanabe, 2010), and the Logarithmic Score (LS) (Gneiting and Raftery, 2007), a measure of model prediction performance, first-order penalties for space and time together with Type II interactions for the spatio-temporal term were the best options for all crimes (results not shown). Consequently, the joint multivariate P-spline model given in (2.4) is fitted considering a Type II spatio-temporal interaction for all crimes. We also examine RW1 and RW2 prior distributions for the coefficients. Different values of (σ 2 ψ , σ 2 γ ) for the Wishart prior on ψ , and γ are considered: (1,1), (1,10), (10,1), (10,10), and (100,100). The model selection criteria, DIC, WAIC, and LS point towards the combination (1,10) and those are the values considered in this article.
The model accounts for potential relationships between the different crimes through the between-crime correlations among the coefficients of the P-splines (spatial and temporal). Additionally, and given that the temporal trends for some crimes are rather flat, we also consider models without correlation between the coefficients of the temporal P-splines, that is, γ = diag(σ 2 γ 1 , . . . , σ 2 γ J ). Table 2 displays model selection criteria for the multivariate P-spline models fitted with different combinations of RW1 and RW2 priors for the coefficients of the spatial and temporal P-splines. According to DIC, WAIC, and LS, the best candidate is a model accounting for correlation between the coefficients of the spatial and temporal B-splines, and a RW1 prior distribution for the coefficients. This is the model we finally select to analyze the four crimes.
In general, posterior medians of the quantities of interest (relative risks, spatial, temporal, and spatiotemporal patterns) are rather similar to those obtained using univariate models. However, estimates are more precise with the multivariate proposals.
We have computed 95% credible intervals for the relative risks, the spatial pattern, the temporal trends, and the spatio-temporal interaction term for each of the four crimes analyzed in Maharashtra. On average, the selected multivariate P-spline model provides narrower credible intervals for all quantities of interest. The most remarkable point here is that the constraints affect the credible intervals. In particular, using sum to zero constraints on the coefficients of the smooth functions leads to much wider credible intervals than centering the functions.
Computations were run on a twin superserver with four processors, Intel Xeon 6C and 96GB RAM, using the R-INLA version 21.02.23. All models in the article were fitted using the simplified Laplace strategy in INLA, which provides a good compromise between computing time and accuracy. Models were also fitted using the Gaussian strategy in INLA, which is much faster than the simplified Laplace strategy, though less precise. Some small differences were found in model selection criteria, but the same model was selected and results were pretty similar.

Joint analysis of four crimes against women in Maharashtra
In this subsection, the spatiotemporal patterns of the four crimes in the state of Maharashtra are examined using the selected multivariate P-spline model.
The underlying spatial patterns and the global temporal trends may be very informative as similarities between crimes could be detected. The district-specific spatial risk for each crime, exp(f j (x 1i , x 2i )), is related to the idiosyncrasy of the districts. It captures the risk associated to a spatial location, and it may reflect the effect of potential spatial risk factors such as certain traditions or demographic and socio-economic characteristics specific to certain districts or regions in the state of Maharashtra. Posterior medians of the district-specific spatial risk for each crime are displayed in Figure 4. The maps with the exceedance probability, that is, P(exp(f j (x 1i , x 2i )) > 1|O) are shown in Figure E.1 in Section E of the Supplementary material available at Biostatistics online. A Northeast-Southwest gradient (following the largest diagonal axis of the map) can be observed very clearly for rape, assault, and kidnapping, the pattern being smoother for kidnapping. Though some differences exists, the spatial patterns of these three crimes are rather similar. On the other hand, the spatial pattern of cruelty by husband and relatives is different. For this crime, districts with high risk are mainly located in the central part of the map, and a Northwest-Southeast gradient can be envisaged. The estimated posterior medians of the correlations between the coefficients of the spatial P-splines confirm these findings. Table 3 displays these posterior correlations below the main diagonal. Significant correlations are highlighted in bold. The posterior correlation between rape and assault is particularly high with a posterior median of 0.824. This may point towards underlying spatial factors affecting both crimes. The posterior correlations between rape and kidnapping, and between assault and kidnapping are weaker though significant, with estimated posterior medians 0.433 and 0.474, respectively. The spatial pattern for cruelty is different with most of the high risk districts mainly located in the central part of the map. In fact, as we move away from the center, the map is becoming increasingly lighter. The posterior correlations between the coefficients of the spatial P-splines of cruelty and assaults is 0.267, but it is on the verge of loosing the statistical significance. The posterior correlations between the coefficients of the spatial P-splines of cruelty and the rest of crimes are not significant.

Table 3. Estimated correlations (posterior medians and 95% credible intervals) between the spatial P-spline coefficients (below main diagonal) and between the temporal -spline coefficients (above main diagonal). Significant correlations are highlighted in bold.
The global temporal evolution of each crime in Maharashtra is revealed by the crime-specific temporal component, exp(f j (x 3t )), and may reflect if time-referenced events, such as certain policies, changes in Government, or general social transformations affect the incidence of the crimes in different ways. The posterior medians of exp (f j (x 3t )) for each crime and the 95% credible intervals are displayed in Figure 5. Trends for rape and assault are fairly flat from 2001 to 2012 with a marked growth in the last year. The temporal trend for cruelty is also rather flat, but it shows a wave-shape, something that is not observed in rape and assault. An slight upturn is observed at the end of the period. A different behavior is observed for kidnapping, which shows a steady growth throughout the period.
The posterior correlation between the coefficients of the temporal P-splines are displayed above the main diagonal in Table 3. Significant correlations are highlighted in bold. The posterior correlation between the coefficients of the temporal P-splines for rape and assault is high (0.766). Lower correlations are observed between rape and cruelty (0.191) and between cruelty and assault (0.493). Regarding kidnapping, the posterior correlation with assault is moderate-high (0.676), and it is moderate or mild with rape (0.448) and cruelty (0.224).
The interaction term δ itj allows a different time evolution for each area and disease. Although the same type of interaction (Type II) is considered for the four crimes, different precision/variance parameters are allowed for each crime. More precisely, the posterior medians of the standard deviations with a 95% credible interval are 0.110 (0.099, 0.128) for rape, 0.133 (0.117, 0.151) for assault, 0.150 (0.136, 0.165) for cruelty, and 0.184 (0.167, 0.209) for kidnapping. This indicates a different amount of smoothing for Fig. 5. Temporal pattern of incidence risks for rape, assault, cruelty, and kidnapping. each crime. Area-specific temporal trends, that is, the posterior medians of exp(δ itj ), (with 95% credible intervals) are shown in Figure E.2 in Section E of the Supplementary material available at Biostatistics online for three districts located in different areas of Maharashtra: Aurangabad (central part of the state), Garhchiroli (in the northeast corner), and Greater Bombay (in the middle western coast). The specific temporal evolution in each area is clearly different indicating that in some districts (Greater Bombay) this trend increases, whereas in other districts (Aurangabad and Garhchiroli) the area-specific temporal trend decreases or it is flat.
To save space, the evolution of the geographical distribution of the relative risk is provided in Section E of the Supplementary material available at Biostatistics online. Figures E.3, E.4, E.5, and E.6 of the Supplementary material available at Biostatistics online show the posterior medians of the relative risks, R itj , (top) and posterior probabilities of risk exceedance, P(R itj > 1|O) (bottom) in the study period, for rape, assault, cruelty, and kidnapping, respectively. In general, the risk distribution for rape, assault, and cruelty remains stable during the period with an increase in the last year. The pattern for kidnapping is different with a steady increase of the risk along the period. In summary, these risk patterns confirm the Northeast-Southwest gradient for rape, assault, and kidnapping, with most of the high risk areas in the northeast of the state, and a Northwest-Southeast gradient for cruelty with most of the high-risk areas concentrated in the central part of the map. Figure 6 displays the relative risk evolution of the four crimes in Aurangabad, Garhchiroli, and Greater Bombay. The relative risk of cruelty in Araungabad remains nearly constant and is about twice the risk of whole Maharashtra. Garhchiroli has low relative risks for all crimes, though an upturn is observed for rape and assault at the end of the period. Finally, Greater Bombay shows increasing and significantly high risk for rape, assault, and kidnapping. The risk for cruelty remains low during the study period.
To conclude the analysis, it would be helpful to identify outlying districts regarding their relative risk temporal evolution. Sun and Genton (2011) propose a functional boxplot to visualize functional data and to detect outlying functions. Figure 7 displays the functional boxplots of the final relative risk trends for rape (top left), assault (top right), cruelty (bottom left), and kidnapping (bottom right). According to the functional boxplot, the district of Wardha, in the northeastern corner of Maharashtra, is an outlying district with regard to rape and assault, with a risk greater than the rest of districts. Interestingly, any outlying district is found in relation to cruelty and kidnapping. It remains unknown why this particular district is an outlier regarding rape and assault, and it is a matter of investigation for social researchers and anthropologists. Some infrastructure indicators (Directorate of Economics & Statistics, 2017) in the period 2013-2014 (2013 is the last year of our study period) reveal that Wardha was one of the districts with the lowest number of total road kilometers, and also with less post offices. This could indicate some form of isolation that could favor a sense of impunity about sexual crimes. However, this is mere speculation and further insight into this district is needed.
A natural question about our proposed model is the sensitivity of the results to the number of knots and their location in space and time. For low rank splines without penalties the number and location of knots are very important (see e.g., Rice and Wu, 2001), but in penalized splines the penalty relaxes the relevance of the number and location of knots. We have run the analysis changing the number of knots in space (longitude and latitude) and in time to assess how sensitive the results are with respect to these choices. Increasing the number of knots in longitude and latitude beyond 10 increases computing time, does not reduce model selection criteria, and the estimated spatial patterns do not change. On the other hand, using more than five internal knots in time seems to overfit the temporal trends as little smoothing is observed. Regarding the posterior correlations, some sensitivity has been observed. If the number of internal knots decreases, some correlations between spatial patterns become nonsignificant. This is probably due to an oversmoothing. Something similar happens when we modify the number of knots in time. In general, the correlation between the spatial patterns do not change too much, but some differences are found in the posterior correlations between the temporal trends.
To conclude this section, and as suggested by one reviewer, we would like to compare our model with other multivariate approaches in the literature. There are interesting proposals that could be explored. Bradley and others (2018) propose a multivariate spatio-temporal model based on multivariate Gamma distributions that conjugate with Poisson distributions oriented to high dimensional data. This approach simplifies sampling for use with MCMC. A recent multivariate proposal for spatial disease mapping based on directed acyclic graphs (DAGAR) (see Datta and others, 2019) has been proposed by Gao and others (2021). Though this proposal depends on the order in which the different responses are modeled, the authors propose Bayesian averaging to overcome this order dependency. Here, we compare our P-spline approach with the M-models proposed by Botella-Rocamora and others (2015) in a spatial setting and extended to the spatio-temporal framework by Vicente and others (2020b). Both approaches are different as the M-model includes spatial random effects with CAR priors and temporal random effects with first or second order random walk priors. In particular, we consider intrinsic conditional autoregressive priors (ICAR) for space, random walks of first order for time, and a Type II interaction for the spatio-temporal interaction term. The DIC, WAIC, and LS are 13 024.642, 13 013.492, and 3. 847 respectively, which are smaller than those obtained with the P-spline models, but the decrease is about 0.5% in DIC and WAIC and about 0.8% in LS. In general, the M-models based on ICAR priors produce less smoothing than the P-spline models and hence the reduction in the model selection criteria. For the case study considered  here, we obtain pretty similar results with both procedures in terms of relative risks estimates and spatial patterns. The main difference is that as our multivariate P-spline model produces smoother temporal trends than the M-model (spatial patterns and temporal trends obtained with the M-model are provided in Figures E.7 and E.8 respectively in Section E of the Supplementary material available at Biostatistics online). One advantage of our procedure is that the between-crimes covariance matrix is parameterized in terms of the correlation parameters, so the posterior distributions of these parameters are directly obtained. On the contrary, the M-models are based on the matrix M which depends on J × J parameters. This leads to an overparameterization as the covariance matrix only requires J (J + 1)/2 parameters. This in turn produces less precise estimates of the correlation parameters. Table 4 displays the posterior correlations between the spatial (lower diagonal) and between the temporal (upper diagonal) patterns obtained with the M-model. Comparing these correlations with those in Table 3, we observe rather similar posterior medians for the spatial correlations, though the credible intervals are wider with the M-model. The differences are more pronounced in the correlations between the temporal patterns. The M-model leads to rather wide credible intervals for most of the correlations. Finally, some differences are expected as the P-spline approach provides correlations between the coefficients, whereas the M-model gives correlation between ICAR spatial random effects and temporal random effects modeled with random walk priors.

DISCUSSION
Violence against women is a social problem all over the world, though in some nations is exacerbated. India is one of such countries where this problem is in the spotlight because of the large number of affected women and the cruelty of some forms of crimes. However, even society in general is getting more and more concerned about this issue, factors that may be related to crimes against women are still unclear and they can be very different depending on the region. India is a paradigmatic case as the wide diversity of traditions, religious beliefs, and social practices hamper the identification of such factors. In this context, statistical methodology in general, and spatio-temporal models for areal count data in particular may help to understand this intricate phenomenon. In the problem considered here this is crucial given that we could not obtain spatio-temporal socio-demographic covariates at district level.
In this article, we present new methodology to model different crimes simultaneously. This is a clear benefit over univariate analyses as it is possible to establish associations between different forms of crimes that may help to understand connections between them, and hence to better comprehend the problem. In particular, multivariate spatio-temporal models based on P-splines are used to analyze jointly four crimes against women in the Indian state of Maharashtra. More precisely, we propose two-dimensional spatial Psplines and one-dimensional temporal P-splines to model the geographical patterns and the temporal trends respectively. Correlations among the coefficients of the P-splines for the different crimes are introduced to look into spatial and temporal associations between the crimes that could point towards connections between them. This joint modeling allows borrowing information from neighboring areas, time points, and also from other crimes, improving the precision of the estimates.
The models have been fitted using integrated nested Laplace approximations in R-INLA and have been built using the "rgeneric" construction. As far as we know, this is the first attempt to fit multivariate P-spline models with R-INLA. Some comments regarding the model complexity are important. First, the smoothing parameter of the temporal P-spline and one of the smoothing parameters of the spatial P-spline are subsumed in the covariance matrix between the coefficients of the temporal and spatial P-splines, respectively. Hence the variance parameters of these matrices are not identifiable. We have set those smoothing parameters equal to one, but this is arbitrary and the variance parameter are no longer interpretable. Consequently inference about the nonidentifiable parameters is avoided. However, the more relevant correlation parameters are identifiable. Second, the spatial and temporal effects include an intercept, and identifiability constraints are needed. Though the relative risk estimates are invariant to the constraints, the credible interval for the smooth functions can be greatly affected. In particular, sum to zero constraints on the coefficients lead to much wider credible intervals than forcing orthogonality between the intercept and the functions (centering the functions). This is particularly noticeable for the temporal P-splines. Consequently, in the real data analysis, the smooth functions have been centered. Additionally, the interaction terms are centered within each area.
To enrich the article, we have compared our multivariate P-spline model with other existing proposals in the literature. In particular, with the M-model based approach developed by Botella-Rocamora and others (2015). Both proposals are implemented in R-INLA and provide similar results. Though the M-models present slightly lower values of the different model selection criteria, the multivariate P-spline model presents some advantages. The first one is that the between-crime covariance matrices are parameterized in terms of the correlation parameters and the posterior distributions of these quantities are directly obtained with R-INLA, unlike the M-models in which the correlation parameters have to be obtained by a resampling procedure. Second, the M-model approach is overparameterized as it considers J × J parameters to obtain the between-crime covariance matrices when only J × (J + 1)/2 parameters are required. This leads to wider credible intervals for the correlation parameters and consequently, less precise estimates. Additionally, in Adin and others (2017), P-spline models show a good performance in terms of sensitivity (detecting true high risk areas) and specificity (discarding false positives), though we acknowledges that this issue requires further research in a multivariate framework.
A potential limitation of our model (and also of the M-model approach) is that the number of parameters in the between-crimes covariance matrices increases considerably with the number of crimes. For example, adding one more crime to our study leads to 15 parameters in the covariance matrices instead of 10. This is important as increasing the number of variance parameters may raise computing time in INLA. A solution to reduce dimension could be factors models, but this has to be explored. Regarding augmenting the number of areas and time points, we are studying a new strategy based on making groups of small areas and fitting the models in those smaller groups of regions (see Orozco-Acosta and others, 2021). This strategy has proven effective with univariate CAR models, but has to be further studied in multivariate settings and with P-spline models. Finally, as pointed out by one reviewer, INLA relies on numerical approximations instead of sampling, and trace plots to check convergence of the posterior distributions are not available. Model checking is particularly relevant with models such as the one proposed in this paper with a high number of hyperparameters whose distributions are not normal. In general, convergence issues can be detected in INLA when unreasonable large posterior standard deviations of the parameters of interests are obtained, and if the posterior marginal densities of the hyperparameters present spikes (see Martins and others, 2013, for the algorithm used in INLA for the posterior distribution of the hyperparameters). In these cases useless credible intervals (too wide) are obtained. In our experience, if similar results are obtained using the Gaussian and the simplified Laplace approximations, the posteriors are well approximated. In any case, we strongly recommend a careful inspection of the results to detect the aforementioned issues that indicate bad approximations of the posterior distributions of the quantities of interest. The last concern regarding our model that we would like to comment on is the prior distributions on the covariance matrix. Here, we choose Wishart priors on the covariance matrices (Chung and others, 2015). Given that some parameters of these matrices are not identifiable, we need to increase the degrees of freedom of the Wishart distribution to make the prior more informative and explore different values for the parameters σ 2 ψ and σ 2 γ . Regarding the real case study in Maharashtra, results are very interesting. The analysis reveals similarities between the spatial patterns of rape, assault and kidnapping with a Northeast-Southwest gradient, whereas the spatial pattern for cruelty is different and a Northwest-Southeast gradient is observed. With respect to temporal evolutions the greatest similarities are those of rape and assault. Our study also identifies districts with high risk for some or all crimes examined here. Moreover, functional boxplots discover Wardha as an outlying district with a risk of rape and assault greater than the rest of districts. We firmly believe that our findings will be useful for social researchers and anthropologists to disentangle the complex phenomenon of violence on women. Additional research in those districts could bring light to identify potential risk factors that may be related with the crimes. So far, we could only make hypotheses based on our results and existing literature. For example, it has been documented that in urban slums in Bombay, the risk of cruelty increases if the man in the household consumes alcohol (see e.g., Begum and others, 2015). Other descriptive studies in rural villages (see Jain and others, 2004) points towards the predominant role of men over women, economic stress, many people living in one room, or complaints of the mother in law as reasons for the abuse. Collecting information about these characteristics at area level would be the first step to assess their validity as potential predictors.
6. SOFTWARE R code, together with the data set and complete documentation is available in the GitHub of our research group (https://github.com/spatialstatisticsupna/Multivariate_spatio_temporal_P_spline).