-
PDF
- Split View
-
Views
-
Cite
Cite
Xu Chen, Surya T. Tokdar, Joint Quantile Regression for Spatial Data, Journal of the Royal Statistical Society Series B: Statistical Methodology, Volume 83, Issue 4, September 2021, Pages 826–852, https://doi.org/10.1111/rssb.12467
- Share Icon Share
Abstract
Linear quantile regression is a powerful tool to investigate how predictors may affect a response heterogeneously across different quantile levels. Unfortunately, existing approaches find it extremely difficult to adjust for any dependency between observation units, largely because such methods are not based upon a fully generative model of the data. For analysing spatially indexed data, we address this difficulty by generalizing the joint quantile regression model of Yang and Tokdar (Journal of the American Statistical Association, 2017, 112(519), 1107–1120) and characterizing spatial dependence via a Gaussian or t-copula process on the underlying quantile levels of the observation units. A Bayesian semiparametric approach is introduced to perform inference of model parameters and carry out spatial quantile smoothing. An effective model comparison criteria is provided, particularly for selecting between different model specifications of tail heaviness and tail dependence. Extensive simulation studies and two real applications to particulate matter concentration and wildfire risk are presented to illustrate substantial gains in inference quality, prediction accuracy and uncertainty quantification over existing alternatives.
1 Introduction
While quantile regression (QR, hereafter) has gained popularity in multiple areas of scientific applications (Buchinsky, 1994; Cade et al., 1999; Elsner et al., 2008), scant work exists in the domain of spatial data analysis. When observation units are spatially indexed, ignoring noise correlation between nearby records may lead to biased estimates of regression parameters. But the task of modelling, estimating and adjusting for spatial noise correlation runs into serious difficulties under the commonly held belief that the QR formulation (1) must be interpreted only locally, at a single prespecified quantile level (Koenker, 2017).
This belief does not square with common practice. Most scientific applications of QR examine whether and how the regression parameters vary with the quantile level, typically resorting to the questionable practice of patching up estimates and p-values gathered from separate analyses (see Tokdar & Kadane, 2012, for a detailed critique). Such patch-ups are especially questionable in the presence of spatial dependence even if noise correlation was accounted for in each separate analysis. Any assumed noise correlation model induces a joint distribution for the observed response values (Y1, …, Yn). But the induced joint distributions stemming from two different choices of the quantile level are typically distinct, and hence the two sets of estimates are mutually incongruous. Put it differently, the parameter estimates at different quantile levels, though presented together, are actually obtained under very different and potentially conflicting assumptions on the same set of data. We examine this issue more deeply by analysing the estimates obtained from an approach by Lum and Gelfand (2012) who extend the level-specific, asymmetric Laplace error model of Yu and Moyeed (2001) to a spatial process model (Sections 5 and 7).
This new formalism enables incorporating noise correlation within QR. The equivalence between(1) and(3) remains valid even if the random quantile levels U1, …, Un associated with the observation units are mutually dependent, as long as each Ui ∼ Unif(0, 1), i.e. their joint distribution must be a copula. In this paper, we combine Gaussian and Student-t-spatial copula processes with the joint QR estimation framework of Yang and Tokdar (2017) to design a novel estimation framework for spatially dependent QR. While a combination like this is unsurprising, it breaks important new ground by making QR practicable in spatial data analysis. We justify this claim with the following key research contributions.
Through extensive numerical experiments, the combined model is shown to greatly reduce estimation bias and improve uncertainty quantification in regression parameter estimation over existing alternatives which may or may not attempt to adjust for noise correlation (Section 5.1), substantiating the need and utility of adjusting for noise correlation before QR methods could be adopted to spatial data analysis. Moreover, the new model is shown to embed computationally and statistically efficient spatial smoothing which delivers model-based regression quantile kriging equipped with meaningful uncertainty quantification (Sections 4.1, 5.1 and 7) which could be a powerful tool in the analyst's toolbox to produce reliable prediction of extreme responses at new locations.
A great advantage of a model based QR analysis is that it could be customized to accommodate various scientific concerns; see Tokdar and Kadane (2012) for an application to hurricane intensity trends. A similar advantage for the proposed model is demonstrated through an important extension that targets heavy-tailed response distributions and associated tail dependence across observation units, an important and challenging problem in spatial analysis. It is shown that competing flavours of the model could be formally compared to one another by using an information criterion whose results are congruous with a substantially more expensive cross-validated assessment of hold-out quantile prediction accuracy (Sections 6 and 7).
In two real world applications involving air quality and wildfire risk, the new model is shown to provide excellent fit to hold-out data, detect important tail effects of key predictors, and, detect the presence of heavy tails and tail dependence in noise correlation (Section 7). These case studies underscore the tremendous potential of QR in delivering a detailed and insightful analysis of potentially heavy-tailed spatial data with heterogeneous predictor effects, over and beyond the capability of existing spatial regression models. Furthermore, our estimation method is shown to scale well to large data sets by incorporating commonly used approximation techniques from Gaussian spatial process literature, thus enabling actual deployment of the new model to serious scientific investigations (Section 3.3).
We end this introduction with a note that the problem of quantile regression under spatial noise correlation is quite different from the one of spatially varying quantile regression as explored by Reich et al. (2011) and Yang and He (2015). Statistical analysis in the latter problem relies on having essentially independent replications of the response measurement at identical or nearby locations and existing methods typically employ spatial smoothing of locally estimated quantile regression parameters. More fundamentally, the underlying theory posits that the structural equations representing the response-predictor relation vary across space. In contrast, our approach theorizes that a single quantile regression formulation as in (1) holds globally at all spatial locations, and, the learning of these global model parameters, while adjusting for noise correlation, is the primary goal of the analysis. However, our model does allow spatial quantile smoothing necessary for infill prediction at new locations where only the covariates are recorded. Although this type of smoothing has less shape flexibility than what is offered by fully spatially varying approaches, it can still offer a superior performance under moderate spatial variation of predictor effects (Section 4.2).
2 Joint Spatial Quantile Regression
2.1 Modeling spatial noise correlation
Our focus is on analysing spatial point-referenced data where paired predictor-response observations , i = 1, …, n, are collected at known locations . The joint QR model postulates , where the vector (U1, …, Un) follows a copula distribution. In the spirit of the Kolmogorov extension theorem, it is natural and useful to view the random quantile levels as Ui = U(si), i = 1, …, n, where is a stochastic process on such that U(s) ∼ Unif(0, 1) at every . Such an embedding within a spatial copula process is essential to formalize prediction at new spatial locations.
Towards an interpretable and practicable statistical analysis of dependence, it is pragmatic to consider a parametric family of spatial copula processes indexed by a low-dimensional parameter vector that describes both the nature and the strength of spatial correlation. For simplicity we let the correlation between any U(s) and U(s′) depend only on the spatial distance ‖s−s′‖. As this distance increases, the correlation must diminish to zero and near-independence must be realized within the range of the observed spatial domain. Any hope of reliable inference of the regression parameters rests on this assumption that at least at great distances within the observed spatial region, data could be taken as nearly independent realizations from the model.
In Equation (5) the variation in the underlying quantile levels is decomposed into two parts in a similar spirit as the basic spatial random effects model (BSRE; Banerjee et al., 2008; Cressie, 1993, see Equation (16). The process W(s) captures structural spatial association while ɛ(s) is uncorrelated pure error. The parameter α ∈ [0, 1] determines the proportion of variation that is spatially structured. When α = 1, very little independent information is expected from nearby observations. When α = 0, the model reduces to the independent noise model. Additionally, when and , the model becomes with GP(0,ασ2ρ(s,s′;(ν, ϕ))) and N(0,(1−α)σ2). This special formulation coincides with BSRE.
The use of Gaussian spatial copulae leads to a fair number of computational advantages, for example, ease of parameter estimation, analytical quantile kriging and scalability with sample size. We shall elaborate on these presently. But Gaussian copula are limited in their ability to model extreme tail dependence, which may be present in heavy-tailed response distributions. We return to this in Section 6 with a generalization to t-copula for an in-depth treatment of modeling possibly heavy-tailed data.
2.2 Prior specification for Bayesian estimation
To address the additional copula piece of our model, we propose the following prior specification on θ = (α, ϕ). We choose the Unif(0, 1) distribution as a generic prior for α. This is not an entirely automatic ‘low-informative’ choice. Indeed, our experimentation shows the uniform prior leads to adaptive estimation of the strength of spatial dependence (Section 5.2). However, if prior knowledge is available on the value or the range of the proportion of spatial correlation, a Beta or a truncated prior may be adopted.
More care is needed in choosing a prior for the correlation range parameter ϕ. It is useful to specify bounds for ϕ from the perspective of effective range of spatial correlation, that is, the shortest distance at which the spatial correlation falls below a small threshold (typically 0.05). As discussed earlier, any hope of reliable parameter estimation from spatially dependent data relies on the assumption of near independence between far away observation units. In any particular application, the bounds for ϕ can be determined according to one's belief of lower and upper limits of the effective range. Additionally, from a methodological perspective, the decay parameter is only weakly identified and an informative prior is needed for satisfactory and reproducible posterior inference and computation (Banerjee et al., 2008, 2014).
For a default choice, we follow the convention in the spatial modelling literature and specify the range of ϕ so that the effective range lies between one-fourth and three-fourths of the maximal pairwise distance among all locations (Banerjee et al., 2014). The prior for ϕ is taken to be a discrete uniform distribution over a dense grid of values within the specified range. This choice of range keeps ϕ away from 0 and eliminates the identifiability issue that manifests when α approaches 0. From a computational perspective, copula density evaluation (Section 3.1) is expensive as it requires flops in each MCMC iteration. By using a discrete uniform prior, only finitely many correlation matrices are required to be computed and stored before initiating MCMC computation. Consequently, the computational overhead can be reduced to . See Section 3.3 for additional reduction in computational complexity for large data.
3 Posterior Computation
3.1 Likelihood evaluation
3.2 MCMC approximation
An efficient log-likelihood evaluation makes our model amenable to Metropolis type Markov chain sampling from the posterior distribution. We deal with the issue of finite representation of the function valued parameters η0, …, ηp, exactly as in Yang and Tokdar (2017). Briefly, a set of knots are chosen, distinct from the likelihood grid T, and with m ≪ L. Each function ηj is represented by its T* based evaluations: . Next, any function evaluation ηj(τ), needed for the likelihood evaluation, is approximated by the associated predictive process interpolation (Banerjee et al., 2008; Tokdar, 2007). Due to the Gaussianity of ηj given (κj, λj) and the inverse-gamma prior on , the vector is conditionally distributed given λj according to a multivariate-t-distribution and the conditional mean is available in closed form. The prior on λj is discretized over a dense finite support to write the unconditional prior on as a finite mixture of t-distributions, and to evaluate as a finite, weighted sum of the conditional means.
It may be tempting to employ a Gibbs sampler alternating between updating parameters (γ0, γ, σ, η, ζ) of the marginal part and the copula parameters θ = (α, ϕ). However, based on our experience, the scale parameter σ2 and α can be highly correlated because both parameters affect the tightness of Ui's. Instead, we find the parametrization helps reduce autocorrelation between posterior samples and improves inference efficiency1. Notice that and are spatial variance and pure error variance respectively when our model degenerates to BSRE (16). With this reparametrization, one complete cycle of our Markov chain sampler could be schematically represented as follows2:
Given σs and ϕ, make a Gibbs update of the parameter (γ0, γ, σe, η, ζ) by using the (adaptive) block Metropolis steps of Yang and Tokdar (2017).
Given (γ0, γ, σe, η, ζ) and ϕ, update σs.
Given the current Ui's extracted from step 2 and σs, sample ϕ from its full conditional distribution.
Given other model parameters, jointly update σs and σe.
It is possible to marginalize out ϕ and run the sampler only on other model parameters. But in our experience, such marginalization does not seem to help with mixing and instead adds to computation complexity. Also, even though σe and σs get updated in steps 1 and 3, the additional joint update in step 4 greatly improves mixing. Lastly, the latent process realization W(s) is marginalized out during MCMC runs but can be recovered in post-processing for inference.
3.3 Reduced rank approximation for large n
The computational complexity of factorizing a n×n matrix, and the associated cost of storage can be prohibitive for implementing the above posterior computation scheme for data with a large sample size n. Various reduced rank approximations to Gaussian processes may be used to alleviate these computation bottlenecks; prominent examples include nearest-neighbour (NN) GP (Datta et al., 2016) and predictive process (Banerjee et al., 2008) with pivoting (Foster et al., 2009). Both models only require computation and memory for model fitting and prediction (Section 4.1). We implement NNGP using the algorithm described in Finley et al. (2019). On a dataset with 16,000 samples and 7 predictors, our algorithm under 5-NNGP with 10,000 MCMC iterations took about 55 min and the predicted conditional quantiles were statistically accurate. See Supplemental material B for a detailed comparison of computational speeds of our model with full GP and NNGP, and, an analysis of prediction accuracy of the model with NNGP.
4 Spatial Smoothing
4.1 Prediction of conditional quantile
Remark 1The quantile smoothing detailed above assumes is generated from the model (3) with the underlying random level given by . In other words, is taken to be a yet unobserved unit from the same batch of response realizations as Y1, …, Yn. If instead, was thought to arise from a completely fresh draw, then should be treated as independent of the entire process U(s), resulting in τ = τ* in Equation (12).
4.2 Spatial dependence and spatial variation
The conditional quantile function computed above varies smoothly in s*, resembling a spatially varying QR model. It is worthwhile to examine whether our model could be reinterpreted as , with conditionally independent observations Y1, …, Yn, as done in Reich et al. (2011). The answer is yes, but in a very limited sense. The limitation is useful both theoretically and practically.
In contrast, the fully spatially varying QR model of Reich et al. (2011) offers much greater shape flexibility in spatial quantile smoothing. But it is of little use when the primary aim is to learn a global relationship between the response and the predictors, unrelated to spatial location. Additionally, a fully spatially varying QR model could be extremely difficult to estimate. Indeed, the estimation method adopted in Reich et al. (2011) is akin to carrying out a post hoc Bayesian smoothing of intercept and slope functions estimated locally via the Koenker-Bassett method. Such an approach could be quite useful for analysing datasets with many repeated observations at each location, even though the two-stage estimation method is likely to offer uncertainty quantifications that are very difficult to interpret. Furthermore, in a simulation study detailed in Supplemental material F, we observed that our model offered excellent spatial quantile smoothing, outperforming Reich et al. (2011) when the spatial variation in regression coefficients was moderate.
5 Numerical Experiments
We carried out several simulation studies to compare the proposed model against existing methods on inference efficiency and prediction accuracy, and to examine the adaptability of our model to different correlation strengths. In Section 5.1, we demonstrate that in quantile modelling for spatial data, neglecting or inappropriately incorporating spatial dependence results in suboptimal statistical performance by including competing methods that are designed for independent data or offer limited adjustment for spatial dependence. To be comprehensive on simulation design, two examples are included: one with a univariate predictor (p = 1) and one with multivariate predictors (p = 7). To investigate robustness of our model against model misspecification, two dependency patterns, one of which deviates from the Gaussian copula process assumption, are considered for each example. In Section 5.2, we illustrate that our model is capable of adapting to different levels of dependence and recovering the true underlying correlation structure by assessing estimation accuracy of copula parameters and induced correlations.
5.1 Statistical performance assessment
5.1.1 Simulation setup
Spatially dependent data sets were generated on by extending the simulation setting of Yang and Tokdar (2017). Spatial locations s1, …, sn were sampled uniformly from . A 2 × 2 design with 100 replications each was used with two choices of the marginal QR model and 2 choices of the spatial copula process:
- M1. Simple regression with p = 1:
- M2. Multiple regression with p = 7:withwhere ϕ(·) denotes the pdf of N(0, 1).
where FAL(·;τ) is the CDF of asymmetric Laplace distribution AL(τ) with pdf fAL(x;τ)=τ(1 − τ) exp {−ρτ(x)}.
- C2. Gaussian copula process:
The two marginal QR models are the same as in Yang and Tokdar (2017). For either copula process, we set (α, ν, ϕ) = (0.7, 2, 0.3). Additionally, for C1, we fixed τ = 0.4. The resulting asymmetry in the true copula process was used as an opportunity to assess the robustness of symmetric Gaussian copula based JSQR under model misspecification. For M1, each synthetic data set consisted of a training set with n = 200 observations, and for M2, each set consisted of n = 500 observations. For each set, a test set containing 50 observations was used for hold-out validations.
The proposed joint spatial quantile regression model (JSQR3 ) was compared against four alternatives; KB: the classical optimization based method by Koenker and Bassett (1978), JQR: the joint quantile regression model by Yang and Tokdar (2017), ALD: Bayesian quantile regression model using asymmetric Laplace error distribution by Yu and Moyeed (2001), and, ALP: spatial quantile regression model using asymmetric Laplace process by Lum and Gelfand (2012). Among these methods, KB, JQR and ALD do not incorporate spatial dependence.
These competing methods were compared based on mean absolute errors (MAE) of point estimates of regression coefficients and corresponding coverage probabilities of 95% confidence (or credible) intervals. We also compared the MAE of predicted conditional quantiles averaged over all observations in the test set. Here is the predicted conditional quantile function given by different methods. For KB, JQR and ALD, this function is precisely . For ALP, we adopted the spatially adjusted quantile function by conditioning on the latent process. All these comparisons were performed at quantile levels {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} and averaged over all 100 data sets under each scenario. For Bayesian methods, posterior means were used as point estimates. The confidence intervals for KB were constructed by inverting a rank test proposed by Koenker (1994). Algorithm configurations are detailed in Supplemental material A.1. Results from M1 are presented in Figure 1 and those from M2×C1 are shown in Figure 2. Results from M2×C2, which are very similar to M2×C1 results, are included in Supplemental material A.2, Figure 1.

Inference efficiency of different methods for M1×C1 (top) and M1×C2 (bottom). In each row, the left two panels present the mean absolute errors of regression coefficients and the right two panels show the coverage probabilities of 95% confidence (or credible) intervals of regression coefficients at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}

Inference efficiency of different methods for Example 2 with true generating process being asymmetric Laplace process. The top two rows present the mean absolute errors of regression coefficients while the bottom two rows show the coverage probabilities of 95% confidence (or credible) intervals of regression coefficients at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}
5.1.2 Key findings
In every design, JSQR provided smallest averaged errors and highest (and approximately nominal) coverage probabilities for estimating the intercept and slope functions across all quantile levels. The three non-spatial methods, KB, JQR and ALD gave distinctly worse performance, particularly on coverage probability. KB and JQR were generally similar to one another, with the latter offering better performances for certain covariates. By design, KB and ALD gave same parameter estimates, but the latter clearly had lower coverage probabilities; this issue has been raised before in Tokdar and Kadane (2012).
The estimates by ALP were significantly worse than the rest of the group for quantile levels away from 0.5, both in terms of accuracy and coverage. ALP severely distorted the estimates of the intercept function and consequently offered poor estimates of slope functions at non-central quantile levels. Since ALP assumes an asymmetric Laplace process on errors instead of on quantile levels, the model is misspecified at any given quantile level, even though data in C1 were simulated under an asymmetric Laplace copula process.
A similar picture emerged on quantile kriging accuracy at hold-out data (Figure 3). JSQR offered substantial improvements across all quantile levels compared to the methods that did not adjust for spatial dependence. ALP came a distant second for central quantile levels, but actually performed worse at the extremities.

Mean absolute errors of conditional quantile function at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}
Together, this empirical evidence underlines the usefulness of a proper model based spatial QR estimation for improved parameter estimation and quantile kriging. It is noteworthy that ALP, despite its limitations, improves estimation and kriging accuracy, thanks to spatial dependence adjustment, at quantile levels for which the assumed residual process is not too different from the true data generating distribution. But its limitations show up dramatically for non-central, and particularly, extreme quantile levels. The power and flexibility of the joint QR model is evident in JSQR's superior estimation quality across all quantile levels. Moreover, JSQR appears reasonably robust against misspecification of the copula process.
5.2 Adaptation to dependence strength
In a second set of experiments, we examined the adaptability of JSQR to different strengths of spatial dependence. Specifically, 100 data sets, each with n = 500, were generated as in M1×C2, but for each set a distinct and random draw was made for the parameter values of (α, ϕ). The proportion parameter α was drawn from Unif(0, 1) and the decay parameter ϕ was randomly drawn from its prior range. The JSQR estimation method was employed exactly as in Section 5.1. For each dataset, we estimated α, ϕ and 10 pairwise correlations rij = αρ(si, sj ;(ν, ϕ)) between 5 randomly selected observed locations, by their respective posterior means. Mean absolute errors of estimates over 100 simulated data sets are reported in Table 1, which also includes the lengths and the coverage probabilities of the respective 95% credible intervals.
Left column: mean absolute errors of posterior means of α and ϕ and induced pairwise correlations rij = αρ(si, sj; (ν, ϕ)). Middle and right columns: coverage probabilities and average lengths of 95% credible intervals of α, ϕ and rij. Standard deviations are shown as subscripts
. | Absolute error . | CP of 95% CI . | Length of 95% CI . |
---|---|---|---|
α | 0.050.04 | 0.96 | 0.250.11 |
ϕ | 0.040.03 | 0.95 | 0.170.06 |
rij | 0.030.04 | 0.954 | 0.140.10 |
. | Absolute error . | CP of 95% CI . | Length of 95% CI . |
---|---|---|---|
α | 0.050.04 | 0.96 | 0.250.11 |
ϕ | 0.040.03 | 0.95 | 0.170.06 |
rij | 0.030.04 | 0.954 | 0.140.10 |
Left column: mean absolute errors of posterior means of α and ϕ and induced pairwise correlations rij = αρ(si, sj; (ν, ϕ)). Middle and right columns: coverage probabilities and average lengths of 95% credible intervals of α, ϕ and rij. Standard deviations are shown as subscripts
. | Absolute error . | CP of 95% CI . | Length of 95% CI . |
---|---|---|---|
α | 0.050.04 | 0.96 | 0.250.11 |
ϕ | 0.040.03 | 0.95 | 0.170.06 |
rij | 0.030.04 | 0.954 | 0.140.10 |
. | Absolute error . | CP of 95% CI . | Length of 95% CI . |
---|---|---|---|
α | 0.050.04 | 0.96 | 0.250.11 |
ϕ | 0.040.03 | 0.95 | 0.170.06 |
rij | 0.030.04 | 0.954 | 0.140.10 |
We note that accurately estimating induced pairwise correlations rij is potentially more important than accurately estimating the copula parameters α and ϕ. This is because, the induced correlations directly determine the dependence between observations and hence are more critical in adjusting for the spatial dependence toward a more accurate estimation of the marginal QR intercept and slope parameters. Induced correlations are also more identifiable than copula parameters because different combinations of α and ϕ may yield similar values of rij. This argument is verified by the results presented in Table 1 which show that our model accurately estimated rij. The 95% credible interval was narrow and its coverage was close to nominal level. Both absolute errors and length of 95% credible intervals were smaller than those of α and ϕ. As the prior range for ϕ was around (0.1, 0.4), both error of point estimate and uncertainty of ϕ were relatively greater than those of α and rij. This is expected as decay parameter is weakly identified. The results suggest that our model is capable of adapting to different levels of dependence and recovering the underlying correlation structure.
6 Modeling Heavy-Tailed Response Data
6.1 Tail dependence and t-copula
One of the biggest appeals of QR is that it enables analysing predictor effects at extreme quantile levels, which could be particularly relevant for analysing heavy-tailed response data. In order to model independent data with heavy-tailed response, it is straightforward to adapt JQR by adopting a heavy-tailed base distribution f0. However, the same adaptation alone for JSQR is inadequate for modelling heavy-tailed, spatially dependent response because Gaussian copula is tail independent (Sibuya, 1960) and hence would fail to account for possible dependence between extreme outcomes at nearby locations. This deficiency, which could lead to considerably biased prediction of extreme quantiles, may be rectified by employing a heavy-tailed base distribution in conjunction with a spatial copula process that admits tail dependence. We achieve this by using the so called t-copula process (Embrechts et al., 2001), a tail dependence encoding generalization of the Gaussian spatial copula.
Although the t-copula admits tail dependence, neither it nor the Gaussian copula can accommodate different lower and upper tail behaviours. To this end, skew-elliptical copulae, based on skew normal distributions (Azzalini & Valle, 1996) and skew t-distributions (Azzalini & Capitanio, 2003) may be considered, especially for applications where tail asymmetry is a primary concern, for example, in finance applications (McNeil et al., 2005). However, our results in Section 5.1 indicate that our model with the Gaussian copula is robust against mild asymmetries in true copula process. Interested readers are referred to Demarta and McNeil (2005) and Smith et al. (2012) on the construction of skew-copulae.
6.2 Model assessment and comparison
Because a JSQR analysis could be carried out with various combinations of the marginal and dependence models, a natural question arises as to how to compare such competing models and select one with better predictive quality. A desirable model comparison tool here should be capable of comparing models, with different sets of predictors, base distributions and copula processes, using a single, unified criteria. Apropos of this consideration, we advocate using the Watanabe-Akaike information criteria (Watanabe, 2010, WAIC) for model comparison and selection. WAIC is particularly attractive in complex Bayesian modelling like ours for at least two reasons. From a theoretical perspective, WAIC has the property of averaging over posterior distributions and is asymptotically equivalent to a Bayesian leave-one-out cross validation. From a practical view point, WAIC is computationally efficient as it can be simply calculated by reusing posterior samples (Gelman et al., 2014).
There is an important technical hurdle to applying WAIC directly in our context. The construction of WAIC relies on independence among observations so that the log-likelihood score can be decomposed as the sum of n individual contributions. This is not the case for JSQR because of the copula piece which captures dependence between observation. We circumvent this problem by invoking conditional independence between observation units given the smooth spatial process W, which may be treated as an additional model parameter for WAIC calculation.
Following Watanabe (2010), we establish an approximate mathematical equivalence between the conditional WAIC score proposed here and the conditional Bayesian leave-one-out cross validated log probability density score; details are provided in Supplemental material E. Using the proposed conditional WAIC, the results of model comparison for both simulated (Section 6.3) and real data (Section 7) demonstrate that the criteria is effective in picking up the true model and generally agrees with the actual prediction performance. Since our work focuses on modelling spatial extremes and spatial dependence, we highlight model comparisons between joint quantile regression models with different base distributions (Gaussian or t) and dependency structures (independence, Gaussian or t-copula processes). By contrasting these three dependency structures, we are able to determine whether incorporating spatial dependence and whether using the t-copula process result in model improvement. As selecting predictors is not our focus here, we refer interested readers to Cunningham et al. (2020) for the use of WAIC in that regard.
Remark 2The idea of conditional WAIC has been investigated in Li et al. (2016) and Millar (2018) in some widely used Bayesian hierarchical models (e.g. finite mixture model, random effects model) where they argue that the conditional WAIC could be unstable and an alternative ‘integrated WAIC’ (by further marginalizing out the latent variables associated with hold-out observations) is suggested. However, we do not observe any instability of the conditional WAIC under the proposed model and there is little difference between values of the conditional WAIC and ‘integrated WAIC’. Therefore, we suggest using the conditional WAIC in comparing JSQR models due to its simplicity.
Remark 3Several methods exist for selecting a copula model (Chen & Fan, 2006; Dissmann et al., 2013; Genest et al., 2006; Huard et al., 2006; Smith et al., 2010). However these techniques are unsuitable for our model with multivariate elliptical copula and latent quantile levels. Additionally, a unified criteria is required for determining various components of the JSQR model, for which a single numerical model scoring method, as given by the WAIC, is practically appealing.
6.3 Illustration: model comparison using WAIC
In each scenario, we calculated the differences in WAIC of other two models and that of the matching model. According to Table 2, JSQR-GP and JSQR-TP generally had comparable WAICs because they provided similar model fits if the data only contains a few extreme observations. Due to the penalization of more effective model parameters, the WAICs provided by two JSQR models were just slightly larger than JQR when observations were independent. The mean absolute errors of predicted conditional quantiles were computed over observations in the test set and presented in Figure 4, the last panel of which highlighted the differences between JSQR-GP and JSQR-TP in scenario (iii). It is clear that t-copula process improves over Gaussian copula process at extreme quantile levels by 10% to 20% when data are generated from a heavy-tailed marginal distribution with a tail-dependent copula.

First three panels: mean absolute errors of predicted conditional quantiles at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}. The fourth panel: A zoomed version of the third panel highlighting the comparison between JSQR-GP and JSQR-TP for scenario (iii)
Average Watanabe-Akaike information criteria (WAIC) scores (relative to the matching model) and model selection rates
Truth . | JQR . | JSQR-GP . | JSQR-TP . | |||
---|---|---|---|---|---|---|
ΔWAIC . | Selection . | ΔWAIC . | Selection . | ΔWAIC . | Selection . | |
Independent noise | — | 78% | +1.8 | 8% | +2.1 | 14% |
Gaussian copula | +442.9 | 0 | — | 56 | +0.5 | 44 |
t-copula | +402.7 | 0 | +1.8 | 20 | — | 80 |
Truth . | JQR . | JSQR-GP . | JSQR-TP . | |||
---|---|---|---|---|---|---|
ΔWAIC . | Selection . | ΔWAIC . | Selection . | ΔWAIC . | Selection . | |
Independent noise | — | 78% | +1.8 | 8% | +2.1 | 14% |
Gaussian copula | +442.9 | 0 | — | 56 | +0.5 | 44 |
t-copula | +402.7 | 0 | +1.8 | 20 | — | 80 |
Average Watanabe-Akaike information criteria (WAIC) scores (relative to the matching model) and model selection rates
Truth . | JQR . | JSQR-GP . | JSQR-TP . | |||
---|---|---|---|---|---|---|
ΔWAIC . | Selection . | ΔWAIC . | Selection . | ΔWAIC . | Selection . | |
Independent noise | — | 78% | +1.8 | 8% | +2.1 | 14% |
Gaussian copula | +442.9 | 0 | — | 56 | +0.5 | 44 |
t-copula | +402.7 | 0 | +1.8 | 20 | — | 80 |
Truth . | JQR . | JSQR-GP . | JSQR-TP . | |||
---|---|---|---|---|---|---|
ΔWAIC . | Selection . | ΔWAIC . | Selection . | ΔWAIC . | Selection . | |
Independent noise | — | 78% | +1.8 | 8% | +2.1 | 14% |
Gaussian copula | +442.9 | 0 | — | 56 | +0.5 | 44 |
t-copula | +402.7 | 0 | +1.8 | 20 | — | 80 |
The WAIC and MAE of predicted conditional quantile were consistent as their differences between any two models were small or large simultaneously. Also, the base models provided the smallest WAICs for most data sets in all three scenarios according to Table 2. These evidence suggest that WAIC accurately reflects the predictive performance of a model and is suitable for comparing joint QR models with different dependency structures.
7 Case Studies
Two case studies are presented to demonstrate the potential of JSQR models in real applications. In both examples, we showcase the superior prediction performance of the proposed models compared to existing others, and, the effectiveness of WAIC in selecting base distributions and copula processes. In the analysis of PM2.5 concentration data (Section 7.1) with 339 observations, we highlight the insightful heterogeneous predictor effects offered by JSQR in contrast to the classical BSRE model. In the wildfire risk analysis (Section 7.2), we feature the t-copula process in capturing spatial tail dependence with a substantially larger dataset with 1855 observations.
We clarify that this section serves as a validation of the proposed model in real data analysis. Both WAICs and prediction performances are evaluated using multi-fold validation studies to reduce randomness. However, readers who intend to use the JSQR model in their applications should fit the model on the whole dataset (without split) regardless of the purposes of model selection, predictor effect inference or quantile prediction.
7.1 Analysis of PM2.5 concentration data
Exposure to fine particulate matter (diameter less than 2.5 μ m) is positively associated with the risk for lung cancer and cardiovascular disease morbidity and mortality (Brook et al., 2010; Dockery et al., 1993; Pope & Dockery, 2006). Various regression models, for example, generalized additive models (Yanosky et al., 2008b), Cox hazard models (Jerrett et al., 2005; Pope et al., 1995) and conditional autoregressive models (Paciorek, 2013), have been proposed to predict PM2.5 concentration. These methods focus only on the response mean prediction. But, arguably, predicting high response quantiles is of a greater concern here. High PM2.5 concentration is more harmful as acute exposure could trigger cardiovascular morbidity (Bell et al., 2005). Additionally, the relationship between PM2.5 and many important predictors (detailed below) appears more complex than a homoskedastic linear dependence (Figure 5), rendering classical mean regression analyses rather inadequate. Quantile regression has been recently employed to analyse PM2.5 concentration (Barmpadimos et al., 2012; Porter et al., 2015). However, analyses based on KB estimates clearly overlook the spatial dependence of data from different monitoring stations as presented in Figure 5a.
![Averaged daily PM2.5 concentration (μg/m3) during 2001-2 across monitoring stations in northeastern United States (a) and against two potential relevant predictors (b)-(c); response data does not appear heavy-tailed (d) [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssb/83/4/10.1111_rssb.12467/1/m_rssb_83_4_826_fig-5.jpeg?Expires=1747995918&Signature=df9Fd8TjMykLU6fJmVlwUBMuKR~RTIBo6tBI4EWKUX5h5J-uXr-32MaicgoPZwwZWkHT2~kOJeBJWg7iq2tUVtaek5tlaorTu3~~8gAS-EfdDbTDqsHJandAOkxO8mnO6J3p6cb0NdH5d7nyiC4NOC-bwjv6kIv-ZLIzq1wiwcnOVv6otdRcekv4IJ6mLj7NqcmB1Xu0sdY2MLsk-1j2swImB4kz2OEl2465ozQ8hWqmoNAMjE-K3zpSobamRdtfceUQp19Tg7czkjDnEArnx0BO~sdTM7WStZVmyJblyhtrZSGc2dKOLU0ByubPCSRSaEKxl-IXpDjLSmvjCi94hg__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Averaged daily PM2.5 concentration (μg/m3) during 2001-2 across monitoring stations in northeastern United States (a) and against two potential relevant predictors (b)-(c); response data does not appear heavy-tailed (d) [Colour figure can be viewed at https://academic.oup.com/]
We reanalysed data4 on PM2.5 reported in Paciorek (2013) using the proposed JSQR models. The dataset contained averaged daily PM2.5 concentrations (shown in Figure 5) from 2001 to 2002 monitored at 339 stations in the northeastern United States. In Yanosky et al. (2008a) and Paciorek et al. (2009), the following 5 covariates have been shown to have significant associations with PM2.5 or improve predictions and hence were included into our analysis: (i) logarithm of population density at county level (LCYPOP), (ii) logarithm of distance to A1 class roads (primary roads, typically interstates) (LDISTA1), (iii) proportion of urban land use within 1 kilometre of the station location (URB), (iv) logarithm of PM2.5 emissions within a 10 kilometre buffer (L25E10), and, (v) elevation of the station (ELEV). We refer interested readers to Yanosky et al. (2008a) for a detailed description of the dataset.
7.1.1 Assessment of infill quantile prediction
To examine the existences of heavy-tailedness and tail dependence, we adopted three JSQR models with different combinations of base distributions and copula processes: JSQR-GP1 with a logistic base and Gaussian copula, JSQR-GP2 with a t base and Gaussian copula, and, JSQR-TP with a t base and a t-copula. The methods in Section 5.1 were also applied for comparison. A 10-fold validation study was conducted to assess how well different spatial and non-spatial quantile regression methods performed in predicting response quantiles at hold-out locations. In each fold, the data were randomly partitioned into a training set and a test set with 270 and 69 observations respectively. All prior specifications and MCMC settings were kept same as those in Section 5. The accuracy of quantile prediction at a level τ was evaluated by the check loss averaged over all observations in test sets.
Among the four joint QR models, JSQR-GP1 had the smallest average WAIC (808), closely followed by JSQR-GP2 and JSQR-TP (both at 810) and trailed by JQR (1124). This relative ordering suggests the existence of spatial noise correlation among observation units, but the distribution of PM2.5 does not appear to be heavy-tailed (consistent with Figure 5d) or exhibit tail dependence. The ordering of WAICs was coherent with actual prediction performance, namely, JSQR-GP1 offered slightly smaller average check losses than other JSQR models and provided a clear improvement over JQR. Therefore, we adopted JSQR-GP1 as a representative of JSQR models for all subsequent analyses.
Prediction accuracy of QR models, measured by averaged check loss relative to BSRE, are summarized in the right panel of Figure 6. For QR models, JSQR-GP1 consistently offered the smallest averaged loss across all quantile levels on test sets. The losses given by JQR, KB and ALD were similar and typically the largest except for extreme quantile levels. ALP had smaller losses than the three non-spatial methods from τ = 0.2 to 0.8 but provided larger losses out of this range. Overall, these results are consistent with our findings in Section 5.1 and underline the necessity of adjusting for spatial dependence toward a more accurate QR analysis of PM2.5 concentration. We note that BSRE and JSQR offered comparable prediction accuracy for this data set. This is perhaps due to the fact that our data consisted of daily measurements averaged across two years, thus boosting Gaussianity of the joint distribution of predictors and response. However, JSQR did offer smaller prediction error at very high quantile levels (⩾ 0.9). Furthermore, JSQR offered a more nuanced view of the relationship between certain predictors and PM2.5, especially at high quantiles, which we turn to next.
![Left: PM2.5 regression coefficient estimates (and 95% bands) by BSRE and JSQR. Right: average check loss given by QR methods relative to BSRE at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssb/83/4/10.1111_rssb.12467/1/m_rssb_83_4_826_fig-6.jpeg?Expires=1747995918&Signature=gPLsuqs6FaWs8bwTDXvrmSzCIMIR76DnOmFJR2Lz7qNmnVZRq3u5y7tyQFppaOXI~UQbAJ9gbXCq3GXNuzUoYKZ35vEXg7b0uFn1HUP1FgIjqBZ23PaS1dTHb9sDfbz0d~5WaLaDJEOXX6Q8HEvx1RPU2rG3z5e3hddNqTrSAkKBYabzCd3pYzfIcLFsbmf~vtj5ZJJ1GSgBwnuE61ACci6e7UjnIXt~8DpqpDhx~fkGrL0~OP~vR~6RJcvh2YAsTOCNM3bVto8AMaKGr0UayN2CiQQo3fQXlbOdfOO5V5KSryx0VAi4~~d5HJ7eAGmcY1f0i28WWi0d8zFbgzyfSA__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Left: PM2.5 regression coefficient estimates (and 95% bands) by BSRE and JSQR. Right: average check loss given by QR methods relative to BSRE at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} [Colour figure can be viewed at https://academic.oup.com/]
7.1.2 Heterogeneous predictor effect on response quantiles
Both BSRE and JSQR detected strong spatial noise correlation. The proportion of spatial variation was estimated to be 81% by BSRE and 86% by JSQR. The corresponding estimates of the decay parameter ϕ were 0.43 and 0.46. BSRE and JSQR provided similar median and interval estimates for the intercept function across all quantiles (Figure 6, left panel). The two models exhibited agreement to some extent on the effects of population density (LCYPOP) and elevation (ELEV). JSQR estimated the effects of these predictors to be relatively stable across all quantile levels. The signs of the regression coefficients matched expectation and were in line with existing literature (Yanosky et al., 2008b). Given other predictors in the model, the distance between monitoring station and A1 class roads (LDISTA1) was not found to have a significant effect by either model.
A difference between JSQR and BSRE emerged in the estimates of regression coefficients of the percentage of urban land use (URB) and PM2.5 emission within 10 kilometres (L25E10). BSRE estimated the effects of both these predictors to be significant and positive. JSQR estimated URB to have a severe impact at low and middle quantile levels but little effect on the high quantiles. It also estimated L25E10 to have no significant effect at low and middle quantile levels, but a fairly pronounced positive effect at high and extremely high levels. These differential effects were significant; the difference between the slopes at τ = 0.9 and τ = 0.1 was estimated to be −0.02 (95% credible interval = [−0.04, 0.00]) for URB and 0.15 ([0.03 0.29]) for L25E10.
The JSQR estimates could be interpreted as saying that while higher urban land use results in an increased baseline PM2.5 concentration, it is the amount of local PM2.5 emission that largely determines extreme levels of PM2.5 concentration. At the 95th percentile level, the regression coefficient of L25E10 was estimated to be 0.17. Accordingly, a rise of PM2.5 emission by 36 times within a 10km radius (one standard deviation increase of L25E10) would result in an increase of 0.6 μg/m3 of the 95th percentile of PM2.5 concentration which corresponds to a 7% (or 4%) increase for states such as Maine, New Hampshire and Vermont with low (or, Ohio, Pennsylvania and DC with high) PM2.5 concentrations.
7.2 Wildfire risk analysis
Wildfires cause significant, lasting damage to ecology (Moritz et al., 2014), economy (Butry et al., 2001) and human health (Reid et al., 2016). The burning index (BI) is a metric commonly used by National Fire Danger Rating System to monitor and forecast risk due to wildfires. The calculation of BI is sophisticated and requires measuring fuel characteristics, particle properties and moisture (Cohen & Deeming, 1985). Since collecting such measurements requires field-specific knowledge and equipment and is impossible to conduct for every location, model-based approaches are essential for analysing and predicting BI with data that can be conveniently obtained. Kreuzer et al. (2017) propose a simultaneous autoregressive model with t-distributed, spatially dependent random errors to analyse BI. While such a model is able to adjust for the heavy tails of the mean residuals, it fails to provide a detailed reconstructions of the extreme right tail quantiles of BI which are of direct interest because of the associated heightened risk.
We analysed daily wildfire risk data collected by Wildland Fire Assessment System (https://www.wfas.net/) on 19 August 2020. The data contained fire weather and fire danger observations at 1855 stations across United States (excluding Alaska and Hawaii). As expected, high values of BI were concentrated in California and the data suggested the potential existence of spatial dependence (Figure 7a). We adopted elevation and weather information as the only predictors because they are easy to collect and hence are particularly useful for surrogate BI calculation. Specifically, we included (i) elevation (ELEV), (ii) temperature (TEMP), (iii) relative humidity (RH), (iv) wind speed (WIND) and (v) precipitation (PPT) into our analysis. The focus on non-central quantiles of BI, the existence of spatial dependence, and, the heteroskedasticity of BI (Figure 7b, c) simultaneously motivated the use of JSQR models.
![Burning Index measured on 19 August 2020 across contiguous US (a), and plotted against wind speed (b) and relative humidity (c). Response data appear heavy-tailed (d) [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssb/83/4/10.1111_rssb.12467/1/m_rssb_83_4_826_fig-7.jpeg?Expires=1747995918&Signature=yPeXT4xin5WwxUUDCeJsIU1ey764LyKNQ~yfPu8sL1avEQWQFZ4wT6M1kXK9a4hMF6rmPZwinLgPEOqHddtdOrUxo054N-OE2V~UoC9ImBT3Xoz2CAe6Xgm4uCvwYix-wcsQrcbYf5lLOdAOrxUEZc3fQi71ivS2uaimm5DRXwzM3xEQH~-kc4xbQfC-jIcCN4UejBiQBr1ry8bPsPMAToUuRw5mVzOL72RGTAEzBhMCh-TcPR6dGM-2CfXPrlAFI~jLpwvGiojipxoC66sJPiqoUwYTQyptYUQL79WcH5WHd57swZfoUG5PoDKioZa14LJ8kjiLZH2LQ3sNooi0VQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Burning Index measured on 19 August 2020 across contiguous US (a), and plotted against wind speed (b) and relative humidity (c). Response data appear heavy-tailed (d) [Colour figure can be viewed at https://academic.oup.com/]
Similar to Section 7.1.1, we performed a 10-fold validation study with the same set of methods. In each fold, the data were randomly partitioned into a training set and a test set with 1000 and 855 observations respectively. Among the four joint QR models, JSQR-TP had the smallest averaged WAIC score (7957), followed by JSQR-GP2 (7970), JSQR-GP1 (7976) and JQR (8482), lending support to incorporating a heavy-tailed marginal distribution and spatial tail dependence into the analysis. JSQR-TP was chosen as a representative of JSQR models for subsequent analyses.
The regression coefficient estimates provided by JSQR-TP and BSRE were substantially distinct (Figure 8, left panel). The signs of estimated regression coefficients given by the two models generally agreed. Elevation, relative humidity and precipitation had non-positive effects at all quantile levels, whereas temperature and wind speed had non-negative effects. JSQR-TP detected clear heterogeneous predictor effects on TEMP, RH and WIND, estimating much stronger effects on the right tail despite increased uncertainty. These variables are known to be associated with the two major components determining BI, namely the energy release which increases with higher temperature and lower humidity, and, spread which is enhanced by high winds (Cohen & Deeming, 1985). Our QR analysis suggests that these three predictors have a particularly strong impact on the severity of extreme wildfires.
![Left: wildfire risk regression coefficient estimates (and 95% bands) by BSRE and JSQR. Right: average check loss given by QR methods relative to BSRE at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} [Colour figure can be viewed at https://academic.oup.com/]](https://oup.silverchair-cdn.com/oup/backfile/Content_public/Journal/jrsssb/83/4/10.1111_rssb.12467/1/m_rssb_83_4_826_fig-8.jpeg?Expires=1747995918&Signature=hFjezuH4bX79SIoOc7IxknL-k1JCJiP9mN6ZQn86JiAyN6iJ6C8uKbv~Fn-8ldWCy3yE3kHTyUs1MeafHASPatX3WVATrPQhgQBdZcUosZoZ3EoA8lHn64pwNCboy8p-gfZkH4uNdhQtseET3gxBG~n9gIly5LnHJBir6dRu9sDFcxxoZb1QFBQ33lPe4FTYXHLTI~UTTYUIQoY77Mv3D~g8mc3D9fjwWfKjv2msGRalk2obH5ujELTB31PftPlQlwog7obgGAD-4BQVtZiUY-LmxUmRgdul4VQbxh4~8Rt4zJ0zrTpAam3jdTljegB2mdimJlsnNsRsijuq0mxqfQ__&Key-Pair-Id=APKAIE5G5CRDK6RD3PGA)
Left: wildfire risk regression coefficient estimates (and 95% bands) by BSRE and JSQR. Right: average check loss given by QR methods relative to BSRE at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} [Colour figure can be viewed at https://academic.oup.com/]
Figure 8 (right panel) compares the prediction accuracy of JSQR against other competitors (all figures shown relative to BSRE). JSQR-TP provided the smallest averaged loss consistently at all quantile levels. A key distinction from the PM2.5 study is that the QR models provided much smaller losses than BSRE, especially for non-central quantile levels. Particularly, JSQR-TP offered 20 to 40% improvement over BSRE at both lower and upper quantile levels. The Gaussian noise assumption underlying BSRE creates a handicap in modelling skewed, heavy-tailed daily BI measurement. But, even if this shortcoming were rectified with heavy-tailed error distribution model, the heterogeneous predictor effect estimates offered by JSQR could not be accomplished with such extensions.
8 Concluding Remarks
We have introduced here a novel generalization of the joint quantile regression model of Yang and Tokdar (2017) to adjust for noise correlation in point-referenced spatial data. We have provided ample evidence that such adjustment improves parameter estimation, uncertainty quantification and prediction. As an extension of BSRE, the proposed JSQR model offers a comprehensive and practicable solution to the problem of spatial regression beyond the mean. Through two case studies we have shown that JSQR produces interpretable estimates of potentially heterogeneous predictor effects, offers excellent fit to hold-out data and can successfully adapt to heavy-tailed response and tail dependence. An accompanying R package is under development; but a ready-to-deploy version maybe accessed at https://github.com/xuchenstat/JSQR.
Our development crucially depends on accepting the linear QR model assumption (1) simultaneously for all quantile levels τ ∈ (0, 1). This strong assumption, a ‘leap of faith’ according to Koenker (2017), should be given its due diligence before adopting a JSQR, or for that matter, any joint quantile regression analysis of data. It is conceivable that the global linear QR model is adequate only when a sufficiently rich set of nonlinear transformations of the original covariates are included as predictors. In particular, any serious application with JQR or JSQR should include diagnostics of model fit, which could be performed by assessing uniformity of the residual random levels V1, …, Vn introduced in Section 6.2. Additionally, iterative and interactive model improvement could be sought via addition of nonlinear transformations of original covariates as new or alternative predictors, and a formal model selection may be performed by using WAIC. See Cunningham et al. (2020) for recent development along this line.
Joint quantile regression models are a class of conditional density estimation models and are naturally related to the so called ‘density regression’ methods which attempt to directly estimate the conditional response density given predictors (De Iorio et al., 2004; Dunson et al., 2007; Tokdar et al., 2010). Clearly, one could obtain estimates of conditional quantiles under a density regression approach, just as one could easily obtain estimates of conditional densities under a joint QR model. Which approach is selected for a particular data analysis task depends primarily on the goals of the analysis. A major appeal of the joint QR approach is interpretability. The estimates of the slope functions give a clear picture of potentially heterogeneous predictor effects, which may be vital in a careful scientific analysis of predictor-response relationship (Abrevaya & Dahl, 2008; Cade & Noon, 2003; Wasko & Sharma, 2014). Density regression, on the other hand, offers more flexible quantile shape adjustment which could be more useful for applications focused on prediction. However, even for pure prediction purposes, joint QR, via nonlinear transformations of covariates, may be competitive against density regression methods which often struggle to account for sharp changes in the spread and skewness of the conditional densities, a hallmark of extremely heterogeneous predictor effects; see Tokdar (2013). Other than density regression, joint quantile regression is also a member of the distributional regression family, which posits distributional statistics of the response as functions of predictors (Firpo et al., 2009; Foresi & Peracchi, 1995; Klein et al., 2015, to name a few). The choice of statistics substantially determines flexibility and interpretation of the model. As our focus is on quantile regression, direct comparison with other distributional regression models is out of the scope of this work; see Koenker et al. (2013) for an interesting discussion.
Our results here are of empirical nature. A rigorous mathematical treatment of asymptotic properties of JSQR remains the subject of a future paper. The assumption of dependence between observations makes it quite challenging to study posterior consistency properties of the proposed Bayesian method. However, the conditional independence implied by the formulation (13) offers a path to addressing this challenge under an infill asymptotic setting where W is an additional infinite dimensional parameter to be estimated along with and β(·).
Finally, we note that extensions similar to JSQR could be sought to address a variety of other dependency structures within a joint quantile regression model, with applications to time series data, longitudinal data, or data where the response is vector valued. Such developments crucially depend on the choice of the copula families that adequately address relevant aspects of statistical estimation and computing. Details will be presented in a forthcoming paper.
Supporting Information
Additional supporting information may be found online in the Supporting Information section.
Effective samples sizes are boosted by 27% and 42%, respectively for p = 1 and p = 7 in the simulation studies of Section 5. Mean absolute errors and coverage probabilities of regression coefficients are also improved.
All function valued parameters are first updated in their finite representations and then used in likelihood evaluation as described.,
Without special notice, we use JSQR to denote our joint spatial quantile regression model (4) with a standard logistic density f0 and the Gaussian copula process (5).
The dataset is available at https://www.stat.berkeley.edu/users/paciorek/code/ejs/.
Acknowledgement
This research is supported by NSF grants DMS-1613173 and DMS-2014861. XC gratefully acknowledges support from the Katherine Goodman Stern Fellowship. We thank the editor, AE and two referees for their valuable suggestions for improving the manuscript.
References