Abstract

Linear quantile regression is a powerful tool to investigate how predictors may affect a response heterogeneously across different quantile levels. Unfortunately, existing approaches find it extremely difficult to adjust for any dependency between observation units, largely because such methods are not based upon a fully generative model of the data. For analysing spatially indexed data, we address this difficulty by generalizing the joint quantile regression model of Yang and Tokdar (Journal of the American Statistical Association, 2017, 112(519), 1107–1120) and characterizing spatial dependence via a Gaussian or t-copula process on the underlying quantile levels of the observation units. A Bayesian semiparametric approach is introduced to perform inference of model parameters and carry out spatial quantile smoothing. An effective model comparison criteria is provided, particularly for selecting between different model specifications of tail heaviness and tail dependence. Extensive simulation studies and two real applications to particulate matter concentration and wildfire risk are presented to illustrate substantial gains in inference quality, prediction accuracy and uncertainty quantification over existing alternatives.

1 Introduction

Linear quantile regression (Koenker & Bassett, 1978) relates a scalar response Y to a predictor vector XRpvia the “model”:
(1)
where QY(τ|x)=inf{a:P(Ya|X=x)τ}is the τth conditional response quantile given X = x. Given data (Xi,Yi)i=1n, the regression parameters β0(τ), β(τ)=(β1(τ),,βp(τ)), signifying level-specific intercept and slopes, are typically estimated by minimizing the sample averaged check loss n-1i=1nρτ(Yi-b0-Xib)in (b0, b), where ρτ(ε)=ε{τ-1(ε<0)}also depends on the chosen level τ. These estimates are more robust against heteroskedasticity and outliers than least squares regression estimates. More importantly, by varying the quantile level τ, the analyst could examine predictor effects that may be present only in the tails, thus capturing a richer variety of dependence than what is possible under ordinary mean or median regression; see Koenker (2005) for a review.

While quantile regression (QR, hereafter) has gained popularity in multiple areas of scientific applications (Buchinsky, 1994; Cade et al., 1999; Elsner et al., 2008), scant work exists in the domain of spatial data analysis. When observation units are spatially indexed, ignoring noise correlation between nearby records may lead to biased estimates of regression parameters. But the task of modelling, estimating and adjusting for spatial noise correlation runs into serious difficulties under the commonly held belief that the QR formulation (1) must be interpreted only locally, at a single prespecified quantile level (Koenker, 2017).

This belief does not square with common practice. Most scientific applications of QR examine whether and how the regression parameters vary with the quantile level, typically resorting to the questionable practice of patching up estimates and p-values gathered from separate analyses (see Tokdar & Kadane, 2012, for a detailed critique). Such patch-ups are especially questionable in the presence of spatial dependence even if noise correlation was accounted for in each separate analysis. Any assumed noise correlation model induces a joint distribution for the observed response values (Y1, …, Yn). But the induced joint distributions stemming from two different choices of the quantile level are typically distinct, and hence the two sets of estimates are mutually incongruous. Put it differently, the parameter estimates at different quantile levels, though presented together, are actually obtained under very different and potentially conflicting assumptions on the same set of data. We examine this issue more deeply by analysing the estimates obtained from an approach by Lum and Gelfand (2012) who extend the level-specific, asymmetric Laplace error model of Yu and Moyeed (2001) to a spatial process model (Sections 5 and 7).

An alternative and rapidly emerging viewpoint adopts the formulation (1) simultaneously across all quantile levels to construct a single estimating model for the entire data set (He, 1997; Reich et al., 2011; Tokdar & Kadane, 2012; Yang & Tokdar, 2017). The regression parameters are jointly estimated as smooth functions of the quantile level under the natural non-crossing constraint:
(2)
improving estimation, uncertainty quantification and prediction (Yang & Tokdar, 2017, YT17, hereafter). Here XRpis the domain of X, and is assumed to be bounded and convex. Equally important, this comprehensive view offers a novel and congruous formalism of noise in the context of quantile regression, taking it beyond the notion of residual around a single fitted line. The linear QR formulation (1) taken simultaneously for all τ ∈ (0, 1) and restricted to the non-crossing condition (2) is equivalent to the fully generative model for the response data:
(3)
where U is a random quantile level independent of X. This equivalence is a simple consequence of the so called ‘inverse CDF technique’ for random variable generation.

This new formalism enables incorporating noise correlation within QR. The equivalence between(1) and(3) remains valid even if the random quantile levels U1, …, Un associated with the observation units are mutually dependent, as long as each UiUnif(0, 1), i.e. their joint distribution must be a copula. In this paper, we combine Gaussian and Student-t-spatial copula processes with the joint QR estimation framework of Yang and Tokdar (2017) to design a novel estimation framework for spatially dependent QR. While a combination like this is unsurprising, it breaks important new ground by making QR practicable in spatial data analysis. We justify this claim with the following key research contributions.

Through extensive numerical experiments, the combined model is shown to greatly reduce estimation bias and improve uncertainty quantification in regression parameter estimation over existing alternatives which may or may not attempt to adjust for noise correlation (Section 5.1), substantiating the need and utility of adjusting for noise correlation before QR methods could be adopted to spatial data analysis. Moreover, the new model is shown to embed computationally and statistically efficient spatial smoothing which delivers model-based regression quantile kriging equipped with meaningful uncertainty quantification (Sections 4.1, 5.1 and 7) which could be a powerful tool in the analyst's toolbox to produce reliable prediction of extreme responses at new locations.

A great advantage of a model based QR analysis is that it could be customized to accommodate various scientific concerns; see Tokdar and Kadane (2012) for an application to hurricane intensity trends. A similar advantage for the proposed model is demonstrated through an important extension that targets heavy-tailed response distributions and associated tail dependence across observation units, an important and challenging problem in spatial analysis. It is shown that competing flavours of the model could be formally compared to one another by using an information criterion whose results are congruous with a substantially more expensive cross-validated assessment of hold-out quantile prediction accuracy (Sections 6 and 7).

In two real world applications involving air quality and wildfire risk, the new model is shown to provide excellent fit to hold-out data, detect important tail effects of key predictors, and, detect the presence of heavy tails and tail dependence in noise correlation (Section 7). These case studies underscore the tremendous potential of QR in delivering a detailed and insightful analysis of potentially heavy-tailed spatial data with heterogeneous predictor effects, over and beyond the capability of existing spatial regression models. Furthermore, our estimation method is shown to scale well to large data sets by incorporating commonly used approximation techniques from Gaussian spatial process literature, thus enabling actual deployment of the new model to serious scientific investigations (Section 3.3).

We end this introduction with a note that the problem of quantile regression under spatial noise correlation is quite different from the one of spatially varying quantile regression as explored by Reich et al. (2011) and Yang and He (2015). Statistical analysis in the latter problem relies on having essentially independent replications of the response measurement at identical or nearby locations and existing methods typically employ spatial smoothing of locally estimated quantile regression parameters. More fundamentally, the underlying theory posits that the structural equations representing the response-predictor relation vary across space. In contrast, our approach theorizes that a single quantile regression formulation as in (1) holds globally at all spatial locations, and, the learning of these global model parameters, while adjusting for noise correlation, is the primary goal of the analysis. However, our model does allow spatial quantile smoothing necessary for infill prediction at new locations where only the covariates are recorded. Although this type of smoothing has less shape flexibility than what is offered by fully spatially varying approaches, it can still offer a superior performance under moderate spatial variation of predictor effects (Section 4.2).

2 Joint Spatial Quantile Regression

2.1 Modeling spatial noise correlation

Our focus is on analysing spatial point-referenced data where paired predictor-response observations (Xi,Yi), i = 1, …, n, are collected at known locations siSRr. The joint QR model postulates Yi=β0(Ui)+Xiβ(Ui), where the vector (U1, …, Un) follows a copula distribution. In the spirit of the Kolmogorov extension theorem, it is natural and useful to view the random quantile levels as Ui = U(si), i = 1, …, n, where (U(s):sS)is a stochastic process on Ssuch that U(s) ∼ Unif(0, 1) at every sS. Such an embedding within a spatial copula process is essential to formalize prediction at new spatial locations.

Towards an interpretable and practicable statistical analysis of dependence, it is pragmatic to consider a parametric family of spatial copula processes indexed by a low-dimensional parameter vector that describes both the nature and the strength of spatial correlation. For simplicity we let the correlation between any U(s) and U(s′) depend only on the spatial distance ‖ss′‖. As this distance increases, the correlation must diminish to zero and near-independence must be realized within the range of the observed spatial domain. Any hope of reliable inference of the regression parameters rests on this assumption that at least at great distances within the observed spatial region, data could be taken as nearly independent realizations from the model.

We meet these modelling needs by taking (U(s):sS)to follow a Gaussian copula, induced by a stationary Gaussian spatial process. Let Φ(·) denote the cumulative distribution function (CDF) of N(0, 1). We define
(4)
 
(5)
where
(6)
is the Matérn correlation function with smoothness parameter ν and decay parameter ϕ, a widely popular choice in spatial statistics (Banerjee et al., 2014; Cressie, 1993; Stein, 2012). Here, Γ(·) is the gamma function and Kν(·) is the modified Bessel function of the second kind of order ν. Our approach is trivially generalized to non-Matérn correlation functions. We take ν to be of a fixed, user specified value because it is typically difficult to estimate ν from data and often a value of ν < 3 is deemed sufficient for real data analysis (Banerjee et al., 2014). Thus, the spatial copula formulation is indexed by a two-dimensional parameter θ = (α, ϕ).

In Equation (5) the variation in the underlying quantile levels is decomposed into two parts in a similar spirit as the basic spatial random effects model (BSRE; Banerjee et al., 2008; Cressie, 1993, see Equation (16). The process W(s) captures structural spatial association while ɛ(s) is uncorrelated pure error. The parameter α ∈ [0, 1] determines the proportion of variation that is spatially structured. When α = 1, very little independent information is expected from nearby observations. When α = 0, the model reduces to the independent noise model. Additionally, when β0(u)=σΦ-1(u)+β~0and β(u)=β~, the model becomes Yi=β~0+Xiβ~+W~(si)+ε~(si)with W~(s)GP(0,ασ2ρ(s,s′;(ν, ϕ))) and ε~(si)iidN(0,(1−α)σ2). This special formulation coincides with BSRE.

The use of Gaussian spatial copulae leads to a fair number of computational advantages, for example, ease of parameter estimation, analytical quantile kriging and scalability with sample size. We shall elaborate on these presently. But Gaussian copula are limited in their ability to model extreme tail dependence, which may be present in heavy-tailed response distributions. We return to this in Section 6 with a generalization to t-copula for an in-depth treatment of modeling possibly heavy-tailed data.

2.2 Prior specification for Bayesian estimation

We adopt a semiparametric Bayesian approach for making inference on model parameters (β0,β,θ). Following YT17, the marginal linear QR model (3) is reparametrized to circumvent the difficult non-crossing condition (2). Fix a probability density function (pdf) f0 on R and let q0 be the corresponding quantile density, that is, the derivative of the quantile function. Let τ0=-0f0(z)dz. One may view f0 as a prior guess for the density, up to a scale parameter, of the level-
residual: Y-β0(τ0)-Xβ(τ0). When the support of Y is (−∞, ∞), one may take f0 to be the standard logistic or a standard Student-t-distribution (and τ0=0.5); the latter being attractive for modelling heavy-tailed data. These are the choices adopted in the rest of the paper. For positive valued response, one could take f0 to be a Gamma distribution or a generalized Pareto distribution (with τ0=0); again the latter being appropriate for heavy-tailed data.
The bounded, convex predictor domain Xis assumed to contain zero as an interior point. For any non-zero vector bRp, define the “projection radius opposite b” as a(b,X)=supxX{-xb/b}and a(0,X)=1, where ‖·‖ is the Euclidean norm. Reformulate the intercept and slope functions of (3) as:
(7)
 
(8)
 
(9)
with model parameters γ0R, γRp, σ > 0, η:(0,1)Rp; and ζ: [0, 1] → [0, 1] which is restricted to be a differentiable, monotonically increasing bijection, that is, a diffeomorphic deformation of the unit interval. This formulation gives an exhaustive representation of all linear QR models (3) subject to the non-crossing constraint (2) and a support match between Y and f0 as described above (Yang & Tokdar, 2017, Theorem 2).
The above reformulation embeds the non-crossing constraint mostly through the transformation (9). The function valued parameters η = (η1, …, ηp) underlying the slopes are completely unconstrained. The shape restrictions on the diffeomorphism ζ(·) are relieved via an additional transformation:
where the function η0 is also completely unconstrained, except for the continuity properties needed to make the above definition valid. Following YT17 we adopt the priors below on the new model parameters:
where ρSE(s, s′; λ) = exp (−λ2ss′‖) is the square exponential correlation function with rescaling parameter λ. The prior specification on the function valued parameters η0, …, ηp is motivated by a rich literature on Bayesian nonparametric smoothing that shows that square-exponential Gaussian process priors, equipped with a hyper-prior on the rescaling parameter, offers optimal regularization and smoothness adaptation in a variety of function estimation problems (Tokdar & Ghosh, 2007; van der Vaart & Van Zanten, 2009; Yang & Tokdar, 2015). A hyper-prior on λ is implicitly specified by choosing rj=Cor(ηj(τ),ηj(τ+0.1)|λj)=exp(0.12λj2) to be Be(6, 4) distributed. The priors on (γ0, γ, σ) are purposely left diffuse. These parameters, corresponding to a central quantile plane (if τ00.5) and the overall scale of the response, are well informed by all observations in the data and require little prior regularization. In the case of dealing with positive-valued response, one may choose to fix γ0, γ at zero. See Yang and Tokdar (2017) for more details.

To address the additional copula piece of our model, we propose the following prior specification on θ = (α, ϕ). We choose the Unif(0, 1) distribution as a generic prior for α. This is not an entirely automatic ‘low-informative’ choice. Indeed, our experimentation shows the uniform prior leads to adaptive estimation of the strength of spatial dependence (Section 5.2). However, if prior knowledge is available on the value or the range of the proportion of spatial correlation, a Beta or a truncated prior may be adopted.

More care is needed in choosing a prior for the correlation range parameter ϕ. It is useful to specify bounds for ϕ from the perspective of effective range of spatial correlation, that is, the shortest distance at which the spatial correlation falls below a small threshold (typically 0.05). As discussed earlier, any hope of reliable parameter estimation from spatially dependent data relies on the assumption of near independence between far away observation units. In any particular application, the bounds for ϕ can be determined according to one's belief of lower and upper limits of the effective range. Additionally, from a methodological perspective, the decay parameter is only weakly identified and an informative prior is needed for satisfactory and reproducible posterior inference and computation (Banerjee et al., 2008, 2014).

For a default choice, we follow the convention in the spatial modelling literature and specify the range of ϕ so that the effective range lies between one-fourth and three-fourths of the maximal pairwise distance among all locations (Banerjee et al., 2014). The prior for ϕ is taken to be a discrete uniform distribution over a dense grid of values within the specified range. This choice of range keeps ϕ away from 0 and eliminates the identifiability issue that manifests when α approaches 0. From a computational perspective, copula density evaluation (Section 3.1) is expensive as it requires O(n3)flops in each MCMC iteration. By using a discrete uniform prior, only finitely many correlation matrices are required to be computed and stored before initiating MCMC computation. Consequently, the computational overhead can be reduced to O(n2). See Section 3.3 for additional reduction in computational complexity for large data.

3 Posterior Computation

3.1 Likelihood evaluation

By Sklar's theorem, the joint conditional density of the response data given predictors can be partitioned into a marginal part and a copula part
where FY is the CDF corresponding to pdf fY and cs1:ndenotes the joint density of (U(s1), …, U(sn)) under the spatial copula formulation on (U(s):sS). One may evaluate fY and FY via the identities
(10)
where τx(y) solves y=QY(τ|x)=β0(τ)+xβ(τ)in τ. Therefore, the log-likelihood score of model parameters can be expressed as
(11)
where Ui=τXi(Yi)may be numerically approximated to arbitrary precision as follows. Fix a dense grid of points T = {0 = t1 < ⋯ < tL = 1} and use it to approximate Qi(τ):=β0(τ)+Xiβ(τ)=γ0+Xiγ+τ0τ{β˙0(t)+Xiβ˙(t)}dtat any τ on the grid by the trapezoidal rule of integration, and, by linear interpolation for any τ in between successive grid points. Identify the index l = li such that Qi(tl)Yi<Qi(tl+1)and calculate Ui by analytically inverting the linear interpolation in between these successive grid points. Note that the derivatives β˙0(τ)=σq0(ζ(τ))ζ˙(τ)and β˙(τ)=β˙0(τ)h(η(ζ(τ))), where h(b)=b/{a(b,X)·1+b2}, are required to be evaluated only for τT. See Yang and Tokdar (2017) for more details on the choice of the grid T and calculation of Ui's. The associated O(n)calculations are embarrassingly parallel across observation units.
Let Z = (Φ−1(U1), …, Φ−1(Un)) and K denote the correlation matrix with Kij=ρM(si,sj|(ν,ϕ)). Using spectral decomposition K = ΓΛΓ, the logarithm of the Gaussian copula density is
Note that αΛ+(1−α)In is a diagonal matrix and hence its determinant and inverse are easy to compute. Because the prior on ϕ is finitely supported, only finitely many Λ and Γ are needed to be computed before running MCMC.

3.2 MCMC approximation

An efficient log-likelihood evaluation makes our model amenable to Metropolis type Markov chain sampling from the posterior distribution. We deal with the issue of finite representation of the function valued parameters η0, …, ηp, exactly as in Yang and Tokdar (2017). Briefly, a set of knots T*={τ1*,,τm*}[0,1] are chosen, distinct from the likelihood grid T, and with mL. Each function ηj is represented by its T* based evaluations: ηj*=(ηj(τ1*),,ηj(τm*))Rm. Next, any function evaluation ηj(τ), needed for the likelihood evaluation, is approximated by the associated predictive process interpolation η˜j(τ)=E[ηj(τ)|nj*] (Banerjee et al., 2008; Tokdar, 2007). Due to the Gaussianity of ηj given (κj, λj) and the inverse-gamma prior on κj2, the vector ηj* is conditionally distributed given λj according to a multivariate-t-distribution and the conditional mean E[ηj(τ)|nj*,λj] is available in closed form. The prior on λj is discretized over a dense finite support to write the unconditional prior on ηj*as a finite mixture of t-distributions, and to evaluate η~j(τ)as a finite, weighted sum of the conditional means.

It may be tempting to employ a Gibbs sampler alternating between updating parameters (γ0, γ, σ, η, ζ) of the marginal part and the copula parameters θ = (α, ϕ). However, based on our experience, the scale parameter σ2 and α can be highly correlated because both parameters affect the tightness of Ui's. Instead, we find the parametrization (σs2,σe2)=(ασ2,(1-α)σ2)helps reduce autocorrelation between posterior samples and improves inference efficiency1. Notice that σs2and σe2are spatial variance and pure error variance respectively when our model degenerates to BSRE (16). With this reparametrization, one complete cycle of our Markov chain sampler could be schematically represented as follows2:

  1. Given σs and ϕ, make a Gibbs update of the parameter (γ0, γ, σe, η, ζ) by using the (adaptive) block Metropolis steps of Yang and Tokdar (2017).

  2. Given (γ0, γ, σe, η, ζ) and ϕ, update σs.

  3. Given the current Ui's extracted from step 2 and σs, sample ϕ from its full conditional distribution.

  4. Given other model parameters, jointly update σs and σe.

It is possible to marginalize out ϕ and run the sampler only on other model parameters. But in our experience, such marginalization does not seem to help with mixing and instead adds to computation complexity. Also, even though σe and σs get updated in steps 1 and 3, the additional joint update in step 4 greatly improves mixing. Lastly, the latent process realization W(s) is marginalized out during MCMC runs but can be recovered in post-processing for inference.

3.3 Reduced rank approximation for large n

The O(n3)computational complexity of factorizing a n×n matrix, and the associated O(n2)cost of storage can be prohibitive for implementing the above posterior computation scheme for data with a large sample size n. Various reduced rank approximations to Gaussian processes may be used to alleviate these computation bottlenecks; prominent examples include nearest-neighbour (NN) GP (Datta et al., 2016) and predictive process (Banerjee et al., 2008) with pivoting (Foster et al., 2009). Both models only require O(n)computation and memory for model fitting and prediction (Section 4.1). We implement NNGP using the algorithm described in Finley et al. (2019). On a dataset with 16,000 samples and 7 predictors, our algorithm under 5-NNGP with 10,000 MCMC iterations took about 55 min and the predicted conditional quantiles were statistically accurate. See Supplemental material B for a detailed comparison of computational speeds of our model with full GP and NNGP, and, an analysis of prediction accuracy of the model with NNGP.

4 Spatial Smoothing

4.1 Prediction of conditional quantile

Infill prediction is one of the major objectives in spatial data analysis. Although the model specification (3)(5) appears to target estimating the global quantile predictor effects and spatial dependence separately, our model is able to make infill prediction that adequately accounts for spatial information, thus effectively performing spatial quantile smoothing to new locations where only the covariates have been recorded. Specifically, our model allows for a local adjustment of the quantiles of response Y*at s* via the conditional copula Cs*|s1:n(U*|U1,,Un,θ), where U*=U(s*). Denoting its quantile function as QU*(·|U1,,Un,θ)and combining the marginal model, the conditional τ*th quantile of Y*given X* is
(12)
When using a Gaussian copula process, the quantile function of the conditional copula can be conveniently calculated as follows. Let K* be a n-dimensional vector with Ki*=ρM(s*,si;(ν,ϕ)). Based on the model specification (5), we have Z(s*)|(Z,θ)N(μ(s*),σ2(s*)) where μ(s*) = αK*⊤(αK+(1−α)In)−1Z and σ2(s*)=1−α2K*⊤(αK+(1−α)In)−1K*. Therefore, we obtain QU*(τ*|U1,,Un,θ)=Φ(μ(s*)+σ(s*)Φ-1(τ*)).

Remark 1

The quantile smoothing detailed above assumes Y*is generated from the model (3) with the underlying random level given by U*=U(s*). In other words, Y*is taken to be a yet unobserved unit from the same batch of response realizations as Y1, …, Yn. If instead, Y*was thought to arise from a completely fresh draw, then U*should be treated as independent of the entire process U(s), resulting in τ = τ* in Equation (12).

4.2 Spatial dependence and spatial variation

The conditional quantile function computed above varies smoothly in s*, resembling a spatially varying QR model. It is worthwhile to examine whether our model could be reinterpreted as QY(τ|X,s)=β0(s,τ)+Xβ(s,τ), with conditionally independent observations Y1, …, Yn, as done in Reich et al. (2011). The answer is yes, but in a very limited sense. The limitation is useful both theoretically and practically.

Let Vi=Φ(ε(si)/1-α). Given the spatial process realization W(s), the data generating mechanism (3)(5) can be equivalently expressed as
(13)
where hw,a(s,t)=Φ(w(s)+1-aΦ-1(t))with w:SR, sSand (a,t)(0,1)2. Clearly under model (13), responses Y1, …, Yn are conditionally independent given model parameters, (W(s):sS), and X1, …, Xn. More importantly, this model admits a quantile function in a spatially varying fashion: Q~Y(τ|X,s)=β0(hW,α(s,τ))+Xβ(hW,α(s,τ)), because, the map thw,a(s,t) at each sSis a monotonically increasing diffeomorphism of (0, 1) onto itself. Notice that this formulation offers only a limited flexibility in capturing spatial variability since the location s and the quantile level τ effect the intercept and slope functions through a single, combined input value hw,a(s, τ).

In contrast, the fully spatially varying QR model of Reich et al. (2011) offers much greater shape flexibility in spatial quantile smoothing. But it is of little use when the primary aim is to learn a global relationship between the response and the predictors, unrelated to spatial location. Additionally, a fully spatially varying QR model could be extremely difficult to estimate. Indeed, the estimation method adopted in Reich et al. (2011) is akin to carrying out a post hoc Bayesian smoothing of intercept and slope functions estimated locally via the Koenker-Bassett method. Such an approach could be quite useful for analysing datasets with many repeated observations at each location, even though the two-stage estimation method is likely to offer uncertainty quantifications that are very difficult to interpret. Furthermore, in a simulation study detailed in Supplemental material F, we observed that our model offered excellent spatial quantile smoothing, outperforming Reich et al. (2011) when the spatial variation in regression coefficients was moderate.

5 Numerical Experiments

We carried out several simulation studies to compare the proposed model against existing methods on inference efficiency and prediction accuracy, and to examine the adaptability of our model to different correlation strengths. In Section 5.1, we demonstrate that in quantile modelling for spatial data, neglecting or inappropriately incorporating spatial dependence results in suboptimal statistical performance by including competing methods that are designed for independent data or offer limited adjustment for spatial dependence. To be comprehensive on simulation design, two examples are included: one with a univariate predictor (p = 1) and one with multivariate predictors (p = 7). To investigate robustness of our model against model misspecification, two dependency patterns, one of which deviates from the Gaussian copula process assumption, are considered for each example. In Section 5.2, we illustrate that our model is capable of adapting to different levels of dependence and recovering the true underlying correlation structure by assessing estimation accuracy of copula parameters and induced correlations.

5.1 Statistical performance assessment

5.1.1 Simulation setup

Spatially dependent data sets were generated on S=[0,1]2by extending the simulation setting of Yang and Tokdar (2017). Spatial locations s1, …, sn were sampled uniformly from S. A 2 × 2 design with 100 replications each was used with two choices of the marginal QR model and 2 choices of the spatial copula process:

  • M1. Simple regression with p = 1:
  • M2. Multiple regression with p = 7:
    with
    where ϕ(·) denotes the pdf of N(0, 1).
  • C1. Asymmetric Laplace copula process (Lum & Gelfand, 2012): Ui=FAL(W(si);τ),W(si)=2ξiτ(1-τ)Z(si)+1-2ττ(1-τ)ξi,ξiiidExp(1)  

    where FAL(·;τ) is the CDF of asymmetric Laplace distribution AL(τ) with pdf fAL(x;τ)=τ(1 − τ) exp {−ρτ(x)}.

  • C2. Gaussian copula process:

The two marginal QR models are the same as in Yang and Tokdar (2017). For either copula process, we set (α, ν, ϕ) = (0.7, 2, 0.3). Additionally, for C1, we fixed τ = 0.4. The resulting asymmetry in the true copula process was used as an opportunity to assess the robustness of symmetric Gaussian copula based JSQR under model misspecification. For M1, each synthetic data set consisted of a training set with n = 200 observations, and for M2, each set consisted of n = 500 observations. For each set, a test set containing 50 observations was used for hold-out validations.

The proposed joint spatial quantile regression model (JSQR3 ) was compared against four alternatives; KB: the classical optimization based method by Koenker and Bassett (1978), JQR: the joint quantile regression model by Yang and Tokdar (2017), ALD: Bayesian quantile regression model using asymmetric Laplace error distribution by Yu and Moyeed (2001), and, ALP: spatial quantile regression model using asymmetric Laplace process by Lum and Gelfand (2012). Among these methods, KB, JQR and ALD do not incorporate spatial dependence.

These competing methods were compared based on mean absolute errors (MAE) of point estimates of regression coefficients and corresponding coverage probabilities of 95% confidence (or credible) intervals. We also compared the MAE of predicted conditional quantiles |QY(τ|X,s,U1,,Un)-Q̂Y(τ|X,s,U1,,Un)|averaged over all observations in the test set. Here Q̂Y(τ|X,s,U1,,Un)is the predicted conditional quantile function given by different methods. For KB, JQR and ALD, this function is precisely Q̂Y(τ|X). For ALP, we adopted the spatially adjusted quantile function by conditioning on the latent process. All these comparisons were performed at quantile levels {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} and averaged over all 100 data sets under each scenario. For Bayesian methods, posterior means were used as point estimates. The confidence intervals for KB were constructed by inverting a rank test proposed by Koenker (1994). Algorithm configurations are detailed in Supplemental material A.1. Results from M1 are presented in Figure 1 and those from M2×C1 are shown in Figure 2. Results from M2×C2, which are very similar to M2×C1 results, are included in Supplemental material A.2, Figure 1.

Inference efficiency of different methods for M1×C1 (top) and M1×C2 (bottom). In each row, the left two panels present the mean absolute errors of regression coefficients and the right two panels show the coverage probabilities of 95% confidence (or credible) intervals of regression coefficients at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}
Figure 1

Inference efficiency of different methods for M1×C1 (top) and M1×C2 (bottom). In each row, the left two panels present the mean absolute errors of regression coefficients and the right two panels show the coverage probabilities of 95% confidence (or credible) intervals of regression coefficients at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}

Inference efficiency of different methods for Example 2 with true generating process being asymmetric Laplace process. The top two rows present the mean absolute errors of regression coefficients while the bottom two rows show the coverage probabilities of 95% confidence (or credible) intervals of regression coefficients at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}
Figure 2

Inference efficiency of different methods for Example 2 with true generating process being asymmetric Laplace process. The top two rows present the mean absolute errors of regression coefficients while the bottom two rows show the coverage probabilities of 95% confidence (or credible) intervals of regression coefficients at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}

5.1.2 Key findings

In every design, JSQR provided smallest averaged errors and highest (and approximately nominal) coverage probabilities for estimating the intercept and slope functions across all quantile levels. The three non-spatial methods, KB, JQR and ALD gave distinctly worse performance, particularly on coverage probability. KB and JQR were generally similar to one another, with the latter offering better performances for certain covariates. By design, KB and ALD gave same parameter estimates, but the latter clearly had lower coverage probabilities; this issue has been raised before in Tokdar and Kadane (2012).

The estimates by ALP were significantly worse than the rest of the group for quantile levels away from 0.5, both in terms of accuracy and coverage. ALP severely distorted the estimates of the intercept function and consequently offered poor estimates of slope functions at non-central quantile levels. Since ALP assumes an asymmetric Laplace process on errors instead of on quantile levels, the model is misspecified at any given quantile level, even though data in C1 were simulated under an asymmetric Laplace copula process.

A similar picture emerged on quantile kriging accuracy at hold-out data (Figure 3). JSQR offered substantial improvements across all quantile levels compared to the methods that did not adjust for spatial dependence. ALP came a distant second for central quantile levels, but actually performed worse at the extremities.

Mean absolute errors of conditional quantile function at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}
Figure 3

Mean absolute errors of conditional quantile function at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}

Together, this empirical evidence underlines the usefulness of a proper model based spatial QR estimation for improved parameter estimation and quantile kriging. It is noteworthy that ALP, despite its limitations, improves estimation and kriging accuracy, thanks to spatial dependence adjustment, at quantile levels for which the assumed residual process is not too different from the true data generating distribution. But its limitations show up dramatically for non-central, and particularly, extreme quantile levels. The power and flexibility of the joint QR model is evident in JSQR's superior estimation quality across all quantile levels. Moreover, JSQR appears reasonably robust against misspecification of the copula process.

5.2 Adaptation to dependence strength

In a second set of experiments, we examined the adaptability of JSQR to different strengths of spatial dependence. Specifically, 100 data sets, each with n = 500, were generated as in M1×C2, but for each set a distinct and random draw was made for the parameter values of (α, ϕ). The proportion parameter α was drawn from Unif(0, 1) and the decay parameter ϕ was randomly drawn from its prior range. The JSQR estimation method was employed exactly as in Section 5.1. For each dataset, we estimated α, ϕ and 10 pairwise correlations rij = αρ(si, sj ;(ν, ϕ)) between 5 randomly selected observed locations, by their respective posterior means. Mean absolute errors of estimates over 100 simulated data sets are reported in Table 1, which also includes the lengths and the coverage probabilities of the respective 95% credible intervals.

Table 1

Left column: mean absolute errors of posterior means of α and ϕ and induced pairwise correlations rij = αρ(si, sj; (ν, ϕ)). Middle and right columns: coverage probabilities and average lengths of 95% credible intervals of α, ϕ and rij. Standard deviations are shown as subscripts

Absolute errorCP of 95% CILength of 95% CI
α0.050.040.960.250.11
ϕ0.040.030.950.170.06
rij0.030.040.9540.140.10
Absolute errorCP of 95% CILength of 95% CI
α0.050.040.960.250.11
ϕ0.040.030.950.170.06
rij0.030.040.9540.140.10
Table 1

Left column: mean absolute errors of posterior means of α and ϕ and induced pairwise correlations rij = αρ(si, sj; (ν, ϕ)). Middle and right columns: coverage probabilities and average lengths of 95% credible intervals of α, ϕ and rij. Standard deviations are shown as subscripts

Absolute errorCP of 95% CILength of 95% CI
α0.050.040.960.250.11
ϕ0.040.030.950.170.06
rij0.030.040.9540.140.10
Absolute errorCP of 95% CILength of 95% CI
α0.050.040.960.250.11
ϕ0.040.030.950.170.06
rij0.030.040.9540.140.10

We note that accurately estimating induced pairwise correlations rij is potentially more important than accurately estimating the copula parameters α and ϕ. This is because, the induced correlations directly determine the dependence between observations and hence are more critical in adjusting for the spatial dependence toward a more accurate estimation of the marginal QR intercept and slope parameters. Induced correlations are also more identifiable than copula parameters because different combinations of α and ϕ may yield similar values of rij. This argument is verified by the results presented in Table 1 which show that our model accurately estimated rij. The 95% credible interval was narrow and its coverage was close to nominal level. Both absolute errors and length of 95% credible intervals were smaller than those of α and ϕ. As the prior range for ϕ was around (0.1, 0.4), both error of point estimate and uncertainty of ϕ were relatively greater than those of α and rij. This is expected as decay parameter is weakly identified. The results suggest that our model is capable of adapting to different levels of dependence and recovering the underlying correlation structure.

6 Modeling Heavy-Tailed Response Data

6.1 Tail dependence and t-copula

One of the biggest appeals of QR is that it enables analysing predictor effects at extreme quantile levels, which could be particularly relevant for analysing heavy-tailed response data. In order to model independent data with heavy-tailed response, it is straightforward to adapt JQR by adopting a heavy-tailed base distribution f0. However, the same adaptation alone for JSQR is inadequate for modelling heavy-tailed, spatially dependent response because Gaussian copula is tail independent (Sibuya, 1960) and hence would fail to account for possible dependence between extreme outcomes at nearby locations. This deficiency, which could lead to considerably biased prediction of extreme quantiles, may be rectified by employing a heavy-tailed base distribution in conjunction with a spatial copula process that admits tail dependence. We achieve this by using the so called t-copula process (Embrechts et al., 2001), a tail dependence encoding generalization of the Gaussian spatial copula.

Following Rasmussen and Williams (2006), we say f is a t process on Xwith parameter ψ > 0, mean function m(·):XR, covariance function k(·,·):X×XRif any finite collection of function values (f(x1),…,f(xn))tn(ψ, m, K) where Kij = k(xi, xj) and mi = m(xi). We denote fTP(ψ, m, k). Here td(ψ, μ, Σ) denote a multivariate t-distribution on Rdwith location μ, scale matrix Σ and degree of freedom ψ with pdf
Let Tψ denote the CDF of univariate, standard t-distribution with degrees of freedom ψ. We take the copula process to be U(s) = Tψ(Z(s)), sS, where
(14)
An additive decomposition similar to Equation Equation (5) can be retrieved via an equivalent Gaussian process mixture representation
The amount of tail dependence is controlled by ψ with smaller values inducing greater dependence. The t-spatial process approaches a Gaussian spatial process in the limit as ψ → ∞, with the asymptotic equivalence kicking in for moderately large values of ψ. We restrict ψ to be larger than 0.5 to avoid numerical instability and adopt Exp(0.1) as prior for (ψ−0.5). Other priors (e.g. uniform) are also examined and the results show little sensitivity to the prior choice. Prediction of conditional quantile can be performed similarly to that of the Gaussian copula process and is detailed in the Supplemental material C.

Although the t-copula admits tail dependence, neither it nor the Gaussian copula can accommodate different lower and upper tail behaviours. To this end, skew-elliptical copulae, based on skew normal distributions (Azzalini & Valle, 1996) and skew t-distributions (Azzalini & Capitanio, 2003) may be considered, especially for applications where tail asymmetry is a primary concern, for example, in finance applications (McNeil et al., 2005). However, our results in Section 5.1 indicate that our model with the Gaussian copula is robust against mild asymmetries in true copula process. Interested readers are referred to Demarta and McNeil (2005) and Smith et al. (2012) on the construction of skew-copulae.

6.2 Model assessment and comparison

Because a JSQR analysis could be carried out with various combinations of the marginal and dependence models, a natural question arises as to how to compare such competing models and select one with better predictive quality. A desirable model comparison tool here should be capable of comparing models, with different sets of predictors, base distributions and copula processes, using a single, unified criteria. Apropos of this consideration, we advocate using the Watanabe-Akaike information criteria (Watanabe, 2010, WAIC) for model comparison and selection. WAIC is particularly attractive in complex Bayesian modelling like ours for at least two reasons. From a theoretical perspective, WAIC has the property of averaging over posterior distributions and is asymptotically equivalent to a Bayesian leave-one-out cross validation. From a practical view point, WAIC is computationally efficient as it can be simply calculated by reusing posterior samples (Gelman et al., 2014).

There is an important technical hurdle to applying WAIC directly in our context. The construction of WAIC relies on independence among observations so that the log-likelihood score can be decomposed as the sum of n individual contributions. This is not the case for JSQR because of the copula piece which captures dependence between observation. We circumvent this problem by invoking conditional independence between observation units given the smooth spatial process W, which may be treated as an additional model parameter for WAIC calculation.

According to Equation (10), the log-likelihood for ith observation with W(s) as an additional parameter is
(15)
where h˙W,α(si,Vi)=1-αϕ(Z(si))ϕ(Φ-1(Vi)).The process realization can be recovered according to its posterior
and then Vi=Φ((Z(si)-W(si))/1-α). The derivation for t-copula process is similar and is detailed in Supplemental material D.

Following Watanabe (2010), we establish an approximate mathematical equivalence between the conditional WAIC score proposed here and the conditional Bayesian leave-one-out cross validated log probability density score; details are provided in Supplemental material E. Using the proposed conditional WAIC, the results of model comparison for both simulated (Section 6.3) and real data (Section 7) demonstrate that the criteria is effective in picking up the true model and generally agrees with the actual prediction performance. Since our work focuses on modelling spatial extremes and spatial dependence, we highlight model comparisons between joint quantile regression models with different base distributions (Gaussian or t) and dependency structures (independence, Gaussian or t-copula processes). By contrasting these three dependency structures, we are able to determine whether incorporating spatial dependence and whether using the t-copula process result in model improvement. As selecting predictors is not our focus here, we refer interested readers to Cunningham et al. (2020) for the use of WAIC in that regard.

Remark 2

The idea of conditional WAIC has been investigated in Li et al. (2016) and Millar (2018) in some widely used Bayesian hierarchical models (e.g. finite mixture model, random effects model) where they argue that the conditional WAIC could be unstable and an alternative ‘integrated WAIC’ (by further marginalizing out the latent variables associated with hold-out observations) is suggested. However, we do not observe any instability of the conditional WAIC under the proposed model and there is little difference between values of the conditional WAIC and ‘integrated WAIC’. Therefore, we suggest using the conditional WAIC in comparing JSQR models due to its simplicity.

Remark 3

Several methods exist for selecting a copula model (Chen & Fan, 2006; Dissmann et al., 2013; Genest et al., 2006; Huard et al., 2006; Smith et al., 2010). However these techniques are unsuitable for our model with multivariate elliptical copula and latent quantile levels. Additionally, a unified criteria is required for determining various components of the JSQR model, for which a single numerical model scoring method, as given by the WAIC, is practically appealing.

6.3 Illustration: model comparison using WAIC

Does the use of a t-copula process improve predictive accuracy in analysing heavy-tailed response data? Among various flavours of the JSQR model, does the one with best prediction accuracy get picked when the model with the lowest WAIC score is selected? We compared JQR, JSQR with the Gaussian copula process (JSQR-GP) and JSQR with the t-copula process (JSQR-TP). All three models adopted a t base distribution with an unknown degree of freedom to be learned from data. We simulated synthetic data from the linear QR formulation with an univariate predictor XUnif(−1, 1), where
Three dependency structures were considered: (i) independence, (ii) the Gaussian copula process with (α, ν, ϕ) = (0.7, 2, 0.3), and, (iii) the t-copula process with (α, ν, ϕ, ψ) = (0.7, 2, 0.3, 3). Notice that each generating model matched exactly one estimation model under comparison; the rest were either inadequate or redundant. For each scenario, we simulated 100 training sets of sample size 500 and 100 test sets of sample size 50.

In each scenario, we calculated the differences in WAIC of other two models and that of the matching model. According to Table 2, JSQR-GP and JSQR-TP generally had comparable WAICs because they provided similar model fits if the data only contains a few extreme observations. Due to the penalization of more effective model parameters, the WAICs provided by two JSQR models were just slightly larger than JQR when observations were independent. The mean absolute errors of predicted conditional quantiles were computed over observations in the test set and presented in Figure 4, the last panel of which highlighted the differences between JSQR-GP and JSQR-TP in scenario (iii). It is clear that t-copula process improves over Gaussian copula process at extreme quantile levels by 10% to 20% when data are generated from a heavy-tailed marginal distribution with a tail-dependent copula.

First three panels: mean absolute errors of predicted conditional quantiles at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}. The fourth panel: A zoomed version of the third panel highlighting the comparison between JSQR-GP and JSQR-TP for scenario (iii)
Figure 4

First three panels: mean absolute errors of predicted conditional quantiles at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99}. The fourth panel: A zoomed version of the third panel highlighting the comparison between JSQR-GP and JSQR-TP for scenario (iii)

Table 2

Average Watanabe-Akaike information criteria (WAIC) scores (relative to the matching model) and model selection rates

TruthJQRJSQR-GPJSQR-TP
ΔWAICSelectionΔWAICSelectionΔWAICSelection
Independent noise78%+1.88%+2.114%
Gaussian copula+442.9056+0.544
t-copula+402.70+1.82080
TruthJQRJSQR-GPJSQR-TP
ΔWAICSelectionΔWAICSelectionΔWAICSelection
Independent noise78%+1.88%+2.114%
Gaussian copula+442.9056+0.544
t-copula+402.70+1.82080
Table 2

Average Watanabe-Akaike information criteria (WAIC) scores (relative to the matching model) and model selection rates

TruthJQRJSQR-GPJSQR-TP
ΔWAICSelectionΔWAICSelectionΔWAICSelection
Independent noise78%+1.88%+2.114%
Gaussian copula+442.9056+0.544
t-copula+402.70+1.82080
TruthJQRJSQR-GPJSQR-TP
ΔWAICSelectionΔWAICSelectionΔWAICSelection
Independent noise78%+1.88%+2.114%
Gaussian copula+442.9056+0.544
t-copula+402.70+1.82080

The WAIC and MAE of predicted conditional quantile were consistent as their differences between any two models were small or large simultaneously. Also, the base models provided the smallest WAICs for most data sets in all three scenarios according to Table 2. These evidence suggest that WAIC accurately reflects the predictive performance of a model and is suitable for comparing joint QR models with different dependency structures.

7 Case Studies

Two case studies are presented to demonstrate the potential of JSQR models in real applications. In both examples, we showcase the superior prediction performance of the proposed models compared to existing others, and, the effectiveness of WAIC in selecting base distributions and copula processes. In the analysis of PM2.5 concentration data (Section 7.1) with 339 observations, we highlight the insightful heterogeneous predictor effects offered by JSQR in contrast to the classical BSRE model. In the wildfire risk analysis (Section 7.2), we feature the t-copula process in capturing spatial tail dependence with a substantially larger dataset with 1855 observations.

We clarify that this section serves as a validation of the proposed model in real data analysis. Both WAICs and prediction performances are evaluated using multi-fold validation studies to reduce randomness. However, readers who intend to use the JSQR model in their applications should fit the model on the whole dataset (without split) regardless of the purposes of model selection, predictor effect inference or quantile prediction.

7.1 Analysis of PM2.5 concentration data

Exposure to fine particulate matter (diameter less than 2.5 μ m) is positively associated with the risk for lung cancer and cardiovascular disease morbidity and mortality (Brook et al., 2010; Dockery et al., 1993; Pope & Dockery, 2006). Various regression models, for example, generalized additive models (Yanosky et al., 2008b), Cox hazard models (Jerrett et al., 2005; Pope et al., 1995) and conditional autoregressive models (Paciorek, 2013), have been proposed to predict PM2.5 concentration. These methods focus only on the response mean prediction. But, arguably, predicting high response quantiles is of a greater concern here. High PM2.5 concentration is more harmful as acute exposure could trigger cardiovascular morbidity (Bell et al., 2005). Additionally, the relationship between PM2.5 and many important predictors (detailed below) appears more complex than a homoskedastic linear dependence (Figure 5), rendering classical mean regression analyses rather inadequate. Quantile regression has been recently employed to analyse PM2.5 concentration (Barmpadimos et al., 2012; Porter et al., 2015). However, analyses based on KB estimates clearly overlook the spatial dependence of data from different monitoring stations as presented in Figure 5a.

Averaged daily PM2.5 concentration (μg/m3) during 2001-2 across monitoring stations in northeastern United States (a) and against two potential relevant predictors (b)-(c); response data does not appear heavy-tailed (d) [Colour figure can be viewed at https://academic.oup.com/]
Figure 5

Averaged daily PM2.5 concentration (μg/m3) during 2001-2 across monitoring stations in northeastern United States (a) and against two potential relevant predictors (b)-(c); response data does not appear heavy-tailed (d) [Colour figure can be viewed at https://academic.oup.com/]

We reanalysed data4 on PM2.5 reported in Paciorek (2013) using the proposed JSQR models. The dataset contained averaged daily PM2.5 concentrations (shown in Figure 5) from 2001 to 2002 monitored at 339 stations in the northeastern United States. In Yanosky et al. (2008a) and Paciorek et al. (2009), the following 5 covariates have been shown to have significant associations with PM2.5 or improve predictions and hence were included into our analysis: (i) logarithm of population density at county level (LCYPOP), (ii) logarithm of distance to A1 class roads (primary roads, typically interstates) (LDISTA1), (iii) proportion of urban land use within 1 kilometre of the station location (URB), (iv) logarithm of PM2.5 emissions within a 10 kilometre buffer (L25E10), and, (v) elevation of the station (ELEV). We refer interested readers to Yanosky et al. (2008a) for a detailed description of the dataset.

7.1.1 Assessment of infill quantile prediction

To examine the existences of heavy-tailedness and tail dependence, we adopted three JSQR models with different combinations of base distributions and copula processes: JSQR-GP1 with a logistic base and Gaussian copula, JSQR-GP2 with a t base and Gaussian copula, and, JSQR-TP with a t base and a t-copula. The methods in Section 5.1 were also applied for comparison. A 10-fold validation study was conducted to assess how well different spatial and non-spatial quantile regression methods performed in predicting response quantiles at hold-out locations. In each fold, the data were randomly partitioned into a training set and a test set with 270 and 69 observations respectively. All prior specifications and MCMC settings were kept same as those in Section 5. The accuracy of quantile prediction at a level τ was evaluated by the check loss ρτ(Yi-Q̂Yi(τ|Xi,si,U1,,Un))averaged over all observations in test sets.

Among the four joint QR models, JSQR-GP1 had the smallest average WAIC (808), closely followed by JSQR-GP2 and JSQR-TP (both at 810) and trailed by JQR (1124). This relative ordering suggests the existence of spatial noise correlation among observation units, but the distribution of PM2.5 does not appear to be heavy-tailed (consistent with Figure 5d) or exhibit tail dependence. The ordering of WAICs was coherent with actual prediction performance, namely, JSQR-GP1 offered slightly smaller average check losses than other JSQR models and provided a clear improvement over JQR. Therefore, we adopted JSQR-GP1 as a representative of JSQR models for all subsequent analyses.

Prediction accuracy of QR models, measured by averaged check loss relative to BSRE, are summarized in the right panel of Figure 6. For QR models, JSQR-GP1 consistently offered the smallest averaged loss across all quantile levels on test sets. The losses given by JQR, KB and ALD were similar and typically the largest except for extreme quantile levels. ALP had smaller losses than the three non-spatial methods from τ = 0.2 to 0.8 but provided larger losses out of this range. Overall, these results are consistent with our findings in Section 5.1 and underline the necessity of adjusting for spatial dependence toward a more accurate QR analysis of PM2.5 concentration. We note that BSRE and JSQR offered comparable prediction accuracy for this data set. This is perhaps due to the fact that our data consisted of daily measurements averaged across two years, thus boosting Gaussianity of the joint distribution of predictors and response. However, JSQR did offer smaller prediction error at very high quantile levels (⩾ 0.9). Furthermore, JSQR offered a more nuanced view of the relationship between certain predictors and PM2.5, especially at high quantiles, which we turn to next.

Left: PM2.5 regression coefficient estimates (and 95% bands) by BSRE and JSQR. Right: average check loss given by QR methods relative to BSRE at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} [Colour figure can be viewed at https://academic.oup.com/]
Figure 6

Left: PM2.5 regression coefficient estimates (and 95% bands) by BSRE and JSQR. Right: average check loss given by QR methods relative to BSRE at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} [Colour figure can be viewed at https://academic.oup.com/]

7.1.2 Heterogeneous predictor effect on response quantiles

To understand how predictors may impact PM2.5 concentration heterogeneously across different quantile levels, we refitted the entire dataset with JSQR-GP1 and compared parameter estimates against those from the basic spatial random effects model which only allows making inference on mean predictor effects. To recall, the basic spatial random effects model (Cressie, 1993) is
(16)
where W(s) ∼ GP  (0,σs2ρ(s,s;θ))can be considered as spatial random effects and ɛ(s) ∼ N  (0,σe2)is independent pure error. To contrast with JSQR, notice that the conditional response quantiles under BSRE are given by QYi(τ|Xi)=β0+σeΦ-1(τ)+Xiβ.

Both BSRE and JSQR detected strong spatial noise correlation. The proportion of spatial variation α=σs2σs2+σe2was estimated to be 81% by BSRE and 86% by JSQR. The corresponding estimates of the decay parameter ϕ were 0.43 and 0.46. BSRE and JSQR provided similar median and interval estimates for the intercept function across all quantiles (Figure 6, left panel). The two models exhibited agreement to some extent on the effects of population density (LCYPOP) and elevation (ELEV). JSQR estimated the effects of these predictors to be relatively stable across all quantile levels. The signs of the regression coefficients matched expectation and were in line with existing literature (Yanosky et al., 2008b). Given other predictors in the model, the distance between monitoring station and A1 class roads (LDISTA1) was not found to have a significant effect by either model.

A difference between JSQR and BSRE emerged in the estimates of regression coefficients of the percentage of urban land use (URB) and PM2.5 emission within 10 kilometres (L25E10). BSRE estimated the effects of both these predictors to be significant and positive. JSQR estimated URB to have a severe impact at low and middle quantile levels but little effect on the high quantiles. It also estimated L25E10 to have no significant effect at low and middle quantile levels, but a fairly pronounced positive effect at high and extremely high levels. These differential effects were significant; the difference between the slopes at τ = 0.9 and τ = 0.1 was estimated to be −0.02 (95% credible interval = [−0.04, 0.00]) for URB and 0.15 ([0.03 0.29]) for L25E10.

The JSQR estimates could be interpreted as saying that while higher urban land use results in an increased baseline PM2.5 concentration, it is the amount of local PM2.5 emission that largely determines extreme levels of PM2.5 concentration. At the 95th percentile level, the regression coefficient of L25E10 was estimated to be 0.17. Accordingly, a rise of PM2.5 emission by 36 times within a 10km radius (one standard deviation increase of L25E10) would result in an increase of 0.6 μg/m3 of the 95th percentile of PM2.5 concentration which corresponds to a 7% (or 4%) increase for states such as Maine, New Hampshire and Vermont with low (or, Ohio, Pennsylvania and DC with high) PM2.5 concentrations.

7.2 Wildfire risk analysis

Wildfires cause significant, lasting damage to ecology (Moritz et al., 2014), economy (Butry et al., 2001) and human health (Reid et al., 2016). The burning index (BI) is a metric commonly used by National Fire Danger Rating System to monitor and forecast risk due to wildfires. The calculation of BI is sophisticated and requires measuring fuel characteristics, particle properties and moisture (Cohen & Deeming, 1985). Since collecting such measurements requires field-specific knowledge and equipment and is impossible to conduct for every location, model-based approaches are essential for analysing and predicting BI with data that can be conveniently obtained. Kreuzer et al. (2017) propose a simultaneous autoregressive model with t-distributed, spatially dependent random errors to analyse BI. While such a model is able to adjust for the heavy tails of the mean residuals, it fails to provide a detailed reconstructions of the extreme right tail quantiles of BI which are of direct interest because of the associated heightened risk.

We analysed daily wildfire risk data collected by Wildland Fire Assessment System (https://www.wfas.net/) on 19 August 2020. The data contained fire weather and fire danger observations at 1855 stations across United States (excluding Alaska and Hawaii). As expected, high values of BI were concentrated in California and the data suggested the potential existence of spatial dependence (Figure 7a). We adopted elevation and weather information as the only predictors because they are easy to collect and hence are particularly useful for surrogate BI calculation. Specifically, we included (i) elevation (ELEV), (ii) temperature (TEMP), (iii) relative humidity (RH), (iv) wind speed (WIND) and (v) precipitation (PPT) into our analysis. The focus on non-central quantiles of BI, the existence of spatial dependence, and, the heteroskedasticity of BI (Figure 7b, c) simultaneously motivated the use of JSQR models.

Burning Index measured on 19 August 2020 across contiguous US (a), and plotted against wind speed (b) and relative humidity (c). Response data appear heavy-tailed (d) [Colour figure can be viewed at https://academic.oup.com/]
Figure 7

Burning Index measured on 19 August 2020 across contiguous US (a), and plotted against wind speed (b) and relative humidity (c). Response data appear heavy-tailed (d) [Colour figure can be viewed at https://academic.oup.com/]

Similar to Section 7.1.1, we performed a 10-fold validation study with the same set of methods. In each fold, the data were randomly partitioned into a training set and a test set with 1000 and 855 observations respectively. Among the four joint QR models, JSQR-TP had the smallest averaged WAIC score (7957), followed by JSQR-GP2 (7970), JSQR-GP1 (7976) and JQR (8482), lending support to incorporating a heavy-tailed marginal distribution and spatial tail dependence into the analysis. JSQR-TP was chosen as a representative of JSQR models for subsequent analyses.

The regression coefficient estimates provided by JSQR-TP and BSRE were substantially distinct (Figure 8, left panel). The signs of estimated regression coefficients given by the two models generally agreed. Elevation, relative humidity and precipitation had non-positive effects at all quantile levels, whereas temperature and wind speed had non-negative effects. JSQR-TP detected clear heterogeneous predictor effects on TEMP, RH and WIND, estimating much stronger effects on the right tail despite increased uncertainty. These variables are known to be associated with the two major components determining BI, namely the energy release which increases with higher temperature and lower humidity, and, spread which is enhanced by high winds (Cohen & Deeming, 1985). Our QR analysis suggests that these three predictors have a particularly strong impact on the severity of extreme wildfires.

Left: wildfire risk regression coefficient estimates (and 95% bands) by BSRE and JSQR. Right: average check loss given by QR methods relative to BSRE at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} [Colour figure can be viewed at https://academic.oup.com/]
Figure 8

Left: wildfire risk regression coefficient estimates (and 95% bands) by BSRE and JSQR. Right: average check loss given by QR methods relative to BSRE at τ ∈ {0.01, 0.05, 0.1, …, 0.9, 0.95, 0.99} [Colour figure can be viewed at https://academic.oup.com/]

Figure 8 (right panel) compares the prediction accuracy of JSQR against other competitors (all figures shown relative to BSRE). JSQR-TP provided the smallest averaged loss consistently at all quantile levels. A key distinction from the PM2.5 study is that the QR models provided much smaller losses than BSRE, especially for non-central quantile levels. Particularly, JSQR-TP offered 20 to 40% improvement over BSRE at both lower and upper quantile levels. The Gaussian noise assumption underlying BSRE creates a handicap in modelling skewed, heavy-tailed daily BI measurement. But, even if this shortcoming were rectified with heavy-tailed error distribution model, the heterogeneous predictor effect estimates offered by JSQR could not be accomplished with such extensions.

8 Concluding Remarks

We have introduced here a novel generalization of the joint quantile regression model of Yang and Tokdar (2017) to adjust for noise correlation in point-referenced spatial data. We have provided ample evidence that such adjustment improves parameter estimation, uncertainty quantification and prediction. As an extension of BSRE, the proposed JSQR model offers a comprehensive and practicable solution to the problem of spatial regression beyond the mean. Through two case studies we have shown that JSQR produces interpretable estimates of potentially heterogeneous predictor effects, offers excellent fit to hold-out data and can successfully adapt to heavy-tailed response and tail dependence. An accompanying R package is under development; but a ready-to-deploy version maybe accessed at https://github.com/xuchenstat/JSQR.

Our development crucially depends on accepting the linear QR model assumption (1) simultaneously for all quantile levels τ ∈ (0, 1). This strong assumption, a ‘leap of faith’ according to Koenker (2017), should be given its due diligence before adopting a JSQR, or for that matter, any joint quantile regression analysis of data. It is conceivable that the global linear QR model is adequate only when a sufficiently rich set of nonlinear transformations of the original covariates are included as predictors. In particular, any serious application with JQR or JSQR should include diagnostics of model fit, which could be performed by assessing uniformity of the residual random levels V1, …, Vn introduced in Section 6.2. Additionally, iterative and interactive model improvement could be sought via addition of nonlinear transformations of original covariates as new or alternative predictors, and a formal model selection may be performed by using WAIC. See Cunningham et al. (2020) for recent development along this line.

Joint quantile regression models are a class of conditional density estimation models and are naturally related to the so called ‘density regression’ methods which attempt to directly estimate the conditional response density given predictors (De Iorio et al., 2004; Dunson et al., 2007; Tokdar et al., 2010). Clearly, one could obtain estimates of conditional quantiles under a density regression approach, just as one could easily obtain estimates of conditional densities under a joint QR model. Which approach is selected for a particular data analysis task depends primarily on the goals of the analysis. A major appeal of the joint QR approach is interpretability. The estimates of the slope functions give a clear picture of potentially heterogeneous predictor effects, which may be vital in a careful scientific analysis of predictor-response relationship (Abrevaya & Dahl, 2008; Cade & Noon, 2003; Wasko & Sharma, 2014). Density regression, on the other hand, offers more flexible quantile shape adjustment which could be more useful for applications focused on prediction. However, even for pure prediction purposes, joint QR, via nonlinear transformations of covariates, may be competitive against density regression methods which often struggle to account for sharp changes in the spread and skewness of the conditional densities, a hallmark of extremely heterogeneous predictor effects; see Tokdar (2013). Other than density regression, joint quantile regression is also a member of the distributional regression family, which posits distributional statistics of the response as functions of predictors (Firpo et al., 2009; Foresi & Peracchi, 1995; Klein et al., 2015, to name a few). The choice of statistics substantially determines flexibility and interpretation of the model. As our focus is on quantile regression, direct comparison with other distributional regression models is out of the scope of this work; see Koenker et al. (2013) for an interesting discussion.

Our results here are of empirical nature. A rigorous mathematical treatment of asymptotic properties of JSQR remains the subject of a future paper. The assumption of dependence between observations makes it quite challenging to study posterior consistency properties of the proposed Bayesian method. However, the conditional independence implied by the formulation (13) offers a path to addressing this challenge under an infill asymptotic setting where W is an additional infinite dimensional parameter to be estimated along with β0(·)and β(·).

Finally, we note that extensions similar to JSQR could be sought to address a variety of other dependency structures within a joint quantile regression model, with applications to time series data, longitudinal data, or data where the response is vector valued. Such developments crucially depend on the choice of the copula families that adequately address relevant aspects of statistical estimation and computing. Details will be presented in a forthcoming paper.

Supporting Information

Additional supporting information may be found online in the Supporting Information section.

1

Effective samples sizes are boosted by 27% and 42%, respectively for p = 1 and p = 7 in the simulation studies of Section 5. Mean absolute errors and coverage probabilities of regression coefficients are also improved.

2

All function valued parameters are first updated in their finite representations and then used in likelihood evaluation as described.,

3

Without special notice, we use JSQR to denote our joint spatial quantile regression model (4) with a standard logistic density f0 and the Gaussian copula process (5).

Acknowledgement

This research is supported by NSF grants DMS-1613173 and DMS-2014861. XC gratefully acknowledges support from the Katherine Goodman Stern Fellowship. We thank the editor, AE and two referees for their valuable suggestions for improving the manuscript.

References

1

Abrevaya
,
J.
&
Dahl
,
C.M.
(
2008
)
The effects of birth inputs on birthweight: Evidence from quantile estimation on panel data
.
Journal of Business & Economic Statistics
,
26
(
4
),
379
397
.

2

Azzalini
,
A.
&
Capitanio
,
A.
(
2003
)
Distributions generated by perturbation of symmetry with emphasis on a multivariate skew t-distribution
.
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
,
65
(
2
),
367
389
.

3

Azzalini
,
A.
&
Valle
,
A.D.
(
1996
)
The multivariate skew-normal distribution
.
Biometrika
,
83
(
4
),
715
726
.

4

Banerjee
,
S.
,
Gelfand
,
A.E.
,
Finley
,
A.O.
&
Sang
,
H.
(
2008
)
Gaussian predictive process models for large spatial data sets
.
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
,
70
(
4
),
825
848
.

5

Banerjee
,
S.
,
Carlin
,
B.P.
&
Gelfand
,
A.E.
(
2014
)
Hierarchical modeling and analysis for spatial data
.
Boca Raton
:
Chapman and Hall/CRC
.

6

Barmpadimos
,
I.
,
Keller
,
J.
,
Oderbolz
,
D.
,
Hueglin
,
C.
&
Prévôt
,
A.
(
2012
)
One decade of parallel fine (PM 2.5) and coarse (PM 10–PM 2.5) particulate matter measurements in Europe: Trends and variability
.
Atmospheric Chemistry and Physics
,
12
(
7
),
3189
3203
.

7

Bell
,
M.L.
,
Dominici
,
F.
&
Samet
,
J.M.
(
2005
)
A meta-analysis of time-series studies of ozone and mortality with comparison to the national morbidity, mortality, and air pollution study
.
Epidemiology (Cambridge, Massachusetts)
,
16
(
4
),
436
.

8

Brook
,
R.D.
,
Rajagopalan
,
S.
,
Pope
 III,
C.A.
,
Brook
,
J.R.
,
Bhatnagar
,
A.
,
Diez-Roux
,
A.V.
et al. (
2010
) :
an update to the scientific statement from the American Heart Association. Circulation
.
121
(
21
),
2331
2378
.

9

Buchinsky
,
M.
(
1994
)
Changes in the US wage structure 1963–1987: Application of quantile regression
.
Econometrica: Journal of the Econometric Society
,
62
(
2
),
405
458
.

10

Butry
,
D.T.
,
Mercer
,
E.
,
Prestemon
,
J.P.
,
Pye
,
J.M.
&
Holmes
,
T. P.
(
2001
)
What is the price of catastrophic wildfire?
 
Journal of Forestry
,
99
(
11
),
9
17
.

11

Cade
,
B.S.
&
Noon
,
B.R.
(
2003
)
A gentle introduction to quantile regression for ecologists
.
Frontiers in Ecology and the Environment
,
1
(
8
),
412
420
.

12

Cade
,
B.S.
,
Terrell
,
J.W.
&
Schroeder
,
R.L.
(
1999
)
Estimating effects of limiting factors with regression quantiles
.
Ecology
,
80
(
1
),
311
323
.

13

Chen
,
X.
&
Fan
,
Y.
(
2006
)
Estimation and model selection of semiparametric copulabased multivariate dynamic models under copula misspecification
.
Journal of Econometrics
,
135
(
1-2
),
125
154
.

14

Cohen
,
J.D.
&
Deeming
,
J.E.
(
1985
)
The national fire-danger rating system: Basic equations
, Vol.
82
.
USA
:
US Department of Agriculture, Forest Service, Pacific Southwest Forest and Range Experiment Station
.

15

Cressie
,
N.A.
(
1993
)
Statistics for spatial data
, 2nd.
New York
:
Wiley
.

16

Cunningham
,
E.
,
Tokdar
,
S.T.
&
Clark
,
J.S.
(
2020
) Chapter 2—A vignette on modelbased quantile regression: analysing excess zero response. In:
Fan
,
Y.
,
Nott
,
D.
,
Smith
,
M.S.
&
Dortet-Bernadet
,
J.-L.
(Eds.)
Flexible Bayesian regression modelling
.
Cambridge
:
Academic Press
, pp.
27
64
.

17

Datta
,
A.
,
Banerjee
,
S.
,
Finley
,
A.O.
&
Gelfand
,
A.E.
(
2016
)
Hierarchical nearestneighbor Gaussian process models for large geostatistical datasets
.
Journal of the American Statistical Association
,
111
(
514
),
800
812
.

18

De Iorio
,
M.
,
Müller
,
P.
,
Rosner
,
G.L.
&
MacEachern
,
S.N.
(
2004
)
An ANOVA model for dependent random measures
.
Journal of the American Statistical Association
,
99
(
465
),
205
215
.

19

Demarta
,
S.
&
McNeil
,
A.J.
(
2005
)
The t copula and related copulas
.
International Statistical Review
,
73
(
1
),
111
129
.

20

Dissmann
,
J.
,
Brechmann
,
E.C.
,
Czado
,
C.
&
Kurowicka
,
D.
(
2013
)
Selecting and estimating regular vine copulae and application to financial returns
.
Computational Statistics & Data Analysis
,
59
,
52
69
.

21

Dockery
,
D.W.
,
Pope
,
C.A.
,
Xu
,
X.
,
Spengler
,
J.D.
,
Ware
,
J.H.
,
Fay
,
M.E.
et al. (
1993
)
An association between air pollution and mortality in six US cities
.
New England Journal of Medicine
,
329
(
24
),
1753
1759
.

22

Dunson
,
D.B.
,
Pillai
,
N.
&
Park
,
J.-H.
(
2007
)
Bayesian density regression
.
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
,
69
(
2
),
163
183
.

23

Elsner
,
J.B.
,
Kossin
,
J.P.
&
Jagger
,
T.H.
(
2008
)
The increasing intensity of the strongest tropical cyclones
.
Nature
,
455
(
7209
),
92
.

24

Embrechts
,
P.
,
Lindskog
,
F.
&
McNeil
,
A.
(
2001
)
Modelling dependence with copulas
.
Zurich
:
Rapport technique, Département de mathématiques, Institut Fédéral de Technologie de Zurich
.

25

Finley
,
A.O.
,
Datta
,
A.
,
Cook
,
B.D.
,
Morton
,
D.C.
,
Andersen
,
H.E.
&
Banerjee
,
S.
(
2019
)
Efficient algorithms for Bayesian nearest neighbor Gaussian processes
.
Journal of Computational and Graphical Statistics
,
28
(
2
),
401
414
.

26

Firpo
,
S.
,
Fortin
,
N.M.
&
Lemieux
,
T.
(
2009
)
Unconditional quantile regressions
.
Econometrica
,
77
(
3
),
953
973
.

27

Foresi
,
S.
&
Peracchi
,
F.
(
1995
)
The conditional distribution of excess returns: An empirical analysis
.
Journal of the American Statistical Association
,
90
(
430
),
451
466
.

28

Foster
,
L.
,
Waagen
,
A.
,
Aijaz
,
N.
,
Hurley
,
M.
,
Luis
,
A.
,
Rinsky
,
J.
et al. (
2009
)
Stable and efficient gaussian process calculations
.
Journal of Machine Learning Research
,
10
(
Apr
),
857
882
.

29

Gelman
,
A.
,
Hwang
,
J.
&
Vehtari
,
A.
(
2014
)
Understanding predictive information criteria for Bayesian models
.
Statistics and Computing
,
24
(
6
),
997
1016
.

30

Genest
,
C.
,
Quessy
,
J.-F.
&
Rémillard
,
B.
(
2006
)
Goodness-of-fit procedures for copula models based on the probability integral transformation
.
Scandinavian Journal of Statistics
,
33
(
2
),
337
366
.

31

He
,
X.
(
1997
)
Quantile curves without crossing
.
The American Statistician
,
51
(
2
),
186
192
.

32

Huard
,
D.
,
Évin
,
G.
,
Favre
&
A.-C.
, (
2006
)
Bayesian copula selection
.
Computational Statistics & Data Analysis
,
51
(
2
),
809
822
.

33

Jerrett
,
M.
,
Burnett
,
R.T.
,
Ma
,
R.
,
Pope
 III,
C.A.
,
Krewski
,
D.
,
Newbold
,
K.B.
et al. (
2005
)
Spatial analysis of air pollution and mortality in Los Angeles
.
Epidemiology
,
16
,
727
736
.

34

Klein
,
N.
,
Kneib
,
T.
,
Klasen
,
S.
&
Lang
,
S.
(
2015
)
Bayesian structured additive distributional regression for multivariate responses
.
Journal of the Royal Statistical Society: Series C: Applied Statistics
,
64
,
569
591
.

35

Koenker
,
R.
(
1994
)
Confidence intervals for regression quantiles
. In: Asymptotic Statistics.
Berlin
:
Springer
, pp.
349
359
.

36

Koenker
,
R.
(
2005
)
Quantile regression (Econometric Society Monographs; no. 38)
.
Cambridge
:
Cambridge University Press
.

37

Koenker
,
R.
(
2017
)
Quantile regression: 40 years on
.
Annual Review of Economics
,
9
,
155
176
.

38

Koenker
,
R.
&
Bassett
,
G.
(
1978
)
Regression quantiles
.
Econometrica: Journal of the Econometric Society
,
46
(
1
),
33
50
.

39

Koenker
,
R.
,
Leorato
,
S.
&
Peracchi
,
F.
(
2013
)
Distributional vs. quantile regression
.

40

Kreuzer
,
A.
,
Erhardt
,
T.
,
Nagler
,
T.
&
Czado
,
C.
(
2017
)
Heavy tailed spatial autocorrelation models
.
arXiv preprint arXiv:1707.03165
.

41

Li
,
L.
,
Qiu
,
S.
,
Zhang
,
B.
&
Feng
,
C.X.
(
2016
)
Approximating cross-validatory predictive evaluation in Bayesian latent variable models with integrated IS and WAIC
.
Statistics and Computing
,
26
(
4
),
881
897
.

42

Lum
,
K.
&
Gelfand
,
A.E.
(
2012
)
Spatial quantile multiple regression using the asymmetric Laplace process
.
Bayesian Analysis
,
7
(
2
),
235
258
.

43

McNeil
,
A.J.
,
Frey
,
R.
&
Embrechts
,
P.
(
2005
)
Quantitative risk management: Concepts, techniques and tools
, Vol. 3.
Princeton
:
Princeton university press
.

44

Millar
,
R.B.
(
2018
)
Conditional vs marginal estimation of the predictive loss of hierarchical models using WAIC and cross-validation
.
Statistics and Computing
,
28
(
2
),
375
385
.

45

Moritz
,
M.A.
,
Batllori
,
E.
,
Bradstock
,
R.A.
,
Gill
,
A.M.
,
Handmer
,
J.
,
Hessburg
,
P.F.
et al. (
2014
)
Learning to coexist with wildfire
.
Nature
,
515
(
7525
),
58
66
.

46

Paciorek
,
C.J.
(
2013
)
Spatial models for point and areal data using Markov random fields on a fine grid
.
Electronic Journal of Statistics,
 
7
,
946
972
.

47

Paciorek
,
C.J.
,
Yanosky
,
J.D.
,
Puett
,
R.C.
,
Laden
,
F.
&
Suh
,
H.H.
(
2009
)
Practical large-scale spatio-temporal modeling of particulate matter concentrations
.
The Annals of Applied Statistics
,
3
(
1
),
370
397
.

48

Pope
,
C.A.
&
Dockery
,
D.W.
(
2006
)
Health effects of fine particulate air pollution: Lines that connect
.
Journal of the Air & Waste Management Association
,
56
(
6
),
709
742
.

49

Pope
,
C.A.
,
Thun
,
M.J.
,
Namboodiri
,
M.M.
,
Dockery
,
D.W.
,
Evans
,
J.S.
,
Speizer
,
F.E.
et al. (
1995
)
Particulate air pollution as a predictor of mortality in a prospective study of US adults
.
American Journal of Respiratory and Critical Care Medicine
,
151
(
3
),
669
674
.

50

Porter
,
W.C.
,
Heald
,
C.L.
,
Cooley
,
D.
&
Russell
,
B.
(
2015
)
Investigating the observed sensitivities of air-quality extremes to meteorological drivers via quantile regression
.
Atmospheric Chemistry and Physics
,
15
(
18
),
10349
10366
.

51

Rasmussen
,
C.E.
&
Williams
,
C.K.
(
2006
)
Gaussian process for machine learning
.
Massachusetts
:
MIT press
.

52

Reich
,
B.J.
,
Fuentes
,
M.
&
Dunson
,
D.B.
(
2011
)
Bayesian spatial quantile regression
.
Journal of the American Statistical Association
,
106
(
493
),
6
20
.

53

Reid
,
C.E.
,
Brauer
,
M.
,
Johnston
,
F.H.
,
Jerrett
,
M.
,
Balmes
,
J.R.
&
Elliott
,
C.T.
(
2016
)
Critical review of health impacts of wildfire smoke exposure
.
Environmental Health Perspectives
,
124
(
9
),
1334
1343
.

54

Sibuya
,
M.
(
1960
)
Bivariate extreme statistics, I
.
Annals of the Institute of Statistical Mathematics
,
11
(
2
),
195
210
.

55

Smith
,
M.
,
Min
,
A.
,
Almeida
,
C.
&
Czado
,
C.
(
2010
)
Modeling longitudinal data using a pair-copula decomposition of serial dependence
.
Journal of the American Statistical Association
,
105
(
492
),
1467
1479
.

56

Smith
,
M.S.
,
Gan
,
Q.
&
Kohn
,
R.J.
(
2012
)
Modelling dependence using skew t copulas: Bayesian inference and applications
.
Journal of Applied Econometrics
,
27
(
3
),
500
522
.

57

Stein
,
M.L.
(
2012
)
Interpolation of spatial data: Some theory for kriging
.
Berlin
:
Springer Science & Business Media
.

58

Tokdar
,
S.T.
(
2007
)
Towards a faster implementation of density estimation with logistic Gaussian process priors
.
Journal of Computational and Graphical Statistics
,
16
(
3
),
633
655
.

59

Tokdar
,
S.T.
(
2013
)
Contributed discussion on article by Muller and Mitra [Bayesian Nonparametric Inference–Why and How]
.
Bayesian Analysis
,
8
(
2
),
323
356
.

60

Tokdar
,
S.T.
&
Ghosh
,
J.K.
(
2007
)
Posterior consistency of logistic Gaussian process priors in density estimation
.
Journal of Statistical Planning and Inference
,
137
(
1
),
34
42
.

61

Tokdar
,
S.T.
&
Kadane
,
J.B.
(
2012
)
Simultaneous linear quantile regression: A semiparametric Bayesian approach
.
Bayesian Analysis
,
7
(
1
),
51
72
.

62

Tokdar
,
S.T.
,
Zhu
,
Y.M.
&
Ghosh
,
J.K.
(
2010
)
Bayesian density regression with logistic Gaussian process and subspace projection
.
Bayesian Analysis
,
5
(
2
),
319
344
.

63

van der Vaart
,
A.W.
&
Van Zanten
,
J.H.
(
2009
)
Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth
.
The Annals of Statistics
,
37
(
5B
),
2655
2675
.

64

Wasko
,
C.
&
Sharma
,
A.
(
2014
)
Quantile regression for investigating scaling of extreme precipitation with temperature
.
Water Resources Research
,
50
(
4
),
3608
3614
.

65

Watanabe
,
S.
(
2010
)
Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory
.
Journal of Machine Learning Research
,
11
(
Dec
),
3571
3594
.

66

Yang
,
Y.
&
He
,
X.
(
2015
)
Quantile regression for spatially correlated data: An empirical likelihood approach
.
Statistica Sinica
,
25
(
1
),
261
274
.

67

Yang
,
Y.
&
Tokdar
,
S.T.
(
2015
)
Minimax-optimal nonparametric regression in high dimensions
.
The Annals of Statistics
,
43
(
2
),
652
674
.

68

Yang
,
Y.
&
Tokdar
,
S.T.
(
2017
)
Joint estimation of quantile planes over arbitrary predictor spaces
.
Journal of the American Statistical Association
,
112
(
519
),
1107
1120
.

69

Yanosky
,
J.D.
,
Paciorek
,
C.J.
,
Schwartz
,
J.
,
Laden
,
F.
,
Puett
,
R.
&
Suh
,
H.H.
(
2008a
)
Spatio-temporal modeling of chronic PM10 exposure for the Nurses’ Health Study
.
Atmospheric Environment
,
42
(
18
),
4047
4062
.

70

Yanosky
,
J.D.
,
Paciorek
,
C.J.
&
Suh
,
H.H.
(
2008b
)
Predicting chronic fine and coarse particulate exposures using spatiotemporal models for the Northeastern and Midwestern United States
.

71

Yu
,
K.
&
Moyeed
,
R.A.
(
2001
)
Bayesian quantile regression
.
Statistics & Probability Letters
,
54
(
4
),
437
447
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)