Statistical inference for multivariate extremes via a geometric approach

A geometric representation for multivariate extremes, based on the shapes of scaled sample clouds in light-tailed margins and their so-called limit sets, has recently been shown to connect several existing extremal dependence concepts. However, these results are purely probabilistic, and the geometric approach itself has not been fully exploited for statistical inference. We outline a method for parametric estimation of the limit set shape, which includes a useful non-/semi-parametric estimate as a pre-processing step. More fundamentally, our approach provides a new class of asymptotically motivated statistical models for the tails of multivariate distributions, and such models can accommodate any combination of simultaneous or non-simultaneous extremes through appropriate parametric forms for the limit set shape. Extrapolation further into the tail of the distribution is possible via simulation from the fitted model. A simulation study confirms that our methodology is very competitive with existing approaches, and can successfully allow estimation of small probabilities in regions where other methods struggle. We apply the methodology to two environmental datasets, with diagnostics demonstrating a good fit.


Multivariate extreme value theory
The study of multivariate extremes primarily began in the 1970s and 80s, with the theoretical study of multivariate regular variation (MRV) (de Haan, 1970; de Haan and Resnick, 1977; Resnick, 1987). MRV is intrinsically tied up with the componentwise block maximum method for multivariate extremes. Suppose we have n independent replicates Y_i ∈ R^d, i = 1, . . . , n, of a random vector Y; the componentwise maximum vector is defined as M_n = (M_{n,1}, . . . , M_{n,d}), with M_{n,j} = max_{i=1,...,n} Y_{i,j}. Univariate extreme value theory tells us that if, for each j = 1, . . . , d, there exist a_{n,j} > 0, b_{n,j} such that {M_{n,j} − b_{n,j}}/a_{n,j} converges to a non-degenerate random variable, then the distribution of this limiting variable is generalized extreme value (Fisher and Tippett, 1928; Gnedenko, 1943), which is the only univariate max-stable distribution. The additional condition for joint convergence of the entire vector {M_n − b_n}/a_n to a multivariate max-stable distribution is MRV. Since this represents an assumption on the dependence structure, it can be expressed in standardized margins: a common choice is to set X_{P,j} = 1/{1 − F_j(Y_j)}, where X_{P,j} follows a standard Pareto distribution if Y_j ∼ F_j has a continuous distribution, else it is asymptotically Pareto. A common way to express the MRV assumption is

lim_{t→∞} Pr(X_P/‖X_P‖ ∈ B, ‖X_P‖ > ts | ‖X_P‖ > t) = s^{−1} H(B), s ≥ 1,    (1)

where B ⊂ S^{d−1} = {v ∈ [0, 1]^d : Σ_{j=1}^d v_j = 1} is a measurable set with H(∂B) = 0. Assumption (1) shows that large values of the "radial" component ‖X_P‖ become independent of the "angular" component X_P/‖X_P‖, which follows some probability distribution H on S^{d−1}, commonly referred to as the spectral measure.
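As a quick numerical illustration of (1) (ours, not from the paper): with independent standard Pareto components, the radial variable R = Σ_j X_{P,j} is regularly varying with index 1, so Pr(R > ts | R > t) ≈ 1/s for large t.

```python
# Empirical check of the MRV radial limit Pr(R > ts | R > t) -> 1/s,
# using independent standard Pareto(1) components so that R = sum_j X_{P,j}
# is regularly varying with index 1. Illustration only, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
n, d = 200_000, 2
x_p = 1.0 / rng.uniform(size=(n, d))   # standard Pareto(1) margins
r = x_p.sum(axis=1)                    # radial component (L1 norm)

t, s = np.quantile(r, 0.99), 2.0
cond = np.mean(r > s * t) / np.mean(r > t)  # estimates Pr(R > st | R > t)
print(cond)                                 # should be near 1/s = 0.5
```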
Statistical methodology for multivariate extremes followed shortly after this theoretical study, and focused initially on inference for data arising as componentwise block maxima through parametrized forms of multivariate max-stable distributions (Tawn, 1990). This was soon followed by more direct exploitation of the MRV assumption (1), parametrizing models and performing inference on the spectral measure H (Coles and Tawn, 1991).
The study of componentwise maxima is a natural multivariate extension of the univariate block maximum approach, and the associated MRV dependence condition (1) is widely applicable. However, it has long been known that, while examples not satisfying (1) are rare, the number of examples for which this assumption forms a useful basis for statistical inference is much smaller. This is because, for many dependence structures, mass of the spectral measure H accumulates on one or more regions of the form

B_C = {v ∈ S^{d−1} : v_j > 0, j ∈ C; v_k = 0, k ∉ C}, C ⊆ {1, . . . , d}.    (2)

When this is the case, joint extremes of the random vector Y (or equivalently X_P) may not always occur simultaneously; see, e.g., Goix et al. (2017) or Simpson et al. (2020) for a more detailed explanation. In practice, however, we never observe mass on such sets B_C at finite levels, and what is more relevant for statistical modeling and extrapolation in these cases is a more detailed description of the behavior of Y or X_P at extreme levels than (1) can provide.

Geometric approach to multivariate extremes
The early study of MRV was followed by a smaller body of work that examined the convergence of light-tailed multivariate sample clouds onto so-called limit sets (Davis et al., 1988; Kinoshita and Resnick, 1991). These ideas did not have a clear link with multivariate max-stable models and did not lead to the same proliferation of statistical methodology. More recently, several papers have revisited this geometric approach from a theoretical perspective (Balkema and Nolde, 2010, 2012; Nolde, 2014; Nolde and Wadsworth, 2022), and in some cases shown how the shape of the limit set links to whether joint extremes of certain variables can occur.
To make ideas more concrete, consider n independent copies X_i, i = 1, . . . , n, of a random vector X with standard exponential margins; in practice this will typically involve marginal transformation of the original vectors Y_i. The scaled n-point sample cloud is defined as N_n = {X_1/log n, . . . , X_n/log n}, and we assume that this converges onto a limit set G = {x ∈ R_+^d : g(x) ≤ 1}, where g is the 1-homogeneous gauge function of the limit set. The precise sense of this convergence, and necessary and sufficient conditions for it, can be found in the works cited above. However, these conditions are rather unintuitive, which led Nolde (2014) and Nolde and Wadsworth (2022) to consider conditions in terms of the joint Lebesgue density of X, where it exists. Denoting this joint density by f_[E], a sufficient condition for convergence of N_n onto G is

−log f_[E](tx)/t → g(x), t → ∞,    (3)

for a continuous gauge function g. Given that many statistical models have tractable joint densities and continuous gauge functions, equation (3) provides a simple way to determine the form of g, and hence G, in several examples (Nolde and Wadsworth, 2022).
The shape of G is important as a description of the extremal dependence of the underlying random vector. Limit sets exist for a much more general class of light-tailed marginal distributions than exponential, but we specialize to this case so that there is a clear correspondence between the shape of G and the dependence structure. Nolde and Wadsworth (2022) showed how G can be used to determine an array of extremal dependence measures which generally relate to representations of multivariate extremes that are more useful than MRV when the spectral measure H places mass on one or more sets B_C as in equation (2). These include expressions for determining the residual tail dependence coefficient (Ledford and Tawn, 1997), key elements of the conditional extremes model (Heffernan and Tawn, 2004), the angular dependence function (Wadsworth and Tawn, 2013), and the dependence coefficients of Simpson et al. (2020), which can be used to help determine the sets B_C on which H places mass. Examples of G are given for d = 2, 3 in Section A of the supplement.
Given the importance of the shape of G, a natural question that arises is how to estimate this from a sample of data. To date this question has been studied very little indeed; Jacob and Massé (1996) study estimation from a theoretical perspective but with no implementation. Very recently, Simpson and Tawn (2022) outlined an estimation approach in the bivariate case.
In this paper, we consider estimation of G as part of a wider new approach to the statistical analysis of extreme values. That is, G is an object of interest in its own right, but we direct our methodology more broadly at the question of statistical modeling and extrapolation for multivariate extreme values rather than focusing only on the descriptive aspects of extremal dependence that come from estimation of G. Our modeling approach allows in principle for any combination of joint extremes of sub-vectors of Y (equivalently, H may place mass on any valid combination of sets B_C), and permits extrapolation in all directions: into the joint tail (where all variables are large), or into other regions of the multivariate tail where only some variables are large. Existing alternatives to methodology based on MRV do not capture these possibilities in a coherent manner.
Section 2 outlines our statistical model and assumptions. Section 3 details theoretical examples that demonstrate applicability of the method. We focus on details of statistical inference in Section 4, and use simulation to show that our approach is very competitive for estimation of extreme set probabilities in a wide range of scenarios in Section 5. Section 6 contains applications to oceanographic and fluvial datasets, and we conclude in Section 7.

Model and assumptions
Here and throughout the rest of the paper, we assume that we have a random vector X with standard exponential margins and joint Lebesgue density denoted by f_[E]. Marginal transformation can be applied as a standard step via estimation of each marginal distribution function. The assumption of a joint density is very common for statistical analysis, as it is required for most likelihood-based inference, for example. We further assume that the scaled sample cloud N_n converges onto a limit set G whose shape is either described by a continuous gauge function g, or, where g is not continuous, that interest lies only in its continuous part.
Assumption (3), which yields a sufficient condition for convergence of N_n onto G, can equivalently be expressed as f_[E](tx) = exp{−tg(x)[1 + o(1)]} for g(x) > 0 as t → ∞. The homogeneity of g suggests making the radial-angular transformation R = Σ_{j=1}^d X_j, W = X/R; such transformations are familiar in multivariate extremes, but normally on Pareto, rather than exponential, margins. The Jacobian of this transformation is r^{d−1}, which leads to the joint density of (R, W):

f_{R,W}(r, w) = r^{d−1} f_[E](rw) = r^{d−1} exp{−rg(w)[1 + o(1)]}.    (4)

We recognize the form of a gamma kernel in the non-asymptotic terms of (4), suggesting that when R | W = w is large, its distribution could potentially be well approximated by a gamma distribution.
Indeed, if the o(1) term in the exponent is negligible, this suggests a truncated-and-renormalized gamma approximation of the conditional distribution R | W = w above a high threshold r_0(w).
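A sanity check of the gamma suggestion (our illustration, not from the paper): for independent standard exponential components, g(x) = Σ_j x_j, so g(w) = 1 on the simplex and R = Σ_j X_j follows a Gamma(d, 1) distribution exactly, independently of W.

```python
# For independent exponential margins the gamma form is exact:
# R = sum_j X_j ~ Gamma(d, 1), independent of the angle W. Illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, d = 100_000, 3
x = rng.exponential(size=(n, d))   # standard exponential margins
r = x.sum(axis=1)
w = x / r[:, None]                 # angles on the unit simplex

# Kolmogorov-Smirnov test of R against Gamma(shape=d, rate=1)
ks = stats.kstest(r, stats.gamma(a=d).cdf)
corr = np.corrcoef(r, w[:, 0])[0, 1]   # near zero: R and W uncorrelated
print(ks.pvalue, corr)
```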
A valid concern is whether the o(1) term in the exponent of (4) is really negligible. In Section 3 we detail several examples which in fact have the more helpful asymptotic form f_{R|W}(r|w) ∝ r^{d−1} exp{−rg(w)}[1 + o(1)], i.e., with the o(1) outside of the exponent, and give explicit rates for this term. Based on this asymptotic representation, we focus in this paper on the model

R | {W = w, R > r_0(w)} ∼ truncGamma(α, g(w)),    (5)

where α > 0 is the gamma shape parameter, and g(w) is the gamma rate parameter. In most examples, the theoretical shape parameter is α = d, but for modeling purposes the flexibility of an estimated shape is desirable. By parametrizing flexible forms for the gauge function, g(w) = g(w; θ), we can use assumption (5) to estimate these parameters. Full details of our approach are given in Section 4, including diagnostic plots for assessing assumption (5).

Examples
The validity and quality of the truncated gamma approximation in (5) to the conditional density in (4) depend on the o(1) term. Since this lies in the exponent, it is not always guaranteed to be negligible. In this section, we explicitly calculate the density of R | W = w for various theoretical examples, showing that most in fact have the form f_{R|W}(r|w) ∝ r^{d−1} exp{−rg(w)}[1 + o(1)] as r → ∞. The exception is the Gaussian dependence structure, for which we find f_{R|W}(r|w) ∝ r^{α(w)−1} exp{−rg(w)}[1 + o(1)] as r → ∞; i.e., the conditional gamma form is still applicable, but the shape parameter depends on the value of w. Nonetheless, further investigations, described briefly below and in more detail in Section C of the supplement, show that the assumption of a common shape in model (5) does not appear problematic in practice. This is also supported by our simulation study in Section 5. More generally, we will incorporate checking of assumption (5) into our statistical analysis.
For each distribution below we detail the overall form of f_{R|W}(r|w), with further calculations given in Section B of the supplement. We denote the ordered values of the vector w (and similarly x) by w_(1) ≥ w_(2) ≥ · · · ≥ w_(d) > 0, assuming the minimum to be positive. In the convergence rates given below, we assume a strict ordering w_(1) > w_(2) > · · · > w_(d) > 0; where this is not the case, following the derivations in the supplement, one usually observes improved rates.
Max-stable and generalized Pareto distributions The general asymptotic form of the density for a max-stable distribution in exponential margins, as t → ∞, can be written in terms of the exponent function V : R_+^d → R_+, which is homogeneous of order −1, the set Π of all partitions of {1, . . . , d}, and the partial derivatives V_s(z) = ∂^{|s|}V(z)/∏_{j∈s} ∂z_j.
We focus here on the d-dimensional logistic distribution, for which V(z) = (Σ_{j=1}^d z_j^{−1/θ})^θ, θ ∈ (0, 1]; the resulting form of f_{R|W}(r|w) is given in the supplement. The simpler forms of the densities make calculations more straightforward for the corresponding multivariate generalized Pareto distributions (MGPDs), which are related to max-stable distributions (Rootzén and Tajvidi, 2006; Rootzén et al., 2018). The support of MGPDs whose margins have unit scale and zero shape is contained in {x ∈ R^d : x_(1) > 0}. Densities for several models for which the spectral measure H places mass only on B_{1,...,d} are given in Kiriliouk et al. (2019); in such cases the dependence structure can be determined by focusing on large values of x > 0. Further details are in the supplement.
For the MGPD associated to the negative logistic max-stable distribution (Galambos, 1975; Dombry et al., 2016), and for the MGPD associated to the Dirichlet max-stable distribution (Coles and Tawn, 1991), the resulting forms of f_{R|W}(r|w) are detailed in the supplement. Inverted max-stable distributions Inverted max-stable distributions have a density expressible in terms of the stable tail dependence function l of the corresponding max-stable distribution, obtained via l(x) = V(1/x), with l_s(x) = ∂^{|s|}l(x)/∏_{j∈s} ∂x_j. The gauge function is always g(x) = l(x).
Owing to the fact that l_s(x) is homogeneous of order 1 − |s|, we obtain f_{R|W}(r|w) ∝ r^{d−1} exp{−rl(w)}[1 + o(1)] as r → ∞. Multivariate Gaussian distribution We consider the multivariate Gaussian dependence structure with correlation matrix Σ. When one or more correlation parameters are negative, the continuous convergence −log f_[E](tx)/t → g(x) fails when components of x are zero (Nolde and Wadsworth, 2022). Since we are considering w_(d) > 0 this is not an issue here, but we note that to fully capture negative association it is ideal to reformulate ideas in terms of Laplace, rather than exponential, margins; see Nolde and Wadsworth (2022) and Section 7. For Σ with non-negative entries, g(x) = (x^{1/2})^⊤ Σ^{−1} x^{1/2}, where x^{1/2} denotes the componentwise square root, and f_{R|W}(r|w) ∝ r^{α(w)−1} exp{−rg(w)}[1 + o(1)] as r → ∞. In this case, the gamma shape parameter therefore depends on w, and the region on which α(w) > 0 depends on the entries of Σ. We investigate this further in Section C of the supplement, showing that local estimates of α do not vary strongly with w and may reasonably be assumed constant. We also show that results from our model are useful even in the (typically small) regions where α(w) ≤ 0.
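The Gaussian gauge function is straightforward to evaluate; the sketch below (ours, not from the paper) computes g(x) = (x^{1/2})^⊤ Σ^{−1} x^{1/2} and checks its 1-homogeneity, g(tx) = t g(x).

```python
# Evaluate the Gaussian gauge g(x) = sqrt(x)' Sigma^{-1} sqrt(x) and verify
# that it is homogeneous of order 1. Illustrative sketch only.
import numpy as np

def gaussian_gauge(x, sigma):
    root = np.sqrt(x)                        # componentwise square root
    return root @ np.linalg.solve(sigma, root)

sigma = np.array([[1.0, 0.5], [0.5, 1.0]])   # example correlation matrix
x = np.array([0.3, 0.7])
g1 = gaussian_gauge(x, sigma)
g2 = gaussian_gauge(2.0 * x, sigma)          # equals 2 * g1 by 1-homogeneity
print(g1, g2)
```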
Multivariate t_ν distribution We consider the multivariate t distribution with ν degrees of freedom, focusing only on positive dependence; see the supplement for further comment and for the form of the gauge function. Clayton and inverted Clayton copulas We consider the Clayton copula with parameter γ > 0. The Clayton copula has g(x) = Σ_{j=1}^d x_j, with the corresponding form of f_{R|W}(r|w) given in the supplement; the gauge function of the inverted Clayton copula is also given there. In the supplement, we also calculate f_{R|W}(r|w) for a given trivariate vine copula example.
Statistical inference

Calculating the threshold r_0(w)

A natural approach to calculating a high threshold for each R | W = w is quantile regression, treating W as the covariate. A similar approach has been taken in the context of establishing covariate-dependent thresholds in univariate extreme value analysis (Northrop and Jonathan, 2011). When data are bivariate, so that W ∈ S^1 is equivalent to W ∈ [0, 1], this approach is straightforward.
However, standard parametric quantile regression requires a high degree of manual tuning to ensure that the model form captures the relation between R and W well. We therefore suggest using additive quantile regression (Fasiolo et al., 2021) via the corresponding R package qgam.
When W ∈ S d−1 , d > 2, then both parametric and additive quantile regression become more difficult due to the specific support of W on the simplex. A simple alternative is to calculate quantiles of R|W = w from overlapping blocks of W values, which is feasible for relatively low dimensions, but becomes more laborious as d grows. Figure 1 illustrates the concepts for d = 2, 3.
In each case, r_0(w) is calculated as the 0.95 quantile of R | W = w.
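A minimal sketch of the rolling-windows calculation for d = 2 (our code; the function and window settings are illustrative choices, not the paper's):

```python
# Rolling-windows threshold for d = 2: r0(w) is the empirical 0.95 quantile
# of R among observations whose angle W falls in a window around w.
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(size=(5000, 2))      # data in exponential margins
r = x.sum(axis=1)
w = x[:, 0] / r                          # for d = 2, W is equivalent to a scalar in [0, 1]

def rolling_quantile_threshold(w, r, centers, half_width=0.05, tau=0.95):
    """Empirical tau-quantile of R | W in a window around each centre."""
    r0 = np.empty(len(centers))
    for i, c in enumerate(centers):
        in_window = np.abs(w - c) <= half_width
        r0[i] = np.quantile(r[in_window], tau)
    return r0

centers = np.linspace(0.05, 0.95, 19)
r0 = rolling_quantile_threshold(w, r, centers)
```

In practice the window width trades off bias (wide windows smooth over genuine variation in r_0(w)) against variance (narrow windows contain few points).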
In the second row of Figure 1 we demonstrate that the threshold r_0(w), suitably rescaled, can be viewed as a non-/semi-parametric estimate of g. The reason for this can roughly be explained by considering the case where the gamma approximation is exact. Let F̄(r|w) be the (gamma) survival function of R | W = w, so that F̄(r_0(w)|w) = 1 − τ. When the gamma shape parameter is d ∈ N, the survival function has an explicit expression allowing rearrangement to

r_0(w) = [−log(1 − τ) + log Σ_{k=0}^{d−1} {g(w)r_0(w)}^k/k!]/g(w).

Since the log term in the numerator is slowly varying in r_0(w), it may crudely be viewed as "close" to a constant for large τ, giving r_0(w) ≈ C/g(w) and hence ĝ(w) ≈ Ĉ/r_0(w). We will use this observation later to assist with model checking, but note that this idea, combined with extension of additive quantile regression to higher dimensions, presents a very interesting avenue for future work.

Figure 1: Top row: R against W, with the estimated 0.95 quantile of R | W = w in red. In the left and center plots, solid lines represent the output from qgam, and dashed lines rolling-windows quantiles. In the right plot, the surface is calculated through a rolling-windows technique. Bottom row: plots of wr_0(w), rescaled to lie in [0, 1]^d. In the left and center plots, solid lines represent the output from qgam, and dashed lines rolling-windows quantiles. The blue lines are the unit level sets of g(w), with g the true gauge function. In the right plot, the red surface comes from the rolling-windows technique, and the blue surface is the unit level set of the true gauge function.

Likelihood
In order to fit model (5), we use likelihood-based inference. For n_0 independent observations (r_i, w_i), i = 1, . . . , n_0, with r_i > r_0(w_i), the likelihood that we maximize is

L(ψ) = ∏_{i=1}^{n_0} f_Γ(r_i; α, g(w_i; θ))/F̄(r_0(w_i); α, g(w_i; θ)),    (6)

where ψ = (α, θ), f_Γ(·; α, g(w; θ)) denotes the gamma density, and F̄(·; α, g(w; θ)) represents the gamma survival function with shape parameter α and rate parameter g(w; θ). Uncertainty in the maximum likelihood estimators (MLEs) can be obtained through the inverse Hessian matrix. In practice, many datasets exhibit temporal dependence, so that while likelihood (6) may be used for parameter estimation, block-bootstrap techniques will be preferable for estimation of uncertainty.
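The truncated-gamma likelihood can be maximized numerically. The sketch below is ours, with an inverted-logistic-type gauge g(w; θ) = (Σ_j w_j^{1/θ})^θ as an assumed parametric family, fitted to independent exponential data for which the true values are α = d and θ = 1; the constant threshold is exact in this special case.

```python
# Numerical maximization of a truncated-gamma likelihood of the form (6),
# with an assumed inverted-logistic-type gauge family. Illustration only.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(4)
d = 2
x = rng.exponential(size=(20_000, d))    # independent: true gauge g(w) = w1 + w2 = 1
r_all = x.sum(axis=1)
w_all = x / r_all[:, None]

r0 = np.quantile(r_all, 0.95)            # constant threshold; exact here since R ⊥ W
keep = r_all > r0
r, w = r_all[keep], w_all[keep]

def gauge(w, theta):
    # g(w; theta) = (sum_j w_j^(1/theta))^theta  (assumed parametric family)
    return np.sum(w ** (1.0 / theta), axis=1) ** theta

def nll(psi):
    alpha, theta = psi
    scale = 1.0 / gauge(w, theta)        # gamma 'scale' = 1 / rate
    return -np.sum(stats.gamma.logpdf(r, a=alpha, scale=scale)
                   - stats.gamma.logsf(r0, a=alpha, scale=scale))

start = np.array([float(d), 1.0])        # start at the true values (alpha, theta)
fit = optimize.minimize(nll, start, bounds=[(0.1, 20.0), (0.05, 1.0)])
print(fit.x)
```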

Gauge functions from specific distributions
Key to a successful fit of model (5) via likelihood (6) are flexible parametrized forms of g, that are able to capture a wide variety of limit set shapes. In Section 3 we detailed various forms of gauge function that come from different underlying distributions, some of which are illustrated in Section A of the supplement. Further forms can also be found in Nolde and Wadsworth (2022).
Any of these parametric forms could be fitted as a candidate model, and standard model-selection techniques such as information criteria used to establish a best choice; we will demonstrate this in our simulation study of Section 5.
A key attraction of our new approach to inference for multivariate extremes is the ability to capture the complex dependence structures that arise when different sub-groups of variables can potentially be co-extreme while the others are small. Under MRV, this corresponds to the spectral measure placing mass on sets B_C as described in Section 1. In order to capture these scenarios, we consider the gauge function corresponding to the asymmetric logistic distribution (Tawn, 1990), which can place mass on any valid combination of sets B_C. The full expression for this involves minimization over several components, and is given in Section D of the supplement.
Figure 2 depicts some of the potential limit sets arising from this structure when d = 3.

Additively mixing gauge functions
The gauge functions described in Section 3 provide a starting point for inference on model (5), but may not always be flexible enough to capture the structures of observed data. We now consider how to mix gauge functions to generate more flexible models. The limit sets G for data with exponential margins satisfy the condition that the coordinatewise supremum of G is (1, . . . , 1), i.e.,

max{x_j : x ∈ G} = 1, j = 1, . . . , d;    (7)

see Nolde and Wadsworth (2022). Each form of g given in Section 3 satisfies this constraint, and we need any scheme for mixing gauge functions to satisfy this also, since they will be applied to data in exponential margins.
A simple way to mix that retains condition (7) is via minimization: g(x) = min{g^{[1]}(x), . . . , g^{[m]}(x)}, for g^{[1]}, . . . , g^{[m]} each satisfying condition (7). The resulting gauge function is the one that would correspond to a mixture density f_[E](x) = Σ_{k=1}^m π_k f^{[k]}_[E](x) with Σ_{k=1}^m π_k = 1 and π_k ∈ (0, 1) for each k. However, such an approach has the effect of retaining the most protruding part of each limit set and may not yield the most realistic shapes; some examples are given in Section E of the supplement. Instead we focus on additive mixing, defining g̃(x) = Σ_{k=1}^m g^{[k]}(x). The resulting function is denoted by g̃ as in general it will not satisfy condition (7), and will need to be rescaled to do so. Suppose that the coordinatewise supremum of the set G̃ = {x : g̃(x) ≤ 1} is c̃ = (c̃_1, . . . , c̃_d). Then the rescaled gauge function g(x) = g̃(c̃_1 x_1, . . . , c̃_d x_d) satisfies (7). Some examples of limit sets from additively mixed gauge functions are depicted in Figure 3. Interestingly, we observe for d = 2 that this process is able to interpolate between limit sets which have g(1, 1) = 1 and a "pointy" shape, and those with g(1, 1) < 1 and the "blunt" shapes described by Balkema and Nolde (2012); that is, it provides the ability to move between "pointy" shapes representing joint occurrence of extremes in some components and "blunt" shapes representing separate extremes. We focus in the figures only on the case m = 2, and leave theoretical study of this phenomenon for any m to future work.
Note that when using additive mixing, the component gauge functions g^{[k]}(x) need not satisfy condition (7), owing to the rescaling. This allows, for example, one to include the Gaussian gauge function g(x) = (x^{1/2})^⊤ Σ^{−1} x^{1/2} when Σ has negative entries, and increases the flexibility of this approach. In practice, we use numerical methods to rescale.
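The numerical rescaling can be sketched as follows (our illustration for d = 2, with independence- and complete-dependence-type component gauges as example inputs). By 1-homogeneity, the boundary of G̃ in direction w is w/g̃(w), so the coordinatewise supremum is c̃_j = max_w w_j/g̃(w) over directions w on the simplex.

```python
# Additive mixing of two gauge functions and grid-based rescaling so that
# the resulting limit set has coordinatewise supremum (1, 1). Sketch only.
import numpy as np

g_indep = lambda x: x[0] + x[1]             # independence-type gauge
g_dep = lambda x: np.maximum(x[0], x[1])    # complete-dependence-type gauge

def g_add(x):
    # additive mixture (illustrative choice of components)
    return g_indep(x) + g_dep(x)

# boundary of {x : g_add(x) <= 1} in direction w is w / g_add(w)
w1 = np.linspace(1e-3, 1.0 - 1e-3, 2001)
wgrid = np.array([w1, 1.0 - w1])
c = (wgrid / g_add(wgrid)).max(axis=1)      # coordinatewise supremum (c_1, c_2)

def g_rescaled(x):
    # g(x) = g_add(c_1 x_1, c_2 x_2) satisfies condition (7)
    return g_add(np.array([c[0] * x[0], c[1] * x[1]]))
```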

Model checking
We propose checking the fitted model from likelihood (6) via probability-probability (PP) plots.
Comparison with the "empirical" estimate of the gauge function, ĝ(w) ≈ Ĉ/r_0(w), as outlined in Section 4.1, provides another check on the form of the fitted model. As was seen in Section 4.1, while we do not expect perfect correspondence between ĝ(w) and g(w; θ̂), we can expect to see broad similarities in shape. Again we use this in Section 6.
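A PP-plot check can be based on the truncated-gamma probability integral transform: under the fitted model, exceedances mapped through the renormalized CDF should be approximately uniform. A sketch (ours), using the exact parameter values for independent exponential data so that the transform is known:

```python
# PP-plot ingredients for the truncated gamma model: map exceedances through
# the renormalized fitted CDF and compare with uniform quantiles. Sketch only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
d = 2
x = rng.exponential(size=(20_000, d))
r_all = x.sum(axis=1)
r0 = np.quantile(r_all, 0.95)
r = r_all[r_all > r0]

alpha, rate = float(d), 1.0              # true values for this example
cdf = stats.gamma.cdf(r, a=alpha, scale=1.0 / rate)
cdf0 = stats.gamma.cdf(r0, a=alpha, scale=1.0 / rate)
u = (cdf - cdf0) / (1.0 - cdf0)          # ~ Uniform(0, 1) under the model

emp = np.sort(u)                         # empirical probabilities
model = (np.arange(1, len(u) + 1) - 0.5) / len(u)
max_dev = np.max(np.abs(emp - model))    # small deviation indicates good fit
```

Plotting `emp` against `model` gives the PP plot; points close to the diagonal support assumption (5).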

Prediction
A key aspect of our proposed geometric framework for statistical inference is that we can use simulation from the fitted model to estimate probabilities of lying in extreme regions, enabling extrapolation outside the range of the observed data. Up to this point, we have focused on the conditional distribution of R | {W = w, R > r_0(w)}. In order to perform extrapolation and estimate multivariate tail probabilities, we need realizations of the distribution of X in some suitably extreme region. Notationally, it is helpful to introduce an alternative radial variable, R̃ = R/r_0(W), so that the event {R̃ > 1} is equivalent to {R > r_0(W)}. We focus initially on simulating an arbitrary number of points satisfying the conditioning event {R̃ > 1}, and discuss below adaptations for simulating above higher thresholds. To get draws from the distribution of X | R̃ > 1, we require simulation from two components, which are multiplied together: (i) the distribution of W | R̃ > 1; (ii) the distribution of R | {W = w, R > r_0(w)}.
The second of these is a simple case of simulating from the fitted truncated gamma distribution, which can be done exactly via the probability integral transform. For the first, we may either resample from the empirical distribution of W | R̃ > 1, or we could fit a parametric model to such samples and simulate from this. We opt for the former in this work, and note the latter as a potential line of future investigation. Figure 4 shows 5000 draws simulated from X | R̃ > 1, based on a model fitted to 2500 data points.
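The two-component simulation scheme can be sketched as follows (our code; a constant threshold and the true gamma parameters stand in for fitted quantities):

```python
# Simulate from X | Rtilde > 1 in d = 2: (i) resample angles from the empirical
# exceedance angles; (ii) draw truncated-gamma radii via the probability
# integral transform. Illustrative sketch with stand-in "fitted" values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
d = 2
x = rng.exponential(size=(2500, d))
r_all = x.sum(axis=1)
w_all = x[:, 0] / r_all

r0_const = np.quantile(r_all, 0.95)      # constant threshold for this sketch
w_exc = w_all[r_all > r0_const]          # empirical draws of W | Rtilde > 1

alpha, rate = float(d), 1.0              # stand-ins for fitted (alpha, g(w))
n_sim = 5000
w_sim = rng.choice(w_exc, size=n_sim, replace=True)        # component (i)
cdf0 = stats.gamma.cdf(r0_const, a=alpha, scale=1.0 / rate)
u = rng.uniform(size=n_sim)
r_sim = stats.gamma.ppf(cdf0 + u * (1.0 - cdf0),           # component (ii)
                        a=alpha, scale=1.0 / rate)
x_sim = np.column_stack([r_sim * w_sim, r_sim * (1.0 - w_sim)])
```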
To estimate the probability of lying in extreme sets, we exploit the simple equation

Pr(X ∈ B) = Pr(X ∈ B | R̃ > 1) Pr(R̃ > 1),    (8)

for any set B lying entirely within the region {x ∈ R_+^d : Σ_{j=1}^d x_j > r_0(x/Σ_{j=1}^d x_j)}; some examples are given in Figure 4. The first probability on the right-hand side of (8) can be estimated empirically from the simulated draws. The second probability may be estimated from the dataset as the proportion of points R̃ exceeding 1. When quantile regression at level τ has been used to find r_0(w), we expect this proportion to be near 1 − τ.

Figure 4: … and logistic (right) distributions. Models were fitted to 2500 data points shown in green. Light gray squares represent potential sets B in equation (8).
The fact that we can simulate an arbitrary number of points from our model under the condition {R̃ > 1} means that in principle we can extrapolate quite far beyond the observed data. Nonetheless, such an approach may be computationally demanding for very extreme sets that require a large number of simulations. We now consider how to simulate given the condition {R̃ > k}, with k > 1; results will be illustrated in Section 5.
Simulation from the distribution of R | {W = w, R > kr_0(w)} is again straightforward from the fitted truncated gamma distribution. Simulation from the distribution of W | R̃ > k is more challenging if k is sufficiently high that there are few or no empirical samples available. However, we have the relation

f_W(w | R̃ > k) = f_W(w | R̃ > 1) Pr(R̃ > k | W = w, R̃ > 1)/Pr(R̃ > k | R̃ > 1),    (9)

where f_U(· | V > v) denotes the density of a random vector U | V > v. Note that Pr(R̃ > k | W = w, R̃ > 1) = Pr(R > kr_0(w) | W = w, R > r_0(w)), so that under the truncated gamma model (5) for R | {W = w, R > r_0(w)}, we have the approximate proportionality

f_W(w | R̃ > k) ∝̇ f_W(w | R̃ > 1) F̄(kr_0(w); α, g(w))/F̄(r_0(w); α, g(w)).    (10)
Resampling angles with weights proportional to the right-hand side of (10) therefore yields an approximate sample from the distribution of W | R̃ > k, using a sample from the distribution of W | R̃ > 1.
Finally, to estimate Pr(R̃ > k), so that we can calculate extreme probabilities as in equation (8), note that the (approximate) constant of proportionality in (10) is Pr(R̃ > k | R̃ > 1), from the denominator of equation (9). An estimate of this is therefore

P̂r(R̃ > k | R̃ > 1) = (1/n_0) Σ_{i=1}^{n_0} F̄(kr_0(w_i); α̂, g(w_i; θ̂))/F̄(r_0(w_i); α̂, g(w_i; θ̂)),    (11)

where w_i, i = 1, . . . , n_0, are the angles corresponding to the values for which R̃ > 1. Lastly, Pr(R̃ > k) = Pr(R̃ > k | R̃ > 1) Pr(R̃ > 1), where Pr(R̃ > 1) is estimated empirically, as previously. We note that an alternative to this procedure is to fit the generalized Pareto (GP) distribution to R̃ | R̃ > 1 and use this fitted model to estimate Pr(R̃ > k | R̃ > 1). Our investigation found that both options perform similarly for relatively small k, but the GP model introduces extra uncertainty for larger k, and so we retain the first approach in Section 5.
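The weighting steps in (10) and (11) can be sketched as follows (our code; a constant threshold and true parameter values stand in for fitted quantities, so the weights happen to be constant here, but the same code applies with an angle-dependent fitted gauge):

```python
# Weighted resampling of angles for W | Rtilde > k, and the estimate (11) of
# Pr(Rtilde > k | Rtilde > 1), using truncated-gamma survival ratios. Sketch only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
d = 2
x = rng.exponential(size=(20_000, d))
r_all = x.sum(axis=1)
w_all = x[:, 0] / r_all
r0 = np.quantile(r_all, 0.95)            # constant threshold for this sketch
w1 = w_all[r_all > r0]                   # angles with Rtilde > 1

alpha, rate, k = float(d), 1.0, 2.0      # stand-ins for fitted values
# per-angle weights Fbar(k r0(w)) / Fbar(r0(w)), cf. (10)
wts = (stats.gamma.sf(k * r0, a=alpha, scale=1.0 / rate)
       / stats.gamma.sf(r0, a=alpha, scale=1.0 / rate)) * np.ones_like(w1)
p_cond = wts.mean()                      # estimate of Pr(Rtilde > k | Rtilde > 1), cf. (11)
# approximate sample from W | Rtilde > k via weighted resampling
w_k = rng.choice(w1, size=1000, replace=True, p=wts / wts.sum())
print(p_cond)
```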

Simulation study
We now demonstrate the performance of our methods against existing approaches for analyzing multivariate extremes. Our focus lies on estimation of probabilities Pr(X ∈ B) for three sets B that lie in different parts of the region where X may be considered extreme.
We begin with the bivariate case, which is well-established and understood, demonstrating that our methodology gives estimates with low bias in each situation, performing competitively with other methods across a range of scenarios. Specifically, we compare with estimation methodology based on multivariate regular variation (MRV), hidden regular variation (HRV) (Ledford and Tawn, 1997), and the conditional extreme value model (CE) of Heffernan and Tawn (2004). HRV is a refinement of MRV that allows for situations where the spectral measure H places no mass on B_{1,...,d}. Like MRV, however, the asymptotics of this method are suited only to extrapolating into regions where all variables are large simultaneously. We then move on to the more difficult case of d = 3, and show that we can substantially outperform the CE model in this setting, which is the only other approach able to provide an estimate of the probabilities of interest.
In each case we fit model (5) to the data using four different gauge functions: those corresponding to distributions (I)-(III), where the parameter is to be estimated from the data, and the function g(x; θ) = max{(x_1 − x_2)/θ, (x_2 − x_1)/θ, (x_1 + x_2)/(2 − θ)}. We select the model that yields the lowest value of the Akaike information criterion (AIC) for the prediction step, thereby avoiding using knowledge of the true data-generating process. Recall that before fitting model (5), we need to calculate a high threshold r_0(w). In Section 4.1, we described using either additive quantile regression or a rolling-windows quantile calculation for this. We used both techniques in the simulation study, setting τ = 0.95, and found relatively little difference in the performance of the resulting inference, particularly in comparison to differences across extreme-value methodologies.
Therefore, to keep the presentation focused, we detail only the results where r_0(w) was found using the simpler rolling-windows quantile method. Although our focus is on extreme probability estimation, we also display (non-)parametric estimates of G in Section F of the supplement. For the first distribution, estimates of Pr(X ∈ B_1) show little bias; the smallest variance is attributed to MRV, which is as expected since we are looking at a distribution where H places mass on B_{1,2} and estimating a probability in the joint tail. The geometric approach and CE estimate Pr(X ∈ B_2) relatively well, with the smallest variance attributable to the geometric approach based on X | R̃ > k for suitable k > 1. HRV and MRV start to exhibit bias because B_2 lies outside the joint tail region. For Pr(X ∈ B_3), all estimates based on HRV and MRV are equal to zero. For the geometric approach, we are able to estimate this probability well when selecting a suitable k. Specifically, in each repetition, we select one of the largest values of k such that B_3 ⊂ {x : x_1 + x_2 > kr_0(x_1/(x_1 + x_2))}. This results in 100% of probabilities having a non-zero estimate, compared to 0% for HRV/MRV, and 4.5% for CE (at each of two thresholds). A boxplot of this case is included in Section F of the supplement.
For distributions (II) and (III), the geometric approach and CE exhibit quite similar performance, although CE has a smaller variance for estimates of Pr(X ∈ B_2) under distribution (II), and of Pr(X ∈ B_3) under distribution (III). MRV is not an appropriate method for these distributions and always performs badly; HRV is appropriate in the joint tail, where it exhibits similar performance to other methods for (II) and better performance for (III), while it leads to poor estimates in other regions. An additional boxplot in Section F of the supplement displays more detailed information for the estimates of Pr(X ∈ B_1) under distribution (III). As described for distribution (I), we also used a suitable k > 1 for estimating this probability. The geometric approach outperforms CE in this case. This is because, using an appropriate k, we are able to simulate points to generate non-zero estimates of the probabilities (93.5% and 100% of estimates are positive for the two thresholds shown). In contrast, only 43.5% of estimates are positive for CE at each of the two thresholds.

Dimension d = 3
We again perform estimation based on 5000 data points from three different data structures, the first being (I) an asymmetric logistic distribution, for which the spectral measure H places mass on certain faces of the simplex. For the d = 3 case we consider only two methodologies: the geometric approach and CE. We know already that HRV/MRV perform well only when considering sets B in the joint tail, i.e., where all variables are of a similar magnitude; moreover, for MRV we require mass on B_{1,2,3} for good performance of this method. There is therefore relatively little to learn about these methods, and the sets we consider lie outside the region where they may perform well, since they are all small in x_3.
For the geometric approach, we fit model (5) to the data after identifying potentially suitable forms for the gauge function g. For this initial step, we calculate the coefficients τ_C(δ), and associated estimates of the probability of mass on B_C, as in Simpson et al. (2020), for δ = 0.4, 0.5, 0.6 and C ⊆ {1, 2, 3}. These estimates help to identify the faces of the simplex on which the limiting spectral measure H may place mass, and hence a suitable structure for the form of the gauge function.
If all estimates suggest the same extremal dependence structure then a single model is fitted, where the gauge function corresponds to that of the asymmetric logistic distribution for the identified structure. Otherwise, up to three different models are fitted, and the best model selected via AIC.
We note that for distributions (I) and (II) this means that we have the potential to be fitting the correct model form to the data, subject to its identification via the Simpson et al. (2020) methodology, although for distribution (III), we always have a misspecified model.

[Figure 6: Boxplots of estimated probabilities. Green boxplots, labelled G1, G2, give results from our geometric approach: G1 is calculated from X|R > 1; G2 from X|R > k, where k is determined as the maximum value such that all sets B_1, B_2, B_3 lie in the region {x : x_1 + x_2 > k r_0(x_1/(x_1 + x_2))}. Dark blue boxplots, labelled CE1, CE2, give results from the conditional extremes model: CE1 is calculated from X|X_1 > 6.9; CE2 from X|X_1 > 10. Turquoise and purple boxplots, labelled HRV and MRV, are calculated using hidden regular variation and multivariate regular variation, respectively. True values of the probabilities are indicated by horizontal red lines.]

Figure 6 displays boxplots of the estimated probabilities using the two methods. In most cases, the geometric approach exhibits low bias, particularly in comparison to CE, which is typically biased
down for Pr(X ∈ B_1) and up for Pr(X ∈ B_2) and Pr(X ∈ B_3). In conditional extreme value modeling, dependence structures are defined pairwise, so while any pair of variables (X_i, X_j) can theoretically have mass on B_{i,j}, or on B_{i} and B_{j}, the methodology cannot usually capture more complex higher-order structures well. The structure of distribution (III) is the simplest, with only variables X_1, X_2 exhibiting simultaneous extremes, and CE is correspondingly more successful in this case. For Pr(X ∈ B_3) and distributions (I) and (III), additional boxplots are provided in Section F of the supplement. These demonstrate that the geometric approach labeled G3 provides the best estimate in both cases, though it is biased down. In contrast, we can see from Figure 6 that estimates of this probability for distribution (I) are biased strongly upwards for CE, while for distribution (III) only 5.5% of estimates for CE are positive.

Data analyses
We use our new modeling approach to analyze two multivariate environmental datasets. The first is wave data from Newlyn, UK, included because of its extensive previous analysis in the literature.
The second is a set of river flow data from Simpson et al. (2020).

Newlyn wave data
This dataset of 2894 measurements of wave height (meters), surge (meters) and period (seconds), denoted here as (X_H, X_S, X_P), was originally analyzed in Coles and Tawn (1994) using a model that assumed MRV with all mass of the spectral measure on B_{H,S,P}. The full trivariate dataset has subsequently been analyzed by Bortot et al. (2000), who assumed a censored multivariate Gaussian model, and Coles and Pauli (2002), whose model was able to accommodate mass of the spectral measure on some faces of the simplex, but was otherwise quite restrictive.
The first step is to transform each margin to exponential, which is done using a semi-parametric estimate of the distribution function for each variable X_·:

F̂_·(x) = F̃_·(x), x ≤ u_·;   F̂_·(x) = 1 − φ_{u,·} {1 + ξ_·(x − u_·)/σ_·}_+^{−1/ξ_·}, x > u_·,    (11)

where F̃_· is the empirical df, u_· is a high threshold, φ_{u,·} = Pr(X_· > u_·), and the form above u_· is the GP distribution with scale σ_· > 0 and shape ξ_·. We take the thresholds u_H, u_S and u_P to be the 95% quantiles of the respective distributions.
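The semi-parametric marginal transformation just described can be coded directly. Below is a minimal sketch assuming scipy is available, with the GP tail fitted by maximum likelihood to threshold exceedances; the function names are illustrative, not taken from the authors' code.

```python
import numpy as np
from scipy.stats import genpareto

def semiparametric_cdf(x, data, u):
    """Empirical df below the high threshold u; generalized Pareto tail above."""
    phi_u = np.mean(data > u)                         # Pr(X > u)
    xi, _, sigma = genpareto.fit(data[data > u] - u, floc=0)
    ecdf = np.searchsorted(np.sort(data), x, side="right") / (len(data) + 1)
    gp_tail = 1.0 - phi_u * genpareto.sf(x - u, xi, loc=0, scale=sigma)
    return np.where(x <= u, ecdf, gp_tail)

def to_exponential_margins(data, q=0.95):
    """Transform one margin to approximately standard exponential."""
    u = np.quantile(data, q)
    return -np.log(1.0 - semiparametric_cdf(data, data, u))

# illustration: a margin that is already exponential stays roughly exponential
rng = np.random.default_rng(0)
e = to_exponential_margins(rng.exponential(size=2000))
```

Applying this transformation to each of X_H, X_S, X_P with q = 0.95 mirrors the choice of thresholds used here.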
To get an initial idea of the extremal dependence structure, we use the Simpson et al. (2020) methodology, calculating τ_C(δ) for a range of values of δ. These estimates suggest that the spectral measure may place mass on some faces of the simplex, in line with the findings of Bortot et al. (2000) and Coles and Pauli (2002).

[Figure caption fragment: G1 is calculated from X|R > 1; G2 and G3 are calculated from X|R > k_j, j = 1, 2, where k_j is determined as a large value such that the sets B_2 or B_3 lie in the region {x : …}.]
To calculate the threshold r_0(w), we use the rolling-windows procedure described in Section 4.1, with τ = 0.95. We then fit model (5) with three forms for g: (i) the asymmetric logistic gauge function with the structure given by τ_C(δ); (ii) the gauge corresponding to the Gaussian distribution; and (iii) an additive mixture of the Gaussian and asymmetric logistic gauges. The model with the Gaussian gauge is preferred, with approximate Hessian-based standard errors (0.74, 0.12, 0.12, 0.18) for its parameter estimates. Figure 7 displays the PP plot for this fit as described in Section 4.4, as well as the same plot transformed onto the exponential scale to emphasize the upper tail, indicating no lack of fit. We also compare the empirical gauge ĝ(w) and the fitted Gaussian gauge function g(w; θ̂) in the right panel of Figure 7. The empirical gauge is relatively "jagged" and variable due to the manner of its calculation, but there is clear correspondence between its overall shape and that of the fitted gauge. As a further diagnostic, we compare empirical and model-based estimates of the sub-asymptotic joint tail dependence coefficient. For X_j ∼ F_j, this is defined by

χ_C(u) = Pr(F_j(X_j) > u for all j ∈ C) / (1 − u),  u ∈ (0, 1).    (12)

The empirical estimator of χ_C(u) is obtained by replacing each distribution function and joint probability with its empirical counterpart, while the model-based estimate is calculated using simulation from the fitted model as described in Section 4.5, and suitable sets B. In Figure 8 we consider several such coefficients. Finally, we consider analysis of the structure variable outlined in Coles and Tawn (1994). They introduce the overtopping discharge rate Q(v; X_HSP) for a sea wall of height v, in which the value X*_H is introduced to approximate the actual off-shore wave height, since measurements are taken on-shore. The goal is to estimate the sea wall height v_p (in meters) corresponding to a small exceedance probability p; we obtain values for v_p using realizations of V generated through simulation and reverse marginal transformation. As in Bortot et al. (2000), we fix a_1 = 0.25, a_2 = 26, and l = 4.3.
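The empirical estimator of χ_C(u) described above amounts to counting joint exceedances of marginal quantiles. A short numpy sketch, with marginal dfs replaced by scaled ranks (our own illustrative code, not the authors'):

```python
import numpy as np

def chi_empirical(data, C, u):
    """Empirical chi_C(u) = Pr(F_j(X_j) > u for all j in C) / (1 - u).

    data : (n, d) array; C : list of column indices; u in (0, 1).
    Marginal distribution functions are replaced by scaled ranks.
    """
    n = data.shape[0]
    U = np.argsort(np.argsort(data, axis=0), axis=0) / (n + 1.0)  # rank transform
    joint = np.all(U[:, C] > u, axis=1).mean()
    return joint / (1.0 - u)

# sanity checks: complete dependence gives values near 1,
# while an independent pair gives values near 1 - u in expectation
rng = np.random.default_rng(0)
z = rng.normal(size=1000)
chi_dep = chi_empirical(np.column_stack([z, z]), [0, 1], 0.9)
chi_ind = chi_empirical(rng.normal(size=(1000, 2)), [0, 1], 0.9)
```

For the model-based counterpart, `data` would be replaced by a large simulated sample from the fitted model, as described in Section 4.5.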
Figure 8 displays the obtained values v_p, with empirical quantiles and those calculated from fitting the GP distribution directly to the tail of V (the so-called "structure variable approach") shown for comparison. For very small p, the return levels obtained from the geometric model are larger than those from the GP fit. They are comparable to those obtained in Bortot et al. (2000), but much lower than those in Coles and Tawn (1994), whose model incorrectly assumes that H places mass on B_{H,S,P}.

River flow data
We now apply our modeling approach to 12,327 measurements of daily mean river flow (m³/s) from four gauging stations in the north west of England. The data were previously explored in Simpson et al. (2020), where the focus lay on determining the support of the spectral measure, but not on subsequent modeling of the variables. We consider four of the five locations originally used in order to keep the number of parameters reasonable; further discussion of dimensionality can be found in Section 7. The four stations, labeled 1, 2, 3, 4, correspond to those labeled A, B, C, D in Simpson et al. (2020).
Margins are standardized using equation (11). We then use the Simpson et al. (2020) methodology, which suggests that the spectral measure may place mass on several faces of the simplex S^3, including B_{1,2,3,4}. We fit the model with the corresponding asymmetric logistic gauge function, obtaining an AIC of 2666, as well as the model with the Gaussian gauge function and an additive mixture of the two; the resulting AIC values are 2601 and 2609, respectively. Once again, the model with the Gaussian gauge is preferred, in apparent conflict with the estimated structure of the spectral measure, though we note the latter is also subject to uncertainty. Parameter estimates are (α̂, θ̂) = (2.46, 0.83, 0.90, 0.80, 0.90, 0.57, 0.62), with approximate standard errors (0.62, 0.11, 0.14, 0.14, 0.14, 0.16, 0.14), found via a block bootstrap with block length 20. The asymmetric logistic gauge, while able to capture the structure of different groups of variables being co-extreme, appears too inflexible to capture other aspects of the dependence. The additively mixed model attempts to alleviate this problem, but introduces a large number of parameters without a sufficient improvement in fit to compensate for them. Figure 9 displays the coefficients χ_123(u), χ_134(u) and χ_1234(u), defined analogously to (12). If H places mass on B_{1,2,3,4}, then each of these coefficients has a positive limit as u → 1; at observable levels, the model-based estimates from the Gaussian gauge represent a good fit to the data. Plots of χ_C(u) for the remaining groups of variables are given in Section G of the supplement, along with the PP plot, showing no lack of fit.
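The block bootstrap standard errors reported above can be computed along the following lines. This is a generic sketch with a placeholder fitting function; in the paper, `fit_fn` would refit the geometric model to each bootstrap sample, which we do not reproduce here.

```python
import numpy as np

def block_bootstrap_se(data, fit_fn, block_len=20, n_boot=200, seed=0):
    """Standard errors of parameter estimates via an overlapping-block
    bootstrap, appropriate for serially dependent data.

    fit_fn : maps an (n, d) array to a 1-D array of parameter estimates.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    n_blocks = int(np.ceil(n / block_len))
    estimates = []
    for _ in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        idx = np.concatenate([np.arange(s, s + block_len) for s in starts])[:n]
        estimates.append(fit_fn(data[idx]))
    return np.std(np.asarray(estimates), axis=0, ddof=1)

# illustrative use: standard error of a sample mean
se = block_bootstrap_se(np.random.default_rng(1).normal(size=(1000, 1)),
                        fit_fn=lambda x: x.mean(axis=0))
```

Blocks of length 20 preserve short-range temporal dependence in daily river flows within each resampled series.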

Discussion
We have presented a new approach to multivariate extreme value modeling, based on estimation of the shape of the limit set of a sample cloud of data points in light-tailed margins. The methodology allows for modeling datasets with complicated extremal dependence structures, whereby different groups of variables may be co-extreme, as well as extrapolation into parts of the multivariate tail where only some variables are large. By offering models for complex dependence structures with non-simultaneous extremes, this approach paves the way for more useful higher-dimensional extreme value modeling. Recent literature on multivariate extremes targeted at higher dimensions typically involves strong simplifying assumptions on the dependence structure. For example, the extremal graphical models outlined in Engelke and Hitz (2020) require the assumption that the spectral measure H places all mass on B_{1,...,d}.
In this work, we demonstrated the methods up to dimension d = 4. The main challenges for routine application of the methods for d larger than 3 or 4 lie in the calculation of the threshold function r_0(w) and the specification of flexible gauge functions. The former could potentially be addressed by adapting the additive quantile regression approach of Fasiolo et al. (2021) to incorporate basis functions whose support is the simplex S^{d−1}. Addressing the latter challenge requires ways to build flexible and parsimonious gauge functions, which is a topic of current work. In particular, we note that the models fitted in Section 6 had the ability to capture the complex structures suggested by the Simpson et al. (2020) methodology, but the best fits were obtained through models that were more flexible in other respects.
We have focused here primarily on positive dependence, since it is common in many datasets and simplifies the presentation. For datasets exhibiting any form of negative dependence, the limit set shapes are more descriptive in Laplace, rather than exponential, margins. For example, we mentioned for the multivariate Gaussian case that the continuous convergence to g(x) fails when some component of x is zero; this is not an issue in Laplace margins, where the limit set lies in the region [−1, 1]^d rather than [0, 1]^d, and similarly for the t_ν distribution. Moving from the positive quadrant to R^d requires defining the angles W differently, but otherwise a similar approach could be applied, and this represents a natural next step in developing the methodology.

A Example limit sets
Figures 10 and 11 display example illustrations of limit sets in exponential margins for three dependence structures in dimension d = 2 and d = 3 respectively. Equations for the gauge functions of these limit sets can be found in Section 3 of the main paper, or Section B of the supplement.

B Gauge functions and conditional distributions of R|W = w
Here we provide detailed calculations of the gauge functions, from which we can establish the asymptotic behavior of f_{R|W}(r|w) as r → ∞. In each case, we begin with the relevant density in exponential margins and calculate the asymptotic behavior of f^{[E]}(tx) as t → ∞; this is subsequently used to establish results for f_{R|W}(r|w) as r → ∞. We recall the notation for ordered values.

Logistic distribution
The d-dimensional logistic distribution with unit Fréchet margins has density defined through the exponent function V : R_+^d → R_+, which is homogeneous of order −1; for the logistic distribution with parameter γ ∈ (0, 1], this is given by V(z) = (z_1^{−1/γ} + · · · + z_d^{−1/γ})^γ. Expanding the transformation to exponential margins to give the asymptotic behavior for large x_j yields z_j(x_j) = e^{x_j} + 1/2 + O(e^{−x_j}), and we can therefore express the density in exponential margins in terms of this expansion. First consider the contribution exp{−V(e^{tx} + 1/2 + O(e^{−tx}))}, which is handled by the homogeneity of V. Next consider the partial derivatives V_s(z), whose leading-order behavior follows in the same way. Combining all of these results yields the asymptotic form of f^{[E]}(tx), and the gauge function comes from taking g(x) = lim_{t→∞} −t^{−1} log f^{[E]}(tx).

Negative logistic MGPD
MGPDs have marginal scale and shape parameters; when these are all set to 1 and 0 respectively, the marginal distributions are exponential conditionally upon being positive. That is, if Z follows an MGPD with unit-scale and zero-shape parameters, the marginal distribution functions F_j(z) are exponential above zero, with c_j = Pr(Z_j ≤ 0), and to translate to exponential margins we solve the corresponding equation for each margin. Following calculations in Kiriliouk et al. (2019), the unit-scale zero-shape MGPD density associated with the negative logistic max-stable distribution can be written explicitly, and on the region where all components are positive this yields the gauge function. Outside of this region we require knowledge of the distribution of Z_j | Z_j < 0, j = 1, . . . , d, which is harder to summarize in general.

Dirichlet MGPD
In this case the density again has an explicit form, and on the region where all components are positive the gauge function follows as in the negative logistic case.

Inverted max-stable distributions
Recall that the density is expressed through the exponent function l(x) and its partial derivatives. The derivatives l_s(x) are homogeneous of order 1 − |s|, so the leading-order term in the summation over partitions comes from the partition π = {{1}, {2}, . . . , {d}}, with |s| = 1 for all s ∈ π. Second-order behavior comes from the d partitions containing one set with |s| = 2 and all others with |s| = 1. We therefore have g(x) = l(x).

Multivariate Gaussian distribution
The multivariate Gaussian density with Gaussian margins involves the precision matrix Ω = (ω_{jk})_{j,k} = Σ^{−1}. Transforming to exponential margins introduces the terms z_j(x_j), found by solving x_j = −log{1 − Φ(z_j)}, with φ, Φ the standard univariate normal density and df, respectively. Since we are interested in tx_j, t → ∞, we exploit Mills' ratio for the solution; dropping the component index and writing z_t = z(tx), this gives an expansion for z_t that can be rearranged as required. We first deal with the Jacobian expression in density (13), again via Mills' ratio, and then consider the terms in the quadratic form of the exponent, whose expansion involves a term k(x) that does not depend on t. Putting everything together, with g(x) = (x^{1/2})^⊤ Σ^{−1} x^{1/2}, we observe that the conditional distribution of R | W = w has the gamma form, but with shape parameter α(w) = d/2 + (w^{1/2})^⊤ Σ^{−1} w^{−1/2}/2, rather than d as in all other examples calculated here.
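The Gaussian gauge g(x) = (x^{1/2})^⊤ Σ^{−1} x^{1/2} can be checked numerically against the defining limit g(x) = lim_{t→∞} −t^{−1} log f^{[E]}(tx). A sketch in d = 2, assuming scipy is available (our own check, not the authors' code):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def gauss_gauge(x, Sigma):
    """Closed-form Gaussian gauge g(x) = (x^{1/2})' Sigma^{-1} x^{1/2}."""
    s = np.sqrt(x)
    return float(s @ np.linalg.solve(Sigma, s))

def log_density_exp_margins(x, Sigma):
    """Log density of the Gaussian copula in standard exponential margins."""
    z = norm.isf(np.exp(-x))                  # z_j = Phi^{-1}(1 - e^{-x_j})
    log_jac = np.sum(-x - norm.logpdf(z))     # dz_j/dx_j = e^{-x_j} / phi(z_j)
    return multivariate_normal.logpdf(z, mean=np.zeros(len(x)), cov=Sigma) + log_jac

rho = 0.5
Sigma = np.array([[1.0, rho], [rho, 1.0]])
x = np.array([1.0, 1.0])
approx = -log_density_exp_margins(200.0 * x, Sigma) / 200.0
exact = gauss_gauge(x, Sigma)  # (x1 + x2 - 2 rho sqrt(x1 x2)) / (1 - rho^2)
```

The finite-t value differs from g(x) only by O(log t / t) terms, consistent with the gamma form of R | W = w derived above.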
Multivariate t_ν distribution (positive dependence)
The multivariate t distribution with ν degrees of freedom exhibits both positive and negative dependence. After transformation to exponential margins, the negative dependence manifests in the limit set through sections on the planes {x_j = 0}, j = 1, . . . , d. We focus here on the shape of the limit sets for x > 0 only, which captures the positive dependence in the tail.
The density with centered t_ν margins and dispersion matrix Σ involves Ω = (ω_{jk})_{j,k}, the inverse dispersion matrix. Transforming to exponential margins introduces z_j(x_j), the solution to x_j = −log{1 − F_{T_ν}(z_j)}, where f_{T_ν}, F_{T_ν} denote the marginal density and df of the t_ν distribution. Again, we are interested in large values of x_j and z_j: Soms (1976) gave an expansion for the ratio of the univariate survival function to the density, from which the required growth of z_t = z(tx) can be deduced; dropping the component index, this involves a constant c depending on ν, and can be rearranged in terms of a new constant c′. To find the asymptotic behavior of (14), we first consider the Jacobian term, then the kernel, and combine the two expressions.

Clayton and inverted Clayton copulas
The Clayton copula with parameter γ > 0 has distribution function in uniform margins C(u) = (u_1^{−γ} + · · · + u_d^{−γ} − d + 1)^{−1/γ}, with corresponding density c(u). The gauge function is therefore g(x) = x_1 + · · · + x_d. If the random vector U follows a Clayton copula with uniform margins, then the random vector 1 − U follows an inverted Clayton copula with uniform margins; its density is f^{[U]}(1 − u).
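The claim g(x) = x_1 + · · · + x_d for the Clayton copula can likewise be verified numerically in d = 2, evaluating the copula density in log space. This is our own check, with γ = 0.5 chosen for illustration:

```python
import numpy as np

def log_clayton_density_exp(x, gamma):
    """Log density, in standard exponential margins, of the bivariate Clayton
    copula c(u, v) = (1 + g) (u v)^{-(1+g)} (u^{-g} + v^{-g} - 1)^{-1/g - 2}."""
    u = 1.0 - np.exp(-x)                        # exponential -> uniform margins
    log_c = (np.log1p(gamma)
             - (1.0 + gamma) * np.sum(np.log(u))
             - (1.0 / gamma + 2.0) * np.log(np.sum(u ** -gamma) - 1.0))
    return log_c + np.sum(-x)                   # Jacobian of the margin transform

t = 50.0
x = np.array([1.0, 1.0])
approx = -log_clayton_density_exp(t * x, gamma=0.5) / t
# gauge g(x) = x1 + x2 = 2 here, so approx should be close to 2
```

The discrepancy at finite t is of order log(1 + γ)/t, since the copula density converges to the constant 1 + γ in the joint upper tail.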
In exponential margins the density of the inverted Clayton copula follows by substitution, and the gauge function is derived in the same way.

Vine copula from Nolde and Wadsworth (2022)
Nolde and Wadsworth (2022) give an example of a gauge function derived from a particular vine copula construction. Vine copulas are specified by pairs of bivariate copulas: in this case we take the two base pairs to be independence (between (X_1, X_2)) and inverted Clayton with parameter β > 0 (between (X_2, X_3)), and use the inverted Clayton with parameter γ > 0 to model the dependence between (X_3 | X_2, X_1 | X_2).
Let c_{1,2}, c_{2,3} and c_{1|2,3|2} denote the densities of the respective copulas in standard uniform margins; the joint density in exponential margins is the product of these densities with the marginal terms. For the copula densities, c_{1,2} = 1, c_{2,3} is the inverted Clayton density with parameter β, and c_{1|2,3|2} is of the same form but with parameter γ. Combining all components and simplifying yields the gauge function.

C Multivariate Gaussian case
In this section we consider the conditional distribution R | W = w for the multivariate Gaussian dependence structure in more detail. In Section 3 of the main paper, and Section B of the supplement, the asymptotic form of this distribution is shown to be gamma, with shape parameter given by a function of the angle w,

α(w) = d/2 + (w^{1/2})^⊤ Σ^{−1} w^{−1/2}/2.

There are two concerns with this shape parameter: (i) whether its variation with w indicates the need for a more complex model than that in equation (5), where the shape is assumed constant, and (ii) the fact that α(w) ≤ 0 for some values of w. We investigate these issues in turn, using a covariance matrix with a single common correlation ρ across all pairs of variables. Figures 12 and 13 display local estimates of the shape parameter α under the truncated gamma model (5) for R | {W = w, R > r_0(w)}, for d = 2, 3 respectively. In each case the angles w are restricted to a small section of the simplex, and the rate parameter is fixed at the "true" value g(w). For comparison, we also perform the same procedure for the logistic and inverted logistic dependence structures, over a variety of dependence strengths.
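The local estimates in Figures 12 and 13 require maximizing a truncated gamma likelihood with the rate held fixed. A sketch of that one-parameter fit, assuming scipy is available; the simulated data at the end is only a sanity check of our own, not the paper's experiment:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma as gamma_dist

def fit_truncated_gamma_shape(r, r0, rate):
    """MLE of the shape alpha in a Gamma(alpha, rate) model for R,
    truncated below at r0, with the rate parameter held fixed."""
    def nll(alpha):
        if alpha <= 0:
            return np.inf
        logpdf = gamma_dist.logpdf(r, a=alpha, scale=1.0 / rate)
        log_tail = gamma_dist.logsf(r0, a=alpha, scale=1.0 / rate)
        return -(np.sum(logpdf) - len(r) * log_tail)  # truncated log-likelihood
    res = minimize_scalar(nll, bounds=(1e-3, 50.0), method="bounded")
    return res.x

# sanity check: truncate a Gamma(shape 3, rate 2) sample at its 0.9 quantile
rng = np.random.default_rng(0)
sample = rng.gamma(shape=3.0, scale=0.5, size=200000)
r0 = np.quantile(sample, 0.9)
alpha_hat = fit_truncated_gamma_shape(sample[sample > r0], r0, rate=2.0)
```

In the local-estimation setting, `r` would be the radii of points whose angles fall in a small window around w, with `rate` fixed at the true gauge value g(w).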
For the Gaussian case, we observe that the estimates α̂ remain relatively constant on the simplex S^{d−1} in practice for d = 2, 3. In particular, the shape parameter estimates vary no more with the angle w than in the logistic and inverted logistic settings. The evidence suggests that for many distributions the shape parameter will vary with the angle w to some degree in practice. This is likely because the rate of convergence of R | W = w to the gamma form can depend on w either explicitly (as in the logistic case) or through a constant term (as in the inverted logistic case). When dependence is strong in the inverted logistic distribution, there are large parts of the simplex with insufficient data to estimate the local model.
The second issue with the shape parameter is that there will be values of w on the simplex S^{d−1} such that α(w) ≤ 0. Figure 14 illustrates these regions, demonstrating that they appear to be small in volume, and hence unimportant, especially as the dimension d increases. In both the d = 2 and d = 3 cases, the region is almost negligible for ρ < 0.7. Furthermore, simulations from our models rarely produce points in these regions, because when dependence is strong there are very few points W near these boundaries that are accompanied by large values of R.

[Caption for Figures 12 and 13: Gaussian data and gauge function are used for column 1, logistic in column 2, inverted logistic in column 3. From top to bottom, dependence parameters are such that joint dependence is increasing. Dots represent point estimates, and outer solid lines approximate 95% pointwise confidence intervals.]

Finally, we offer evidence that extrapolation and probability estimation in these potentially problematic regions is not an issue when compared to other contemporary multivariate extremes methods. In both the d = 2 and d = 3 settings, we fix ρ = 0.7 and consider rectangular sets that overlap with the regions where α(w) ≤ 0. For d = 2 we estimate the probability of lying in the set (5, 7) × (0, 0.75), and for d = 3 the probability of lying in the set (5, 10) × (0, 2) × (0, 2) (see Figure 15). The resulting probability estimates are presented in the boxplots of Figure 16, where true probability values are indicated by a red dashed line. We compare our geometric approach to the conditional extremes model of Heffernan and Tawn (2004). For d = 3 the results are comparable in terms of bias and variance, while for d = 2 our method is unbiased but with slightly higher variability than conditional extremes.

D Gauge function corresponding to the asymmetric logistic distribution
The asymmetric logistic distribution (Tawn, 1990) is a max-stable distribution with exponent function

V(z) = Σ_{C ∈ 2^D \ ∅} { Σ_{j ∈ C} (θ_{j,C}/z_j)^{1/γ_C} }^{γ_C},    (15)

where 2^D \ ∅ is the power set of D = {1, . . . , d}, excluding the empty set. The parameters satisfy γ_C ∈ (0, 1], and for each j there is a marginal condition Σ_{C ∈ 2^D \ ∅} θ_{j,C} = 1, with θ_{j,C} = 0 if j ∉ C. Notice that for singleton sets the values γ_{{1}}, . . . , γ_{{d}} are irrelevant; for convenience we assume that each of these equals 1.
The parameters θ_{j,C} play no role in determining the structure of the gauge function, except where they lead to certain sets of variables being discounted as not taking extreme values simultaneously. We therefore consider a modification of V that allows us to derive the associated gauge function in a simpler manner. Define

V*(z) = Σ_{C ∈ 2^D \ ∅} θ_C { Σ_{j ∈ C} z_j^{−1/γ_C} }^{γ_C},    (16)

with γ_C as before, and θ_C = 1 if the variables in C can be simultaneously extreme, θ_C = 0 if they cannot.
In other words, θ_C = 1 when the corresponding spectral measure H places mass on B_C. For each variable j = 1, . . . , d there must be at least one C ∋ j with θ_C = 1. The distribution function exp{−V*(e^x)} is a multivariate max-stable distribution with asymmetric logistic type dependence and Gumbel margins with non-zero location and unit scale. Distributions with unit-scale Gumbel margins have the same limit sets as distributions with exactly unit exponential margins (Nolde and Wadsworth, 2022). There is one further difference between the models defined by exponent functions (15) and (16): when θ_C = 0 in (16), this "switches off" the group of variables corresponding to C, with no effect on other groups. With (15), if θ_{j,C} = 0 for a single j ∈ C, then the set of variables that can be simultaneously extreme corresponds to C \ {j}, meaning that there could be two (or more) different γ parameters corresponding to the same set of variables. The function V* in (16) is restricted to a single γ parameter per group of variables.
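Assuming V* takes the form V*(z) = Σ_C θ_C (Σ_{j∈C} z_j^{−1/γ_C})^{γ_C} (this exact form is our reading of the description above, with θ_C ∈ {0, 1} acting as on/off indicators), a direct implementation is:

```python
import numpy as np

def V_star(z, gamma, theta):
    """Modified asymmetric logistic exponent function (assumed form:
    V*(z) = sum_C theta_C * (sum_{j in C} z_j^{-1/gamma_C})^{gamma_C}).

    z     : length-d array of positive values.
    gamma : dict mapping non-empty subsets C (frozensets of indices) to gamma_C.
    theta : dict mapping subsets to 1 (mass on B_C) or 0 (no mass).
    """
    total = 0.0
    for C, g in gamma.items():
        if theta.get(C, 0):  # theta_C = 0 "switches off" the group C
            idx = sorted(C)
            total += np.sum(z[idx] ** (-1.0 / g)) ** g
    return total

# d = 2 example: mass on B_{1,2} only, with gamma = 0.5
gamma = {frozenset({0, 1}): 0.5}
theta = {frozenset({0, 1}): 1}
val = V_star(np.array([1.0, 1.0]), gamma, theta)  # (1 + 1)^0.5 = sqrt(2)
```

Homogeneity of order −1 (V*(tz) = V*(z)/t) is preserved by every term in the sum, which is what makes exp{−V*(e^x)} max-stable.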
We define the collection of sets C_J to be all of those containing the index set J ⊂ {1, . . . , d}. The partial derivatives of V*(z) can then be computed for each relevant index set. The density of the distribution in non-centered Gumbel margins is

exp{−V*(e^{tx})} e^{t Σ_{j=1}^d x_j} Σ_{π ∈ Π} Π_{s ∈ π} {−V*_s(e^{tx})},

and the gauge function is derived through the minimum of all non-zero terms coming from the partial derivatives. To this end, we define C⁺_J to be the collection of all sets C that contain the index set J and for which θ_C = 1. For example, when d = 3, θ_{{1}} = θ_{{1,2,3}} = 0 and all other θ_C = 1, then C⁺_{{1}} = {{1, 2}, {1, 3}}. For d = 3 this yields an explicit expression for g(x), and from the derivation and form of this gauge function we can observe the general form for any dimension d.

E.2 Additive mixing
Some examples of shapes obtainable by additively mixing gauge functions with d = 2 are presented in Figure 3 of the main manuscript. Figure 18 displays some examples with d = 3.

F Additional simulation study figures
Figure 19 shows examples of the three datasets for the d = 2 simulation study, and the three sets of interest B_1 = (10, 12) × (10, 12), B_2 = (10, 12) × (6, 8), and B_3 = (10, 12) × (2, 4). Figure 20 displays estimates of the limit set shape via non-parametric estimation of g using rolling-windows quantiles, as described in Section 4.1 of the main paper, and parametric estimation from the MLE of the gauge function parameters. We display parametric estimates using both all fits from the correct gauge function, and only the fits where the correct gauge function returned the minimum AIC; for distributions (I), (II) and (III), the latter occurs in 82.5%, 41.5% and 35% of repetitions, respectively. The non-parametric estimates do not quite join the axes because we ascribe the rolling-windows estimate of r_0(w) to the center of each window for w.

Figure 21 displays boxplots relating to Figure 5 of the main manuscript. The left panel shows estimates of Pr(X ∈ B_3) for distribution (I), on a log scale for clarity; these were obtained using a threshold k as described in Section 5.1. The right panel displays more detailed information for the estimates of Pr(X ∈ B_1) under distribution (III).

[Figure 21 caption fragment: … approach at a high threshold as described in the text. Right: estimates of Pr(X ∈ B_1) for distribution (III) using two different thresholds for the geometric approach (green); estimates for two different thresholds from the conditional approach are in dark blue, and from HRV in turquoise.]