Parametric Integration by Magic Point Empirical Interpolation

We derive analyticity criteria for explicit error bounds and an exponential rate of convergence of the magic point empirical interpolation method introduced by Barrault et al. (2004). Furthermore, we investigate its application to parametric integration. We find that the method is well-suited to Fourier transforms and has a wide range of applications in such diverse fields as probability and statistics, signal and image processing, physics, chemistry and mathematical finance. To illustrate the method, we apply it to the evaluation of probability densities by parametric Fourier inversion. Our numerical experiments display convergence of exponential order, even in cases where the theoretical results do not apply.


Introduction
At the basis of a large variety of mathematical applications lies the computation of parametric integrals of the form where Ω ⊂ C d is a compact integration domain, P a compact parameter set, (Ω, A, µ) a measure space with µ(Ω) < ∞ and h : P × Ω → C a bounded function. A prominent class of such integrals are parametric Fourier transforms (2) f q (z) = Ω e i z,x f q (x) dx for all p = (q, z) ∈ P with a parametric family of complex functions f q with compact support Ω. Today, Fourier transforms lie at the heart of applications in optics, electric engineering, chemistry, probability, partial differential equations, statistics and finance. The application of Fourier transforms even sparked many advancements in those disciplines, underlining its impressive power. The introduction of Fourier analysis to image formation theory in Duffieux (1946) marked a turning point in optical image processing, as outlined for instance in Stark (1982). The application of Fourier transform to nuclear magnetic resonance in Ernst and Anderson (1966) was a major breakthrough in increasing the sensitivity of NMR, for which Richard R. Ernst was awarded the Nobel prize in Chemistry in 1991. Fourier analysis also plays a fundamental role in statistics, in particular in the spectral representation of time series of stochastic data, see Brockwell and Davis (2002). For an elucidation of other applications impacted by Fourier theory we recommend appendix 1 of Kammler (2007). For parametric Fourier integrals of form (2) with fixed value of q and z being the only varying parameter, efficient numerical methods have been developed based on discrete Fourier transform (DFT) and fast Fourier transform (FFT). The latter was developed by Cooley and Tukey (1965) to obtain the Fourier transform f (z) for a large set of values z simultaneously. In regard to (2), this is the special case of a parametric Fourier transform where q is fixed and z varies over a specific set. The immense impact of FFT highlights the exceptional usefulness of efficient methods for parametric Fourier integration.
Shifting the focus to other examples, parametric integrals arise as generalized moments of parametric distributions in probability, statistics, engineering and computational finance. These expressions often appear in optimization routines, where they have to be computed for a large set of different parameter constellations. Prominent applications are model calibration and statistical learning algorithms based on moment estimation, regression and Expectation-Maximization (EM), see for instance Hastie et al. (2009).
The efficient computation of parametric integrals is also a cornerstone of the quantification of parameter uncertainty in generalized moments of form (1), which leads to an integral of the form where P is a probability distribution on the parameter space P.
In all of these cases, parametric integrals of the form (1) have to be determined for a large set of parameter values. Therefore, efficient algorithms for parametric integration have a wide range of applicability. In the context of reduced basis methods for parametric partial differential equations, a magic point empirical interpolation method has been developed by Barrault et al. (2004) to treat nonlinearities that are expressed by parametric integrals. The applicability of this method to the interpolation of parametric functions has been demonstrated by Maday et al. (2009).
In this article, we focus on the approximation of parametric integrals by magic point empirical interpolation in general, a method that we call magic point integration and -provide sufficient conditions for an exponential rate of convergence of magic point interpolation and integration in its degrees of freedom, accompanied by explicit error bounds, -translate these conditions to the special case of parametric Fourier transforms, which shows the broad scope of the results, -empirically prove efficiency of the method in a numerical case study. 1 The remainder of the article is organized as follows. In section 2 we revisit the magic point empirical interpolation method from Barrault et al. (2004) and present the related integral approximation, which can be perceived from two different perspectives. On the one hand it delivers a quadrature rule for parametric integrals and, on the other, an interpolation method for parametric integrals in the parameter space. In section 3 we provide analyticity conditions on the parametric integrals that imply exponential order of convergence of the method. We focus on the case of parametric Fourier transforms in the subsequent section 4. In a case study we discuss the numerical implementation and its results in section 5. Here, we use the magic point integration method to evaluate densities of a parametric class of distributions that are defined through their Fourier transforms.

Magic Point Empirical Interpolation for Integration
We now introduce the magic point empirical interpolation method for parametric integration to approximate parametric integrals of the form (1). Before we closely follow Maday et al. (2009) to describe the interpolation method, let us state our basic assumptions that ensure the well-definedness of the iterative procedure.
Assumptions 2.1. Let (Ω, . ∞ ) and (P, . ∞ ) be compact, P × Ω ∋ (p, z) → h p (z)bounded and p → h p be sequentially continuous, i.e. for every sequence p i → p we have h p i − h p ∞ → 0. Moreover, there exists p ∈ P such that the function h p is not constantly zero.
For M ∈ N, the method delivers iteratively the magic points z * 1 , . . . , z * M , functions θ M 1 , . . . , θ M M and constructs the magic point interpolation operator which allows us to define the magic point integration operator In the first step, M = 1, we define the first magic parameter p * 1 , the first magic point z * 1 and the first basis function q 1 by Notice that thanks to Assumptions 2.1, these operations are well-defined.
For M ≥ 1 and 1 ≤ m ≤ M we define where we denote by (B M ) −1 jm the entry in the jth line and mth column of the inverse of matrix B M . By construction, B M is a lower triangular matrix with unity diagonal and is thus invertible.
Then, recursively, as long as there are at least M linearly independent functions in {h p | p ∈ P}, the algorithm chooses the next magic parameter p * M according to a greedy procedure. It selects the parameter so that h p * M is the function in the set {h p | p ∈ P} which is worst represented by its approximation with the previously identified M − 1 magic points and basis functions. So the M th magic parameter is In the same spirit, select the M th magic point as The M th basis function q M is the residual r M , normed to 1 when evaluated at the new magic point, i.e.
Notice the well-defined of the operations in the iterative step thanks to Assumptions 2.1 and the fact that the denominator in (13) is zero only if all functions in {h p | p ∈ P} are perfectly represented by the interpolation I M −1 , in which case they span a linear space of dimension M − 1 or less.
Remark 2.2 (Empirical quadrature rule for integrating parametric functions). Magic point integration is an interpolation method for integrating a parametric family of integrands over a compact domain. From this point of view, the magic point empirical interpolation of Barrault et al. (2004) provides a quadrature rule for integrating parametric functions. The weights are Ω θ M m (z) dz and the nodes are the magic points z * m for m = 1, . . . , M . As in adaptive quadrature rules, these quadrature nodes are not fixed in advance. Whereas adaptive quadrature rules iteratively determine the next quadrature node in view of integrating a single function, the empirical integration method (5) tailors itself to the integration of a given parametrized family of integrands.
Remark 2.3 (Interpolation of parametric integrals in the parameters). For each m = 1, . . . , M the function θ M m is a linear combination of snapshots h p * j for j = 1, . . . , M with coefficients β m j , which may be iteratively computed. We thus may rewrite formula (5) as Thus, for an arbitrary parameter p ∈ P, the integral Ω h p (z) dz is approximated by a linear combination of integrals Ω h p * j (z) dz for the magic parameters p * j . In other words, I M (h) is an interpolation of the function p → Ω h p (z) dz. Here, the interpolation nodes are the magic parameters p * j and the coefficients are given by In contrast to classic interpolation methods, the "magic nodes" are tailored to the parametric set of integrands. As a striking advantage of this approach, the interpolation nodes are chosen independently of the dimensionality of the parameter space.
A discrete approach to empirical interpolation has been introduced by Chaturantabut and Sorensen (2010). Whereas the empirical interpolation is designed for parametric functions, the input data for the Discrete Empirical Interpolation Method (DEIM) is discrete. A canonical way to use DEIM for integration is to first choose a fixed grid in Ω for a discrete integration of all parametric integrands. Then, DEIM can be performed on the integrands evaluated on this grid.
In contrast, using magic point empirical interpolation for parametric integration separates the choice of the integration grid from the selection of nodes p * m . For each function, a different integration discretization may be used. Indeed, in our numerical study, we leave the discretization choices regarding integration to Matlab's quadgk routine. Fixing a grid for discrete integration beforehand might for example become advantageous when the domain of integration Ω is high-dimensional.

Convergence Analysis of magic point integration
Theorem 2.4 in Maday et al. (2009) relates the approximation error of magic point empirical interpolation to the best n-term approximation. We identify two generic cases for which this result implies exponential convergence of magic point interpolation and integration. In the first case, the set of parametric functions consists of univariate functions that are analytic on a specific Bernstein ellipse. In the second case, the functions may be multivariate and do not need to satisfy higher order regularity. The parameter set now is a compact subset of R and the regularity of the parameter dependence allows an analytic extension to a specific Bernstein ellipse in the parameter space.
In order to formulate our analyticity assumptions, we define the Bernstein ellipse B([−1, 1], ̺) with parameter ̺ > 1 as the open region in the complex plane bounded by the ellipse with foci ±1 and semiminor and semimajor axis lengths summing up to ̺. Moreover, we define for b < b ∈ R the generalized Bernstein ellipse by where the transform τ [b,b] : C → C is given by For Ω ⊂ R we set Definition 3.1. Let f : X 1 × X 2 → C with X m ⊂ C nm for m = 1, 2. We say f has the analytic property with respect to X 1 ⊂ C in the first argument, if X 1 ⊂ X 1 and there exist functions f 1 : X 1 × X 2 → C and f 2 : and f 1 has an extension f 1 : X 1 × X 2 → C such that for all fixed x 2 ∈ X 2 the mapping We define the analytic property of f with respect to X 2 in the second argument analogously.
We denote and for all p ∈ P and M ∈ N, where I M is the magic point interpolation operator given by the iterative procedure from section 2. Hence, where z * m are the magic points and θ M m are given by (9) for every m = 1, . . . , M . Theorem 3.2.
Let Ω ⊂ C d , P and h : P × Ω → C be such that Assumptions 2.1 hold and one of the following conditions is satisfied for some ̺ > 4, (a) d = 1, i.e. Ω is a compact subset of R, and h has the analytic property with respect to B(Ω, ̺) in the second argument, (b) P is a compact subset of R, and h has the analytic property with respect to B(P, ̺) in the first argument. Then there exists a constant C > 0 such that for all p ∈ P and M ∈ N, Proof. Assume (a). Thanks to an affine transformation we may without loss of generality assume Ω = [−1, 1]. As assumed, there exist functions f 1 : P × Ω → C and f 2 : Ω → C such that for all p, z ∈ P × Ω and f 1 has an extension f 1 : P × B(Ω, ̺) → C that is bounded and for every fixed p ∈ P is analytic in the Bernstein ellipse B(Ω, ̺). We exploit the analyticity property to relate the approximation error to the best n-term approximation of the set U := {h p | p ∈ P}. This can conveniently be achieved by inserting an example of an interpolation method that is equipped with exact error bounds, and we choose Chebyshev projection for this task. From Theorem 8.2 in Trefethen (2013) we obtain the explicit error bound, with constant c(f 1 ) := 2 ̺−1 sup (p,z)∈P×B(Ω,̺) f 1 (p, z) . The Chebyshev projection of the family of functions f 1 (p, ·) induces an approximation of the family of functions h p , along with an N -dimensional function space U N , simply by setting for every z ∈ Ω. From (25), the approximation I K N inherits the error bound, with constant c 1 := c(f 1 ) max z∈Ω |f 2 (z)|. Using (27), we can apply the general convergence result from Theorem 2.4 in Maday et al. (2009). Consulting their proof, we realize that (23) follows by integration. The proof follows analogously under assumption (b).
Remark 3.3. The proof makes the constant C explicit. Under assumption (a) with representation (24), it is given by Under assumption (b), an analogous constant can be derived.
Remark 3.4. Consider a multivariate integral of the form with compact sets P ⊂ R D and Ω ⊂ R d equipped with finite Borel-measures µ 1 and µ 2 . Then, the application of the magic point empirical interpolation as presented in section 2 yields and, under the assumptions of Theorem 3.2, we obtain If the domain Ω is one dimensional and z → h p (z), enjoys desirable analyticity properties, the approximation error of magic point integration decays exponentially in the number M of interpolation nodes, independently of the dimension of the parameter space P. Vice versa, if the parameter space is one dimensional and p → h p (z) enjoys desirable analyticity properties, the approximation error decays exponentially in the number M of interpolation nodes, independently of the dimension of the integration space Ω. The reason why magic point interpolation and integration do not suffer from the curse of dimensionality -in contrast to standard polynomial approximation -is owed to the fact that classic interpolation methods are adapted to very large function classes, whereas magic point interpolation is adapted to a parametrized class of functions.
In return, this highly appealing convergence comes with an additional cost: The offline phase involves a loop over optimizations in order to determine the magic parameters and the magic points. Whereas the online phase is independent of the dimensionality, the offline phase is thus affected. Let us further point out that Theorem 3.2 is a result on the theoretical iterative procedure as described in section 2. As we will discuss in section 5.1 below, the implementation inevitably involves additional problem simplifications and approximations, in order to perform the necessary optimizations. In particular, instead of the whole parameter space a training set is fixed in advance. For this more realistic setting, rigorous a posteriori error bounds have been developed for the empirical interpolation method, see Eftang et al. (2010). These bounds rely on derivatives in the parameters and straightforwardly translate to an error bound for magic point integration.

Magic Points for Fourier Transforms
In various fields of applied mathematics, Fourier transforms play a crucial role and parametric Fourier inverse transforms of the form (32) f q (x) = 1 2π R e −izx f q (z) dz need to be computed for different values p = (q, x) ∈ P = Q × X , where the function z → f q (z) := R e izy f q (y) dy is well defined and integrable for all q ∈ Q. A prominent example arises in the context of signal processing, where signals are filtered with the help of short-time Fourier transform (STFT) that is of the form with a window function w and parameter b. One typical example for windows are the Gauß windows, w σ (t) = 1 √ 2πσ 2 e −t 2 /(2σ 2 ) , where an additional parameter appears, namely the standard deviation σ of the normal distribution. The STFT with a Gauß window has been introduced by Gabor (1946). His pioneering approach has become indispensable for time-frequency signal analysis. We refer to Debnath and Bhatta (2014) for historical backgrounds.
We consider the truncation of the integral in (32) to a compact integration domain Ω = [Ω, Ω] ⊂ R and choose the same domain Ω for all p = (q, x) ∈ P = Q × X . Thus, we are left to approximate for all p = (q, x) ∈ P the integral Then, according to section 2, we consider the approximation of I(h p ) by magic point integration, i.e. by where z * m are the magic points and θ M m are given by (9) for every m = 1, . . . , M . In those cases where the analyticity properties of q → f q (z) or of z → f q (z) are directly accessible, Theorem 3.2 can be applied to estimate the error sup (q,x)∈Q×X |I(h (q,x) )− I M (h)(q, x)|. The following corollary offers a set of conditions in terms of existence of exponential moments of the functions f q .
for ̺ and find that for the Bernstein ellipse B(Ω, ̺(η)) is contained in the strip of analyticity of f q and by the choice of η we have ̺(η) > 4. Hence assumption (a) of Theorem 3.2 is satisfied. In regard to Remark 3.3 we also obtain the explicit form of the constant C(η), which proves the claim.
Similarly, we consider the case where Q is a singleton and Under this assumption we obtain an interesting additional assertion if the integration domain Ω is rather small. This case occurs for example for approximations of STFT.
The proof of the corollary follows similarly to the proof of Corollary 4.1.

Case Study
5.1. Implementation and complexity. The implementation of the algorithm requires further discretizations. For the parameter selection, we replace the continuous parameter space P by a discrete parameter cloud P disc . Additionally, for the selection of magic points we replace Ω by a discrete set Ω disc , instead of considering the whole continuous domain. Each function h p is then represented by its evaluation on this discrete Ω disc and is thus represented by a finite-dimensional vector, numerically.
The crucial step during the offline phase is finding the solution to the optimization problem arg max In a continuous setup, problem (44) is solved by optimization routines. In our discrete implementation, however, we are able to consider all magic parameter candidates in the discrete parameter space P disc and all magic point candidates in the discrete domain Ω disc to solve problem (44). This results in a complexity of O(M · |P disc | · |Ω disc |) during the offline phase for identifying the basis functions and magic points of the algorithm. We comment on our choices for the dimensionality of the involved discrete sets in the numerical section, later. Before magic point integration can be performed, the quantities Ω θ M m (z) dz need to be computed for m = 1, . . . , M . The complexity of this final step during the offline phase depends on the number of integration points that the integration method uses.

Tempered Stable Distribution.
We test the parametric integration approach on the evaluation of the density of a tempered stable distribution as an example. This is a parametric class of infinitely divisible probability distributions. A random variable X is called infinitely divisible, if for each n ∈ N there exist n independent and identically distributed random variables whose sum coincides with X in distribution. The rich class of infinitely divisible distributions, containing for instance the Normal and the Poisson distribution, is characterized through their Fourier transforms by the Lévy-Khintchine representation. Using this link, a variety of infinitely divisible distributions is specified by Fourier transforms. The class of tempered stable distributions is an example of this sort, namely it is defined by a parametric class of functions, which by the theorem of Lévy-Khintchine are known to be Fourier transforms of infinitely divisible probability distributions. Its density function, however, is not explicitly available. In order to evaluate it, a Fourier inverse has to be performed numerically. As we show below, magic point integration can be an adequate way to efficiently compute this Fourier inversion for different parameter values and at different points on the density's domain.
Following Küchler and Tappe (2014), parameters define the distribution η q on (R, B(R)) that we call a tempered stable distribution and write if its characteristic function ϕ q (z) := R e izx η q (dx) is given by with Lévy measure The tempered stable distribution is also defined for β + , β − ∈ (1, 2) where the characteristic function is given by a similar expression as in (47) The resulting distribution γ = CGMY(C, G, M, Y ) is also known as CGMY distribution and is well established in finance, see Carr et al. (2002). The density function of a tempered stable or a CGMY distribution, respectively, is not known in closed form. With q = (C, G, M, Y ) ∈ Q of (49) and z ∈ R, its Fourier transform, however, is explicitly given by wherein Γ denotes the gamma function. By Fourier inversion, the density f q can be evaluated, 5.3. Numerical Results. We restrict the integration domain of (52) to Ω = [0, 75] and apply the parametric integration algorithm on a subset of (53) P = Q × R, specified in Table 1. We draw 4.000 random samples from P respecting the intervals bounds of Table 1 and run the offline phase until an L ∞ accuracy of 10 −12 is reached. The error decay during the offline phase is displayed by Figure 1. We observe exponential error decay reaching the prescribed accuracy threshold at M = 40. Let us have a closer look at some of the basis functions q m that are generated during the offline phase. Figure 2 depicts five basis functions that have been created at an early stage of the offline phase (left) together with five basis functions that have been identified towards the end of it (right). Note that all plotted basis functions are evaluated on a relatively small subinterval of Ω. They are numerically zero outside of it. Interestingly, the functions in both sets have intersections with the Ω axis rather close to the origin. Such intersections mark the location of magic points. Areas on Ω where these intersections accumulate thus reveal regions where the approximation accuracy of the algorithm is low. In these regions, the shapes across all parametrized integrands seem to be most diverse.
In the right picture we observe in comparison that at a later stage in the offline phase, magic points z * m are picked farer away from the origin, as well. We interpret this observation by assuming that the more magic points are chosen close to the origin, the better the algorithm is capable of approximating integrands more precisely there. Thus, its focus shifts towards the right for increasing values of M where now smaller variations attract its attention.
We now test the algorithm on parameters not contained in the training set. To this extent we randomly draw 1.000 parameter sets . . , 1.000, uniformly distributed from the intervals given by Table 1. For each The L ∞ ((p i ) i=1,...,1.000 ) error for each m = 1, . . . , M is illustrated by Figure 3. The setup of the numerical experiment does not fall in the scope of our theoretical result in several respects. We discretized both the parameter space and the set of magic point candidates. While Table 1 ensures a joint strip of analyticity R+i(−1, 1) that all integrands h p share, a Bernstein ellipse B(Ω, ̺) with ̺ > 4 on which those integrands are analytic does not exist. In spite of all these limitations, we still  Table 1 the CGMY density is evaluated by the Parametric Integration method and compared to numerical Fourier inversion via Matlab's quadgk routine. the L ∞ error decay is displayed. observe that the exponential error decay of the offline phase is maintained. 2 From this we draw two conclusions. First, both Ω disc and P disc appear sufficiently rich to represent their continuous counterparts. Second, the practical use of the algorithm extends beyond the analyticity conditions imposed by Theorem 3.2. Figure 4 presents the absolute errors and the relative errors for each parameter set p i , individually, for the maximal value of M . Note that only relative errors for CGMY density values larger than 10 −3 have been plotted to exclude the influence of numerical noise. While individual outliers are not present among the absolute 1 M 1 4.5 8 x -1 0 1 x -1 0 1 x -1 0 1 Figure 5. An illustration of those randomly drawn parameter constellations that resulted in the ten worst (orange) and the ten best (green) absolute errors. The empty blue circles are the magic parameters selected during the offline phase.
deviations, they dominate the relative deviations in contrast. This result is in line with the objective function that the algorithm minimizes. Finally, Figure 5 displays all magic parameters identified during the offline phase together with those randomly drawn parameter constellations that resulted in the ten smallest absolute errors (green dots) and the ten largest absolute errors (orange dots). We observe that orange parameter constellations are found in areas densely populated by magic parameters whereas green parameter constellations often do not have magic parameters in their immediate neighborhood, at all. This result may surprise at first. On second thought, however, we understand that areas where magic parameters accumulate mark precisely those locations in the parameter space where approximation is especially challenging for the algorithm. Integrands associated with parameter constellations from there exhibit the largest variation. Approximation for white areas on the other hand is already covered by the previously chosen magic parameters. Consequently, green dots are very likely to be found there.