Non-Gaussian gravitational clustering field statistics

In this work we investigate the multivariate statistical description of the matter distribution in the nonlinear regime. We introduce the multivariate Edgeworth expansion of the lognormal distribution to model the cosmological matter field. Such a technique could be useful to generate and reconstruct three-dimensional nonlinear cosmological density fields with the information of higher order correlation functions. We explicitly calculate the expansion up to third order in perturbation theory making use of the multivariate Hermite polynomials up to sixth order. The probability distribution function for the matter field includes at this level the two-point, the three-point and the four-point correlation functions. We use the hierarchical model to formulate the higher order correlation functions based on combinations of the two-point correlation function. This permits us to find compact expressions for the skewness and kurtosis terms of the expanded lognormal field which can be efficiently computed. The method is, however, flexible to incorporate arbitrary higher order correlation functions which have analytical expressions. The applications of such a technique can be especially useful to perform weak-lensing or neutral hydrogen 21 cm line tomography, as well as to directly use the galaxy distribution or the Lyman-alpha forest to study structure formation.


INTRODUCTION
The cosmological matter distribution encodes the information of the origin of our Universe and the processes which lead to structure formation. A precise understanding and modeling of its statistics is crucial to extract the cosmological information from observational data and to ultimately understand cosmic evolution.
The Universe is being scrutinized with unprecedented accuracy. Many excellent astronomical surveys have been launched in the recent past and ongoing and upcoming projects are on the way to perform the most ambitious map of the Universe up-to-date.
On the other hand the Canada-France-Hawaii Tele-scope Legacy Survey (CFHTLS) 6 (Hoekstra et al. 2006), the VISTA (Visible and Infrared Survey Telescope for Astronomy) 7 and the PANoramic Survey Telescope And Rapid Response System (Pan-STARRS) 8 (Kaiser & Pan-STARRS Team 2002), and the planned Dark Energy Survey (DES) 9 (The Dark Energy Survey Collaboration 2005) or the Joint Dark Energy Mission (JDEM) 10 from NASA-DOE and Dark UNiverse Explorer (DUNE) from CNES (Crotts et al. 2005;Réfrégier et al. 2006) will provide the first weak lensing surveys covering very large sky areas and depth. Also the Lyman alpha forest will be a useful observable to study cosmology using for instance the BOSS survey (the matter power-spectrum has already been measured with the SDSS, see McDonald et al. 2005).
The neutral hydrogen 21-cm line will provide a new astronomical window to study structure formation in the Universe. Some of the most notable projects are the Giant Metre-wave Radio Telescope (GMRT) 11 (Pen et al. 2008), the Precision Array to Probe Epoch of Reionization (PAPER) 12 (Parsons et al. 2010), the LOw Frequency ARray (LOFAR) 13 (Falcke et al. 2007) and the Murchison Widefield Array (MWA) 14 (Lonsdale et al. 2009).
In summary, an avalanche of astronomical data is being collected to study its structure and history based on different observables. In order to extract valuable cosmological information not only a careful modeling of the systematics of the observation process and the nature of the observable is required, but also a precise modeling of the underlying signal. We focus here on the cosmological matter density field.
Different approaches can be found in the literature to reconstruct the large-scale structure. Geometrical reconstruction methods try to approximately capture the higher order statistics beyond the two-point correlation function in an effective way through a geometry-based prescription to form structures from a point source distribution. The salient and pervasive foamlike pattern of the cosmic web has led to develop methods like the Voronoi or Delaunay tessellations (see for example van de Weygaert et al. 2009;Aragón-Calvo et al. 2007;Schaap & van de Weygaert 2000). On the other hand one can find reconstruction methods based on a statistical approach. The advantage of the statistical methods with respect to the geometrical ones is that one can clearly specify the assumptions made on the matter field and the observable in form of probability distribution functions. The statistical methods could be more suitable to extract statistical quantities like the powerspectrum. This has been succesfully done in the Cosmic Microwave Background (CMB) for which the fluctuations can be assumed to be Gaussian distributed (Eriksen et al. 2007). The disadvantage is that the level of complexity that such methods can achieve has always been limited as the formulation of the probability distribution functions and its applications to reconstruction methods is relatively complex and computer-intensive. Indeed, any complex realistic formulation seemed to be untreatable. For a long time the Wiener-filter, also-called least-squares filter, has been the only available method in Astronomy to incorporate the statistical information in the reconstruction method (see for example Bunn et al. 1994;Zaroubi et al. 1995;Fisher et al. 1995;Webster et al. 1997;Zaroubi et al. 1999;Erdogdu et al. 2004Erdogdu et al. , 2006. Less attention was paid in the astrophysics community to the nonlinear version of the least-squares filter proposed by Tarantola & Valette (1982). Here the data model which relates the measurements to the seeked signal is extended to be nonlinear. Probably the first group applying this method in an astrophysical context was Chen et al. (1998) to map the interstellar absorption structures in the galactic plane. Pichon et al. (2001) proposed to use this nonlinear reconstruction scheme to recover the cosmic density and velocity field traced by the Lyman alpha forest. The drawback of Tarantola & Valette (1982)'s approach is that it requires both a Gaussian prior (with a nonlinear transformation) and a Gaussian likelihood for the distribution of the observable. This is a too crude assumption for many observables, like for instance a galaxy distribution. Recently the Poisson-lognormal (and Gaussian-lognormal) model was proposed in a Bayesian framework to recover the cosmic density field . In this study it was shown that the lognormal prior is in good agreement with the underlying matter field extracted from N-body simulations in the large overdense regions (> 10 3 ), but fails to fit the matter statistics in the underdense regions. One of the advantages of this method with respect to the nonlinear least-squares approach is that it can deal with non-Gaussian likelihoods. Another important point of the Bayesian approach proposed in ) is that it can be easily extended to sample full posterior distributions (see the works by .
Nevertheless, non of the above mentioned statistical methods includes any information beyond the two-point correlation function. As gravitational clustering forms nonlinear structures the Universe becomes inhomogenous and complex patterns arise which encode high order statistics.
The purpose of this work is to extend the multivariate characterization of matter beyond the two-point correlation function to incorporate higher order statistics. We therefore relax the lognormal assumption and introduce the multivariate Edgeworth expansion which leads to additional terms describing the skewness and kurtosis of the field with respect to the lognormal distribution function. This work is based on the univariate Edgeworth expansion introduced by Scherrer & Bertschinger (1991);; Bernardeau & Kofman (1995) and Colombi (1994). The Edgeworth expansion we find deviates from the trivial generalization of the univariate case to the multivariate case. We use the hierarchical model (see Fry & Peebles 1978;Fry 1984Fry , 1986Balian & Schaeffer 1989) to formulate the three-point and fourpoint correlation functions which permits us to find particular expressions for the skewness and kurtosis terms. The expressions we find are compact due to the symmetries introduced by the hierarchical model and can be computed by means of convolutions with fast Fourier transforms (fft's). As the hierarchical model may fail at certain scales and regimes ) this work could be extended incorporating more complex higher order correlation functions which include galaxy biasing or redshift distortions (see the works by Scoccimarro et al. 1998;Taruya & Soda 1999;Matsubara 2003;Zheng 2004;Matsubara 2008) and to perform topological and morphological studies (see for example Matsubara & Yokoyama 1996;Gott et al. 2008;James et al. 2009).
We believe that the method introduced in this paper can be very useful to study cosmological structures in the range between the quasi-nonlinear and the nonlinear regime. It could be interesting to apply higher order statistics to galaxy redshift surveys, to weak-lensing surveys, to the Lyman alpha forest or to the 21 cm line. We would like to warn the reader that this work is still in a development phase as higher order correlation models need to be tested and many numerical studies still have to be done. This is the first of a series of works in which we will analyze the statistical description of gravitational clustering. This paper is structured as follows. In the next section the gravitational clustering statistics will be analyzed in great detail (section 2). We will start reviewing the work done so far for the univariate matter distribution (section 2.2) and then present the multivariate case (section 2.3). This will lead us to a multivariate Edgeworth expansion of the lognormal field up to third order in perturbation theory including two-point, three-point and four-point correlation functions. Then we will present the hierarchical model (section 2.4) and use the expression for the three-point correlation function to calculate the skewness and kurtosis terms in the Edgeworth expansion (section 2.5). A detailed calculation can be found in the appendix. Finally we will present the summary and conclusions of this work.

GRAVITATIONAL CLUSTERING FIELD STATISTICS
In this section we will study the matter field statistics produced by gravitational clustering. We start with a physical motivation followed by the review of the univariate non-Gaussian statistics. Then we introduce the multivariate Edgeworth expansion of the Lognormal field. Finally we present the hierarchical model and calculate the skewness and kurtosis terms of the Edgeworth expansion.

Physical motivation
Let us divide the Universe into Nc cells and assign to each cell i a position ri, a matter density ρi and a peculiar velocity vi. The continuity equation relates the evolution of the matter content in the Universe to its peculiar velocity field: with t being the cosmic time, a the scale factor, r the set of positions ({r1, . . . , rN c }), ρ the matter density field ({ρ1, . . . , ρN c }), v the peculiar velocity field ({v1, . . . , vN c }) and d/dt = ∂/∂t + 1/a (v · ∇r)ρ(r) the total derivative. We can follow matter particles until they start crossing-over (in this regime particles can have different peculiar velocities at the same position) and form caustics. Before this occurs we can write the formal solution to Eq. (1) as: with s being the logarithm of the normalized density: and the matter overdensity field given by: δM = ρ/ ρ − 1. The ensemble averages are used at this stage only to denote the mean of the variable. Note that the field s does not have zero mean, but is given by: µ s ≡ s = ln ρ − ln ρ . It is convenient to define a field Φ with zero mean ( Φ = 0): Assuming that Φ is a Gaussian random field leads to a lognormal distributed density field (see Coles & Jones 1991). Note however, that Lagrangian perturbation theory -which is known to give a good approximation of gravitational clustering until shell crossing starts (see e.g. Buchert & Ehlers 1993;Buchert 1994;Bouchet et al. 1995)-deviates from the lognormal distribution already in the linear Zel'dovich (1970) approximation (see Padmanabhan & Subramanian 1993;Bernardeau & Kofman 1995). Furthermore, after structures start to virialize the peculiar velocity field will be strongly modified and Lagrangian perturbation theory will start to fail dramatically. Colombi (1994) suggested to use the formalism developed by  and Bernardeau & Kofman (1995) to study the departures from the lognormal distribution function including higher order correlation functions in the univariate matter distribution (see also the work on a generalized lognormal distribution by Szapudi et al. 2000).

Univariate case
In this subsection we will revise the matter statistics for the one-dimensional probability distribution function as developed in the works by ; Bernardeau & Kofman (1995) and Colombi (1994). For a general overview on asymptotic statistical techniques see Berkowitz & Garner (1970) and Barndorff-Nielsen & Cox (1989). Other univariate matter field distribution functions have been proposed. They are however not trivially extendable to the multivariate case. Either they do not include higher order correlations, but are extracted from fitting the univariate matter distribution based on numerical N-body simulations (e.g. Miralda-Escudé et al. 2000), or a distribution function function is expanded using the variance as an univariate parameter (Gaztañaga et al. 2000). For this reason we will consider only the above mentioned approach based on the expansion of the lognormal distribution function. Let us define the quantity ν with zero mean and unity variance: with σ 2 = Φ 2 being the variance of Φ (the relation between the variance of Φ and the variance of the matter overdensity δM is derived in appendix B). Higher order moments of ν can be found by calculating the ensemble average of powers of ν over the probability distribution function P (ν): with n being the order of the moment. Please note that the moment µ refers to the variable ν and not to the variable s. The moment generating function is given by: Subsequent derivatives of Mν(t) at the origin t = 0 yield the moments: The cumulant generating function is given by: with κn being the cumulants or connected moments: The cumulants can be obtained from the relation between the moment and cumulant generating functions (see for example Bernardeau et al. 2002): or equivalently: C(t) = ln (Mν (t)). Hence, the cumulants can be calculated by: It is however, more convenient to use expression (11) to relate the moments to the cumulants: This equation yields for the first order moments: µ0 = 1 (14) µ1 = κ1 = 0 µ2 = κ2 + κ 2 1 = 1 µ3 = κ3 + 3κ2κ1 + κ 3 1 = κ3 µ4 = κ4 + 4κ3κ1 + 3κ 2 2 + 6κ2κ 2 1 + κ 4 1 = κ4 + 3 µ5 = κ5 + 5κ4κ1 + 10κ3κ2 + 10κ3κ 2 1 + 15κ 2 2 κ1 + 10κ2κ 3 1 + κ 5 1 = κ5 + 10κ3 µ6 = κ6 + 6κ5κ1 + 15κ4κ2 + 15κ4κ1 + 10κ 2 3 + 60κ3κ2κ1 +20κ3κ 3 1 + 15κ 3 2 + 45κ 2 2 κ 2 1 + 15κ2κ 4 1 + κ 6 1 = κ6 + 15κ4 + 10κ 2 3 + 15 .
The probability distribution function of ν can be obtained by inverting Eq. (7) using the inverse Laplace transform (see the review by Bernardeau et al. 2002): Thus, the generating function fully defines the probability distribution function. When the departures from the Gaussian distribution function are small one can expand P (ν) in a Gram-Charlier series: with G(ν) = 1 √ 2π e − ν 2 2 and h l (ν) being the Hermite polynomials. The Hermite polynomial of degree n can be calculated by: This leads to the following first polynomials: Using the orthogonality relation: we can calculate the Gram-Charlier coefficients c l : The latter expression yields: The Gram-Charlier series may have poor convergence properties (see Cramer 1946), and in some cases even violently diverge (see Blinnikov & Moessner 1998). For this reason  suggested to model the departures from the Gaussian distribution with an Edgeworth expansion which is a true asymptotic expansion (for a full explicit expansion for arbitrary order see Blinnikov & Moessner 1998, and references therein). The Edgeworth expansion consists of rearranging the terms in the Gram-Charlier series based on collecting all the terms with the same clustering strength. To see how to do this let us recall how the different terms scale with σ.
In perturbation theory the matter field is expanded as a sum of terms with increasing order: Φ = Φ1 + Φ2 + Φ3 + . . . , with Φ1 = O(σ 1 ) being the linear term, Φ2 = O(σ 2 ) being the quadratic term, Φ3 = O(σ 3 ) being the cubic term, etc.. The linear term Φ1 is assumed to be Gaussian distributed, or equivalently in our case, the matter density field is assumed to be lognormal distributed at first order. To calculate the subsequent perturbation terms one needs to use higher order correlation functions. As it was shown by Fry (1984) the cumulants of order higher than two scale as: κn = O(σ 2n−2 ). For this reason the following normalized cumulant has usually been introduced to calculate the Edgeworth expansion: . Using the latter expression one can group the terms with the same scaling power of σ. The Edgeworth expansion until third order perturbation is then given by:

Multivariate case
In this subsection we generalize the relations of the univariate matter distribution to the multivariate case. Now Φ, ρ and s are scalar fields: and with S being the variance of the field ln(1+δM)−µs (see appendix C for the relation between the variance of Φ and the variance of the matter overdensity field δM). We introduce the field ν which has zero mean and unity variance by definition: The n-dimensional moments are given by:  In case of more than one equivalent element the number of elements is indicated. The circles mark the terms which are not automatically zero as they do not contain a single unconnected point (we deal here with a centered variable with zero mean). The three equivalent terms for the third order moment have been specified. Let us define a cumulant object as a set of connected points in a term of a moment. Two cumulant objects will be equivalent if they have the same number of points. The number of equivalent elements Ne in a term is calculated by the factorial of the order of the moment n divided by the factorial of the number of single points N p0 , the factorial of the number of equivalent cumulant objects N obj in a term and the factorial of the number of points for each cumulant object k N pk : Ne = n!/(N p0 !N obj ! k N pk !). As an example the eighth term of the sixth order moment: κ ij κ kl κmn has 15 equivalent elements: 15 = 6!/(3!2!2!2!).
The multivariate moment generating function yields: Analogously to the univariate case subsequent derivatives of Mν (t) at the origin t = 0 lead to the moments: The cumulant generating function is given by: with κi 1 ...in being the cumulants or connected moments: The moments are related to the cumulant generating functions by: Plugging in Eqs. (28) and (30) in the latter expression we obtain: A way to spare the tedious calculations consists on looking at all combinations of connections between points in a diagram (see Fig. 1 and Bernardeau et al. 2002).
Since the mean of νi is zero (µi = 0) we got rid off all the terms containing single connected points (κi). The result for the first moments is listed below: where we have used the following identities: , the Kroenecker delta: δ K and the modified Levi-Civita tensor we introduce here: with Nt being the number of transpositions. This tensor has the property of being always positive (including zero) since it is multiplied by the factor (−1) Nt which is positive for an even number of transpositions and negative for an odd number of transpositions, thus compensating for the negative sign coming from the Levi-Civita tensor. The number of equivalent objects is indicated to the lower right of the terms. To see how this number is calculated see the caption in Fig. (1). Let us study here the multivariate Gram-Charlier series expansion (see Berkowitz & Garner 1970): The corresponding multivariate Hermite polynomials are calculated by (see Berkowitz & Garner 1970): from which the following recursive formula can be built: which we have used in our calculations.
Here are the results for the first couple of polynomials: where we have used the same notation as for the moments. One should note that the number of equivalent terms for the moments and Hermite polynomials coincides with the factors in the corresponding terms of the univariate case. The Gram-Charlier coefficients can now be calculated by making an ensemble average over the Hermite polynomials: (40) The first coefficient is c0 = 1 as in the univariate case and the rest is calculated using the above equation: Rearranging terms in the Gram-Charlier series based on the findings in the univariate case we can build the multivariate Edgeworth expansion: which can also be written as: where we have inserted the expression for the field ν and introduced the following notation: . From this expression it is possible to calculate the probability of a density field Φ given the higher order point correlation functions (of the logarithm of the density field!).
We can find a more compact expression for the last equation if we define a skewness term accounting for the asymmetry of the distribution function as: and a kurtosis term accounting for the flatness of the distribution function composed by two terms: K ≡ KA + KB with the first term given by: and the second term given by: Now we can rephrase Eq. (43) by: Note that the last term in the previous equation has 10 members (6 indices: 6!/72) which will only be equal under certain symmetry conditions. Our calculations confirm that the first three terms in Eq. (47) are a trivial generalization of the univariate case (see Eq. 23), whereas the 4th term is not.

On the positive definiteness of the expanded Normal distribution
Please note that P k (ν) is not a real distribution function as it is not generally positive definite. The truncated expanded distribution function should be regarded only as an approximation at k-th order (1st order: lognormal, 2nd order: includes skewness S, 3rd order: includes skewness S and kurtosis K). To ensure a positive definiteness one has to impose the additional condition: 1 + S(ν) 0, ∀ν at 2nd order, 1 + S(ν) + K(ν) 0, ∀ν at 3rd order, etc.. It is also possible to make an exponenzation of the non-Gaussian contribution or a quadratic expression (see the works on non-Gaussian realizations in the CMB: Contaldi et al. 2000;Contaldi & Magueijo 2001): For small skewness and kurtosis terms the Taylor expansion of the exponential will be dominated by 1 + S(ν) + K(ν) + . . . , so that Eq. (48) will be very close to Eq. (47) ensuring positive definiteness. In general one has to define a function F of the non-Gaussian contributions which ensures positive definiteness:  Figure 2. The matter statistics depends on the gravitational regime, the cosmic scale and the cosmic time we are looking at. Towards large scales, in the linear regime matter is closely Gaussian distributed. At smaller scales and later times gravitational clustering will start to depart from Gaussianity.
When looking at small deviations from Gaussianity a higher order correlation expansion can be done. The lognormal distribution is a good description for the further nonlinear regime as it can be regarded as linear in a Lagrangian framework which is known to be valid in the quasi-nonlinear regime. At even lower scales and late times the lognormal distribution fails and an expansion around this distribution function can be done. When shellcrossing starts and structures form caustics (the full nonlinear regime) the statistical state-of-the-art description is based on numerical N-body simulations.

Lognormal limit
On scales larger than about 10 Mpc the lognormal distribution resembles very well the observed galaxy and matter density statistics (see Hubble 1934;Wild et al. 2005;Kitaura et al. 2009). It was demonstrated by  that the lognormal prior can be applied at least down to scales of few Mpc to fit the matter statistics in the overdense regions. Towards the linear regime in the mild non-linear regime the correction terms in the Edgeworth expansion start to become negligible and we can model the density field with a multivariate lognormal distribution (see Coles & Jones 1991): where S is the covariance matrix of the lognormal distribution. Note that the covariance matrix is defined by: Sij ≡ sisj − µsiµsj (which does not coincide with the covariance of the overdensity field: δMiδMj ) and µsi describes a constant mean field given by (a derivation of both the covariance and the mean can be found in appendix C): with the hats denoting the Fourier transform of the auto-correlation matrix. The constant term Sii corresponds to the correlation function evaluated at zero: S[0], i.e. when it is evaluated for the distance of an object at a certain position with itself.

Gaussian limit
On very large scales ( > ∼ 100 Mpc), in the linear regime, the information about the primordial fluctuations has been preserved throughout cosmic history. Inflationary scenarios predict a close to Gaussian distribution function for the initial density fluctua-tions (see Guth 1981;Guth & Pi 1982;Starobinsky 1982;Hawking 1982;Linde 1982;Albrecht & Steinhardt 1982;Bardeen et al. 1983). Therefore we expect to obtain the Gaussian prior in the low density limit. When |δM| ≪ 1 the signal s can be approximated by the linear term of the Mercator series ln(1 + δMj ) ∼ δMj and the mean is approximately zero µsi ∼ 0. In that case the prior distribution for the matter density field is given by the Gaussian multivariate distribution function (see Bardeen et al. 1986): Please note that the correlation matrix S is now defined by: Sij ≡ δMiδMj . This prior is the one required by the Bayesian approach to derive the Wiener-filter (see Zaroubi et al. 1995) and has been extensively applied to CMB-mapping (see for example Bunn et al. 1994;Tegmark 1997) and to the large-scale structure reconstruction (see for example Zaroubi et al. 1995;Erdogdu et al. 2004Erdogdu et al. , 2006Kitaura et al. 2009).
Intrinsic deviations from the Gaussian or the mild nonlinear gravitational regime may be described by expanding the Gaussian distribution in an Edgeworth expansion. Note that the whole multivariate formalism presented in this section can also be applied to study weakly non-Gaussian matter fields as was shown by  for the univariate case. One has to define instead ν ≡ S −1/2 δM and use the higher order correlations corresponding to the overdensity field. However, additional problems could arise here yielding negative densities which are not present in the lognormal formulation.
The matter distributions described by the Eqs. (50) and (52) are only valid in a limited range of overdensities. In order to be able to go further into the nonlinear regime we need to model the higher order statistics. The problem we face here is that we have to introduce more complex models increasing the number of parameters (see the works by Scoccimarro et al. 1998;Taruya & Soda 1999;Matsubara 2003;Zheng 2004;Matsubara 2008, including galaxy biasing and redshift distortions in higher order correlations).
The simplest non-trivial higher order correlation representation is given by the hierarchical models which we will discuss in the next section.

Hierarchical model for higher order correlation functions
Hierarchical models rely on the assumption that the cosmological structures are to some extent self-similar. In these models higher order correlation functions are constructed from products of the two-point correlation function (see Fry & Peebles 1978;Fry 1984Fry , 1986Balian & Schaeffer 1989): such that the whole set of points i1 . . . in is connected by links of ξij. These links are organized in a tree structure, where α are the trees corresponding to each order n. The sum over Lα denotes a sum over all possible labellings or leafs of a given tree. The remaining freedom is encoded in the Q α n hierarchical coefficients for each tree α and order n. Note that the hierarchical model always refers to the overdensity field δM, but we are assuming that it also applies for the field Φ defined in Eq. (4).
) leafs of the first tree of the four-point correlation function ) leafs of the second tree of the four-point correlation function Table 1. The different trees up to order 5 are shown with the corresponding number of leafs/labellings. The trees are constructed by distributing the m indices n times so that there is at least one index for each dimension of n neglecting the index order. Note, that by the same procedure one can find that the sixth order correlation function has 5 trees instead of 4 as one would naivly expect. To know then the number of leafs we have not only to consider the index order, but also the couple combinations of indices. The first factor can be calculated in an analogous way to the number of elements in Fig. (1). To calculate the latter factor one has to first subtract n indices to the tree (since we are interested in the couple combinations we already assume that each correlation function has been assigned one index) and then divide (m − n)! by the factorials of the remaining numbers assigned to each index. For example for the second tree of the fifth order we would get after subtraction: (2|1|0|0|0), which has three indices left (3! permutations) with 2 equal ones (divided by 2): 3!/2. Dividing this number by 3! which comes from the three indices with equal number of appearances, we get that the leafs redundancy number is: N 2 5γ = 2. The leafs for the three-point and four-point correlation function are shown in detail at the bottom of the table.
Particular expressions for the three and four-point correlation functions were already proposed by Fry & Peebles (see 1978). The three-point correlation function can then be written as: where we have denoted the only one hierarchical coefficient for the three order case as Q3 (we show in Tab. 1 how to calculate the number of trees and the number of corresponding leafs). Note that the three-point correlation function is the sum of the leafs for the single tree: The four-point correlation function has 16 leafs, 4 leafs in the first tree and 12 leafs in the second one (see Fry & Peebles 1978;Balian & Schaeffer 1989, and our calculation in Tab which can also be written as: 4 [ξijξ ik ξ jl + ξij ξ il ξ jk + ξijξ ik ξ kl + ξ il ξ ik ξ kj +ξijξ il ξ kl + ξ ik ξ il ξ jl + ξijξ jk ξ kl + ξ ik ξ jk ξ jl +ξijξ jl ξ kl + ξ il ξ jk ξ jl + ξ ik ξ jl ξ kl + ξ il ξ jk ξ kl ] . (57) where we have denoted the first hierarchical coefficient as Q a 4 and the second one as Q b 4 .

Non-Gaussian multivariate Edgeworth expansion with the hierarchical model
We can use now the three and four-point correlation functions calculated from the hierarchical model to apply the third order Edgeworth expansion to the lognormal probability distribution. Note that the results of this section can also be applied for the Gaussian case by inserting the correlation functions of the overdensity field instead of the correlation functions of the logarithm of the density field. In the next subsections we present the skewness and kurtosis terms which are required in the Edgeworth expansion (see Eq. 43).

Skewness terms
The second order term in the Edgeworth expansion describes the skewness with respect to the lognormal distribution and requires the third order Hermite polynomial (see Eq. 39). Let us define the weighted Hermite polynomial: Then we can calculate the skewness term S as (see Eq. 43 and appendix D): where we have identified the two-point correlation function ξij with Sij .

Kurtosis terms
The third order term in the Edgeworth expansion requires the fourth and sixth order Hermite polynomials and the three and four-point correlation functions. Thus, we separate the kurtosis K into two contributions K ≡ KA + KB. Let us start with the first contribution which requires the fourth order Hermite polynomial: The first contribution KA to the kurtosis term K can be written as (see Eq. 43): which after some calculations (see appendix E1) leads to: Sij .
As we saw in sections (2.2) and (2.3) the asymptotic Edgeworth expansion has a second term at third order. It is in particular the fifth term of the sixth-order moment in Fig. (1) which is the product of two three-point correlation functions. Accordingly, the second contribution KB requires the sixth order weighted Hermite polynomial: The expression for KB inserting the three-point correlation function in the hierachical model reads (see Eq. 43): After making the corresponding calculations (see appendix E2) we find that the second contribution KB to the kurtosis term K can be written as We verify the skewness and kurtosis terms presented here by comparing the univariate case Eqs. (A1, A2, and A3) to the corresponding Eqs. (D5, E14, and E37) simplified to a single index (see appendices A, D and E). Now the first and second kurtosis contribution terms can be added and plugged in Eq. (43) together with the skewness found in the previous section to calculate the probability distribution function of the matter field. We would like to emphasize here that the expressions found in this section for the skewness and the kurtosis can be efficiently computed (see Eqs. 60, 62 and 65). It should be noticed, that they rely on basic operations with the fft required to perform convolutions being the most expensive one.

SUMMARY AND CONCLUSIONS
The avalanche of astronomical data coming from different surveys which scan different epochs of the Universe makes it possible to map the Universe with unprecedented accuracy. We stress here the necessity to model as precise as possible the matter statistics so that the multidimensional picture of the Universe can be reconstructed with the highest possible fidelity. This implies providing a multivariate higher order statistical description of the structures in the Universe.
We have extended the works done for the univariate case and presented the multivariate Edgeworth expansion of the lognormal field. We made the expansion explicitly up to third order in perturbation theory where we had to calculate the multivariate Hermite polynomials up to sixth order. The skewness and kurtosis terms include the two-point, the three-point, and the four-point correlation functions.
We could show that these terms can be calculated using analytical expressions for the higher order correlation functions like the ones provided by the hierarchical model.
The expressions derived in this work could be used to generate and reconstruct three-dimensional matter fields using higher order correlations within a Bayesian framework applying for example the Hybrid Markov Chain Monte Carlo Hamiltonian sampling technique (see the works by ). This could have interesting applications to study non-Gaussianity in the Large-Scale Structure.
As new astronomical windows are being opened to map the Universe at earlier times in which structures were not clustered as much as they are today, we think that the study of the moderate nonlinear regime is crucial for an accurate data analysis. Although numerical N-body simulations provide a magnificent tool to model structure formation, almost without resolution restrictions in comparison to the resolution provided by astronomical observations, they do not picture the actual realization of the Universe. Therefore, we believe that a special effort should be done to model the multivariate statistical nature of the matter distribution which permits one to extract as much information as possible directly from the observational data. We hope that this work serves to contribute in this direction.

ACKNOWLEDGMENTS
I thank Andrea Ferrara for encouraging scientific conversations and for providing me all the necessary support. I thank Rien Van de Weygaert for motivating discussions on the Cosmological Large-Scale Structure. I thank especially Ignacio Cernuda for comments on the manuscript. I thank Euihun Joung and Sunghye Baek for useful discussions on integrals, quantum mechanics and numerical computing.
Warm thanks to the organizers of the Astronomical Data Analysis 6th conference (3rd-6th May 2010) and the Cosmic Co-Motion Workshop (27th-30th Sep 2010) for letting me present part of this work.
The author thanks the Intra-European Marie Curie fellowship with project number 221783 and acronym MCMCLYMAN for supporting this project and The Cluster of Excellence for Fundamental Physics on the Origin and Structure of the Universe for supporting the final stage of this project.

APPENDIX A: UNIVARIATE SKEWNESS AND KURTOSIS TERMS IN THE HIERARCHICAL MODEL
From the hierarchical model ansatz (see section 2.4) we can define the univariate three-point correlation function as: ξ3 = 3Q3σ 4 and the four-point correlation function by: ξ4 = 4Q a 4 σ 6 +12Q b 4 σ 6 . Let us now look at the skewness and kurtosis terms.

A1 Skewness in the univariate case
For the skewness we need to define the third order univariate weighted Hermite polynomial: We then get: (A1)

A2 Kurtosis in the univariate case
In the case of the kurtosis terms we need to define the fourth and sixth order univariate weighted Hermite polynomials:

A2.1 First group of kurtosis terms
Looking at the fourth order Hermite polynomial contribution we get:

A2.2 Second group of kurtosis terms
We then get:

APPENDIX B: UNIVARIATE MEAN AND VARIANCE
Here we will derive the univariate relations for the mean and variance of the expanded lognormal distribution. Note, that the results presented here are in agreement with the general formula presented by Colombi (1994) without an explicit derivation. Let us define the k-th order characteristic function for the variable Φ as a function of t: We will look at this expression for the different orders and derive from it the corresponding mean and variance.

B1 Case k = 1: lognormal
One finds by shifting the Gaussian integral the solution to the characteristic function of the lognormal distribution to be given by: where we use the following definition: σ 2 ≡ Φ 2 k . Recalling the expression for Φ: Φ ≡ log( ρ ρ ) − µs with ρ ≡ ρ k and µs = s k , we can write the density as: ρ = ρe Φ+µs . Using the characteristic function we can calculate all the moments of the density: For the lognormal distribution we have: σ 2 ρ t = ρ t e tµs e tΦ 1 = ρ t e tµs+ t 2 2 σ 2 . (B4)

B1.1 Mean
The particular case for t = 1: leads to the mean:

B1.2 Variance
Whereas the case t = 2: relates the variance of the overdensity δM to the variance of Φ: Thus, one finds that the variance σ 2 of Φ under the lognormal assumption is given by: with σ 2 δ ≡ δ 2 M k .

B2 Case k = 2: lognormal with skewness
The characteristic function including skewness yields: We have to solve integrals including an exponential term and polynomials of Φ. Note however, that we can generate such integrals by subsequent derivatives of the characteristic function: Taking the expression of the generating function for k = 1 (Eq: C3) we can calculate the solutions to the integrals including different powers of Φ: We can then calculate the characteristic function at order k = 2: Defining the integral over the skewness term by: we can write the second order moment of the density as: The mean and variance can be obtained by evaluating the latter expression at t = 1 and t = 2 respectively.

B3 Case k = 3: lognormal with skewness and kurtosis
Here we include also the kurtosis terms: Since the kurtosis terms use the fourth and sixth order Hermite polynomials we need to calculate up to the sixth order derivatives of the generating function: Defining the kurtosis integral by: we can write the characteristic function as: The t-th order moment of the density yields: Inserting t = 1 and t = 2 yields the mean and the variance respectively.

APPENDIX C: MULTIVARIATE MEAN AND COVARIANCE
In this section we derive the multivariate relations for the mean and variance of the expanded lognormal distribution. Let us define the k-th order characteristic function for the variable Φ as a function of t: Let us consider now the different k-order to calculate the corresponding mean and variance.

C1 Case k = 1: lognormal
We shift the Gaussian integral to obtain the solution to the characteristic function of the lognormal distribution: which simplifies to: Taking into account that the density field ρ is related to Φ by: ρi = ρ exp (Φi + µsi), we get: ρi 1 . . . ρi n 1 =ρ n exp l t l µsi l + 1 2 lm t l Si l im tm .

C2 Case k = 2: lognormal with skewness
Let us recall the expression for the skewness (Eq. 60): The characteristic function including skewness yields: This integral can be solved by calculating the different moments of the lognormal generating function: Performing the derivatives we get: This leads to the n-order moment of the density field ρ: Si 2 im tm n Si 3 in tn , which can be simplified to:

C2.1 Mean
By setting l, m, n = 1 and t1 = 1 we then obtain the mean:
Please note, that contracting the expressions found here to the univariate case gives the same relations as found in the previous section.
The covariance is given by:

APPENDIX D: MULTIVARIATE SKEWNESS TERMS IN THE HIERARCHICAL MODEL
Let us recall the expression for the skewness (Eq. 60) we want to calculate here: where we have inserted the three-point correlation function from the hierarchical model. In the following subsection we will calculate the contribution of each Hermite polynomial term separately.

D1 First Hermite term
Here we go through all the correlation terms applied to the first Hermite term: ijk SijS ik ηiηj η k = i ηiΦ 2 i ijk SijS jk ηiηj η k = j ηj Φ 2 j ijk S ik S jk ηiηj η k = k η k Φ 2 with the factor 3 being due to the fact that the result is identical for the three terms of the three-point correlation function.
or the singly coupled index appears in the remaining S function: 12 × 2 × ijkl Sij S ik S jl ηiη l S −1 jk = 12 × 2 × ij SiiηiΦi . (E9) There are 2 configurations for both cases for each four-point correlation term, hence 12 × 2. The last configuration for the second Hermite term is based on 2 indices of η doubly coupled to the S functions (this happens only once for each four-point correlation term: 12 times in total): (iii) Third Hermite term: the last Hermite term has two possibilities: either only one of the indices of the S −1 functions coincides respectively with one of the indices of the S functions (this can only happen once for each four-point correlation term): or two indices of one of the S −1 functions coincide with the corresponding indices of one of the S functions (this can occur twice for each four-point correlation term: 12 × 2):

E1.3 Result
Putting the contribution of both trees together we get the first kurtosis term: (iii) Non of the indices of the S −1 functions coincides pairwise