Fast emulation of two-point angular statistics for photometric galaxy surveys

We develop a set of machine-learning-based cosmological emulators to obtain fast model predictions for the $C(\ell)$ angular power spectrum coefficients characterising tomographic observations of galaxy clustering and weak gravitational lensing from multi-band photometric surveys (and their cross-correlation). A set of neural networks is trained to map cosmological parameters into the coefficients, achieving a speed-up of $\mathcal{O}(10^3)$ in computing the required statistics for a given set of cosmological parameters with respect to standard Boltzmann solvers, with an accuracy better than $0.175\%$ ($<0.1\%$ for the weak lensing case). This corresponds to $\sim 2\%$ or less of the statistical error bars expected from a typical Stage IV photometric survey. Such overall improvement in speed and accuracy is obtained through ($\textit{i}$) a specific pre-processing optimisation ahead of the training phase and ($\textit{ii}$) a more effective neural network architecture, compared to previous implementations.


INTRODUCTION
The golden age of cosmology is continuing. The discovery of the accelerated expansion of the Universe (Perlmutter et al. 1999; Riess et al. 1998) led to the consolidation of the ΛCDM model, which can now be convincingly defined as the standard model of cosmology.
We are currently witnessing a period of fast developments, from both the theoretical and observational points of view, aimed at understanding its details and ingredients. In fact, the standard model is capable of successfully predicting virtually all early- and late-universe observations to high accuracy using a restricted number of parameters, although some residual, perhaps telltale, tensions should not be overlooked (e.g. Bernal et al. 2016). However, the very nature of this model opens a number of fundamental questions which await a satisfactory answer, in particular the nature of its dark sector components, Dark Matter and Dark Energy.
Together with CMB anisotropy observations (e.g. Aghanim et al. 2020), one of the key pillars supporting the standard model is represented by measurements derived from galaxy surveys. On the one hand, statistical measurements of galaxy clustering (GC hereafter) in 3D (when spectroscopic redshifts are available) or 2D (i.e. in projection on the sky) provide us with an indirect probe (through luminous tracers) of the distribution of matter density fluctuations on different scales and at different epochs (Feldman et al. 1994; Yamamoto et al. 2006). On the other hand, photometric imaging surveys allow us, under certain conditions, to measure galaxy shapes and use their shear, due to weak gravitational lensing (WL hereafter) by the foreground matter distribution, to assess directly the matter density field, providing us with the ability to perform a tomography of the large-scale matter distribution in the Universe (Hu 1999). Combinations of the photometric 2D-GC angular correlation function, the WL correlation function, and the GC-WL cross-correlation between galaxy positions and shapes (3x2pt statistic hereafter), together with the spectroscopic 3D-GC, will enable unprecedented precision and accuracy to be achieved in the estimation of cosmological parameters.
We are on the verge of a new era for galaxy surveys, with the next generation of experiments aiming to solve the mysteries of the standard cosmological model. This is the case, in particular, of the DESI ground-based spectroscopic survey (Abbott et al. 2018b), recently started at KPNO, and the ESA Euclid mission (Laureijs et al. 2011a), a combined imaging and spectroscopic experiment, currently scheduled for launch in 2023. The multi-band, multi-epoch photometric LSST survey by the Vera C. Rubin Observatory (Weltman et al. 2020), and the future Nancy Grace Roman Space Telescope (Dore et al. 2019), will provide further unprecedented photometric and spectroscopic galaxy data, contributing to the accumulation of an enormous amount of information. The Euclid mission, in particular, will play a special role in this respect, with its ability to simultaneously deliver measurements of both spectroscopic and photometric galaxy clustering together with weak gravitational lensing (and further additional probes, such as galaxy clusters; see Blanchard et al. (2020)).
Such an avalanche of new experimental data naturally calls for a comparable quantum leap in the efficiency of our analysis tools. Each of these new data sets will be massive on its own. Additionally, the combination and cross-correlation of different probes (GC and WL, but also CMB anisotropies, void statistics, clusters, etc.) will leverage their information content, while providing firmer control over systematic errors. Besides the theoretical challenge of accurately modelling the data down to small, nonlinear scales (Carrasco et al. 2012; Cataneo et al. 2019; Knabenhans et al. 2021; Angulo et al. 2021), we will be facing a serious computational challenge: cosmological analyses are based on Markov Chain Monte Carlo (MCMC) techniques, which require theoretical predictions to be computed as many as $10^4$ to even $10^6$ times, with the execution of so-called Boltzmann solvers at each step of the chains. The most widely used of such codes are CAMB (Lewis et al. 2000) and CLASS (Lesgourgues 2011), which compute the evolution of linear perturbations in about 0.5 seconds (7 seconds when massive neutrinos are included) on a standard laptop. Such basic figures clearly show that performing a full analysis of cosmological datasets requires either a very long time or access to high-performance computing facilities.
To overcome this issue, codes called emulators, capable of reproducing the results of Boltzmann solvers by interpolating from a coarser grid of "pivot" models, have recently been developed, achieving significant speed-ups. The first such codes, based on polynomial regression, were used in the analysis of CMB data (Jimenez et al. 2004; Fendt & Wandelt 2006). A different approach was taken in Albers et al. (2019), where a neural network was trained to replace the most expensive parts of the CLASS Boltzmann solver.
More recently, emulators were developed to produce model predictions for large-scale structure two-point statistics (Manrique-Yus & Sellentin 2020; Mootoovaloo et al. 2020, 2022; Aricò et al. 2021; Spurio Mancini et al. 2022). These codes allow us to emulate either the spatial 3D matter power spectrum, $P_{mm}(k)$, in real space, or the angular power spectrum, $C(\ell)$, i.e. the coefficients of the spherical harmonic decomposition of the 2D correlation function, which quantifies density fluctuations as a function of the angular scale alone. The latter is customary when describing CMB anisotropies, weak lensing measurements, or any statistics measured in projection on the sky, as well as 2D galaxy clustering within "tomographic slices".
In this paper, we describe the development of a set of three new emulators to derive fast model predictions for the $C(\ell)$ associated with three different observables. In particular, we focus on the 3x2pt statistics associated with two of the main cosmological probes of next-generation surveys, GC and WL, together with their cross-correlation. Specifically, the cross-correlation of the tiny and coherent image distortions of background galaxies with the angular 2D galaxy distribution is of special interest and will be among the key observables of, for instance, the Euclid and LSST galaxy surveys.
We adopt a machine learning approach, building and training a set of neural networks to map cosmological parameters into the 3x2pt angular correlation coefficients $C(\ell)$. Neural networks are becoming more and more popular in many areas of science and engineering. Much of their success stems from their universal approximation properties, i.e. their ability to approximate any continuous function, provided enough data and an appropriate number of hidden layers (Cybenko 1989; Hornik et al. 1989) are combined. Recent years have witnessed the emergence of powerful neural networks in the form of different architectures tailored to the specific task being performed. Such models are typically built by stacking multiple layers of neurons, the so-called deep neural networks.
The main advantage of a machine learning approach to cosmological inference is the significant speed-up of the full analysis. In fact, concerning the computation of $P_{mm}(k)$, in the standard approach the hierarchy of coupled Einstein-Boltzmann equations needs to be solved, and the solution is numerically integrated to obtain the desired quantities. Emulators skip these steps, with a gain of orders of magnitude in speed. The gain is even more pronounced when emulators are applied directly to cosmological observables and tomographic analyses, which are our main interest here (Hu 1999): for example, in an analysis involving GC, WL, and their cross-correlation, with 10 tomographic redshift bins, the values of 210 different $C(\ell)$ need to be computed for each multipole $\ell$. This is computationally very expensive if performed via standard approaches, especially within MCMC techniques for cosmological parameter inference. Thus, a direct emulation of the angular power spectra provides a huge speed-up of the full analysis.
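The count of 210 spectra quoted above follows directly from the tomographic bookkeeping: the two auto-correlations are symmetric in the bin indices, while the GC-WL cross-correlation is not. A minimal sketch (the function name is ours, purely illustrative):

```python
# Counting the distinct angular spectra in a 3x2pt analysis with n_bins
# tomographic bins: the GC-GC and WL-WL auto-spectra are symmetric in
# (i, j), so only i <= j pairs are independent; the GC-WL cross-spectrum
# is not symmetric, so all (i, j) pairs are needed.
def n_spectra_3x2pt(n_bins):
    n_auto = n_bins * (n_bins + 1) // 2   # per symmetric probe pair
    n_cross = n_bins * n_bins             # all ordered (i, j) pairs
    return 2 * n_auto + n_cross

print(n_spectra_3x2pt(10))  # -> 210, as quoted in the text
```

With 10 bins this gives 55 + 55 + 100 = 210 coefficients per multipole, matching the figure in the text.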
As discussed in the following, our implementation allows us to emulate the 3x2pt angular power spectrum coefficients $C(\ell)$ for a typical Stage-IV galaxy survey, with a speed-up of $\mathcal{O}(10^3)$ with respect to the standard approach, while preserving an accuracy of about 0.175% over the whole emulated range of angular scales. Besides their speed and accuracy, our emulators can be easily extended to larger parameter spaces. These figures compare very favourably to similar works in the literature.
Of course, the gain in terms of the time needed for cosmological parameter inference comes at the price of the data generation and training processes, which can also be computationally expensive. The overall balance, however, remains positive, given the massive computational costs needed to generate standard MCMC chains and the ease with which deep neural networks generalize to high-dimensional parameter spaces. In the present work, emulators are treated as black-box functions and do not incorporate any explicit prior knowledge of the underlying physics. They learn the mapping between input parameters and output correlation coefficients in an entirely data-driven fashion.
This paper is structured as follows. In Section 2 we explain how we produced and preprocessed the training dataset, the neural network architecture, and the training procedure. In Section 3 we test the precision and speed of our emulators. In Section 4 we make a comparison with the emulators present in the literature. We then summarise our results and conclude in Section 5.

Theoretical Background
We give here a brief overview of the theory underlying two-point statistics used in the analyses of LSS surveys, using a notation similar to Blanchard et al. (2020). Throughout the paper we consider as reference background model the simplest generalization of the standard ΛCDM model, in which dark energy is assumed to be a dynamical quantity with Equation of State (EoS) described by the Chevallier-Polarski-Linder parameterisation: $w(a) = w_0 + w_a(1 - a)$ (Chevallier & Polarski 2001; Linder 2003). We refer to this background model as a flat "$w_0 w_a$CDM" model.
As mentioned above, our aim is to emulate three specific sets of two-point angular statistics involving galaxy clustering and weak lensing, accounting also for the correlation between different tomographic redshift bins: (i) the auto-correlation of angular galaxy positions in a photometric sample; (ii) the cross-correlation of the lens galaxy positions with the estimated shear; and (iii) the auto-correlation of the shear field itself. We describe these quantities in terms of $C(\ell)$ angular power spectra, i.e. the coefficients of the spherical harmonic decomposition of the above correlation functions, with the multipole order, $\ell$, related to the inverse of the angular scale in configuration space.
In the standard approach, each $C(\ell)$ corresponds to a projection of the nonlinear matter power spectrum, $P_{mm}(k, z)$, weighted with the window functions $W^A(z)$ and $W^B(z)$ associated with the probes $A$ and $B$ (here $\gamma \equiv$ WL and $g \equiv$ GC, respectively), where for $A = B$ we recover the auto-power spectrum of a single probe. Under the Limber approximation (Limber 1954; LoVerde & Afshordi 2008), these coefficients can be expressed as
$$C^{AB}_{ij}(\ell) = \frac{c}{H_0} \int_{z_{\rm min}}^{z_{\rm max}} {\rm d}z\, \frac{W^A_i(z)\, W^B_j(z)}{E(z)\, r^2(z)}\, P_{mm}(k_\ell, z)\,,$$
where $E(z) = H(z)/H_0$ is the dimensionless Hubble parameter, $z_{\rm min}$ and $z_{\rm max}$ are the minimum and maximum redshifts of the survey, $r(z)$ is the comoving distance, $i$ and $j$ label the tomographic redshift bins, and $k_\ell$ is the wavenumber, which is connected to the multipole order, $\ell$, as
$$k_\ell = \frac{\ell + 1/2}{r(z)}\,.$$
In the adopted flat $w_0 w_a$CDM cosmology, $E(z)$ reads
$$E(z) = \sqrt{\Omega_m (1+z)^3 + (1 - \Omega_m)(1+z)^{3(1+w_0+w_a)} \exp\left(-3 w_a \frac{z}{1+z}\right)}\,,$$
and the comoving distance is
$$r(z) = \frac{c}{H_0} \int_0^z \frac{{\rm d}z'}{E(z')}\,.$$
Still following Blanchard et al. (2020), the WL window function $W^\gamma_i(z)$ is given by the so-called lensing efficiency,
$$W^\gamma_i(z) = \frac{3}{2}\, \Omega_m \left(\frac{H_0}{c}\right)^2 (1+z)\, r(z) \int_z^{z_{\rm max}} {\rm d}z'\, n^g_i(z') \left[1 - \frac{r(z)}{r(z')}\right]\,,$$
with $n^g_i(z)$ being the normalized surface galaxy number density in the considered $i$-th tomographic bin (see below for details).
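The Limber projection above can be sketched numerically as follows. This is an illustrative toy (the paper's actual implementation is the Julia package CosmoCentral.jl): the window function, the power-law power spectrum, and the parameter values ($\Omega_m = 0.32$, $H_0 = 67$) below are placeholder assumptions, not the survey quantities of the text.

```python
import math

C_LIGHT = 299792.458  # speed of light in km/s

def E(z, om=0.32, w0=-1.0, wa=0.0):
    """Dimensionless Hubble rate E(z) for a flat w0waCDM background."""
    de = (1 - om) * (1 + z)**(3 * (1 + w0 + wa)) * math.exp(-3 * wa * z / (1 + z))
    return math.sqrt(om * (1 + z)**3 + de)

def trapz(f, a, b, n=500):
    """Simple trapezoidal quadrature of f on [a, b]."""
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return s * h

def r_comoving(z, h0=67.0):
    """Comoving distance r(z) = (c/H0) Int_0^z dz'/E(z'), in Mpc."""
    return (C_LIGHT / h0) * trapz(lambda zp: 1.0 / E(zp), 0.0, z)

def limber_cl(ell, window, pmm, zmin=0.001, zmax=2.5, h0=67.0):
    """C(l) = (c/H0) Int dz W(z)^2 / (E(z) r^2(z)) P(k_l, z),
    with k_l = (l + 1/2) / r(z) (Limber approximation, auto-spectrum)."""
    def integrand(z):
        r = r_comoving(z, h0)
        k = (ell + 0.5) / r
        return window(z)**2 / (E(z) * r**2) * pmm(k, z)
    return (C_LIGHT / h0) * trapz(integrand, zmin, zmax, n=200)

# Toy inputs: Gaussian window and power-law P(k), purely illustrative.
win = lambda z: math.exp(-0.5 * ((z - 0.9) / 0.3)**2)
pk = lambda k, z: 1.0e4 * k**-1.5 / (1 + z)**2
cl100 = limber_cl(100, win, pk)
```

For a toy $P(k)$ decreasing with $k$, the resulting $C(\ell)$ decreases with $\ell$, as expected from the $k_\ell = (\ell + 1/2)/r(z)$ mapping.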
Unlike WL, the GC window function needs to account also for the galaxy bias $b^g_i(z)$ (see Desjacques et al. (2018) for a review):
$$W^g_i(z) = b^g_i(z)\, n^g_i(z)\, \frac{H(z)}{c}\,.$$
We use the phenomenological galaxy bias model introduced in Tutusaus et al. (2020),
$$b^g(z) = A + \frac{B}{1 + \exp\left[-(z - D)\, C\right]}\,,$$
with $A = 1.0$, $B = 2.5$, $C = 2.8$, and $D = 1.6$, as obtained through a fit to the Euclid Flagship simulation.
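The bias model can be sketched in a few lines. Note that the sigmoid form used here is our reconstruction of the equation lost in extraction, so treat both the formula and the code as an assumption, with only the parameter values ($A = 1.0$, $B = 2.5$, $C = 2.8$, $D = 1.6$) taken from the text:

```python
import math

# Assumed sigmoid form for the phenomenological galaxy bias of
# Tutusaus et al. (2020): b(z) = A + B / (1 + exp(-(z - D) * C)).
# The functional form is a reconstruction; the parameter values are
# those quoted in the text.
def galaxy_bias(z, A=1.0, B=2.5, C=2.8, D=1.6):
    return A + B / (1.0 + math.exp(-(z - D) * C))
```

Under this form, the bias rises monotonically from roughly $A$ at low redshift towards $A + B$ at high redshift, with the transition centred at $z = D$.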
Here we model the true galaxy distribution as
$$n^g(z) \propto \left(\frac{z}{z_0}\right)^2 \exp\left[-\left(\frac{z}{z_0}\right)^{3/2}\right]\,,$$
where $z_0 = z_m/\sqrt{2}$, $z_m$ is the median redshift of the distribution, and the normalization is chosen so as to obtain an integrated galaxy surface density $\bar{n}_g$. Here we assume the same specifications as for the Euclid photometric survey (Laureijs et al. 2011a), i.e. $z_m = 0.9$ and $\bar{n}_g = 30\ \mathrm{arcmin}^{-2}$.
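The unnormalized distribution above is straightforward to evaluate; a minimal sketch with the Euclid-like value $z_m = 0.9$ from the text (the overall normalization to $\bar{n}_g$ is a separate multiplicative step, omitted here):

```python
import math

# True galaxy redshift distribution, up to normalization:
# n(z) ∝ (z/z0)^2 exp[-(z/z0)^(3/2)], with z0 = zm / sqrt(2).
ZM = 0.9                       # median redshift (Euclid-like)
Z0 = ZM / math.sqrt(2.0)

def n_of_z(z, z0=Z0):
    return (z / z0)**2 * math.exp(-((z / z0)**1.5))
```

The distribution vanishes at $z = 0$, peaks below $z = 1$ for these parameters, and decays at high redshift.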
To account for photometric redshift errors, we convolve the true galaxy surface density, $n^g(z)$, with the following double Gaussian kernel describing the probability that a galaxy with redshift $z$ has a measured redshift $z_p$ (Kitching et al. 2009):
$$p_{\rm ph}(z_p|z) = \frac{1-f_{\rm out}}{\sqrt{2\pi}\,\sigma_b(1+z)} \exp\left\{-\frac{1}{2}\left[\frac{z - c_b z_p - z_b}{\sigma_b(1+z)}\right]^2\right\} + \frac{f_{\rm out}}{\sqrt{2\pi}\,\sigma_o(1+z)} \exp\left\{-\frac{1}{2}\left[\frac{z - c_o z_p - z_o}{\sigma_o(1+z)}\right]^2\right\}\,.$$
This is quite a flexible parameterization, which includes both multiplicative and additive bias in the redshift determination, and accounts for a fraction $f_{\rm out}$ of galaxies with an incorrect redshift estimate; the values of the coefficients are given in Tab. 1. Finally, the number density in the $i$-th bin, entering Eqs. (6)-(7), is
$$n^g_i(z) = \frac{\int_{z_i^-}^{z_i^+} {\rm d}z_p\, n^g(z)\, p_{\rm ph}(z_p|z)}{\int_{z_{\rm min}}^{z_{\rm max}} {\rm d}z \int_{z_i^-}^{z_i^+} {\rm d}z_p\, n^g(z)\, p_{\rm ph}(z_p|z)}\,,$$
where $z_i^+$ and $z_i^-$ are the redshift values that define the $i$-th tomographic bin. Here we use ten equi-populated redshift bins, so that the number density in each bin is simply $\bar{n}_{g,i} = \bar{n}_g/10$, and the boundaries of the bins are $z_i = \{0.001, 0.42, 0.56, 0.68, 0.79, 0.9, 1.02, 1.15, 1.32, 1.58, 2.5\}$.
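The double Gaussian kernel can be sketched as below. The coefficient values used here (`fout`, `cb`, `zb`, `sigma_b`, and the outlier counterparts) are placeholders, since Table 1 is not reproduced in this chunk; only the functional structure follows the text.

```python
import math

# Double Gaussian photo-z kernel: probability that a galaxy at true
# redshift z is measured at z_p. Main population (subscript b) plus a
# fraction fout of outliers (subscript o); each Gaussian has a
# multiplicative bias c, an additive bias z_off, and a width growing
# as sigma * (1 + z). All parameter values below are placeholders.
def p_photo(zp, z, fout=0.1, cb=1.0, zb=0.0, sigma_b=0.05,
            co=1.0, zo=0.1, sigma_o=0.05):
    def gauss(c, z_off, s):
        s_eff = s * (1 + z)
        arg = (z - c * zp - z_off) / s_eff
        return math.exp(-0.5 * arg * arg) / (math.sqrt(2 * math.pi) * s_eff)
    return (1 - fout) * gauss(cb, zb, sigma_b) + fout * gauss(co, zo, sigma_o)
```

For unit multiplicative bias ($c_b = c_o = 1$) the kernel integrates to one over $z_p$, so the convolution conserves the total number of galaxies.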

Data set generation
In this subsection, we describe both the dataset used in our analysis and the operations performed before the training phase. The $C(\ell)$ were evaluated using our code named CosmoCentral.jl, a cosmological analysis package developed in Julia. The linear matter power spectrum was obtained from CLASS and corrected for nonlinear evolution using the revised halofit recipe of Takahashi et al. (2012).
The parameter hyperspace described in Table 2 was then sampled using 200 logarithmically spaced multipoles $\ell$, while 10,000 combinations of cosmological parameters were generated according to the Latin Hypercube Sampling (LHS) algorithm (McKay et al. 1979). This prior was chosen to closely mimic that of EuclidEmulator2 (Knabenhans et al. 2021).
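The LHS scheme of McKay et al. (1979) admits a very short implementation; a minimal sketch in the unit hypercube (mapping to the actual prior ranges of Table 2 would be a further affine rescaling per axis):

```python
import random

# Minimal Latin Hypercube Sampling: each of the d axes is divided into
# n equal strata, every stratum is hit exactly once, and the per-axis
# orderings are shuffled independently so the strata combine randomly.
def latin_hypercube(n, d, seed=42):
    rng = random.Random(seed)
    axes = []
    for _ in range(d):
        perm = list(range(n))
        rng.shuffle(perm)
        # one uniform draw inside each stratum, in shuffled order
        axes.append([(p + rng.random()) / n for p in perm])
    # transpose: one row per sample point in [0, 1)^d
    return list(zip(*axes))

points = latin_hypercube(100, 8)  # e.g. 8 input parameters
```

By construction, projecting the samples onto any single axis recovers exactly one point per stratum, which is what distinguishes LHS from plain uniform sampling.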

Neural network emulator
Previous emulators (Mootoovaloo et al. 2020, 2022) were based on Gaussian processes, which have a neat probabilistic interpretation and allow an easy estimate of the uncertainty associated with their predictions. However, they are not ideal for handling big data sets. The main disadvantage is the need to invert an $N \times N$ covariance matrix to calculate the posterior distribution, with $N$ being the length of the data vector in the training dataset.
Neural networks, instead, have proven to scale favourably with the size of the training data. This is a crucial point since, when feasible, increasing the size of the training dataset improves the precision and accuracy of the emulator.
We address our problem as a classic regression task: given the value of the input parameters (the cosmological parameters and the chosen multipole $\ell$), a neural network is trained to output the set of angular correlation coefficients, $C^{AB}_{ij}(\ell)$, for each of the three probes described in Sec. 2.1. The three neural networks are developed using Flux.jl, a Julia machine learning library (Innes et al. 2018). The resulting Julia emulators can be easily wrapped and used in established Python data-analysis frameworks (Torrado & Lewis 2021; Brinckmann & Lesgourgues 2019). The chosen architecture consists of four hidden layers with 100 neurons each and a Rectified Linear Unit (ReLU) activation function (Nwankpa et al. 2018), as presented in Figure 3. The input layer takes 8 features (corresponding to the cosmological parameters and the multipole $\ell$). The output layer differs from that of similar emulators in the literature (Mootoovaloo et al. 2020; Manrique-Yus & Sellentin 2020), which typically compute the $C(\ell)$ for a specific tomographic combination, $ij$, and in a given multipole range (or even at a specific multipole value). In our case, instead, we output 55 coefficients for the auto-power spectra, $C^{AA}_{ij}(\ell)$, and 100 coefficients for the cross-power spectra, $C^{AB}_{ij}(\ell)$, i.e. we output at once all combinations, $(i, j)$, of tomographic indices, with $i, j = 1...10$, for the required multipoles. This approach has several advantages:
• Both the training and the predictions are fast, since we use only three emulators, rather than the several tens required, e.g., in Manrique-Yus & Sellentin (2020) to emulate the photometric 3x2pt.
• The $C^{AB}_{ij}(\ell)$ are correlated between different tomographic bins. We take advantage of this correlation by emulating all tomographic combinations at once.
• Our emulator suite is lightweight, occupying less than 1 MB of disk storage.
The architecture of the emulators is presented in Figure 3.
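The architecture described above (8 inputs, four ReLU hidden layers of 100 neurons, a linear output of 55 or 100 coefficients) can be sketched as a plain forward pass. The paper's implementation uses Flux.jl in Julia; the NumPy version below is a shape-level illustration with random, untrained weights:

```python
import numpy as np

# Shape sketch of the emulator architecture: 8 input features
# (cosmological parameters + multipole), four hidden layers of 100
# neurons with ReLU, and a linear output layer of n_out coefficients
# (55 for auto-spectra, 100 for cross-spectra). Weights are random:
# this illustrates the architecture, not a trained model.
rng = np.random.default_rng(0)

def make_mlp(n_in=8, n_hidden=100, n_layers=4, n_out=55):
    sizes = [n_in] + [n_hidden] * n_layers + [n_out]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:       # ReLU on hidden layers only
            x = np.maximum(x, 0.0)
    return x                           # linear output layer

net = make_mlp()
out = forward(net, rng.normal(size=(3, 8)))  # batch of 3 parameter sets
```

One such network per probe (two auto-spectra, one cross-spectrum) reproduces the three-emulator setup of the text.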
A step of the utmost importance before training is the pre-processing of the training dataset, which helps the neural network better capture the features it has to learn, both reducing the training time and increasing the network performance (Ioffe & Szegedy 2015). Specifically, we first perform the following transformation and then normalize the resulting values within the $[0, 1]$ interval. We empirically verified the importance of this step: compared to a preliminary attempt, in which we tried to emulate the $C^{AB}_{ij}(\ell)$ directly, this pre-processing yields a considerable improvement. Regarding the training phase, we chose the ADAM optimizer (Kingma & Ba 2014), with a learning rate of $3 \times 10^{-4}$ and a batch size of 256; the training lasted for 1,000 epochs. 80% of the dataset was used for training and 20% for validation. We used a mean-square-error loss function, evaluated after each epoch on both the training and validation datasets; whenever the validation loss decreased, the stored network coefficients were updated.
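The pre-processing pipeline can be sketched as below. The exact transformation applied before the min-max step was lost in extraction; a base-10 logarithm is assumed here purely for illustration, since the $C(\ell)$ span many orders of magnitude:

```python
import math

# Pre-processing sketch: an assumed logarithmic rescaling (the actual
# transformation in the paper is not recoverable from this chunk),
# followed by the min-max normalization to [0, 1] stated in the text.
def preprocess(values):
    logs = [math.log10(v) for v in values]          # assumed transform
    lo, hi = min(logs), max(logs)
    return [(v - lo) / (hi - lo) for v in logs]     # min-max to [0, 1]

scaled = preprocess([1e-9, 1e-7, 1e-5])
```

Compressing the dynamic range before normalization is what lets a single network treat large- and small-amplitude coefficients on an equal footing.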

EMULATOR PERFORMANCE
We assess the speed of our emulators using BenchmarkTools.jl, which runs the emulators several times and provides an accurate estimate of execution times. In this test, we emulated 30 multipoles $\ell$ with the lensing auto-correlation emulator, obtaining a mean execution time of about 300 µs.
In Figure 4, we show the relative emulation error for the specific set of cosmological parameters corresponding to the centre of the emulated space, as reported in Table 2. Remarkably, the accuracy is very good over all multipoles considered: the residuals for the lensing auto-correlation are almost always below 0.1%, while for the other two angular statistics they almost never exceed the 0.3% threshold. We also built a test dataset of 4,000 cosmologies distributed, according to the Latin Hypercube Sampling, over the same range as the training dataset reported in Table 2. To quantify the goodness of the emulators, we use the Mean Absolute Relative Difference (MARD) for each combination $\theta$ of cosmological parameters present in the validation dataset, defined as
$$\mathrm{MARD}(\theta) = \frac{1}{N_{C(\ell)}} \sum_{\ell, i, j} \left|\frac{C^{AB}_{ij,\mathrm{NN}}(\ell) - C^{AB}_{ij,\mathrm{GT}}(\ell)}{C^{AB}_{ij,\mathrm{GT}}(\ell)}\right|\,,$$
where $C^{AB}_{ij,\mathrm{NN}}$ is the neural network prediction, $C^{AB}_{ij,\mathrm{GT}}$ is the ground truth, computed in the standard way (Eq. (1)), and $N_{C(\ell)}$ represents the number of $C(\ell)$'s computed for each combination of cosmological parameters. This metric quantifies the relative accuracy of the emulator. The resulting distribution is shown in Figure 5. The important result is that the relative error of the predictions is well within 0.175%, with the lensing emulator, in particular, yielding an accuracy of 0.1%. This higher accuracy, compared to the other two cases, is due to the negligible effect of Baryon Acoustic Oscillations (BAO) on $C^{\gamma\gamma}_{ij}$, which makes the weak lensing angular power spectra smoother, and hence easier to emulate, than $C^{gg}_{ij}$ and $C^{\gamma g}_{ij}$. In all cases, these errors are well below the typical uncertainties of nonlinear corrections to the matter power spectrum: for example, EuclidEmulator2 (Knabenhans et al. 2021) and BaccoEmulator (Angulo et al. 2021) have an accuracy of about 1-2% up to $k = 5\, h\,\mathrm{Mpc}^{-1}$. To estimate the significance of this level of systematic error with respect to the typical measurement uncertainty, we compare it to the expected cosmic variance
$$\sigma(\ell) = \sqrt{\frac{2}{(2\ell + 1) f_{\rm sky}}}\, C(\ell)\,,$$
where $f_{\rm sky}$ is the fraction of surveyed sky. We define the Mean Absolute Error Ratio (MAER) as
$$\mathrm{MAER} = \frac{1}{N_{C(\ell)}} \sum_{\ell, i, j} \frac{\left|C^{AB}_{ij,\mathrm{NN}}(\ell) - C^{AB}_{ij,\mathrm{GT}}(\ell)\right|}{\sigma(\ell)}\,,$$
where, for clarity, we omitted to write explicitly the dependence on the cosmological parameters $\theta$.
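The two accuracy metrics used in this section, MARD and MAER, reduce to a few lines of code; a sketch over flat lists of coefficients (the tomographic indexing is elided for brevity):

```python
import math

# MARD: mean absolute relative difference between emulated (NN) and
# ground-truth (GT) coefficients, averaged over all computed C(l).
def mard(c_nn, c_gt):
    return sum(abs(a - b) / abs(b) for a, b in zip(c_nn, c_gt)) / len(c_gt)

# Cosmic variance for a single C(l): sigma(l) = sqrt(2/((2l+1) f_sky)) C(l).
def sigma_cv(ell, c_ell, f_sky=0.4):
    return math.sqrt(2.0 / ((2 * ell + 1) * f_sky)) * c_ell

# MAER: same absolute error, but normalized by sigma(l) instead of the
# ground-truth value, i.e. systematic error in units of cosmic variance.
def maer(c_nn, c_gt, ells, f_sky=0.4):
    return sum(abs(a - b) / sigma_cv(l, b, f_sky)
               for a, b, l in zip(c_nn, c_gt, ells)) / len(c_gt)
```

With $f_{\rm sky} = 0.4$ as in the text, $\sigma(\ell)$ equals $C(\ell)$ exactly at $\ell = 2$, so a MARD of 0.175% at low multipoles translates into a comparably small MAER there, shrinking further as $\ell$ grows.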
To estimate $\sigma(\ell)$, $f_{\rm sky}$ was set to 0.4 (a value compatible with the $\sim 15{,}000\ \mathrm{deg}^2$ expected from the Euclid and DESI surveys; Laureijs et al. 2011b).
Equation (15) does not include the shot-noise contribution, which is a survey-dependent quantity and cannot be rescaled as easily as the (multiplicative) $f_{\rm sky}$ term. Neglecting shot noise, however, is a conservative approach: any such term would yield a higher relative performance for our emulators. As can be seen from Figure 6, the MAER is always smaller than 2%. For comparison, the cosmological library developed by the LSST collaboration, CCL (Chisari et al. 2019), sets a threshold of 10% for this metric.

COMPARISON WITH EXISTING EMULATORS
As we briefly described in the Introduction, there are two categories of LSS emulators in the literature. The first category emulates the matter power spectrum in real space, $P_{mm}(k)$, which is then fed to a code that computes the theoretical $C^{AB}_{ij}(\ell)$. The second category (Mootoovaloo et al. 2020; Manrique-Yus & Sellentin 2020), instead, emulates the latter directly. Our code belongs to this second class, with our implementation introducing considerable improvements over previous implementations of this kind.
On the other hand, comparison with the first category is more delicate.
For instance, one advantage of $P_{mm}(k)$ emulators is that they are more flexible, since they are agnostic about the galaxy redshift distribution (Spurio Mancini et al. 2022). $C^{AB}_{ij}(\ell)$ emulators, instead, are inevitably tailored to a specific survey: if the survey changes, the training process needs to be repeated. However, the most resource-demanding step of our approach is the estimation of the power spectra from Boltzmann codes. Once computed for a large range of redshifts, as done for this work, these could be stored and reused to evaluate the $C^{AB}_{ij}(\ell)$ with codes faster than current Boltzmann solvers. The training procedure can also be accelerated with transfer learning (Zhuang et al. 2019): using a previously trained network as the initial ansatz, the network will learn to emulate the new $C^{AB}_{ij}(\ell)$ with fewer computational resources.
Another advantage of $P_{mm}$ emulators is their insensitivity to probe-specific nuisance parameters, which enter the $C^{AB}_{ij}(\ell)$ evaluation through the weight functions $W^A_i(z)$. Therefore, the training set in the case of $C^{AB}_{ij}(\ell)$ emulators needs to cover a larger parameter hyperspace. However, considering that for this work we needed 10,000 cosmologies, the resulting cost is still acceptable and the gain is still positive. Furthermore, some parameters, such as the magnification bias, can be added analytically as multiplicative factors, thus at no cost for the $C^{AB}_{ij}(\ell)$ emulators (Abbott et al. 2018a). For their part, the $C^{AB}_{ij}(\ell)$ emulators present several advantages. First, they are blazingly fast, since they do not need to compute intermediate quantities like weight functions, background variables, and numerical integrals. The net result is a gain of about a factor of 1000 in speed, compared to standard Boltzmann integrators.
Emulating directly the $C^{AB}_{ij}(\ell)$, rather than $P_{mm}(k)$, becomes even more advantageous when one needs to drop the Limber approximation when projecting from 3D to 2D statistics. This approximation can in fact become a source of systematic error in high-precision measurements, as expected from next-generation surveys (Fang et al. 2020; Martinelli et al. 2022). Although efficient algorithms to estimate the $C^{AB}_{ij}(\ell)$ from $P_{mm}(k)$ without the Limber approximation have recently been developed (Schöneberg et al. 2018; Fang et al. 2020), the advantage of completely bypassing this computational phase through direct $C^{AB}_{ij}(\ell)$ emulation is evident: once the training set of $C^{AB}_{ij}(\ell)$ is built properly, with or without the Limber approximation, the emulator will learn accordingly without any change in the architecture or procedure.

CONCLUSIONS
In this work we have developed a set of fast cosmological emulators capable of predicting a specific set of typical 2D statistics from galaxy surveys, the so-called 3x2pt statistics from WL and photometric GC, in less than a millisecond. We have shown that, compared to previous works, they are more precise and flexible, with significant savings of computational and storage resources. Precision is improved through an appropriate pre-processing step, and likely also benefits from a different architecture: since we train a single neural network to learn all tomographic combinations, $(i, j)$, at once, the emulator can take advantage of the existing physical correlations between different combinations to deliver the correct result.
Our current results could be further improved by including more astrophysical effects and nuisance parameters, such as the galaxy bias. A full beyond-Limber version of the $C(\ell)$ emulators is also in our plans. As discussed in the previous section, with our approach this is particularly advantageous in terms of computational cost. We could also further improve the hyperparameters of the neural networks: in the current implementation we used general-purpose dense neural networks, all with the same number of layers and neurons, without further specialization. However, using a different architecture for each emulator could increase the precision further. This approach can also be naturally extended to incorporate CMB measurements (Spurio Mancini et al. 2022).
Finally, these techniques would also allow us to improve gradient-based methods in likelihood-based inference approaches: samplers such as the Hamiltonian Monte Carlo (Hoffman & Gelman 2014) use the likelihood gradient to efficiently explore the parameter space. However, with standard approaches based on Boltzmann solvers such as CAMB or CLASS (Lewis et al. 2000; Lesgourgues 2011), evaluating the likelihood gradient requires computing many finite-difference derivatives, which are both expensive and may suffer from poor accuracy and precision. With a neural network the situation is completely different, since by construction the derivative of the output with respect to the input is obtained using automatic differentiation. We plan to implement these techniques in a future work.
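The contrast drawn above can be made concrete with a toy example: a finite-difference derivative carries a truncation error and costs extra function evaluations per parameter, while an analytic derivative (what automatic differentiation delivers for a neural network) is exact. The function below is a stand-in for a log-likelihood, purely illustrative:

```python
# Toy comparison of finite-difference vs analytic gradients. For a real
# likelihood the analytic form is unavailable, which is why automatic
# differentiation through an emulator is attractive: it yields the
# exact derivative of the network at roughly the cost of one forward pass.
def f(x):
    return x**3 - 2.0 * x          # toy stand-in for a log-likelihood

def grad_analytic(x):
    return 3.0 * x**2 - 2.0        # exact derivative

def grad_fd(x, h=1e-4):
    # central difference: O(h^2) truncation error, two extra evaluations
    return (f(x + h) - f(x - h)) / (2.0 * h)

err = abs(grad_fd(1.5) - grad_analytic(1.5))  # small but nonzero
```

For a $d$-dimensional parameter space, the finite-difference cost scales as $\mathcal{O}(d)$ likelihood evaluations per gradient, whereas backpropagation through an emulator computes all $d$ components in a single pass.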

Figure 1 .
Figure 1. Normalized galaxy density distributions in the tomographic redshift bins used in this work, accounting for the photometric errors given in Table 1.

Figure 2 .
Figure 2. Theoretical predictions for, respectively, the lensing auto-correlation, the galaxy-galaxy lensing cross-correlation, and the galaxy clustering auto-correlation. The power spectrum was generated using CLASS for the central cosmology of Table 2, while the $C(\ell)$'s have been evaluated using CosmoCentral.jl.

Figure 3 .
Figure 3. Architecture of our neural network emulators. The number of output neurons, $n_{\rm output}$, changes according to the emulator considered (55 and 100, respectively, for the auto-correlation and cross-correlation functions).

Figure 4 .
Figure 4. The relative error in the $C(\ell)$ evaluation for the central cosmology defined in Table 2. The yellow and green lines represent, respectively, relative differences of 0.3% and 0.1%. The lensing emulator is the most precise, with an error smaller than 0.1% for almost all evaluated coefficients. The superior performance of the WL emulator is due to the absence of the BAO wiggles, as can be seen in Figure 2.

Figure 5 .
Figure 5. The distribution of the relative systematic errors of the three emulators, defined in Equation (13), using a validation dataset of 4,000 cosmologies.

Figure 6 .
Figure 6. The distribution of the relative importance of systematic errors with respect to the intrinsic cosmic variance, as defined in Equation (15).

Table 2.
Table 2. Parameters used for the training dataset generation, chosen to be similar to those of EuclidEmulator2 (Knabenhans et al. 2021).