COSMOPOWER: emulating cosmological power spectra for accelerated Bayesian inference from next-generation surveys

We present $\it{CosmoPower}$, a suite of neural cosmological power spectrum emulators providing orders-of-magnitude acceleration for parameter estimation from two-point statistics analyses of Large-Scale Structure (LSS) and Cosmic Microwave Background (CMB) surveys. The emulators replace the computation of matter and CMB power spectra from Boltzmann codes; thus, they do not need to be re-trained for different choices of astrophysical nuisance parameters or redshift distributions. The matter power spectrum emulation error is less than $0.4\%$ in the wavenumber range $k \in [10^{-5}, 10] \, \mathrm{Mpc}^{-1}$, for redshift $z \in [0, 5]$. $\it{CosmoPower}$ emulates CMB temperature, polarisation and lensing potential power spectra in the $5\sigma$ region of parameter space around the $\it{Planck}$ best fit values with an error $\lesssim 10\%$ of the expected shot noise for the forthcoming Simons Observatory. $\it{CosmoPower}$ is showcased on a joint cosmic shear and galaxy clustering analysis from the Kilo-Degree Survey, as well as on a Stage IV $\it{Euclid}$-like simulated cosmic shear analysis. For the CMB case, $\it{CosmoPower}$ is tested on a $\it{Planck}$ 2018 CMB temperature and polarisation analysis. The emulators always recover the fiducial cosmological constraints with differences in the posteriors smaller than sampling noise, while providing a speed-up factor up to $O(10^4)$ to the complete inference pipeline. This acceleration allows posterior distributions to be recovered in just a few seconds, as we demonstrate in the $\it{Planck}$ likelihood case. $\it{CosmoPower}$ is written entirely in Python, can be interfaced with all commonly used cosmological samplers and is publicly available at https://github.com/alessiospuriomancini/cosmopower .


INTRODUCTION
Analysis of the two-point statistics of cosmological fields is one of the cornerstones of modern observational cosmology. For parameter inference pipelines involving two-point statistics (i.e. power spectra, or their derived real-space counterparts, correlation functions), the computational bottleneck is running Boltzmann solvers such as CAMB (Lewis et al. 2000) or CLASS (Lesgourgues 2011; Blas et al. 2011) to compute theoretical power spectra for a given cosmology. However, cosmological power spectra are generally smooth functions of their input cosmological parameters, and hence lend themselves well to emulation: finding compact, accurate, and fast-to-evaluate surrogate functions that map cosmological parameters to the corresponding predicted power spectra.
Emulation offers the promise of reducing the computational overhead of evaluating cosmological power spectra by many orders of magnitude, with negligible loss of accuracy in the final parameter inference. This surrogate modelling approach has recently seen numerous applications to the Bayesian solution of the inverse problem in different scientific fields, ranging from geophysical seismic waves (Das et al. 2018; Spurio Mancini et al. 2021; Piras et al. 2021) to stellar and galaxy spectra (Czekala et al. 2015; Alsing et al. 2020), and from chemical mechanisms (de Mijolla et al. 2019; Kasim et al. 2021) to applied engineering (Thiagarajan et al. 2020; Buffington et al. 2020).
Emulation of cosmological power spectra is not a new idea either. Early examples of emulators include CMBfit (Jimenez et al. 2004) and PICO (Fendt & Wandelt 2007), both performing polynomial regression of power spectra (represented in some compact basis). The matter power spectrum emulators built from the Coyote Universe simulations (Heitmann et al. 2009, 2013; Lawrence et al. 2010, 2017) are based on Gaussian Process regression (Rasmussen & Williams 2005), and were extended by Ramachandra et al. (2021) to $f(R)$ cosmologies. Recently, the Euclid Emulator was proposed as a surrogate model for the nonlinear matter power spectrum (Knabenhans et al. 2019; Euclid Collaboration et al. 2021), Mootoovaloo et al. (2020) developed a Gaussian Process emulator of cosmic shear band powers, while Mootoovaloo et al. (2022) and Ho et al. (2021) used Gaussian Processes to emulate the matter power spectrum. Bird et al. (2019) and Rogers et al. (2019) developed Gaussian Process emulators for the Lyman-α forest flux power spectrum. CosmoNet (Auld et al. 2007, 2008) is the first application of neural networks to cosmological power spectrum emulation, followed by Agarwal et al. (2012, 2014). More recently, Manrique-Yus & Sellentin (2019) developed neural network interpolators for angular power spectra of LSS observables, while Albers et al. (2019) used neural networks to accelerate parts of the power spectra computations within the Boltzmann code CLASS; moreover, the Bacco project recently included a neural network interpolator for the linear matter power spectrum. Kern et al. (2017), Schmit & Pritchard (2017) and Bevins et al. (2021) developed neural network emulators for the 21-cm global signal or power spectrum.
In this paper, we introduce a suite of neural cosmological power spectrum emulators covering both CMB (temperature, polarisation and lensing) and (linear and nonlinear) matter power spectra. These emulators provide orders-of-magnitude speed-up over direct Boltzmann solvers, whilst being comfortably accurate for upcoming surveys such as the Simons Observatory (Ade et al. 2019), Euclid (Laureijs et al. 2011), the Vera Rubin Observatory (Ivezić et al. 2019) and the Nancy Grace Roman Space Telescope (Spergel et al. 2015). For LSS observables, we demonstrate the accuracy and acceleration provided by our emulators on data from the Kilo-Degree Survey (KiDS) as well as on a simulated cosmic shear analysis of a Euclid-like survey. For the CMB, we validate CosmoPower on a Planck 2018 CMB temperature and polarisation analysis.
Our emulators are trained to provide accurate emulation of cosmological power spectra on a very wide range of cosmological parameters, and they easily allow for different configurations of input and derived cosmological parameters. In addition, they are fully differentiable, which makes them amenable to gradient-based inference, and can be run on Graphics Processing Units to gain further acceleration.
The structure of this paper is as follows. In Sect. 2 we introduce the neural network emulation framework tailored to power spectrum emulation. Application to matter power spectrum emulation is presented in Sect. 3, including validation on the KiDS and Euclid-like analyses. Application to the CMB case is presented in Sect. 4, including validation on the Planck analysis. We summarise the properties of CosmoPower and conclude in Sect. 5.

EMULATING COSMOLOGICAL POWER SPECTRA
We consider two methods for power spectra emulation.
• The first one is a direct mapping between cosmological parameters and power spectra by means of a neural network (NN), schematically represented in the left-hand panel of Fig. 1 (see e.g. Bishop 2006 for an introduction to NNs; here, we follow a similar notation to Alsing et al. 2020). A NN is a set of stacked layers, each composed of multiple nodes. To each node a weight and a bias parameter are associated, whose linear combination is passed through a nonlinear activation function (see Appendix A1 for details on the activation function employed in this work); we denote the ensemble of weights and biases as $\boldsymbol{\varphi}$. The nonlinear function parameterised by $\boldsymbol{\varphi}$ maps the input cosmological parameters $\boldsymbol{\theta}$ to the (log-)power spectra $P(x; \boldsymbol{\theta})$, where $x$ is either the wavenumber $k$ for the matter power spectrum, or the multipole $\ell$ in the CMB case. The spectra used for training the deep learning emulators are preprocessed by taking their logarithm, followed by standardisation of the logarithmic features, i.e. dividing each logarithmic feature by its standard deviation after subtracting its mean. Taking the logarithm of the spectra reduces the dynamic range in the training data, and ensures that minimising the mean-square-error loss optimises for fractional (rather than absolute) accuracy of the emulated power spectrum. Standardisation ensures more rapid training convergence (Wan 2019).
• In the second method we train a NN to learn the mapping between cosmological parameters and the Principal Components of the power spectra, as shown in the right-hand panel of Fig. 1. Principal Component Analysis (PCA) is a linear dimensionality reduction technique which performs a singular value decomposition of the input signal and retains the components with the highest variance. We perform a PCA decomposition of the spectra in our training dataset, which produces a set of basis functions $e_i(x)$, with $i \in \{1, \ldots, N_{\rm PCA}\}$. In other words, we assume we can decompose the spectra as $P(x) = \sum_{i=1}^{N_{\rm PCA}} \alpha_i \, e_i(x)$, where the coefficients $\alpha_i$ in the new basis are the Principal Components. We train a NN to output estimates $\hat{\alpha}_i$ of the Principal Components $\alpha_i$, given cosmological parameters as input. Similarly to the power spectra in the direct NN case, the PCA components are also standardised.
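As a concrete illustration, the two preprocessing routes above can be sketched in a few lines of NumPy (a minimal sketch: the array shapes, number of retained components and toy training spectra are illustrative assumptions, not the values used by our emulators):

```python
import numpy as np

def standardise_log_spectra(spectra):
    """Log-transform spectra, then subtract the mean and divide by the
    standard deviation of each feature (k or ell bin) across the training set."""
    log_spec = np.log10(spectra)          # reduce dynamic range
    mean = log_spec.mean(axis=0)
    std = log_spec.std(axis=0)
    return (log_spec - mean) / std, mean, std

def pca_compress(features, n_components):
    """Project (already standardised) features onto their leading principal
    components, obtained via a singular value decomposition."""
    centred = features - features.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    basis = vt[:n_components]             # PCA basis functions e_i(x)
    coefficients = centred @ basis.T      # principal components alpha_i
    return coefficients, basis

# toy training set: 100 positive "spectra" sampled at 50 bins
rng = np.random.default_rng(0)
spectra = np.exp(rng.normal(size=(100, 50)))
X, mu, sigma = standardise_log_spectra(spectra)
alphas, basis = pca_compress(X, n_components=10)
```

At prediction time the NN output is simply passed through the inverse of these transformations (multiply by the saved standard deviation, add the mean, exponentiate; and, for the PCA route, first project the predicted coefficients back through the basis).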
We report the implementation details of the neural networks and PCA in Appendix A, including details on the training procedure. We tested both emulation approaches on the cosmological power spectra of interest and found the former to be in general more accurate. Thus, we decided to use it for all power spectra with the exception of the cross temperature-polarisation spectrum $C_\ell^{\rm TE}$ and the lensing potential spectrum $C_\ell^{\phi\phi}$, which were emulated using the second method. The use of the PCA decomposition is particularly convenient when, as in the CMB TE spectrum case, it is not possible to take the logarithm of the training power spectra, due to some negative values. We also observed better performance of the emulator for the $\phi\phi$ spectrum when applying the PCA compression first; we argue this is due to the small values of the logarithmic $\phi\phi$ spectra, which range from about $-6$ to $-20$ and might therefore cause numerical issues if fed directly into the NN.
One of the key advantages of CosmoPower is that the emulators are trained on very broad parameter ranges, which we report in Table 1. The choice of such large parameter ranges is motivated by the desire to provide a tool that is as general as possible (see Sect. 5 for further comments on this point).
A common problem with existing emulators is that they are trained for a fixed choice of cosmological parameterisation, while Boltzmann solvers retain the flexibility to choose among different input parameterisations. In addition, Boltzmann solvers allow derived parameters to be computed a posteriori, so that one can explore degeneracies between different parameters without restrictions. For example, one cannot directly sample both $\ln 10^{10} A_{\rm s}$ and $\sigma_8$, as these two parameters are related; choosing to sample one or the other may in fact even have some effect on the posterior distribution (see e.g. Joachimi et al. 2021). The common strategy when performing cosmological inference is to choose one of these parameters for sampling, and let the Boltzmann solver compute the corresponding other one at each point in the posterior chain. In CosmoPower this is also possible, without re-training the emulators from scratch. As we show in Sect. 3 and Appendix A3 for the case of $\ln 10^{10} A_{\rm s}$ and $\sigma_8$, one can choose to sample the former or the latter, and obtain the other one as a derived parameter with a Gaussian Process (GP). We refer the reader to Appendix A3 for details on the implementation of such a GP (see also Mootoovaloo et al. 2020 for a similar application of GPs to obtain derived values of $\sigma_8$).

Figure 1. Schematic of the neural network emulation implemented in CosmoPower. A neural network is trained to learn the mapping between cosmological parameters and a) power spectra, b) coefficients in a Principal Component Analysis of the power spectra.
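The idea of the post-processing step can be sketched as follows: regress the derived parameter against the sampled parameters with a GP, then evaluate the fitted GP on every chain sample. This is a self-contained NumPy stand-in for the GP implementation detailed in Appendix A3; the two-parameter training function is synthetic, purely for illustration:

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of points."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / length_scale**2)

def gp_fit_predict(x_train, y_train, x_test, noise=1e-6):
    """Zero-mean GP regression: mean prediction at x_test."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_st = rbf_kernel(x_test, x_train)
    return k_st @ np.linalg.solve(k_tt, y_train)

# Synthetic stand-in: pretend the derived parameter is a smooth function of
# two sampled parameters (in practice: e.g. sigma_8 as a function of the
# sampled cosmological parameters).
rng = np.random.default_rng(1)
theta_train = rng.uniform(0.0, 1.0, size=(300, 2))
derived_train = np.sin(theta_train[:, 0]) + theta_train[:, 1] ** 2

# Post-process an MCMC chain: attach the derived parameter to each sample
chain = rng.uniform(0.1, 0.9, size=(1000, 2))
derived_chain = gp_fit_predict(theta_train, derived_train, chain)
```

Because the derived parameter is a deterministic, smooth function of the sampled ones, a modest training set suffices for the GP to reproduce it to high accuracy along the chain.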

Theory
Here we briefly summarise the theory underlying two-point statistics analyses of LSS surveys, following a notation similar to that of Asgari et al. (2021); Heymans et al. (2021); Joachimi et al. (2021). A flat ΛCDM model is assumed throughout the paper. Extensions of our emulators to beyond-ΛCDM cosmologies will be explored in future work (see Sect. 5 for details).
LSS analyses target the shear and clustering signals of the observed galaxies, including the shear-clustering cross-correlation, in what is typically referred to as a '3x2pt' analysis (Joachimi & Bridle 2010). Angular power spectra of shear and clustering statistics can be expressed as integrals of the matter power spectrum $P_{\rm mm}(k, z)$ along the line of sight, weighted by kernel functions:

$C^{(ij)}_{ab}(\ell) = \int_0^{\chi_{\rm H}} {\rm d}\chi \, \frac{W^{(i)}_a(\chi)\, W^{(j)}_b(\chi)}{\chi^2} \, P_{\rm mm}\!\left(\frac{\ell + 1/2}{\chi}, z\right), \quad (2)$

where the indices $\{a, b\}$ can be assigned the labels $\{\gamma, {\rm I}, {\rm n}\}$, denoting contributions from cosmic shear, galaxy intrinsic alignment and galaxy positions, respectively. The integral in Eq. (2) is performed along the line of sight up to the Hubble radius $\chi_{\rm H} = c/H_0$, with $c$ the speed of light and $H_0$ the Hubble constant. $\chi$ denotes the comoving distance, which is a function of the redshift $z$ (a dependence implicitly assumed in Eq. 2 for ease of notation). The Limber projection (Kaiser 1992; LoVerde & Afshordi 2008) connects the wavenumber and redshift at which the matter power spectrum $P_{\rm mm}(k, z)$ is evaluated in that integral, so that given a multipole $\ell$ and a redshift $z$ (corresponding to a comoving distance $\chi$) the matter power spectrum is evaluated at wavenumber $k = (\ell + 1/2)/\chi(z)$. We note that, while we will restrict ourselves to Limber-approximated spectra in this paper, the emulation framework in CosmoPower can be equally applied to accelerate the computation of non-Limber projected spectra, which we plan to investigate in future work (see Sect. 5 for a discussion).
The weighting functions can be written as

$W^{(i)}_{\rm n}(\chi) = n_i(\chi),$

$W^{(i)}_{\gamma}(\chi) = \frac{3 H_0^2 \Omega_{\rm m}}{2 c^2} \frac{\chi}{a(\chi)} \int_{\chi}^{\chi_{\rm H}} {\rm d}\chi' \, n_i(\chi') \, \frac{\chi' - \chi}{\chi'},$

$W^{(i)}_{\rm I}(\chi) = -A_{\rm IA} \, C_1 \rho_{\rm cr} \frac{\Omega_{\rm m}}{D(\chi)} \left(\frac{1+z}{1+z_{\rm pivot}}\right)^{\eta_{\rm IA}} n_i(\chi),$

where $\Omega_{\rm m}$ is the total matter density parameter, $a$ is the scale factor, $n_i(\chi)$ denotes the redshift distribution for redshift bin $i$, $D(\chi)$ is the linear growth factor, $\rho_{\rm cr}$ is the critical density, $C_1$ is a constant, $z_{\rm pivot}$ is an arbitrary pivot redshift set to 0.3, while $A_{\rm IA}$ and $\eta_{\rm IA}$ are two free parameters, allowing for a scaling in amplitude and redshift, respectively, of the intrinsic alignment contribution. The linear galaxy bias coefficients $b_i$ are also left free to vary in the inference pipeline.
The observed angular two-point statistics of galaxy ellipticities for tomographic redshift bins $i$ and $j$, as a function of the angular multipole $\ell$, can be written as

$C^{(ij)}_{\epsilon\epsilon}(\ell) = C^{(ij)}_{\gamma\gamma}(\ell) + C^{(ij)}_{\gamma{\rm I}}(\ell) + C^{(ij)}_{{\rm I}\gamma}(\ell) + C^{(ij)}_{\rm II}(\ell), \quad (6)$

i.e. as a sum of a pure cosmic shear contribution and contaminants resulting from the intrinsic alignment of galaxies, which also affects the angular power spectrum of the cross-correlation between galaxy ellipticities and positions (usually referred to as 'galaxy-galaxy lensing'):

$C^{(ij)}_{{\rm n}\epsilon}(\ell) = C^{(ij)}_{{\rm n}\gamma}(\ell) + C^{(ij)}_{\rm nI}(\ell), \quad (7)$

while the galaxy clustering power spectrum $C^{(ij)}_{\rm nn}(\ell)$ is not affected by intrinsic alignments and, assuming no magnification bias (Duncan et al. 2013; von Wietersheim-Kramsta et al. 2021), can be explicitly written as

$C^{(ij)}_{\rm nn}(\ell) = b_i b_j \int_0^{\chi_{\rm H}} {\rm d}\chi \, \frac{n_i(\chi)\, n_j(\chi)}{\chi^2} \, P_{\rm mm}\!\left(\frac{\ell + 1/2}{\chi}, z\right). \quad (8)$

Two-point statistics measured by LSS surveys are typically linear transformations of the theoretical power spectra given by Eqs. (6)-(8), such as band-power estimates (Schneider et al. 2002; van Uitert et al. 2018).
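To make the Limber recipe of Eq. (2) concrete, here is a minimal numerical sketch of the projection integral for a single kernel pair; the power spectrum and Gaussian kernel used here are toy placeholders, not a survey model:

```python
import numpy as np

def limber_cl(ell, W_a, W_b, pk, chi_grid):
    """Limber projection of Eq. (2):
    C(ell) = int dchi W_a(chi) W_b(chi) / chi^2 * P(k = (ell + 1/2)/chi, chi)."""
    k = (ell + 0.5) / chi_grid                   # Limber k-ell relation
    integrand = W_a(chi_grid) * W_b(chi_grid) / chi_grid**2 * pk(k, chi_grid)
    dchi = chi_grid[1] - chi_grid[0]             # uniform grid spacing
    return integrand.sum() * dchi                # rectangle-rule integration

# toy ingredients: a placeholder power spectrum and a Gaussian kernel
pk = lambda k, chi: 1.0 / (1.0 + k**2)           # stands in for P_mm(k, z(chi))
W = lambda chi: np.exp(-0.5 * ((chi - 1500.0) / 300.0) ** 2)

chi_grid = np.linspace(10.0, 3000.0, 2000)       # comoving distances [Mpc]
cl = np.array([limber_cl(ell, W, W, pk, chi_grid) for ell in (10, 100, 1000)])
```

Swapping the placeholder `pk` for a call to an emulated $P_{\rm mm}(k, z)$ is exactly the substitution CosmoPower performs inside the likelihood: the projection code is unchanged, only the source of the power spectrum differs.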

Emulating the matter power spectrum
Eq. (2) clearly indicates that the prediction of the matter power spectrum $P_{\rm mm}(k, z)$ is central to the computation of the two-point statistics of cosmic shear, galaxy-galaxy lensing and galaxy clustering. Boltzmann codes can perform practically exact computations (up to numerical accuracy) of the matter power spectrum predicted by the linear theory of gravitational instability, which we denote as $P_{\rm L}(k, z)$. As nonlinearities become more important, linear theory breaks down. We write the full, nonlinear spectrum as the product of the linear one $P_{\rm L}(k, z)$ and a nonlinear correction, which we label $P_{\rm NL\text{-}corr}(k, z)$:

$P_{\rm mm}(k, z) = P_{\rm L}(k, z) \, P_{\rm NL\text{-}corr}(k, z).$

Nonlinear corrections become important at small scales, corresponding to large wavenumbers $k$. Their modelling as a function of cosmological parameters is uncertain and further complicated by baryonic effects, whose impact on the nonlinear matter power spectrum induces important, yet not fully understood, modifications to the nonlinear power on small scales (see e.g. Chisari et al. 2019, for a review). In recent years, the HMcode software developed by Mead et al. (2015, 2016, 2021) has found widespread use in LSS analyses as a way to predict the nonlinear power spectrum while taking baryonic effects into account. HMcode is a modified halo model which includes baryonic contributions, as opposed to, for example, the fitting function Halofit (Smith et al. 2003; Takahashi et al. 2012; Bird et al. 2012). In this work we consider the latest version of HMcode (Mead et al. 2021). We focus on two of its free parameters, $c_{\rm min}$ and $\eta_0$, denoting the minimum halo concentration and the halo bloating, respectively. We train one emulator for the linear power spectrum $P_{\rm L}(k, z)$ and one for the nonlinear correction $P_{\rm NL\text{-}corr}(k, z)$. We also experimented with emulating the full power spectrum directly, but noticed better performance when separating the linear contribution from the nonlinear correction.
We observe this split helps the emulator isolate and learn more efficiently the effect of the nonlinear parameters $c_{\rm min}$ and $\eta_0$. For both the linear and nonlinear contributions, we emulate the power spectrum at 420 $k$ values in the range $[10^{-5}, 10] \, {\rm Mpc}^{-1}$. The redshift is varied over the range $[0, 5]$ and treated as an additional input parameter for the emulator. We use $\sim 1.8 \cdot 10^5$ spectra for our training set and leave $\sim 2 \cdot 10^4$ spectra for our testing set. We verified that this fraction of parameter samples, randomly selected for testing purposes from the training set, still represents a homogeneous and uniformly distributed sample for each parameter. The left- and right-hand panels of Fig. 2 report the emulation accuracy on the testing set for the linear power spectrum and nonlinear correction, respectively. We use percentile plots to show the statistical behaviour of the emulator accuracy throughout the testing set, as a function of the wavenumber $k$. For the linear power spectrum, 99% of the testing power spectra are emulated with differences smaller than 0.1% of their true value across the entire wavenumber range considered. For the nonlinear correction, 99% of the spectra are emulated with less than 0.4% error. As expected, the percentage difference in the nonlinear correction is typically about an order of magnitude larger than in the linear case, thus dominating the total error. The error on the nonlinear correction increases for $k \gtrsim 1 \, {\rm Mpc}^{-1}$, i.e. on highly nonlinear scales. This was expected, as numerical computations performed by Boltzmann codes at these scales are more uncertain in the first place. One way to ensure increased numerical stability in these computations is to request a maximum wavenumber $k_{\rm max}$, at which nonlinearities are computed, that is well above the actual maximum at which the power spectrum is required. In our analysis we set $k_{\rm max} = 100 \, {\rm Mpc}^{-1}$, while we only use the matter power spectrum up to $k = 10 \, {\rm Mpc}^{-1}$.
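Schematically, the two emulators combine multiplicatively at prediction time. The sketch below uses closed-form stand-ins for the two trained networks (the functional forms and the parameter `A` are illustrative assumptions, not the real emulators), to show only the plumbing of the split:

```python
import numpy as np

# Stand-ins for the two trained emulators, which in practice return
# log10 P_L(k, z) and log10 of the nonlinear correction.
def emulate_log_linear(params, k, z):
    return np.log10(params["A"] * k / (1.0 + (k / 0.02) ** 2) / (1.0 + z))

def emulate_log_boost(params, k, z):
    # correction -> 1 on large scales (small k), grows on small scales
    return np.log10(1.0 + (k / 1.0) ** 1.5 / (1.0 + z))

def emulate_nonlinear(params, k, z):
    """Full nonlinear spectrum: linear spectrum times nonlinear correction,
    assembled in log space and exponentiated once."""
    return 10 ** (emulate_log_linear(params, k, z) + emulate_log_boost(params, k, z))

k = np.logspace(-5, 1, 420)                      # same k sampling as the emulators
p_l = 10 ** emulate_log_linear({"A": 2.0e4}, k, z=0.0)
p_nl = emulate_nonlinear({"A": 2.0e4}, k, z=0.0)
```

Working in log space keeps both network outputs well conditioned, and the product structure guarantees the nonlinear prediction reduces to the linear one wherever the correction is unity.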
These accuracy plots already show excellent performance of the emulators in obtaining high-fidelity predictions for the matter power spectrum. However, a full inference analysis is required to thoroughly test our emulators. As further discussed in Sect. 5, this is the only way to completely test the validity of an emulation approach such as the one developed in CosmoPower. Crucially, different emulation accuracy levels are required for the different types of analyses for which the emulators are being developed. CosmoPower is a tool designed to be adequate for Stage IV surveys: as such, we need to test its performance on a simulated Stage IV inference pipeline. This is what we show in Sect. 3.4 with the simulated analysis of a Euclid-like survey.
Before showing those results, we present in Sect. 3.3 an application to a Stage III analysis, namely a joint weak lensing and galaxy clustering analysis from 450 deg$^2$ of the KiDS survey. We refer the reader to Appendix B for another, more recent Stage III application, namely a cosmic shear analysis of 1000 deg$^2$ from the KiDS survey. The goal of showing these Stage III results is to demonstrate that, while CosmoPower is explicitly designed to aid inference analyses from Stage IV surveys, it can clearly benefit those already ongoing from Stage III experiments too. For the latter, CosmoPower can provide contours in a matter of a few minutes without any Graphics Processing Unit acceleration (see Sect. 5 for a discussion on this point). In addition, in showing results for Stage III analyses we would like to emphasise the 'train-once-use-repeatedly' nature of CosmoPower. Our emulators are designed to be trained only once, so that end users do not need to re-train any of the models, as long as the emulators are employed assuming the cosmological model and parameter ranges indicated in Table 1. For example, in the KiDS and Euclid-like analyses shown in Sect. 3.3 and 3.4, the emulators for the linear power spectrum and nonlinear correction are the same for both analyses.

Validation on the KiDS 3x2pt likelihood
Figure 2. Matter power spectrum emulation accuracy for a) the linear power spectrum and b) the nonlinear correction, as measured on the $\sim 2 \cdot 10^4$ spectra of the testing set for our emulators. Dark red, red and salmon areas enclose the 68, 95 and 99 percentiles of the fractional absolute emulator error, respectively, as a function of wavenumber $k$. By construction, the redshift is an input parameter for the emulators. Therefore, these percentile curves are computed with spectra evaluated at redshifts $z \in [0, 5]$, i.e. spanning the whole range of validity of our emulators.

Here, we repeat their analysis using an inference pipeline entirely embedded within the cosmological sampler MontePython (Brinckmann & Lesgourgues 2019), adopting the original priors, except for those cosmological parameters ($\omega_{\rm cdm}$, $\omega_{\rm b}$, $n_{\rm s}$) whose prior range was restricted in the more recent KiDS-1000 analysis (Asgari et al. 2021; Heymans et al. 2021; Joachimi et al. 2021). As explained in Joachimi et al. (2021), the KiDS collaboration decided to restrict those prior ranges in their analyses to make sure that the parameters are more physically motivated; we adhere to this choice. For the cosmological parameters in Table 2 and the baryonic feedback parameter $c_{\rm min}$, the prior ranges coincide with the range of validity of our emulators (cf. Table 1). The halo bloating parameter $\eta_0$ is fixed to $\eta_0 = 0.98 - 0.12 \, c_{\rm min}$, as implemented in CLASS following Mead et al. (2015). The recent KiDS-1000 analyses sample the cosmological parameter $S_8 = \sigma_8 \sqrt{\Omega_{\rm m}/0.3}$, instead of $\ln 10^{10} A_{\rm s}$ used in van Uitert et al. (2018) and other past KiDS analyses (Hildebrandt et al. 2016; Köhlinger et al. 2017; Hildebrandt et al. 2020). Here, we stick to the choice of $\ln 10^{10} A_{\rm s}$ as in van Uitert et al. (2018), for a more direct comparison. In Appendix B, instead, we show results for KiDS-1000 with $S_8$ among the sampled parameters. In all cases, we show plots for all of $\ln 10^{10} A_{\rm s}$, $\sigma_8$ and $S_8$: the parameters that have not been used directly in sampling have been obtained as derived parameters with a post-processing GP, as explained in Sect. 2 and Appendix A3. Fig. 3 shows the comparison between contour plots obtained sourcing power spectra from CLASS and those obtained running CosmoPower to replace the Boltzmann code. In both analyses, we sample the posterior distribution with the MultiNest algorithm (Feroz et al. 2009), as implemented in the Python wrapper PyMultiNest (Buchner et al. 2014). We run the pipelines with parallelisation across 16 Intel Xeon E5640 cores. The posteriors with our emulator are obtained in under 3 minutes, compared with the $\sim 2.5$ hours required when using CLASS.
This produces a speed-up factor of approximately 50.
We also compare the values of the log-evidence $\log Z$ obtained with MultiNest in the two runs, and find $\log Z = -73.77 \pm 0.19$ and $\log Z = -73.79 \pm 0.19$ for the runs sourcing power spectra from CLASS and CosmoPower, respectively. The good agreement between these evidence values is another, arguably stronger, indicator of the accuracy of our emulators: an accurate estimate of this quantity requires accurate values of the posterior distribution across the whole prior range, while for parameter estimation it may be sufficient to obtain accurate estimates around the peak of the distribution.
We also applied CosmoPower to a more recent KiDS analysis, namely that considering cosmic shear band powers from 1000 deg$^2$ of the survey, as presented in Asgari et al. (2021). We use this analysis as another test case for the performance of our emulator, finding similar speed-up factors when replacing the Boltzmann code CLASS with CosmoPower. We refer the reader to Appendix B for details and plots for this analysis.

Validation on the Euclid-like cosmic shear likelihood
We re-use the same emulators employed in Sect. 3.3 to perform a full inference analysis of a simulated cosmic shear survey for a typical Stage IV Euclid-like configuration. The prior ranges are reported in Table 3. For the cosmological and baryonic feedback parameters they correspond to the validity ranges of our emulators, reported in Table 1. The fiducial values of the parameters used to calculate the mock data vector are also reported in Table 3. 10 tomographic bins are equipopulated with galaxies following the distribution (Smail et al. 1995; Joachimi & Bridle 2010; Laureijs et al. 2011)

$n(z) \propto \left(\frac{z}{z_0}\right)^2 \exp\left[-\left(\frac{z}{z_0}\right)^{3/2}\right],$

with $z_0 = 0.64$. For the covariance matrix, we use a simple analytical Gaussian computation following Tutusaus et al. (2020), with a sky coverage $f_{\rm sky} = 0.3$, a galaxy number density $\bar{n} = 30 \, {\rm arcmin}^{-2}$ and an ellipticity dispersion $\sigma_\epsilon = 0.3$. We implement the likelihood in the cosmological sampler Cobaya (Torrado & Lewis 2019, 2021) and use the PolyChord algorithm (Handley et al. 2015a,b) to sample the posterior distribution. We run the pipelines with parallelisation across 48 Intel Xeon E5640 cores and obtain a speed-up of approximately 50. Fig. 4 shows the excellent agreement between contours obtained with CAMB and CosmoPower. The log-evidence $\log Z = -45.92 \pm 0.33$ computed for the run with spectra sourced from CAMB also compares favourably with that of the emulator run, $\log Z = -45.99 \pm 0.34$.
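The equipopulated binning above can be reproduced with a short script: integrate $n(z)$ numerically, and place bin edges at equal fractions of its cumulative distribution (a sketch; the redshift grid extent and resolution are arbitrary choices):

```python
import numpy as np

z0 = 0.64
z = np.linspace(1e-4, 4.0, 4000)
n_z = (z / z0) ** 2 * np.exp(-((z / z0) ** 1.5))   # unnormalised n(z)

# cumulative distribution, normalised to 1
cdf = np.cumsum(n_z)
cdf /= cdf[-1]

# 10 equipopulated tomographic bins: edges at equal steps of the CDF
n_bins = 10
edges = np.interp(np.linspace(0.0, 1.0, n_bins + 1), cdf, z)
```

Each of the resulting 10 bins then carries (to within grid resolution) one tenth of the total galaxy number density, which is the 'equipopulated' condition used to build the tomographic kernels.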

Emulating CMB temperature, polarisation and lensing power spectra
The three main probes in the analysis of the CMB are the temperature power spectrum $C_\ell^{\rm TT}$, the polarisation power spectrum $C_\ell^{\rm EE}$, and the temperature-polarisation cross power spectrum $C_\ell^{\rm TE}$. We use CAMB to produce the training set, which consists of $\sim 5 \cdot 10^5$ power spectra for each probe, with their associated parameters. The parameters are sampled from the ranges indicated in Table 1 using Latin Hypercube Sampling. We verified that the specific method chosen for sampling parameter space is comparatively less important than the number of training points: we obtained essentially the same accuracy in our analysis using alternatives to Latin Hypercube Sampling, such as Orthogonal Sampling (Owen 1992) or even simple uniform sampling for each parameter. This is because differences between sampling schemes tend to become negligible as the number of parameters becomes large, as is the case here.
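A Latin hypercube over a parameter box can be drawn with a few lines of NumPy: each dimension is divided into $n$ equal strata and each stratum is hit exactly once. This is a minimal sketch; the parameter box below is illustrative (the actual ranges are those of Table 1):

```python
import numpy as np

def latin_hypercube(n_samples, bounds, rng):
    """Draw a Latin hypercube: each parameter range is split into n_samples
    equal strata, and each stratum is hit exactly once per dimension."""
    lower, upper = bounds
    d = len(lower)
    # one independent random permutation of the strata per dimension
    strata = np.array([rng.permutation(n_samples) for _ in range(d)]).T
    unit = (strata + rng.random((n_samples, d))) / n_samples   # unit hypercube
    return lower + unit * (upper - lower), unit

# illustrative box, e.g. (omega_b, omega_cdm, h, n_s, ln 10^10 A_s)
lower = np.array([0.019, 0.051, 0.64, 0.84, 1.61])
upper = np.array([0.026, 0.255, 0.82, 1.10, 3.91])
rng = np.random.default_rng(42)
params, unit_samples = latin_hypercube(1000, (lower, upper), rng)
```

The stratification guarantees that every one-dimensional marginal of the training set is uniformly covered, which is the property that matters for training emulators over broad prior ranges.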
To emulate the CMB spectra, we consider both methods described in Sect. 2. We use the direct NN mapping for $C_\ell^{\rm TT}$ and $C_\ell^{\rm EE}$, while we found that the $C_\ell^{\rm TE}$ emulation is better performed by the second method, i.e. PCA compression followed by a NN, due to its zero-crossing values. To show the flexibility of our approach, we also train a PCA plus NN emulator for the lensing potential power spectrum $C_\ell^{\phi\phi}$; note, however, that this probe is not included in the likelihood analysis performed in the next section.
The accuracy of the emulators over the $\ell$ range is measured with respect to the instrumental noise expected for the upcoming Simons Observatory (Ade et al. 2019), combined with cosmic variance. In particular, we calculate the emulation error as the absolute difference between emulated and true spectra in units of the combined cosmic-variance and instrumental-noise standard deviation $\sigma_\ell^{\rm CMB}$ (Eqs. 11-13). We validate our emulators using two sets containing $\sim 2 \cdot 10^4$ spectra each. The first one corresponds to cosmological parameters sampled from a restricted range, corresponding to $5\sigma$ intervals around the best-fit values from Planck (Aghanim et al. 2020); the results are reported in Fig. 5 for all probes. As one can see, the distribution of the quantiles is very tight, and almost always 99% of the spectra are within less than $0.1 \, \sigma_\ell^{\rm CMB}$ at all $\ell$ values.
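For a temperature-like spectrum, this accuracy metric can be sketched with a Knox-type cosmic-variance-plus-noise standard deviation (a sketch only: the $f_{\rm sky}$ value, noise level and toy spectra are placeholders, not the Simons Observatory specifications or Eqs. 11-13 verbatim):

```python
import numpy as np

def sigma_cmb(cl, nl, f_sky):
    """Per-multipole standard deviation from cosmic variance plus noise
    (Knox-type formula for an auto-spectrum)."""
    ell = np.arange(2, 2 + len(cl))
    return np.sqrt(2.0 / ((2 * ell + 1) * f_sky)) * (cl + nl)

def emulation_error(cl_emulated, cl_true, nl, f_sky=0.4):
    """Emulation residual in units of the expected spectrum uncertainty."""
    return np.abs(cl_emulated - cl_true) / sigma_cmb(cl_true, nl, f_sky)

# toy spectra: smooth 'truth', slightly perturbed 'emulation', white noise
ell = np.arange(2, 2502)
cl_true = 1.0 / ell**2
cl_emul = cl_true * (1.0 + 1e-4)          # 0.01% fractional emulation error
nl = np.full_like(cl_true, 1e-9)
err = emulation_error(cl_emul, cl_true, nl)
```

Expressing the residual in units of $\sigma_\ell^{\rm CMB}$ rather than as a fractional error is what makes the accuracy requirement survey-specific: the same network error can be negligible for one noise level and significant for another.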
The second set of spectra corresponds to the same range the emulators were trained on, i.e. the intervals in Table 1; the results are reported in Appendix C. Unsurprisingly, the errors are slightly larger, reaching $0.2 \, \sigma_\ell^{\rm CMB}$; however, in the next section we show that this level of accuracy in spectra emulation is sufficient to provide accurate and unbiased inference of cosmological parameters in a CMB analysis.

Figure 5. CMB power spectra emulation accuracy on the $5\sigma$ range test set for a) the temperature power spectrum, b) the polarisation power spectrum, c) the temperature-polarisation cross power spectrum, d) the lensing potential power spectrum. The emulation error is defined with respect to both instrumental and statistical noise, as in Eqs. (11)-(13). Dark red, red and salmon areas enclose the 68, 95 and 99 percentiles of the test set. Details of the neural models are reported in Appendix A1.

Validation on the Planck 2018 likelihood
After assessing the accuracy of our emulators by looking at the residuals of their predictions on the testing set, we performed a final validation check by using the emulators to accelerate parameter estimation in a Planck CMB inference pipeline (Aghanim et al. 2020). We considered the pure Python implementation of the Planck 2018 'lite' likelihood available from Prince & Dunkley (2019) at https://github.com/heatherprince/planck-lite-py (a similar pure Python implementation is available in the cosmological sampler Cobaya), which is pre-marginalised over a series of nuisance parameters. The only remaining calibration parameter is a multiplicative correction factor $A_{\rm Planck}$. The prior ranges for all of the parameters varied in the analysis are reported in Table 4.
To further showcase the strength of CosmoPower, to draw from the posterior distribution we use a parallelised affine-invariant MCMC sampler, i.e. a parallel, GPU-compatible TensorFlow implementation of a variant of the algorithm underlying the emcee sampler (Foreman-Mackey et al. 2013), based on Goodman & Weare (2010). This sampler fully exploits the neural emulators by allowing large batches of likelihood calls to be performed in parallel. Using a single GeForce GTX 1080 GPU, we can obtain the full contours in $\sim 10$ seconds, while the contours using CAMB take a total wall-clock time of $\sim 1.2 \cdot 10^5$ seconds using 80 CPU cores, for a final speed-up of $O(10^4)$. Moreover, we train a single GP, as described in Sect. 2, to derive constraints on $100\theta_{\rm s}$ as well, in the same fashion as the $\sigma_8$ constraints obtained in Sect. 3. We show the posterior contours in Fig. 6.

Figure 6. The red contours are obtained in $\sim 1.2 \cdot 10^5$ seconds on 80 CPU cores using CAMB, while the blue contours take roughly 10 seconds on a single GPU using our neural emulators. Note that the constraints on $100\theta_{\rm s}$ are derived from the rest of the samples using a Gaussian Process.
To obtain log-evidence estimates we re-run both likelihoods with MultiNest and find $\log Z = -313.72 \pm 0.15$ and $\log Z = -313.79 \pm 0.15$ for the runs with CAMB and CosmoPower, respectively.
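The batched stretch move at the heart of such a sampler can be sketched in a few lines of NumPy, following Goodman & Weare (2010). This is an illustration on a toy Gaussian target, not the TensorFlow implementation used in the paper; the point is that every likelihood call below is batched over all walkers at once, which is what a neural emulator accelerates so effectively:

```python
import numpy as np

def stretch_update(active, complementary, log_prob_fn, rng, a=2.0):
    """Vectorised Goodman & Weare stretch move: each active walker proposes a
    stretch towards a random complementary walker, accepted with probability
    min(1, z^(d-1) p(Y)/p(X))."""
    n, ndim = active.shape
    z = ((a - 1.0) * rng.random(n) + 1.0) ** 2 / a          # stretch factors g(z) ~ 1/sqrt(z)
    partners = complementary[rng.integers(len(complementary), size=n)]
    proposal = partners + z[:, None] * (active - partners)
    log_accept = (ndim - 1) * np.log(z) + log_prob_fn(proposal) - log_prob_fn(active)
    accept = np.log(rng.random(n)) < log_accept
    return np.where(accept[:, None], proposal, active)

# toy target: standard normal in 3D; likelihood calls are batched over walkers
log_prob = lambda x: -0.5 * np.sum(x**2, axis=1)
rng = np.random.default_rng(0)
walkers = rng.normal(size=(64, 3))
first, second = walkers[:32], walkers[32:]
for _ in range(500):
    first = stretch_update(first, second, log_prob, rng)    # update one half...
    second = stretch_update(second, first, log_prob, rng)   # ...against the other
samples = np.vstack([first, second])
```

The two-half update keeps the ensemble's detailed balance while allowing each half to be proposed and evaluated as a single batch, so on a GPU the wall-clock cost per iteration is nearly independent of the number of walkers.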

CONCLUSIONS
We presented CosmoPower, a suite of cosmological power spectrum emulators developed to accelerate by orders of magnitude parameter estimation from Large-Scale Structure (LSS) and Cosmic Microwave Background (CMB) surveys. CosmoPower emulates matter and CMB power spectra computed by Boltzmann codes such as CAMB and CLASS. Sourcing power spectra from Boltzmann codes is the computational bottleneck for two-point statistics analyses of cosmological fields; CosmoPower bypasses this step, providing orders-of-magnitude acceleration to the inference pipeline. Power spectra emulation is performed using a neural network (NN) to parameterise the mapping between cosmological parameters and power spectra or their Principal Component Analysis (PCA) coefficients.
In this paper, we presented emulators for the linear and nonlinear matter power spectrum, as well as for the CMB temperature, polarisation and lensing power spectra. We showcased the performance of CosmoPower with applications to Stage III and simulated Stage IV surveys, including: a 3x2pt and cosmic shear analysis of the Kilo-Degree Survey (KiDS), a mock Euclid-like cosmic shear analysis and a Planck 2018 joint temperature and polarisation analysis. In all of these cases, the power spectra emulators provided unbiased cosmological inference, at a fraction of the time required by the same pipelines run with power spectra sourced from Boltzmann solvers. In the following, we summarise the main properties of CosmoPower, compare its performance with that of other emulators, and discuss some of its planned future extensions.

Key properties of CosmoPower
The following key properties of CosmoPower make it an invaluable tool for application to future Stage IV analyses.
• Speed-up. First and foremost, the use of CosmoPower to replace Boltzmann codes in likelihood evaluations provides an impressive speed-up factor. In the applications considered in this paper, CosmoPower provided an acceleration factor of up to O(10^4) with respect to standard analyses with Boltzmann codes. These numbers refer to the full inference pipeline; if we restrict to timing a single power spectrum evaluation, the speed-up increases even further, up to O(10^5). These numbers are expected to increase as we extend CosmoPower to cosmologies beyond the flat ΛCDM model assumed throughout this paper. In this sense, the acceleration quoted in this analysis is to be regarded as a lower bound for the speed-up achievable with CosmoPower.
• GPU acceleration. Our emulators are based on neural networks implemented in TensorFlow. As such, they benefit from an additional speed-up when run on Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs).
The speed-up calculated for the full inference pipeline differs from the one computed for a single power spectrum evaluation because a single likelihood evaluation is slower than a power spectrum evaluation: computing the projected angular spectra of the LSS probes, or binning the CMB spectra as performed in the Planck likelihood, requires a series of numerical operations. These parts of the likelihood evaluation are computationally intensive regardless of the method employed to source power spectra. This is particularly true for Stage IV LSS surveys, which will have a high number of redshift bins and hence will require the computation of a high number of bin cross-correlations (cf. Eq. 2). These expensive loops in the likelihood evaluation can be massively accelerated by implementing the likelihood in TensorFlow or JAX (Frostig et al. 2018) and running the inference pipeline on GPUs.
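As an illustration of why a single likelihood call costs more than a single power spectrum call, the sketch below implements a toy Limber projection of an angular cross-spectrum, C_ℓ^{ij} ≈ ∫ dχ W_i(χ) W_j(χ) χ^{-2} P((ℓ+1/2)/χ); in a Stage IV analysis this integral is repeated for every pair of redshift bins and every multipole. The Gaussian kernels and the power-law P(k) are hypothetical stand-ins, not the survey kernels or the emulator output.

```python
import math

def limber_cl(ell, kernel_i, kernel_j, pk, chi_min=50.0, chi_max=3000.0, n=200):
    """Toy Limber projection C_ell = ∫ dχ W_i(χ) W_j(χ) χ^-2 P((ℓ + 1/2)/χ),
    integrated with a simple trapezoidal rule."""
    dchi = (chi_max - chi_min) / n
    total = 0.0
    for step in range(n + 1):
        chi = chi_min + step * dchi
        weight = 0.5 if step in (0, n) else 1.0
        k = (ell + 0.5) / chi  # Limber wavenumber at this comoving distance
        total += weight * kernel_i(chi) * kernel_j(chi) / chi ** 2 * pk(k)
    return total * dchi

def kern1(chi):
    # Hypothetical Gaussian lensing-like kernel peaking at χ = 1000 Mpc.
    return math.exp(-0.5 * ((chi - 1000.0) / 300.0) ** 2)

def kern2(chi):
    # Second hypothetical kernel, peaking at χ = 1500 Mpc.
    return math.exp(-0.5 * ((chi - 1500.0) / 300.0) ** 2)

def pk(k):
    # Power-law stand-in for the emulated matter power spectrum.
    return k ** -1.5

c11 = limber_cl(100, kern1, kern1, pk)
c12 = limber_cl(100, kern1, kern2, pk)
c21 = limber_cl(100, kern2, kern1, pk)
```

With N redshift bins there are N(N+1)/2 such spectra per multipole, which is precisely the loop that benefits from a vectorised TensorFlow or JAX implementation on a GPU.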
In this paper we showed how running CosmoPower in a pure TensorFlow-based version of the Planck likelihood, embedded within a framework for cosmological inference also implemented in TensorFlow, provided contours in ∼ 10 seconds. Running an inference pipeline with the Planck likelihood is a notoriously computationally intensive task: this example of the speed-up achieved with a TensorFlow-based likelihood and power spectra clearly shows how beneficial the combination of highly optimised software for power spectra emulation and Bayesian posterior sampling can be. We advocate moving towards implementations of cosmological inference software in TensorFlow or JAX, to fully exploit the power of highly optimised software that can be run on GPUs and is auto-differentiable. The reason for this is the high number of nuisance parameters expected to be required to model the observables, in addition to the large size of the data vector. In particular, if one wishes to make use of the unprecedented amount of data provided by these surveys to investigate beyond-ΛCDM cosmologies, the implementation of cosmological inference frameworks leveraging GPU acceleration is of the utmost importance. (GPUs and TPUs are both freely available on Google Colab, http://colab.research.google.com/; in the CosmoPower GitHub repository we provide Jupyter notebook examples to run CosmoPower on a Colab GPU.) CosmoPower aims to provide such a framework, incorporating not only trained emulators but also template TensorFlow-based likelihoods for LSS and CMB surveys that can be easily adapted for application to different surveys.
• Full differentiability. CosmoPower provides emulators for the power spectra that are based on neural networks and implemented in TensorFlow (Abadi et al. 2015). Thus, these emulators are by construction fully differentiable, a feature which makes them ideal for gradient-based inference, such as Hamiltonian Monte Carlo (Hajian 2007). If desired, they can also be used for instantaneous Fisher matrix computation and linear data compression with e.g. the MOPED algorithm (Heavens et al. 2000), leveraging the possibility of calculating derivatives with respect to the input parameters by autodifferentiation.
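As a concrete example of the compression just mentioned, the following sketch implements the core of MOPED for a single parameter and a diagonal data covariance: the derivative of the model vector (which an autodifferentiable emulator provides cheaply) is turned into one compression vector, and the Fisher information of the single compressed number matches that of the full data vector. The numbers are made up for illustration; this is not the CosmoPower implementation.

```python
import math

def moped_vector(dmu, var):
    """MOPED weight b = C⁻¹ μ' / sqrt(μ'ᵀ C⁻¹ μ') for a diagonal covariance.

    dmu: derivative of the model vector w.r.t. the parameter;
    var: per-element data variances (diagonal of C).
    """
    cinv_dmu = [d / v for d, v in zip(dmu, var)]
    norm = math.sqrt(sum(d * cd for d, cd in zip(dmu, cinv_dmu)))
    return [cd / norm for cd in cinv_dmu]

# Toy model-vector derivative and diagonal covariance (hypothetical values).
dmu = [1.0, 2.0, 0.5, -1.0]
var = [0.1, 0.2, 0.4, 0.3]

b = moped_vector(dmu, var)
# Fisher information carried by the single compressed statistic b·x ...
f_compressed = (sum(bi * di for bi, di in zip(b, dmu)) ** 2
                / sum(bi * bi * vi for bi, vi in zip(b, var)))
# ... equals the full-data Fisher information μ'ᵀ C⁻¹ μ': compression is lossless
# for this parameter, by construction of b.
f_full = sum(d * d / v for d, v in zip(dmu, var))
```

For several parameters MOPED builds one such vector per parameter, orthogonalised in turn, so the data vector is compressed to as many numbers as there are parameters.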
• Accuracy. The procedure followed to validate the accuracy of our emulators guarantees that they can be safely used for analyses of Stage IV surveys. Crucially, we verified this statement not only by checking the residuals between emulated and real power spectra in the testing set, but also by validating our emulators with full posterior inference analyses from state-of-the-art surveys, as well as from simulated Stage IV surveys. In carrying out this comparison at the contour level, we performed an additional validity check between CosmoPower and the Boltzmann codes CAMB and CLASS. To train our models we used power spectra generated with CAMB; instead, when comparing the CosmoPower contours against those obtained from a traditional inference pipeline sourcing power spectra from a Boltzmann code, we used CLASS for the latter. As shown in Figs. 3, 4 and 6, contours obtained with the emulators (trained on CAMB) were always found in excellent agreement with the contours obtained from the inference pipelines sourcing power spectra from CLASS, including in the simulated Stage IV Euclid-like configuration. We also verified that replacing CLASS with CAMB in the inference pipelines provides contours with a similar level of agreement: this means that the differences between CosmoPower predictions and Boltzmann-computed power spectra are no larger than the differences between power spectra computed with different Boltzmann codes.
• Parameter range. The parameter range over which our models are trained is very large, covering the full Planck prior range in the CMB case and the full KiDS-1000 prior range in the LSS case. The combination of high accuracy and wide validity range allows the user of CosmoPower to safely replace Boltzmann codes with our emulators when computing power spectra, even for those practical applications where high accuracy over broad prior ranges is crucial, such as posterior predictive cross-validation. The accuracy of our emulators even in extreme regions of the parameter space considered for their training is confirmed by the good agreement between log-evidence values obtained in the likelihood runs with power spectra sourced from CLASS and CosmoPower.
• Flexibility. By construction, CosmoPower emulates cosmological power spectra taking as input only those cosmological parameters that are part of the mapping between input cosmologies and output power spectra. This means that, for example, in the LSS case, the key emulated quantity is the matter power spectrum and not the cosmic shear, galaxy-galaxy lensing or galaxy clustering projected power spectra. The rationale behind this choice is that the angular spectra of cosmological probes are quantities derived from the matter power spectrum, by integrating it over a kernel which depends on the redshift distributions. In addition, contaminant terms due to e.g. intrinsic alignments are also obtained by integration of the matter power spectrum, and are modulated by nuisance parameters which are not part of the cosmological model. By not including those additional parameters in the target mapping for the emulator, CosmoPower acquires a unique flexibility that makes it completely independent of astrophysical nuisance parameters, such as intrinsic alignment and galaxy bias parameters, that do not modify the matter power spectrum prediction. This means that our emulators do not need to be re-trained for different choices of these astrophysical parameters.
In particular, no re-training is required if one wishes to implement different prescriptions for the modelling of contaminants such as intrinsic alignments, e.g. by inserting additional nuisance parameters, as long as those parameters do not modify the prediction for the matter power spectrum. A similar argument is applicable to the modelling of redshift distributions, which will likely require even more nuisance parameters than the mean shifts used in our simulated Euclid-like analysis (see e.g. Hasan et al. 2021). As an additional bonus, emulating the 3D matter power spectrum will allow us to investigate in future work the use of CosmoPower for emulating cosmological power spectra beyond the Limber approximation (see below).
• Linear and nonlinear power spectra. CosmoPower provides emulators for both the linear power spectrum and the nonlinear correction factor. For the latter, the HMcode prescription is currently implemented in the emulator. The HALOFIT model (which HMcode is based on) is also available to the user. The separation between linear power spectrum and nonlinear correction factor is particularly useful as it allows us to integrate additional models for nonlinearities into CosmoPower as they become available. On the linear level, modified Boltzmann codes for beyond-ΛCDM models exist (for example hi_class for Horndeski models; Zumalacárregui et al. 2017) that provide linear predictions for the matter power spectrum in these extended cosmologies. As we extend CosmoPower to these models, we can add new emulators for linear power spectra trained on these modified Boltzmann codes.
• "Train-once-use-repeatedly" approach and interface with cosmological samplers. While we provide all the tools necessary to repeat the training if desired, we stress that this operation has already been performed and does not need to be repeated, as long as the emulators are employed assuming the cosmological model and range indicated in Table 1. In addition, CosmoPower can be called from all commonly used cosmological samplers. In this paper, for example, we used CosmoPower within the cosmological samplers MontePython and CosmoSIS. The user of CosmoPower simply needs to write a likelihood for the LSS or CMB survey considered, and replace the call to the Boltzmann code, necessary to obtain the matter or CMB power spectra, with a call to CosmoPower.
• Derived parameters. Emulators developed in the literature usually provide a fixed parameterisation to emulate from. For example, if an emulator is trained using ln(10^10 A_s) it is not possible to get a prediction for a corresponding value of σ8 or S8. CosmoPower provides emulators trained on different combinations of parameters. For example, the 3x2pt KiDS-450+GAMA analysis was performed with an emulator trained on ln(10^10 A_s) as input parameter, while the KiDS-1000 analysis used S8 as input. In addition, CosmoPower also allows the user to post-process a sampled chain to obtain, very efficiently, derived parameters that were not originally sampled. Moreover, we provide Gaussian Processes (GPs) to obtain derived parameters that were not used as input to the emulators. The accuracy of these GPs was tested not only on a test set, but also against the actual contours obtained with the Boltzmann codes.
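The "replace the Boltzmann call" pattern described above can be sketched as follows. The function and parameter names here are hypothetical and do not reproduce the actual CosmoPower API; the point is that a likelihood written against an injected spectrum provider needs no other change when the emulator is swapped in for the Boltzmann code.

```python
def boltzmann_power_spectrum(params):
    """Stand-in for a slow Boltzmann-code call (hypothetical toy spectrum)."""
    amplitude, tilt = params["A"], params["n"]
    return [amplitude * k ** tilt for k in (0.01, 0.1, 1.0)]

def emulator_power_spectrum(params):
    """Stand-in for the trained emulator: same signature, same output format.

    A real emulator would return its NN prediction here; for this sketch we
    simply delegate, so both providers agree exactly.
    """
    return boltzmann_power_spectrum(params)

def log_likelihood(params, data, spectrum_fn):
    """Gaussian likelihood written against an injected spectrum provider, so
    swapping the Boltzmann code for the emulator is a one-argument change."""
    theory = spectrum_fn(params)
    return -0.5 * sum((d - t) ** 2 for d, t in zip(data, theory))

data = [2.0e-3, 1.0e-2, 5.0e-2]
params = {"A": 0.05, "n": 0.5}
ll_boltzmann = log_likelihood(params, data, boltzmann_power_spectrum)
ll_emulator = log_likelihood(params, data, emulator_power_spectrum)
```

Because the two providers share an interface, the same likelihood code runs unchanged in either configuration; only the speed differs.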

Comparison with previous work
Here we compare our emulators to other existing approaches to power spectra emulation. We start by noticing that CosmoPower provides an emulation framework for both LSS and CMB. To our knowledge, this is a unique feature, only partially shared by PICO and CosmoNet (both, however, emulating matter transfer functions rather than power spectra). These two packages are not actively maintained nor trained with the same accuracy or across the same parameter ranges, which limits their applicability to Stage IV analyses. As far as the matter power spectrum is concerned, the methods closest to ours in terms of emulation are the one implemented in Aricò et al. (2021), even though limited to the linear power spectrum, and Agarwal et al. (2012), limited to HALOFIT nonlinearities. We note that applying the emulator to a complete inference analysis from a simulated Stage IV survey, as done in our paper, is a necessary step to ensure that the newly developed tool can be safely applied in practical analyses. On the contrary, checking residuals in the testing set between predicted and real spectra is not a sufficient accuracy test. While an emulator may be performing with e.g. subpercent accuracy at the level of residuals, this may still not be enough to retrieve unbiased cosmological contours, as we verified first-hand while testing CosmoPower. This is due to the fact that the accuracy threshold for the emulation can only be defined by the specific application for which these emulators are designed. In other words, it is the inference pipeline that dictates the accuracy threshold to be met by the emulator. In general, parameter estimation in Bayesian inference pipelines requires a certain level of accuracy in the computed observables, which in the specific case of cosmological two-point statistics analyses translates into accuracy requirements on the power spectra computed by Boltzmann codes.
Hence, we argue that the principled approach to validating an emulator's accuracy is to compare its performance within an inference pipeline for a target experiment, which in our case is a Stage IV survey configuration. Note that, while testing CosmoPower, we experienced first-hand that emulators performing very well on Stage III experiments failed to produce equally correct contours on a simulated Stage IV survey.
GP-based emulators such as the ones used in Mootoovaloo et al. (2020, 2022); Ramachandra et al. (2021); Ho et al. (2021) require fewer training samples than a NN emulation framework like CosmoPower; however, GPs also provide reduced speed-ups compared to NNs. On the other hand, GPs also provide a way to propagate the uncertainty in their prediction to the final posterior distribution, whereas simple NNs like those implemented in this version of CosmoPower lack this feature. In future versions of CosmoPower we will investigate the use of Bayesian Neural Networks or ensemble NN predictions for this purpose. Mootoovaloo et al. (2020) developed GP emulators of cosmic shear band powers for the KiDS-450 survey, with the option of compressing the band powers into MOPED coefficients and learning those coefficients with the GP instead of the band powers themselves. Similarly, Manrique-Yus & Sellentin (2019) developed NN emulators of the 3x2pt angular power spectra. Both of these approaches are constrained by the choice of the redshift distributions specific to the survey, which enters the expression of the angular power spectra. In addition, the method of Mootoovaloo et al. (2020) also relies heavily on the choice of nuisance parameters used to model the power spectra; these parameters need to be 'learnt' by their GP. CosmoPower is free from any of these restrictions: targeting emulation of the matter power spectra, our emulators are completely flexible to be used for any redshift distribution and choice of nuisance parameters. While the speed-up obtained by emulating the matter power spectrum may be smaller than that obtained from emulating the angular power spectra of the different probes, we believe this computational overhead can be avoided by rewriting the likelihood of interest in TensorFlow and running it on a GPU together with the emulators.
Finally, Albers et al. (2019) implemented an interesting NN-based acceleration of source functions in the calculation of CMB power spectra within the Boltzmann code CLASS. Emulating source functions provides great flexibility in the pipeline, as for example it allows one to compute higher-order correlators and non-linear transfer functions without retraining. On the other hand, emulation of full power spectra as performed in CosmoPower provides greater speed-ups and is better suited for implementation of full inference pipelines on GPUs.

CosmoPower is an open-source package provided to the cosmological community as a tool to accelerate intensive computations within Bayesian inference pipelines of LSS and CMB surveys. In this paper we considered the emulation of power spectra, which represents the bottleneck for two-point statistics analyses of cosmological fields. However, this paper also marks the starting point of a longer-term project, with the goal of extending the CosmoPower framework to accelerate the forward-modelling of multiple cosmological observables with machine learning.
• Higher-order statistics, systematics and beyond-Limber spectra. We plan to train emulators for higher-order statistics such as the bispectrum. Not only does the bispectrum contain complementary cosmological information to the power spectrum (see e.g. Pyne & Joachimi 2021), but it is also required to calculate (computationally expensive) corrections to the power spectrum, such as those arising from dropping the reduced shear approximation (Deshpande et al. 2020). In addition, multiple observational systematics in cosmic shear analyses (see e.g. Euclid Collaboration et al. 2020) can be modelled with machine learning techniques and their effect on cosmological parameter estimation can thus be properly accounted for. Finally, for LSS a key advantage of our emulator is that it targets the matter power spectrum, as opposed to the angular power spectra of the cosmological probes. This choice will allow us to investigate efficient computation of non-Limber projected quantities in future work.
• Beyond-ΛCDM cosmologies. As already mentioned above, we plan to extend CosmoPower to models beyond the flat ΛCDM one considered in this paper. For example, emulation of power spectra in non-flat cosmologies is of the utmost importance, since their calculation by Boltzmann codes is computationally intensive (see e.g. Handley 2021). More generally, power spectra computed in alternative cosmologies are considerably more demanding for Boltzmann codes than in ΛCDM. Instead, we expect NN architectures similar to the ones considered in this paper to be equally accurate for emulating beyond-ΛCDM spectra (while possibly requiring larger training sets). Consequently, the evaluation time of these beyond-ΛCDM emulators will remain essentially the same as those reported in this paper. This will produce an even greater speedup factor over Boltzmann codes. Some of these extensions to ΛCDM are already being developed and will be presented in a future publication (Spurio Mancini et al., in prep.).
• A fully differentiable cosmology library. In the longer term, CosmoPower will be extended to provide a completely differentiable library for cosmological computations. This will also include much simpler functions to emulate, such as cosmological distances. As we move towards the era of Stage IV surveys, the statistical challenges in analysing those datasets will certainly require increased sophistication in the Bayesian inference engines available. Differentiability is key in unlocking the possibility of efficient gradient-based inference, a promising avenue to tackle the challenge represented by the high-dimensional parameter spaces characterising the analyses of Stage IV surveys. Therefore, endowing the cosmological community with a fully differentiable forward model of multiple observables is a task of paramount importance, which we aim to accomplish with CosmoPower.
• Interpretable machine learning. Alternative methods for compression and emulation of cosmological quantities, such as autoencoders and symbolic regression (Udrescu et al. 2020), will be explored within CosmoPower, with the goal of maximising the computational efficiency and interpretability of the machine learning framework developed.
CosmoPower is a tool that allows for principled, non-invasive application of machine learning within a rigorous Bayesian framework for uncertainty quantification. We are confident that emulation techniques like those presented here will greatly enhance the scientific return from Bayesian inference analyses of upcoming Stage IV surveys.

A1 Neural network
A neural network (NN) consists of a sequence of layers connecting some input to some output, in this case cosmological parameters to power spectra (or their Principal Components). As depicted in Fig. 1, each layer performs a linear combination of the previous layer and applies a nonlinear function to increase the expressiveness of the model. Following Alsing et al. (2020), we choose the following activation function for all hidden layers in our neural networks:

a(x) = [γ + (1 + e^(−β⊙x))^(−1) (1 − γ)] ⊙ x,    (A1)

where β and γ are optimised together with the rest of the network parameters, and ⊙ indicates the element-wise product. The activation function in Eq. A1 can be seen as a stack of scalar activation functions, one for each node j, with independent hyperparameters β_j and γ_j. We experimented with other, more traditional activation functions, such as the hyperbolic tangent (tanh) or the Rectified Linear Unit (ReLU) (Agarap 2018), and we always found them to perform slightly worse than our custom function in terms of accuracy.
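For concreteness, a minimal pure-Python version of the activation in Eq. A1 (with β and γ treated as fixed inputs rather than trained parameters) behaves as follows; note that γ = 1 reduces each node to the identity, while γ = 0 with large β approaches a ReLU-like response.

```python
import math

def custom_activation(x, beta, gamma):
    """Element-wise activation of Eq. A1 (Alsing et al. 2020):
    a(x) = [γ + (1 − γ) · sigmoid(β ⊙ x)] ⊙ x, with per-node β_j and γ_j."""
    out = []
    for xj, bj, gj in zip(x, beta, gamma):
        sig = 1.0 / (1.0 + math.exp(-bj * xj))
        out.append((gj + (1.0 - gj) * sig) * xj)
    return out

x = [-2.0, 0.0, 3.0]
# γ = 1 everywhere: each node passes its input through unchanged.
identity = custom_activation(x, beta=[1.0, 1.0, 1.0], gamma=[1.0, 1.0, 1.0])
# At x = 0 the output is always 0, regardless of β and γ.
zero_in = custom_activation([0.0], beta=[5.0], gamma=[0.2])
```

In the trained network β and γ are TensorFlow variables updated by the optimiser alongside the weights and biases.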
In the CMB case the output layer is made of 2507 nodes, i.e. one for each multipole ℓ from ℓ_min = 2 to ℓ_max = 2508 included. When building the training and testing sets, we make sure to ask the Boltzmann code for an explicit calculation of the CMB power spectrum at each one of the multipoles in the range, as this reduces the amount of interpolation performed internally by the Boltzmann code and, in turn, increases the accuracy of the CosmoPower emulation. For the matter power spectrum the output layer is made of 420 nodes, corresponding to the sampled k-modes: these are chosen such that the baryon acoustic oscillation features are more densely sampled, in an analogous fashion to what is performed internally by CAMB and CLASS, albeit in this case with a cosmology-independent sampling. Our neural networks always use 4 hidden layers of 512 neurons each, for both CMB and LSS; each neuron is associated with a weight and a bias, as described in Sect. 2 and Fig. 1. We did not perform a fully exhaustive optimisation of the hyperparameters of the neural networks, such as the number of hidden layers or neurons. In our experiments we typically found that an architecture with at least three hidden layers of a few hundred neurons each was needed to achieve highly accurate results, with a four-layer configuration performing even better. It is possible that equally, if not more, accurate results may be achieved with a smaller architecture, albeit with more training samples. However, handling such large datasets may be a computationally non-trivial challenge, particularly in terms of memory requirements.
The network's parameters are optimised using Adam (Kingma & Ba 2014) with default parameters, and the loss function to minimise is chosen to be the mean squared error between the emulated and the true power spectra. We set aside 20% of the training set for validation purposes. The learning rate is initially set to 1 · 10^−2, and then decreased by a factor of 10 each time the validation loss does not decrease for 20 epochs, where each epoch corresponds to feeding the whole training set into the network; the final learning rate is 1 · 10^−6. The batch size is changed accordingly, starting from 1 · 10^3, then 1 · 10^4, up to 5 · 10^4.
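The plateau-based learning-rate schedule just described can be sketched as follows (pure Python; the accompanying batch-size switching is omitted, and the loss sequence in the example is synthetic).

```python
def lr_schedule(val_losses, lr0=1e-2, factor=10.0, patience=20, lr_min=1e-6):
    """Record the learning rate used at each epoch: divide it by `factor`
    whenever the validation loss has not improved for `patience` epochs,
    and stop once it would fall below `lr_min`."""
    lr = lr0
    best = float("inf")
    stale = 0
    history = []
    for loss in val_losses:
        history.append(lr)
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= patience:
                lr /= factor
                stale = 0
                if lr < lr_min:
                    break  # training would stop here
    return history

# One improving epoch followed by 40 stagnant epochs: the plateau rule fires
# at epoch 20 (drop to 1e-3) and again at epoch 40 (drop to 1e-4).
losses = [1.0] + [2.0] * 40
lrs = lr_schedule(losses)
```

In practice this is the behaviour of a standard "reduce on plateau" callback; the sketch makes the bookkeeping explicit.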

A2 Principal Component Analysis
We compared the performance of the direct NN mapping with an alternative emulation method in which the spectra are first compressed into their Principal Components, following Speculator (Alsing et al. 2020), to which we refer for full details. Note that this is necessary for the C_ℓ^TE case, as these spectra change sign across multipoles and their dynamic range therefore prevents working with their logarithm; moreover, we found the performance of this second emulator superior to the direct NN mapping when emulating C_ℓ^φφ. We keep 512 and 64 Principal Components for C_ℓ^TE and C_ℓ^φφ, respectively; the neural network is then trained in the same way as described in Appendix A1. To obtain a power spectrum, we map the cosmological parameters to the predicted Principal Components, and then use the learnt change of basis to map the Principal Components into the predicted power spectrum.
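The compression-and-reconstruction step can be illustrated with a small pure-Python sketch: toy "spectra" living in a low-dimensional subspace are compressed to their leading Principal Components (found here by power iteration with deflation) and mapped back through the change of basis, reconstructing the input to numerical precision. Only the PCA step is shown; in the real pipeline the NN learns the mapping from parameters to the PCA coefficients.

```python
import random

def top_eigvec(cov, n_iter=500, seed=1):
    """Leading eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(n_iter):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

def pca_basis(data, n_components):
    """First principal directions of mean-centred data, with deflation."""
    dim = len(data[0])
    mean = [sum(row[i] for row in data) / len(data) for i in range(dim)]
    centred = [[row[i] - mean[i] for i in range(dim)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / len(data) for j in range(dim)]
           for i in range(dim)]
    basis = []
    for _ in range(n_components):
        v = top_eigvec(cov)
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(dim))
                  for i in range(dim))
        # Deflate: remove the recovered component from the covariance.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
        basis.append(v)
    return mean, basis

def project_reconstruct(x, mean, basis):
    """Compress x to its PCA coefficients, then map back to the data space."""
    centred = [xi - mi for xi, mi in zip(x, mean)]
    coeffs = [sum(v[i] * centred[i] for i in range(len(x))) for v in basis]
    recon = list(mean)
    for c, v in zip(coeffs, basis):
        for i in range(len(recon)):
            recon[i] += c * v[i]
    return recon

# Toy 'spectra' in R^4 that lie exactly in a 2D subspace: 2 components suffice.
rng = random.Random(0)
b1, b2 = [1.0, 0.5, 0.0, -0.5], [0.0, 1.0, 1.0, 1.0]
data = []
for _ in range(50):
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    data.append([a * u + b * v for u, v in zip(b1, b2)])

mean, basis = pca_basis(data, 2)
recon = project_reconstruct(data[0], mean, basis)
err = max(abs(r - d) for r, d in zip(recon, data[0]))
```

For real spectra the truncation (512 or 64 components) is lossy, and the retained number of components is chosen so that the residual is negligible against the target accuracy.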

A3 Gaussian Process
We use a Gaussian Process (GP) to obtain derived cosmological parameter constraints from given samples of the posterior distribution of the other parameters. When obtaining a derived parameter θ_der, we assume that there exists a mapping θ_der = f(θ), where θ indicates the parameters the emulator was trained on, and we model f(θ) as a GP. Following Spurio Mancini et al., we assume that the function f(θ) follows a normal distribution with zero mean and a parametric covariance matrix, usually referred to as the kernel k(θ, θ′; λ), with θ and θ′ indicating two points in parameter space, and λ representing the trainable hyperparameters.
We choose the Automatic Relevance Determination (ARD) Matérn-3/2 kernel (Neal 1996; Rasmussen & Williams 2005), which is expressed as

k_ARD-Matérn-3/2(θ, θ′; λ) = σ_f² (1 + √3 d) exp(−√3 d),  with  d² = Σ_{i=1}^{n} (θ_i − θ′_i)² / λ_i²,

where n is the length of the vector θ (in this case the number of cosmological parameters), and the hyperparameters λ = {σ_f, λ_1, …, λ_n} are the signal standard deviation and a characteristic length scale for each input parameter i = 1, …, n. We train the GP, i.e. optimise the hyperparameters of the kernel, using the tuples (θ, θ_der) obtained from the Boltzmann solver CAMB as training data.
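The kernel above is straightforward to implement. The sketch below (pure Python, with made-up hyperparameter values) exhibits its basic properties: k(θ, θ) = σ_f², symmetry in its two arguments, and decay with distance.

```python
import math

def ard_matern32(theta1, theta2, sigma_f, lengths):
    """ARD Matérn-3/2 kernel: k = σ_f² (1 + √3 d) exp(−√3 d), with
    d² = Σ_i (θ_i − θ'_i)² / λ_i² (one length scale λ_i per input parameter)."""
    d = math.sqrt(sum((a - b) ** 2 / l ** 2
                      for a, b, l in zip(theta1, theta2, lengths)))
    return sigma_f ** 2 * (1.0 + math.sqrt(3.0) * d) * math.exp(-math.sqrt(3.0) * d)

# Hypothetical hyperparameters and two points in a 3D parameter space.
sigma_f, lengths = 2.0, [0.5, 1.0, 2.0]
p, q = [0.1, 0.2, 0.3], [0.4, 0.0, 1.3]
k_pp = ard_matern32(p, p, sigma_f, lengths)
k_pq = ard_matern32(p, q, sigma_f, lengths)
k_qp = ard_matern32(q, p, sigma_f, lengths)
```

The per-parameter length scales λ_i are what make the kernel "ARD": during training, inputs that barely influence θ_der acquire large λ_i and are effectively switched off.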

APPENDIX B: VALIDATION ON THE KIDS-1000 LIKELIHOOD
We include an application of CosmoPower to a recent cosmic shear band power analysis of the KiDS survey, covering ∼1000 deg² (KiDS-1000, Asgari et al. 2021). For this analysis we need to train a matter power spectrum emulator for a model with one massive neutrino with mass m_ν = 0.06 eV. We report in Fig. B1 the same accuracy plots shown in Fig. 2 for a massless-neutrino model; similar levels of accuracy are achieved for the linear power spectrum and the nonlinear correction. In future extensions of CosmoPower we will integrate emulation over the neutrino mass as an additional parameter. Note also that in this analysis, following Asgari et al. (2021), we sample in S8 rather than in ln(10^10 A_s). The uniform prior ranges assumed for this analysis are the same used in Asgari et al. (2021) and are reported in Table B1. Following Asgari et al. (2021), we include one redshift shift parameter δz_i for each of the five redshift bins, i = 1, …, 5; for these shift parameters we set a correlated Gaussian prior. We refer to Asgari et al. (2021) and Joachimi et al. (2021) for further details, in particular on the redshift distributions and covariance matrix modelling. We run the inference pipeline (adapted from https://github.com/BStoelzner/KiDS-1000_MontePython_likelihood) within the cosmological sampler MontePython. In Fig. B2 we show the excellent agreement between contours obtained with CLASS and CosmoPower, with the latter providing a speed-up factor of approximately 50. Values of the log-evidence are also in good agreement: log Z = −81.45 ± 0.14 and log Z = −81.40 ± 0.14 for the runs with CLASS and CosmoPower, respectively.

APPENDIX C: RESULTS FOR CMB POWER SPECTRUM EMULATOR ON LARGER RANGE
In Fig. C1 we report the emulation accuracy of our CMB emulators applied to the test set sampled with cosmological parameters in the range of Table 1. The emulation error reaches at most 0.2 of the CMB noise level defined in Eqs. (11)-(13), which we verified corresponds to an adequate accuracy for Stage IV surveys, as shown in the main body.

This paper has been typeset from a TeX/LaTeX file prepared by the author.

Figure C1. CMB power spectra emulation accuracy on the test set sampled from the range indicated in Table 1 for a) the temperature power spectrum, b) the polarisation power spectrum, c) the temperature-polarisation cross power spectrum, d) the lensing potential power spectrum. The emulation error is defined with respect to both instrumental and statistical noise, as in Eqs. (11)-(13). Dark red, red and salmon areas enclose the 68, 95 and 99 percentiles of the test set. Details of the neural models are reported in Appendix A1.