Fast and realistic large-scale structure from machine-learning-augmented random field simulations

Producing thousands of simulations of the dark matter distribution in the Universe with increasing precision is a challenging but critical task to facilitate the exploitation of current and forthcoming cosmological surveys. Many inexpensive substitutes to full $N$-body simulations have been proposed, even though they often fail to reproduce the statistics of the smaller, non-linear scales. Among these alternatives, a common approximation is represented by the lognormal distribution, which comes with its own limitations as well, while being extremely fast to compute even for high-resolution density fields. In this work, we train a generative deep learning model, mainly made of convolutional layers, to transform projected lognormal dark matter density fields to more realistic dark matter maps, as obtained from full $N$-body simulations. We detail the procedure that we follow to generate highly correlated pairs of lognormal and simulated maps, which we use as our training data, exploiting the information of the Fourier phases. We demonstrate the performance of our model comparing various statistical tests with different field resolutions, redshifts and cosmological parameters, proving its robustness and explaining its current limitations. When evaluated on 100 test maps, the augmented lognormal random fields reproduce the power spectrum up to wavenumbers of $1 \ h \ \rm{Mpc}^{-1}$, and the bispectrum within 10%, and always within the error bars, of the fiducial target simulations. Finally, we describe how we plan to integrate our proposed model with existing tools to yield more accurate spherical random fields for weak lensing analysis.


INTRODUCTION
The best current model to describe our Universe is the ΛCDM model, which prescribes the existence of a cosmological constant Λ associated with dark energy, together with cold dark matter (CDM) and ordinary matter (baryons; see e.g.Dodelson 2003).In particular, the ΛCDM model predicts that dark matter is about five times more abundant than ordinary matter, with galaxies forming along the cosmic web structure woven by dark matter, made of filaments connecting different clusters, all surrounded by voids.While its gravitational effects are observed by many probes, dark matter remains a mystery, with multiple experiments still ongoing to shed light on its nature (see e.g.Trimble 1987;Bertone et al. 2005;Buchmueller et al. 2017;de Swart et al. 2017, and references therein).
The most common tool to analyse and track the origin and evolution of dark matter structures are cosmological -body simulations (Holmberg 1941;Navarro et al. 1996;Tormen 1997;Jenkins et al. 1998;Springel 2005;Springel et al. 2005;Boylan-Kolchin et al. 2009;Angulo et al. 2012;Villaescusa-Navarro et al. 2021, 2020;Chacón et al. 2020, and references therein).In its basic formulation, an -body simulation is run by putting a certain number of massive particles in a cubic box, imposing periodic boundary con-★ d.piras@ucl.ac.uk ditions and letting gravity be the only force acting on the particles through its gravitational potential, governed by the Poisson equation (Springel 2005).The initial conditions of the Universe are usually approximated with a Gaussian density field, 1 which can be entirely summarised by a given power spectrum, i.e. by the Fourier counterpart of the correlation function between different particles in the simulation.Starting from high redshift, the position and velocity of the particles are updated iteratively until today ( = 0), while various snapshots are taken at different redshifts.
Several methods to run an -body simulation are available, with different levels of complexity, approximation, and speed (Hockney & Eastwood 1988;Chacón et al. 2020).These include the direct resolution of the equation of motion for each particle (Mikkola & Aarseth 1993), approximated methods like the tree code method (Barnes & Hut 1986;Callahan & Kosaraju 1992), or mean-field approaches like standard (Klypin & Holtzman 1997) or adaptive (O'Shea et al. 2004) particle mesh.In general, though, -body simulations are computationally expensive to run, and usually require access to high performance computing hardware.This limits the possibility of fully exploring the impact of different cosmological parameters on the dark matter evolution in our Universe, and hinders statistical analy-ses of the large-scale structure (see e.g.Taylor et al. 2013;Taylor & Joachimi 2014): -body simulations are essential to associate a covariance matrix to real measurements, and thousands of simulations are required to obtain accurate estimates of such matrices.
In recent years, many cheaper approximations have been proposed, which try to capture both the large-scale structure of the cosmic web and its smaller-scale details.These approximations often rely on Lagrangian perturbation theory (Buchert 1992;Buchert & Ehlers 1993;Buchert 1994), and can produce accurate dark matter halo mock catalogues and dark matter density fields (Monaco et al. 2002;Monaco et al. 2013;White et al. 2013;Kitaura et al. 2013;Chuang et al. 2014;Tassev et al. 2013;Tassev et al. 2015;Howlett et al. 2015;Rizzo et al. 2017;Tosone et al. 2020Tosone et al. , 2021)).While being capable of capturing the large-scale-structure statistics with fewer computational resources, these methods usually fail to accurately produce the correct small-scale statistics.Although such approximate mocks are used to estimate covariance matrices in current large-volume datasets when not enough high-resolution simulations are available, to date no inexpensive exact alternative to -body realisations exists.
Another typical approximation to describe (dark) matter fields is found by resorting to a lognormal random field, which represents the simplest alternative to running an entire -body simulation (Coles & Jones 1991;Peebles 1993;Taruya et al. 2002;Percival et al. 2004;Hilbert et al. 2011;Xavier et al. 2016).A lognormal random field can be easily obtained from a Gaussian random field (see Sect. 3.1 for further details), and can be entirely described by a small number of parameters; moreover, a lognormal variable has a skewed distribution which is suited for e.g. the matter overdensity field, whose values range from -1 in voids to values much higher than 1 in clustered dense regions.However, as reported in Xavier et al. (2016) and shown in Fig. 1, the lognormal approximation comes with its own limitations, and fails to reproduce the correct matter density distribution.
Machine learning (ML) techniques have also been proposed to replace expensive -body simulations.In Rodríguez et al. (2018), generative adversarial networks (GANs, Goodfellow et al. 2014) were successfully trained to generate slices of -body simulations, and Mustafa et al. (2019) applied the same technique to weak lensing convergence maps.Perraudin et al. (2019) andFeder et al. (2020) then extended the application of GANs to 3-D boxes, proving that, while challenging to train, GANs can capture both large-and smallscale features, and are capable of accurately recovering the statistical information contained in the training data.He et al. (2019) and Alves de Oliveira et al. (2020), on the other hand, showed that it is possible to train a U-shaped neural network architecture (U-net, Ronneberger et al. 2015) to map simple linear initial conditions to the corresponding final evolved fields, correctly learning the non-linear growth of structures under the gravitational influence.Kaushal et al. (2021) additionally used Lagrangian perturbation theory to evolve such initial conditions and only learn the difference in the density fields at  = 0.In these latter works, it was also shown that such architectures can perform well even on input data obtained from different cosmological parameters than the training data, thus demonstrating the appealing feature of being able to extrapolate outside the training distribution.Other works have explored the use of super-resolution techniques to -body simulations (Kodi Ramanah et al. 2020;Li et al. 2021), the application of normalising flows (e.g.Papamakarios et al. 2021) as generative models of the large-scale structure (Rouhiainen et al. 2021;Dai & Seljak 2022), wavelet phase harmonics statistics to produce realistic 2-D density fields (Allys et al. 2020), or combinations of ML-inspired techniques with more traditional methods to improve the accuracy of fast -body solvers (Dai et al. 2018(Dai et al. , 2020;;Dai & Seljak 2021;Böhm et al. 2021).
While being useful, all the previous approaches still require a relatively high amount of computational resources, might not scale well to high-resolution fields, or introduce many approximations that prevent them from being used reliably in place of full -body simulations.In this paper, we show that it is possible to improve the lognormal approximation by means of ML techniques, with the longterm goal of integrating our approach with the Full-sky Lognormal Astro-fields Simulation Kit (FLASK, Xavier et al. 2016), in order to be able to cheaply generate more realistic high-resolution full-sky density fields.
For this purpose, we start from the Quĳote -body simulation suite (Villaescusa- Navarro et al. 2020), which offers thousands of realisations of a single cosmological parameterisation, as well as hundreds of simulations at different values of the cosmological parameters.We devise a pipeline to create lognormal density fields which are the approximated counterpart of the simulated density fields.By construction, these lognormal fields have the same power spectrum as the one from the fiducial -body simulations, and the phases of the underlying Gaussian fields are taken from the initial conditions of the simulated fields (all details are reported in Sect.3.1).Having the pairs of lognormal and corresponding simulated density fields, we draw from image-to-image translation techniques based on convolutional neural networks and adversarial training, in order to obtain a model that can map simple lognormal fields to more realistic density fields (see Fig. 1).We extensively validate our model by measuring first-, second-, and higher-order statistics, obtaining good agreement, almost always within 10%, on all scales.We additionally show that we can train our model using simulations run over a latin hypercube of cosmological parameters, obtaining a good generalisation performance over different cosmologies.While a more extensive study for different redshifts and higher resolutions will be explored in future work, these encouraging results indicate that providing the model with a lognormal field as the starting point significantly improves the model's generalisation performance.Additionally, we show that starting from lognormal maps with the correct power spectrum naturally leads to good performance at the power spectrum level, and does not allow the model to "collapse" and learn single modes in the data, a known problem of generative adversarial networks (Metz et al. 2017).
The paper is structured as follows.In Sect. 2 we describe the Quĳote simulation data, on which this work is based.In Sect.3, we detail the procedure that we apply to obtain the training data, and describe the image-to-image translation technique that we employ in this work.In Sect.4, we present the results for different resolutions of the density fields, as well as for different values of redshift and cosmological parameters, and demonstrate the performance of our model through a wide range of statistical tests.We conclude in Sect. 5 with a summary of our work, planned improvements and an outline of possible future applications of our model.

DATA
In this work, we use the Quĳote simulation suite (Villaescusa- Navarro et al. 2020).This set of -body simulations includes 15 000 realisations following 512 3 dark matter particles in a box with comoving length of 1 ℎ −1 Gpc, with the matter density parameter Ω m = 0.3175, the baryon density parameter Ω b = 0.049, the Hubble parameter ℎ = 0.6711, the scalar spectral index  s = 0.9624, the root mean square of the matter fluctuations in spheres of radius 8 ℎ −1 Mpc  8 = 0.834, and the dark energy equation of state parameter  = −1; neutrinos are considered massless.These simulations were  1), for a lognormal random field (red) and an  -body simulation dark matter density field (grey).Middle and right panels: square maps of a lognormal (middle) and  -body (right) density fields, with a side of 512 pixels, corresponding to a comoving length of 1 ℎ −1 Gpc.The depth of these fields is 1.9 ℎ −1 Mpc.In these maps, we clipped the maximum and minimum values before applying a logarithm to reduce their dynamic range; the symbol 'ln' indicates the natural logarithm throughout this paper.The right-hand-side plot is a slice of a simulation from the Quĳote suite (Villaescusa-Navarro et al. 2020), while the middle plot, obtained following the procedure described in Sect.3.1, represents its lognormal counterpart.The goal of this paper is to train a machine learning model (described in Sect.3.2) to transform the lognormal map to the more realistic  -body map, thus improving the statistical power of the fast lognormal approximation.
run using the TreePM code Gadget-III, which is an improved version of Gadget-II (Springel 2005).We consider snapshots of both the initial conditions ( = 127) and today ( = 0), as well the  = 1 snapshot and the latin-hypercube simulations at  = 0 for further validation of our model (see Sect. 4.5).
In each -body simulation, we convert the information on the particles' position to a continuous random field through a mass assignment scheme.We analyse the matter overdensity field (x), defined as: with (x) being the matter density field at each position x, and ρ being the mean density in the volume of the simulation.Following Chaniotis & Poulikakos (2004); Jing (2005); Sefusatti et al. (2016), we consider a regular grid of points in all three directions.The continuous overdensity field is obtained by interpolating the discrete overdensity field on this grid, i.e. by evaluating the continuous function with  (x) being the weight function describing the number of grid points to which every particle is assigned.We choose the piecewise cubic spline interpolation scheme, i.e. we explicitly write the weight function as  (x) =  1D ( 1 /) 1D ( 2 /) 1D ( 3 /), with  being the grid spacing,  1 ( 2 ,  3 ) being the  (, ) direction, and  1D being the unidimensional weight function if 1 ≤ |s| < 2 ; 0 otherwise ; (3) we refer the reader to Sefusatti et al. (2016) for more details.We consider both a grid with  3 high = 512 3 pixels and  3 low = 128 3 pixels, and present the results in Sect.4.3 and Sect.4.4, respectively.

METHOD
Our goal is to obtain 2-D projected density lognormal fields corresponding to slices of the Quĳote simulations, in order to train a model that can take as input a lognormal map and predict a more realistic density field with the same statistics as the simulated one.In the following sections, we describe the procedure that we follow to obtain such a dataset (Sect.3.1), and the machine learning algorithm that we employ to learn the transformation (Sect.3.2).

Obtaining the training data
Since the long-term goal of the project is to increase the accuracy in large-scale structure description of random field maps on the sphere like the ones produced by FLASK (Xavier et al. 2016), we choose to work with slices of the density field rather than the full 3-D boxes.We slice a given box along the third axis, and obtain multiple square density fields from a single simulation (128 in the lowresolution case, and 512 in the high-resolution case); the width of each slice is 1000 ℎ −1 Mpc/128 7.8 ℎ −1 Mpc in the former case, and 1000 ℎ −1 Mpc/512 1.9 ℎ −1 Mpc in the latter case.Our choice of different thicknesses aims to demonstrate that our approach can work at different resolutions and different projection depths.Since we consider 800 simulations in the low-resolution case, and 200 in the high-resolution case, we are left with 102 400 maps in both instances.We also consider the initial conditions of the 3-D boxes, namely the -body simulations at  = 127, which we slice in the same way.
In order to create the lognormal counterpart of the more realistic maps, we start by measuring the power spectrum of each simulation's slice at  = 0, which we wish to impose on the lognormal fields.We recall here that the 2-D matter power spectrum () can be implicitly defined through the Fourier transform (k) of the matter density contrast (x)2 , defined as in Eq. ( 1): We first measure the power spectrum of each slice of the  = 0 boxes (in red in the top panel), which is concatenated with the theory power spectrum obtained using CLASS (Blas et al. 2011; in grey in the top panel).We then generate a lognormal random field with this power spectrum, following Coles & Jones (1991); Percival et al. (2004); the term "Gaussian correlation function" indicates  G ( ) as in Eq. ( 8); the corresponding "Gaussian power spectrum" is obtained using Eq. ( 6).Crucially, when generating the underlying Gaussian field, we use the Fourier phases of the initial conditions of the  -body simulation, which consist of a Gaussian random field at  = 127.In this way, the lognormal field displays increased correlation with the  -body field.The final training data consists of pairs of lognormal (  LN ) and simulated (  SIM ) density fields, with either low (side  low = 128) or high (side  high = 512) resolution, as explained in Sect. 2. The machine learning model employed to learn the mapping from  LN to  SIM is presented in Sect.3.2.
where • denotes an average over the whole Fourier space,  = |k|, and  D (•) indicates the Dirac delta function (Dodelson 2003); this in turn yields the estimator where  modes () is the number of modes in each  bin, and the sum is performed over all k vectors whose magnitude is .The definition in Eq. ( 4) implies that () is the Fourier counterpart of the 2-D matter correlation function  (), with  = |r|, i.e.
where  () is defined as with • representing the average over all locations x in the plane in this case.
In order to generate a lognormal random field with a given power spectrum, we follow the procedure of Coles & Jones (1991); Percival et al. (2004).We start by converting the measured power spectrum to the matter correlation function  LN (), then we calculate the corresponding Gaussian correlation function, transform it back to Fourier space and create a Gaussian random field realisation  G on a grid with this power spectrum and the required resolution ( low or  high ).It is well known that a zero-mean Gaussian field is entirely specified by the given power spectrum, which only depends on the absolute value of the Fourier coefficients: this means that the Fourier phases can be uniformly sampled from the [0, 2] interval (Coles & Chiang 2000;Chiang & Coles 2000;Watts et al. 2003).Crucially, when generating the Gaussian random field, we employ the set of phases of the Gaussian initial conditions of the Quĳote simulation realisation.In this way, the final lognormal density fields will have a high level of correlation with the density fields obtained from the simulations: while the amount of correlation is limited due to the evolution from  = 127 to  = 0, the Pearson correlation coefficient between pairs of maps can be as high as 0.5, if we smooth the fields with a Gaussian kernel on scales of about 50 ℎ −1 Mpc, while it is consistent with 0 if using completely random phases.Therefore, we argue that our choice facilitates learning the mapping from random fields to -body slices.Finally, we obtain the lognormal field  LN by calculating for each grid point where  G is the standard deviation of the Gaussian field.For all these operations we employ the P package (Hand et al. 2018).A flowchart representing the steps followed to produce the training data is reported in Fig. 2.
We observe two limitations due to the fact that we measure the power spectrum from a finite-resolution grid.First, by relying on the boxes only, we are capable of surveying only a limited range in , namely no larger than  ∈ [0.025 ℎ Mpc −1 , 1 ℎ Mpc −1 ] in the highresolution case, and  ∈ [0.025 ℎ Mpc −1 , 0.3 ℎ Mpc −1 ] in the lowresolution case.In order to access larger scales (i.e.lower  values), we concatenate the measured power spectrum with the theoretical one obtained with CLASS (Blas et al. 2011) for  ∈ [10 −5 ℎ Mpc −1 , 0.025 ℎ Mpc −1 ]: this makes the procedure outlined in the previous paragraphs more stable numerically.
Second, we observe a mismatch in power in the lognormal fields with respect to the imposed power spectrum.We attribute this discrepancy to the fact that when converting the Quĳote initial conditions (obtained using second-order Lagrangian perturbation theory) to a density field, the mass assignment scheme and non-vanishing non-linearities arising from perturbation theory introduce extra spurious correlation in the phases.We correct for this effect, which actually introduces non-Gaussian features and is more pronounced at higher resolution, by iteratively rescaling the input power by the ratio of the output and target power at each , until the mismatch across a sample of 100 random maps is smaller than 1% at all  values.We checked that this iterative adaptation scheme effectively removes the power mismatch, leaving the final performance of the model unaffected at prediction time: the results presented in Sect. 4 do not change significantly if, after training the model, we give it as an input a slice of a 3-D lognormal field generated with completely random Fourier phases.We further remark that this correction would not be necessary if we had access to a perfectly Gaussian density field of the initial conditions.
We are left with pairs of square density field maps (dubbed  LN and  SIM ), which we use as the training (80%), validation (10%) and test (10%) data, further discussed in the next section.This split is done at simulation level, so that the test, validation and training datasets are completely independent.Note that to reduce the correlations between slices coming from the same simulation cube we shift the pixels by a random amount along both the first and second axis, independently for each pair of maps, assuming periodic boundary conditions.It could also be possible to randomly rotate and flip the slices in order to augment the training data; while we found it is not needed in our setup, we defer further investigations to future work.Before feeding the pairs into the neural network architecture described in the next section, we additionally preprocess each map by calculating ln (1 + ) to decrease the dynamic range of each density value .

Image-to-image translation
As discussed in Sect. 1, machine learning generative techniques have extensively been applied to -body simulations.In this work, we aim at mapping lognormal fields to more realistic fields, hence we employ the pix2pix network structure, first proposed in Isola et al. (2017).The model is composed of two parts, as sketched in Fig. 3; all implementation details are reported in Appendix A. The first part is a U-net (Ronneberger et al. 2015), which takes as an input a lognormal map  LN , obtained and preprocessed as described in Sect.3.1.The map is passed through various convolutional layers to yield a compressed feature map, which is then upsampled back to the original resolution.Crucially, these upsampling steps are concatenated with the corresponding downsampled feature maps, which allow various scales to be accessible in the output map; removing these skip connections significantly impairs the performance of the model.We call the output map the generated map  GEN .
We want the generated map to carry the same statistical information as the  SIM density field.We tested that minimising a simple ℓ 1 or ℓ 2 norm between  GEN and  SIM is not sufficient to yield accurate results.For this reason, following Isola et al. (2017), we employ a second convolutional block as a discriminator, and express the loss in the framework of adversarial training.In the standard GAN framework (Goodfellow et al. 2014), the generator network  is trained together with the discriminator network  until an equilibrium where neither  or  can improve their performance is reached: while  attempts to generate realistic images,  tries to distinguish between real and fake examples.Since we found this framework to be particularly unstable during training, we actually implemented the Wasserstein GAN with gradient penalty (WGAN-GP, Arjovsky et al. 2017;Gulrajani et al. 2017), which we found superior both in performance and training stability.In this framework, a generator  is trained alongside a critic  to minimise the following cost function: where  GEN =  ( LN ), E  GEN and E  SIM indicate the expectation value over samples of the generated and simulated maps (usually estimated through sample averages), respectively, δ represents a linear combination of  GEN and  SIM3 , || • || 2 indicates the ℓ 2 norm, and  1 and  2 are two positive hyperparameters that allow us to tune the amount of regularisation given by the gradient penalty and the ℓ 2 norm, respectively.In short, Eq. ( 10) indicates that we wish to minimise the Wasserstein-1 (or earth mover) distance between the real data and generated data distributions, while constraining the gradient of the critic network to be close to unity; this is needed since the formulation of the Wasserstein distance as in the first two terms of Eq. ( 10) only holds when the critic is a 1-Lipschitz function, i.e. when its gradient is bound (see Gulrajani et al. 2017, for more details).We observe that in the standard WGAN-GP formulation  2 = 0, while in our case we found it key to minimise the ℓ 2 norm between simulated and generated maps as well in order to obtain improved results.
To train the networks, we use the Adam optimiser (Kingma & Ba 2015) with learning rate 10 −5 ; we set the additional Adam hyperparameters  1 = 0 and  2 = 0.9, following Gulrajani et al. (2017), and  2017), we have two convolutional neural networks, the generator (bottom left) and the critic (top right).We feed the lognormal maps through the generator, which is a U-net (Ronneberger et al. 2015), that first downsamples and then upsamples each image using various convolutional layers, with all details reported in Appendix A. To improve the performance of the model, each upsampling step is concatenated with the output of a downsampling step, as indicated by the dashed lines (skip connections).
The output of the generator, dubbed  GEN , is then compared with the target data  SIM by the critic network, which is again made of various convolutional layers, ending with a dense layer in order to have a single output.We chose this architecture based on those described in the literature (e.g.Isola et al. 2017); a full investigation over different architecture designs is beyond the scope of this paper.The critic and generator networks are trained together, minimising the loss function of Eq. ( 10).Note that in addition to the standard adversarial loss, we include a penalty term in the form of the mean squared error between the generated and target maps, which we found to significantly improve the performance of our model; this is indicated by the short-dashed lines (identity).
refer the reader to Kingma & Ba (2015) and Gulrajani et al. (2017) for more details.We feed the data in batches of 32 pairs at each iteration, and train our model for 10 epochs (150 in the low-resolution case), where each epoch consists of feeding the entire training set through the network.For each batch, we update the critic parameters  critic = 10 times, and the generator parameters only once.Multiple iterations of the critic are usually set to ensure its optimality while still allowing the generator to learn (Arjovsky et al. 2017;Gulrajani et al. 2017); in our work, we only explored  critic = 5 and  critic = 10, and used the latter since it showed slightly better results.Each epoch takes about 0.5 h (4 h) for the low (high) resolution case, on a Tesla P100 GPU; after training, mapping a lognormal map through the generator takes O (1 s) on the same hardware, and can be efficiently done in batches.
We save the model after each epoch.In order to select the best model amongst the saved ones, for each of them we run the statistical tests described in Sect.4.2, and measure the mean percentage difference between the target and predicted maps for randomly-sampled maps of the validation set.The best model is chosen as the one which minimises the sum of the mean percentage differences over all tests; the results are then shown on maps from the test set.In the highresolution case only, we actually found that the trained model can generate maps whose power spectrum is significantly different (more than 10%) from the input and target ones, which we attribute to instabilities of the WGAN-GP framework; we show one such example in Appendix B. For this reason, we propose a ranking system that takes all the predictions from the test set, and orders them based on the mean difference between the input power spectrum and the predicted power spectrum, since they must match by construction.We select the best 100 maps according to this metric, and discuss possible ways to make the model more stable in Sect. 5. We envision that in a realistic scenario where an arbitrary number of lognormal maps can be generated with the goal of producing  augmented fields, one could iteratively generate augmented lognormal maps through our model and discard those whose precision is below a desired threshold, until all  maps are produced.
We show the results of our best models in the next section.These models are found with  1 = 100 and  2 = 10; we defer a full grid search over these hyperparameters to future work.

Lognormal
Prediction N-body  1, with the prediction of our model (middle) given the lognormal field.In these maps, we clipped the maximum and minimum values before applying the logarithm to reduce their dynamic range.The model is described in Sect.3.2.We remark that we are not interested in an exact match of the middle and right panels, as we explain in Sect.4.1, and thoroughly test that the predicted fields carry the same statistical information as the  -body maps from Sect.4.2.

RESULTS
In this section, we validate the performance of the trained model by comparing the statistics of the generated and simulated maps.In Appendix C, we also show that our model is not affected by mode collapse.

Qualitative comparison
While the appearance of the maps is irrelevant for the purpose of our statistical analysis, a visual inspection is nonetheless useful to intuitively understand whether our model is on the right track to learn the -body features.In Fig. 4, we show a lognormal map, its -body counterpart and the prediction of our model given the lognormal map for the high-resolution case.Our goal is not to obtain an exact visual match between the model's prediction and the -body map, given that we used the random phases of the  = 127 simulations, which only have partial correlations with the  = 0 slices.For the applications we focus on (discussed in Sect.5), the actual position of peaks and voids in the lognormal map is irrelevant, since it is dictated by the random sampling of the phases: we only aim to generate maps which carry the same statistical signal as the -body maps on average, improving on the lognormal approximation.We observe that while the predicted field does not match the -body pattern pixel by pixel, the model has learnt the correct morphology of the large-scale structure on top of the lognormal field.

Statistics
While visual inspection of the generated maps against the target ones is a necessary zeroth-order test to provide intuition on whether the model was adequately trained, it is then fundamental to compare the summary statistics of interests and carefully quantify their agreement.We compare the generated and simulated fields through four different summary statistics, which we briefly describe here.

Pixel counts
The first test consists of binning the pixels of the generated and target density fields into a histogram.While the lognormal distribution is a good approximation of the simulated fields, there is a significant difference between the two (see e.g.Fig. 1).We show in panel (a) of Fig. 5 and Fig. 6 the performance of our model with respect to the pixel counts for high and low resolution, respectively.

Power spectrum
We compare the power spectrum as defined in Eq. ( 4) for the simulated maps and the ones predicted by our model given the lognormal maps.While it could be argued that this is a trivial task (given that the input and output maps have the same power spectrum by construction), it is not obvious that our model does not modify the power spectrum information while learning the new density distribution, and as anticipated at the end of Sect.3.2 we actually found some failures of the trained model which yield discrepant power spectra in the high-resolution case.We therefore compute the power spectra and show the results in panel (b) of Fig. 5 and Fig. 6, for high and low resolution, respectively.Since -body simulations are mainly used to associate a covariance matrix to real measurements, we have also computed power spectrum covariance matrices for all datasets and models we considered.These are very similar for all cases, and while we do not show them for brevity, they further validate the performance of our model.

Bispectrum
To probe the non-Gaussian features of the density fields, we measure the matter bispectrum of the maps, i.e. the counterpart of the three-point matter correlation function in Fourier space.The matter bispectrum ( 1 ,  2 ,  3 ) for a 2-D field is defined implicitly as (see e.g.Sefusatti et al. 2006): where (k) indicates the Fourier transform of the matter overdensity (x),   = |k  |, and all k  vectors are in the plane of the simulation box slices.To further assess that our model correctly captured the information beyond the power spectrum, we also measure the reduced matter bispectrum ( 1 ,  2 ,  3 ), see e.g.Liguori et al. (2010), defined as: We measure bispectra and reduced bispectra for different configurations depending on the resolution; different triangle configurations usually probe different inflationary models (Liguori et al. 2010), and one must include as many configurations as possible to break degeneracies when inferring cosmological parameters (Bergé et al. 2010).Moreover, different bispectra configurations can shed light on the size of collapsing regions, as well as on the relative position of clusters and voids in the large-scale structure (Munshi et al. 2020).

N-body
We calculate bispectra and reduced bispectra based on an estimator of the binned bispectrum (see e.g.Bucher et al. 2016); we consider the centroid of each bin as the value of k at which the bispectrum is evaluated.We report the results in panels (d)-(g) of Fig. 5 and Fig. 6 (for high and low resolution, respectively) as a function of the angle  between the vectors k 1 and k 2 .

Peak counts
To further assess whether the model has correctly learnt the most non-Gaussian features of the simulated density fields, we verify that the peak counts of the generated and target maps match within the error bars.A peak is defined as a density pixel which is higher than the 8 surrounding pixels.Peak count statistics have been shown to carry significant cosmological information, especially in weak lensing studies, as they trace the most dense regions (Pires et al. 2009(Pires et al. , 2012;;Dietrich & Hartlap 2010;Marian et al. 2011;Mainini & Romano 2014;Lin & Kilbinger 2015a,b;Lin et al. 2016;Kacprzak et al. 2016;Shan et al. 2018;Martinet et al. 2018;Harnois-Déraps et al. 2021).We bin the peak values for both the simulated and target maps, and compare them in panel (c) of Fig. 5 and Fig. 6, for high and low resolution, respectively.

High resolution
In Fig. 5 we compare the performance of the predictions of our model against the target maps, for the case with 512 2 pixels.We run the statistical tests on 100 maps sampled from the test set as described at the end of Sect.3.2; the solid lines show the mean values and the dashed areas represent the error on the mean.
In panel (b), we show that the trained model is capable of preserving the correct power spectrum on all scales from 0.025 ℎ Mpc −1 to 1 ℎ Mpc −1 , with percentage differences going no higher than 3%, and always within the error bars.At the same time, the model improves on the lognormal approximation as far as the pixel counts and peak counts are concerned, with however significant differences in particular for  < 0 in the latter case.We believe that the performance in this case could be ameliorated by e.g.exploring different network architectures.In panels (d)-(g), we show the results for the (reduced) bispectrum, for  1 = 0.4 ℎ Mpc −1 ,  2 = 0.6 ℎ Mpc −1 -panels (d) and (e) -and for  1 = 0.4 ℎ Mpc −1 ,  2 = 0.4 ℎ Mpc −1 -panels (f) and (g).The performance is very good overall, with the percentage difference between target and predicted maps being within the error bars except for a few individual values of , significantly improving on the lognormal approximation.We observe that the model's performance is almost always within the 5% range, except for the bispectra, where significant differences are present at high ; we discuss these discrepancies in Sect.4.4.Despite these differences, our model still outperforms the lognormal approximation.

Low resolution
In Fig. 6 we compare the performance of the predictions of our model against the target maps, for the case with 128 2 pixels.We use 100 randomly-sampled maps from the test set.We observe good agreement between predicted and target maps for the pixel counts, power spectrum and peak counts statistics, with the power spectrum in particular being almost always within 2%.As far as the bispectra are concerned, we consider two configurations, one with and (e) -and for  1 = 0.2 ℎ Mpc −1 ,  2 = 0.2 ℎ Mpc −1 -panels (f) and (g).Since the fields have a lower resolution, the scales we probe are larger than in Sect.4.3.We observe a good agreement overall, except at high : we argue that to improve the performance of the model we could use the same ranking approach based on the power spectrum as in the high-resolution case.

Redshift and cosmology dependence
So far, we have shown the performance of our model on a given fiducial set of cosmological parameters and redshift, from which the training data were obtained.However, in order for the method to become practical, it is critical to assess whether the performance degrades when the model is tested on lognormal maps obtained with a different cosmology or at different .Examples of good generalisation properties of machine learning models applied to cosmological We checked that the performance of our model does not degrade much when acting on fields with slightly different (within 2%) values of Ω m and  8 , even though a more complete analysis on bigger variations is required.We additionally verified that feeding our model, trained using maps at  = 0, with lognormal maps at  = 0.5 or  = 1.0 does not yield satisfactory results, with percentage errors going well above 50%.This failure is not unexpected: the different dynamic range of the lognormal maps at different redshifts highlights that our model is not capable of extrapolating to such different input values.
To overcome these limitations, we use the Quĳote simulations run on a latin hypercube of the cosmological parameters, which are publicly available together with the simulations run at the fiducial cosmology (Villaescusa- Navarro et al. 2020).We consider 800 of such simulations at low resolution  low = 128 and  = 0, and repeat the procedure described in Sect.3.1 to generate a dataset of highly-correlated pairs of lognormal and -body fields.For each of the first 700 simulations, we keep 90 slices for training, 19 for validation and 19 for testing (not used in this instance), and train the same model described in Sect.3.2 for 150 epochs.Note that we do not provide explicitly the label corresponding to the different cosmological parameters during training.We select the best model according to the best performance on the summary statistics over the validation set, as in the fiducial case.
We test the best model by applying it to 100 lognormal maps gen- erated at the fiducial cosmology (with results shown in Appendix D), as well as at the cosmological parameters of one random simulation from the test set (Ω m = 0.1987, Ω b = 0.0446, ℎ = 0.5601,  s = 1.1707,  8 = 0.7665) in Fig. 7.These results, while slightly worse than those presented in Fig. 6 since e.g. the power spectrum shows up to 5% discrepancies, indicate that the model trained on the latin-hypercube simulations has a good generalisation performance, which extends to cosmologies that were never shown at training or validation time.We plan to extend these encouraging results to higher resolutions and different redshifts in future work.

N-body
Other possible solutions to obtain a good generalisation performance include normalising the maps, either after or before training the model.For instance, one could rescale the field at  = 1 through the linear growth factor (Eisenstein et al. 1999) to  = 0, and then invert this transformation after feeding the map through the generator trained as above.However, this approach would ignore the non-linear scales, and directly dividing each pixel value by the linear growth factor could lead to unphysical fields with  < −1.Alternatively, instead of the dark matter overdensity field, we could consider the corresponding peak height field, calculated by measuring for each pixel 1.686/(), where () is the mass enclosed within a given scale; since the peak height is known to extrapolate better, having a weaker dependence on cosmological parameters (Press & Schechter 1974;Bond et al. 1991;Percival 2005;Kravtsov & Borgani 2012), we expect a model trained on this field to have an improved generalisation performance.
Finally, we show that with our current setup we can successfully train a second model on data generated as described in Sect.3.1 with  ≠ 0. We show the results for a model trained on fields at  = 1, which have a lower contrast and less non-linearity than  = 0, in Fig. 8 for the low-resolution case, with good performance overall.We also expect that it would be possible to train a conditional model by providing the redshift 'label' together with the input lognormal map, thus obtaining a conditional WGAN-GP (see e.g.Mirza & Osindero 2014;Yiu et al. 2021); such a model could be trained e.g. on maps with  = 0 and  = 1, and then used to predict maps at  = 0.5, similarly to Chen et al. (2020).All these points indicate that it will be possible to make our model conditional on  and different cosmological parameters; we defer these studies to future work.

CONCLUSIONS
In this paper, we employed the Quĳote simulations as a starting point to train a machine learning model that is capable of transforming projected lognormal realisations of the dark matter density field to more realistic samples of the dark matter distribution.We employed stateof-the-art image-to-image translation techniques, combining convolutional neural networks and adversarial training, to learn such a model, and extensively validated its performance through a thorough set of statistical tests.We observed a significant reduction in the error of non-Gaussian features like peak counts and bispectra, from tens of percent for the pure lognormal model to no more than 10% obtained by our model in most cases; the latter frequently shows an order of  magnitude improvement over the former.Furthermore, the mapping is extremely fast, taking O (1 ) on a single GPU.
In order to avoid running large suites of -body simulations, the proposed method has to generalise well to other redshifts and cosmologies.We demonstrated that it is possible to train a model on simulations run over a latin hypercube of cosmological parameters and have good performance on the fiducial cosmology as well as on unseen cosmologies.We outlined a few promising avenues to investigate in order to extend these results to different redshifts and to higher resolutions.Moreover, while in this work we trained different models for different resolutions of the density field, we also expect an improved model to be able to deal with a varying slice thickness.
We plan to extend this work to random fields on the sphere, and integrate it into the FLASK package developed in Xavier et al. (2016).We aim to extend our approach to spherical random fields by iteratively applying our model to square patches of the sky, thus providing the community with a tool to quickly generate realistic dark matter realisations that overcome the limitations of the lognormal approximation.We also plan to compare our approach to a direct generation of spherical fields by means of spherical convolutional layers, as proposed e.g. for mass maps in Yiu et al. (2021).
Additionally, we believe that the image-to-image technique outlined in this paper could be applied to augment analytical approximations to -body simulations (like L-PICOLA, Howlett et al. 2015, or FastPM, Feng et al. 2016), as well as semi-analytic models of galaxies, which, in the same vein as lognormal random fields, provide a fast approximation to hydrodynamical simulations by modelling complicated baryonic processes (White & Frenk 1991;Kauffmann et al. 1993;Cole et al. 1994;Somerville & Primack 1999;Lacey 2001).In such instances, one could e.g.train a model to learn the mapping between an -body simulation augmented with semi-analytical models and the corresponding hydrodynamical simulation.We further plan to explore the possibility to employ the dataset described in this work to reduce the variance in the statistics of large-scale structure observables using a small number of expensive simulations (Chartier et al. 2021;Chartier & Wandelt 2021;Ding et al. 2022), as well as to replace our WGAN-GP model with either a possibly more stable GAN version (Kwon et al. 2021), or with a more compact model, like the one proposed in the context of Lagrangian deep learning (LDL, Dai & Seljak 2021), using graph neural networks (GNNs, see e.g.Zhou et al. 2020 for a review) or through normalising flows (e.g.FFJORD, Grathwohl et al. 2019, or more recently TRENF, Dai & Seljak 2022).This will be investigated in future work.Zhou J., et al., 2020, AI Open, 1, 57

APPENDIX A: MODEL ARCHITECTURE
In its basic formulation, a layer in a convolutional neural network (CNN; see e.g.Fukushima 1980;Krizhevsky et al. 2017) is made of a certain number of square filters, each associated to learnable parameters, usually called weights.During training, each filter is convolved through each input data-point: this means that the dot product of the learnable weights and the input pixels is calculated, representing a single output for that particular filter.Repeating this operation while moving the filter across the input data creates an output map, which is then passed through an activation function to introduce non-linearities in the network.This operation is done for multiple filters, and each output map becomes the input to the following convolutional layer.Stacking convolutional layers allows one to extract progressively larger scales from the input data, and represents a more efficient implementation of a neural network with respect to standard dense layers when dealing with high-dimensional data like images (Le Cun et al. 1989;Goodfellow et al. 2016).
As anticipated, our model, depicted in Fig. 3, consists of two neural networks.The first neural network (the generator) contains four downsampling blocks, followed by four upsampling blocks.Each downsampling block first pads the input data assuming periodic boundary conditions, and then applies a convolution operation with 4x4 filters.There are 64 convolutional filters in the first place, and this number doubles for each block.Note that no pooling layers are present (Yamaguchi et al. 1990), and we are able to reduce the dimensionality of the extracted feature maps by shifting each filter by two pixels in both directions; in other words, we set a stride of 2. The compressed map is then symmetrically upsampled using the transposed convolution operation (Dumoulin & Visin 2016).At each block, each feature map is concatenated with the corresponding downsampled map by simply stacking them along the last spatial axis; this is done in order to better learn the representations at each level (Ronneberger et al. 2015).The activation function used after each downsampling layer is the rectified linear unit (ReLU, Glorot et al. 2011), while for the upsampling blocks we found the leaky ReLU (Maas et al. 2013) with  = 0.3 to perform better.A final convolutional layer with linear activation function outputs the generated map  GEN .Note that all downsampling and upsampling blocks include batch normalisation (Ioffe & Szegedy 2015), which during training subtracts the batch mean and divides by the batch standard deviation, in order to make the training procedure more stable.The second neural network is done similarly, with three downsampling blocks followed by two convolutional layers with Leaky ReLU as the activation function, and a final dense layer with a single output and a linear activation function.Input and output shapes for each layer are reported in Table A1.We implement our neural networks in T F (Abadi et al. 2015), and will make the trained models available upon acceptance of this work.

APPENDIX B: HIGH-RESOLUTION MODEL FAILURES
In Fig. B1, we show an example field in the high-resolution case for which the prediction has a power spectrum with an average 20% disagreement with respect to the expected one.We attribute such problems to instabilities in the WGAN-GP model we considered, and describe a possible ranking system that addresses this problem in Sect.3.2.

APPENDIX C: MODE COLLAPSE
As we explained in Sect. 1 and Sect.3.2, our choice of providing the model a lognormal field as input, as well as the choice of the WGAN-GP loss function, should prevent the model from memorising the training set, or only focus on single modes of the data.Following Mustafa et al. (2019), we provide evidence that the generator is capable of producing diverse maps and is not affected by mode collapse.
In Fig. C1, we show three random predictions from the test set, and the closest map in the training set according to pixel-wise distance; we focus on low-resolution data only to limit the computational cost.Despite showing similar texture, and having summary statistics in agreement (not shown for brevity), the maps are clearly different, thus confirming that our model is immune to mode collapse.

APPENDIX D: LATIN-HYPERCUBE MODEL APPLIED TO FIDUCIAL COSMOLOGY
In Fig D1 we show the summary statistics results for the model trained on latin-hypercube simulations and applied to fields at the fiducial cosmology, as described in Sect.4.5.The redshift is fixed at  = 0.

Figure 1 .
Figure 1.Left panel: histograms of the matter overdensity , defined in Eq. (1), for a lognormal random field (red) and an  -body simulation dark matter density field (grey).Middle and right panels: square maps of a lognormal (middle) and  -body (right) density fields, with a side of 512 pixels, corresponding to a comoving length of 1 ℎ −1 Gpc.The depth of these fields is 1.9 ℎ −1 Mpc.In these maps, we clipped the maximum and minimum values before applying a logarithm to reduce their dynamic range; the symbol 'ln' indicates the natural logarithm throughout this paper.The right-hand-side plot is a slice of a simulation from the Quĳote suite (Villaescusa-Navarro et al. 2020), while the middle plot, obtained following the procedure described in Sect.3.1, represents its lognormal counterpart.The goal of this paper is to train a machine learning model (described in Sect.3.2) to transform the lognormal map to the more realistic  -body map, thus improving the statistical power of the fast lognormal approximation.

Figure 2 .
Figure2.Flowchart of the steps to create the training data, as described in Sect.3.1.We first measure the power spectrum of each slice of the  = 0 boxes (in red in the top panel), which is concatenated with the theory power spectrum obtained using CLASS(Blas et al. 2011; in grey in the top panel).We then generate a lognormal random field with this power spectrum, followingColes & Jones (1991);Percival et al. (2004); the term "Gaussian correlation function" indicates  G ( ) as in Eq. (8); the corresponding "Gaussian power spectrum" is obtained using Eq.(6).Crucially, when generating the underlying Gaussian field, we use the Fourier phases of the initial conditions of the  -body simulation, which consist of a Gaussian random field at  = 127.In this way, the lognormal field displays increased correlation with the  -body field.The final training data consists of pairs of lognormal (  LN ) and simulated (  SIM ) density fields, with either low (side  low = 128) or high (side  high = 512) resolution, as explained in Sect. 2. The machine learning model employed to learn the mapping from  LN to  SIM is presented in Sect.3.2.

Figure 3 .
Figure 3.A representation of the generative model employed in this work, as described in Sect.3.2.Following Isola et al. (2017), we have two convolutional neural networks, the generator (bottom left) and the critic (top right).We feed the lognormal maps through the generator, which is a U-net(Ronneberger et al. 2015), that first downsamples and then upsamples each image using various convolutional layers, with all details reported in Appendix A. To improve the performance of the model, each upsampling step is concatenated with the output of a downsampling step, as indicated by the dashed lines (skip connections).The output of the generator, dubbed  GEN , is then compared with the target data  SIM by the critic network, which is again made of various convolutional layers, ending with a dense layer in order to have a single output.We chose this architecture based on those described in the literature (e.g.Isola et al. 2017); a full investigation over different architecture designs is beyond the scope of this paper.The critic and generator networks are trained together, minimising the loss function of Eq. (10).Note that in addition to the standard adversarial loss, we include a penalty term in the form of the mean squared error between the generated and target maps, which we found to significantly improve the performance of our model; this is indicated by the short-dashed lines (identity).

Figure 4 .
Figure 4.The lognormal (left) and  -body (right) density fields as in Fig.1, with the prediction of our model (middle) given the lognormal field.In these maps, we clipped the maximum and minimum values before applying the logarithm to reduce their dynamic range.The model is described in Sect.3.2.We remark that we are not interested in an exact match of the middle and right panels, as we explain in Sect.4.1, and thoroughly test that the predicted fields carry the same statistical information as the  -body maps from Sect.4.2.

Figure 5 .
Figure5.Comparison of the statistical tests described in Sect.4.2 for the lognormal (  LN , in red),  -body (  SIM , in grey), and predicted (  GEN , in cyan) maps, considering a resolution of  high = 512.The performance is measured at the bottom of each panel by calculating the relative difference of  -body against predicted and lognormal maps (dashed lines).All solid lines indicate the mean values over 100 maps, and the error bars represent the error on the mean (or propagated error, in the case of the relative differences).We observe that, except for the range  < 0 in panel (a) and (c) and some individual  values in panels (d)-(g), the prediction always matches the target statistics within the error bars, performing significantly better than the lognormal approximation.

Figure 6 .
Figure6.Same as Fig.5, with a lower field resolution  low = 128.The solid lines indicate the mean values over 100 maps.We observe that the model's performance is almost always within the 5% range, except for the bispectra, where significant differences are present at high ; we discuss these discrepancies in Sect.4.4.Despite these differences, our model still outperforms the lognormal approximation.

Figure 7 .
Figure7.Same as Fig.6, but for a model trained on latin-hypercube simulations and applied to a different cosmology, as described in Sect.4.5.Except at low values of  in panel (a), the results show good agreeement between the model predictions and the target  -body fields, demonstrating that our model has good generalisation performance across different cosmologies.

Figure 8 .
Figure 8. Same as Fig.6, for a different model trained on data at redshift  = 1.We observe a good overall performance of our model, which generally outperforms the lognormal approximation.

Table A1 .
Size of each layer's output in the generator and the critic neural networks, detailed in Sect.3.2 and Appendix A, for the high-resolution case.The low-resolution architecture is built analogously.