Glass-Like Random Catalogues for Two-Point Estimates on the Light Cone

We introduce grlic, a publicly available Python tool for generating glass-like point distributions with a radial density profile $n(r)$ as it is observed in large-scale surveys of galaxy distributions on the past light cone. Utilising these glass-like catalogues, we assess the bias and variance of the Landy-Szalay (LS) estimator of the first three two-point correlation function (2PCF) multipoles in halo and particle catalogues created with the cosmological N-body code gevolution. Our results demonstrate that the LS estimator calculated with the glass catalogues is biased by less than $10^{-4}$ with respect to the estimate derived from Poisson-sampled random catalogues, for all multipoles considered and on all but the smallest scales. Additionally, the estimates derived from glass-like catalogues exhibit significantly smaller standard deviation $\sigma$ than estimates based on commonly used Poisson-sampled random catalogues of comparable size. The standard deviation of the estimate depends on a power of the number of objects $N_R$ in the random catalogue; we find a power law $\sigma \propto N_R^{-0.9}$ for glass-like random catalogues as opposed to $\sigma \propto N_R^{-0.48}$ using Poisson-sampled random catalogues. Given a required precision, this allows for a much reduced number of objects in the glass-like random catalogues used for the LS estimate of the 2PCF multipoles, significantly reducing the computational costs of each estimate.


INTRODUCTION
The large-scale spatial distribution of galaxies contains crucial information about the fundamental physics and the evolution of the Universe.For this reason, in the past decades substantial effort has been put into analysing its statistical properties in order to extract as much of that information as possible with the tools at hand (Davis & Peebles 1982;Vogeley et al. 1992;Feldman et al. 1994;Maddox et al. 1996;Efstathiou et al. 2002;Peacock et al. 2001;Tegmark et al. 2004;Cole et al. 2005;Tegmark et al. 2006;Wang et al. 2013;Shi et al. 2016).Common summary statistics of interest in these analyses are the two-point correlation function (2PCF) and its Fourier space counterpart, the power spectrum.
The 2PCF, commonly denoted as  ( ì ), quantifies the excess probability of finding an object (e.g., a galaxy or a dark matter halo) at a position ì  + ì  separated by ì  from another object at position ì , relative to the ensemble-average background number density n: where ( ì ) is the number-density contrast of the objects, with ( ì ) the number density at position ì .In other words, the two-point correlation function of the galaxy distribution captures its clustering properties by quantifying the correlation between pairs of galaxies at different separations.
★ E-mail: sebastian.schulz@uzh.chSpecifically, the 2PCF has been used to extract the comoving length scale of the sound horizon during the cosmic period known as matter-radiation decoupling.At that time, baryon-acoustic oscillations (BAO) were imprinted into the matter distribution (and consequently into today's galaxy distribution).This imprint is measurable in the galaxy 2PCF in the form of a peak at ≈ 100 Mpc/ℎ, which can act as a "standard ruler" and is hence useful for understanding the expansion history of the Universe (Eisenstein et al. 2005;Blake et al. 2012;Percival et al. 2010;Alam et al. 2017Alam et al. , 2021)).
There is more information to be extracted from the 2PCF by taking into account redshift space distortions (RSD) that mainly arise from the peculiar motion of the galaxies in their gravitational potential well (Kaiser 1987).The RSD then provide a way for measuring the growth rate of structure (Guzzo et al. 2008;Beutler et al. 2012;Reid et al. 2012;de la Torre et al. 2013;Pezzotta et al. 2017;Zarrouk et al. 2018;Hou et al. 2018;Ruggeri et al. 2019), which in turn provides insights into the laws of gravity on large scales.
Future galaxy surveys such as the space based Euclid survey (Laureĳs et al. 2011;Amendola et al. 2018) or the ground based DESI survey (Aghamousa et al. 2016;Levi et al. 2019) aim to increase the accuracy and precision of the inferred cosmological parameters by quantifying the clustering statistics of galaxies in much larger volumes with accurate measurements of the redshifts of tens of millions of galaxies.
In order to estimate the galaxy 2PCF, it is generally assumed that the observed galaxy distribution represents a Poisson sample of some underlying galaxy density field (Peebles 1980).Given the point distribution of observed galaxies, it is natural to use estimators based on pair counts.Many estimators of this kind have been proposed over the years (Peebles & Hauser 1974;Hewett 1982;Davis & Peebles 1982;Hamilton 1993;Landy & Szalay 1993).Among those, the most commonly used for the 2PCF of large-scale structure is the Landy-Szalay (LS) estimator (Landy & Szalay 1993), which is defined in its general form for the cross-correlation between two data catalogues  1 and  2  LS (, ) =  1  2 (, ) −  1  2 (, ) −  1  2 (, ) +  1  2 (, ) introducing two random catalogues  1 and  2 with the same radial density profile and survey mask as the corresponding data catalogues.
For autocorrelations within one data catalogue,  1 and  2 are identical.In Eq. ( 3),  is the absolute value of the separation between a pair of objects within the catalogues and  is the cosine of the angle between the line of sight to the pair ì  and the separation vector ì .The  1  2 ,  1  2 ,  1  2 and  1  2 correspond to histograms of pair counts between the data (  ) and random (  ) catalogues, binned in  and  and normalised by the total number of pairs between the respective two catalogues.In general, the normalised histogram of pair counts between two catalogues  and  is given by (, ) = (, ) where in our case ,  ∈ { 1 ,  2 ,  1 ,  2 }.Here, (, ) is the number of pair counts between catalogue  and catalogue  in the respective (, ) bin,   is a Kronecker-Delta which is 1 if  =  and 0 if  ≠ .Hence, for cross-correlations of two different data catalogues  1 and  2 , the   term vanishes, but for autocorrelations, where  = , it is included because in this case pairs are usually not double-counted.The same arguments apply for the     pairs, while for the     pairs the    term always vanishes, since those pairs are always counted between two separate catalogues,  ≠ .
When the correlations are small ( ≪ 1), which is true for the galaxy distribution of the Universe on large scales, the LS estimator has the lowest bias and variance of all possible estimators based on pair counts if a very densely sampled random catalogue which follows the same redshift distribution as the data catalogue is provided (Landy & Szalay 1993;Kerscher et al. 2000;Keihänen et al. 2019).In this case, the variance of the estimator is almost Poisson.Commonly the number of objects in the random catalogue exceeds the number of objects in the data catalogue by a factor of order ten or higher to ensure this criterion is fulfilled in order to keep the bias and variance small (Samushia et al. 2012;de la Torre et al. 2013;Sanchez et al. 2017;Bautista et al. 2020).We note that at smaller scales (below ≈ 10 Mpc/ℎ), the galaxy 2PCF can become of order one, in which case the LS estimator is not the optimal estimator anymore.To account for this, Vargas-Magana et al. (2013) have derived a new optimal estimator that remains unbiased on smaller scales as well.
The random catalogue serves to approximate the pair counts in a homogeneous distribution, which are then used together with the pair counts in the data distribution and the pair counts across data-and random catalogues to derive an estimator for the 2PCF.A common choice for the random catalogue for large-scale galaxy surveys is to Poisson-sample from a distribution that is uniform in angular space and follows the redshift distribution of the data.A survey mask is taken into account by setting the number of objects in the random catalogue to zero outside of the mask.
We note that commonly the uncertainty in the 2PCF for galaxy clustering observations is dominated by sample variance, originating from the fact that the 2PCF in one region of the sky can differ from the 2PCF in another region of the sky, and from shot noise due to the data points being Poisson-sampled from the underlying density distribution.Here we focus instead on the uncertainty in the estimator arising from the statistical fluctuations in the number of data-random and random-random pairs, neglecting the sample variance.Keeping the variance of the estimator as small as possible is desired, so that the only remaining dominant contribution to the variance is the sample variance.
The variance of the LS estimator is dominated by the density fluctuations of the random catalogue (Keihänen et al. 2019).The amplitude of these fluctuations is given by the power spectrum, which, in case of a Poisson-sampled catalogue, is () = 1/ n, where  = 2/ is the absolute value of the wave vector of the Fourier-space fluctuations with wavelength .The variance within a sphere of radius  is related to the power spectrum (Gabrielli et al. 2002): where  ( ) is the Fourier transform of a top-hat window function, For the Poisson sample, which has a constant power spectrum, the variance goes like  2 () ∝  −3 .Increasing the number of objects in the random catalogues will decrease the amplitude of the power spectrum and thereby the variance of the LS estimator, but this comes at a significant additional computational cost as the paircounting process goes like O ( 2 ).With ever increasing catalogue sizes approaching hundreds of millions of galaxies, having random catalogues with many more objects than the data catalogues will eventually become unfeasible.Since most of the computation time for the estimate is taken up by the counting of the random-random pairs, it is promising to look for approaches that reduce the time taken to perform this computation given a desired precision of the estimator, for example by requiring smaller random catalogues, or none at all.Keihänen et al. (2019) propose a method to increase the speed of the     calculation by splitting the random catalogue  into   sub-catalogues   and then taking the average of the normalised paircounts within each sub-catalogue.This method does not affect the accuracy of the estimator and can therefore be used to significantly speed up the computation time.However, the cost can still become high if the survey is very large or if the estimator has to be calculated for a large number of data catalogues.
Breton & de la Torre (2021) provide a scheme to analytically calculate the pair counts involving the random catalogue, given the survey window function and the radial selection function, speeding up the estimate significantly.The method requires an estimate of the 2PCF of the angular selection function beforehand.An analytical method for higher-order statistics is yet to be derived, making random catalogues necessary for estimates of ( > 2)-point correlation functions.
Alternatively, it is worthwile exploring random catalogue configurations that have a lower variance than the Poisson-sampled catalogues.The fastest decay of the variance with increasing  for any statistical point distribution in three dimensions is  2 () ∝  −4 which is achieved by a class of distributions that exhibit smaller power on large scales than the uniform Poisson distribution with a power spectrum () ∝  4 .These distributions are called super-homogeneous and examples include periodic grids or glasses (Gabrielli et al. 2002).
Glasses have smaller power than a Poisson distribution on scales larger than the average inter-particle separation, which is why they have been of interest in the scientific literature in the past, but in a slightly different context.It was suggested that these kind of distributions can be used to generate pre-initial conditions for cosmological simulations, which need to have as little power as possible (White 1994;Baugh et al. 1995;Hansen et al. 2007;Joyce et al. 2009).However, it was later found that glasses are not superior to periodic crystals in this regard.Therefore glasses are less commonly used in today's cosmological simulations because of the difficulty that lies in creating them.Recently, Dávila-Kurbán et al. ( 2021) have shown that using glasslike catalogues leads to a smaller variance in the LS estimates of the 2PCF in comparison to using Poisson random catalogues.They demonstrated this on a mock catalogue of simulated galaxies on an equal-time hypersurface in a periodic box.They found that an LS estimate with a desired precision would require glass catalogues with fewer objects than what would be needed using Poisson-sampled random catalogues.
In general, glass distributions can be created from a uniform Poisson distribution by running an N-body simulation with repellent gravitational forces, but this requires a lot of computing resources.As an alternative, Dávila-Kurbán et al. ( 2021) have suggested a fast way of generating glass-like catalogues using the Zeldovich approximation of Lagrangian perturbation theory (Zel'dovich 1970).Starting with a Poisson sample of a uniform distribution in a periodic simulation box, one can iteratively displace the objects in a direction opposite to the one given by the first-order displacement field ì Ψ.After many iterations, the power spectrum of the catalogue is approximately the one of a glass, with () ∝  4 .In their work, Dávila-Kurbán et al. (2021) adapted the publicly available BAO reconstruction code of Bautista et al. (2018) 1 based on the Fourier-space algorithm of Burden et al. (2015).
In this article, we aim to extend this approach to catalogues on the light cone with a redshift-dependent comoving number density of objects ( ()), which can be due to a radial survey selection function or due to an intrinsic redshift-dependent abundance of the objects (for example, high mass dark matter halos are less abundant at high redshifts).We provide the publicly available Python tool grlic, to easily create glass-like random catalogues given the survey specifications (opening angle, survey mask, redshift range and the redshift dependent number density ( ())).We stress the fact that the glass-like catalogues might not be considered truly "random" due to the correlated nature of their data points.Nevertheless, each glasslike catalogue is a random realisation of the underlying background distribution and we will occasionally continue to use this term to underscore the glass-like catalogues' function as the reference "random" dataset in the pair-count estimations of the 2PCF.
Glasses are particularly useful in this context as they allow for a smooth evolution of the number density with radial distance, without any discontinuities as they would be unavoidably present in a crystallike structure.
We use the glass-like random catalogues in the LS estimator to estimate the multipoles of the two-point auto-and cross-correlation functions of a set of simulated catalogues of dark matter particles and halos and find that the resulting estimates using the glass-like random catalogues have a much smaller variance than the estimates using a Poisson-sampled random distribution.There is effectively no bias with respect to the result obtained using a much larger Poissonsampled random catalogue with an identical ( ()).
The paper is structured as follows: in Sec. 2 we describe the methods used to extract the radial density profile from a data catalogue, to Poisson sample a random catalogue with an identical radial density profile and to generate the glass catalogues from them.The application of the Zeldovich method is explained in detail and the measurement procedure for the 2PCF multipole estimates is described.In Sec. 3 we describe the simulations used to create the data catalogues, the data catalogues themselves, and the the glass catalogues used in the LS estimates.In Sec. 4 we present our results on the estimates of the 2PCF multipoles, as well as measurements of their bias and standard deviation.Sec. 5 contains a summary of this work and a final discussion of the results.Appendix A contains convergence tests of our fiducial setup for creating the glass-like random catalogues.Appendix B contains results on an estimate of the 2PCF after applying an exemplary mask to the catalogues.

METHODS FOR CREATING GLASS CATALOGUES ON THE LIGHT CONE
The goal of the following pipeline is to create a glass-like catalogue that mimics the redshift-dependent comoving number density ( ()) of a given data catalogue.The resulting glass has to be locally isotropic in comoving coordinates, which is why it is convenient to first convert the observed redshifts  of the data catalogue into comoving distances  assuming a fiducial background cosmology with specified cosmological parameters.Then, the comoving number density is derived from a histogram of the number counts  (), taking the survey geometry into account.
Alternatively, the user of the code can directly provide a tabulated ( ()) together with the cosmological parameters that fix the mapping between  and .
We then Poisson sample a random catalogue that follows the ( ()) of the data catalogue, and iteratively displace the objects in comoving coordinates until they reach a glass-like distribution.The steps will be outlined in detail in the following subsections.

Extracting 𝑛(𝑟) from the data catalogue
Taking into account that galaxy surveys probe times well into matter domination, the observed redshifts in the data catalogue are converted into comoving distances according to where  is the speed of light,  0 ≡ 100 ℎ km s −1 Mpc −1 the value of the Hubble constant today, Ω  the density parameter for the matter content in the Universe today and Ω Λ the density parameter of dark energy today.In the remainder of this work we will implicitly assume that the inferred comoving distance  () depends on the observed redshift  and drop the argument of the function,  () → .Note that redshift-space distortions (RSD) are not taken into account here, as the goal is to derive the large-scale () behaviour, which will not be affected by RSD significantly due to continuity within the survey.
The number of objects  () within a bin of width Δ centered at the comoving distance  is related to the observed number density () via where  (, , ) is the survey window function that generally depends on , the polar angle  and the azimuthal angle .In the remainder of this work, we will assume simple survey shapes for light cones with  ( min ≤  ≤  max ,  ≤  max , ) = 1 and  = 0 elsewhere.In this case,  max is the half-opening angle of the survey and  min and  max are the nearest and farthest inferred distances of any object in the survey to the observer.The window function can easily be generalised to more complex survey shapes.The number density within each bin can then be estimated from the number of objects in it: where we assumed that Δ is small compared to  and  min ≡ cos  max .

Poisson sampling a random catalogue
The purpose of the random catalogue used within pair-count estimators such as the LS estimator is to serve as a sample of a homogeneous distribution that mimics the overall radial distribution of the data catalogue.In general the () measured from the data will exhibit fluctuations around the underlying background evolution (which is the quantity we are interested in) due to galaxy clustering, and these fluctuations become more pronounced as the half-opening angle of the survey gets smaller (effectively this is due to increasing cosmic variance).Hence, in order to keep the information to be gained from the clustering along the line of sight in the estimate, smoothing the () is desirable.The analytical form of this function is not generally defined, as it depends on the survey selection function and on the intrinsic redshift evolution of the abundance of the observed objects, which in turn depends on the physical properties of the objects, i.e. their masses, luminosities, or colours.Within this work, we perform a third-order polynomial fit to the measured (), to model the overall background evolution of the number density of "observed" dark-matter halos (see Sec. 3 for more details).
We then Poisson sample a random catalogue from the () distribution using the inversion method, where random numbers are drawn from a uniform distribution between zero and one and converted into a distance  by inverting the cumulative probability distribution, where  is the total number of objects in the data catalogue.Sampling   =  objects results in a density profile ().In the case of the third-order polynomial fit to (), the integral for the cumulative probability can be easily calculated analytically.If the functional form of the () is not known, e.g. if it is simply a result of smoothing the measured () with some kernel, the integral can be performed numerically.We interpolate such () with a cubic spline kernel to achieve high numerical precision in the integral.
In general the functional form of  cumul is not known, and even if it is known, finding the inverse is not always trivial.Hence we simply find the mapping between  and  cumul using again a cubic spline interpolation.
We randomly sample the cos  from a uniform distribution between  min and 1, and the  from a uniform distribution between 0 and 2.These Poisson-sampled randoms are then the initial catalogues used for creating the glass-like random catalogues.

From Poisson random catalogues to glass-like random catalogues
Starting from an initial set of Poisson-sampled objects distributed randomly and uniformly in a periodic box one can create a glass by iteratively displacing the objects with a repellent force acting between each pair of objects until an equilibrium distribution is reached.It is convenient to assign some mass to the objects and simply use the force of gravity acting in the opposite direction.A full N-body simulation would be the most accurate implementation of this displacement, but this requires additional computing resources which might become too large in comparison to the eventual gain in accuracy of the estimator.Hence, in this work we employ the Zeldovich approximation as suggested by Dávila-Kurbán et al. (2021).While this is more or less straightforward to implement for distributions with a uniform number density, i.e. () = const., for a non-uniform () additional steps are required to include the effects of external forces that force the objects to take the desired distribution.Those steps will be outlined below.

Zeldovich approximation
Lagrangian perturbation theory models the gravitational dynamics on large scales in the Universe, where matter density perturbations are small, to high accuracy.A central quantity in this theory is the displacement field ì Ψ, which relates the initial Lagrangian position of a fluid element ì  to Eulerian positions ì  observed at time t, Assuming mass conservation between the Eulerian and Lagrangian frame, the displacement field can then be related to the Eulerian density perturbations.The linear-order solution of the displacement field is where  1 is the linear contribution to the density contrast defined in Eq. ( 2) and ì ∇  is the differential nabla operator in Lagrangian coordinates, i.e. ì ∇  • ì  gives the divergence of the vector field ì .The displacement field is solved for in Fourier space: where we use the following Fourier convention: Switching the sign of the displacement field effectively mimics a situation with repellent gravitational forces.This can be used to create a glass-like distribution from an initial Poisson distribution, if such a displacement is applied iteratively until an equilibrium configuration is reached, where all the repulsive forces are balanced.
Numerically this procedure is implemented as follows.First, the domain is discretised into a sufficiently fine grid and the masses of the particles are assigned to the grid cells with a mass assignment scheme, e.g. one of the "nearest grid point", "cloud in cell" or "triangular shaped cloud" schemes.The density contrast within each grid cell is estimated according to Eq. ( 2) and then Fourier transformed using a fast Fourier transform (FFT) algorithm.The Fourier-space displacement field in each grid cell is then estimated from the linear density contrast in each grid cell following Eq.( 13), and transformed back with an inverse FFT.Finally the displacement field is interpolated at the particle positions with a scheme consistent with the chosen mass-assignment scheme.Its negative value is then used to update the particle positions.The density contrast within each cell is updated and the process is repeated until the displacements are sufficiently small.
If a given redshift-dependent () distribution is to be reproduced, it can be incorporated into the background number density from which the density contrast is calculated, n → n(), i.e. the density contrast in each grid cell is calculated with respect to the desired radially dependent number density of the glass-like random catalogue.
In order to achieve periodic boundary conditions, the observer is placed in the center of the box, i.e. the distance  to the center of the box is set to zero, and the box size is chosen so that its side length is at least twice the maximum comoving distance  max in the survey.A buffer zone is added for distances within the box that are larger than  max , with constant background number density n( >  max ) = ( max ) which ensures smooth and differentiable boundary conditions required for the FFT within the periodic box.Grid cells that are located at  <  min are assigned a background number density of n( <  min ) = ( min ).
We note that the so-defined n() is not differentiable at  min and  max , but as long as () evolves reasonably weakly within the survey, this does not cause strong edge effects when later Fourier transforming the number densities on the grid, as the non-differentiability gets smoothed out by the sampling of the background number density on the grid.For stronger evolutions, the n() can be extrapolated to  that lie sufficiently outside of the survey, so that the resulting glass catalogue does not suffer from edge effects near  min and  max .However, this requires increasing the size of the buffer zone to ensure smoothness on the boundaries, which in turn increases the required grid resolution to achieve the same resolution within the actual survey volume.
The glass is then initialised by Poisson sampling a random catalogue as outlined in Sec.2.2 within the full spherical survey shell, i.e. setting  min = −1, and Poisson sampling uniformly from ( <  min ) ≡ ( min ) and, respectively, ( >  max ) ≡ ( max ) in the buffer zones.We then iteratively displace the objects with the reversed Zeldovich displacement field  iter times with the goal to achieve a glass-like structure.After  iter iterations, only the objects within the specified survey volume are kept in the final glass-like catalogue.
Depending on the Fourier grid resolution, after a certain amount of iterations, the objects will start to align with the grid, effectively introducing long-range periodicities in the distribution, which is to be avoided if the goal is to create a non-periodic glass.Hence, a sweet spot is to be found, where each iteration reduces the initially Poisson variance, while too many iterations lead to a periodic crystallike structure.Again, such a crystal-like structure is undesirable for the task at hand, because it introduces artificial periodicities and anisotropies into the random catalogue.

Measuring the 2PCF
The LS estimate of the 2PCF is perfomed according to Eq. ( 3), where we use a mid-point line of sight definition: ì  = ì  + ì /2.Eq. ( 3) can be used for the cross-correlation between two sets of objects,  1 and  2 , as well as for the autocorrelation by substituting  2 and  2 with  1 and  1 .
We note that in the case of autocorrelations, the glass approach requires using two separate glass catalogues, since the data points within the glass are now correlated.Using only one glass catalogue would lead to a biased estimate of the  1  2 (, ) (Dávila-Kurbán et al. 2021).This is not a necessary step when estimating autocorrelations using a Poisson-sampled random catalogue, but since using two separate random catalogues lowers the variance of the estimator (Dávila-Kurbán et al. 2021), we will also use two separate Poissonsampled random catalogues for fair comparisons of variances of the autocorrelation estimates using glasses and Poisson samples.
In this work we are using a modified version of the publicly available code CUTE (Alonso 2012), that implements the described LSestimator in a very efficient way making optimal use of parallel computing.The modifications follow the procedure of Breton et al. (2019), which allows for the measurement of odd multipoles in the full 3D correlation function.We note that we use the cross-correlation algorithm of CUTE even for the autocorrelations, meaning that we double count the pairs in the data catalogues and hence set    to zero in Eq. ( 4).
The multipoles are then estimated from the full correlation function according to where L ℓ () is the ℓ-th order Legendre polynomial and Δ is the width of the -bin.

Variance of the LS estimator
Let us reconsider the LS estimator of Eq. ( 3).The bias and variance of the LS estimator has been derived in detail in Landy & Szalay (1993), assuming an infinitely large random catalogue.In this limit, and with infinitesimally small bins, the estimator is unbiased and has almost Poisson variance.In Keihänen et al. (2019) the derivation was generalized to finite random catalogues, which introduces a small bias to the estimate.While the variance of the LS estimate depends on the size of the bins used for the pair counts, as well as on geometric properties of the survey (these are the so-called edge terms in the variance), to zeroth order in  the LS variance is dominated by the Poisson variances of the pair counts.Then, keeping the number of objects in the data catalogues fixed, the biggest contribution to the variance of the LS estimate comes from the     terms, as their Poisson variance decreases as  2 (    ) ∝  −1 . The contribution from the  1  2 term is suppressed, as it has a variance that decreases more quickly, . For very large numbers of objects in the Poisson-sampled random catalogue, the variance of the LS estimator is hence expected to be almost proportional to  −1   .Using glass-like catalogues instead of Poisson-sampled catalogues leads to a different expected variance of the LS estimator.We will assume that its variance is still dominated by the variance of the     terms, which is now no longer Poisson.As discussed before, the variance within a sphere of radius  scales like  −4 for a glass distribution, in contrast to the  −3 scaling of a Poisson distribution.for a glass catalogue.If the number of objects in the random catalogue is much larger than the number of objects in the corresponding data catalogue by a factor of , i.e.    =    , then we expect the standard deviation  of the LS estimate to decrease approximately like  ∝  −0.5 for the estimate utilising the Poissonsampled random catalogues, and like  ∝  −2/3 for the estimate derived from the glass-like catalogues.

DATA CATALOGUES
To test the validity of our approach for generating glass-like random catalogues with a specified () distribution, we estimate the two-point autocorrelation and cross-correlation functions for a set of particle and halo light cones and compare the bias and variance of the estimators using different numbers of objects within the glass and random catalogues.The particle light cone used in this work is generated with gevolution, a relativistic N-body code for cosmological simulations which produces light-cone data in comoving space during run time (Adamek et al. 2016).Notably, the full space-time metric on the light cone is stored, allowing for self-consistent ray tracing of the objects on the light cone.In this work, we make use of the full-sky particle light cone of the unity2-lowz simulation, with a number of  part = 5760 3 particles within a periodic box of volume  box = (4032 Mpc/ℎ) 3 .This translates to a mass resolution of  part = 3 × 10 10  ⊙ /ℎ.Within gevolution the force of gravity acting on each particle is calculated with a particle-mesh scheme -the mesh resolution within the unity2-lowz simulation is 700 kpc/ℎ, i.e. the number of grid cells is exactly the number of particles in the periodic simulation box.The underlying cosmology is a standard ΛCDM model with three neutrino species with masses of 0 eV, 0.008689 eV and 0.05 eV, respectively.The cosmological parameters are set to   = 2.215 × 10 −9 ,   = 0.9619, ℎ = 0.67, Ω b = 0.049, Ω cdm = 0.26858 and  CMB = 2.7255 .Adding the massive neutrino contribution to the total matter density, we end up with Ω  = 0.31898 and Ω Λ = 0.68095, i.e. the global spacetime is flat.The initial conditions are generated from the linear transfer functions of class (Blas et al. 2011).In (Lepori et al. 2023), dark matter halos are identified as spherical overdensities by running the rockstar halo finder (Behroozi et al. 2013) on the particle light cone in comoving space.The halos and particles are then raytraced through the metric light cone using the ray-tracing algorithm described in Lepori et al. (2020) to finally obtain a full-sky particle catalogue and a full-sky halo catalogue on the light cone in redshift space, including RSD and other relativistic effects.In this work, we make use of those catalogues.Specifically we select a subset of particles and halos from the full light cone outputs at redshifts 0.05 ≤  < 0.5, which translates to  min ≈ 148 Mpc/ℎ and  max ≈ 1313 Mpc/ℎ.For the particle light cone we randomly select ∼ 5 × 10 6 objects and from the halo catalogue we extract two subsamples: one that contains lowmass halos that are each made up of 30 ≤  part < 40 FoF particles as identified by rockstar, corresponding to an FoF mass range of 9 × 10 11  ⊙ /ℎ ≤  FoF < 1.2 × 10 12  ⊙ /ℎ, and one that contains high-mass halos that are made up of  part ≥ 300 particles each, corresponding to masses  FoF ≥ 9 × 10 13  ⊙ /ℎ.The catalogues used in this work are summarised in Table 1.For the halo catalogues, we fit a polynomial of third order to the (), weighting the fit by the inverse of the Poisson error within each shell, i.e. we minimise where   is the number density within the -th -bin, n the value of the polynomial at the -th -bin and   the standard deviation of the number density within the -th -bin, which is assumed to be approximately Poisson, where   is the number of objects in the  − ℎ bin and   is the volume of the -th shell with width Δ.For the particle light cone, we assume () to be constant, as the number of dark matter particles is conserved in the simulation and there is no selection function for our survey, i.e. all objects are observed with a probability of  = 1.The number density of the particles () = const.= n is estimated from dividing the total number of particles in the catalogue by the survey volume.We plot the radially dependent number density of objects for each catalogue in Fig. 1, together with the background () derived from the polynomial fits to the binned halo number densities and the constant value associated with the particle number density.

Glass catalogue specifications
Following the steps outlined in Sec.value of , we create 20 pairs of independent random catalogues  1 and  2 to be used for the LS estimate according to Eq. ( 3), giving 40 independent Poisson-sampled and 40 independent glasslike random catalogues per  per data catalogue.We note that the Poisson-sampled catalogues have exactly    =    objects, while the glass-like catalogues can end up with more or fewer objects as some objects can be displaced across the survey volume boundaries during the Zeldovich iterations.The fluctuation in the number of objects in the glass-like catalogues is below one percent of the desired number.
We choose a number of Zeldovich iterations  iter = 2, the impact of this choice is tested by creating additional glass-like catalogues for the high-mass halo catalogue, using numbers of iterations  iter ∈ {1, 2, 3, 5, 10} in Appendix A. The box size is chosen such that the buffer zone spans ∼ 400 Mpc/ℎ on both sides of each direction, i.e. the side length of the cubic box in which we evolve the glasses is  box = 2 × ( max + 400) Mpc/ℎ ≈ 3427 Mpc/ℎ.Following the approach of Dávila-Kurbán et al. ( 2021) the number of grid points is chosen such that the cell size is approximately one quarter of the average inter-particle separation in the light cone, i.e. for the particle light cone, with  ≈ 0.5 (ℎ/10 Mpc) 3 the average interparticle separation is  inter =  −1/3 ≈ 12.6 Mpc/ℎ.This leads to a required number of  grid ≈ 1080 grid cells in one grid dimension, i.e. the total number of grid cells is ( grid ) 3 .In order to speed up the FFT, we choose the nearest number that is a power of two, which results in  grid = 1024.We test the impact of this choice by varying the number of grid cells  grid ∈ {128, 256, 512, 1024} in Appendix A. We employ a "cloud in cell" mass assignment scheme that is implemented in the public code Pylians (Villaescusa-Navarro 2018) to distribute the particle point masses onto the grid and to interpolate the displacement field at the particle positions.
In Fig. 2 the () measured from each data catalogue (dotted curves) are plotted together with the () of corresponding Poissonsampled catalogues (dash-dotted curve) and glass catalogues (solid curve) with  = 1.The density distributions of the Poisson-sampled random catalogues as well as those of the glass-like catalogues are in good agreement with the measured () of the data.The () of the Poisson-sampled catalogues fluctuate more strongly than the () of the glass-like random catalogues.
In Fig. 3  The number density of the particle catalogue is conserved, which means that the objects in the Poisson-sampled random catalogues and the glass-like catalogues are distributed uniformly in the periodic box (before removing the objects outside of the survey volume).We show the power spectrum of these catalogues estimated with Pylians using a 512 3 grid in Fig. 4. The case without any Zeldovich iterations,  iter = 0, corresponds to the Poisson-sampled catalogue, and the cases with  iter > 0 correspond to the glasslike catalogue after  iter Zeldovich iterations.The Poisson-sample has a shot noise power spectrum, () = 1/ n, as expected.The Zeldovich iterations decrease the power on scales larger than the average inter-particle separation.The Nyquist wave number is marked with the blue dashed vertical line.The orange dashed vertical line marks the wave number that corresponds to fluctuations with wavelength equal to the maximum separation considered in the LS estimate of the 2PCF,  =  max = 120 Mpc/ℎ.Fluctuations with wavelengths much larger than that are expected to contribute only weakly to the correlation function and its variance below  max .We find that for  above that wave number and below the Nyquist wave number the power spectrum of the glass-like catalogue is very close to () ∝  4 after  iter = 2 Zeldovich iterations, which is the result for a glass distribution.

2PCF multipoles
We estimate the full three-dimensional 2PCF of all possible combinations of data catalogues (DD), with 500 bins in −1 ≤  ≤ 1 and 25 bins in 0 Mpc/ℎ ≤  ≤ 150 Mpc/ℎ according to Eq. ( 3).This leads to -bin widths of Δ = 6 Mpc/ℎ.For each DD pair, we estimate the 2PCF using different types of random catalogues (C), which can be either Poisson-sampled random catalogues (R) or glass-like random catalogues (G), and different values of .For each of these cases, we perform 20 independent estimates of the 2PCF, using independent pairs of the same type of random catalogue.From each full 2PCF estimate, we proceed to estimate the monopole, dipole and quadrupole (ℓ = 0, ℓ = 1, ℓ = 2) following Eq.( 16).
Then, the sample mean ⟨ C,  ℓ,DD ()⟩ within each bin of  is estimated from the 20 individual 2PCF multipole estimates according to where the  C,  ,ℓ,DD () are the individual LS estimates of the 20 2PCFs using the same type of random catalogue and the same value of .Then we estimate the sample standard deviation  C,  ℓ,DD () as: which is the square root of the sample variance estimator.Fig. 5 shows the sample mean of the multipoles of the particle (P), low-mass halo (L) and high-mass halo (H) auto-and cross-correlation multipoles using the Poisson-sampled random catalogues with  = 20, which is understood to be an estimate close to the true 2PCFs of the respective data catalogues.For the even multipoles the absolute values are plotted.The monopole is negative for values of  ⪆ 117Mpc/ℎ, and the quadrupole is always negative in the range of  shown in the Figure .We show here only the dipoles of the cross-correlations, as the autocorrelations have a vanishing dipole.The errorbars indicate the standard deviations in each bin of .We additionally plot the theoretical prediction for the matter autocorrelation at the effective redshift of the particle catalogue, z = 0.364 produced by coffe (Tansella et al. 2018), which includes all linear order relativistic effects in the full-sky two-point correlation function.
The monopole signal of the particle autocorrelation agrees well with the theoretical linear prediction for the matter autocorrelation.The BAO peak in the correlation of the particle catalogue is broader than the one of the linear prediction, which is an expected result from nonlinear clustering in redshift space (McCullagh & Szalay 2012, 2015).
The dipole signal is an expected result for redshift space crosscorrelations of differently biased tracers (Bonvin et al. 2014), such as low-and high-mass halos (in this context, the linear bias  is a quantity which encapsulates how much stronger a field  H is clustered in comparison to the matter field  m via the relation  H =  m , from which it is evident that the linear bias is also expressed through the amplitude of the 2PCF monopole compared to the 2PCF monopole of matter).Here, linear theory predicts that the amplitude of the dipole signal is positively affected by large differences in the linear bias between the correlated populations.The linear biases of the high-mass and low-mass halo catalogues can be estimated from their monopoles and the particle autocorrelation monopole: where DD is either HH for the high-mass halos or LL for the low-mass halos.Alternatively, one can divide the respective halo-particle crosscorrelation monopole by the particle autocorrelation monopole to get a separate estimate of the linear bias.Using both of these methods, we derive a linear bias  HH ≈ 1.76 for the high-mass halo catalogue and  LL ≈ 1.47 for the low-mass halo catalogue.The linear bias of the particle catalogue is per definition  PP = 1, so the linear bias difference between the high-mass halo catalogue and the particle catalogue is larger than the linear bias difference between the highmass halo catalogue and the low-mass halo catalogue.Taking only this linear bias difference into account, the expected dipole amplitude of the cross-correlation between the high-mass halos and the particles should be larger than that of the cross-correlation between high-mass halos and low-mass halos, but the opposite is true.
There is an additional contribution to the redshift-space 2PCF dipole from the so-called evolution bias (Maartens et al. 2021) where () is the comoving number density of the catalogue and  is the cosmic scale factor,  = 1/(1 + ).The evolution bias quantifies the evolution of a population's comoving number density with respect to redshift due to e.g.merging -as can be seen in Fig. 1 the number density of the high-mass halos decreases with redshift, while the number density of low-mass halos increases.Similar to the linear bias, a large evolution bias difference leads to a predicted increase of the dipole amplitude.We estimate the evolution bias of the halo catalogues according to Eq. ( 22), and get  evo HH ≈ 0.85 and  evo LL ≈ −0.28.Again, per definition the evolution bias of the particle catalogue is zero, so indeed there is a larger bias difference between the high-mass halo catalogue and the low-mass halo catalogue, which might explain why the dipole amplitude is larger for the high-mass halo low-mass halo cross correlation.However, plugging these values into the theoretical model of coffe (not shown here), we still find a disagreement with the dipole amplitude of our measurements.We suspect that a contributing factor could be the unequal distribution of the two halo populations across different redshifts.While coffe predicts the dipole signal at a fixed effective redshift, the dipole we observe is taken over a relatively large redshift range.At large redshifts, there are more low-mass halos and fewer high-mass halos, possibly contributing to the final dipole with a larger bias difference than what is inferred from the 2PCF monopoles taken across the whole redshift range.The exact theoretical modeling of this effect is beyond the scope of this paper and will be investigated in future work.
The linear biases are also expressed in the quadrupole estimates.While the linear theory prediction follows approximately the trend of the particle catalogue estimate, it does not match exactly.We suggest this can be attributed to non-linear contributions like the finger-of-God effect (Jackson 1972), which is an apparent elongation along the line of sight of the redshift-space tracer distribution close to the centers of clusters, induced by a Doppler shift from the velocity dispersion.The discrepancy between the linear theory prediction of Kaiser (1987) and measurements on scales below ∼ 20 Mpc/ℎ is well known and attempts at including this effect into the modeling have been made (see e.g.Peacock et al. (2001); Scoccimarro (2004); Bianchi et al. (2015Bianchi et al. ( , 2016)); Satpathy et al. (2017)).In the quadrupole the finger-of-God effect acts opposite to the redshift space distortions on larger scales which appear as a flattening of the tracer distribution due to the Doppler shift induced by coherent infall velocities of the tracers.Indeed, in Fig. 5 the estimated quadrupole of the particle catalogue 2PCF is suppressed with respect to the theory prediction on scales below ∼ 80 Mpc/ℎ, with the difference becoming larger going to smaller scales, which supports the assumption of non-linear velocity contributions playing a role in explaining this discrepancy.

Bias and standard deviation of the 2PCF multipole estimates
We now quantify the accuracy and precision of the different LS estimates.In this context, we define the bias Δ C,  ℓ,DD () of each sample mean 2PCF multipole with respect to the corresponding estimate using the Poisson-sampled random catalogues with  = 20: which contains information on the accuracy of the 2PCF estimate.The precision of each estimate is quantified by the ratio of its standard deviation to the standard deviation of the corresponding measurement using the Poisson-sampled random catalogue with  = 20: The biases and standard deviations of the 2PCF estimates of the different data catalogue pairs DD ∈ {HH, HL, LL, HP, LP, PP} are similar when the particular choice of random catalogue and  is the same.In Fig. 6 we plot the bias of the multipoles of cross-correlation between the high-mass halo catalogue and the particle catalogue for different choices of random catalogues and values of .The biases of the LS estimates of the other data catalogue pairs are plotted with a semi-transparent line style to demonstrate the expected range of the bias depending on the used data catalogues.The absolute value of the bias for all measurements stays well below 10 −4 for all multipoles of interest and separations  > 20 Mpc/ℎ.Increasing the value of  leads to less fluctuations in the bias for all multipoles irrespective of the chosen type of random catalogue, and to a smaller span between

|(d)
Figure 5.The sample mean monopole, dipole and quadropole of the LS estimates of the particle (P), low-mass halo (L) and high-mass halo (H) auto-and cross-correlations using 20 pairs of independent Poisson-sampled random catalogues with  = 20.We show the absolute values of the even multipoles; the monopole becomes negative at  ⪆ 117 Mpc/ℎ, while the quadruopole is always negative in the depicted range of .For the dipole, only the cross-correlations are shown, as the autocorrelation dipoles vanish.The errorbars indicate the standard deviation in each bin. the The dotted grey lines are the linear theory predictions for the particle autocorrelation multipoles at the effective redshift z = 0.364 produced with coffe.
the minimum and maximum bias across the different data catalogue combinations.At small scales,  < 20 Mpc/ℎ, the bias fluctuates more strongly, e.g. for the monopole using the glass-like catalogue with  = 0.5, it increases up to 2 × 10 −4 .We suspect this is mainly due to a more noisy estimate of the 2PCF multipoles on smaller scales, as on these scales there are fewer pairs to be counted.
It is expected that the measurements using the glass-like catalogues behave like the measurements using the Poisson-sampled random catalogues with the same  at scales below the average inter-particle separation.Nevertheless, most evident for the monopole estimates, the glass results start to disagree systematically with the results from using the Poisson-sampled random catalogues at small scales,  < 20 Mpc/ℎ, as the bias increases to values of ∼ 2 × 10 −4 at  = 10 Mpc/ℎ for any of the glass-like catalogues used in this work.We suspect that this additional bias at small scales can be attributed to the limitations of the Zeldovich approximation, which requires a discrete grid over which the density distribution is smoothed.If the size of the grid cells is not significantly smaller than the average inter-particle separation, a bias is introduced to the LS estimate of the 2PCF multipoles on scales below the grid cell size (see also Appendix A).
We plot the standard-deviation ratio of the estimated multipoles of the cross-correlation between the high-mass halo catalogue and the particle catalogue for different choices of random catalogues and values of  in Fig. 7.The standard-deviation ratios of the LS estimates of the other data catalogue pairs are plotted with a semi-transparent line style to demonstrate the expected range of the standard-deviation ratio depending on the used data catalogues.The glass-like catalogues outperform the Poisson-sampled random catalogues significantly and to similar degree on all scales considered in this work.At  = 1 the standard-deviation ratio using the glass-like random catalogues is only R , =1 ℓ,HP () ∼ 2, while for the Poisson-sampled random catalogues, it is R , =1 ℓ,HP () ∼ 5, i.e. in this case the estimate using the glass catalogue is more than twice as precise as the estimate using the Poisson-sampled random catalogue.The standard deviation using the glass-like catalogue with  = 2 is almost the same as using the Poisson-sampled random catalogue with  = 20, which is a significant improvement.This holds approximately for all auto-and cross-correlations considered in this work, with minor fluctuations.
Figure 7. Standard-deviation ratio of estimated multipoles of the crosscorrelation between the high-mass halo catalogue and the particle catalogue for different choices of random catalogues and values of  in Fig. 7. Solid lines represent results from using glass-like random catalogues, while dashdotted lines represent results from using Poisson-sampled random catalogues.
The standard-deviation ratios of the LS estimates of the other data catalogue pairs are plotted with a semi-transparent line style.

Scaling of the standard deviation with 𝛼
We now have a closer look at the scaling of the standard deviation of the LS estimate with increasing numbers of objects in the Poissonsampled or glass-like catalogues, respectively.To this end we plot the standard-deviation ratio of the estimated multipoles of the crosscorrelation between the high-mass halo catalogue and the particle catalogue in the bin centered at  = 99 Mpc/ℎ against , depending on the chosen type of random catalogue, in Fig. 8. Again, the transparent lines show the results for the LS estimates of the remaining data catalogue pairs.Here, we also include results from using a glass-like catalogue with  = 0.1, which have a significantly higher median standard deviation than the other catalogues with larger .
From the considerations in Sec.2.5 we expect different power-law behaviours of the standard deviation ratio with increasing , depending on the type of random catalogues used in the LS estimate.We fit power laws to the data (grey dashed lines) and find that the standard-deviation ratio of the monopole estimate using the glasslike catalogues behaves like R G,  0,HP ( = 99 Mpc/ℎ) ∝  −0.9 while for the Poisson-sampled random catalogues the power law is less steep: R R,  0,HP ( = 99 Mpc/ℎ) ∝  −0.48 ≈  −0.5 .Hence, the larger Standard-deviation ratio of the estimated multipoles of the crosscorrelation between the high-mass halo catalogue and the particle catalogue in the bin centered at  = 99 Mpc/ℎ using glass-like random catalogues (solid curves) and Poisson-sampled random catalogues (dash-dotted curves), versus the value of .The transparent lines show the results for the LS estimates of the other data catalogue pairs.We find different powerlaw behaviours for the LS estimates of the 2PCF monopole, depending on the type of random catalogue used: R G,  0,HP ( = 99 Mpc/ℎ) ∝  −0.9 and R R,  0,HP ( = 99 Mpc/ℎ) ∝  −0.48 (dashed grey lines).The power law behaviour of the other two multipoles is similar.Additionally, the grey dotted line shows the expected power law for LS estimates utilising glass-like catalogues according to the considerations in Sec.2.5. the choice of , the bigger the advantage of the glass-like catalogues.A similar behaviour is found for the other two multipoles investigated in this work.The grey dotted line shows the expected power law for LS estimates utilising glass-like catalogues according to the considerations in Sec.2.5.The measured power law for the LS estimate utilising glass-like catalogues disagrees slightly with the expected result, but is within the range of fluctuations from the measurements between all the data catalogue combinations.

CONCLUSIONS
Building on top of the ideas outlined in Dávila-Kurbán et al. ( 2021), we developed the publicly available code grlic, which can be used to generate glass-like point distributions with radially-dependent number densities.The main application of this code is the generation of random catalogues for pair-count estimates of -point correlation functions of e.g.galaxy-clustering data on cosmological light cones.
The commonly used LS estimator relies on the assumption that a random catalogue with a very large number density is used.If this assumption holds the LS estimator is unbiased, and its variance is dominated by the variance of the random catalogue.Hence, it is common practice to generate Poisson-sampled random catalogues that mimic the radially-dependent number density and contain a factor of order ∼ 10 to ∼ 100 more objects than the data catalogues.
An alternative approach has been explored by Dávila-Kurbán et al. (2021).Glass-like catalogues can be generated from Poissonsampled random catalogues using the Zeldovich approximation, and these catalogues exhibit considerably less power on scales above the average inter-particle distance, and hence require a lower number density than the usual Poisson-sampled random catalogues to achieve a similar variance in the LS estimator.This method has been applied to data catalogues with uniform background number densities, e.g.simulated galaxies within a periodic box, on an equal-time hypersurface.Glass-like catalogues are a convenient choice here, because they have already been investigated on their possible use for generating pre-initial conditions within cosmological N-body simulations, which evolve particles under the influence of gravity on an equal-time hypersurface within a periodic box.For the first time, we show that it is possible to create glass-like catalogues that follow a given background density distribution on large scales but qualitatively preserve the glass-like properties on small scales.This is a big advantage to periodic crystals which will have discontinuities if their large scale density is to be inhomogeneous.
We apply this idea to radially-dependent distributions as they are common in galaxy or halo catalogues with selection functions and verify that the non-uniform glass-like random approach gives similarly promising results as they have been reported in Dávila-Kurbán et al. (2021).For this purpose, we estimate the auto-and crosscorrelations of three sets of simulated data catalogues using different Poisson-sampled and glass-like random catalogues with various numbers of objects: one particle catalogue with constant comoving number density, one high-mass halo catalogue with decreasing comoving number density as the comoving distance increases, and one low-mass halo catalogue with an increasing comoving number density as the comoving distance increases.We extract the first three multipoles of the LS estimator of these 2PCFs and find that the glass-like catalogues generated in this way outperform the Poissonsampled catalogues significantly.
We find that no significant bias is introduced on most scales when using glass-like random catalogues.Only on scales below  = 20 Mpc/ℎ there is a slight increase of the bias with our fiducial setup, but relative to the signal, it is still very small.The tests performed in Appendix A hint to the idea that this additional small bias is caused by inaccuracies in the Zeldovich approximation due to a limited grid resolution.Hence, it will be interesting to explore the results using an N-body PM 3 scheme, which works just like the Zeldovich approximation on large scales, but uses direct particleparticle force calculations on smaller scales.It is expected that this approach will be less biased, but the required additional computing resources might not make up for the reduction of the bias.
Currently, the variance in the estimate of the cosmic 2PCF of galaxies is dominated by the sample variance.Given the ongoing technological advances for cosmological surveys we speculate that future surveys that cover a larger volume of the observable Universe and contain a higher density of tracers of the underlying density distribution will have reduced sample variance and hence might eventually warrant a need for a more precise estimator of the 2PCF, which can be acquired by gaining control of the variance arising from the statistical fluctuations in the random catalogues.This variance can be reduced by increasing the number of objects in the random catalogue.
Alternatively, glass-like catalogues can be used instead of Poissonsampled random catalogues, as the LS estimates using the glasslike random catalogues have significantly reduced variance.We find a power-law behaviour of the standard deviation ratio at  = 99 Mpc/ℎ, where the glass-like estimates of the monopole of the cross-correlation between the high-mass halos and the particles follow a steeper power law than the Poisson-sampled random estimates: R G,  0,HP ( = 99 Mpc/ℎ) ∝  −0.9 and R R,  0,HP ( = 99 Mpc/ℎ) ∝  −0.48 .While we recover the expected behaviour for the standard deviation of the LS estimate using the Poisson-sampled random cat-alogues, the power-law slope for the glass-like random catalogues is steeper than the expected value of −2/3.We propose one possible explanation for this discrepancy: for the measurements with  < 1 we suspect that the variance of the     term could become important and contribute to the power law with a different slope.For the glass distributions, the standard deviation of the     term is expected to . The combined contributions of the     and     terms could then lead to a broken power law which only goes like  −2/3 for larger values of , and like  −4/3 for small values of .The value we measure is in between the expected values for the two dominant contributions, which could hint that indeed both contributions are relevant in the range of sampled values of .
In general, slight deviations from the Poisson variance are expected even for the LS estimate utilising the Poisson-sampled catalogues.This is because the bin width and survey geometry affect the variance via the so-called edge terms (see e.g.Landy & Szalay (1993); Keihänen et al. (2019)).Since the width of the -bins used in the 2PCF estimate affects its variance, it matters also when assessing the benefit of using glass-like random catalogues over Poisson-sampled random catalogues (see also Appendix A).If the bin size is chosen to be too small, i.e. much smaller than the average inter-particle separation, the variance of the estimate using glass-like catalogues is similar to the variance of the estimate using Poisson-sampled random catalogues, because in the glass-like catalogue the fluctuations are only suppressed on scales larger than the average inter-particle separation and fluctuations in the number counts between two small neighbouring bins are uncorrelated, just as they are in the Poisson-sampled random catalogue.As the bin width is increased, the correlated nature of the larger scale fluctuations in the glass-like catalogues makes the variance decrease faster than in the Poisson sample, whose fluctuations remain uncorrelated on all scales.This could provide another explanation for the unexpected power law-slope for the glass-like random catalogues: as we decrease , but keep the bin width fixed, the variance contribution from the bin width can become non-negligible such that the variance using glass-like random catalogues becomes more similar to the variance one gets when using Poisson-sampled random catalogues.
In future work, it would be interesting to perform a robust theoretical modeling of the variance of the LS estimator using glass-like catalogues, in a similar fashion as in Keihänen et al. (2019), to gain a better understanding of this scaling.
For the data catalogues used in this work, we find that the LS estimate using a glass-like catalogue with  = 2 is as precise the LS estimate using a Poisson-sampled random catalogue with  = 20.While this translates to a hundredfold speedup of the O ( 2 ) computation of the  1  2 pair-counts, increasing  further would make the advantage of the glass-like catalogues even more pronounced, due to the steeper power law with the glass-like catalogues (even though we expect the power law slope to become closer to −2/3 for larger ).
Our results apply specifically to catalogues with survey volumes, number densities and estimator bin widths as specified in this work, but should translate to catalogues with different volumes and number densities if the -bin width is adapted accordingly and if a wellresolved Zeldovich grid is computationally affordable.
As an example, consider a typical survey with lower number density, such as the SDSS DR14Q quasar catalogue (Pâris et al. 2018) which contains around 80 quasars per square degree at redshifts 0.9 <  < 2.2 within a survey area of 2044 deg 2 .Given the redshift depth and the survey area, the volume  that contains  = 80 deg −2 × 2044 deg 2 = 163520 quasars is calculated,  = 4/3( ( = 2.2) 3 −  ( = 0.9) 3 )  , where  is the fraction of the survey area with respect to the full sky,  = 2044/41253, and the comoving distances to each redshift are  ( = 2.2) ≈ 3756.5 Mpc/ℎ and  ( = 0.9) ≈ 2115.4Mpc/ℎ.From these one can estimate the observed number density of quasars in that redshift bin to be / =  ≈ 1.8 × 10 −5 ℎ 3 /Mpc 3 , which is approximately 28 times lower than the number density of the particle catalogue investigated in this work.The average inter-particle separation is then  inter ≈ 38 Mpc/ℎ and an improvement of the variance is expected if the -bin width is chosen to be not much smaller than  inter /.The required box size for such a survey is given by twice the distance to the maximum redshift  = 2.2 with an additional buffer zone of ≈ 400 Mpc/ℎ on each side, which results in approximately 8400 Mpc/ℎ.Then, the required number of grid cells such that the cell size is approximately equal to one quarter of the average interparticle separation is  grid = 4 × 8400/38 ≈ 880.The required resolution is similar to the resolution used in the paper, so it is reasonable to assume that creating glass-like catalogues for this case is efficient.Due to the smaller number densities in such a survey, the shot noise of a Poisson-sampled random catalogue at a fixed  would be larger.One could minimize the dominant Poisson variance contribution of the data-random pairs by choosing a larger  such that the number of random-random pairs with separations below the maximum scale of interest remains small enough to be computable in a reasonable amount of time.Creating a glass-like catalogue could reduce the variance contribution of the data-random pairs even further, and the results presented in this work suggest that this relative reduction is stronger for larger values of .The larger volume of the quasar survey makes the contribution from the sample variance smaller and thus increases the relative importance of the variance of the estimator itself.Therefore, creating glass-like catalogues for the estimation of the 2PCF in such a survey is expected to be worthwhile.
The ()-distributions of the catalogues investigated here vary by relatively small amounts over the survey depth.Introducing selection functions to the survey can lead to less trivial variations of the ()distribution for each catalogue.Tests on artificial catalogues with stronger variations in () have shown that for these catalogues, glasslike catalogues produced with grlic lead to similar improvements in the variance of the estimated 2PCF as was found for the catalogues investigated in this work.
We attempt to model the effects of a survey mask in Appendix B and find that the estimator variance is reduced in a similar fashion when using glass-like random catalogues compared to using Poissonsampled random catalogues for the specified mask, suggesting that the geometric contributions to the variance are sub-dominant.It is unclear whether a more complicated mask could eventually lead to the geometric contributions to the variance to dominate.When this happens, we expect that the advantage of using glass-like random catalogues over Poisson-sampled random catalogues might differ from what is reported in this work -we suspect that the glass-like catalogues will exhibit a lower variance contribution from geometric effects if compared to Poisson-sampled catalogues, because in general the edges of the survey will be sampled more smoothly by the glass-like catalogue, since its fluctuations are suppressed with respect to the Poisson-sampled case on most scales.A detailed study of the effects of masks on our results is beyond the scope of this paper and is postponed to future work.
In practice, if grlic is to be used for masked survey data, it is required that the underlying () distribution is derived from the survey beforehand by using Eq. ( 8) and a version of Eq. ( 9) that models the actual window function of the survey instead of setting it to 1 everywhere.From this underlying () distribution, an unmasked glass-like catalogue can be sampled with grlic, and the survey mask is applied to the glass-like catalogue, e.g. by weighing each object accordingly, when computing the 2PCF estimate.
We suggest that the glass-like random catalogues will also prove to be useful for pair-count based estimates of higher order statistics such as the three-point correlation function, where the computation time scales like O ( 3 ).Creating glass-like random catalogues can also be beneficial for Fourier space analyses using e.g. the FKP estimator described by Feldman, Kaiser and Peacock (Feldman et al. 1994), where high values of  are required to suppress the estimator variance.Here, the requirement of large  does not pose such a big problem for the computation time, because FFT algorithms can solve the discretised equations quickly on a grid regardless of the size of the random catalogue.However, being able to reduce the required  by using glass-like catalogues could be advantageous in some cases, e.g. if a speed up of the mass-assignment procedure used to assign the densities to the grid on which the FFT is perfomed is desired, or if there are memory limitations for storing the random catalogues.
The main limitations in creating the glass-like catalogues with grlic are memory requirements for the discretised grid used in the Zeldovich approach.The grid needs to be finely spaced in order to avoid biasing of the 2PCF estimate at small scales.For surveys with maximum redshift  max ≈ 0.5 and comoving number densities of  ≈ (ℎ/10 Mpc) 3 , a grid with  grid = 1024 is sufficient for unbiased measurements for  ≥ 20 Mpc/ℎ, but the demand for a larger number of grid cells increases as the maximum redshift or the comoving number density of the survey becomes larger in order to ensure a well-resolved Zeldovich grid with as few objects per grid cell as possible.In the current implementation of grlic, the observer is put into the center of the discretized box, and the box size is adjusted to fit the maximum comoving distance of the objects within the data catalogue, effectively increasing the physical size of the grid cells.This will make creating glass-like catalogues for very deep pencil beam surveys with high number densities difficult.In the future, this aspect can be made more memory efficient: instead of creating a full spherical glass with the given () in a periodic cube, from which the final glass-like random catalogue is cut out, the survey volume can be placed into a minimum bounding box.Here, avoiding unwanted edge effects is more involved, because of the discontinuous number densities on the periodic boundaries of the bounding box.A transitional region between the survey volume and the bounding volume can be implemented to ensure stability within the survey volume.In the current implementation of grlic, the bulk of the memory requirement is covered by the entities stored on the discretised grid.The background number density, the density contrast and the displacement field are stored using single precision floating point numbers.On a grid with  grid = 1024 these already require ≈ 4 + 4 + 13 = 21 GB of random-access memory.Again this could be circumvented by using PM 3 N-body simulations which do not require high grid resolutions to accurately simulate the repellent forces on small scales.Nevertheless it is expected that in the near future, memory resources will become more available which will make the use of glass-like catalogues even more advantageous compared to using Poisson-sampled random catalogues.

APPENDIX A: CONVERGENCE TESTS
We test the convergence of our fiducial setup by varying the number of Zeldovich iterations  iter as well as the number of grid cells  grid used in the generation of glass-like catalogues with  = 1 for the LS estimate of the 2PCF multipoles for of the high-mass halo catalogue.
In general, each successive iteration of the Zeldovich displacement represents a higher order correction to the initially Poisson-sampled random catalogue, so already after one iteration, the Poisson noise is expected to be reduced by the largest amount due to the firstorder correction.One can increase the number of iterations to add second-and higher-order corrections, but the question is whether at a certain point discretisation effects originating from the Zeldovich grid become evident, imposing a maximum on the acceptable number of iterations  iter .
Fig. A1 and Fig. A2 show the resulting bias for the 2PCF monopole using the fiducial number of 25 bins in  corresponding to Δ = 6 Mpc/ℎ and using 100 bins in  which translates to Δ = 1.5 Mpc/ℎ, respectively.For the fiducial binning, we find that increasing the .Bias of the monopole of the high-mass halo autocorrelation using glass-like random catalogues with  = 1 and Δ = 1.5 Mpc/ℎ, created with different numbers of iterations of the Zeldovich approximation  iter (top panel) and different numbers of grid cells  grid (bottom panel).The grey dotted vertical lines represent the spacing between the grid cells in the fiducial grid with  grid = 1024.As the number of iterations increases, the monopole bias oscillates more and more around zero, with a period corresponding to the grid cell separation.This is attributed to grid alignment of the objects in the glass-like catalogue after too many iterations.Decreasing the number of grid cells increases the small scale bias of the estimates.For a fixed number of iterations  iter = 2, decreasing the grid resolution makes the oscillatory feature apparent, with periods whose dominant contribution is the respective grid cell separation.number of iterations (top panel) reduces the bias on scales below 20 Mpc/ℎ, but introduces an oscillation of the estimator bias around zero after no more than ten iterations.The oscillatory feature in the bias can be understood better when inspecting the results in Fig. A2.Here, the oscillation becomes apparent already after as few as three iterations of the Zeldovich approximation.We suspect that after too many iterations, the objects in the catalogue eventually align with the grid.If that is the case, there are expected to be spikes in the LS estimate at separations that are equal to separations that are prominent in the the periodic grid point distribution.We add vertical dotted grey lines that are spaced with Δ = 3.346Mpc/ℎ which corresponds exactly to the distance between each grid point with  grid = 1024.The period of the oscillations coincides very well with the spacing of the vertical grey lines.For Δ = 1.5 Mpc/ℎ, we find that using more than three iterations is sub-optimal, as then the oscillations start to become noticeable.
Reducing the number of grid cells at Δ = 6 Mpc/ℎ (bottom panel of Fig. A1) leads to an additional small scale bias in the monopole, and this additional bias becomes apparent at larger scales as the number of grid cells is reduced.Further, the bias starts to oscillate around zero as soon as  grid = 256.This is again more apparent for the binning with Δ = 1.5 Mpc/ℎ (see bottom panel of Fig. A2).While with a grid resolution of  grid = 1024 for  iter = 2 there are no oscillations in the bias, decreasing the grid resolution to  grid = 512 leads to an apparent oscillation of the bias around zero with a period that corresponds to the new grid cell separation (i.e.twice the grid cell separation that corresponds to  grid = 1024).This hints to an interplay between the required grid resolution and the maximum possible number of iterations in the Zeldovich approximation in order to get an unbiased result, i.e. optimally, the grid should be as fine as possible to prevent biasing at small separations, and the number of Zeldovich iterations needs to be balanced such that it is high enough to give a converged glass with low bias at small separations, but small enough to prevent the oscillatory feature caused by discretisation of effects.
Fig. A3 shows the same test results as Fig. A1, but for the quadrupole of the 2PCF.The small scale bias is reduced in a similar fashion if the number of iterations is increased, but the oscillation around zero is not as present as for the monopole.Reducing the Zeldovich grid resolution introduces an additional bias on small scales, which grows larger and affects larger scales as the resolution decreases.
Fig. A4 shows the standard deviation ratio of the monopole and quadrupole of the high-mass halo autocorrelation estimate using the glass-catalogue with  = 1 with different  iter and  grid .The standard deviation of the estimate is largely unaffected by different choices of  iter and  grid .There is a slightly larger standard deviation if a smaller grid with  grid = 128 is used.This can be attributed to the fact that for such a very low number of grid cells, the creation of a glass-like catalogue is not working efficiently, as the grid cell size also imposes a lower limit down to which scales the power of the original Poisson-sampled catalogue can be suppressed.For better resolved grids we find that already after one iteration of the Zeldovich approximation, the standard deviation is basically converged.
Since the variance of the catalogue is well-suppressed already after  iter = 1, we suggest that a safe choice in general is not to use values of  iter larger than 2 in order to avoid discretisation effects while ensuring convergence of the glass.We note that Dávila-Kurbán et al. ( 2021) used a larger number of iterations up to  iter = 50.They find that after as few as five iterations the variance is converged.A difference between their work and the work presented here is the definition of the density contrast that is the basis for calculating the Zeldovich displacement.Dávila-Kurbán et al. (2021) choose to use a Gaussian kernel to smooth the particle distribution and estimate the densities on a grid.We use the cloud-in-cell mass-assignment scheme, and self-consistently interpolate the displacement field at the particle positions using a matching kernel.We suspect that using a Gaussian kernel to smooth the particle distribution from which the densities are estimated leads to an underestimation of the repulsive gravitational forces for each particle, because the amplitude of the displacement depends on the amplitude of the density contrast, which is smaller if the densities are smoothed beforehand.This could lead to a larger number of required Zeldovich iterations to reach a glass-like distribution.
We additionally plot the estimator variance for the crosscorrelation between the high-mass halo catalogue and the particle catalogue for different choices of -bin widths Δ in Fig. A5, using either glass-like random catalogues with  = 2 (orange curves) or Poisson-sampled random catalogues with  = 20 (blue curves).It is evident that the variance reduction when using glass-like random catalogues in the estimate is affected by the specific choice of bin width -when using glass-like catalogues, the estimator variance decreases with increasing bin width, while for Poisson-sampled random catalogues, the estimator variance remains largely unaffected.Therefore, the advantage of using glass-like catalogues is larger the larger the width of the -bins in relation to the average inter-particle separation.While the maximum bin width of Δ = 15 Mpc/ℎ tested here gives the best improvement of the variance, for our fiducial setup we use Δ = 6 Mpc/ℎ, to have a finer sampling of the 2PCF estimates.

Figure 1 .
Figure 1.Number density of objects in the particle and halo catalogues studied in this work.The solid lines represent the number of objects per (10 Mpc/ℎ) 3 within spherical shells of thickness Δ ≈ 12 Mpc/ℎ.The orange dashed line represents the constant average number density of the particles, while the blue dashed line represents the third-order polynomial fits to the ( ) curves of the halo catalogues.

Figure 2 .
Figure 2. Number density of objects in the particle and halo data catalogues (dotted), and of corresponding Poisson-sampled random catalogues (dashdotted) and glass catalogues (solid).The curves represent the number of objects per (10 Mpc/ℎ) 3 within spherical shells of thickness Δ ≈ 12 Mpc/ℎ.

Figure 3 .
Figure 3. Projected number of objects in central slices with thickness Δ = 100 Mpc/ℎ through the high-mass halo light cone (top), a high-mass halo Poisson-sampled random catalogue with  = 1 (bottom right) and a corresponding glass catalogue (bottom left).

Figure 4 .
Figure 4. Power spectrum of the particle catalogue in the full box with  grid = 512, for different numbers of Zeldovich iterations  iter .Before applying any Zeldovich displacement the power spectrum is equal to the Poisson shot noise,  ( ) = 1/ n.After one Zeldovich iteration the power on scales larger than the average inter-particle separation approaches  ( ) ∝  4 .The blue dashed vertical line marks the Nyquist wave number  Nyq , the orange dashed vertical line marks the wavelength corresponding to the maximum scale of interest in our LS estimator of the 2PCF,  =  max = 120 Mpc/ℎ.

Figure 6 .
Figure6.Bias of the multipoles of the estimated cross-correlation between the high-mass halo catalogue and the particle catalogue with different choices of random catalogues and values of .Solid lines represent results using glass-like catalogues, while dash-dotted lines represent results from using Poisson-sampled random catalogues.The biases of the LS estimates of the other data catalogue pairs are plotted with a semi-transparent line style.

NFigure A1 .N
Figure A1.Bias of the monopole of the high-mass halo autocorrelation using glass-like random catalogues with  = 1 and Δ = 6 Mpc/ℎ, created with different numbers of iterations of the Zeldovich approximation  iter (top panel) and different numbers of grid cells  grid (bottom panel).The grey dotted vertical lines represent the spacing between the grid cells in the fiducial grid with  grid = 1024.Increasing the number of iterations decreases the bias on scales below 20 Mpc/ℎ.After no more than ten iterations, the estimator bias oscillates around zero.Decreasing the number of grid cells increases the small scale bias of the estimates.

NFigure A3 .
Figure A3.Bias of the quadrupole of the high-mass halo autocorrelation using glass-like random catalogues with  = 1 and Δ = 6 Mpc/ℎ, created with different numbers of iterations of the Zeldovich approximation  iter (top panel) and different numbers of grid cells  grid (bottom panel).The grey dotted vertical lines represent the spacing between the grid cells in the fiducial grid with  grid = 1024.Decreasing the number of grid cells increases the small scale bias of the estimates.

Table 1 .
Summary of data catalogues used in this work.All catalogues are light cones on the full sky with a redshift range of 0.05 ≤  < 0.5.Increasing  at fixed number density of objects is equivalent to increasing the number density of objects within a sphere at fixed .From this argument and from    ∝  3 it follows that in the limit of very large    the variance of     approximately evolves like