Convergence of halo statistics: code comparison between Rockstar and CompaSO using scale-free simulations

In this study, we perform a halo-finder code comparison between Rockstar and CompaSO. Based on our previous analysis aiming at quantifying resolution of $N$-body simulations by exploiting large (up to $N=4096^3$) simulations of scale-free cosmologies run using Abacus, we focus on convergence of the HMF, 2PCF and mean radial pairwise velocities of halo centres selected with the aforementioned two algorithms. We establish convergence, for both Rockstar and CompaSO, of mass functions at the $1\%$ precision level and of the mean pairwise velocities (and also 2PCF) at the $2\%$ level. At small scales and small masses, we find that Rockstar exhibits greater self-similarity, and we also highlight the role played by the merger-tree post-processing of CompaSO halos on their convergence. Finally, we give resolution limits expressed as a minimum particle number per halo in a form that can be directly extrapolated to LCDM.


INTRODUCTION
Dark matter is thought to account for more than 85% of the total matter in the universe.It forms clumps by gravitational attraction, channelling baryons together and serving as birthplaces for galaxies.Cosmological -body simulations track dark matter particles, calculating their position and velocity at discrete time-steps.However, our central goal is to use them to produce predictions for observable objects, such as galaxies, and provide a framework for testing cosmological models.To this end, simulated dark matter particles are processed and bound together into large virialized objects, dark matter halos, whose evolution is expected to closely trace that of their hosted galaxies.Our procedure to identify such structures (halo-finding) requires certain choices on the definition of a halo: in particular, how its boundary is defined, where its centre is, or how particle membership is treated.As a consequence, the extracted properties of such objects are very sensitive to the particular model used to define these structures, and the precision of most statistical measurements is much poorer than for the dark matter field.
The issue of the precision of relevant halo properties is particularly complex, as it combines two distinct issues: that of the precision with which mass distribution in the -body simulation represents the physical limit (previously analysed in Joyce et al. 2021;Maleubre et al. 2022;Maleubre et al. 2023), and that of the halo definition and extraction (which will be the focus of this paper).Numerous ★ E-mail: sara.maleubremolinero@physics.ox.ac.uk ways have been proposed and exploited (see Knebe et al. 2013, for a review), but still different halo-finders running on the same simulation indeed show different results (Knebe et al. 2011).This illustrates the previous statement, indicating that a large fraction of the uncertainty in retrieving information about halo properties is actually due to the process of halo finding itself.We will study here the accuracy at which we can measure different halo properties and statistics when using two different halo finders (Rockstar and CompaSO).
Here, we use the techniques introduced in Joyce et al. (2021) and developed and applied also in Leroy et al. (2021); Garrison et al. (2021a); Garrison et al. (2021c) and Maleubre et al. (2022) to derive resolution limits arising from particle discretization for different halo statistics by analysing deviations from self-similarity in scale-free cosmological models.In particular, we expand the analysis in Maleubre et al. (2023) of the radial component of the pairwise velocity in the matter field to halos, as well as to assess and quantify the limits on the precision with respect to halo-finder.In addition, we revisit and develop further the analysis in Leroy et al. (2021) of the mass functions and two-point correlation function of halos (that uses fof and Rockstar halo-finders), extending the comparison to now include the new halo finder CompaSO (Hadzhiyska et al. 2022;Bose et al. 2022), as well as both larger simulations and scale-free models with different exponents.We underline that our methods allow us to test each halo finder individually for their convergence properties, but does not allow us to conclude whether one is better than the other for constructing observables (as this largely depends on the observable and the scales of interest), or which halo definition is the more physically relevant one.What we can establish are the limits in mass and scale at which halos found using a particular halo finder, and some of their relevant statistics, aren't affected by unphysical scales at a certain precision.
This article is structured as follows.The first part of section 2 recaps what scale-free cosmologies are and how their self-similar evolution can be used to determine the accuracy at which different statistics can be measured in -body simulations.We end the section with a description of the halo statistics that will be analysed.section 3 contains a summary of the simulations used, as well as a brief description of Abacus, the -body code used for their computation.It also summarizes the methods used to estimate convergence of the different statistics, and ends with a summary of the halo finders we compare (Rockstar and CompaSO).In section 4 we present and analyse our results, and finally, we summarize them in section 5.

Scale-free simulations and Self-Similarity
The self-similarity of scale-free models has been widely exploited since the early development of -body simulations, as an instrument to check the reliability of results (e.g Efstathiou et al. 1988;Colombi et al. 1996), and study halo properties (e.g Cole & Lacey 1996;Navarro et al. 1997;Knollmann et al. 2008;Elahi et al. 2009;Diemer & Kravtsov 2015;Ludlow & Angulo 2016;Diemer & Joyce 2019).Following our previous investigations (Joyce et al. 2021;Leroy et al. 2021;Garrison et al. 2021a;Maleubre et al. 2022;Maleubre et al. 2023) we will analyse self-similarity of scale-free cosmologies to extract quantitative constraints on resolution for different halofinders.
In scale-free simulations, the initial power spectrum of fluctuations is a power law of the form () ∝   , where the spectral index n is fixed for each cosmology.They have an Einstein de Sitter (EdS) background (Ω tot = Ω  = 1) following an expansion law () ∝  2/3 , and thus are characterized by just one scale, the scale of nonlinearity.This length scale may be defined by where  2 lin is the variance of normalized linear mass fluctuations in a sphere at a given time.Using linear perturbation theory we infer (2) which gives the relation between time and scale in these types of cosmological models.
From Equation 2we can deduce that, in the absence of additional independent length scales, clustering evolution must behave selfsimilarly.This means that for any dimensionless clustering statistic, its scale and time dependence can be simplified to where each  NL, () encodes the temporal dependence of any quantity with the dimensions of   , inferred from self-similar rescaling.
From Equation 1 and 2 we can define the rescaling quantities used in the current analysis, the characteristic length and mass scales of non linearity.Defining   ≡  lin (Λ,   ) where   is the value of the scale factor at the start of the simulation, we can infer and subsequently where ρ is the mean (comoving) mass density and   = Λ 3 ρ is the mass of a particle in the simulation (with Λ = / 1/3 the mean inter-particle spacing of the initial grid).
In our analysis, as we have been doing for the previous studies, we will use this property of self-similarity to assess the range of scales that a simulation can reproduce at a desired precision, for some given statistic.In particular, this work will treat the reliability of different halo-finders, notably how the resolved scales depend on a halo's particle number.

Halo quantities
In subsection 4.1 we will use self-similarity to test two different halo selection algorithms (Rockstar and CompaSO).
We will start by analysing the convergence of the mass function (HMF) as a function of rescaled mass, as clustering statistics are measured as a function of the mass of halos.We recall that the HMF is just the number density of halos of a given mass at a given redshift.Following the treatment of Press & Schechter (Press & Schechter 1974), it is convenient to express it in terms of the "multiplicity" function  (Jenkins et al. 2001;Tinker et al. 2008) defined by where ρ is the mean matter density, and  2 lin is expressed as a function of mass using  NL ∝  3 NL .For scale-free cosmologies, we can conveniently write  ( lin ) in terms of the rescaled mass / NL as where we've defined ñh ≡  h / and from Equation 2 ln  −1 lin / ln  = (3 + )/6.We refer hereafter to  as the halo mass function (HMF).
We will continue our analysis with the halo-halo 2PCF,  hh (, , ), which is a dimensionless function of the separation  and of the halo mass , calculated at a given snapshot: where  ℎ is the fluctuation in the halo number density.
Similarly, the radial component of the pairwise velocity can be computed as the correlation between two centres weighted by their projected velocity where the velocity difference (v 1 − v 2 ) of a pair of halo centres is projected on to their separation vector r, and < • • • > denotes the ensemble average.In both cases, if self-similarity applies, these statistics are conveniently rewritten in terms of the dimensionless rescaled functions as  hh (, , ) =  hh (/ NL , / NL ) (10) Following the procedure introduced in Maleubre et al. ( 2023), we have used a modification of the analysis tool Corrfunc (Sinha & Garrison 2019, 2020) to calculate both the 2PCF and the radial component of the pairwise velocity.

Abacus code and simulation parameters
We report results based on the simulations listed in Table 1.We make use of the Abacus -body code (Garrison et al. 2021b), which offers high performance and accuracy.It is based on CPU calculations of the far-field forces by a high-order multipole expansion, and an accelerated GPU calculation of near-field forces by pairwise evaluation.The  = 1024 3 simulations were run using local facilities at the Harvard-Smithsonian Center for Astrophysics (CfA), while the larger  = 4096 3 simulations are part of the AbacusSummit project (Maksimova et al. 2021), which used the Summit supercomputer of the Oak Ridge Leadership Computing Facility.
In this work we analyse two different exponents ( = −1.5,  = −2.0),relevant to standard (i.e.LCDM-like) models.We use simulations of two different sizes (, ) but otherwise identical parameters, allowing us to study finite box size effects.For the larger ( = 4096 3 ) simulations, the statistics have been calculated on (random) sub-samples of different sizes (25%, 3%) to facilitate the assessment of finite sampling effects.
We work in units of the mean inter-particle (i.e. initial grid) spacing, Λ = /1/3 , and of the particle mass of the simulation,   = Λ 3 ρ.The essential time-stepping parameter in Abacus has been chosen as  = 0.15 for all simulations, and the additional numerical parameters have been set as detailed in Maleubre et al. (2022) and summarized below.These choices are based on the extensive convergence tests of these parameters reported in our previous studies (see also Joyce et al. 2021;Garrison et al. 2021a).
As in our previous studies, the Gaussian initial conditions are specified at a time  =   fixed by the value of top-hat fluctuations at the particle spacing They are set up using a modification to the standard Zel'dovich approximation as detailed in Garrison et al. (2016).They apply a correction described by particle linear theory (PLT) as reported in Joyce & Marcos (2007), as well as second order Lagrangian perturbation theory (2LPT) corrections.The former corrects the initial conditions for discreteness effects at early times, so that linear theory evolution is exact at a target time  =  PLT .For all our simulations, this time has been chosen to coincide with the first output epoch as described below.This first output epoch ( =  0 ), corresponds to i.e. approximately the time of formation of the first non-linear structures, when fluctuations of peak-height  ∼ 3 are expected to virialize in the spherical collapse model ( ∼   /, with   = 1.68).This time has also been chosen to coincide with the target time of the PLT corrections  PLT .Subsequent output values are spaced by a factor √ 2 in the nonlinear mass scale.Plugging this into Equation 5, we get: In practice, we use log 2 (/ 0 ) as the time variable of our analysis, which indicates how many epochs have passed since the first output, but internally we also make used of a variable  = 0, 1, 2, ..., corresponding to the different outputs of the simulation, with The force softening of all simulations has been fixed in physical coordinates (evolving as  () ∝ 1/ in comoving ones), and taking the value  ( 0 )/Λ = 0.3.Nevertheless, a fixed comoving softening  ( <  0 ) = 0.3 was also imposed to avoid excessively large softening values at earlier times, down to the first output of the simulation.This has been previously tested in Garrison et al. (2021a) and Maleubre et al. (2022), and shown to provide both accuracy and computational efficiency for the present spectral indices.

Estimation of converged values
As in our previous papers, the convergence to the physical limit of the targeted statistics will be studied by analysing the behaviour of their temporal evolution, which becomes time-independent, in rescaled coordinates, in the case of self-similarity.Our final objective is to perform a quantitative analysis of convergence -i.e. to identify estimated converged values, and converged regions at some precisionfor which we need to fix a criterion.The conclusions drawn should not depend significantly on the method, as can be inferred by comparing the results for the simulation with  = −2.0 and  = 1024 3 analysed with Rockstar in this paper with those of an earlier study in Leroy et al. (2021), which used slightly different criteria 1 .The described methods will be equivalent for all halo statistics (  (/ NL ),  hh ,   ,hh /), so we will denote them by  in the following.
The criterion imposed for the current analysis follows that presented in Maleubre et al. (2022), and further used in Maleubre et al. (2023), which we will recap here.It will allow us to estimate both a converged value and a converged temporal region at a chosen precision, per rescaled bin in mass (and halo-separation) for each of the relevant statistics.
We start by calculating an estimated converged value,  est , as the average of the statistic in a fixed-size temporal window minimizing where  max ,  min and   are, respectively, the maximum, minimum, and average values in the window.We say that a bin is converged at precision  if the minimum value of Δ is less than .
We note that  est is calculated in a fixed temporal window, and it is only used to assess whether the statistic is converged it a particular value of the rescaled variable.For the purpose of this study, the width of this temporal window was chosen to be  = 5, where  represent the number of consecutive snapshots, such that the non-linearity scale increases by  NL ∼ 2 /2 ( NL ∼ 2 /6 ).The choice for the size of this window is somewhat arbitrary, but the results should not significantly depend on it.A window that is chosen to be too small will lead to "false positives" in convergence, and contradictory results with the step 2 explained below.On the other hand, a window which Table 1.Summary of the  -body simulation data used for the analysis of this paper.The first column shows the spectral index of the initial PS, and  is the number of particles of each simulation.The third column gives the ratio of the effective Plummer force smoothing length  to mean inter-particle separation (equal to the initial grid spacing Λ), at the time of our first output.This smoothing is fixed in proper coordinates.The last column indicates the halo-finder utilized.
3 Rockstar is too large will exclude smaller but apparently converged temporal regions.
In a second step, we define the entire region of convergence by finding the largest connected temporal window (containing at least three consecutive snapshots, though again this number is not crucial) verifying This allows us to correct for missed converged snapshots in step one, by finding all points "close" (within the required region of accuracy) to the estimated converged value  est .
The reported converged value of the statistic at each rescaled bin is then calculated as the mean value of the statistic within the full resolved region (i.e. the region verifying the condition in Equation 17), and it is denoted by  conv .In this case, the precision at which the statistic is evaluated in the simulation is given by , and when we say that we have a precision at % we mean that  = /100.
While choosing the binning of the different studied statistics, we ensure that they match at different snapshots when rescaled by  NL ( NL ), to facilitate the comparison between them.We use constant logarithmic spacing such that 1 + (Δ/) ≈ 2 1/3 and 1 + (Δ/) ≈ 2 1/12 .In order to reduce statistical noise sufficiently, we have rebinned by grouping two (four) such bins, corresponding to Δ/ ≈ 0.55 (Δ/ ≈ 0.26).All bins in the rescaled variables reported in the results below are labelled by their geometrical centre.

Halo Finders: Rockstar and CompaSO
In this paper we analyse results from two different group-finding algorithms, comparing their resolution for a set of halo-statistics, as well as the accuracy of convergence.COMPetitive Assignment to Spherical Overdensities (CompaSO, Hadzhiyska et al. 2022) is a newly developed halo-finder specifically created to meet the demanding requirements of the AbacusSummit cosmological -body simulations.It runs on-the-fly, as part of the simulation code itself, with two of its primary requirements being keeping up with the high speed of Abacus (Maksimova et al. 2021), and supporting the creation of catalogues and merger trees to be used in the Dark Energy Spectroscopic Instrument (DESI, DESI Collaboration et al. 2016) project.On the other hand, Robust Overdensity Calculation using K-Space Topologically Adaptive Refinement (Rockstar, Behroozi et al. 2013) is a well established, widely used halo-finding algorithm, which uses information from both position and velocity of the particles.
The CompaSO algorithm is a configuration-space, FoF and SO algorithm to compute halos from -body simulations.It first obtains a measurement of the local density using a kernel of the form  = 1 −  2 / 2 kernel , where typically  kernel = 0.4Λ.Particles are linked together into FoF groups (L0 halos) as long as their local density (Δ) is higher than a chosen threshold.The main halos (L1 halos) are then formed inside these groups.Within each group, the algorithm finds the particle with the highest kernel density-the first halo nucleusand makes a preliminary assignment to it of all particles within a radius  L1 (innermost radius enclosing Δ < Δ L1 = 200 in EdS).
Particles outside 80% of  L1 are eligible to become their own halo centre as long as they are the densest within their kernel radius.The algorithm then finds the next highest density among eligible particles, which becomes the next halo nucleus.Particles are assigned to the first nucleus, but if a particle belongs to two halos, the algorithm performs a competitive assignment.This reassigns a particle to a new halo if its enclosed density with respect to the new halo is at least twice that of the old one.The search for new halo centres within L0 continues until no particles remain that are likely to nucleate halos of sufficient density.
CompaSO can sometimes fragment elongated halos into multiple objects, due to its spherical nature, or identify substructure as a distinct halo at one epoch that was already identified as a monolithic halo at a previous epoch.For this reason, a cleaning procedure is performed in post-processing, relying on merger-trees information Bose et al. (2022).This procedure checks what fraction of the particles of a halo at time   come from a much larger halo located at a similar position at time  i−1 and  i−2 .If a sufficiently large fraction did, then the newer halo is deemed a "potential split" and merged into the larger halo.In addition, if at an earlier redshift a halo peak mass exceeds more than twice its present day mass, it is also merged into a more massive neighbour, from whom it had presumably split off.
We will be particularly testing how this procedure improves halos determined by CompaSO.The described cleaning method affects, in general, low-mass halos around more massive ones, appending their particle list to the latter, and resulting in cleaned halo catalogues with a lower number of smaller halos vs. a larger number of bigger halos (Bose et al. 2022, reported that this post-processing method removes 1-5% of objects).As we will show in subsection 4.1, this shifts the value of the HMF in each mass-bin exactly in the correct direction to preserve self-similarity, which is evidence for the good performance of the procedure.
Rockstar is a group-finding method that uses information from both the position and the velocity of the particles.It works in a six dimensional phase-space framework, with an optional time refinement algorithm that tracks mergers.The code starts by creating FoF groups of a linking length larger than standard ( = 0.28 by default), which assures that virial spherical overdensities can be determined within.For each of these FoF groups, a phase-space metric is defined by normalizing the positions and velocities of the particles by the position (2  ) and velocity ( 2  ) dispersions of the group, such that for two particles  1 and  2 the distance metric is defined by: The algorithm now performs a modified FoF in phase-space within each group, where it links particles with and adaptive phase-space linking length such that a constant fraction of them (default 70%) is always linked together with at least another particle into subgroups.
The process repeats for each subgroup, creating a hierarchical set of structures until a minimum size substructure is found at the deepest level.Seed halos are placed at this final structure, and particles at higher levels are assigned to the closest seed halo in phase-space, where now the metric (Equation 18) is calculated with respect to the seed halo.More than one seed can be found within each of the first level FoF groups, corresponding to either a halo or subhalo.This categorization is performed by including temporal information of previous steps, following particle-halo associations across timesteps.During its final step, Rockstar calculates the gravitational potential of all particles using a modified Barnes-Hut method in order to unbind particles.Rockstar defines halo masses by using various (user-specified) Spherical Overdensity (SO) criteria.We generally (except when stated otherwise) restrict ourselves in this study to using the SO mass corresponding to the virial radius, including all halo structures and considering only gravitationally bound mass STRICT_SO_MASSES=0 2 .All halo centres and velocities are calculated using a subset of the innermost particles (∼ 10% of the halo radius), minimizing a Poisson error / √ .Finally, the algorithm has been run using default parameters but for: TEMPORAL_HALO_FINDING = 0 and MIN_HALO_OUTPUT_SIZE = 25.

RESULTS
We will start with the analysis of self-similarity of the HMF, and then explore the convergence of the radial pairwise velocities of halos, comparing it also with that of the 2PCFs.The analysis will compare results from Rockstar and CompaSO halos, with the latter focused on "cleaned" vs. "un-cleaned" results, for the different simulations reported in Table 1.

Halo mass function
We first show in Figure 1 the HMF as a function of rescaled mass / NL , as defined in subsection 2.2.The two left panels correspond to the CompaSO catalogue obtained from the  = 4096 3 simulations of  = −1.5 (upper) and  = −2.0(lower), while the right panels are for the Rockstar catalogues of a  = 1024 3 simulation with the same two exponents.All plots show that the self-similar rescaling appears to apply to a good approximation, especially at late times.The smaller Rockstar boxes show greater deviations at larger rescaled masses, which are simply due to the reduced number of halos in the smaller volume.Comparing the two indices for each halo finder, we observe that the self similarity for  = −2.0suffers small deviations (more notably in the smaller boxes analysed with Rockstar) at smaller scales and times than  = −1.5 (look at halos beyond 10 3 particles).This difference mirrors what we observed in previous analysis for dark matter statistics (Joyce et al. 2021;Maleubre et al. 2022;Maleubre et al. 2023), and reflects the increasing importance of finite box size as the index of the initial power spectrum reddens.
We next study these qualitatively-apparent deviations from convergence quantitatively by considering vertical slices in 1, assessing the self-similarity of the HMF as a function of time in bins of fixed rescaled mass.
Results for the comparison between Rockstar and CompaSO (before and after performing the cleaning process) are shown in Figure 2, for  = −1.5 (left panels) and  = −2.0(right panels).In this figure, each plot shows the results of all halo-finders analysed at three chosen representative rescaled mass bins.We also indicate, on the upper -axis, the number of particles in a halo (/ part ), where  part is the particle mass of our simulations, on the upper -axis.The horizontal lines in the panels indicate the estimated converged value when such convergence is attained at 1% precision, using the criteria as detailed in subsection 3.2 (with  = 0.01).The uppermost two panels correspond to the smallest rescaled mass at which such convergence is obtained (for at least one of the finders), and the bottom panels to the largest such rescaled mass.This value of 1% is chosen because it is approximately the smallest value of  for which we obtain a significant range of contiguous bins satisfying our convergence criteria.
As  NL grows as a function of time, the halos populating a given / NL bin contain more and more particles as time progresses.Thus, each plot effectively shows the measured mass function as a function of increasing resolution.At the same time, as halos get bigger, their number density gets smaller.The average number of halos in each bin decreases monotonically: this number is proportional to the simulation volume in units of the characteristic volume  3 NL , meaning that in the approximation of  2 NL (, ) constant, it's proportional to 1/ NL .Thus, we expect the effects of sparseness of sampling in finite bins to make the signal noisy at late times.We indeed can see this effect in our results, in Figure 2 for the HMF, but further in the analysis also for the 2PCF and the pairwise velocity.Nevertheless, we see that most of our plots are clearly not dominated by such sampling noise, and we can clearly identify systematic dependences in resolution alone.We note that the different halo finders have different mass definitions, so in these figures we do not expect agreement in the value of  (/ NL ) (already reported in Hadzhiyska et al. 2022) but we are interested instead in comparing the time/particle number range in which a convergence to a constant behaviour (i.e.self-similarity) is attained.
Examining these plots, we see several clear trends depending on the halo-finder.Rockstar catalogues show generally good convergence: looking at Figure 2, we see that a 1% precision level is attained starting from the order of 100 particles (regardless of the spectral index ), with degrading convergence at larger mass/later time due to sampling and smaller box size.Regarding CompaSO halos, they show equally good convergence as Rockstar (1% precision) beyond ∼ 1000 particles when the cleaning is performed, while the raw CompaSO catalogues never meet the convergence criteria and show instead a clear monotonic dependence on the resolution.On one hand, the larger number of particles needed for convergence in CompaSO is expected, as the kernel density scale is fixed and does not scale self-similarly (i.e., a new scale is introduced in the problem, which is expected to break self-similarity at low mass).On the other hand, the behaviour displayed by the raw CompaSO is very similar to that observed in Leroy et al. (2021) for FoF-selected halos.Thus, the merger-tree based cleaning (discussed in subsection 3.3 above) appears to correct very appropriately the mass of halos, by increasing by the right amount the number of larger halos at each given time to restore the self-similarity.This behaviour is also expected, as the described cleaning method affects, in general, low-mass halos around more massive ones, appending their particle list to the latter, and resulting in cleaned halo catalogues with a lower number of smaller halos vs. a larger number of bigger halos.As a result, this shifts the value of the HMF in each mass-bin exactly in the correct direction to preserve self-similarity, which provides clear evidence for the good performance of the procedure.
The panels of the bottom row in Figure 2, which probe the most massive halos resolved, show a clear upper cut-off in the convergence of the cleaned CompaSO catalogues at a few times 10 4 particles.Comparing with the plain behaviour seen in the same bin for the smaller Rockstar boxes, which appear to show a down-turn of the data away of the converged value at a slightly earlier time, it appears that these deviations can be attributed to finite box size effects.Further tests against larger Rockstar boxes would be desirable to confirm this and exclude any evidence for residual resolution depen-dence in the cleaned CompaSO, as well as test against self-similarity for different cleaning parameter values.
Indeed, we note that one of the more general conclusions we can draw is that the self-similarity tests on scale-free models are an excellent tool for testing resolution of halo finders.Furthermore, we underline again that we do not claim that our tests allow us to say anything about whether one halo-finder is more physically relevant than the other.We say that self-similarity is a necessary condition for a good physical behaviour, although not a sufficient one, allowing us to place minimal convergence limits on halo finder algorithms.

2PCF and radial pairwise velocities
We now turn to our analysis of the 2PCF, and the mean radial pairwise velocities of halo centres.We have first considered the HMF, as we would expect that any other halo statistics -which are generically expected to depend on / NL -will be self-similar to a good approximation at a given rescaled / NL only if the HMF is too.Amongst other considerations, we will examine below the extent to which this is the case quantitatively for the 2PCF and mean radial pairwise velocity.Looking at Figure 3 and 4, they show, respectively, for the same three rescaled-mass bins as in Figure 2, the temporal evolution of the 2PCF and mean radial pairwise velocity for halo centres as a function of rescaled separation.We display results for the cleaned CompaSO catalogues in the  = 4096 3 simulations and the indices  = −1.5 and  = −2.0.We plot the values of the statistic as a function of / NL , and for all redshifts with data in the given bin of rescaled mass.In each plot we have marked by a black vertical line the scale 2 vir / NL corresponding to twice the virial radius,  vir , of the corresponding rescaled mass.In addition, the shaded area marks the corresponding scale to the minimum and maximum mass limits on the finite bin.Although CompaSO halos may be separated by less than 2 vir (as they are neither spherical nor have a spatial extent directly determined by  vir ) we expect a scale of this order to be an effective lower cut-off to the range in which a physical halo correlation function can be measured.
Figure 3 and 4 display qualitative behaviour similar to that in the statistics we have analysed in previous work: both statistics show clear self-similarity propagating in time from larger to smaller scales.As anticipated, the scale 2 vir / NL does seem to give a good indication of the lower cut-off scale.Perhaps surprisingly, at the latest times computed for the simulations, self-similarity seems even to extend to separations as small as  vir .Further, the plots appear to show, again perhaps surprisingly, that the convergence of   is slightly better than that of the 2PCF.Although the variance of   is greater than that of the 2PCF, and thus should present a weaker convergence, we will see later on that their dependence on the HMF's convergence will play an important role on explaining this behaviour.
Following the analysis detailed in subsection 3.2, to assess and quantify these behaviours, we take vertical slices in Figure 3 and 4. As   / and  are each functions of the two rescaled variables / NL and / NL , each such plot in Figure 5 thus corresponds now to a specific bin of each of these two variables (and self-similarity, again, to a time independent behaviour of the dimensionless statistics).Limitations of space here impose the choice of a few illustrative values of / NL and / NL .
In Figure 5 we show three plots for each of the two statistics, for Rockstar and cleaned CompaSO halo catalogues obtained in the  = 1024 3 and  = 4096 3 simulations of  = −1.5, respectively.The bins correspond to the same three values of / NL as in Figure 2, 3 and 4, matching the bins in which we obtain a satisfactory converged HMF.The statistics are converged at the 2% level, and the converged values are indicated by horizontal lines, with the allowed precision marked by the shaded regions.The value of / NL in each bin has been chosen to correspond approximately to 2 vir / NL , which is roughly the smallest scale from which we observe convergence of both statistics (using the same criteria).Just as in the plots for the HMF in Figure 2, we also plot in the upper x-axis the number of particles in the analysed halos as a function of time.We do not display the results for the raw CompaSO catalogue because this data is almost exactly superimposed on that for the cleaned catalogue for   /: differently to what we observed for the HMF (and for the 2PCF), the accuracy of this halo statistic, and indeed their convergence (see below) is insensitive to the associated re-assignment of particles.The value of 2% precision has (like in the corresponding HMF plots above) been chosen because it is approximately the smallest value of  for which we obtain a significant range of contiguous bins satisfying our convergence criteria, corresponding to the highest precision at which we can in practice establish convergence using our data.
As anticipated from Figure 3 and 4, the convergence of the pairwise velocity (left panels) is indeed significantly better than that of the 2PCF (right panels).Convergence is attained (at a given precision, here 2%) starting from a smaller particle number (i.e. earlier in time).This difference becomes more pronounced in the largest mass bin, as clearly illustrated here in the chosen bin (bottom plots in the figure) in which the 2PCF alone fails to meet the convergence criteria.We believe the explanation for this comes from the very different dependencies of the two statistics on the rescaled mass.
Comparing the converged values of the two statistics in the different mass bins in Figure 5, we see that the pairwise velocity is only very weakly dependent on the mass compared to the 2PCF: the former varies by only 20%, while the latter changes by a factor of 5 (as the mass itself varies by a factor of 50).Errors in the mass assignment of the halos selected in a given mass bin will thus feed through to give a much larger error in the 2PCF.
Examining further the lower bounds to convergence, we observe that the Rockstar data converges on small scales at fewer particles per halo than the CompaSO, while both perform equivalently at larger scales.This is clearer in the lowest mass bin, extending to the larger mass bins, albeit somewhat obscured by the relative noisiness at larger scales of the Rockstar data (due to smaller box size).

Resolution limits for halo statistics in scale-free and LCDM-type simulations
As we have discussed, the lower cut-offs to convergence for the halo statistics we have analysed can be stated as cut-offs on the number of particles per halo, and in the case of the correlation functions and pairwise velocity (which depend also on separation) also in terms of a cut-off on separation in units of the virial radius.Further, in the data shown above we have seen that in practice the requirement on particle number, for a given halo finder, seems not to depend significantly on the mass bin for the HMF or the pairwise velocity at a given scale, at least for the approximately fixed separations (in units of virial radius) which we examined.Figure 6 presents a more complete view of the data to test whether these behaviours are really valid in general: for  = −1.5 (upper panels) and  = −2 (lower panels).The leftmost panels show in each case the lower cut-off to convergence expressed in particles per halo for the HMF as a function of the rescaled mass, while the other two panels show, for the pairwise velocity and 2PCF respectively, the analogous quantity as a function of separation in units of  vir , and for different bins of rescaled mass.In each plot, the two sets of curves shown correspond to the two indicated halo finders (full line/circles to cleaned CompaSO and dotted lines/stars to Rockstar), and each of the curves (or points) to different mass bins / NL .The dashed-thick lines correspond to best (least-squares) fits of a linear dependence on / vir -/ NL for the HMF case -to the data (for each of the halo-finders separately).All results correspond to our best reported precision: 1% for the HMF and 2% for the 2PCF and pairwise velocity.
The plots for   (middle panels) show that the anticipated behaviours indeed hold: the bounds for the different mass-bins collapse approximately onto a single line and can thus be well approximated as a bound on the number of particles per halo as a function of / vir exclusively.On the other hand, the column (rightmost) plotting the  data shows that, although the dependence on / NL is weak for the converged values, the quality of convergence for large mass-bins (and large scales) is significantly reduced with respect to the former statistic.
The behaviours also confirm and further quantify the trends we MNRAS 000, 1-14 ( 2022) capacity to detect the subtle differences resulting from the cleaning of the CompaSO catalogues.
Our exploitation of simulations of different sizes, of several realizations, and of scale-free models with different exponents has allowed us not only to improve the results from previous work, but has also been essential for extrapolating the results to LCDM-like cosmologies.
We have been able to use our data to establish resolution limits at the 1% precision level for the HMF, and at the 2% level for the 2PCF and pairwise velocity in both the Rockstar catalogues and the CompaSO catalogues, provided the cleaned version described in Bose et al. (2022) is used for the HMF.We express the lower limits to resolution for the HMF as a lower limit on the number of particles, which turns out to be roughly independent of the non-linear mass.For the pairwise velocity, which is also a function of separation, we find that the lower bounds on the number of particles are, to a good approximation, independent of mass when plotted as a function of separation in units of the virial radius (corresponding to the given mass).The converged value of the 2PCF highly depends on the massbin analysed, and subsequently on the good categorization of halos' masses, encoded in the HMF.We thus found that convergence of the 2PCF is only attained when the HMF is converged.
Plotting the inferred lower bounds on particle numbers for each of the three statistics, for  = −1.5 and  = −2 simulations, shows that the results have no significant dependence on  and thus can be confidently adopted to LCDM-like simulations.At the 1% level, Rockstar is not able to resolve the HMF below ∼ 100−200 particles, the cleaned version of CompaSO breaks self-similarity below ∼ 1000 particles, and its raw version never achieves this convergence at the same precision.For the 2PCF and pairwise velocities, the 2% precision level is attained with significantly smaller particle numbers than the previous statistic, with the latter requiring the least.For these, the effects of cleaning CompaSO are less significant, as the dependence on mass-bin is suppressed.At small scales, Rockstar exhibits self-similarity starting at a smaller particle number than CompaSO, plausibly explained by the introduction of a fixed kernel density scale in the latter, which the authors assume will certainly affect self-similarity of small objects.This difference decreases as the separation increases and disappears at (10 − 20) vir .
We conclude by pointing out that our analysis has confirmed that self-similarity is a powerful tool to put halo algorithms to the test and compare their resolution.Once again, we underline that we cannot make any claims about which halo-finder is more physically relevant, but we can indeed put limits on their individual resolution.Finally, it would be interesting to explore, in particular, whether the CompaSO algorithm can be further modified in order to improve its resolution at low halo mass, while maintaining its computational speed.

DATA AVAILABILITY
Data access for the simulations part of AbacusSummit is available through OLCF's Constellation portal.The persistent DOI describing the data release is 10.13139/OLCF/1811689.Instructions for accessing the data are given at https://abacussummit.readthedocs.io/en/latest/dataaccess.html.
Data corresponding to the smaller simulations as well as the derived data generated in this research will be shared on reasonable request to the corresponding author.

Figure 1 .
Figure1.HMF as a function of rescaled mass / NL for simulations of spectral index  = −1.5 (upper panels) and  = −2.0(lower panels).The left column shows results from CompaSO (after the cleaning procedure detailed inBose et al. 2022) for the  = 4096 3 simulation.The right column shows results from Rockstar for a single  = 1024 3 simulation.Self-similar evolution corresponds to the superposition of the curves in each plot.The times shown correspond to every third snapshot  = 0, 3, 6, ... over the total span of the simulation.

Figure 2 .
Figure2.Evolution of the HMF for the index  = −1.5 (left column) and  = −2.0(right column) as a function of log 2 (/ 0 ), lower x-axis, and halo particle number (/ part ), upper x-axis, for a set of given mass-rescaled bins / NL .Blue triangles correspond to Rockstar for a single  = 1024 3 simulation, while circles correspond to CompaSO for the  = 4096 3 simulation (orange corresponds to results before merger-tree cleaning and red corresponds to results after).Horizontal dashed lines represent the converged value of the HMF, and the shaded regions indicate that within ±1% of this value.

Figure 4 .
Figure 4. Same as Figure 3 but for the estimation of   /.
ACKNOWLEDGEMENTS S.M. thanks Sownak Bose for guidance and technical support with the halo merger tree code for AbacusSummit, used to clean the CompaSO catalogue of the scale-free simulations used in this paper.She also thanks the Institute for Theory and Computation (ITC) and the Flatiron Institute for hosting her in early 2022, and acknowledges the Fondation CFM pour la Recherche and the German Academic Exchange Service (DAAD) for financial support.S.M. and M.J. thank Pauline Zarrouk for useful discussions.D.J.E. is supported by U.S. Department of Energy grant, now DE-SC0007881, NASA ROSES grant 12-EUCLID12-0004, and as a Simons Foundation Investigator.This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.The AbacusSummit simulations have been supported by OLCF projects AST135 and AST145, the latter through the U.S. Department of Energy ALCC program.