A new test of gravity - I: Introduction to the method

We introduce a new scheme based on the marked correlation function to probe gravity using the large-scale structure of the Universe. We illustrate our approach by applying it to simulations of the metric-variation 𝑓 ( 𝑅 ) modified gravity theory and general relativity (GR). The modifications to the equations in 𝑓 ( 𝑅 ) gravity lead to changes in the environment of large-scale structures that could, in principle, be used to distinguish this model from GR. Applying the Monte Carlo Markov Chain algorithm, we use the observed number density and two-point clustering to fix the halo occupation distribution (HOD) model parameters and build mock galaxy catalogues from both simulations. To generate a mark for galaxies when computing the marked correlation function we estimate the local density using a Voronoi tessellation. Our approach allows us to isolate the contribution to the uncertainty in the predicted marked correlation function that arises from the range of viable HOD model parameters, in addition to the sample variance error for a single set of HOD parameters. This is critical for assessing the discriminatory power of the method. In a companion paper, we apply our new scheme to a current large-scale structure survey.


INTRODUCTION
The cosmological constant Λ was initially introduced by Einstein to produce a stationary universe solution to his field equations (for a historical review see O'Raifeartaigh et al. 2017).However, the nature of this component, now thought to be responsible for the accelerated cosmic expansion, remains unknown and may point to the need to change the theory of gravity (Jain et al. 2013;Heymans & Zhao 2018;Baker et al. 2021).The evolution of the Universe after the Big Bang is imprinted on the large-scale structure, also called the cosmic web, through the interplay between gravity and the expansion rate.This means that the large-scale structure of the Universe not only contains information about the cosmological model but also about the nature of gravity on cosmological scales.Moreover, many models that modify the theory of gravity from general relativity replicate the accelerated expansion without invoking a cosmological constant (Clifton et al. 2012;Joyce et al. 2015;Koyama 2016;The FADE Collaboration et al. 2022;Martinelli & Casas 2021).For such models, new degrees of freedom are introduced, which must be coupled to matter, altering the formation of structure over time compared to GR. Alternative models of gravity will inevitably modify structure formation in a manner that depends on the environment.
Observational constraints on gravity, such as from the dynamics of the solar system, and the more recent detection of binary neutron star mergers, have led to several classes of modified gravity (MG) models that were until recently under consideration now being ruled out (Lombriser & Taylor 2016;Creminelli & Vernizzi 2017; Ezquiaga ★ E-mail: joaquin.a.armĳo-torres@durham.ac.uk & Zumalacárregui 2017; Baker & Harrison 2021).Such theories of gravity modify the propagation velocity of gravitational waves detected in a vacuum, which is inconsistent with the current detections of "multi-messenger" events.However, many models of modified gravity are still viable as they are allowed by local tests of gravity in the solar system and on galactic scales.Such models are continually being tested and further constrained, and include chameleon theories, for example  () gravity (Appleby & Battye 2007;De Felice & Tsujikawa 2010) and Brans-Dicke type theories including the Dvali-Gabadadze-Porrati (DGP) model (Dvali et al. 2000).These modified gravity models are important as they can be used to test GR and the equivalence principle on cosmological scales.In the last decade, Nbody simulations of modified gravity have been developed to study MG in the large-scale Universe, allowing new probes of gravity to be explored (Li et al. 2012;Winther et al. 2015;Arnold et al. 2019a) To further constrain MG models, we need probes that are sensitive to changes in the environment of large-scale structures brought about by the changes to gravity, compared with GR.These impacts are seen in phenomena such as weak lensing (Kilbinger 2015;Durrer 2022), redshift space distortions (Jennings et al. 2012;Ruan et al. 2022), and the marked correlation function statistic (White 2016;Aviles et al. 2020).Here, we focus on the latter, which is a relatively new statistical tool that contains information beyond the traditional galaxy-galaxy correlation function, whilst still being a second moment quantity.The marked correlation function has been used to study the connection between properties of galaxies and their environment, such as luminosity and environmental density, and halo mass (Sheth & Tormen 2004;Percival et al. 2004;Wechsler et al. 2006).
Our main aim here is to introduce a new cosmological probe of gravity, a marked correlation function in which the mark depends on density.To meet this aim we have developed a pipeline to make realisations of mock galaxy catalogues from N-body simulations, using a simple halo model approach.A key feature of our analysis is an assessment of the uncertainty in the model predictions due to the range of halo models that give acceptable fits to the measured two-point correlation function and the number density of the tracers; this uncertainty is often ignored in the literature and could result in an overly optimistic view of the performance of any diagnostic that depends on clustering (Armĳo et al. 2018;Hernández-Aguayo et al. 2019;Valogiannis & Bean 2018;Satpathy et al. 2019).Here, we introduce the new methodology, which improves upon the modelling developed in Armĳo et al. (2018).We apply the approach introduced in this paper to current surveys in a companion study (Armĳo et al. 2023, hereafter Paper II); here we show results for one of the samples considered in Paper II to illustrate the method and leave the discussion of the details of how the mock catalogues are constructed to that paper.
The outline of this paper is as follows: in § 2 we give an overview of the  () theory of gravity, which is the model we use to compare to GR and hence to illustrate our method.The simulations used to understand the modelling of modified gravity are presented in § 3 and the creation of mock galaxy catalogues to replicate the observations is described in § 4. The calculation of the marked correlation function is presented in § 5. Finally, we explain the direction in which this work could go in the future and draw our conclusions in § 6.

THE F(R) THEORY OF GRAVITY
The  () theory of gravity (Sotiriou & Faraoni 2010) is a viable alternative to general relativity.In the standard ΛCDM model, the cosmological constant, Λ, drives the accelerated expansion of the universe at recent times.Instead of invoking Λ,  () gravity models explain the quickening expansion by invoking new physics that arises from the additional degrees of freedom introduced into the equations of motion for gravity (see for example Li et al. 2007).
The  () model of gravity can be viewed as an extension of standard GR through the inclusion of a function,  , of the Ricci scalar, , in the Einstein-Hilbert action where  2 = 8,  is Newton's constant,  is the determinant of the metric   and L  is the Lagrangian density of matter.The form of the  () function can be chosen to mimic the expansion history of the ΛCDM model which is well constrained by observations of the cosmic microwave background and the large-scale structure in the galaxy distribution.The addition of this extra term in Eqn. 1 leads to the modification of all the equations of GR, including the Einstein field equations where ∇  is the covariant derivative of the metric tensor,   ≡ d  ()/d is the new scalar and dynamical degree of freedom that arises from the introduction of the  () term.To solve this new equation and obtain the equations of motion for massive particles, one can take the trace of Eqn. 2 and solve for the case of a perturbation around the standard Friedmann-Lemaître-Robertson-Walker metric.This description of the background evolution of the universe gives two equations of motion.The first is the modified Poisson equation: and the other is for the new scalar field,   : where  m is the matter density field, and an overbar indicates quantities ( ρ and R) defined as mean values for the background cosmology.As we have now defined the Ricci scalar as a function of   in both Eqns 3 and 4, we can combine these to obtain which is a new equation of motion for massive particles including a term which comes from the new scalar degree of freedom.We can understand this new term as the potential −1/2   of an extra force, the fifth force, mediated by the scalar field   , which is sometimes referred to as the scalaron (Gannouji et al. 2012).

The chameleon mechanism
The equations of motion of  () gravity are different from those in standard gravity, and different predictions may result.Nevertheless, local tests already constrain these predictions with great accuracy on certain scales, such as in the solar system (Guo 2014).This means that modified gravity must include mechanisms to hide the new physics which arises from the extra degree of freedom in Eqn. 5 on these scales.This feature is referred to as a screening mechanism (Khoury & Weltman 2004), and is a scale-dependent property of chameleon theories such as  () gravity.On scales where the model is expected to behave as standard gravity, such as in the deep Newtonian potential of the Solar system, Eqn. 4 is dynamically driven to |   | → 0. In this limit, Eqn. 5 reduces to the standard Poisson equation and GR is recovered, hence this theory is viable on these scales (Hu & Sawicki 2007).On the other hand, on scales where the Newtonian potential becomes shallower, the term  − R in Eqn. 4 is negligible and Eqn. 5 reduces to which is the same as the standard Poisson equation, but enhanced by a factor 4/3 when the amplitude of the fifth force is at its maximum and no screening is triggered.An interesting feature of this theory is that to obtain Eqn. 5 no assumption about the form of the  () function is required, which means that this mechanism is independent of the choice of  ().

The Hu & Sawicki model
A popular choice for the functional form of  () is the one proposed by Hu & Sawicki ( 2007) where  2 = 8 ρ0 /3 is called the mass scale, ρ0 is the value of the background matter density today, and ,  1 and  2 are free parameters of the model.The form of this function is motivated by the aim of ensuring that for high curvature values compared to the mass scale,  2 , the term  2 / goes to zero and  () can be expanded as In the limit  2 / → 0, the term  1 / 2 acts as the cosmological constant of this model, and is independent of scale.As we have an explicit form for  () we can set  1 / 2 = 6Ω Λ,0 /Ω ,0 , where Ω ,0 is the matter density parameter today, and Ω Λ = 1 − Ω  .With this configuration, the model follows the same expansion history as the ΛCDM model by construction.Meanwhile, the scalaron field can also be approximated by and we can also evaluate the expansion history today, where  0 ≫  2 .In this scenario, the scalaron solution of Eqn. 4 sits in the minimum of the effective potential, then the Ricci scalar can be solved using the background values (Brax et al. 2008) which removes the dependence between (   ) and the scalaron   .
Then this approximation can be used to solve the term  1 / 2 2 in Eqn 9: which is evaluated with the value of the scalaron today,  0 .By fixing these values the model depends on only two free parameters,  and  0 .These parameters can be constrained using the large-scale structure at late times.One of the fundamental measurements to obtain these constraints is the power spectrum for a range of models with different values of the scalaron amplitude |  0 | when fixing  = 1.

N-BODY SIMULATIONS AND MOCK CATALOGUES.
In this section we describe the N-body simulations used ( § 3.1), the halo catalogues extracted from them ( § 3.2), and the HOD framework adopted to populate the halos with galaxies ( § 3.3).

Simulations of modified gravity
We use simulations of the cold dark matter cosmology with different gravity flavours, standard GR and modified  () gravity from Arnold et al. (2019b).These calculations use the 2016 Planck cosmological parameters (Planck Collaboration et al. 2016): ℎ = 0.6774, Ω m = 0.3089, Ω Λ = 0.6911, Ω b = 0.0486,  8 = 0.8159, and  s = 0.9667.We use a model of  () with amplitude  0 = 10 −5 denoted as F5 and a model of standard general relativity referred to as GR.These simulations use 2048 3 collisionless particles in cubic boxes of length  box = 768 ℎ −1 Mpc resulting in a particle mass of  p = 4.9 × 10 9 ℎ −1 M ⊙ .Here we use the simulation output at redshift  = 0.3.

Haloes and subhaloes
Haloes are identified using the subfind algorithm (Springel et al. 2001).In the first step, the friends-of-friends (FoF) percolation scheme is run on the simulation particles in a given snapshot.The minimum number of particles per group retained after the FoF step is set to 20. subfind is then applied to find the local density maxima in the FoF particle groups, and checks to see if these structures are gravitationally bound.Unbound particles are removed from the membership list.The resulting objects are called subhaloes, which correspond to haloes which fell into a more massive structure at an earlier time and are still in the process of merging with it.We use the positions of these haloes and subhaloes to populate the simulation box with central and satellite galaxies, rather than resorting to sampling spherically symmetric NFW profiles which end at the virial radius, as has been used in many previous studies (e.g.Cautun et al. 2018;Paillas et al. 2019;Armĳo et al. 2018;Hernández-Aguayo et al. 2018).This choice was made to achieve better agreement between the mock catalogues and the observations, particularly on small scales.This saves us the step of creating a halo profile for individual haloes, which would introduce an extra parameter, the concentration, to model the position of satellite galaxies.Taking our approach instead allows the HOD parameters to be constrained more tightly.

HOD galaxy catalogues
The HOD model (Peacock & Smith 2000;Berlind & Weinberg 2002) is an empirical description of the number of galaxies per halo as a function of halo mass.By using the simulated halo and subhalo catalogues we aim here to recreate the BOSS LOWZ sample (Dawson et al. 2013) (we consider this sample along with the CMASS LRG sample of Reid et al. (2016) in Paper II, to which we refer the reader for further details).The HOD prescription gives the number of central and satellite galaxies separately as functions of halo mass (Zheng et al. 2007): In Eqn. 12,  cen is the mean number of central galaxies as a function of the mass of the halo, , and  min and  log  are free parameters.
In the case of satellites, Eqn. 13 is dependent on Eqn. 12 and , because the satellite population of the halo is linked to whether or not there is a central galaxy. 0 ,  1 , and  are free parameters in Eqn. 13.In cases where the number of satellites is higher than the subhaloes attached to an individual halo, the subhaloes are recycled as satellite hosts, and could, in principle, host more than one satellite galaxy.However, this effect is at the sub-percent level for the HOD parameters used.

INFERRING HOD PARAMETERS USING THE MARKOV CHAIN MONTE CARLO METHOD
The HOD framework provides a simple and accurate means of describing a galaxy population defined by a set of selection criteria, to allow a reproduction of the large-scale structure measured in a wide field survey for a given space density of tracers.Here, to illustrate our method, we consider the application of the HOD framework to luminous red galaxies (LRGs) from the Sloan Digital Sky Survey HOD parameter-space adopted for GR and F5 simulations Table 1.The uniform priors adopted for the HOD parameter set, .Extra conditions are applied to some of the prior distributions, such as requiring that  0 >  min and that  1 > 5  0 for every set of HOD parameters.
(SDSS), which have been used to trace the large-scale structure efficiently over a large volume of the Universe (Eisenstein et al. 2001(Eisenstein et al. , 2011)), focusing on the LOWZ sample of Parejko et al. (2013).The objective is to build mock catalogues that match the number density and projected clustering measured for galaxies in the SDSS LRG sample from both the modified gravity and GR simulations.Further details of the LOWZ sample are provided in Paper II.
Several studies have been performed to construct such mock galaxy catalogues (Parejko et al. 2013;Manera et al. 2013;Manera et al. 2014).Here, we try to improve on the procedure used by Parejko et al. in several ways: first, we restrict the redshift range of the samples to reduce the variation in the observed number density, allowing this property to be modelled more accurately, and second, we develop a new scheme for fitting the observed number density along with the clustering.We describe our method in the next section.Another method worth mentioning is that presented by Zhang et al. (2022), in which the HOD parameters are constrained using highorder clustering statistics.The combination of the two and threepoint functions used by Zhang et al. allows, in principle, tighter constraints to be placed on the HOD parameters than when using the two-point function alone.However, the estimation of the three-point correlation function is significantly more time-consuming than twopoint functions (Guo et al. 2015) and so this option is not considered further here, as our pipeline involves estimating the clustering for tens of thousands of mock catalogues.
We use the Metropolis-Hasting MCMC scheme (Metropolis et al. 1953;Hastings 1970) to explore the 5-dimensional HOD parameter space, and obtain the best fitting parameters that replicate the number density and clustering of the LOWZ sample, as an example of how we can fit observations using a metric that depends on both quantities.We define the log-likelihood for each quantity, proportional to the Δ 2 distribution, where  2 is defined by where x is the realization value drawn from the set of parameters, and  is the observable that we are trying to model. −1 is the inverse of the covariance matrix, which includes the uncertainties in the observation of .The above definition is valid for the two observable quantities we are trying to fit,  gal and  p .As we are trying to fit two quantities that are related at some level, such as the clustering and number density of the galaxy sample, we need to consider this when defining the  2 that we want to measure.Here, we define a new, phenomenological form of  2 by combining both measurements: where  n and   p are factors that weight the individual  2 for the number density, , and the clustering,  p , respectively.By adding these quantities, we can fit models to the data using the adopted weights for these two metrics, which in turn can provide a better understanding of the correlation between the clustering and number density, and help us to determine if one is more important than the other when looking for the best fitting HOD parameters.This is a phenomenological, pragmatic solution which we will demonstrate using our mock catalogues.We aim to choose a set of weights that give an unbiased recovery of the number density of galaxies and their clustering.Furthermore, the results should not be sensitive to the precise value of the weights, and, for this reason, we use the same weights for different gravity models.Our definition of  2 scales with the number of bins in the statistic being probed.Thus, without using weights, the correlation function, which is measured in many bins, would dominate over the number density measurement.We argue below that some weighting is necessary to improve the accuracy with which the number density is recovered.It is important to consider the number density as a constraint; if the mocks for the gravity models had different galaxy densities, this could lead to differences in the marked correlation functions that are not due to gravity.We need to determine the weights,  n and   p , and the range of acceptable HOD parameters for each model.We used emcee (Foreman-Mackey et al. 2013), which is a Python implementation of the MCMC algorithm that applies the Metropolis-Hasting ensemble sampler.We build the ensemble using 28 walkers each running for 30 000 iterations (10 000 for the so-called burn-in or settling down phase and 20 000 for the production phase used to estimate the posterior distribution); these choices are motivated by using the autocorrelation time analysis and the Gelman-Rubin (G-R) diagnostic (Gelman & Rubin 1992).For the autocorrelation time,  f , we estimate the convergence at  = 50 f iterations, for all the chains tested.We also calculate the G-R diagnostic for the chains.We provide the parameter space limits applied to the priors, used for searching the HOD parameters, in Table 1.To investigate the impact of the choice of weights, we try three runs with different  2 definitions:  n = 0.15,   p = 0.85;  n = 0.85,   p = 0.15 and  n = 0.5,   p = 0.5.These cases are useful to study over what range we can adjust the metrics without introducing biases into the recovered parameter values and statistics.

The HOD families that reproduce LOWZ results
The clustering of galaxies is a robust probe of the cosmic large-scale structure and the increasingly accurate measurements that have been made over the past twenty years have played an important role in constraining the basic cosmological parameters (Percival et al. 2001;Cole et al. 2005;Sánchez et al. 2009;Reid et al. 2010;Ross et al. 2012;Alam et al. 2015;Icaza-Lizaola et al. 2020).When considering alternatives to GR-ΛCDM, the predicted two-point correlation function and abundance of galaxies should agree with existing measurements.Hence an approach that is becoming increasingly common in the literature is to choose HOD model parameters such that the abundance and projected two-point correlation function of a variant model look as similar as possible to those in GR (Cautun et al. 2018;Paillas et al. 2019).
To search for the HOD parameters that give us mock galaxy samples that mimic the number density and clustering of the LOWZ sample, we need to explore how the choice of weights in the goodness of fit metric affects the recovered statistics.For instance, the number density, which is the mean number of galaxies per unit volume, is represented by one number for every HOD sample, ( gal =  gal / box ), We draw over each  ( gal ) a Gaussian with the same mean and standard deviation as the distributions (smooth red curve).We have rescaled the -axis to the relative difference between the individual measurements of  sim and the target  obs (black dashed line) for the LOWZ sample.We show the uncertainty in the value of  obs (grey shaded area), which is calculated using jackknife resampling.We mark the 1- range for the red curves in each panel (blue line ending in arrows) to highlight the deviation between the  sim distribution and the target  obs for the choice of weight used in each panel.where the volume of the simulation is  box =  3 box .For the data, we also consider  obs =  gal,obs / s , where  s is the comoving volume of the survey.Whilst for the clustering, a measurement of  p is estimated in both the simulation box and the observational data using 13 bins in the projected perpendicular distance range 0.5 <  p /(ℎ −1 Mpc) < 50; this range was selected after testing different choices.For both observational metrics, the uncertainties are estimated using jackknife resampling to account for sample variance, using the full covariance matrix for  p (see, for example, Norberg et al. 2009).As we combine these measurements to fit the HOD model to the observational data, we need to make sure that this re-sults in catalogues with accurate and unbiased measurements of  gal and  p .For example, by giving the majority of the weight to the clustering by fitting  p only, we would end up with a good reproduction of the measured two-point galaxy statistic, but we would miss the target number density by around 15-20 per cent, as shown by Parejko et al. (2013).Such a result would have a strong influence on the calculation of the marked correlation function, which would in turn have an impact on the utility of this test to probe modified gravity, by adding systematic uncertainties in the ranges where we expect the models to differ.On the other hand, by giving more weight to the number density and less to the clustering, we will obtain poorer reproductions of the clustering.The range of "acceptable" HOD parameters will also be broader in the limit of giving increasing weight to the number density, as we are effectively trying to constrain the 5 HOD parameters from, in the limit, one measurement.Hence, a compromise is required in which both observational measurements are recovered without biases or tensions at an adequate statistical level of confidence.
We ran the autocorrelation time analysis and the G-R diagnostic to test the convergence of the MCMC chains for three choices of weight values.By calculating the value of  f , we ensure that the chain has been running for a sufficient number of steps.For cases (1) (  n ,   p ) = (0.15, 0.85) and (2) (  n ,   p ) = (0.5, 0.5),  f ∼ 450, which is the number of samples needed for the chain to forget where it started.Following the estimated number for the convergence suggested by emcee, these models need at least 20 000 iterations.Case (3) with (  n ,   p ) = (0.85, 0.15) converges faster with  f ∼ 300, which is expected, as this model allows a wider range of HOD parameters as a result of the smaller weight assigned to the clustering in the metric.We also compute the G-R diagnostic for the total samples in the different chains, obtaining  = 1.149, for case (1),  = 1.087 for case (2),  = 1.071 for case (3).Convergence is assumed to have occurred for values of  < 1.2, though a value of  = 1.1 or more is considered to be on the large side (Brooks & Gelman 1998).Although, all of the weight cases can be considered as having formally converged according to the  values reported above, the higher  value for case (1) disfavours the weighting scheme where  n = 0.15 and   p = 0.85.
Corner plot showing the MCMC posterior distribution for the HOD model parameters (with the units given in Table 1), for the fit to the LOWZ data.
We use the MCMC method to fit the HOD model from either the GR (red) or F5 (blue) simulations to the data we want to replicate (in this illustration, LOWZ).The diagonal subpanels show the 1-D distribution of the parameters of the posterior distribution,  (  ), with  being the HOD parameters.The off-diagonal subpanels show the 2-D projection of the parameters for all parameter combinations, where the contours are selected using Δ 2 , using 1- (inner lines) and 2- (outer lines), which correspond to Δ 2 of 2.31 and 6.17 respectively.
We illustrate the implications of using different weight values for the estimation of the number density in Fig. 1.The three panels show the distribution of the recovered number density values obtained from the HOD parameters sampled, denoted by  sim .When we compare the distributions to the value from the observational sample,  obs , we can test how good these fits are, paying attention to any systematic shifts.For the first weight case,  n = 0.15,   p = 0.85 (left panel), there is a mismatch between the mean of the distribution of recovered  sim values and the observed value  obs .By comparing the distribution of  sim with a Gaussian distribution with the same standard deviation, we find that there is a tension of around 1- (indicated by the blue line and arrows) between the peak of the histogram and the observed value.Although not formally statistically significant this tension means that less than half of the mocks drawn from the red histogram would have a number density that is in close agreement with the observed value.In comparison, the other weight value cases Figure 4.The expected number of galaxies in a halo, ⟨  ⟩, as a function of halo mass  200 for all the HOD parameter sets which lie within a 1  confidence interval according to the  2 distribution.We show the HOD region for both GR (red) and F5 (blue) models, selecting the best 68% from the Δ 2 distribution with  = 5.
(shown in the middle and right panels of Fig. 1) yield more accurate estimates of  obs .We note that this behaviour is not observed for  p ( p ), as this is a function with more bins, where the weights do not have a significant effect on the quality of the fit.
Another argument we can use to help choose the correct weighting scheme is to examine the form of the Δ 2 distribution.In Fig. 2 we show this distribution for the two weight cases that yield similar results for the fit to  gal , cases (2) and (3).As we use a model with five free parameters (i.e. the HOD model parameters), we expect that the analytic form of the  2 distribution with five degrees of freedom will match that recovered from the MCMC chains.In case (3) the higher weight given to  n reduces the effective number of degrees of freedom to  = 4, so the analytic form of the  2 distribution with  = 4 is a better match to the histogram of values from the MCMC chains.This is expected as more weight is given to one specific bin, the number density value, rather than the clustering.It is only when we give the same weight to both metrics that the expected value of  = 5 is recovered.This is relevant to consider when choosing the best weighting scheme as we ensure that the contribution of these two metrics is consistent with the model we have implemented to fit the data.After comparing the different panels of Fig. 1 and the results shown in Fig. 2, we chose a weight  n ∼ 0.5 to obtain a more accurate estimate of  obs and  p,obs .
Once we fix the weight values to  n = 0.5,   p = 0.5, we plot the posterior distribution of the parameters in our model in Fig. 3.This plot shows what the parameter space likelihood looks like and how different parameters are correlated.Some of these correlations are expected, like the dependencies between  min and  that control the occupation rate of central galaxies in low-mass haloes.Other correlations are more unexpected, like the one between  and  0 , where the latter parameter controls the haloes that contain satellite galaxies, once the low-mass haloes with centrals have been fixed.There are no significant differences between the parameter space distributions for the different weighting schemes, apart from slightly wider preferred regions obtained in case (3), which can be inferred from Fig. 2.
In Fig. 4 we show the resulting HOD functions for the galaxy catalogues for both the GR and F5 models, once we fit the model to the observational data.We focus on the example with  n = 0.5 and  w p = 0.5, plotting a random selection of 1000 HOD curves sampled from the acceptable parameter space we find using the MCMC analysis.For the three different weight values we find similar results in terms of the range of values covered by the HOD parameters.An interesting feature of these HOD parameters is that all three weight cases studied permit  log  = 0, which corresponds to a sharp cutoff in the mass of low-mass haloes that can host a central galaxy.We find that, in general, the weight value cases where equal or higher weight is given to the clustering, i.e. those with  w p = 0.5 or 0.85, cover the same parameter space.Whereas the model that gives more weight to number density (i.e. the one with  n = 0.85) leads to a broader parameter range for those parameters that contribute less to the number density, such as  log  and , but gives tighter constraints on those that contribute more, such as  min .We show in Fig. 5 the results for  p for the same run and models as shown in Fig. 4. In this case, we show the region covered by the individual  p functions selected within the 1- region for the HOD parameters, which means that the shaded region represents the uncertainties in the projected correlation function due to the variation in the values of the HOD parameters that are considered as equally good fits.Again, for the three weight cases considered we see the same features, as expected: the clustering is degenerate with the number density for the range of the HOD parameters we find, and the measurement of  p is unbiased for the different weighting schemes, for both the GR and F5 models.These results indicate a good fit to the clustering overall, with a small deviation at large scales,  p > 20 ℎ −1 Mpc.Nevertheless, this is smaller than the uncertainties from the jackknife resampling.Additionally, our measurements of  p are also consistent with those from Parejko et al. (2013), including the small deviation between the mocks and the data at large scales.

MARKED CORRELATION FUNCTION
The idea of using the marked correlation function as a new probe of large-scale structure and gravity has been tested using mock galaxy catalogues (Armĳo et al. 2018;Hernández-Aguayo et al. 2018), motivated by the theoretical background presented in White (2016), who used perturbation theory to explore the properties of the marked correlation function.In these studies, different definitions of the weights applied to galaxies in the marked correlation function were investigated, including ones based on the local density of individual galaxies, the gravitational potential of different environments, and the host halo mass.All of these properties are expected to differ from those in the ΛCDM paradigm when calculated in modified gravity models, even once the 2-point clustering and abundance have been matched between models.Satpathy et al. (2019) tested the density mark from White (2016), applying this to mocks of the LOWZ galaxy sample, using the marked correlation function defined in redshift-space.These authors concluded that their results are limited by the accuracy of the modelling of small scales in the simulations, where most of the differences between GR and MG models were found in previous studies.No significant deviations from ΛCDM were found by Satpathy et al. on scales between 6 < /(ℎ −1 Mpc) < 69.The simulations used in Satpathy et al. (2019) have limited resolution, which can affect the results on small scales, which motivates us to refine some aspects of their analysis.Furthermore, the analysis of Satpathy et al. is in redshift space, which is dominated by the pair-wise velocity distributions on small scales that require further modelling of  4. The red region corresponds to that covered by all the  p / p curves, and the black dots show the measurement from the LOWZ sample that we used to fit the model.Uncertainties on the observational measurements have been calculated using jackknife resampling.The bottom subpanel shows the residuals relative to the observational data.
differences between GR and modified gravity models.Here, we use projected clustering to avoid such complications.Following White (2016), we define the marked correlation function as where  () is the two-point correlation function and  () is the weighted or marked version of .To implement the measurement of the marked correlation function we simply include the marks as additional weights in the correlation function estimator, where the pair counts are replaced by the multiplication of the weights for each galaxy in the pair.We count pairs from the data and random catalogues, redefining the terms in the correlation function estimator to include the mark: where  gal,i is the value of the total weight for each galaxy, and  ran, is the counterpart for a random point.This is made up of the weight to compensate for observational effects, such as the radial selection function and the redshift completeness,  obs, , and the mark to give a total weight of Randoms are marked by the mean mark m so that the total weight for a random is  ran, = m obs, . ( We use the same prescription employed by Satpathy et al. (2019) to ensure that the weighted correlation functions depend on the local densities around galaxies.For a density-motivated definition, the mark uses an estimation of the local density of an individual galaxy,   , which is defined as the inverse of the volume associated with a galaxy in the density field, in units of the mean density ρ of the field.Then we define a density-based mark of the form where  is a free parameter we can vary, to up-weight different density environments.For example, a selection of  < 0 up-weights low-density regions, where the additional gravity force in MG is prevalent.On the other hand, with  > 0, high-density environments are favoured, and halos in unscreened regimes can be tested.Note that any normalization of  introduced in Eqn.22 will be included in the value of m in the estimators of Eqns.17, 18 and 19.These definitions produce similar results in distinguishing MG from GR to those obtained using the log-transform density field power spectrum or the clipped density field statistic (Valogiannis & Bean 2018).Instead of measuring the correlation function in redshift-space,  (), as was done by White (2016) and Satpathy et al. (2019), we decide to use  p ( p )/ p , the projected correlation function divided by the projected pair separation perpendicular to the line of sight,  p .This is approximately a real-space quantity.Hence, we avoid dealing with the modelling of redshift-space distortions, which would add a layer of complication (see e.g.Cuesta-Lazaro et al. 2020;Cuesta-Lazaro et al. 2023) and can weaken any conclusions by introducing noise.Currently, RSD modelling performs best on intermediate to large scales, where it is more challenging to distinguish modified gravity from GR (Paillas et al. 2019).Another reason for choosing to work in real space is that the effects of RSD modify the local densities obtained from the Voronoi tessellation, as shown in Armĳo et al. (2018), which reduces the signal of modified gravity in the amplitude of the marked correlation function.Finally, measuring RSD on these scales to test modified gravity is not within the scope of this study, which is already known to be difficult to model for  () theories (Hernández-Aguayo et al. 2019).In the next section, we explain more about the choice and calculation of density-dependent galaxy marks.

Local density estimation: the Voronoi tessellation
We base the estimation of the local galaxy density on Voronoi tessellation (Voronoi 1908) in 2D as we are focusing on projected-real space clustering.Voronoi tessellation is a computational method to partition a space according to a given geometrical criterion.The Voronoi tessellation is defined in general by a -plane with  points, where each point generates a -polytope1 that contains all of the region closer to that point than to any other.The estimation of the local density for our galaxies is performed in a 2D projection of the original XYZ 3D Cartesian coordinates.For the simulations, this is a straightforward procedure.In our case, a galaxy sample generates a set of Voronoi cells in two dimensions, each with an area, coming from a projected local volume.We choose a thin 3D slice with width 40(ℎ −1 Mpc), which is selected to maximize the number of galaxies projected in each area, whilst at the same time minimising cutting off individual structures in the different volumes.(The choice of slice width is discussed further in Paper II).With the tessellation area, we define an individual volume   for each galaxy, since the remaining dimension is provided by the thickness of the slice, and define the local projected density: Note that by projecting galaxy positions in slices along the redshift direction before performing the Voronoi tessellation we are effectively applying a smoothing to the galaxy density field, which depends on the depth of the projected slice.Estimating the local density using the Voronoi approach is a relatively inexpensive and intuitive method, where galaxies in overdense environments will have small volumes associated with them and hence high densities, and more isolated galaxies will have larger volumes and therefore smaller densities.Effectively, this is a reconstruction of the density field using galaxies, which shares features with the underlying matter field (Paranjape & Alam 2020).Voronoi tessellations have been used in a wide range of problems in astrophysics and cosmology, such as the identification of cosmic voids (Platen et al. 2007;Neyrinck 2008) and probing the primordial cosmology and galaxy formation (Paranjape & Alam 2020).In Fig. 6 we show the Voronoi diagram of the galaxy distribution.In the left panel, we show the shape of the actual Voronoi cells in the 2D projection of the 38.4ℎ −1 Mpc thick slice, which comes from one of the HOD catalogues produced from the cubic box simulations.Here, the cells of different sizes are generated by tracers of the underlying matter field and are representative of the environment in which they reside.In the right panel, we relate these Voronoi cells to the actual marks  defined by Eqn.22, with an arbitrary positive value for , divided by the value of the mean mark m.Then, we colour each Voronoi cell to show how different regions are up or down-weighted when the marked correlation function is computed.For example, small scales dominated by clusters and groups of galaxies are boosted when counting pairs, whereas pairs that include more isolated galaxies yield smaller marks.

Results
We calculate the marked correlation function for the mock samples using the marks derived from the local density measurements obtained from the Voronoi tessellation.To compute the terms in Eqn.16 we use the Landy-Szalay estimator to calculate  ( p , ).When solving the integral in the projected correlation function we consider separations in the line-of-sight direction, , using logarithmically spaced bins.By doing this, we achieve better accuracy in the integral calculation for the small  separations at which the correlation function changes rapidly.We use the publicly available twopcf2 code to compute the  p ( p ) for the data and mock catalogues; this code supports logarithmic binning and estimators using weighted pairs.The code can also efficiently calculate jackknife errors in a single loop over the galaxy pairs.For the mock catalogues, we select a random sample of 1000 HOD parameter sets selected from the posterior distribution obtained in Section 4. To study the marked statistic of the HOD mock catalogues we select the central 68 per cent of the total sample of values that are closest to the mean of M for each model.
We plot the results for the marked correlation functions M ( p ) of the HOD mock catalogues in Fig. 7.We compare M ( p ) for the GR and F5 models created from the snapshot at redshift  = 0.3, using the random sampling of the HOD parameters within the 1- confidence interval region.The model predictions overlap at separations larger than  p > 3 ℎ −1 Mpc.However, for separations  p < 3 ℎ −1 Mpc the models start to diverge, with only a modest overlap in the errors.These are the  p separations where there is the potential to find a significant difference between the model predictions but for a survey with a better measurement of the number density of galaxies and the galaxy clustering than the LOWZ sample considered here (see Paper II).

CONCLUSIONS
We have introduced a new framework to test gravity on different scales using wide-field surveys.We use galaxies as tracers of the matter field to probe the imprint of modified gravity on the cosmic large-scale structure.Such models aim to provide an alternative to the cosmological constant to explain the accelerating cosmic expansion.
The viable model we study presents two interesting features: the screening mechanism invoked to hide the modifications where GR is known to be accurate, and the additional fifth force arising from the new degrees of freedom in modified gravity.Then, this fifth force can be detected in regions of high curvature at cosmic scales, where GR still needs to be tested (Zhang et al. 2007;Arai et al. 2023).
From the theoretical side, and to predict the behaviour of the marked correlation function, we prepare mock galaxy catalogues using simulations of a ΛCDM-GR universe and compare these with mocks from a simulation which uses the  () theory of gravity with fifth force amplitude of |  0 | = 10 −5 (using the parametrisation of Hu & Sawicki 2007).We use the HOD prescription to populate haloes and subhaloes with central and satellite galaxies, from which we extract the best-fitting parameters in terms of the reproduction of the projected correlation function  p ( p ) and galaxy number density  gal .
We built a phenomenological  2 using the weighted individual  2 from the measurements of  gal and  p , and test different weight values to investigate any systematic shifts or tensions in the recovered quantities.We find that both measurements obtain better results if equal weights are given.This approach suggests ranges of weight values to use to avoid biases in the recovered statistics; with current datasets, these differences are marginal and perhaps best described as tensions (see Paper II).Nevertheless, our approach is objective and reproducible.The final weight choice is based on the definitions of convergence and the individual chains, in addition to the precision with which the measurements can be recovered.In the case of the number density, if too little weight is assigned to its contribution to the overall  2 , the target value is not recovered with the uncertainties included, which favours models of the  2 where equal weight is given to both the number density and clustering.Using the  2 distribution, we choose a range of HOD parameters within the 1- confidence interval to create mocks for both the GR and F5 simulations.Note that the same weight values are used for both gravity models.
We produce accurate mock catalogues that match the  gal and  p measured from observational samples.We find the HOD parameters that best fit these observational measurements using the MCMC algorithm, which leads to a set of mock catalogues that we use to predict the form of the marked correlation function.These mock catalogues incorporate uncertainties from the HOD modelling in the calculation of the marked correlation function, which is in principle larger than sample variance alone.Density-dependent marks are defined using an estimation of the local galaxy density based on Voronoi tessellation.We calculate the marked correlation function for the samples we generate comparing the two models of gravity.
For the LOWZ sample considered as an example here, we are not able to distinguish modified gravity at the level of |  0 | = 10 −5 (F5 model) from GR, when considering the uncertainties introduced by the HOD modelling.This is discussed further in Paper II in which we apply the test introduced here to the LOWZ and CMASS samples, and present more information about the analysis of the observational data.Then, the importance of this test is to show how the marked correlation function can deal with these uncertainties, and how it can break the degeneracy of the number density and two-clustering in the context of MG models.In the companion paper, we consider the constraints from other current surveys and speculate on the type of survey that would be needed to differentiate F5 gravity from GR.

Figure 1 .
Figure1.The  gal ), of the galaxy number density,  sim , recovered for the HOD samples for the different weighting schemes (red histogram):  n = 0.15,  p = 0.85 (left panel);  n = 0.50,  p = 0.50 (middle panel) and  n = 0.85,  p = 0.15 (right panel).We draw over each  ( gal ) a Gaussian with the same mean and standard deviation as the distributions (smooth red curve).We have rescaled the -axis to the relative difference between the individual measurements of  sim and the target  obs (black dashed line) for the LOWZ sample.We show the uncertainty in the value of  obs (grey shaded area), which is calculated using jackknife resampling.We mark the 1- range for the red curves in each panel (blue line ending in arrows) to highlight the deviation between the  sim distribution and the target  obs for the choice of weight used in each panel.

AFigure 2 .
Figure2.The  2 distribution for the MCMC chains of the HOD fits.We show two cases of the weight scheme, which produce similar results for  gal and  p : n = 0.85,  wp = 0.15 (pink histogram) and  n = 0.50,  wp = 0.50 (red histogram).The smooth curves show the corresponding analytical  2 distributions that best represent the data for each case, with  = 4 (pink dashed line) and  = 5 (red dashed line).

Figure 5 .
Figure5.The projected correlation function  p ( p ) as function of the projected separation,  p , for galaxy catalogues created using the HOD samples shown in Fig.4.The red region corresponds to that covered by all the  p / p curves, and the black dots show the measurement from the LOWZ sample that we used to fit the model.Uncertainties on the observational measurements have been calculated using jackknife resampling.The bottom subpanel shows the residuals relative to the observational data.

Figure 6 .
Figure 6.Left: Voronoi tessellation of the overlying galaxy distribution (red points) for a slice of thickness Δ = 40 ℎ −1 Mpc of the GR simulation matter distribution (grey points).The polygons indicated by the white lines are calculated using Voronoi tessellation for the slice projected in the XY plane.The tessellation generates a set of polygons each containing a galaxy, based on the galaxy's nearest neighbours.We use this area to estimate a value of the local density   for the galaxies in the sample.Right: same as in the left panel but with the individual Voronoi cells coloured according to the value of the mark  of the galaxy in that cell, divided by the mean mark m.Marks are defined as a mathematical function of  (Eqn.22) and used in the clustering estimators.Colours indicate by what factor of the mean mark m the weight is boosted.

Figure 7 .
Figure7.The marked correlation function M (  ) as a function of the projected distance   for the HOD mock galaxy catalogues from the GR (red) and F5 (blue) simulations.Top panel: M (  ) for the HOD mock catalogues within the 1- confidence interval from the MCMC fitting of the two-point clustering and number density of the targeted sample.The shaded areas for the models come from selecting the best-fitting 68% of HOD catalogues for each model, GR, F5 at redshift  = 0.3, the mean redshift of the survey, (dark red and dark blue).The bottom panel shows the relative residual taking the median of the GR simulation HOD catalogues as a reference.