An improved Halo Occupation Distribution prescription from UNITsim H_alpha Emission Line Galaxies: conformity and modified radial profile

Emission line galaxies (ELGs) are targeted by the new generation of spectroscopic surveys to make unprecedented measurements in cosmology from their distribution. Accurately interpreting this data requires understanding the imprints imposed by the physics of galaxy formation and evolution on galaxy clustering. In this work we utilize a semi-analytical model of galaxy formation (SAGE) to explore the necessary components for accurately reproducing the clustering of ELGs. We focus on developing a Halo Occupation Distribution (HOD) prescription able to reproduce the clustering of SAGE galaxies. Typically, HOD models assume that satellite and central galaxies of a given type are independent events. We investigate the need for conformity, i.e. whether the average satellite occupation depends on the existence of a central galaxy of a given type. Incorporating conformity into HOD models is crucial for reproducing the clustering in the reference galaxy sample. Another aspect we investigate is the radial distribution of satellite galaxies within haloes. The traditional density profile models, NFW and Einasto profiles, fail to accurately replicate the small-scale clustering measured for SAGE satellite galaxies. To overcome this limitation, we propose a generalization of the NFW profile, thereby enhancing our understanding of galaxy clustering.


INTRODUCTION
The study of the large-scale structure in the Universe plays a crucial role in contemporary cosmology.It serves as a powerful tool for understanding the fundamental properties of our Universe, including the values of different cosmological parameters and the nature of dark matter and dark energy.Galaxy clustering is an observable that allows us to probe both the cosmological parameters and the physics of galaxy formation.
Over the past few decades, numerous endeavours have been dedicated to creating large galaxy maps, such as the 2dFGRS (Cole et al. 2005), SDSS (York et al. 2000;Eisenstein et al. 2005), WiggleZ (Drinkwater et al. 2010;Parkinson et al. 2012), BOSS (Dawson et al. 2013;Alam et al. 2017), eBOSS (Dawson et al. 2016;Alam et al. 2021a), and DES (The Dark Energy Survey Collaboration 2005;Abbott et al. 2018).These endeavors have enhanced our understanding of the Universe, by providing measurements of its expansion history and increasing the constrains on dark energy and alternative theories of gravity.These cosmological surveys have provided the commu-★ guillermo.reyes@uam.es† savila@ifae.es‡ violetagp@protonmail.com nity with large 3-D maps of galaxies up to  ∼ 1, allowing for a better understanding of the evolution of galaxies.Despite significant progress, these areas of investigation remain open and ongoing.New generation surveys, including Euclid (Laureĳs et al. 2011;Amendola et al. 2018), Dark Energy Spectroscopic Instrument (DESI, Adame et al. 2024), the Nancy Grace Roman Space Telescope (from now Roman, Spergel et al. 2013Spergel et al. , 2015) ) and the 4-metre Multi-Object Spectroscopic Telescope (4MOST, de Jong et al. 2012), are set to map galaxies beyond  ∼ 1 and to increase the density of QSOs at higher redshifts.One of the main tracers beyond  ∼ 1 are galaxies with strong spectral emission lines or emission line galaxies (ELGs).In particular, Euclid utilizes near-infrared grisms to observe galaxies with a strong H  spectral line.eBOSS and DESI target ELGs with a strong [O ii] emission.The properties of H  and [O ii] emitters can be slightly different, due to a different number of lines coming from narrow line regions in AGNs and shocks (e.g.Favole et al. 2024).Nevertheless, these galaxies are dominantly starforming galaxies and results for one type will be broadly adequate for the other.
From both hydrodynamical and semi-analytical models (SAMs) of galaxy formation, we expect the connection between galaxies selected by their star-formation rate and dark matter haloes to be more complex than for stellar-mass selected samples (e.g.Zheng et al. 2005;Orsi & Angulo 2018;Gonzalez-Perez et al. 2020).SAMs of galaxy formation and evolution populate dark matter haloes at early cosmic times with gas and let them evolve with a set of coupled differential equations (e.g.Cole et al. 2000;Croton et al. 2016;Hirschmann et al. 2016;Cora et al. 2018;Lagos et al. 2019).These equations aim to encapsulate the physical processes that are understood to be relevant during the formation and evolution of galaxies, such as the cooling of gas, the formation of stars, the interplay between super massive black holes and the intergalactic medium, etc (see the reviews on modelling galaxies by Baugh 2006;Somerville & Davé 2015;Wechsler & Tinker 2018).SAMs are faster than hydrodynamical simulations (e.g.Schaye et al. 2015;McCarthy et al. 2017;Springel et al. 2018;Pillepich et al. 2018), as they are ran on dark matter only N-body simulations, at the expense of loosing the effect that baryons have over dark matter haloes (Schneider & Teyssier 2015;Aricò et al. 2020).
In this work we use SAGE, the SAM introduced in Croton et al. (2016), to explore how ELGs populate dark matter halos.SAMs have been previously used to produce Euclid and Roman-like H  emitter catalogues that have been used to understand their clustering and relation to their host halos (Merson et al. 2018;Zhai et al. 2019Zhai et al. , 2021b,a;,a;Knebe et al. 2022).Here we use the SAGE run on the UNIT dark-matter only simulation (Chuang et al. 2019), as described and released in the work by Knebe et al. (2022).This simulation uses the fixed and paired technique to increase its effective volume, allowing more precise statistical analyses (Angulo & Pontzen 2016).The results for H  ELGs can be generalised to galaxies selected using other spectral emission lines, so throughout this paper, we will refer to our target galaxies, simply as ELGs.
SAMs can produce model galaxies accounting for a large range of physical phenomena in simulation volumes comparable to those required to interpret observations from current cosmological surveys.Nevertheless, they are still too slow to be able to produce many realisations and they require the construction of merger trees tracing the evolution of halos in fine time slices, which is not available in many simulations due to computing/storing limitations.The Halo Occupation Distribution (HOD) model provides a simplified way to describe the relation between galaxies and haloes and can be constructed from a single halo catalogue.These simplifications makes HOD valid for the target observables they are constructed to match, but sometimes will show limitations when trying to extend it to other observables (e.g.Chaves-Montero et al. 2023).
HOD models are computationally efficient.They have demonstrated their capability to reproduce the observed clustering across a range of galaxies, including the dependency with luminosity and colour (Zehavi et al. 2005;Christodoulou et al. 2012;Carretero et al. 2015).In their basic form, HOD models assume that the mass of haloes determine the average number of galaxies they can host (e.g.Benson et al. 2000;Berlind et al. 2003;Zheng et al. 2005).Catalogues of model galaxies produced with HOD models can be produced fast enough as to choose the best fit to a set of clustering observables using statistical techniques that require hundreds or thousands of realisations in large simulation volumes (e.g.Alam et al. 2021b;Yuan et al. 2023a;Rocher et al. 2023a).HOD models have become a popular tool for producing realistic mock catalogs in large simulation volumes to match the clustering of a target galaxy sample (e.g., in BOSS, DES and eBOSS: Manera et al. 2013;Avila et al. 2018Avila et al. , 2020)).
The objective of this work is to construct an HOD model that accurately reproduces the galaxy clustering of the ELGs modelled by SAGE.In particular, we explore if two of the usual assumptions made by HOD models should be challenged when compared with model ELGs.One of this usual assumptions is that central and satellite galaxies of a certain type are independent events within a halo, a lack of conformity.The other is the modelling of the radial profile of satellite galaxies.
One of the main focus of this paper is to study the effect that galactic conformity has on the clustering of ELGs.The term galactic conformity was first coined by Weinmann et al. (2006) to describe strong correlations between the properties of satellite galaxies and their central galaxies in data from the Sloan Digital Sky Survey (SDSS).A strong 1-halo conformity for ELGs has recently been suggested from the comparison of DESI data with model galaxies produced with different HOD models (Rocher et al. 2023b;Gao et al. 2023).Overall, galactic conformity remains an active research topic and a significant puzzle in the formation and evolution of galaxies.
The second main focus of this work is to explore whether the typical assumptions made by HOD models for the radial profiles of satellite galaxies are upheld.Commonly, HOD models assume that the distribution of satellite galaxies within the dark matter halo follows the distribution of the dark matter itself, or often an NFW (Navarro et al. 1997) or an Einasto (Einasto 1969) profile.We will try different ways to sample these types of profiles and also propose an extension.
The plan of this paper is as follows.In section ( § 2) we introduce UNIT DM simulation.In section ( § 3) we describe the model galaxy sample of reference.First ( § 3.1), we introduce UNITsim-SAGE, how our reference galaxy sample is built and we describe the average halo occupation distribution that we find.We then describe in ( § 3.2) how we build the final sample of reference by applying the shuffling method to the previous model sample.In section ( § 4), we detail the various halo occupation models and properties that we employ to generate mock catalogues.Section ( § 5) discusses conformity, the method we employ to model it and its effect in galaxy clustering.Section ( § 6) is devoted to studying the radial profile of SAGE and how to reproduce it in order to recover an accurate galaxy clustering.Finally, in Section ( § 7), we conclude and discuss our findings and future prospects.

UNIT DM SIMULATION
The UNIT suite is a set of N-body cosmological simulations (Chuang et al. 2019) designed to cover the expected halo masses for emission line galaxies (ELGs), in particular [OII] emitters targeted by DESI (∼ 10 11 ℎ −1 M ⊙ Gonzalez-Perez et al. 2020) and Euclid ELGs (∼ 4 × 10 11 ℎ −1 M ⊙ Cochrane et al. 2017).To reduce the cosmic variance in the UNIT simulations, we employ the Fix and Pair method (Angulo & Pontzen 2016).Following this method, two pairs of simulations are generated with initial conditions with a power spectrum with an amplitude fixed to the expected value, given by the input power spectrum.The initial conditions of one of the simulations from each pair is set with the opposite phases than the other one from the pair (Chuang et al. 2019).These simulations were generated using GADGET2 (Springel 2005), which fully solves the gravitational evolution of the continuous distribution of dark matter.The phasespace halo finder ROCKSTAR (Behroozi et al. 2013a) was used to identify haloes from the 129 existing snapshots.Subsequently, the ConsistentTrees software (Behroozi et al. 2013a) was employed to calculate their merging histories.The software tracks dark matter halos (regions of the universe where gravity has made matter collapse into denser structures) as they evolve over time and move in space, constructing a tree-structure that illustrates how these halos merge together to form larger halos.These merger trees are 'consistent' in the sense that the mass and position of a halo at a given moment in  1. Properties of the three model galaxy catalogues described in section 3. The first column on the left shows the name given to the samples.
In the second column we quote the galaxies per square degree per redshift interval from Pozzetti et al. (2016) for the first two samples and the one we obtain applying the H  flux Euclid-expected limit to the SAGE sample.The third column contains the corresponding numerical volume densities ().The fourth column shows the H  flux limit needed to recover the target  for the Pozzetti et al. models and the flux limit expected for Euclid.Our reference model from subsection 3.2 onwards is Pozzetti nº3 (or the shuffled version of it, see subsection 3.2 ), which will be referred to as SAGE (or SAGE sh , when shuffled).
the tree are informed by the mass and position of its progenitors and its descendants.
We perform our analysis over the simulation snapshot at  = 1.321, which approximately corresponds to the mean redshift expected for Euclid ELGs, 0.9 <  < 1.8.

UNITSIM-SAGE: REFERENCE GALAXY SAMPLE
We aim to provide an improved Halo Occupation Distribution model for ELGs, focusing on conformity and satellite radial profiles.Our reference catalogue of galaxies is generated from the UNITsim-Galaxies catalogues described in Knebe et al. (2022) and released along it 1 .This catalogue is based on the SAGE semi-analytical model of galaxy formation and evolution (Croton et al. 2016) by populating the dark-matter-only UNITsim simulations (section 2).For each model galaxy, spectral emission lines associated with star formation events are obtained with the get_emlines model described in Orsi et al. (2014).One of these spectral lines is the H  , targeted by Euclid and Roman.The spectral emission lines are then attenuated by dust using a Cardelli et al. (1989) law following the code developed in Favole et al. (2020).All these models were implemented in the catalogue in Knebe et al. (2022) and we refer the reader to that publication for a more detailed description.
In order to single out the modelling of both conformity and the satellite radial profiles, we use a shuffled sample, SAGE sh ( § 3.2).To construct this sample, we remove the galaxy assembly bias: the effect that other properties beyond halo mass have on the distribution of galaxies within haloes.

Model ELGs with an Euclid-like number density
In order to produce catalogues of ELGs, we aim to reproduce the mean number density forecasted by Pozzetti et al. (2016) over the entire redshift range observed by Euclid: 0.9 <  < 1.8.For this, we use a simulation snapshot at  = 1.321, which approximately corresponds to the middle of that redshift range.This approach is slightly different to the one taken in Knebe et al. (2022), where we matched the Pozzetti et al. (2016) differential luminosity function, at the redshift of the snapshot.Pozzetti et al. (2016) provides a range of forecasts, with the extreme number densities being  1 = 1.299 × 10 −3 Mpc −3 h 3 and  3 = 6.731 × 10 −4 Mpc −3 h 3 , which are tabulated in the first column of Table 1.These volume number densities, , are obtained from the angular ones,  =  2  ()   , provided by Pozzetti et al. (2016) (third column in Table 1), as follows: where  c () is the comoving distance to the object at redshift .We apply the flux cuts indicated in Table 1 to get the extreme number densities forcasted by Pozzetti et al. (2016).These are well below 2 × 10 −16 ergs s −1 cm −2 , the expected H  flux limit for Euclid.If we apply this cut to the UNITsim-Galaxies catalogue, we obtain a sample with a number density 1.723 × 10 −4 Mpc −3 ℎ 3 , far below those from Pozzetti et al.'s models.The exact shape of the SAGE H  luminosity function is very sensitive to the dust attenuation modelling, in particular at the tails of the distribution, sampled by the flux limits.An alternative approach would be to fine tune the dust attenuation model until we match the Pozzetti et al. (2016) luminosity function as done in Zhai et al. (2019).However, this approach is not clear to be more physical than the one taken here, where we are following the approach described in Knebe et al. (2022).A more physical alternative would be to fit simultaneously the dust attenuation parameters, the emission line modelling and the SAM to different available observations with luminosity functions from different lines or even clustering measurements.However, this is beyond the scope of this paper and deserves and stand-alone study.
In the upper panel of Figure 1, we present the halo mass function of main dark matter haloes and galaxies for the three samples summarised in Table 1.In this figure, we separate the contributions of central (solid curves) and satellite galaxies (dashed curves).The halo mass functions for dark matter haloes follow a typical shape close to a power law, with a change in the slope at the massive end.Central ELGs follow this trend above a mininimum halo mass.The numbers of both central and satellite galaxies decrease rapidly below a minimum halo mass.The number of ELGs is lower than that for haloes for all halo masses.As expected, satellite galaxies populate more massive haloes than centrals.
The average halo occupation distribution (HOD) for model ELGs selected with different number densities is shown in the lower panel of Figure 1.Satellite ELGs follow the typical (truncated) power law seen for model satellite galaxies (e.g.Zheng et al. 2005;Avila et al. 2022).
The average HOD for central model galaxies do not reach unity, as it would do in a complete sample sample of galaxies without an particular type selection.This was expected from previous theoretical works on star-forming and ELGs (e.g.Zheng et al. 2005;Gonzalez-Perez et al. 2018;Zhai et al. 2021b) and also found for DESI ELGs (Rocher et al. 2023b).This implies that we do not expect to find a H  emitter in the center of every halo above a certain mass threshold.In Avila et al. (2020), a similar shape for the mean HOD of centrals galaxies, although with a steeper decay, was described as a asymmetric Gaussian.
From this point onward, we use Pozzetti et al.'s Model 3 by default, labelled as SAGE.This model has been used as the baseline for several Euclid studies, as it is expected to be the most realistic.1).The solid curves correspond to the central galaxies and the dashed curves to the satellite galaxies.The mass function corresponds to the density of haloes/galaxies for a given halo mass divided by the halo mass bin size.To achieve this, histograms of objects in the halo/galaxy catalogues are created as a function of  200c , and they are normalized by their volume (1ℎ −1 Gpc) and the size of the corresponding log halo bin.Bottom: the mean halo occupation distribution as a function of halo mass of the haloes.This is compute as the ratio of the galaxy to halo mass functions.

Evolution with redshift
We now describe how we also prepare the same galaxy sample for different redshifts within the range explored by Euclid, 0.9 <  < 1.8.The goal is to study how the findings for our reference sample are not unique to the particular redshift selected above.For this, we follow the approach by Knebe et al. (2022), where we considered 4 different snapshots in this range ( = 0.99, 1.22, 1.42 and 1.65)   with a number density matched to the differential luminosity function number 3 from Pozzetti et al. (2016).The number density and flux cuts resulting at these four redshifts are listed in Table 2.Note that this procedure is somewhat different to our default  = 1.3 analysis where we are targeting the average number density over the entire redshift range, according to Pozzetti et al. model number 3 (Table 1).
Table 2 shows that the number densities increase by a factor of 4 from the lowest at  = 1.65 to the highest at  = 0.99.The number density for the sample at  = 1.321 (Table 1), agrees with this trend, despite being calculated differently.The variation in number density is smaller than the factor of 7 found among the samples at  = 1.3 described by Table 1.
The average halo occupation distribution (HOD) for ELGs selected with our Pozzetti Model 3 at different redshifts shown in the Figure 2, showing smaller variations of the mean HOD than those seen in Figure 1 at fixed redshift.The typical mass of haloes hosting ELGs increases slightly with redshift, as expected by having selected brighter (in H  ) galaxies to obtain the lower number den-sity predicted by the luminosity function.In these lines, the largest difference in mass is observed for  = 0.987, which also shows the larges difference in number density.

SAGE sh : the shuffled reference galaxy sample
At first order, galaxy clustering at large scales only depends on the mass of the host haloes.However, galaxy clustering can be affected by other halo properties, what is usually known as assembly bias.Assembly bias has been measured from simulated haloes and galaxies in different ways (e.g.Lacerna & Padilla 2011;Contreras et al. 2021;Jiménez et al. 2021).Moreover, there is compelling observational evidence of galaxy assembly bias (Obuljen et al. 2020;Yuan et al. 2021;Wang et al. 2022;Alam et al. 2023).
Properties encapsulating the environment of haloes have been found to be the most relevant secondary property (e.g.Xu et al. 2021).However, in previous articles, one can observe how other internal properties of the halo, such as  max (which is the maximum circular velocity of the halo) or the haloes own concentration, could explain at least part of the assembly bias.However, we attempted to construct an HOD prescription based on  max instead of  halo (à la Figure 1) and we found that it captures less clustering than the  halo based HOD.The way to quantify the effect was to check the goodness in the recovery of the 2-halo term (large scales) clustering of SAGE (see discussion of Figure 3

below).
This work is focused on the exploration of both conformity and the satellite radial profile.To isolate the effect that varying these properties have on clustering we build a reference galaxy sample for which the effect of assembly bias is removed, SAGE sh .We can achieve this by shuffling all galaxies within a given halo to the position of another halo of similar mass.This technique was first proposed in Croton et al. (2007).In this proccess, the relative positions (and velocities) of galaxies to the center of a halo as well as internal properties of galaxies are preserved.It is worth noting that the bin size for shuffling coincides with the same binning used to calculate the mean galaxy occupation value (Figure 1), here the range of the mass is 10.5 < log 10 (M 200c /M ⊙ h −1 ) < 14.5 with a binning of Δlog = 0.057.
Throughout this paper we quantify the clustering by calculating the real-space two-point correlation function, , with the publicly available code Cute (Alonso 2012)2 with logarithmic binning, where the number of bins is equal to 80, the bins in r per decade is 8 and the  max is 140ℎ −1 Mpc.The clustering of model ELGs before and after the shuffling process is shown in Figure 3.By construction these two agree for the 1-halo term contribution to the clustering (pairs of galaxies within the same halo).From ∼ 0.2Mpc/ℎ, the clustering of the SAGE and SAGE sh ELGs start to differ, and stabilises at larger scales to a difference of about 10 per cent3 .This difference in the scales dominated by the 2-halo term (pairs of galaxies from different haloes) is due to the history of formation and environments of dark matter haloes, the assembly bias.While for scales smaller than 0.22, the clustering is correctly repopulated, that is, the shuffling does not affect the 1-halo.Note that the shuffling procedure does not alter the two properties we are investigating here, the 1-halo conformity and the distribution of satellite galaxies within haloes as, by construction, the positions and properties of galaxies within haloes are preserved.
In what follows we attempt to reproduce clustering of the shuffled UNITsim-SAGE H  catalogue, that we label in short SAGE sh .In Figure 3 we can see our starting point, labeled as Vanilla HOD.
In that figure, we can also see the model proposed (default, see subsection 4.3) after the improvements developed in section 5 and section 6.We see that the gain in recovering our reference clustering from SAGE sh is very remarkable, with our default model closely following the clustering of SAGE sh .We describe the main properties of these two models in the next section.

HALO OCCUPATION DISTRIBUTION MODEL
In this section we summarise the halo occupation distribution (HOD) model we use to populate with SAGE sh -like galaxies the UNIT dark matter only simulation.Halo Occupation Distribution (HOD) models are widely used to assign galaxies of different types to dark matter haloes (e.g.Avila et al. 2020).In their basic form, the mass of the halo determines the average number of galaxies that it hosts (e.g.Benson et al. 2000;Berlind et al. 2003;Zheng et al. 2005).Central and satellite galaxies are modelled separately in HOD models.In order to implement an HOD galaxy assignment prescription, we first need to choose the shape of the mean HOD ( § 4.1.1),then a probability distribution function determines if a halo contains or not a central galaxy and how many satellites ( § 4.1.3).The radial positions ( § 4.1.4)and velocities of satellite galaxies within haloes are chosen next.In this work, we have disregarded the velocities of satellites as our focus is solely on the two-point correlation function statistic in real space.
We start with a vanilla HOD model ( § 4.2), with assumptions commonly employed in the literature and propose a new model that better fits the characteristics of the model ELGs from SAGE sh , our default HOD model ( § 4.3).The default model incorporates 1-halo conformity and utilizes the analytical function proposed in subsection 6.5 for the radial profile of our satellite galaxy distribution.The clustering of mock galaxies produced with these two models, are compared to the SAGE and SAGE sh in Figure 3.This Figure shows how the vanilla HOD model departs from the expectation of SAGE sh ELGs for the 1-halo term.We see that this model under-predicts the clustering by ∼ 20% at 0.3Mpcℎ −1 and over-predicts it by more than 50% at 0.1Mpcℎ −1 .On the other hand, the default model reproduces much better the galaxy clustering at small scales.

Ingredients of our HOD models
Below, we detail the different aspects that make up the HOD models we use for our analysis.We do not incorporate the velocity profile of the satellites, in this study, as we focus only on the clustering in real space.

Mean HOD shape
For this work we need to know the expected number of central and satellite galaxies within a halo of mass  is represented by ⟨ cen ()⟩ and ⟨ sat ()⟩, respectively.The Figure 1 (bottom panel) illustrates the outcomes obtained for the halo occupation function as a function of halo mass, noting again that we will focus in model 3.As mentioned in subsection 3.1, to obtain the corresponding mean values, we first count the number of H  galaxies of each type that we have in a series of mass intervals and determine the average occupation values by dividing by the number of haloes in each interval.These mass intervals range from 10.5 to 14.8 in terms of log(M).
In this work, we do not fit a smooth curve to the measured mean HOD, but simply use it as measured.This is order to have something as close as possible to SAGE and avoid having other sources of contribution to differences in the clustering and focusing on the main properties of study in this work (conformity and radial profiles).When applying our HOD model to a halo catalogue, for each main halo, we read its mass, which will fall in one of the mass-bins defined in our HOD.We then assign to this halo a expected number of ⟨ cen ⟩ central galaxies and ⟨ sat ⟩ satellite galaxies, according to the mass bin.In order to sample the mean HOD, for each main halo in our simulation Nevertheless, the latter one (⟨ sat ⟩) will be modified by a factor ( 1 or  2 , see section 5) whenever we are considering conformity.

Conformity
In this work we define the 1-halo conformity, or just conformity, as considering that central and satellite galaxies are not independent events for the HOD model.Conformity turns out to be of great relevance for recovering the target clustering of SAGE sh ELGs.
Reproducing the count of satellite SAGE sh galaxies, both with and without a central galaxy of the same type, requires to include 1-halo conformity in our HOD models.In practice, this implies changing the ⟨ sat ⟩, depending on whether we have assigned a central (H  ) galaxy or not.In section 5, we explain in detail the concept of conformity and how we implement it.

Galaxy probability distribution function
The probability distribution function (PDF) describes the sampling from the mean number of galaxies ⟨ i ⟩ to a specific realization  i in a given halo mass, ( i |⟨ i ⟩).In this work, we fix the PDF for both central and satellite galaxies.
For central galaxies, the mean number of galaxies,  cen , can only take the values of 0 or 1, following the Bernoulli distribution.
For satellite galaxies, we assume the PDF of a Poisson distribution.This is the most common assumption for HOD models in the literature (e.g.Zheng et al. 2005;Rocher et al. 2023a).However, previous theoretical models have shown that the PDF of satellite ELGs might be better described by a non-Poissonian distribution.(Jiménez et al. 2019)  SAGE satellite ELGs are well described by a Poisson distribution.We have reached this conclusion by counting in Figure 4 the number of times that a halo hosts a given number of satellite galaxies ( sat ) in our HOD models compared to those measured from the SAGE sh catalogue.
By assuming a Poisson distribution, we are able to recover very well, within 14 , the number of haloes with no satellites,  sat = 0 (not shown in Figure 4), and up to two satellites.For haloes with 3 or 4 satellites, we find slightly larger differences, but still within 1.5.
Figure 4 shows that larger differences appear for higher  sat values, which are very rare.In particular, haloes with  sat = 5 appear to be poorly described by a Poisson distribution.However, in the SAGE sh catalogue we only find one halo with  sat = 5.For our HOD galaxy catalogues we find no halo with  sat = 5 in 90% of the cases run and a single halo in 10% of the cases.Thus, we consider the differences found for massive haloes to be statistically negligible.Even if they were not, we expect such small differences to have a negligible impact on the clustering, compared to the effect of assuming conformity or a different radial profile for satellite galaxies.Assuming a Poisson distribution is not expected to affect the conclusions drawn in this paper.

Radial profile for satellite galaxies
While central galaxies are always located at the center of their host halo, a model is needed to place satellite galaxies within haloes.In this work we explore several different prescriptions for positioning satellite ELGs within their host haloes that will be discussed in detail in section 6.The prescriptsions we have considered in this work are: • Sampling individual halo profiles assuming an NFW given by the individual halo concentration, as detailed in subsection 6.1.
• Adjusting the NFW or Einasto curves to the / s -stacked profile from all our H  galaxies, as detailed in subsection 6.2.
• Adjusting the NFW or Einasto curves to the / vir -stacked profile from all our H  galaxies, as detailed in subsection 6.3.
• The inherent  () SAGE sh distribution profile: We make use of the discrete (in a histogram) distribution profile of satellite galaxies in SAGE sh as a function of the distance  and sample from there.
• Modified NFW profile.We introduce a generalized version of the NFW density profile that effectively models the stacked profile of all our H  galaxies as a function of .We find an excellent fit with our modified NFW curve (Equation 14).In subsection 6.5, we describe in detail how we use this continuous function to accurately fit the distribution profile obtained from SAGE sh .

Vanilla HOD model
The benchmark model for this work, our Vanilla HOD, makes the following assumptions: • Mean HOD as described in subsubsection 4.1.1 as read-off from our SAGE-UNITsim catalogues.
• Central and satellite H  galaxies are independent events (no conformity).
• Poisson distribution for the satellite PDF.
• The radial profiles for satellite galaxies are obtained sampling a NFW profile given the concentration of each halo.
Whereas the first point differs slightly from the most common practice, which would rely on assuming an analytical formula (see discussion in subsubsection 4.1.1 and in subsection 3.1), the rest of points matches what is commonly assumed to generate HOD catalogues (e.g.Avila et al. 2020).As previously noted, the Vanilla HOD model produces a clustering at small scales very different from that for SAGE sh ELGs, as shown in Figure 3.

Default HOD model
The default HOD model is our proposal to best fit the clustering of SAGE sh ELGs ( § 3).As a summary, these are the choices made for the default HOD model: • Mean HOD as described in subsubsection 4.1.1:as read-off from our SAGE-UNITsim catalogues.
• We assume conformity, i.e. central and satellite ELGs are not modelled as independent events.We implement the conformity as a global factor independent of mass.This factor modifies the average random numbers of satellite galaxies, ⟨ sat ⟩, depending on whether the central is another H  selected galaxy or not, to match the numbers found in SAGE sh .
• Poisson distribution for the satellite PDF.
• Satellite galaxies are placed in haloes following an analytical function that encapsulates the number of model ELGs found as a function of their relative distance to the center of the halo.This function is presented in subsection 6.5, and can be considered as an extension of the NFW profile equation.
Figure 3 shows the improvement achieved at small scales when using our proposed model, with respect to utilizing a vanilla HOD.
The HOD default model summarised in the previous paragraph was obtained from one of the 4 existing SAGE-UNITsim boxes.The level of noise we find in the clustering when applying the default HOD model to the other simulation boxes, is similar to that found for the clustering of galaxies modelled by running SAGE on different simulation boxes.Hence, we conclude that our default-HOD model works well for the other simulation boxes and that any noise in our inferred parameters would not affect our conclusions.
Our analysis is done, by default, for the number density corresponding to the Pozzetti model number 3 evaluated over the redshift range 0.9 <  < 1.8, as indicated in Table 1, and using the simulation snapshot corresponding to  = 1.3.However, we also explore how our conclusions might change when a different number density or redshift is assumed when analysing the ELGs.For that, we construct alternative SAGE samples, as described in subsection 3.1 and apply the same shuffling algorithm to them.We then apply the same steps to construct our default HOD as summarised in the bullet points above, and described in detail in the next sections.We then compare the clustering of the HOD-default model to that of SAGE sh for alterna-Figure 6. Top: Mean number of satellite galaxies per halo as a function of halo mass: total (orange), with a companion central galaxy of the same type (blue), and without these centrals (green).Middle: The conformity parameter  1 (  ) per mass bin (Equation 4) and the global conformity factor  1,glob (Equation 6), measured for SAGE sh ELGs, as a function of halo mass.These parameters represent the ratio between the mean number of total satellite galaxies with a central galaxy (shown in blue in the top panel) with respect to the corresponding number of haloes (see Equation 2).Bottom: Similar to the middle panel, but for  2 (  ) (Equation 5) and the global factor,  2,glob (Equation 7).In this case, these parameters represent the ratio between the mean number of total satellite galaxies without a central galaxy (shown in green in the top panel) with the number of haloes (see Equation 3).
tive number densities or redshifts in Figure 5.We find that the level of agreement between our default HOD to these alternative reference samples is as good as it was for our  = 1.3 Pozzetti model 3 sample.We find sightly noisier results for these cases, but this is partly due to only using 20 random realisations, and also due to having larger error bars when having lower number densities.

MODELLING CONFORMITY
Usually, HOD models assume satellite and central galaxies of a given type to be independent events, i.e., the mean occupation of satellite galaxies does not depend whether or not the central is of the same type.Here we define the 1-halo conformity as deviations from independency, i.e., the mean occupation of satellite ELGs depend on having or not a central ELG in that halo.
The phenomenon of galactic conformity refers to the correlation between the properties of neighbouring galaxies (Weinmann et al. 2006).This is thought to happen due to galaxies having a common evolution as baryonic matter collapses within dark matter haloes to form galaxies.Observed galaxies exhibit variations in mass, shape, star formation activity, colors, and ages, and these properties tend to be correlated.It is important to differentiate between one-halo conformity, which measures conformity at small separations within a single halo, and two-halo conformity, which measures conformity at larger separations between central galaxies and neighboring galaxies in adjacent haloes.
Conformity has been detected in simulations (e.g Lacerna et al. 2018;Ayromlou et al. 2023) and there seems to be growing evidence of it from observations (e.g.Rocher et al. 2023a;Gao et al. 2023).Galactic conformity continues to be an active area of study and remains a significant puzzle in our understanding of galaxy formation and evolution.
We define the mean value of satellite galaxies with, ⟨ sat | cen = 1⟩, and without a central galaxy, ⟨ sat | cen = 0⟩, as: where  sat,wc and  sat,w/oc denote the number of satellites with or without a central ELG.Note that the total number of satellite galaxies,  sat , is the sum of these two values,  sat =  sat,wc +  sat,w/oc .The total number of dark matter haloes with and without a central H  galaxy are  h,wc and  h,w/oc , respectively.⟨ sat ⟩ is the average value of satellite galaxies in all types of haloes (with and without central galaxies of the given type). 1 and  2 are the conformity factors.
When there is no conformity,  1 =  2 = 1, so the mean number of satellite galaxies is independent of having a central galaxy of the same type, ⟨ sat | cen = 0, 1⟩ = ⟨ sat ⟩.
When there is conformity,  1 ≠ 1 ≠  2 .To model the 1-halo conformity, the average number of satellites should depend on the presence or absence of a central galaxy of the same type.
To investigate conformity, we initially examine whether the average number of satellite galaxies, with or without a central galaxy, constitutes two completely independent events.To achieve this, we calculate the mean values ⟨ sat | cen = 1⟩ and ⟨ sat | cen = 0⟩ for our SAGE galaxy sample and compare them to the mean value of ⟨ sat ⟩.In the top panel of Figure 6, we demonstrate that the total number of satellites with and without a central galaxy in our SAGE sample is not reproduced when assuming independence.As a result, we can infer that introducing the concept of conformity is necessary to accurately reproduce these differences.Furthermore, it enables us to demonstrate the crucial role of conformity in replicating the distribution of satellites in SAGE, thus marking it as one of the primary novel findings of our article.It is worth noting that in other studies, such as Lacerna et al. (2018) and Rocher et al. (2023b), they have also found the need to include the conformity property.

Mass dependent conformity
For a given bin of halo masses,   and using the above expressions, it is possible to define  1 (  ) and  2 (  ) as a function of quantities that we can measure from the reference galaxy catalogue: The above conformity factors,  1 (  ) and  2 (  ), can be computed using these equations by inserting the galaxy/halo counts measured in our simulation.These factors are then used in a HOD prescription for each mass bin.In practice, what we need is to first sample the Bernoulli PDF for centrals in order to determine whether we have  cen = 0 or  cen = 1 in a given halo (see PDF description in subsubsection 4.1.3)and then modify ⟨ sat ⟩ according to Equation 3 or Equation 2, respectively.Subsequently, we would sample the Poisson distribution (again subsubsection 4.1.3)from the modified ⟨ sat ⟩ (= ⟨ sat | cen ⟩) and continue with the rest of steps of the HOD (subsection 4.1).
In the middle panel of Figure 6, we can observe the conformity parameter  1 () (Equation 4) for our SAGE sh ELGs sample as a function of halo mass.These parameters represent the relationship between the average number of total satellite galaxies associated with a central galaxy and the overall average number of satellites, depicted in blue for  1 ().This comparison indicates the presence of conformity, i.e., how much it deviates from the scenario where the existence of a satellite galaxy and a central galaxy are two independent events (that would be represented by  1 () = 1).Then, in the bottom panel, similar analyses are performed, but this time for  2 () (Equation 5).In this case, these parameters represent the ratio between the mean number of total satellite galaxies without a central galaxy and the overall average number of satellites, shown in green for  2 ().

Global conformity
As an alternative, we consider global conformity factors,  1,glob and  2,glob , that are assumed constant for a given reference galaxy catalogue and to be the same for all halo masses. 1,glob and  2,glob are then constants in Equation 2 and Equation 3. In this case, the global conformity factors are computed as: where  indicates each of the different mass bins considered.The somewhat complicated expressions above are a consequence of populating the mean HOD ⟨ sat ⟩ in bins of halo mass, hence, with Equation 3 and Equation 2 also applied in mass bins.This implies that we can not compute the global factors,  1,glob and  2,glob , by simply dropping the mass dependence from Equation 4 and Equation 5. Instead, we need to compute them in individual bins and then do the sum indicated in Equation 6 and Equation 7.
The middle panel of Figure 6 shows the global conformity factor  1,glob (Equation 6) for our SAGE sh ELGs sample.And the bottom panel the other global factor,  2,glob (Equation 7).The global conformity factors,  1,glob and  2,glob , roughly correspond to the respective mean values of the mass-dependent conformity ones,  1 () and  2 ().We find  1,glob = 0.708 and  2,glob = 1.038, implying that we detect conformity of the order of ∼ 30%.

Variations with number density and redshift
Since we will adopt global conformity in our default model, we now explore how the values reported above change when we choose a different reference sample by adopting a different reference number density (Table 1) or a different redshift (Table 2).These values are summarised in Table 3.
The global conformity factors remain remarkably constant for samples of ELGs with different number densities.The variations are below 20% for  1,glob and below 1% for  2,glob .It will be interesting  6) and  2,glob (Equation 7) for galaxies produced with HOD models at different redshifts and with number densities obtained either in a range of redshifts, first three rows (Table 1) or from a differential (d)luminosity function (Table 2).Our default choice is the sample at  = 1.3 with number density matching that of Pozzetti model number 3 in a range of redshifts.This is indicated by using bold face in the table.
to explore in the future if this is connected with conformity depending weakly with environment, in line with the proposal of spectral emission lines coming from star forming regions being triggered by merging events (Yuan et al. 2023b).
The conformity global factors,  1,glob and  2,glob , obtained for samples at different redshifts are summarised in Table 3.We find variations below 5% for  1,glob and below 1% for  2,glob .The results suggest that there is no evolution in conformity in the explored redshift range, 0.987 <  < 1.650.

Clustering
After computing the values of  1 and  2 , either as a global factor or as a function of mass, we apply Equations 2 & 3 to our HOD model, based on either the global conformity or the mass-dependent conformity, respectively.
Subsequently, we assess their clustering and compare, in Figure 7, the results to a scenario without conformity (orange dotted dashed).The dot-dashed green line illustrates the scenario where  1 and  2 are dependent on the mass of the bins, while the solid red line corresponds to the global factor conformity.First of all, it is noticeable that in the case of independence, the largest deviation from the reference sample in terms of clustering occurs around  ∼ 0.1, Mpc, ℎ −1 , resulting in an approximately ∼ 60% difference.However, by incorporating conformity, we mitigate this difference by ∼ 40% at this scale.Moreover, we observe that both implementations of conformity yield similar outcomes, significantly superior to the scenario without considering conformity.The clustering is found nearly indistinguishable between the two models proposed for conformity.Consequently, for the sake of simplicity, we opt for using global conformity factors as our default modeling approach.It is important to emphasize that the results regarding the necessity of including the conformity property align with what has been found in other studies, as mentioned in Lacerna et al. (2018) or Rocher et al. (2023b).In the latter, a mean satellite occupation function was obtained that aligns with physically motivated models of [O ii] emitters only if conformity between centrals and satellites is introduced.This implies that the occupation of satellites is conditioned by the presence of central galaxies of the same type.It turns out that this aligns with what we obtain in this article, while using very different angles.

THE RADIAL PROFILE OF SATELLITE GALAXIES
The distribution of substructure in dark matter haloes has been accurately described by either a Navarro-Frenk-White profile (NFW, Navarro et al. 1997) or an Einasto profile (Einasto 1969).However, the distribution of galaxies within haloes could be biased with respect to that of dark matter (e.g Yuan et al. 2023a;Rocher et al. 2023a;Hadzhiyska et al. 2023).
Central galaxies are placed in the center of potential of the dark matter halo, so we focus here on satellite galaxies and how they are distributed with respect to their center of halo.
In this work, we first study the spatial distribution of SAGE sh ELGs that are satellite galaxies within dark matter haloes.Then we explore how to best use the measured distribution to inform HOD models with the aim to reproduce the SAGE sh 1-halo term clustering.
The radial profile of SAGE sh satellite ELGs is shown in Figure 8 as filled circles.This plot shows the number of satellite galaxies as a function of their distance to the central galaxy,  = |ì  | = |ì  sat − ì  h |.Note that we have verified the existence of a displacement between the position of the central galaxy and the center of the halo.However, we have studied these differences, and they are smaller than 1 × 10 −5 for all central galaxies, so we can assume that ì  cen = ì  h .Satellite galaxies in SAGE sh are preferentially placed at a distance of ∼ 0.2Mpcℎ −1 from their central galaxy.
In Figure 8, we present the radial profile as a function of the relative distance to the halo, , because we have found that this approach provides an HOD model clustering closer to the reference SAGE sh sample.This result is shown in 9.It is clear from this figure that the 1-halo clustering is better recovered when the radial profile as a function of  is best matched, instead of / s or / vir .This is a surprising result as haloes of different sizes and masses are being mixed in the radial profile shown in Figure 8, as a function of .
Typically, HOD models place satellite galaxies in dark matter haloes assuming the NFW profile given the concentration (or mass) of the halo (e.g.Zheng et al. 2005).This is typically done halo-byhalo.An other commonly used option is to use the positions of dark matter particles (e.g.Avila et al. 2020;Rocher et al. 2023a).
Our aim is to have an HOD model with galaxies clustered as close as possible to that from SAGE sh , in small scales.To achieve this, we explore HOD models with different ways of placing satellite galaxies within haloes: (i) NFW profiles using the concentration from each halo ( § 6.1, NFW halo-by-halo); (ii) NFW and Einasto curves fitted to the / s -stacked profiles ( § 6.2); (iii) NFW and Einasto curves fitted to the / vir -stacked profile ( § 6.3); (iv) radial profile from SAGE sh , as a function of , using it either directly as a discrete distribution function or to fit an analytical function ( § 6.5).
Figure 8 presents the radial distribution of galaxies modelled with all the approaches described above, together with the SAGE sh distribution.From this figure, it is clear that the first three approaches cannot reproduce the radial profile of satellite galaxies in SAGE sh .A similar conclusion was reached by Qin et al. (2023) also using the SAGE SAM to study the distribution of galaxies.In their work they applied their proposed extended profile to SSDS DR10 group catalogue galaxies.
The different HOD approaches described above are implemented following a similar method.For each halo, we generate a number of random values equal to the number of satellite galaxies assigned to that halo, which we then transform to a distance  using an inverse cumulative distribution sampling of the profile (in terms of  () ∝  2 ()).This distance  can be / vir , / s or  depending on the chosen independent variable in each prescription.The first two distances can be transformed into , given the halo  vir or   .Once we have , we place the satellite galaxies by randomly sampling the angular position with respect to the halo center.

NFW halo-by-halo: Sampling individual halo profiles
HOD models usually place satellite galaxies within haloes following the distribution of dark matter itself (e.g.Avila et al. 2018).This is often described by a Navarro-Frenk-White (NFW) profile, which was derived from dark matter only N-body simulations (Navarro et al. 1997).The NFW profile describes the density of dark matter,  NFW , as a function of distance to the center of a halo, : where  s is the scale radius provided by Rockstar, and  s is the mass density at this radius.The halo concentration, , relates the virial and scale radii of haloes,  vir =   s .Given a halo concentration, the NFW can be sampled in a halo-byhalo basis.We have followed this method to construct catalogues of ELGs.In Figure 8 we compare the global radial profile for satellite galaxies from this catalogue (dark blue solid line, NFW halo-byhalo) with that from SAGE sh .The differences are large, in particular below 0.1ℎ −1 Mpc.At these scales, we obtain differences over 70% compared to our SAGE sh sample.The difference increases for smaller values of .
The NFW halo-by-halo clearly fails to reproduce the global radial  14) to fit the radial distribution of SAGE sh satellite galaxies, is shown in red.Five modifications of this model are also shown in the plot: a model assuming a NFW distribution given the concentration of each halo (solid blue line); models assuming a NFW density profile fitted to reprocude the average SAGE sh one as a function of / vir (dash-dotted blue line) and / s (dashed blue line); models assuming an Einasto profile fitted to reproduce the average SAGE sh one as a function of / vir (dash-dotted purple line) and / s (dashed purple line).Note that the binning is linear but we have plot in logarithmic scale to properly appreciate the change of slopes.
profile of SAGE sh satellite galaxies.This difference results in a very strong clustering at scales below ∼ 0.2ℎ −1 Mpc.The NFW halo-byhalo results in galaxy clustering over a factor of 2 above the clustering measured for SAGE sh galaxies at scales below 0.1ℎ −1 Mpc (Figure 9).This is of great importance, as the NFW halo-by-halo approach is very commonly used one in the literature to populate halo catalogues from large simulations.

𝑟/𝑅 s -stacked profiles
We have fitted the SAGE sh / s -stacked satellite profile with single NFW (Navarro et al. 1997) and Einasto (Einasto 1969) parametrisations.To construct the stacked profile from the SAGE sh sample we start by converting the counts of satellite galaxies as a function of distance to the central galaxy,  (), into densities, with  =  ()/(4 2 d).We use a binning of d = 0.1ℎ −1 Mpc.
In the previous section, we introduced the NFW parametrisation of radial profiles (Equation 8).The Einasto parametrisation can be written as a function of the scale radius,  s , as follows: where  s is the mass density at  =  s and  determines the curvature of the profile.14) to fit the radial distribution of SAGE sh satellite galaxies, is shown in red.Four modifications of this model are also shown in the plot, with similar colours to those shown in Figure 8. Error bars are calculated as the standard deviation of 100 realisations with changing seeds for the generators of random numbers.
When fitting the SAGE sh / s -stacked profiles we aim to recover the total number of satellite galaxies.To achieve this, the NFW (Equation 8) and Einasto (Equation 9) parametrisations must be integrated up to  =  vir , which is considered the edge of the halo.Thus, we ensure the total number of satellite galaxies,  sat , is preserved by imposing: When fitting the NFW and Einasto parametrisations, we find the halo concentration,  =  vir / s , that gives the closest value of  sat to the SAGE sh one.Note that the concentration appears as an integration limit in Equation 10.For both parametrisations,  0 provides the normalisation of the fit to the SAGE sh / s -stacked satellite profile.The NFW profile has the halo concentration as the only free parameter, while the Einasto one also has  as a second free parameter.
For performing the fit to the NFW profile, we allow the halo concentration to vary from 0.1 to 10.This truncation is chosen based on the scarcity of satellite galaxies,  ≲ 1, beyond this limit.This choice is conservative, compared with the results from Ramakrishnan et al. (2019) where they used dark matter simulations to measure the distribution of different halo properties.In their study, the halo concentration is found between 0 and ∼ 30.
To determine the parameters that yield the best fit to the SAGE sh / s -stacked profiles, we iterate through different values of halo concentration,   , which gives the upper limit of the integration in Equation 10.The parameters yielding the lowest  2 per data point (see below) value are selected as the optimal fit.The  2 value is determined using the following expression: where we assume a Poisson error on the counts, which translates to an uncertainty on the density profile of We perform the integration within Δ(/ s ) = 0.05 bins.Thus, for smaller   values, the number of data points used to evaluate the functions will be fewer compared to larger ones, due to the profile cutoff set by this integration limit (see Equation 10).To address this differences, we normalize the  2 value dividing it by the total number of bins within each interval.We minimize this normalised quantity.
In the case of an Einasto profile, a combination of   and   values are sampled from a grid.For   , we follow the same approaches as for NFW, and consider   values from 0.1 to 2, incremented by Δ = 0.01 Table 4 presents the best fit parameters to the SAGE sh / s -stacked satellite profile for both NFW and Einasto profiles.The left panel of Figure 10 show the best fit profiles compared to the ELGs.We can observe that the Einasto density profile fits reasonably well, whereas the NFW profile is far from being a good fit.
Finally, we generate the corresponding mocks using these fitted /  profiles.In Figure 8, we can observe the profiles in terms of  (dashed violet and cyan).We find that even when utilizing our fitted / s -stacked profiles, we still achieve a poor  () curve, akin to the one obtained through the halo-by-halo sampling approach (blue curve).This is particularly striking for the Einasto profile, which showed a good fit in the stacked (/  ) profile.

𝑟/𝑅 vir -stacked profiles
In order to address the next scenario, we begin by rewriting equations 8 and 9 in terms of the virial radius of the haloes with  vir =   s .
The viral radius is defined as the radius that contains a mass with a virial overdensity, defined by  from Bryan & Norman (1998) and internally computed by Rockstar.It is considered the size of the halo, hence objects beyond this radius are considered outside the halo.By now stacking the profile in terms of / vir we are encapsulating the information of the size of each halo (instead of the concentration as in subsection 6.2).
Following a similar procedure to the previous case, we observe that both classical models have the same number of free parameters and normalization parameters for fitting using r/ vir .However, in this situation, the free parameter C does not appear as an integration limit, but rather as a component within the density profile functions for both models.To ensure the conservation of the total number of satellites, we integrate these theoretical expressions up to r =  vir , leading to the following equivalences: It is important to note that, in this case, when performing the fit, we truncate the density profile obtained from our sample of satellite galaxies up to  =  vir , as there are very few satellite galaxies beyond this value.Technically, there should not be any.By definition, there are no subhaloes outside the virial radius, but there are some exceptions, likely due to how the merger trees are constructed with ConsistentTrees (Behroozi et al. 2013b).
For the fitting, we use the same procedure as in the previous section, with a few differences.Now, C does not appear as an integration limit but is explicit present in the functions of both models.Likewise, in this case we always evaluate the number of data points from our histograms (0 ≤  ≤  vir with Δ(/ vir ) = 0.01), eliminating the need for normalization by the total number of bins.
In the right panel of Figure 10, we observe the fittings achieved by the parameter set that minimizes the  2 of the classical models (Einasto and NFW) with respect to our sample of SAGE satellite galaxies, when stacking the profiles on the variable / vir .The resulting parameters for both models are shown in Table 4.For this case, we can confirm that a decent qualitative fit has been achieved for both the NFW and Einasto profiles, unlike the / s case.However, the Einasto profile still provides a better fit, as expected, given that it includes one more free parameter.
It is worth noting that even though both models share the same physical parameters (  , , ), the values obtained differ when using / s or / vir as our variable (see Table 4).These differences stem from the lack of a direct correlation between the variables  s and  vir , with a Pearson correlation coefficient of only  R vir ,R s = 0.36.As a result, we find a great dispersion between these variables, between these and the concentration  and also between all those quantities and the  found in the ELGs.This non-univocal relation between the variables considered, makes possible the differences observed in the goodness of fit and best fit parameters when considering a fit to an / S -stacked profile or an / vir -stacked profile.
This level of dispersion is consistent with what has been previously presented in Salvador-Solé et al. (2023).In the aforementioned article, various models with distinct simulations and redshift values, showing an existence of a concentration dispersion in relation to the mass of haloes.The resulting  −  200 diagrams can be extrapolated using the dispersion we discovered for the variables  s and  vir , which are correlated with mass.
In Figure 8, we can observe the profiles as a function of distance () that we have obtained from the generated mocks using the average fits of the NFW and Einasto density profiles in terms of / vir , as described in this subsection.As observed, both Einasto and NFW Figure 10.Density profiles of the SAGE sh ELGs as a function of / s , left, and / vir , right, in black, together with fits using either NFW, in blue, or Einasto profiles, in purple.In the case of density profiles as a function of / s , the average concentration, ⟨ ⟩, is a free parameter that appears as an integration limit (see Equation 13).We show as solid lines the fit up to ⟨ ⟩, beyond this value we show dashed lines.Note that in this case we are effectively averaging different parts of haloes with different sizes.models show better results when the fits are performed in terms of / vir (dotted dashed) compared to when we use / s (dashed), in comparison with our reference, SAGE (black dots).It is also possible to appreciate a better adjustment of the Einasto model with respect to the NFW model, at least on the low- side, in line with the findings in Figure 10.In light of these findings, when studying the clustering of the different samples, we will only show results of the / vir -fitted classical profiles (and not the / S ones), to avoid overcrowding our figure.
The differences observed in the profile as a function of distance  () translate to differences in the observed galaxy clustering.These differences are reflected in Figure 9, where a strong clustering is observed at scales below ∼ 0.1ℎ −1 Mpc.This difference reaches almost ∼ 40% for both the NFW and Einasto cases at  ∼ 0.1ℎ −1 Mpc.Furthermore, we can confirm that the Einasto fit proves to be superior to the NFW concerning small scales (< 0.1ℎ −1 Mpc), consistent with the results obtained earlier in this subsection.

6.4
The inherent SAGE sh distribution  SAGE sh () profile.
In order to achieve a better fit on the small scale in terms of clustering compared to the various prescriptions discussed in the previous subsections, we make use of the inherent SAGE sh distribution    () profile.This allows us to determine the discrete distribution profile (in a histogram) of our sample of satellite galaxies from SAGE sh .We use linear binning with a bin size of Δ = 0.01ℎ −1 Mpc.We then use this histogram as a discrete distribution function to generate new catalogs of satellite galaxies with their respective positions.In Figure 9, we can see the clustering we obtain for this case (gold curve).Where it can be observed that it corresponds to the best results in terms of fitting the small scale, below  = 1ℎ −1 Mpc.It is worth noting that for  ∼ 0.1ℎ −1 Mpc, this difference is below 25%.

Extending NFW for an average 𝜌(𝑟)profile.
In this section, we present an analytical expression that describes the SAGE sh profile with greater accuracy.Since the previous prescriptions did not perform adequately in reproducing the radial profile ( ()) of our SAGE sh satellite sample, we now attempt to directly fit the profile based on the variable .This variable is the most closely related one to observable quantities and during our study we have found that the profiles in  are a better indicator of the measured clustering.For example, as we see in subsection 6.4, when we use the  () measured directly from SAGE sh , we achieved good results in terms of clustering.Motivated by this, we now perform an analytical fit to  () that may make more portable our results to other models and easily implementable in different simulations.After some search, we found that the following analytical function describes well the  () profile of SAGE sh : with  0 = 3928.273,  0 = 0.34ℎ −1 Mpc,  = 1.23,  = 3.19 and  = −2.1.Note that these parameters are free, and this case is applicable to our binning in Δ = 0.01, thus the normalization would change with the binning.
In Figure 8 it is evident that our implemented analytical function (solid red line) offers a better fit to describe the SAGE profile compared to all the other used prescriptions.
The derived analytic expression can be interpreted as a generalization of the NFW density profile: where  0 =  0 /16 2 0 .When  = 3,  = 1 and  = −2, the expresion above is reduced to the classical NFW profile (8).In this case, the free parameter  0 would correspond to the   variable in the NFW density profile, and our definition of  0 with the   variable.
Figure 9 shows how reproducing the global radial profile of SAGE sh satellite galaxies using an analytical extension of the NFW density profile yields similar results to those we obtain if we use the histogram of the distribution profile of satellite galaxies in SAGE sh (gold curve).For small scales, below  = 1ℎ −1 Mpc, our analytical expression (red curve) achieves a clustering that reproduces that of SAGE sh within ∼ 25%.

Variations with number density and redshift
Since we will adopt this extended NFW profile in our default model, we now explore how this model performs when we choose a different reference sample by adopting a different reference number density (Table 1) or a different redshift (Table 2).
In Figure 11 we compare the radial profile of satellites from the three SAGE sh ELG samples at  = 1.3 with different number densities (Table 1).The radial profiles for the three SAGE sh samples exhibit the same shape with varying normalisation, as expected from the change in number density.At  = 0.21Mpcℎ −1 , the value associated with the maximum number of counts for our reference sample, the count number increases by a factor of 2.19 from the reference sample to that with the highest density.This factor practically coincides with the ratio of total satellite galaxies between both samples, 2.04.The count number increases by a factor of 4.15 from the sample with the lowest density to the reference one; approaching the value corresponding to the ratio of total satellite galaxies for that case, which is 3.73.
The three samples with varying number density at  = 1.3 are well described by our proposed extended NFW profile (Equation 14).The best fit parameters corresponding to each sample are summarised in Table 5.We find variations of a factor of 1.17 for , 1.07 for , 1.19 for , and 1.42 for  0 .Thus, there is not much variation in the shape of the profiles for these three samples.
In Figure 12 we show the radial profile of SAGE sh satellite samples at different redshifts (see Figure 3.1 for details on the construction of the samples).Following the decrease in number density of the samples (Table 2), the normalisation of the curves decrease with increasing redshift.At  = 0.21Mpcℎ −1 , the value associated with the maximum number of counts for our default sample, the count number increases by a factor of 4.52 from the lowest value at  =  1), as indicated in the legend.These reference profiles are compared with their best fits to the analytical function for the radial profiles described by Equation 14(Table 5).For reference, the profile from the default case, Pozzetti model number 3, is shown as a red line.
1.650 to that at  = 0.987.This factor is close to the ratio of total satellite galaxies between both extreme samples, which is 4.63.
At all redshifts, the shape of the profiles remain similar and well described by Equation 14.Table 5 summarises the best fit parameters to this equation for each redshift.The parameter  0 remains nearly unchanged with redshift.We find the other parameters to change by at most a factor of 1.16 for , 1.29 for , and 1.26 for .
Therefore, we can conclude, that Equation 14 is a good description of the radial profiles of ELGs that are satellite galaxies at different redshifts.

Clustering
In this subsection, we provide a global discussion of the results obtained for the clustering, comparing the 2-point correlation function of all previously mentioned radial profile prescriptions and concluding the respective section.These include sampling individual halo profiles assuming an NFW profile based on halo concentration, using NFW and Einasto density profiles as a function of  vir to fit our data, utilizing the normalized histogram of SAGE sh satellite galaxies as a function of , and proposing an extension to NFW that fits the overall distribution observed for SAGE sh galaxies.It is important to note that we generated 100 mocks with different seeds for each of the aforementioned descriptions.
In terms of clustering, we show in the Figure 9 how the classical density profiles (Einasto and NFW) do not correctly reproduce the small scales clustering of SAGE satellite galaxies for any of the prescriptions described above (halo-by-halo, / S -stacked-fit, / vir -stacked-fit,).Moving on, we see that the stacked fits show some improvement, but still exhibit a deviation of around 40% at  2).These are compared with the corresponding best fits to the analytical function described in Equation 14(Table 5).For reference, the profile from the default case, Pozzetti model number 3 in a range of redshifts centered at  = 1.3, is shown as a red line.0.1 ℎ −1 Mpc.However, it is worth noting that among the stacked fits, the Einasto profile (purple curve) yields better results than the NFW profile (cyan curve), which shows a maximum difference of 60% in the range  <0.1ℎ −1 Mpc.Finally, we have the curves corresponding to the radial profile of our SAGE sh sample (yellow curve) and the extended NFW profile that we have implemented (red curve).We can verify that these fits provide the best results for small scales compared to the previous prescriptions, with differences less than 22% to  ∼ 0.1ℎ −1 Mpc and ratios closer to 1 than the other profiles.
The great improvement in the small scale clustering (when compared to our reference SAGE sh sample) introduced by the modified NFW in Equation 15 is one of the main results of this paper.It should be noted that when we analyzed the  2 of all the fits shown in the Figure 9, the one that presented the smallest value for the small scale was the one in which we used the own distribution profile of SAGE ( (  )):  2 ( < 1ℎ −1 Mpc) = 36.35,followed by our Default model (subsection 4.3, using Eq.15):  2 ( < 1ℎ −1 Mpc) = 42.34 compared to  2 ( < 1ℎ −1 Mpc) = 654.32 and  2 ( < 1ℎ −1 Mpc) = 54.01 for halo-by-halo and / vir Einasto, respectively.However, we have chosen Equation 15 our default model because it is more practical to implement an analytical expression that can be used in other simulations without resorting to the actual profile of the sample or running SAGE again.In accordance with the results obtained in terms of clustering, we can observe that it is interesting how galaxies do not seem to follow the classic dark matter density profiles, despite the different prescriptions made.This result aligns with various current studies (Qin et al. 2023).

SUMMARY AND CONCLUSIONS
We have explored the essential components needed in a HOD prescription to reproduce the clustering of ELGs.In particular, we aim to reproduce the small-scale of the real-space two-point correlation function (2PCF) from our reference sample, which matches the abundance of ELGs expected to be detected by Euclid.In this work we have proposed a Halo Occupation Distribution (HOD) prescription capable of significantly improving the clustering of model emission line galaxies compared to HOD models without conformity and/or assuming a radial NFW profile based on individual halo concentration.
We have generated our reference catalogue of model galaxies by running the semi-analytical model (SAM) of galaxy formation SAGE on the UNITsims dark matter simulations (Chuang et al. 2019), as outlined in Knebe et al. (2022).The reference sample is obtained by selecting a UNITsim-SAGE galaxy sample at  = 1.321 with a cut in H  flux to match the density predicted by the Pozzetti et al. (2016) model number 3 over the entire Euclid redshift range (0.9 <  < 1.8).We then shuffle the SAGE sample (see subsection 3.2), to remove assembly bias from our reference sample, SAGE sh (section 3).This last step allows us to focus on the main goal of the paper: the influence of conformity and the radial profiles on galaxy clustering without contamination from assembly bias.
Our HOD prescription is very modular, containing a series of ingredients that we have study in comparisson to the clustering measured for the SAGE sh reference sample (section 4).We focus our analysis on the effect that conformity and the radial profiles have on the final galaxy clustering.However, we outline here all the ingredients of our HOD model.We do this following the order in the code and highlighting the choice made for our default model: • Mean halo occupation shape for centrals, ⟨ cen ()⟩, and satellites, ⟨ sat ()⟩.Here, we fix these properties to what is directly measured in SAGE and shown in Figure 1.
• Conformity.We investigate whether satellite and central galaxies of a given type are independent events.We define the 1-halo conformity as deviations from independence and we quantify this with two factors  1 and  2 (Equation 4, Equation 5).We consider two cases for modeling conformity: one where the modification of satellite occupation is performed in mass bins (mass dependent conformity), and the other with global factors, constant for all mass bins (global conformity).Introducing conformity greatly improves the match to the 2PCF of the reference sample at small scales, with respect to the independent case.Figure 7 shows that the 2PCF of HOD models assuming independence, present a ∼ 60% difference at  ∼ 0.1Mpc ℎ −1 with respect to the reference sample.This difference is reduced to ∼ 20% when conformity is introduced in the HOD prescriptions.Both models of conformity result in similar 2PCF, and thus, we adopt the Global conformity as our default model for simplicity.
• Probability Distribution Function.We assume a Bernoulli distribution for central galaxies and a Poisson one for satellites.These are good descriptions for our reference sample (subsubsection 4.1.3).
• Radial profile.These are the prescriptions for the radial profiles of satellite galaxies we have implemented in our HOD models (section 6): (i) Sampling individual halo profiles assuming an NFW given by the concentration of each halo.This is usually the default in the literature.
(ii) Adjusting the NFW and Einasto curves to the / s -stacked profile from our reference sample.With  being the distance between a satellite galaxy and the center of its host halo. S is the scale radius of the halo.
(iii) Adjusting the NFW and Einasto curves to the / vir -stacked profile from our reference sample. vir is the viral radius of the halo.
(iv) The inherent distribution profile of satellite galaxies from our reference sample, measured directly as a normalised histogram of satellite counts as a function of .
(v) Modified NFW profile.We introduce a generalized version of the NFW density profile that effectively models the stacked profile of our reference sample of galaxies as a function of .We find an excellent fit with the following modified NFW curve (Equation 15) and we adopt it for our default HOD model: The NFW and Einasto density profiles are unable to replicate the small-scale clustering of our sample of SAGE sh ELGs, with any of the implementations tried here.It is particularly striking the case of the Einasto / vir -stacked curve, which shows a very good fit in (/ vir ), but does not recover the clustering from the reference sample.We argue that this is due to the low correlation between  and  vir in the reference sample.Therefore, we provide an analytical expression for the positional profile (v above) of this type of satellite galaxies, which can be interpreted as an extension of the NFW profile (Equation 14 and Equation 15).
We find that a good fit to the radial profile,  (), of our SAGE sh reference sample is a good predictor for how good the clustering of galaxies generated with an HOD model will be reproducing the reference one (Figure 9).The goodness of the two last presciptions above (inherent, iv, or fitted, v) stand out with respect to all the others (Figure 8).Directly using the inherent  () profile (iv above) provides a clustering that matches that from the reference sample slightly better than using the analytical fitted expression (v above).However, the gain is small compared to the error bars.As we find difficult to use the inherent SAGE sh  () for different simulations or reference samples, we decide to use as our default the analytical expression above (Equation 15).
Our proposed (default) HOD model includes a model for the 1-halo conformity and a modified NFW radial profile for satellite galaxies.The conformity model is implemented by computing two constants:  1, glob = 0.708 (Equation 6) and  2, glob = 1.038 (Equation 7); and then applying them within the HOD model using Equation 2and Equation 3. We assume that satellite galaxies occupy haloes following the modified NFW profile described by Equation 15.This equation has 4 free parameters: , ,  and  0 (subsection 6.5).
The proposed default HOD model improves significantly the small scale clustering when compared to the benchmark model we started from, the vanilla HOD.This HOD model does not include conformity and assumed a radial profile for satellite galaxies based on a halo-byhalo NFW profile (4.2).This is clearly depicted in Figure 3, where we see a ∼ 15% improvement at  ∼ 0.3ℎ −1 Mpc and more than 50% improvement below ∼ 0.1ℎ −1 Mpc.
There are four SAGE-UNITsim simulation boxes.To understand how much of what we learn from one simulation box can be extrapolated to the other ones, we have applied our default HOD model to the other simulation boxes.The measured clustering is consistent among the boxes, with noise levels similar to those found for the corresponding variations in SAGE.Hence, we conclude that any noise in our inferred parameters would not affect our conclusions.
The main conclusion of this work is that modelling the clustering of ELGs requires the inclusion of conformity and a radial distribution of satellite galaxies as a function of the distance to the corresponding central galaxy (Equation 15).This conclusion is robust to differ-ent number densities (those summarised in Table 1) and redshifts between 0.987 and 1.650.
The work presented here can have a large impact in the way mock catalogues of emission line galaxies are generated in the future.This can be of special relevance for Euclid, but also for DESI, Roman or 4MOST, which will study the clustering of galaxies with strong spectral emission lines.As a next step, we aim to implement the velocity profile of the satellites to investigate its impact on the study of redshift space distortions.A natural follow up to this paper would be to study the assembly bias for ELGs, by exploring the differences in clustering between SAGE and SAGE sh (Figure 3).
As we enter the fourth stage of dark energy experiments, with sub-percent precision cosmology, controlling the origin of any contribution to galaxy clustering is fundamental.Hence, studies like this one will help us in the future to understand the different contributions of galaxy formation physics to the observed galaxy clustering.

Figure 1 .
Figure 1.Top: The mass function of our catalogues of dark matter haloes (dark matter haloes , solid black section 2) and of the different samples of ELGs considered: the flux-cut raw sample (gold) together with the optimistic (green) and reference (red) samples based on Pozzetti et al. (2016) luminosity functions (see Table1).The solid curves correspond to the central galaxies and the dashed curves to the satellite galaxies.The mass function corresponds to the density of haloes/galaxies for a given halo mass divided by the halo mass bin size.To achieve this, histograms of objects in the halo/galaxy catalogues are created as a function of  200c , and they are normalized by their volume (1ℎ −1 Gpc) and the size of the corresponding log halo bin.Bottom: the mean halo occupation distribution as a function of halo mass of the haloes.This is compute as the ratio of the galaxy to halo mass functions.

Figure 2 .
Figure 2. Mean halo occupation distribution for galaxies generated with HOD models at different redshift, as indicated in the legend.The number densities of this samples correspond to those from the diferential luminosity function number 3 from Pozzetti et al. and are summarised in Table2.The contribution from central galaxies is shown as solid lines and that from satellite galaxies as dashed ones.

Figure 3 .
Figure 3. Top: Two-point correlation function in real space of different model galaxy catalogues: SAGE (solid orange), shuffled SAGE (SAGE sh , dashed black) and two HOD catalogues fitted to SAGE.The Vanilla HOD (subsection 4.2, solid blue) uses NFW radial profile and assumes that central and satellite H  galaxies are independent events (no conformity).The default HOD uses the concept of conformity and the analytical  ( ) function implemented for the satellite galaxies fitted to SAGE (subsection 4.3, solid red).Bottom: Ratios with respect to SAGE sh clustering.The error bars are calculated as the standard deviation of 100 realisations.While studying SAGE sh ELGs, we have eliminated the assembly bias, to focus on other properties.

Figure 4 .
Figure 4. Top: The galaxy probability distribution function (PDF) as a function of the mean number of satellite galaxies,  sat .Results are shown for the SAGE sh (black bars) model galaxies and those from our default HOD model (filled symbols).Bottom: Ratio between the number of model galaxies found in both catalogues.The error bars are calculated as the standard deviation of 100 realisations.
found model ELGs to be better described when a super-Poisson variance was assumed.Meanwhile, Avila et al. (2020) and Vos-Ginés et al. (2023) have found hints for eBOSS ELGs following either sub-Poisson or supper-Poisson distribution, depending of the HOD model assumptions.

Figure 5 .
Figure 5.The ratio of two point correlation in real space for galaxies generated with our Default HOD models assuming different number densities and redshifts with the respective 2PCF of each SAGE sh sample.

Figure 7 .
Figure 7. Ratios of the real space two-point correlation functions from galaxy catalogues generated with different HOD models with respect to that from the SAGE sh reference galaxy.The result from the proposed HOD model ( § 4.3), which uses global conformity factors, is shown in red.Two modifications of this model are also shown: one with no conformity (centrals and satellites are treated as independent events), shown in orange; and one using mass dependent conformity factors, shown in green.Note that the three HOD catalogues use an analytical function to model the radial distribution of satellite galaxies ( § 14).The error bars are calculated as the standard deviation of 100 realisations.

Figure 8 .
Figure 8. Radial profile for satellite galaxies as a function of their distance, , to the center of their host halo.ELGs from SAGE sh are shown as filled symbols with Poisson error bars.The lines correspond to catalogues produced with different HOD models, as indicated in the legend.The results from our proposed HOD model ( § 4.3), which uses an analytical function (Equation14) to fit the radial distribution of SAGE sh satellite galaxies, is shown in red.Five modifications of this model are also shown in the plot: a model assuming a NFW distribution given the concentration of each halo (solid blue line); models assuming a NFW density profile fitted to reprocude the average SAGE sh one as a function of / vir (dash-dotted blue line) and / s (dashed blue line); models assuming an Einasto profile fitted to reproduce the average SAGE sh one as a function of / vir (dash-dotted purple line) and / s (dashed purple line).Note that the binning is linear but we have plot in logarithmic scale to properly appreciate the change of slopes.

Figure 9 .
Figure 9. Ratios of the real space two-point correlation functions from galaxy catalogues generated with different HOD models with respect to that from the SAGE sh reference galaxy sample.The results from our proposed HOD model ( § 4.3), which uses an analytical function (Equation14) to fit the radial distribution of SAGE sh satellite galaxies, is shown in red.Four modifications of this model are also shown in the plot, with similar colours to those shown in Figure8.Error bars are calculated as the standard deviation of 100 realisations with changing seeds for the generators of random numbers.

Figure 11 .
Figure 11.Radial profile for satellites at  = 1.3 from our reference sample of ELGs modelled by SAGE sh (filled symbols) assuming two number densities: Pozzetti model number 1 and Flux cut (Table1), as indicated in the legend.These reference profiles are compared with their best fits to the analytical function for the radial profiles described by Equation14(Table5).For reference, the profile from the default case, Pozzetti model number 3, is shown as a red line.

Figure 12 .
Figure 12.Radial profile for satellites at different redshift from our reference sample of ELGs modelled by SAGE sh (filled symbols) assuming number densities matching the differential luminosity function number 3 from Pozzetti et al. (Table2).These are compared with the corresponding best fits to the analytical function described in Equation14(Table5).For reference, the profile from the default case, Pozzetti model number 3 in a range of redshifts centered at  = 1.3, is shown as a red line.

Table 2 .
The contribution from central galaxies is shown as solid lines and that from satellite galaxies as dashed ones.

Table 2 .
The columns in this table contain from left to right: the redshift of each sample; the number densities () derived from the differential luminosity function

Table 4 .
Best fit parameters to the SAGE sh / s and / vir -stacked satellite profile for both NFW and Einasto profiles.It should be noted that the best fit of NFW for C is very similar to what we obtain by averaging the   values of haloes containing satellite galaxies.

Table 5 .
Values of the free parameter in Equation14and Equation 15 that best fit the radial profiles of our reference samples of ELGs for samples at different redshifts and with number densities obtained either in a range of redshifts, first three rows (Table1) or from a differential luminosity function (Table2).Our default choice is the sample at  = 1.3 with number density matching that of Pozzetti model number 3 in a range of redshifts.This is indicated by using bold face in the table.