Representing low mass black hole seeds in cosmological simulations: A new sub-grid stochastic seed model

The nature of the first seeds of supermassive black holes (SMBHs) is currently unknown, with postulated initial masses ranging from $\sim10^5~M_{\odot}$ to as low as $\sim10^2~M_{\odot}$. However, most existing cosmological simulations resolve BHs only down to $\sim10^5-10^6~M_{\odot}$. In this work, we introduce a novel sub-grid BH seed model that is directly calibrated from high resolution zoom simulations that can trace the formation and growth of $\sim 10^3~M_{\odot}$ seeds forming in halos with pristine, star-forming gas. We trace the BH growth along merger trees until their descendants reach masses of $\sim10^4$ or $10^5~M_{\odot}$. The descendants assemble in galaxies with a broad range of properties (e.g., halo masses $\sim10^7-10^9~M_{\odot}$) that evolve with redshift and are sensitive to seed parameters. The results are used to build a new stochastic seeding model that directly seeds these descendants in lower resolution versions of our zoom region. Remarkably, we find that by seeding the descendants simply based on total galaxy mass, redshift and an environmental richness parameter, we can reproduce the results of the detailed gas based seeding model. The baryonic properties of the host galaxies are well reproduced by the mass-based seeding criterion. The redshift-dependence of the mass-based criterion captures the influence of halo growth, star formation and metal enrichment on seed formation. The environment based seeding criterion seeds the descendants in rich environments with higher numbers of neighboring galaxies. This accounts for the impact of unresolved merger dominated growth of BHs, which produces faster growth of descendants in richer environments with more extensive BH merger history. Our new seed model will be useful for representing a variety of low mass seeding channels within next generation larger volume uniform cosmological simulations.


INTRODUCTION
The origin of supermassive black holes (SMBHs) is a key missing piece in our current understanding of galaxy formation.Several theoretical channels have been proposed for the first "seeds" of SMBHs, predicting a wide range of postulated initial masses.At the lowest mass end of the initial seed mass function, we have the remnants of the first generation Population III stars, a.k.a.Pop III seeds (Fryer et al. 2001;Madau & Rees 2001;Xu et al. 2013;Smith et al. 2018) ranging from ∼ 10 2 − 10 3 M⊙.Next, we have seeds postulated at the "intermediate mass" range of ∼ 10 3 − 10 4 M⊙ that can form via runaway stellar and black hole (BH) collisions within dense Nuclear Star Clusters, a.k.aNSC seeds (Davies et al. 2011;Lupi et al. 2014;Kroupa et al. 2020;Das et al. 2021b,a).Finally, we can have "high mass seeds" formed via direct isothermal collapse of gas at sufficiently high temperatures (≳ 10 4 K), a.k.a direct collapse black hole or DCBH seeds (Bromm & Loeb 2003;Begelman et al. 2006;Regan et al. 2014;Latif et al. 2016;Luo et al. 2018;Wise et al. 2019;Luo et al. 2020;Begelman & Silk 2023).DCBHs masses are traditionally postulated to be ranging within ∼ 10 4 −10 6 M⊙, but recent works have suggested that they can also be as massive as ∼ 10 8 M⊙ (Mayer et al. 2023).
The growing observed population of luminous quasars at z ∼ 6 − 8 (Fan et al. 2001;Willott et al. 2010;Mortlock et al. 2011;Venemans et al. 2015;Jiang et al. 2016;Bañados et al. 2016;Reed et al. 2017;Matsuoka et al. 2018;Wang et al. 2018;Bañados et al. 2018;Matsuoka et al. 2019;Yang et al. 2019;Wang et al. 2021) tells us that ∼ 10 9 − 10 10 M⊙ BHs already assembled within the first few hundred million years after the Big Bang.These already pose a serious challenge to models of BH formation as well as BH growth.For example, light seeds may need to sustainably accrete gas at super-Eddington rates to grow by ∼ 6 − 7 orders of magnitude within such a short time.Alternatively, they can boost their seed mass via mergers, but it is unclear as to how efficiently these seeds sink and merge with each other within the shallow potential wells of high redshift proto-galaxies (Volonteri 2007;Ma et al. 2021).Heavier seed masses such as DCBHs are substantially more conducive for assembling the high-z quasars, but it is unclear if they form frequently enough to account for the observed number densities (1 Gpc −3 ).
Due to possible degeneracies in the impact of different BH formation versus BH growth models, it is challenging to constrain seed models solely using observations of luminous highz quasars.To that end, detections of lower mass BH populations at high-z are going to be crucial for constraining seed models as these BHs are more likely to retain the memory of their initial seeds.The James Webb Space Telescope (JWST; Gardner et al. 2006) is pushing the frontiers of SMBH studies by detecting lower luminosity active galactic nuclei (AGN) at high redshifts.In addition to the first statistical sample of ∼ 10 6 − 10 7 M⊙ AGN at z ∼ 4 − 7 (Harikane et al. 2023), JWST has also produced the first detections at z ≳ 8.3 (Larson et al. 2023) and z ∼ 10.6 (Maiolino et al. 2023).Moreover, there is an exciting possibility of future detections of BHs as small as ∼ 10 5 M⊙ using JWST, which would potentially enable us to probe the massive end of the seed population for the very first time (Natarajan et al. 2017;Cann et al. 2018;Inayoshi et al. 2022).
Even with JWST and proposed X-ray facilities like ATHENA (Barcons et al. 2017) and Axis (Mushotzky et al. 2019), low mass seeds ∼ 10 2 − 10 4 M⊙ are likely to be inaccessible to electromagnetic (EM) observations at highz.However, with the new observational window of gravitational waves (GW) opened for the first time by the Laser Interferometer Gravitational-Wave Observatory (LIGO; Abbott et al. 2009), we can close this gap.In addition to detecting numerous (∼ 80) stellar mass BH mergers, LIGO has also started probing the elusive population of intermediate mass black holes (IMBH: ∼ 10 2 − 10 5 M⊙) with GW190521 (Abbott et al. 2020) producing a ∼ 142 M⊙ BH remnant.At the other end of BH mass spectrum, the North American Nanohertz Observatory for Gravitational Waves (NANOGRAV) have also detected the Hellings-Downs correlation expected from a stochastic GW background that most likely originates from populations of merging SMBHs (Agazie et al. 2023).But the strongest imprints of BH formation will likely be provided by the upcoming Laser Interferometer Space Antenna (LISA; Baker et al. 2019), which is expected to detect GWs from mergers of IMBHs as small as ∼ 10 3 M⊙ up to z ∼ 15 (Amaro-Seoane et al. 2017).
Cosmological hydrodynamic simulations (Di Matteo et al. 2012;Vogelsberger et al. 2014b;Sijacki et al. 2015;Khandai et al. 2015;Schaye et al. 2015;Volonteri et al. 2016;Dubois et al. 2016;Kaviraj et al. 2017;Tremmel et al. 2017;Nelson et al. 2019a;Volonteri et al. 2020) have emerged as powerful tools for testing galaxy formation theories (see, e.g., the review by Vogelsberger et al. 2020).However, most such simulations can resolve gas elements only down to ∼ 10 5 −10 7 M⊙, depending on the simulation volume.This is particularly true for simulation volumes needed to produce statistical samples of galaxies and BHs that can be directly compared to observations.Therefore, most cosmological simulations only model BH seeds down to ∼ 10 5 M⊙ (for e.g.Vogelsberger et al. 2014b;Khandai et al. 2015;Tremmel et al. 2017).Notably, there are simulations that do attempt to capture seed masses down to ∼ 10 4 M⊙ (Ni et al. 2022) and ∼ 10 3 M⊙ (Taylor & Kobayashi 2014; Wang et al. 2019), but they do so without explicitly resolving the seed-forming gas to those masses.Overall, directly resolving the low mass seed population (∼ 10 2 − 10 4 M⊙ encompassing Pop III and NSC seeding channels) is completely inaccessible within state of the art cosmological simulations, and pushing beyond current resolution limits will require a substantial advancement in available computing power.
Given that BH seed formation is primarily governed by properties of the seed-forming gas, the insufficient resolution within cosmological simulations carries the additional liability of having poorly converged gas properties.For instance, Pop III and NSC seeds are supposed to be born out of star-forming and metal poor gas.However, the rates of star formation and metal enrichment may not be well converged in these simulations at their typical gas mass resolutions of ∼ 10 5 − 10 7 M⊙ (for example, see Figure 19 of Bhowmick et al. 2021).As a result, many simulations (Di Matteo et al. 2012;Vogelsberger et al. 2014b;Nelson et al. 2018;Ni et al. 2022) simply use a host halo mass threshold to seed BHs.Several cosmological simulations have also used local gas properties for seeding (Taylor & Kobayashi 2014;Tremmel et al. 2017;Wang et al. 2019).These simulations produce seeds directly out of sufficiently dense and metal poor gas cells, which is much more consistent with proposed theoretical seeding channels.But these approaches can lead to stronger resolution dependence in the simulated BH populations (see Figure 10 of Taylor & Kobayashi 2014).In any case, most of these seeding approaches have achieved significant success in generating satisfactory agreement with the observed SMBH populations at z ∼ 0 (Habouzit et al. 2020).However, it is important to note that they do not provide definitive discrimination among the potential seeding channels from which the simulated BHs may have originated.
A standard approach to achieve very high resolutions in cosmological simulations is to use the 'zoom-in' technique.In our previous work (Bhowmick et al. 2021(Bhowmick et al. , 2022a)), we used cosmological zoom-in simulations with gas mass resolutions up to ∼ 10 3 M⊙ to build a new set of gas based seed models that placed seeds down to the lowest masses (1.56 × 10 3 M⊙/h) within halos containing sufficient amounts of star forming & metal poor gas.We systematically explored these gas based seed models and found that the strongest constraints for seeding are expected within merger rates measurable with LISA.However, the predictions for these zoom simulations are subject to large cosmic variance, as they correspond to biased regions of the large-scale structure.In order to make observationally testable predictions with these gas based seed models, we must find a way to represent them in cosmological simulations despite the lack of sufficient resolution.
In this work, we build a new sub-grid stochastic seed model that can represent low mass seeds born out of star forming and metal poor gas, within lower-resolution and largervolume simulations that cannot directly resolve them.To do this, we first run a suite of highest resolution zoom simulations that places 1.56 × 10 3 M⊙/h seeds within star forming and metal poor gas using the gas based seed models from Bhowmick et al. (2021).We then study the growth of 1.56 × 10 3 M⊙/h seeds and the evolution of their formation environments.We particularly study the halo and galaxy properties wherein these seeds assemble higher mass (1.25 × 10 4 & 1 × 10 5 M⊙/h) descendants.We then use the results to build our stochastic seed model that directly seeds these descendants within lower resolution versions of the same zoom region.In the process, we determine the key ingredients required for these stochastic seed models to reproduce the results of the gas based seed models in the lower resolution zooms.
Section 2 presents the basic methodology, which includes the simulation suite, the underlying galaxy formation model, as well as the BH seed models.Our main results are described in sections 3 and 4. In section 3, we present the results for the formation and growth of 1.56 × 10 3 M⊙/h seeds within our highest resolution zoom simulations.In section 4, we use the results from section 3 to build our stochastic seed model.Finally, Section 5 summarizes our main results.

BH accretion, feedback and dynamics
BH accretion rates are determined by the Eddington-limited Bondi-Hoyle formalism given by where G is the gravitational constant, ρ is the local gas density, M bh is the BH mass, cs is the local sound speed, mp is the proton mass, and σT is the Thompson scattering cross section.Accreting black holes radiate at bolometric luminosities given by, where ϵr = 0.2 is the radiative efficiency.
IllustrisTNG implements a dual mode AGN feedback.'Thermal feedback' is implemented for Eddington ratios (η ≡ Ṁbh / Ṁedd ) higher than a critical value of ηcrit = min[0.002(MBH/10 8M⊙) 2 , 0.1].Here, thermal energy is deposited on to the neighboring gas at a rate of ϵ f,high ϵr ṀBHc 2 with ϵ f,high ϵr = 0.02 where ϵ f,high is the "high accretion state" coupling efficiency.'Kinetic feedback' is implemented for Eddington ratios lower than the critical value.Here, kinetic energy is injected into the gas in a pulsed fashion whenever sufficient feedback energy is available, which manifests as a 'wind' oriented along a randomly chosen direction.The injected rate is ϵ f,low ṀBHc 2 where ϵ f,low is called the 'low accretion state' coupling efficiency (ϵ f,low ≲ 0.2).For further details, we direct the interested readers to Weinberger et al. (2017).
The limited mass resolution hinders our simulations from fully capturing the crucial BH dynamical friction force, especially for low masses.To stabilize the dynamics, BHs are relocated to the nearest potential minimum within their proximity, determined by the closest 10 3 neighboring gas cells.When one BH enters the neighborhood of another, prompt merger occurs.

Gas based seed model
We explore the formation and growth of the lowest mass 1.56 × 10 3 M⊙/h seeds using the gas based seeding prescriptions developed in Bhowmick et al. (2021).In order to contrast these seeds from those produced by the seed model discussed in the next subsection, we shall hereafter refer to them as direct gas based seeds or DGBs with mass M DGB seed .These seeding criteria are meant to broadly encompasses popular theoretical channels such as Pop III, NSC and DCBH seeds, that are postulated to form in regions comprised of dense and metal poor gas.We briefly summarize them as follows: • Star forming & metal poor gas mass criterion: We place DGBs in halos with a minimum threshold of dense (> 0.1 cm −3 ) & metal poor (Z < 10 −4 Z⊙) gas mass, denoted by Msfmp (in the units of M DGB seed ).The values of Msfmp are not constrained, but we expect it to be different for the various seeding channels.In this work, we consider models with Msfmp = 5, 50, 150 & 1000.
• Halo mass criterion: We place DGBs in halos with a total mass exceeding a critical threshold, specified by Mh in the units of M DGB seed .In this work, we consider Mh = 3000 & 10000.While our seeding prescriptions are meant to be based on the gas properties within halos, we still adopt this criterion to avoid seeding in halos significantly below the atomic cooling threshold.This is because our simulations do not include the necessary physics (for e.g.H2 cooling) to self-consistently capture the collapse of gas and the formation of stars within these (mini)halos.Additionally, these lowest mass halos are also impacted by the finite simulation resolution, many of which are spuriuosly identified gas clumps with very little DM mass.(Please see Figure B1 and Appendix B for further discussion about the foregoing points.)Another motivation for this criterion is that NSC seeds are anticipated to grow more efficiently within sufficiently deep gravitational potential wells where runaway BH merger remnants face difficulties escaping the cluster.Deeper gravitational potentials are expected in higher mass halos.
Our gas based seed models will therefore contain three parameters, namely Msfmp , Mh and M DGB seed .The simulation suite that will use these seed models will be referred to as GAS_BASED.The individual runs will be labelled as SM*_FOF* where the '*'s correspond to the values of Msfmp and Mh respectively.For example, Msfmp = 5 and Mh = 3000 will correspond to SM5_FOF3000.As already mentioned, the seed masses in this suite will be M DGB seed = 1.56 × 10 3 M⊙/h.

Stochastic seed model
As we mentioned, the key goal of this work is to build a new approach to represent low mass seeds in larger-volume lowerresolution cosmological simulations that cannot directly resolve them.As we shall see in Section 4, this is achieved via a new stochastic seeding model.The complete details of this seed model are described in Section 4, where we thoroughly discuss their motivation and calibration using the results obtained from the GAS_BASED suite.Here, we briefly summarize key features so that the reader can contrast it against the gas based seed models described in the previous subsection.Since the simulations here will not fully resolve the 1.56 × 10 3 M⊙/h DGBs, we will essentially seed their resolvable descendants.To distinguish them from the DGBs, we shall refer to these seeded descendants as extrapolated seed descendants or ESDs with masses (denoted by M ESD seed ) limited to the gas mass resolution of the simulations.In this work, we will largely explore ESD masses M ESD seed = 1.25 × 10 4 & 1 × 10 5 M⊙/h, to be used for simulations with gas mass resolutions of ∼ 10 4 & 10 5 M⊙/h respectively.
To seed the ESDs, we identify sites using the FOF algorithm, but with a shorter linking length (by factor of ∼ 1/3) compared to that used for identifying halos.We shall refer to these short linking length FOFs as "best-Friends of Friends or bFOFs".These bFOFs essentially correspond to galaxies or proto-galaxies residing inside the halos.We do this to accommodate the formation of multiple ESDs per halo; this is because even if we seed one DGB per halo in the gas based seed models, subsequent evolution of hierarchical structure naturally leads to halos occupying multiple higher mass descendants.Notably, one could alternatively seed in subhalos computed by SUBFIND; however, SUBFIND is prohibitively ex-pensive to be called frequently enough for seeding BHs.Hereafter, in most instances, we shall simply refer to these bFOFs as "galaxies".Their properties are comprehensively studied in Section 4.1.
The ESDs will be stochastically placed in galaxies based on where the descendants of the 1.56 × 10 3 M⊙/h DGBs end up within the GAS_BASED suite.Below we provide a brief summary of the seeding criteria • Galaxy mass criterion: We will apply a galaxy mass ('galaxy mass' hereafter refers to the total mass including dark matter, gas and stars) seeding threshold that will be stochastically drawn from galaxy mass distributions predicted for the assembly of (1.25 × 10 4 and 10 5 M⊙/h) BHs that are descendants of 1.56 × 10 3 M⊙/h DGBs within the GAS_BASED suite.As we explore further, it becomes evident that these distributions vary with redshift and exhibit significant scatter.The redshift dependence will capture the influence of halo growth, star formation, and metal enrichment on seed formation in our gas based seed models.
• Galaxy environment criterion: In the context of a galaxy, we define its environment as the count of neighboring halos (N ngb ) that exceed the mass of its host halo and are located within a specified distance (denoted by D ngb ) from the host halo.In this study, we determine N ngb within a range of 5 times the virial radius (Rvir) of the host halo, i.e.D ngb = 5Rvir.This choice is suitable for investigating the immediate small-scale external surroundings of the galaxy, extending beyond its host halo.We then apply a seeding probability (less than unity) to suppress ESD formation in galaxies with ≤ 1 neighboring halos, thereby favoring their formation in richer environments.By doing this, we account for the impact of unresolved hierarchical merger dominated growth from M DGB seed to M ESD seed , as it favors more rapid BH growth within galaxies in richer environments.
The simulations that use only the galaxy mass criterion will be referred to as the STOCHASTIC_MASS_ONLY suite.For simulations which use both galaxy mass criterion and galaxy environment criterion, we will refer to them as the STOCHASTIC_MASS_ENV suite.During the course of this paper, we will illustrate that the outcomes of each simulation of a specific region within the GAS_BASED suite, employing a distinct set of gas based seeding parameters, can be reasonably well reproduced in a lower-resolution simulation of the same region within the STOCHASTIC_MASS_ENV suite.

Simulation suite
Our simulation suite consists of zoom runs for the same overdense region as that used in Bhowmick et al. (2021) (referred to as ZOOM_REGION_z5).The region was chosen from a parent uniform volume of (25 Mpc/h) 3 , and is targeted to produce a 3.5 × 10 11 M⊙/h halo at z = 5.The simulations were run from z = 127 to z = 7 using the MUSIC (Hahn & Abel 2011) initial condition generator.The background grid's resolution and the resolution of high-resolution zoom regions are determined by two key parameters: Lmin (or levelmin) and Lmax (or levelmax) respectively.These parameters define the resolution level, denoted as L, which is equivalent to the mass resolution produced by 2 L number of dark matter (DM) particles per side in a uniform-resolution (25 Mpc/h) 3 box.Specifically, we set Lmin = 7 for the background grid, resulting in a DM mass resolution of 5.3 × 10 9 M⊙/h.For the high-resolution zoom region, we explore Lmax values of 10, 11 and 12.In addition, there is a buffer region that consists of DM particles with intermediate resolutions bridging the gap between the background grid and the zoom region.This buffer region serves a crucial purpose of facilitating a smooth transition between the zoom region and the background grid.Our simulation suite is comprised of the following set of resolutions for the zoom regions: • In our highest resolution Lmax = 12 runs, we achieve a DM mass resolution of 1.6×10 4 M⊙/h and a gas mass resolution of ∼ 10 3 M⊙/h (the gas cell masses are contingent upon the degree of refinement or derefinement of the Voronoi cells, thereby introducing some variability).These runs are used for the GAS_BASED suite that seeds DGBs at 1.56 × 10 3 M⊙/h using the gas based seed models described in Section 2.3.1.
Further details of our full simulation suite are summarized in Table 1.It is important to note that our new stochastic seed models will be primarily designed for implementation within larger-volume uniform simulations.However, this paper specifically focuses on zoom simulations.In particular, we are using Lmax = 11 & 10 zoom simulations for testing the stochastic seed models against the highest resolution Lmax = 12 zooms that use the gas based seed models.In a subsequent paper (Bhowmick et al in prep), we will be applying the stochastic seed models on uniform volume simulations of the same resolutions as the Lmax = 11 & 10 zooms.

Tracing BH growth along merger trees: The SUBLINK algorithm
We use the GAS_BASED suite to trace the growth of the lowest mass 1.56×10 3 M⊙/h DGBs and study the evolution of their environments (halo and galaxy properties) as they assemble higher mass BHs.We do this by first constructing subhalo merger trees using the SUBLINK algorithm (Rodriguez-Gomez et al. 2015), which was designed for SUBFIND based subhalos.Note that these SUBFIND based subhalos, like bFOFs, also trace the substructure within halos.Therefore, to avoid confusion, we shall refer to SUBFIND based subhalos as "subfindsubhalos".It is also very common to interpret the subfindsubhalos as "galaxies".As we shall see however, in this work, we only use these subfind-subhalos as an intermediate step to arrive at the FOF and bFOF merger trees.Therefore, there is no further mention of subfind-subhalos after this subsection.
On that note, recall again that any mention of "galaxy" in our paper refers to the bFOFs.SUBFIND was run on-the-fly to compute subfind-subhalos within both FOF and bFOF catalogues.Therefore, for obtaining both FOF and bFOF merger trees, we first compute the merger trees of their corresponding subfind-subhalos.Following are the key steps in the construction of the subfindsubhalo merger tree: • For each progenitor subfind-subhalo at a given snapshot, SUBLINK determines a set of candidate descendant subfindsubhalos from the next snapshot.Candidate descendants are those subfind-subhalos which have common DM particles with the progenitor.
• Next, each candidate descendant is given a score based on the merit function χ = i 1/R −1 i where Ri is the binding energy rank of particle i within the progenitor.DM particles with higher binding energy within the progenitor are given a lower rank.i denotes a sum for all the particles within the candidate descendant.
• Amongst all the candidate descendants, the final unique descendant is chosen to be the one with the highest score.This essentially ensures that the unique descendant has the highest likelihood of retaining the most bound DM particles that resided within the progenitor.
From the subfind-subhalo merger trees, we use the ones that only consist of central subfind-subhalos (most massive within a FOF or bFOF) and construct the corresponding FOF/ halo merger trees and bFOF/galaxy merger trees.We then trace the growth of BHs along these merger trees, and the outcomes of this analysis are elaborated upon in the subsequent sections.

RESULTS I: BLACK HOLE MASS ASSEMBLY IN HIGH-RESOLUTION ZOOMS
We start our analysis by looking at the growth history of 1.5 × 10 3 M⊙/h DGBs within the GAS_BASED suite.We trace their growth along halo merger trees (see Section 2.5) from the time of their formation to when they assemble higher mass (1.25 × 10 4 , 1 × 10 5 & 8 × 10 5 M⊙/h) descendant BHs.We choose these descendant BH masses as they encompass the target gas mass resolutions of our lower resolution (Lmax = 11 & 10) zooms.These are also comparable to typical gas mass resolutions of cosmological simulations in the existing literature.For example, the TNG100 (Nelson et al. 2018), Illustris (Vogelsberger et al. 2014b,a), EAGLE (Schaye et al. 2015), MassiveBlackII (Khandai et al. 2015), BlueTides (Feng et al. 2016) and HorizonAGN (Kaviraj et al. 2017) simulations have a gas mass resolution of ∼ 10 6 M⊙ and similar values for the seed masses.The relatively smaller volume cosmological simulations such as ROMULUS25 (Tremmel et al. 2017) and TNG50 (Pillepich et al. 2019) have a gas mass resolution of ∼ 10 5 M⊙ with a seed mass of 10 6 M⊙.Recall again that most of these simulations seed BHs simply based on either a constant halo mass threshold, or poorly resolved local gas properties.The results presented in this section will be used in Section 4 to calibrate the stochastic seed model that will represent the gas based 1.56 × 10 3 M⊙/h seeds in the lower-resolution zooms without resolving them directly.
3.1 Evolution of seed forming sites: Rapid metal enrichment after seed formation   Table 1.Spatial and mass resolutions within the zoom region of our simulations for various values of Lmax (see Section 2.4 for the definition).M dm is the mass of a dark matter particle, Mgas is the typical mass of a gas cell (note that gas cells can refine and de-refine depending on the local density), and ϵ is the gravitational smoothing length.The 4th column represents the number of nearest gas cells that are assigned to be BH neighbors.The 5th and 6th columns correspond to the seed mass and seed model used at the different resolutions.Comparing them to the full population of halos at each redshift, we find that even though the DGB forming halos at z = 15 are biased towards lower gas metallicities at fixed halo mass (lower left panel), subsequent evolution of these halos to lower redshifts causes them to become more unbiased at z = 14, 13 & 12.This is due to the rapid metal enrichment of these DGB forming halos depicted in Figure 2.
there exists gas that is simultaneously forming stars but is also metal poor (marked in yellow circles).However, we also find that metal enrichment has already commenced at the immediate vicinity of these DGB forming sites.In other words, DGB formation occurs in halos where metal enrichment has already begun due to prior star formation and evolution, but it has not polluted the entire halo yet.But soon after DGB formation, i.e. within a few tens of million years, we find that the entirety of the regions becomes polluted with metals.
The rapid metal enrichment of DGB forming halos is shown much more comprehensively and quantitatively in Figure 2.Here we show the evolution of halo mass, star forming gas mass, star forming metal poor gas mass and gas metallicity from z ∼ 25 − 7 for all DGB forming halos along their respective merger trees (faded dotted lines).To avoid overcrowding of the plots, we select trees based on the most restrictive seed-ing criterion of Msfmp = 1000 & Mh = 3000, but our general conclusions hold true for other seeding thresholds as well.Not surprisingly, the halo mass (1st row) and star forming gas mass (2nd row) tend to monotonically increase with decreasing redshift on average (thick solid black lines).Note that for individual trees, the halo mass can occasionally decrease with time due to tidal stripping.On more rare occasions, there may also be a sharp drop in the the halo mass at given snapshot followed by a sharp rise back to being close to the original value.This is likely because the FOF finder "mistakenly" splits a larger halo in two at that snapshot.The star forming gas mass can also additionally decrease with time due to the star forming gas being converted to star particles.
Very importantly, the star forming & metal poor gas mass (3rd row of Figure 2) increases initially and peaks at the time of DGB formation, following which it rapidly drops down.This happens independent of the formation redshift, and is due to the rapid metal enrichment depicted in Figure 1.The rapid metal enrichment can be quantitatively seen in the average gas metallicity evolution (4th row of Figure 2).We can see that even prior to the DGB formation, the average gas metallicities already start to increase from the pre-enrichment values (∼ 10 −8 Z⊙), to ∼ 10 −3 Z⊙ at the time of formation.Therefore, even at the time of formation, the average metallicities of halos are already greater than the maximum seeding threshold of 10 −4 Z⊙; however, there are still pockets of star forming gas with metallicities ≤ 10 −4 Z⊙, wherein DGBs form.
In Figure 3, we select halos that form DGBs at z = 15 using gas based seeding parameters Msfmp = 1000 & Mh = 3000, and we show their evolution (orange circles) to z = 14, 13 & 12 on the SFR versus halo mass plane (upper panels) and the gas metallicity versus halo mass plane (lower panels).We compare them to the full population of halos at their respective redshifts (blue points).We investigate how biased these DGB forming halos are compared to typical halos of similar masses.On the SFR versus halo mass plane, the DGB forming halos have similar SFRs compared to halos of similar masses; not surprisingly, this continues to be so as they evolve to lower redshifts.On the metallicity versus halo mass plane, we find that DGB forming halos have significantly lower metallicities compared to halos of similar masses.This is a natural consequence of the requirement that the DGB forming halos have sufficient amounts of metal poor gas.However, due to the rapid metal enrichment of these halos seen in Figures 1 and 2, their descendants at z = 14, 13 & 12 end up having metallicities similar to halos of comparable mass.
The picture that emerges from Figures 1 -3 is one in which DGB-forming halos are generally not a special subset of halos (in terms of properties that persist to lower redshift), but rather they are fairly typical halos that have the right conditions for DGB formation at a special moment in time.
In other words, despite our seeding criterion favoring lowmetallicity, star-forming halos, their descendants still end up with similar SFRs and metallicities compared to the general population of similar-mass halos.While Figure 3 only shows the evolution of DGB-forming halos at z = 15, this general conclusion holds true for DGB-forming halos at all redshifts.
A key consequence is that the descendants of seed forming halos can be well characterized by their halo mass distributions, largely because they are in this transient phase of rapid metal enrichment at the time of seed formation.
We utilize this characteristic of our gas based seeding models to develop the new sub-grid seeding model for lowerresolution simulations in Section 4. Rather than requiring information about detailed properties of the descendant galaxies of these gas based seeding sites, we show in Section 4.2 that most galaxy properties are well reproduced by simply matching the galaxy mass distribution.We then show in Section 4.3 that by additionally imposing a criterion on galaxy environment, we can robustly capture the evolved descendants of seeding sites from our high-resolution simulations.

DGB formation and subsequent growth
We have thus far talked about the DGB forming halos and their evolution.In this subsection, we will focus on the for-mation of the DGBs themselves, and their subsequent growth to assemble higher mass BHs.

Drivers of DGB formation: Halo growth, star formation and metal enrichment
Our gas based seeding criteria identify three main physical processes that govern DGB formation in our simulations, i.e. halo growth, star formation and metal enrichment.Halo growth and star formation tend to promote DGB formation with time, whereas metal enrichment suppresses DGB formation with time.The overall rate of DGB formation at various redshifts is determined by the complex interplay between these three processes.We study this interplay in Figure 4, wherein we show the number of halos satisfying three different criteria: M total , M SF and M SF metal poor correspond to the total halo mass, star forming gas mass, and star forming & metal poor gas mass of halos respectively.Amongst the above three criteria, the one that is most restrictive essentially determines the driving physical process for DGB formation at a given redshift.For example, in the rightmost panel of Figure 4, the dotted lines have the lowest normalization from z ∼ 25−10; this implies that halo growth is primary driver and leads to the production of more DGBs with time.In the 3rd panel from the left, the solid and dashed lines have similar normalization, and both of them are lower than the dotted lines at the highest redshifts; this indicates that star formation is the key driver, which also enhances DGB formation with time.Lastly, in all of the panels, the solid lines have substantially lower normalization than both dashed and dotted lines at the lowest redshifts.In this case, metal enrichment is the primary driver, which leads to slow down and eventual suppression of DGB formation with time.
Comparing the different columns in Figure 4, we note that the gas based seeding parameters ( Mh and Msfmp ) have a strong influence in determining which process dominantly drives DGB formation at various redshifts.For Mh = 3000 and Msfmp = 5 (leftmost panel), halo growth is the key driver for DGB formation from z ∼ 30 − 15; at z ≲ 15, metal enrichment becomes the primary driver and slows down DGB formation.When Mh is fixed at 3000 and Msfmp is increased to 50 or 150 (2nd and 3rd panels respectively), star formation replaces halo growth to become the primary driver for DGB formation at z ∼ 30 − 15; however, metal enrichment continues to be the main driver in slowing down DGB formation at z ≲ 15.Finally, when Msfmp is fixed at 5 and Mh is increased to 10000 (rightmost panels), halo growth becomes the key driver for DGB formation from z ∼ 30 − 10.In this case, metal enrichment takes the driving seat at a lower redshift of z ∼ 10 compared to the cases when Mh = 3000.
To further summarize the above findings from Figure 4, we find that when Mh is 3000, DGB formation is ramped up by either star formation or halo growth until z ∼ 15.After z ∼ 15, it is slowed down by metal enrichment.But when Mh = 10000, the halo mass criterion becomes much more restrictive and halo growth continues to ramp up DGB formation until z ∼ 10 before it is slowed down by metal enrichment.In the next subsection, we shall see the implications of the foregoing on the rates of DGB formation at various redshifts.The lower panels show ratio of the normalizations w.r.t. the dotted lines from the top panel.The line with the smallest normalization determines which of the processes between halo growth versus star formation versus metal enrichment is the key driver for DGB formation at a given epoch.For Mh = 3000, we find that metal enrichment becomes the key driver for (suppressing) DGB formation around z ∼ 13 for all Msfmp values between 5 − 150.However, when Mh = 10000, halo growth continues to be the primary regulator for DGB formation until z ∼ 10, after which metal enrichment takes over.

Formation rates of ∼ 10 3 M⊙ DGBs
The leftmost panel of Figure 5 shows the formation rates of 1.56 × 10 3 M⊙/h DGBs for the different gas based seed models.The interplay between halo growth, star formation and metal enrichment discussed in the previous subsection is readily seen in the DGB formation rates.For Mh = 3000 and Msfmp = 5, 50, 150 & 1000, we find that DGB formation ramps up as the redshift decreases from z ∼ 30 − 15, driven predominantly either by halo growth (for Msfmp = 5) or star formation (for Msfmp = 50, 150 & 1000).As the redshift decreases below z ∼ 15, metal enrichment significantly slows down DGB formation.However, when Mh is increased to 10000 (red line), halo growth continues to ramp up DGB formation till z ∼ 10, after which the suppression of DGB formation due to metal enrichment takes place.Note also that at z ≲ 10, DGB formation is finally strongly suppressed due to metal pollution for all the seed models.This is because most of the newly star forming regions are already metal enriched by then, likely due to stellar feedback dispersing the metals throughout the simulation volume.

3.2.3
Assembly rates of ∼ 10 4 − 10 6 M⊙ BHs from ∼ 10 3 M⊙ seeds The assembly rates of 1.25×10 4 , 1×10 5 & 8×10 5 M⊙/h BHs are shown in 2nd, 3rd and 4th panels of Figure 5 respectively.As in Bhowmick et al. (2021), we find that nearly 100% of the growth of these DGBs is happening via mergers.This is partly due to the M 2 BH scaling of Bondi Hoyle accretion rates, which leads to much slower accretion onto low mass DGBs, and it is consistent with the findings of Taylor & Kobayashi (2014) (see Figure 2 in their paper).
Let us first focus on the impact of this merger dominated growth on the assembly of 1.25 × 10 4 M⊙/h BHs (2nd panel of Figure 5).They generally assemble at rates ∼ 50−80 times lower than the rates at which 1.56 × 10 3 M⊙/h DGBs form.Notably, the trends seen in the DGB formation rates directly reflect upon the rates at which 1.25 × 10 4 M⊙/h BHs assemble.In particular, for Mh = 3000 and Msfmp = 5, 50 & 150, we see an increase in the assembly rates as the redshift decreases from z ∼ 25 − 15 wherein DGB formation is driven by halo growth or star formation.The assembly rates slow down at z ≲ 15 as metal enrichment slows down DGB formation.For a higher value of Mh = 10000, halo growth continues to increase the assembly rates until z ∼ 10, before metal enrichment slows it down.Overall, these results suggest that the interplay of halo growth, star formation and metal enrichment processes that we witnessed on the formation rates of 1.56 × 10 3 M⊙/h DGBs, are also retained in the assembly rates of their higher mass 1.25 × 10 4 M⊙/h descendants.
We also see the assembly of a handful of 1 × 10 5 and 8 × 10 5 M⊙/h BHs (3rd and 4th panels of Figure 5). 1 × 10 5 M⊙/h BHs generally start assembling at z ≲ 15 and 8 × 10 5 M⊙/h BHs assemble at z ≲ 12.However, any potential trends similar to that identified in the previous paragraph for 1.25 × 10 4 M⊙/h descendants, are difficult to discern for the 1×10 5 and 8×10 5 M⊙/h descendants due to very limited statistical power.
Figure 6 shows the host halo masses (denoted by M halo total ) and redshifts at which 1.56 × 10 3 M⊙/h DGBs form (leftmost panel), followed by the assembly of 1.25 × 10 4 M⊙/h and 1 × 10 5 M⊙/h BHs (middle and right panels respectively).Broadly speaking, 1.56 × 10 3 M⊙/h DGBs form in ∼ 10 6.5 − 10 7.5 M⊙/h halos, 1.25 × 10 4 M⊙/h BHs assemble in ∼ 10 7.5 − 10 8.5 M⊙/h haloes, and 1 × 10 5 M⊙/h BHs  We only show data points for a limited set of models to avoid overcrowding.Solid lines show the mean trend and the shaded regions show ±1σ standard deviations.We find that as metal enrichment takes over as the driving force and suppresses DGB formation at lower redshifts, DGBs form in increasingly massive halos.This also drives a similar redshift dependence for the assembly of 1.25 × 10 4 M ⊙ /h BHs.
assemble in ∼ 10 8.5 − 10 9.5 M⊙/h haloes.Therefore, rates of BH growth versus halo growth are broadly similar.This is a natural expectation from merger-dominated BH growth, since the BH mergers crucially depend on the merging of their host halos.Note however that in the absence of our currently imposed BH repositioning scheme that promptly merges close enough BH pairs, we could expect larger differences between the merger rates of BHs and their host halos.
The interplay between halo growth, star formation and metal enrichment at different redshifts (as noted in Section 3.2) profoundly influences the redshift evolution of the halo masses in which the seeding of 1.56 × 10 3 M⊙/h DGBs and assembly of higher-mass BHs take place.Let us first focus on the seeding of 1.56 × 10 3 M⊙/h DGBs (Figure 6: left panel).
We find for Mh = 3000 & Msfmp = 50, 150 that the halo masses steadily increase with time as star formation drives the formation of DGBs.As described in more detail in Appendix B, this is a simple consequence of cosmological expansion, which makes it more difficult for the gas to cool and form stars at later times within halos of a fixed mass.Notably, as metal enrichment gradually takes over at z ≲ 15, the redshift evolution becomes substantially steeper, pushing DGB formation towards even more massive halos at later times.This may seem counterintuitive since we expect more massive halos to have stronger metal enrichment, which should suppress DGB formation within them.However, more massive halos also generally have higher overall star forming gas mass, a portion of which may remain metal poor since star-forming halos are not fully metal enriched instantaneously.As it turns out in our simulations, when metal enrichment increases, it favors DGB formation in more massive halos because they are more likely to have sufficient amount of star forming & metal poor gas mass.For further details on this, the reader can refer to Appendix B. When Mh is increased to 10000, the redshift evolution of DGB forming halo mass is flat until z ∼ 10 since the seed formation is primarily driven by the halo mass criterion.It is only after z ∼ 10 that the DGB forming halo mass starts to steeply increase due to the full influence of metal enrichment.
The above trends directly impact the redshift evolution of the host halo masses in which 1.25 × 10 4 M⊙/h assemble (middle panel of Figure 6).For the model with a stricter halo mass criterion (i.e., Mh = 10000 & Msfmp = 5), the transition in the slope of the M halo total versus redshift relation occurs much later (transition occurs between z ∼ 12 − 10) compared to models with more lenient halo mass criterion Mh = 3000 & Msfmp = 5 − 150 (z ≳ 15).This, again, is because metal enrichment starts to suppress DGB formation much later in the model with stricter halo mass criterion.Finally, for the assembly of 1 × 10 5 M⊙/h BHs, the redshift evolution of the host halo masses cannot be robustly deciphered due to statistical uncertainties.But here too, we see hints of higher host halo masses at lower redshifts in regimes where metal enrichment is the primary driver for (the suppression of) DGB formation.
Overall, the impact of halo growth, star formation and metal enrichment on DGB formation is well imprinted in the redshift evolution of the host halo masses within which their descendant BHs assemble.We shall see in later sections how this fact is going to be crucial in building the new seed model to represent (descendants of) 1.56 × 10 3 M⊙/h DGBs in lower-resolution simulations.

RESULTS II: A NEW STOCHASTIC SEED MODEL FOR LARGER SIMULATIONS
We have thus far traced the growth of low mass (1.56 × 10 3 M⊙/h) DGBs born in regions with dense & metal poor gas, in order to determine the host properties of their highermass (1.25 × 10 4 & 1 × 10 5 M⊙/h) descendant BHs.We will now use these results to build a new stochastic seed model that can represent these 1.56×10 3 M⊙/h DGBs within simulations that cannot directly resolve them.In section 2.3.2,we gave a brief introduction of this seed model and mentioned that this model would rely on a galaxy mass criterion and a galaxy environment criterion.Here we detail the motivation, construction, and calibration of both of these seeding criteria and demonstrate that the resulting model can reproduce reasonably well the high-resolution, gas based seed model predictions in lower-resolution simulations.Note that some of our gas based seed parameter combinations do not produce enough descendant BHs in our zoom region to perform a robust calibration.These include Mh = 3000; Msfmp = 1000 for the 1.25 × 10 4 M⊙/h descendants and Mh = 3000 & 10000; Msfmp = 150 & 1000 for the 1 × 10 5 M⊙/h descendants.Therefore, we shall not consider these parameter values hereafter.
In the stochastic seed model, we will directly seed the descendants with initial masses set by the gas mass resolution (1.25 × 10 4 & 1 × 10 5 M⊙/h in Lmax = 11 & 10 respectively).As already mentioned in Section 2.3.2, because these massive seeds are meant to represent descendants of 1.56 × 10 3 M⊙/h DGBs that cannot be resolved directly, we refer to the former as "extrapolated seed descendants" or ESDs with initial mass denoted by M ESD seed .In other words, our new stochastic seeding prescription will place ESDs with M ESD seed set by the gas mass resolution of 1.25 × 10 4 or 1 × 10 5 M⊙/h, but they are intended to represent our gas based seed models with unresolvable 1.56×10 3 M⊙/h DGBs.To that end, the next few subsections address the following question: How do we build a new seed model that can capture the unresolved growth phase from M DGB seed = 1.56 × 10 3 M⊙/h to M ESD seed = 1.25 × 10 4 or 1 × 10 5 M⊙/h?
4.1 Seeding sites for ESDs: "Best Friends of Friends (bFOF)" galaxies It is common practice in many (but not all) cosmological simulations to place one seed per halo at a given time step.
The advantage to this is that the halo properties (particularly the total halo mass) show much better resolution convergence compared to the local gas properties.However, this is not quite realistic, as halos typically have a significant amount of substructure and can therefore have multiple seeding sites at a given time.Despite this, subhalos are not typically used to seed BHs, likely because on-the-fly subhalo finders like SUBFIND are much more computationally expensive compared to on-the-fly halo finders like the FOF finder.
Recall that in our gas based seed model, 1.56 × 10 3 M⊙/h DGBs were also seeded as "one seed per halo".But even in this case, as these smaller seed-forming halos and their BHs undergo mergers, configurations with multiple 1.25 × 10 4 or 1 × 10 5 M⊙/h BHs per halo tend to naturally emerge.We emulate this in our new seed model by seeding ESDs within bFOFs introduced in Section 2.3.2.The linking length for the bFOFs was chosen to be 1/3rd of the value adopted for standard FOF halos (which is 0.2 times the mean particle separation).This value was chosen after exploring a number of possibilities.On one hand, a much larger linking length does not resolve the substructure adequately.On the other hand, if the linking length is much smaller, a significant number of FOFs end up not containing any bFOFs.
Figure 7 summarizes the bFOF properties in relation to the familiar FOF halos at z = 8.The leftmost panel shows the relationship between the masses of FOFs and bFOFs.Within a FOF, the most massive bFOF is assigned as the "central bFOF" (blue circles) and the remaining bFOFs are assigned as the "satellite bFOFs" (orange circles).The central bFOFs are about ∼ 7 times less massive than the host FOF.Not surprisingly, the satellite bFOFs span a much wider range of masses all the way down to the lowest possible masses at the bFOF/FOF identification limit (≥ 32 DM particles).The middle panel of Figure 7 shows the bFOF occupation statistics for FOFs of different masses.More massive FOFs tend to host a higher number of bFOFs; the most massive ∼ 3 × 10 10 M⊙/h FOF has about ∼ 4 × 10 3 bFOFs.We can see that in addition to the central bFOF, the satellite bFOFs can also contain BHs (orange, green and maroon points in the middle panel).To that end, the right panel of Figure 7 shows the total BH occupations inside FOFs and bFOFs as a function of their respective masses.We can clearly see that while individual FOFs can contain multiple BHs (up to a few tens), the vast majority of individual bFOFs contain 0 or 1 BHs.In fact, amongst the ∼ 30000 bFOFs at z = 8, only 12 of them have more than 1 BH.These results generally hold true at all redshifts.
By building our seed model based on bFOFs instead of FOFs (i.e. one ESD per bFOF), we expect to naturally place multiple 1.25 × 10 4 M⊙/h or 1 × 10 5 M⊙/h ESDs in individual halos.As a result, we will successfully capture situations where multiple 1.25 × 10 4 M⊙/h or 1 × 10 5 M⊙/h descendant BHs assemble from 1.56 × 10 3 M⊙/h DGBs in a single halo within close succession.As mentioned in Section 2.3.2,these bFOFs are essentially the sites where high-z (proto)galaxies reside; we therefore use the phrase "galaxies" to refer to these bFOFs.All this motivates us to use bFOFs as seeding sites (instead of FOFs) in our new stochastic seed models that would be able to represent the lowest mass (∼ 10 3 M ⊙ /h) DGBs in lower resolution simulations that cannot directly resolve them.These bFOFs are essentially sites of (proto)galaxies residing within the high-z halos.We hereafter refer to these bFOFs as "galaxies".We find that for all the models, there is a transition in the slope of the mean trend at redshift z ≡ ztrans ∼ 12 − 13, which is driven by the suppression of seed formation by metal enrichment.The trends are reasonably well fit by a double power law (dashed lines).These fits are used in our stochastic seed models that directly seed the descendants (referred to as "extrapolated seed descendants or ESDs) at 1.25 × 10 4 M ⊙ /h or 1 × 10 5 M ⊙ /h within the lower resolution Lmax = 11 & 10 zooms, respectively.To obtain fits in the top row, we first assumed ztrans = 13.1 for Mh = 3000, Msfmp = 5, 50 & 150, and ztrans = 12.1 for Mh = 10000, Msfmp = 5 via a visual inspection.The fits were then performed to obtain the slopes at z < ztrans and z > ztrans using scipy.optimize.curve_fit.The final fitted parameters are shown in versus redshift relations (Figure 8).ztrans, Mtrans, α, & β are obtained by fitting the mean trends using the double power-law function shown in Equation 5. σ is the standard deviation.Columns 8 to 10 show the parameter values for the galaxy environment criterion (i.e., p 0 , p 1 and γ).These are obtained by exploring a range of possible values to find the best match with the small-scale BH clustering and overall BH counts predicted by the gas based seed model.

Building the galaxy mass criterion
Recall from Section 3.1 that because DGB formation in our gas based seeding model occurs during a transient phase of rapid metal enrichment in halos that are otherwise fairly typical, their descendents have metallicities (and SFRs) similar to that of typical halos with similar total masses.This motivates us to first explore low-resolution simulations with seeding criterion that simply matches the galaxy mass distribution of seeding sites in our high-resolution, gas based models.We refer to this seeding criterion as the galaxy mass criterion; notably, this differs from typical halo-mass-based seeding models in the use of a distribution of host mass thresholds rather than a single value.The corresponding simulations are referred to as STOCHASTIC_MASS_ONLY.To calibrate our seed models, we first determine the galaxy masses (M galaxy total ) in which 1.25×10 4 M⊙/h and 1×10 5 M⊙/h BHs assemble from 1.56 × 10 3 M⊙/h DGBs within our GAS_BASED simulations; these are shown in Figure 8.Let us first focus on the assembly of 1.25 × 10 4 M⊙/h descendants (Figure 8, top panels).Similar to that of M halo total versus redshift relations (Figure 6, middle panel), the M galaxy total versus redshift relations show features that reflect the interplay between halo growth, star formation and metal enrichment in influencing DGB formation.For Mh = 3000, Msfmp = 50 & 150, we see that the slope of redshift evolution of the mean (denoted by M galaxy total and shown as solid lines) undergoes a gradual transition between z ∼ 13 − 15.This corresponds to the slow down of DGB formation due to metal enrichment.When Mh = 10000 & Msfmp = 5, this transition occurs at comparatively lower redshifts (z ∼ 12 − 10) as the influence of metal enrichment starts later due to the higher Mh .We then fit the mean trend by a double power law (dashed lines in Figure 8, upper panels) given by

Galaxy masses at assembly of
ztrans roughly marks the transition in the driving physical process for DGB formation.For z > ztrans, halo growth or star formation primarily drives DGB formation; for z < ztrans, metal enrichment takes over as the primary driver to suppress DGB formation.Mtrans is the value of M galaxy total at the transition redshift.Finally, α and β are the slopes of the M galaxy total versus redshift relation at z > ztrans and z < ztrans respectively.To simplify our fitting procedure, we first select ztrans for each of the cases via visual inspection and determine Mtrans by interpolating the M galaxy total versus redshift relation.We then fit for α and β using the scipy.optimize.curve_fitpython package.Note that the double power-law function assumes a sharp transition in the M galaxy total versus redshift relation at z = ztrans.However, as we can see in Figure 8, this transition occurs much more gradually as metal enrichment starts to slow down and eventually suppresses DGB formation.Nevertheless, the double power-law model offers a simple (albeit approximate) framework to capture the intricate convolution of the impact of halo growth, star formation and metal enrichment that leads to the initial rise and eventual suppression of DGB formation.
The values of ztrans, Mtrans, α and β for the different gas based seed models are listed in the top four rows of Table 2.We choose ztrans = 13.1 for Mh = 3000, Msfmp = 5, 50 & 150.ztrans is the same for all three Msfmp values to encode that the slow down of seed formation due to metal enrichment starts at similar redshifts for all these models.For Mh = 10000, Msfmp = 5, we choose a lower transition redshift of ztrans = 12.1 as halo growth continues to drive up seed formation up to lower redshifts compared to the models with Mh = 3000.
The impact of Mh and Msfmp on Mtrans, α and β is noteworthy.As Mh or Msfmp increases, the value of Mtrans also increases to generally reflect the fact that descendant BHs of a fixed mass are assembling in more massive halos.α is significantly more sensitive to Msfmp compared to Mh ; this is not surprising as α corresponds to the regime where metal enrichment primarily governs seed formation.A higher value of Msfmp produces a steeper α, as it leads to stronger suppression of DGB formation by metal enrichment.Lastly, β is impacted by both Msfmp and Mh .This also makes sense because β corresponds to the regime where either star formation or halo growth can drive seed formation.Increasing Msfmp enhances the role of star formation, and increasing Mh enhances the role of halo growth.Generally, we see that as the number of DGBs forming at the highest redshifts is de-creased due to increase in Mh or Msfmp , β tends to go from negative to positive values thereby favoring higher M galaxy total at higher redshifts.This is likely because when BHs are very few, merger driven growth is slow and galaxies have more time to grow via DM accretion between successive mergers.As a result, galaxy growth is slightly faster than merger dominated BH growth at these highest redshifts where there are very few BHs.
We now turn our attention to the assembly of 10 5 M⊙/h descendant BHs (bottom panels of Figure 8).In this case, we do not have adequate statistics to robustly determine the M galaxy total versus redshift relations.We can see that  We find that the STOCHASTIC_MASS_ONLY simulation (p 0 = 1 and p 1 = 1) significantly underestimates the small-scale clustering and overestimates the BH counts compared to the GAS_BASED simulations.As we introduce the galaxy environment criterion (STOCHASTIC_MASS_ENV) and decrease p 0 and p 1 to favor seeding in richer environments, we find that the small-scale clustering is enhanced and the BH counts decrease.The model with p 0 , p 1 = 0.1, 0.3 produces the best match for the small-scale clustering as well as the BH counts.2. The thin black dashed lines in the right three panels show STOCHASTIC_MASS_ONLY simulations that assume zero scatter in the galaxy mass criterion i.e σ = 0.The thinnest black solid line in the same panels show simulations that assume a constant galaxy mass threshold fixed at the mean of the distributions from the leftmost panels (see vertical line).Amongst all the simulations that use stochastic seeding, only the STOCHASTIC_MASS_ENV simulations are able to successfully capture the GAS_BASED simulation predictions.
Figure 13.Same as Figure 12, but for the assembly of 1 × 10 5 M ⊙ /h BHs.The statistics are more limited compared to the previous figure.The shaded grey regions correspond to z > 13.1, wherein we could not calibrate the galaxy mass criterion due to lack of data points in Figure 8.But at z < 13.1 where calibration was possible, we find that the STOCHASTIC_MASS_ENV simulations (at a resolution of Lmax = 10) do reasonably match with the BH counts predicted by the Lmax = 12 GAS_BASED simulations.
data points only exist at z ≲ 13, wherein M galaxy total tends to increase with decreasing redshift, except for Mh = 10000, Msfmp = 5, where statistics are too poor to reveal any useful trends).Here, we only fit for α after assuming the same values of ztrans that were used for the assembly of 1.25 × 10 4 M⊙/h BHs (dashed lines in Figure 8, lower panels).The best fit values are shown in the bottom three rows of Table 2. Overall, we should still keep in mind that there are very few 10 5 M⊙/h descendants.Therefore, these fits are not very statistically robust.Nevertheless, they will still be useful to test our stochastic seed models in the next subsection.
In addition to the mean trends, the M galaxy total versus redshift relations show a significant amount of scatter (σ).This is defined to be the 1 sigma standard deviation shown by the shaded regions in Figure 8. Generally we see that the scatter does not have a strong redshift evolution.The overall mean scatter (averaged over the entire redshift range) for the different gas based seed models is shown in the seventh column of Table 2.The scatter decreases slightly as we make the gas based seeding criterion more restrictive by increasing Mh or Msfmp .This is likely because for more restrictive seed models, assembly of higher-mass BHs occurs in more massive galaxies for which the underlying galaxy mass function is steeper.For the same reason, the scatter is also smaller for the assembly of 1 × 10 5 M⊙/h BHs compared to that of 1.25 × 10 4 M⊙/h BHs.

Properties of galaxies that form ESDs: Comparison with gas based seed model predictions
We finally use the M galaxy total versus redshift relations to formulate our galaxy mass criterion.More specifically, we place ESDs of mass 1.25 × 10 4 M⊙/h and 1 × 10 5 M⊙/h based on minimum galaxy mass thresholds.The threshold value (M th ) is stochastically drawn from redshift dependent distributions described by a log-normal function, i.e ∝ exp [− 1 2 (log 10 M 2 th − µ 2 )/σ 2 ], with mean µ ≡ M galaxy total (z) described by the double power-law fits shown in Figure 8 and Table 2.The standard deviation σ is shown in Table 2 (column 7).
In Figure 9, we show the 1D distributions (marginalized over all redshifts until z = 7) of the various galaxy properties wherein 1.25 × 10 4 M⊙/h descendants assemble (i.e., total mass, stellar mass, SFRs, gas metallicities and environments).We compare the predictions for the GAS_BASED simulations that assemble the 1.25 × 10 4 M⊙/h descendants from 1.56 × 10 3 M⊙/h DGBs (colored lines), and the STOCHASTIC_MASS_ONLY simulations that directly seed the 1.25 × 10 4 M⊙/h ESDs (grey lines).We can clearly see that after calibrating the STOCHASTIC_MASS_ONLY simulations to reproduce the total galaxy masses (1st panels from the left) predicted by the GAS_BASED simulation, it also broadly reproduces the baryonic properties of the galaxies such as stellar masses, SFRs and metallicities (2nd, 3rd and 4th panels).This further solidifies our findings from Figures 1 to 3, that the galaxies wherein the 1.25 × 10 4 M⊙/h descendants assemble are reasonably well characterized by their total mass alone.Recall that this is attributed to the transience of the rapid metal enrichment phase in which halos form 1.56 × 10 3 M⊙/h DGBs in the GAS_BASED suite.
However, we see that the galaxy mass criterion places the ESDs in sparser environments (hosts with fewer neighboring halos) compared to the GAS_BASED simulation predictions (rightmost panels in Figure 9).This reflects the fact that when the low-mass DGBs assemble higher-mass BHs through merger-dominated BH growth, their descendants naturally grow faster in regions with more frequent major halo and galaxy mergers.Therefore, for a given distribution of total galaxy masses, those living in richer environments are more likely to contain higher-mass descendant BHs.These results for the assembly of 1.25×10 4 M⊙/h BHs also hold true for the assembly of 1 × 10 5 M⊙/h BHs, as shown in Figure 10.In the next section, we develop an additional seeding criterion to account for this small-scale clustering of the assembly sites of higher mass descendants in our GAS_BASED models.

Building the galaxy environment criterion
In this section, we describe an additional galaxy environment criterion to favor the placement of ESDs in galaxies in richer environments (at fixed galaxy mass).We then explore its implications on their two-point clustering and the overall BH population.
First, we assume that any potential seeding site with two or more neighbors (N ngb ≥ 1) will always seed an ESD.Potential seeding sites with zero or one neighbors will seed an ESD with a probability 0 ≤ P env seed ≤ 1.For these cases, we assign a different linear dependence of P env seed on the galaxy mass M galaxy total , such that the probability for any potential seeding site to actually form an ESD is given by .
(5) Here, p0 and p1 denote the seeding probability in galaxies with 0 and 1 neighbors respectively, at the mean M galaxy total of the total mass distributions of galaxies wherein the descendant BHs assemble.
The parameter γ defines the slope for the linear dependence of P env seed on the galaxy mass; it varies slightly between the underlying gas based seed models used for calibration, as listed in Table 2.The motivation for this linear dependence and the adopted γ values are described in Appendix A. But to briefly summarize the main physical motivation, we use a γ > 0 to encode the natural expectation that for fixed N ngb , descendants will grow faster within galaxies with higher total mass.This is because N ngb , by definition, counts the number of halos with masses higher than the host halo mass of the galaxy that are within 5Rvir.As a result, a higher-mass galaxy with N ngb neighbors is in a more overdense region than a lower-mass galaxy with the same N ngb neighbors.
We add the galaxy environment criterion to the already applied galaxy mass criterion.We shall refer to the resulting suite of simulations as STOCHASTIC_MASS_ENV.In Figure 11, we systematically compare the GAS_BASED simulations (maroon lines) to the STOCHASTIC_MASS_ENV simulations that trace 1.25 × 10 4 M⊙/h descendants (grey lines) for a range of parameter values for p0 and p1.We start with p0 = 1, p1 = 1, which is basically the STOCHASTIC_MASS_ONLY simulation (lightest grey lines), and find that it significantly underestimates the two point clustering (by factors up to ∼ 5) of the ≥ 1.25 × 10 4 M⊙/h BHs compared to the GAS_BASED simulations (lower left three panels).At the same time, the STOCHASTIC_MASS_ONLY simulation also over-estimates the overall counts of the ≥ 1.25×10 4 M⊙/h BHs (lower right most panel).Upon decreasing the probabilities as p0 < p1 < 1, we can see that the two-point clustering starts to increase while the overall BH counts simultaneously decrease.For p0 = 0.1 & p1 = 0.3, we produce the best agreement of the two-point clustering as well as the overall BH counts.Further decreasing p0 and p1 mildly enhances the two-point clustering, but leads to too much suppression of the BH counts compared to GAS_BASED simulations.Therefore, we identify p0 = 0.1 & p1 = 0.3 as the best set of parameter values for the gas based seeding parameters [ Mh ,Msfmp = 3000,150].
As a caveat, we must also note in Figure 11 that while p0 = 0.1 & p1 = 0.3 produces the best agreement with the two point correlation function between GAS_BASED and STOCHASTIC_MASS_ENV simulations, it does place the ESDs in galaxies with somewhat higher N ngb compared to the GAS_BASED simulations (upper right panels).To that end, recall that N ngb only measures the galaxy environment at a fixed separation scale of D ngb = 5 Rvir (revisit Section 2.3.2).Therefore, we cannot expect N ngb to fully determine the two-point correlation profile, which measures the environment over a wide range of separation scales (∼ 0.01−1 Mpc/h in our case).In other words, one could come up with alternative set of galaxy environment criteria (for example, using N ngb within a different D ngb ̸ = 5 Rvir or even multiple set of N ngb values within different multiple D ngb values) and still be able simultaneously reproduce the two-point correlation function as well as the BH counts.Finding all these different possibilities of galaxy environment criteria is not the focus of this work.Instead, our objective here is simply to demonstrate that to reproduce the GAS_BASED simulation predictions, we need a galaxy environment criterion to favor the placing of ESDs in galaxies with richer environments.Furthermore, we showed that by applying a galaxy environment criterion that brings the two point correlation function into agreement with the GAS_BASED simulations, our STOCHASTIC_MASS_ENV simulations achieve the primary goal for our sub-grid seeding model: faithfully representing the descendants of 1.56 × 10 3 M⊙/h seeds produced in the GAS_BASED simulations.
Thus far we have calibrated a STOCHASTIC_MASS_ENV simulation to reproduce the 1.25 × 10 4 M⊙/h descendant BH population from a gas based seed model with [ Mh ,Msfmp = 3000,150] and M seed = 1.56 × 10 3 M⊙/h.We can perform the same calibration for the remaining gas based seed models in our suite, and for the assembly of 1 × 10 5 M⊙/h descendant BHs in addition to 1.25 × 10 4 M⊙/h descendants.The resulting p0 and p1 values for all the gas based seeding parameters are listed in Table 2. Broadly speaking, we require p0 ∼ 0.1 − 0.2 and p1 ∼ 0.3 − 0.4 to simultaneously reproduce the gas based seed model predictions for the small-scale clustering and BH counts of the descendant BHs.Slightly higher p0 and p1 values are favored for more restrictive gas based criteria and for higher-mass descendant BHs, possibly because in both cases the descendant BHs assemble in higher-mass galaxies.Note that higher-mass galaxies tend to be more strongly clustered than lower mass galaxies.As a result, during the calibration of the STOCHASTIC_MASS_ENV simulations, the galaxy mass criterion alone will already produce a slightly stronger clustering for the ESDs.This lessens the burden on the galaxy environment criterion to achieve the desired clustering predicted by the gas based seed models.
In  12), we calibrate models corresponding to [ Mh ,Msfmp = 3000,50 & 3000,150] and [ Mh , Msfmp = 10000, 5].We exclude the most lenient gas based seed parameters of [ Mh ,Msfmp = 3000,5], since it leads to a significant portion of 1.25 × 10 4 M⊙/h descendants to assemble in galaxies that cannot be resolved in the Lmax = 11 runs.For the remaining gas based seed parameters, the STOCHASTIC_MASS_ENV simulations well reproduce the GAS_BASED simulation predictions for the BH counts, two-point correlation functions and merger rates of > 1.25 × 10 4 M⊙/h BHs.
For M ESD seed = 1 × 10 5 M⊙/h (Figure 13), we only do this exercise for the most lenient gas based seed models i.e. [ Mh ,Msfmp = 3000,5 & 3000,50].This is because for the stricter gas based seed models, there are too few BHs produced overall.Here, the STOCHASTIC_MASS_ENV simulations well reproduce the counts of > 1×10 5 M⊙/h BHs at z < 13.1 (wherein there is enough data to calibrate the slope α; revisit Figure 8, bottom row).For z > 13.1, β = 0 is assumed due to the absence of enough data points to perform any fitting; here, the STOCHASTIC_MASS_ENV seed model overestimates the number of > 1 × 10 5 M⊙/h BHs and their high-z merger rates.Regardless, where enough data exist for robust calibration, these results imply that with a calibrated combination of galaxy mass criterion and galaxy environment criterion, the STOCHASTIC_MASS_ENV simulations can well reproduce the GAS_BASED simulation predictions for a wide range of gas based seeding parameters.
Figures 12 and 13 also disentangle the impact of the various components of our final stochastic seed model, and they highlight the importance of each component in the successful representation of the gas based seed models.As seen previously, the STOCHASTIC_MASS_ONLY seed model overestimates the BH counts and merger rates by factors between ∼ 2 − 5. Next, when we assume zero scatter in the galaxy mass criterion (Σ = 0, black dashed lines), it further overestimates the BH counts and merger rates up to factors of ∼ 1.5 (grey solid versus black dashed lines).Finally, if we remove the redshift dependence in the galaxy mass criterion and instead assume a constant threshold value (thin dotted lines), the BH counts and merger rates monotonically increase with time.Not surprisingly, this is because such a model cannot capture the suppression of seed formation due to metal enrichment.
Overall, we can clearly see that in order to represent our Lmax = 12 gas based seed models forming 1.56 × 10 3 M⊙/h BH seeds in lower-resolution, larger-volume simulations, we need a stochastic seed model that places their resolvable descendant BHs (ESDs) using the following two criteria • A galaxy mass criterion with a galaxy mass seeding threshold that is drawn from a distribution that evolves with redshift.The redshift evolution encodes the impact of star formation, halo growth and metal encrichment on seed formation.
• A galaxy environment criterion that favors seeding within galaxies living in rich environments.This encodes the impact of the unresolved, hierarchical-merger-dominated growth of these seeds from M DGB seed to M ESD seed .

Accounting for unresolved minor mergers
We have thus far successfully built a new stochastic BH seed model that places ESDs which represent the ∼ 10 4 − 10 5 M⊙/h descendants of ∼ 10 3 M⊙/h DGBs in simulations that cannot directly resolve these lowest-mass BHs.In this section, we model the subsequent growth of these ESDs.To do so, we must first account for one additional contribution to their growth: unresolved minor mergers.
Recall from Bhowmick et al. (2021) that the earliest growth of these ∼ 10 3 M⊙/h DGBs is completely driven by BH mergers, with negligible contribution from gas accretion.For our present purposes, these BH mergers can be classified into three types: • Heavy mergers: In these mergers, both the primary and secondary black holes (with masses M1 and M2 respectively) are greater than the mass of the ESDs (M1 > M2 > M ESD seed ).Therefore, these mergers will be fully resolvable within STOCHASTIC_MASS_ENV simulations.
• Light major mergers: In these mergers, both the primary and secondary black holes are less massive than the ESDs (M DGB seed < M2 < M1 < M ESD seed ).These mergers cannot be resolved in STOCHASTIC_MASS_ENV simulations.However, these are the mergers that lead to the initial assembly of the descendants represented by the ESDs, such that their contribution to BH assembly is already implicitly captured within the stochastic seed model.
• Light minor mergers: In these mergers, the primary black hole is more massive than the ESD mass, but the secondary black hole is not (M1 > M ESD seed & M DGB seed < M2 < M ESD seed ).These mergers cannot be resolved in STOCHASTIC_MASS_ENV simulations, and their contributions to BH mass assembly cannot be captured by the galaxy mass criterion or the galaxy environment criterion.Therefore, we must modify our prescription to explicitly add their contribution to the growth of the ESDs.
We first determine the contribution of light minor mergers within the GAS_BASED simulations.Here we only show the results for M ESD seed = 1.25 × 10 4 M⊙/h, since there are too few 1 × 10 5 M⊙ BHs formed in the GAS_BASED simulations to robustly perform this analysis for the latter.The light minor mergers are thus defined to have M1 > 1.25 × 10 4 M⊙/h and 1.56 × 10 3 < M2 < 1.25 × 10 4 M⊙/h, and heavy mergers are defined to be those with M1 > M2 > 1.25×10 4 M⊙/h.In Figure 14, we compare the contributions of the light minor mergers and heavy mergers to the growth of > 1.25 × 10 4 M⊙/h BHs in the GAS_BASED simulations.The light minor mergers are ∼ 30 times more frequent than the heavy mergers (top row); this is simply due to higher overall number of MBH < 1.25 × 10 4 M⊙/h BHs compared to M bh > 1.25 × 10 4 M⊙/h BHs.When we compare the mass growth contributed by light minor mergers versus heavy mergers (middle row), we find that the light minor mergers dominate at the highest redshifts (z ∼ 15 − 19).As BH growth proceeds over time, the mass growth contributed by heavy mergers increases and eventually exceeds that of the light minor mergers at z ≲ 12, even though the overall merger rates are still dominated by light minor mergers.This is because the masses of the BHs involved in the heavy mergers continue to increase with time.Eventually, when new DGB formation is strongly suppressed by metal enrichment, the mass growth due to the light minor mergers becomes small.We clearly see these trends in the third row of Figure 14 which shows ∆M light minor defined as the amount of mass growth due to light minor mergers between successive heavy merger events.∆M light minor monotonically decreases with redshift and its evolution is reasonably well fit by power laws.
We use the power law fits of ∆M light minor (shown in the last row of Figure 14) to determine the missing BH growth contribution from light minor mergers.More specifically, for each heavy merger event in a STOCHASTIC_MASS_ENV simulation, we added extra mass growth of ∆M light minor due to light minor mergers, calculated based on these power law fits.Figure 15 shows that it is only after the inclusion of these unresolved light minor mergers, we achieve reasonable agreement between the BH mass functions predicted by the GAS_BASED and the STOCHASTIC_MASS_ENV simulations (colored dashed lines versus solid black lines).Note that at masses between M ESD seed and 2M ESD seed , the STOCHASTIC_MASS_ENV simulations will inevitably continue to slightly underpredict the mass functions.This is because within our prescription, the contribution from light minor mergers does not occur until the first heavy merger event between the ESDs.

SUMMARY AND CONCLUSIONS
In this work, we tackle one of the longstanding challenges in modeling BH seeds in cosmological hydrodynamic simulations: how do we simulate low mass (≲ 10 3 M⊙) seeds in simulations that cannot directly resolve them?We address this challenge by building a new sub-grid seed model that can stochastically seed the smallest resolvable descendants of low mass seeds in lower-resolution simulations (hereafter referred to as "stochastic seed model").Our new seed model is motivated and calibrated based on highest resolution simulations that directly resolve the low mass seeds.With this new tool, we have bridged a critical gap between high-resolution simulations that directly resolves low mass seeds, and largervolume simulations that can generate sufficient numbers of BHs to compare against observational measurements.This paves the way for making statistically robust predictions for signatures of low-mass seeds using cosmological hydrodynamic simulations, which is a crucial step in preparation for the wealth of observations with ongoing JWST, as well as upcoming facilities such as LISA.
The core objective of this work has been to determine the key ingredients needed to construct such a seed model.To do this, we study the growth of the lowest mass 1.56×10 3 M⊙/h seeds that were fully resolved using highest resolution zoom simulations.These seeds are placed in halos containing gas that is simultaneously star forming as well as metal poor (< 10 −4 Z⊙), consistent with proposed low mass seeding candidates such as Pop III stellar remnants.We trace the growth of these 1.56 × 10 3 M⊙/h seeds until they assemble descendants with masses that are close to different possible gas mass resolutions (∼ 10 4 − 10 6 M⊙) expected in larger cosmological volumes.We characterize the environments in which these Figure 14.Comparing the contributions of heavy mergers versus light minor mergers to the merger driven BH growth within the GAS_BASED suite.The green lines show heavy mergers where the masses of both primary and secondary BHs are ≥ 1.25 × 10 4 M ⊙ /h.The orange lines show the light minor mergers where the secondary BH mass is < 1.25 × 10 4 M ⊙ /h but the primary BH mass is ≥ 1.25 × 10 4 M ⊙ /h.The olive lines show the total contribution from both types of mergers i.e. all mergers with primary BHs ≥ 1.25 × 10 4 M ⊙ /h.The different columns show different gas based seed models.Middle panels show the mass growth rate due to mergers as a function of redshift, which is defined as the total mass of all merging secondary BHs per unit redshift.The light minor mergers show a dominant contribution at z ≳ 11, whereas heavy mergers tend to be more prevalent at z ≲ 11.The bottom panels show the mass growth (∆M light minor ) due to the light minor mergers between successive heavy mergers.This contribution needs to be explicitly included in simulations that use the stochastic seed models, to produce BH growth consistent with the GAS_BASED simulations.
descendants assemble; for e.g. they assemble in halos with masses ranging from ∼ 10 7 − 10 9 M⊙.The results are used to build our stochastic seed model that directly seeds these descendants in lower resolution simulations.To distinguish against the actual 1.56 × 10 3 M⊙/h seeds, we refer to the "seeds" formed by the stochastic seed model as "extrapolated seed descendants" or ESDs (with mass M ESD seed ).We consider 1.25 × 10 4 & 1 × 10 5 M⊙/h ESDs that are aimed at faithfully representing the descendants of 1.56 × 10 3 M⊙/h seeds born out of star forming and metal poor gas.Specifically, we explore a wide range of stochastic seed models on lower resolution versions of our zoom region, and determine the crucial ingredients required to reproduce the results of the highest resolution zoom simulations that explicitly resolve the 1.56 × 10 3 M⊙/h seeds.Following are the key features of our new seed model: • We seed the ESDs in high-z (proto)galaxies which are bound substructures within high-z halos.Since halos can contain multiple galaxies, this naturally allows the placement of multiple ESDs per halo.This is important because even if 1.56 × 10 3 M⊙/h seeds are placed as one seed per halo, their subsequent hierarchical growth inevitably assembles multiple higher mass descendants within individual halos.
• We introduce a galaxy mass criterion which places the ESDs based on galaxy mass thresholds.These thresholds are stochastically drawn from galaxy mass (including DM, stars and gas) distributions wherein 1.25 × 10 4 & 1 × 10 5 M⊙/h BHs assemble from 1.56 × 10 3 M⊙/h seeds.We find that the galaxy mass criterion effortlessly also replicates the baryonic properties of the galaxies at the time of assembly of the seed descendants, including stellar mass, SFRs, and gas metallicities.This is because, although 1.56 × 10 3 M⊙/h seeds form within halos exhibiting a bias towards lower metallicities in comparison to typical halos of similar masses, they undergo a transient phase characterized by rapid metal enrichment.As a result, the higher mass 1.25 × 10 4 & 1 × 10 5 M⊙/h descendants end up in unbiased halos with metallicities similar to halos with similar masses.The redshift dependence of the distributions underlying the galaxy mass thresholds capture the complex influence of processes such as halo growth, star formation and metal enrichment, on the formation of 1.56 × 10 3 M⊙/h seeds.
• However, if our stochastic seed model only contains the galaxy mass criterion, it underestimates the two-point clustering (at scales of 0.01 − 0.1 Mpc/h) of ≥ 1.25 × 10 4 & 1 × 10 5 M⊙/h BHs by factors of ∼ 5.At the same time, it overestimates the BH abundances and merger rates of ≥ 1.25 × 10 4 & 1 × 10 5 M⊙/h BHs by factors up to ∼ 5.This is a direct consequence of the fact that in our highest resolution zooms, the 1.56 × 10 3 M⊙/h seeds grow primarily via BH-BH mergers.As a result, the assembly of the higher mass descendants is more efficient in galaxies with richer en- vironments (higher number of neighboring halos) with a more extensive merger history.This cannot be captured solely by the galaxy mass criterion.
• To successfully capture the two-point clustering of the ≥ 1.25 × 10 4 & 1 × 10 5 M⊙/h descendant BHs, we introduce a galaxy environment criterion, where we assign seeding probabilities less than unity for galaxies with ≤ 1 neighbors.By doing this, we preferentially place ESDs in richer environments, which enhances the two-point clustering.We demonstrate that by adding a galaxy-environment criterion that is calibrated to produce the correct two-point clustering, our stochastic seed models can simultaneously also reproduce the BH abundances and merger rates of the ≥ 1.25 × 10 4 & 1 × 10 5 M⊙/h BHs.
• Lastly, the BH growth in our stochastic seed models is underestimated due to the absence of light minor mergers, defined as those involving a resolved primary (M1 > M ESD seed ) but an unresolved secondary (M2 < M ESD seed ).We compute the contribution of these mergers from the highest resolution zooms that resolve the 1.56 × 10 3 M⊙/h seeds, and explicitly add them to the simulations that use the stochastic seed models.It is only after adding the contribution from light minor mergers, do our stochastic seed models achieve success in accurately reproducing the BH mass functions predicted by the highest resolution zooms.
Overall, our stochastic seed model requires three main seeding components to successfully represent low mass seeds in lower resolution-larger volume simulations: 1) a galaxy mass criterion, 2) galaxy environment criterion, and 3) inclusion of unresolved light minor mergers.In our upcoming companion paper (Bhowmick et al. in prep), we apply these stochastic seed models to uniform volume cosmological simulations, and thereby make predictions that would be directly comparable to facilities such as JWST and LISA for different seeding scenarios.
The construction of our stochastic seed model essentially rests only on two important aspects of the formation of low mass seeds.First, these seeds are forming in regions which are already in the process of rapid metal enrichment, which is a natural consequence of seeding within star forming & metal poor gas.Second, the BH growth is dominantly driven by BH-BH mergers.Therefore, our stochastic seed model could be tuned to represent any low mass seeding scenario for which the foregoing assumptions hold true.These include scenarios beyond the ones we consider in this work.Furthermore, we can calibrate our stochastic seed model against any high resolution simulation run with different galaxy formation models or using different state-of-the-art numerical solvers such as GADGET-4 (Springel et al. 2021), GIZMO (Hopkins 2015) etc. Lastly, a key advantage of our seed model is that it depends solely only on galaxy total mass (which is dark matter dominated) and galaxy environment.Therefore, it can also be readily applied to DM only simulations as well as semianalytic models that are typically much less expensive compared to full hydrodynamic simulations.
In the near future, we shall test our stochastic seed model for their ability to represent low mass seeds when coupled with alternate accretion and dynamics models.For example, having a smaller scaling exponent between BH accretion rate and BH mass (such as α = 1/6 for gravitational torque driven accretion model) may significantly enhance the role of gas accretion in the growth of low mass seeds at high redshifts.Similarly, having a more physically motivated BH dynamics prescription will likely impact the merger rates and change the relative importance of accretion versus mergers in driving BH growth.In such a case, we can envision requiring additional ingredient(s) in our stochastic seed model to capture the impact of unresolved accretion driven growth of low mass seeds, similar to how the galaxy environment criterion was needed to account for the impact of unresolved merger dominated BH growth.
Nevertheless, our new stochastic seed model offers a substantial improvement from existing cosmological simulations that have either relied on a threshold halo / stellar mass, or on poorly resolved gas properties for seeding.Unlike most of these currently used seed models, our models will allow us to represent low-mass seeds in cosmological simulations without the need to either explicitly resolve the seeds, or seed below the gas mass resolution of the simulation.Overall, this work is an important step towards the next generation of cosmological hydrodynamic simulations in terms of improved modeling of high redshift SMBHs, to finally understand their role in shaping high redshift galaxy evolution in the ongoing JWST and upcoming LISA era.vironment based seeding probabilities p0 and p1 to linearly increase with the galaxy mass with a slope γ > 0 (see Equation 5).Depending on the gas based seed parameters, γ values of ∼ 1.2 − 1.6 (quoted in Table 2) are the ones found to maintain the calibration of the galaxy mass criterion.For values significantly higher or lower than ∼ 1.2 − 1.6, the galaxy environment criterion starts to skew the galaxy mass distributions (wherein ESDs are formed) towards higher or lower masses respectively, compared to our desired calibration.Lastly, incorporating this linear dependence with γ > 0 is also physically motivated.This is because it captures the notion that, for a given value of N ngb , seeding should be favored in a galaxy with higher mass because it exists in a more extreme environment compared to a lower mass galaxy with the same N ngb .

APPENDIX B: EVOLUTION OF STAR FORMING & METAL POOR GAS IN HALOS
In Figure B1, we show scatter plots of the star forming gas mass (MSF) and star forming & metal poor gas mass (M metal poor SF ) versus the total halo mass (M halo total ) at different redshifts.The top row shows that there is a straightforward positive correlation between the halo mass and star forming gas mass, except at the lowest halo masses wherein the results are likely impacted by the finite simulation resolution.Notably, several of these lowest mass objects are spuriously identified gas clumps with very little DM mass.In addition, these halos are also significantly below the atomic cooling threshold (virial Temperature of 10 4 K, dashed black vertical lines), which we do not self-consistently simulate due to the absence of H2 cooling.With our adopted halo mass thresholds ( Mh = 3000 & 10000), we avoid seeding in these lowest mass halos (marked as shaded grey region).Hereafter, we shall focus only on halos with reasonably well converged stellar and gas properties (outside the grey region).The top row also shows that at fixed halo mass, the star forming gas mass (top row) steadily decreases with time (green circles vs. black solid line).This is a simple consequence of cosmological expansion, which increases the atomic cooling threshold with time.As a result, at later times, halos of a given mass have lower ability to contain gas and collapse it to high enough densities to form stars.This is overall responsible for the steady increase in DGB forming halo masses with time in epochs where star formation is the primary driver of DGB formation (seen in Section 3.3).In the bottom row, the fraction of star forming gas mass that is also metal poor (< 10 −4 Z⊙), sharply decreases with halo mass at fixed redshift.This is not surprising because metal enrichment is expected to be more prevalent in massive halos.Regardless, the middle row shows that the overall star forming & metal poor gas mass continues to be positively correlated with halo mass.This is simply due to more massive halos having higher overall star forming gas mass.As a result, whenever metal enrichment becomes the primary driver of DGB formation, it leads to a more rapid increase in the DGB forming halo mass with time, compared to that of simple cosmological expansion (see again Section 3.3)., middle panels) and their ratios (bottom panels) are plotted versus the total mass (M total ) for halos in different snapshots within the GAS_BASED suite that explicitly resolves the 1.56 × 10 3 M ⊙ /h DGBs.The different columns show different redshift snapshots; however, the mean trend at z = 7 is plotted as solid black line in all the top panels to clearly see the redshift evolution.We added 1 to the y-axis in order to include halos with no star forming gas on the log-scale.The black dashed vertical lines correspond to the atomic cooling limit (halo virial temperature T vir = 10 4 K).The red and orange horizontal lines correspond to the seeding thresholds of Msfmp = 5 & 150 respectively.The blue and brown vertical lines correspond to the seeding thresholds of Mh = 3000 & 10000 respectively.Shaded regions correspond to the lowest mass objects below the Mh = 3000 limit, which are also below the atomic cooling threshold.We avoid seeding in these halos since they are impacted by the limited mass resolution and lack of sufficient physics (absence of H 2 cooling).Top panels show that at fixed halo mass, star forming gas mass decreases with time due to cosmological expansion.Middle and bottom panels show that despite stronger metal enrichment in more massive halos, the star forming & metal poor gas mass is still positively correlated with halo mass.As a result, DGB formation is favored in more massive halos when the primary driver is metal enrichment.

Figure 1
Figure1depicts the evolution of gas density, star formation rate (SFR) density, and gas metallicity at DGB forming sites from two distinct epochs(z = 20 & 12).As dictated by our gas based seed models, for each of the DGB forming sites

Figure 1 .
Figure1.Evolution of gas density (red/orange), star formation rate density (grayscale) and gas metallicity (yellow/purple) of various seed forming sites in our zoom simulations that use the gas based seed models described in Section 2.3.1.Hereafter, we shall refer to the seeds formed by the gas based seed models as "Direct Gas Based seeds" or DGBs.The large panels correspond to DGB forming sites from two distinct epochs namely z = 20 (top) and z = 12 (bottom).Within each large panel, the leftmost sub-panel corresponds to the snapshot at the time of DGB formation, wherein the yellow circles mark the location of the formation site that contains the star forming & metal poor gas.The remaining subpanels from left to right show the evolution of that formation site along three subsequent snapshots.We can clearly see that at the time of DGB formation, the regions in the immediate vicinity of the formation site have already started the process of metal enrichment.As a result, these regions get completely polluted with metals within a very short time after DGB formation.

Figure 2 .
Figure2.Assembly history of halos forming 1.56 × 10 3 M ⊙ /h DGBs using gas based seed models.Top to bottom, the rows show the evolution of total halo mass (M total ), star forming gas mass (M SF ), star forming & metal poor gas mass (M SF metal poor ), and gas metallicity (Zgas).Left, middle and right panels show halos seeded at z = 20, z = 15 and z = 10 (vertical dashed lines in each column) respectively, using the gas based seeding criterion, Msfmp = 1000 (horizontal dashed line in 3rd row) and Mh = 3000 (horizontal dashed line in 1st row).The faded dotted lines show the evolution of all DGB-forming halos along their merger trees.The thick solid lines show the mean trend, i.e. logarithmic average of the values of all the faded dotted lines at each redshift.The star forming & metal poor gas masses tend to sharply drop soon after seeding, independent of the time of seeding.This is because the DGB forming halos have already started to undergo rapid metal enrichment, which is shown in the fourth row by the rapid increase in gas metallicity prior to the seeding event.

Figure 3 .
Figure 3.The evolution of host star formation rates or SFR (top panels) and Zgas (bottom panels) versus host mass is shown for 1.56 × 10 3 M ⊙ /h DGBs formed at z = 15.In the leftmost panels, the filled orange circles indicate the halos that form DGBs at z = 15.The filled orange circles in the subsequent panels (from left to right) show the same host halos at z = 14, 13 & 12.The full population of halos at each redshift is shown in blue In other words, we select the orange circles at z = 15 using our gas based seeding criteria [ Mh , Msfmp = 3000, 1000] (assuming M DGB seed = 1.56 × 10 3 M ⊙ /h ), and follow their evolution on the halo merger tree.Comparing them to the full population of halos at each redshift, we find that even though the DGB forming halos at z = 15 are biased towards lower gas metallicities at fixed halo mass (lower left panel), subsequent evolution of these halos to lower redshifts causes them to become more unbiased at z = 14, 13 & 12.This is due to the rapid metal enrichment of these DGB forming halos depicted in Figure2.

Figure 4 .
Figure 4.The upper panels show the number of halos satisfying different cuts that were used in our gas based seed models: dotted lines correspond to a total mass cut of Mh × M DGB seed , dashed lines correspond to a star forming gas mass cut of Msfmp × M DGB seed , and solid lines show a star forming & metal poor gas mass cut of Msfmp × M DGB seed .The lower panels show ratio of the normalizations w.r.t. the dotted lines from the top panel.The line with the smallest normalization determines which of the processes between halo growth versus star formation versus metal enrichment is the key driver for DGB formation at a given epoch.For Mh = 3000, we find that metal enrichment becomes the key driver for (suppressing) DGB formation around z ∼ 13 for all Msfmp values between 5 − 150.However, when Mh = 10000, halo growth continues to be the primary regulator for DGB formation until z ∼ 10, after which metal enrichment takes over.

Figure 5 .
Figure5.We trace the growth of 1.56 × 10 3 M ⊙ /h DGBs (leftmost panels) along merger trees and show the redshifts when they assemble BHs of masses 1.25 × 10 4 M ⊙ /h, 1 × 10 5 M ⊙ /h and 8 × 10 5 M ⊙ /h (2nd, 3rd and 4th panels from the left).Different colors correspond to the different gas based seed models with varying Msfmp = 5, 50, 150 & 1000, Mh = 3000 and Msfmp = 5, Mh = 10000.We find that the impacts of increasing Msfmp and Mh are qualitatively distinguishible.For Mh = 3000 and Msfmp = 5 − 1000, metal enrichment starts to slow down DGB formation around z ∼ 15.In contrast, when Mh is increased from 3000 to 10000, the slow down of DGB formation due to metal enrichment starts much later (z ≲ 10).Similar trends are seen in the assembly rates of higher mass descendants (particularly 1.25 × 10 4 M ⊙ /h BHs).

Figure 6 .
Figure6.The left panel shows the redshifts and the FOF total masses at which 1.56 × 10 3 M ⊙ /h DGBs form.Middle and right panels show the redshifts and the FOF total masses at which 1.25 × 10 4 M ⊙ /h and 1 × 10 5 M ⊙ /h descendant BHs respectively assemble on the FOF merger tree.The different colors correspond to different gas based seed models.Each data point corresponds to a single instance of assembly or seeding.We only show data points for a limited set of models to avoid overcrowding.Solid lines show the mean trend and the shaded regions show ±1σ standard deviations.We find that as metal enrichment takes over as the driving force and suppresses DGB formation at lower redshifts, DGBs form in increasingly massive halos.This also drives a similar redshift dependence for the assembly of 1.25 × 10 4 M ⊙ /h BHs.

Figure 7 .
Figure 7. Introduction to best friends of friends (bFOF) galaxies, which are identified using the FOF algorithm but with one-third of the linking length used for identifying halos: Left panel shows the relation between halo mass and the mass (M galaxy total ) of the central or most massive bFOF in blue, and satellite bFOF in orange.On an average, the central bFOFs are ∼ 7 times less massive than their host FOFs, but with substantial scatter (≳ 1 dex) for fixed FOF mass (M halo total ).The middle panel shows the number of bFOFs for FOFs of different total masses.The plots are shown at z = 8 and for the gas based seed model [ Mh , Msfmp = 3000, 5].Blue color shows all bFOFs (with or without BHs); orange, green and maroon lines show bFOFs with a total BH mass of 1.5 × 10 3 M ⊙ /h, 1.25 × 10 4 M ⊙ /h and 1 × 10 5 M ⊙ /h respectively.Right panel shows the number of BHs occupied by FOFs and bFOFs.While ≳ 12% of FOFs contain multiple BHs (up to ∼ 30), only ∼ 1% of bFOFs contain multiple BHs.All this motivates us to use bFOFs as seeding sites (instead of FOFs) in our new stochastic seed models that would be able to represent the lowest mass (∼ 10 3 M ⊙ /h) DGBs in lower resolution simulations that cannot directly resolve them.These bFOFs are essentially sites of (proto)galaxies residing within the high-z halos.We hereafter refer to these bFOFs as "galaxies".

Figure 8 .
Figure 8. Top and bottom rows show the redshifts and the galaxy total masses (M galaxy total that includes DM, gas and stars) at which 1.25 × 10 4 M ⊙ /h and 1 × 10 5 M ⊙ /h BHs respectively assemble from 1.56 × 10 3 M ⊙ /h DGBs when the BH growth is traced along the galaxy merger tree.The 1st, 2nd and 3rd columns show different gas based seeding models with Mh = 3000 and Msfmp = 5, 50 & 150.The 4th column shows Mh = 10000 and Msfmp = 5.Solid lines show the mean trend and the shaded regions show ±1σ standard deviations.We find that for all the models, there is a transition in the slope of the mean trend at redshift z ≡ ztrans ∼ 12 − 13, which is driven by the suppression of seed formation by metal enrichment.The trends are reasonably well fit by a double power law (dashed lines).These fits are used in our stochastic seed models that directly seed the descendants (referred to as "extrapolated seed descendants or ESDs) at 1.25 × 10 4 M ⊙ /h or 1 × 10 5 M ⊙ /h within the lower resolution Lmax = 11 & 10 zooms, respectively.To obtain fits in the top row, we first assumed ztrans = 13.1 for Mh = 3000, Msfmp = 5, 50 & 150, and ztrans = 12.1 for Mh = 10000, Msfmp = 5 via a visual inspection.The fits were then performed to obtain the slopes at z < ztrans and z > ztrans using scipy.optimize.curve_fit.The final fitted parameters are shown in Table2.

Figure 9 .
Figure 9. Colored dashed lines show 1D distributions of galaxy properties in which 1.25 × 10 4 M ⊙ /h BHs assemble from 1.56 × 10 3 M ⊙ /h DGBs within GAS_BASED simulations.From left to right, the panels in each row show the total galaxy masses (M galaxy total ), stellar masses (M galaxy * ), SFRs, gas metallicities (Z), and environments (N ngb i.e. the number of neighboring halos around the galaxy as defined in Section 2.3.2).Top, middle and bottom rows correspond to different sets of gas based seed parameters: [ Mh , Msfmp = 3000, 50], [ Mh , Msfmp = 3000, 150] and [ Mh , Msfmp = 10000, 5] respectively.In each panel, the light grey lines show host properties for the 1.25 × 10 4 M ⊙ /h ESDs in the corresponding STOCHASTIC_MASS_ONLY simulation.Note that unlike the rest of the paper, here the STOCHASTIC_MASS_ONLY simulations are run at the highest resolution of Lmax = 12 for a fair comparison of their predicted galaxy baryonic properties with the GAS_BASED simulations run at the same resolution.The total galaxy masses of BH hosts in the STOCHASTIC_MASS_ONLY simulations are calibrated match the GAS_BASED simulations, but no other calibration is performed.The agreement of the distributions of baryonic properties ((M * , SFR, & Z) between the two types of simulations results naturally from matching the M galaxy total distribution.However, the STOCHASTIC_MASS_ONLY simulations do end up placing the ESDs in significantly less rich environments (smaller N ngb ) compared to what is required by the GAS_BASED simulations.

Figure 11 .
Figure 11.Impact of galaxy environment criterion on the two-point clustering and the overall counts of > 1.25 × 10 4 M ⊙ /h BHs.The dashed maroon lines show a simulation that uses the gas based seed model [ Mh , Msfmp = 3000, 150] with M DGB seed = 1.56 × 10 3 M ⊙ /h.The grey solid lines correspond to simulations that use the stochastic seed model, and directly place ESDs of mass 1.25 × 10 4 M ⊙ /h based on both the galaxy mass criterion and galaxy environment criterion.For the galaxy environment criterion, we systematically decrease p 0 and p 1 as the shade gets darker (see legend).Upper panels: The total galaxy mass (left panel) and galaxy environment (right panel) during the initial assembly of 1.25 × 10 4 M ⊙ /h BHs.Lower panels: The left three panels show the two point clustering of > 1.25 × 10 4 M ⊙ /h BHs at z = 8, 11 & 14 respectively, and the rightmost panel shows the overall number of > 1.25 × 10 4 M ⊙ /h BHs in each snapshot.We find that the STOCHASTIC_MASS_ONLY simulation (p 0 = 1 and p 1 = 1) significantly underestimates the small-scale clustering and overestimates the BH counts compared to the GAS_BASED simulations.As we introduce the galaxy environment criterion (STOCHASTIC_MASS_ENV) and decrease p 0 and p 1 to favor seeding in richer environments, we find that the small-scale clustering is enhanced and the BH counts decrease.The model with p 0 , p 1 = 0.1, 0.3 produces the best match for the small-scale clustering as well as the BH counts.

Figure 12 .
Figure 12.Here we demonstrate the ability of different Lmax = 11 stochastic seed models to represent the 1.25 × 10 4 M ⊙ /h descendants of 1.56 × 10 3 M ⊙ /h DGBs formed in Lmax = 12 gas based seed models.The leftmost two panels show the total galaxy mass and galaxy environment at the time of assembly of 1.25 × 10 4 M ⊙ /h BHs.The remaining three panels on the right show the statistics of > 1.25 × 10 4 M ⊙ /h BHs, namely the total BH counts versus redshift, the two-point clustering at z = 8, and the merger rates.The colored dashed lines show the GAS_BASED simulations wherein 1.56 × 10 3 M ⊙ /h DGBs form and eventually grow to assemble 1.25 × 10 4 M ⊙ /h BHs.The different rows correspond to different values of Msfmp and Mh (see legend).The remaining lines correspond to simulations using stochastic seed models that place ESDs directly at 1.25 × 10 4 M ⊙ /h.The thick and solid silver and black lines and histograms show the STOCHASTIC_MASS_ONLY and STOCHASTIC_MASS_ENV simulations respectively; they use the fiducial seeding parameters calibrated for each set of gas based seeding parameters listed in Table2.The thin black dashed lines in the right three panels show STOCHASTIC_MASS_ONLY simulations that assume zero scatter in the galaxy mass criterion i.e σ = 0.The thinnest black solid line in the same panels show simulations that assume a constant galaxy mass threshold fixed at the mean of the distributions from the leftmost panels (see vertical line).Amongst all the simulations that use stochastic seeding, only the STOCHASTIC_MASS_ENV simulations are able to successfully capture the GAS_BASED simulation predictions.

Figure 15 .
Figure15.Comparison of the cumulative mass functions (i.e. the number of BHs above a minimum BH mass threshold M min bh ) between the GAS_BASED (colored lines) and STOCHASTIC_MASS_ENV (black lines) simulations.The top, middle and bottom rows show z = 8, 10 and 12, respectively.The black dashed and solid lines show the STOCHASTIC_MASS_ENV predictions with and without the explicit inclusion of the contribution from the unresolved light minor mergers.Without the light minor mergers, the STOCHASTIC_MASS_ENV BH mass functions are significantly steeper than in the GAS_BASED simulations.After including the contribution from the unresolved light mergers, the STOCHASTIC_MASS_ENV simulations are able to achieve reasonable agreement with the BH mass functions predicted by the GAS_BASED simulations.

Figure B1 .
Figure B1.Star forming gas masses (M SF , top panels), star forming & metal poor gas masses (M metal poorSF

Table 2 .
Fiducial model parameters for the stochastic seed model, calibrated for each of the gas based seeding parameters.Columns 1 and 2 show the gas based seeding parameters Mh and Msfmp .For each set of Mh and Msfmp values, the remaining columns list the parameters of the stochastic seed model.Columns 3 to 7 show the parameter values used for the galaxy mass criterion, which are derived from gas based seed model predictions of the M galaxy total