Are the ultra-high-redshift galaxies at z>10 surprising in the context of standard galaxy formation models?

A substantial number of ultra-high redshift (8<z<17) galaxy candidates have been detected with JWST, posing the question: are these observational results surprising in the context of current galaxy formation models? We address this question using the well-established Santa Cruz semi-analytic models, implemented within merger trees from the new suite of cosmological N-body simulations GUREFT, which were carefully designed for ultra-high redshift studies. Using our fiducial models calibrated at z=0, we present predictions for stellar mass functions, rest-frame UV luminosity functions, and various scaling relations. We find that our (dust-free) models predict galaxy number densities at z~11 (z~13) that are an order of magnitude (a factor of ~30) lower than the observational estimates. We estimate the uncertainty in the observed number densities due to cosmic variance, and find that it leads to a fractional error of ~20-30% at z=11 (~30-80% at z=14) for a 100 sq arcmin field. We explore which processes in our models are most likely to be rate-limiting for the formation of luminous galaxies at these early epochs, considering the halo formation rate, gas cooling, star formation, and stellar feedback, and conclude that it is mainly efficient stellar-driven winds. We find that a modest boost of a factor of ~4 to the UV luminosities, which could arise from a top-heavy stellar initial mass function, would bring our current models into agreement with the observations. Adding a stochastic component to the UV luminosity can also reconcile our results with the observations.


INTRODUCTION
Understanding the physical processes that shape galaxies is one of the most important and challenging open problems in astrophysics.Over the past decades, advances in instrumentation have enabled us to push the "redshift frontier" back ever further in distance and cosmic time, discovering objects that formed in the very early Universe, shortly after the Big Bang.Before the launch of JWST, the highest redshift reported galaxy candidates were at redshifts of ∼ 9-11, corresponding to formation times about 400-500 Myr after the Big Bang (e.g.Oesch et al. 2016;Finkelstein et al. 2022a).
After many years of anticipation, the JWST (Gardner et al. 2006(Gardner et al. , 2023) ) was launched in December, 2021.With its 6.5m aperture and infrared sensitivity, JWST was designed to be capable of detecting extremely-distant, high-redshift galaxies.Almost as soon as the first NIRCam images from Early Release Observations (ERO) and Early Release Science (ERS) such as SMACS0723, GLASS, and CEERS were released, reports of galaxy candidates that broke the existing redshift record started to appear (e.g.Finkelstein et al. 2022b; Naidu ★ E-mail: aaron.yung@nasa.gov† NASA Postdoctoral Fellow et al. 2022b; Castellano et al. 2022;Atek et al. 2022;Adams et al. 2022;Donnan et al. 2022;Harikane et al. 2023).Additionally, Labbé et al. (2023) reported the discovery of several galaxies at  ∼ 8-10 with extremely large stellar mass estimates, suggesting that the formation of these galaxies' progenitors must have begun very early.
Our current state-of-the-art models of galaxy formation are set within the paradigm of the Λ Cold Dark Matter (ΛCDM) model for structure formation (Blumenthal et al. 1984).In this picture, galaxies form within dark matter halos, and gas can only cool and form stars within halos above a critical mass corresponding to the temperature where cooling becomes possible.Thus, the epoch of formation of the first stars and galaxies is largely determined by the evolution of the dark matter halo mass function, which is set by the primordial power spectrum of density fluctuations and gravitational collapse.The very first stars to form out of the dark ages are generally referred to as Pop III stars, and they are formed from metal-free gas which must rely on H 2 as a coolant.These Pop III stars are expected to start forming at around  ∼ 30-20 in 3-4  peaks of the density field, corresponding to "mini"-halos with masses of 10 5 − 10 6 M ⊙ (Haiman et al. 1997;Barkana & Loeb 2001;Bromm et al. 2002;Visbal et al. 2018).Pop III stars quickly enrich the gas in their surroundings to the point where the dominant mode of star formation becomes Pop II stars, which form in more massive "atomic cooling" halos, with masses of 10 7 − 10 8 M ⊙ (Oh & Haiman 2002).
The discovery of these ultra-high redshift galaxy candidates initially appeared surprising for several reasons.These early data are relatively shallow, implying that these candidates are fairly luminous.Yet, at the same time, the areas covered are quite small (∼ a few tens of arcmin 2 ).Additionally, the newly detected objects imply very weak evolution in the bright end of the luminosity function from  ∼ 12 to  ∼ 4 (Donnan et al. 2022;Finkelstein et al. 2023b;Harikane et al. 2023;Finkelstein et al. 2023a), while the halo mass function evolves very strongly at the massive end over this interval (e.g.Behroozi et al. 2020).Some published semi-analytic models (Dayal et al. 2014(Dayal et al. , 2019;;Yung et al. 2020b) and hydrodynamic simulations (Wilkins et al. 2022) predict many fewer luminous galaxies at  ≳ 10 than the new observations suggest (see e.g.Harikane et al. 2023, their Fig. 15).Some empirical models extrapolated from lower redshifts suffer similar issues (e.g.Behroozi et al. 2020;Mirocha & Furlanetto 2022).
One of the first questions that arose following these discoveries was whether the existence of these early galaxies violates basic constraints placed by the abundance of dark matter halos in standard ΛCDM.Labbé et al. (2023) conducted a search for high redshift galaxies with a strong Balmer break, and discovered objects at  ≳ 6 with stellar masses that seemed surprisingly high (10 10 − 10 11  ⊙ ).Based on the Labbé et al. (2023) results that were originally posted on the archive (arXiv:2207.12446v1),Boylan-Kolchin (2023) argued that these galaxies were several orders of magnitude more numerous than expected in ΛCDM, and violated constraints on the total amount of baryonic mass available in standard cosmologies.Lovell et al. (2022) performed an Extreme Value Statistics analysis on several of the available samples, again adopting the estimated stellar masses of the original Labbé et al. (2023) analysis, and found that these objects were in moderate (3) tension with standard ΛCDM.It is important to note that these studies did not account for uncertainties from field-tofield variance due to large scale structure.Additionally, following an in-flight update to the calibration of the NIRCam instrument released on July 29, 2022, the stellar mass estimates in the Labbé et al. (2023) sample were revised, leading to a removal of any fundamental tension with ΛCDM, as reflected in the revised version of Boylan-Kolchin (2023).Mason et al. (2023) showed that a "maximally optimistic" empirical model, in which star formation is 100% efficient, all gas in halos is made available to form stars, and all galaxies have young ages (10 Myr), is able to produce rest-UV luminosity functions four orders of magnitude higher than current observations, again implying that there is no fundamental tension with ΛCDM.Dust could play a role in causing the unexpectedly weak redshift evolution in the UV LF from  ∼ 15-8 (Ferrara et al. 2023).
In interpreting these results, it is important to keep in mind that there are still significant observational uncertainties on the true number densities of ultra-high redshift galaxies.These galaxy candidates have been identified through the Lyman-break selection techniques or via photometric redshifts.Some of the objects that are currently being counted as  > 10 galaxies may actually be at much lower redshifts.Observations with the NIRSpec instrument on JWST have so far suggested that a fairly high fraction of photometric candidates at  ∼ 7-13 are indeed at redshifts close to those implied by the photometric redshift analysis (Curtis-Lake et al. 2022;Arrabal Haro et al. 2023;Fujimoto et al. 2023).However, in some cases the photo-z analysis can yield a strongly bimodal probability distribution, such as the objects with one peak at at  ∼ 5 and another at  ∼ 16-18 discussed by Naidu et al. (2022b) and Zavala et al. (2023).Recently, NIRSpec spectroscopy revealed that a bright ( F277W = 26.5)object in the CEERS field that was identified as a fairly robust candidate for being at  ∼ 16.4 (Donnan et al. 2022;Naidu et al. 2022a;Harikane et al. 2023;Finkelstein et al. 2023b;Bouwens et al. 2023a) instead has a redshift of 4.9 (Arrabal Haro et al. 2023).Furthermore, our understanding of the calibration of JWST's instruments is still an evolving process, and field-to-field variance is expected to be significant for the very small areas that have been probed up until now.A further point is that stellar mass estimates based on a few photometric bands have very large uncertainties (Cullen et al. 2023).
As many of the studies mentioned above rely on estimates of the dark matter halo mass function and halo growth rates, it is worth asking how accurately these are known over the mass and redshift ranges of interest here.The answer is that up to now, the constraints on this fundamental quantity for an up to date cosmology are surprisingly poor.Mason et al. (2023) use mass functions from Reed et al. (2007), which adopts values of the cosmological parameters that are not consistent with results from the Planck satellite (Planck Collaboration 2016, 2020).Other studies (such as Boylan-Kolchin (2023) and Ferrara et al. (2023)) adopt halo mass functions from the Sheth & Tormen (1999) or Sheth & Tormen (2002) models, which can disagree with -body based results by up to an order of magnitude at high redshift.Others extrapolate fits to -body simulations that were calibrated at much lower redshifts (Lovell et al. 2022;Mirocha & Furlanetto 2022).The  > 10 UV LF predictions presented in Yung et al. (2020b) adopted  = 11 to 15 HMFs that were fitted simultaneously to both a large cosmological simulation (Bolshoi-Planck with 250 Mpc h −1 on a side; Klypin et al. 2016) and a high-resolution, smaller volume cosmological simulation (Visbal et al. 2018), and relied on merger trees based on the Extended Press-Schechter formalism, which are not well tested at these early epochs.None of these recent studies make use of dark matter halo mass functions and merger trees robustly measured from -body simulations with up to date cosmological parameters over the relevant halo mass and redshift ranges.
Semi-analytic models (SAMs) have proven an invaluable tool for forecasting observations for high to ultra-high redshift preceding the launch of JWST and for interpreting JWST results (Qin et al. 2017(Qin et al. , 2023;;Dayal et al. 2019;Yung et al. 2019a;Hutter et al. 2021).The computational efficiency of this modelling approach, in contrast to conventional numerical methods, makes it possible to simulate galaxies in the ultra-high-redshift universe that are both low mass and extremely rare, which is difficult to achieve with a single numerical hydrodynamic simulation.Furthermore, it enables the interpretation of JWST observations in terms of physical processes, which is not the case with empirical models (e.g.halo occupation distribution or sub-halo abundance matching).
In this work, we exploit a new suite of dissipationless -body simulations that were designed specifically as a backbone for semianalytic models of galaxy formation to make predictions for the ultrahigh redshift Universe.Gadget at Ultrahigh Redshift with Extra-fine Timesteps (gureft, pronounced graft), is a suite of four cosmological boxes, spanning a range of particle mass from 1.5 × 10 4 to 8.5 × 10 7 M ⊙ and volumes 5 to 90 Mpc h −1 on a side.The adopted cosmological parameters are consistent with constraints from the Planck satellite (Planck Collaboration 2016) and the ones adopted by the MultiDark simulation suite (Klypin et al. 2016).Moreover, most existing -body simulations with publicly available halo catalogues and merger trees have only a small number of snapshots saved at very early times corresponding to  ≳ 6.In gureft, many snapshots are saved between  ∼ 40 to 6, such that we ensure that snapshots are saved every tenth of the typical halo dynamical time.By grafting together the results of these different boxes, we can obtain uniquely robust constraints on the assembly histories and abundances of dark matter halos at these early times.
We couple these new merger trees with a well established model for galaxy formation, the Santa Cruz semi-analytic model (Somerville & Primack 1999;Somerville et al. 2008Somerville et al. , 2015)).It contains standard prescriptions for gas cooling, star formation, chemical enrichment, stellar feedback, and other processes.The models are calibrated to match  = 0 observations, and are not re-calibrated at other redshifts.The fiducial models have previously been shown to match observations of galaxy rest-UV luminosity functions and other observables out to  ∼ 10 ( Somerville et al. 2015;Yung et al. 2019aYung et al. ,b, 2021)).Here we present updated, much more robust predictions for the properties of the galaxy population at the new high redshift frontier  ∼ 8-17.We forward model our predicted star formation and chemical enrichment histories into the observational space of rest-UV luminosity and compare with the observations in that plane, in order to avoid having to make uncertain assumptions about stellar masses.
The structure of the paper is as follows.In Section 2.1, we present a brief overview of the physical ingredients in the Santa Cruz semianalytic models.In Section 2.2, we present a brief overview of the gureft simulation suite.In Section 3, we present our main results.We discuss the implications of our results in Section 4 and summarize and conclude in Section 5.

THE PHYSICAL MODELLING PIPELINE
In this section, we provide a concise summary of the Santa Cruz semi-analytic model for galaxy formation (Section 2.1) and briefly describe the new suite of -body simulations and dark matter halo merger trees used in this work (Section 2.2).Both the Santa Cruz SAM and gureft simulation suite adopt cosmological parameters Ω  = 0.308, Ω Λ = 0.693,  0 = 67.8km s −1 Mpc −1 ,  8 = 0.823, and   = 0.96; which are broadly consistent with the ones reported by the Planck Collaboration in 2015 (Planck Collaboration XIII 2016), and the same as those adopted by the Bolshoi-Planck simulation from the MultiDark suite (Klypin et al. 2016).

Santa Cruz semi-analytic model for galaxy formation
The semi-analytic model (SAM) developed by the Santa Cruz group is a versatile galaxy formation model that implements a suite of carefully curated physical processes to track the formation and evolution of galaxies via dark matter halo merger trees (Somerville & Primack 1999;Somerville et al. 2008Somerville et al. , 2012Somerville et al. , 2015Somerville et al. , 2021;;Popping et al. 2014).We refer the reader to these papers for a full description of the model components and Yung et al. (2022) for a schematic flowchart of the internal workflow of the model.The well-established model has been shown to reproduce the observed evolution of distribution functions of galaxy properties at 0 <  < 6 (Somerville et al. 2015) and 4 <  < 10 ( Yung et al. 2019a,b), with free parameters in the model only calibrated to a subset of observational constraints at  ∼ 0.
The Santa Cruz SAM includes a suite of standard physical processes that are also adopted by other semi-analytic models or incorporated in cosmological hydrodynamic simulations.These processes include cosmological accretion and atomic cooling, suppression of gas accretion after reionization by the meta-galactic photoionizing background, star formation and stellar-driven winds, chemical evolution, black hole feedback, and mergers.The cold gas disc is partitioned into atomic, molecular, and ionized components depending on the gas-phase metallicity of the disc using fitting functions based on numerical hydrodynamic simulations by Gnedin & Kravtsov (2011, denoted as GK).In addition, the model adopts a H 2 -based star formation recipe.In a similar spirit as the Kennecutt-Schmidt star formation (SF) relation where SFR is proportional to the surface density of cold gas (e.g.Kennicutt, Jr. 1998), we adopt a SF relation that scales with the surface density of molecular hydrogen (e.g. ), where the slope  steepens from 1 to 2 above a critical value of Σ H 2 .This steepening is motivated by observations (Sharon et al. 2013;Rawle et al. 2014;Hodge et al. 2015;Tacconi et al. 2018) as well as theory (Ostriker et al. 2010), and has been shown to be essential in order for the SAM to reproduce the observed evolution in the  > 4 UV luminosity functions, stellar mass functions and star formation rate distribution functions (Somerville et al. 2015;Yung et al. 2019a).
The model parameters are calibrated as shown in Gabrielpillai et al. (2022), such that the SAM outputs match the observed  ∼ 0 stellar mass function (Baldry et al. 2012;Bernardi et al. 2013;Moustakas et al. 2013), stellar-to-halo mass ratio (Rodríguez-Puebla et al. 2017), cold ISM gas fraction vs. stellar mass (Calette et al. 2018;Catinella et al. 2018), stellar metallicity (Gallazzi et al. 2005;Kirby et al. 2011), and  BH - bulge relation (McConnell & Ma 2013;Kormendy & Ho 2013).These physical parameters are not re-'tuned' at higher redshift, and we use identical values of the parameters in this study.Note that these parameters are slightly different from the ones used in the Yung et al.JWST forecast paper series, due to the switch from Extended Press-Schecter based merger trees to N-body based trees.
The full star formation and chemical enrichment history is convolved with publicly available results from the stellar population synthesis (SPS) model Binary Population and Spectral Synthesis (BPASS 1 ; Eldridge et al. 2017;Byrne et al. 2022) to produce a highresolution SED for each galaxy.The rest-frame UV magnitude is calculated from the SED with a tophat filter centred around 1600Å.Yung et al. (2020a) and Yung et al. (2020b) explored various SPS models and have shown that the binary star populations in BPASS yield a higher ionizing photon budget which may help reproduce the reionization history implied by various constraints derived from intergalactic medium and cosmic microwave background observations (see Robertson et al. 2013 andRobertson et al. 2015 for a thorough review on the various types of constraints and how they are derived).However, at rest 1600Å the BPASS models yield very similar predictions to single star models such as those of Bruzual & Charlot (2003), and we do not expect our predictions to be very sensitive to the assumed stellar population models.The version of the BPASS model adopted in this work assumes a fiducial broken power law stellar Initial Mass Function, with an upper slope  1 = −1.30between 0.1 -0.5 M ⊙ and a lower slope  2 = −2.35 between 0.5 -300 M ⊙ .In this work, we utilize SEDs with stellar metallicity spanning  = 10 −5 to 0.040.We note that dust attenuation is omitted in this work, as we wish to obtain the most optimistic predictions for the ultra-high- galaxy populations.In addition, we include only stellar continuum emission, and neglect nebular emission, although we will include the latter in a forthcoming work.

GUREFT: A new suite of N-body simulations and merger trees for the ultra-high redshift Universe
Gadget at Ultrahigh Redshift with Extra-Fine Timesteps (gureft, pronounced graft) is a suite of dark matter-only cosmological simulations designed to have mass and temporal resolution sufficient shown with green lines with matching line styles for redshift (Klypin et al. 2016).The line colour fades for individual HMF for the mass range that falls below 50 DM particles for the respective simulation to indicate where the HMF begins to become incomplete.We show the  = 0 HMF from the Bolshoi-Planck simulation with green diamonds.The grey lines show the HMFs adopted in previous work (up to  = 15, e.g.Yung et al. 2019aYung et al. , 2020b)), fitted to Rodríguez-Puebla et al. ( 2016) and Visbal et al. (2018).
to capture the merger history of dark matter halos at extreme redshifts (Yung et al. 2023).The merger trees extracted from this simulation suite will ultimately be grafted together to paint a holistic picture of the earliest emergence of halos that are massive enough to host galaxies, which are not available from most publicly accessible cosmological simulations due to insufficient mass resolution and sparsely spaced snapshots (see Table 1).For each simulated volume, 170 snapshots are saved between  = 40 to 6 with spacing roughly one-tenth of the typical halo dynamical time.For comparison, the Bolshoi-Planck simulation only has 30 snapshots and the IllustrisTNG simulations (Nelson et al. 2019) has 14 snapshots in the same redshift range.Halo and merger tree catalogues are generated with the rockstar (Behroozi et al. 2013a) and consistent-tree codes (Behroozi et al. 2013b), based on the virial mass definition of Bryan & Norman (1998).
Fig. 1 shows the halo mass functions (HMFs) between  ∼ 6 to 20 from the four gureft boxes (colour-coded).We also show predictions from the Bolshoi-Planck simulations at  = 6, 10, and 15 for comparison, as well as the  = 0 halo mass function to guide the eye.We note that no mass cutoff is applied in the making of these HMFs.Thus, the turnover observed on the low-mass end reflects where the halo populations become incomplete due to the mass resolution of the simulation, which is typically ∼ 100 times larger than the mass of DM particles.Tabulated HMFs constructed by combining well-resolved halos from the four gureft volumes are available in Appendix A. For more details on the gureft simulations, as well as a comparison of the halo mass functions and other properties with past work and commonly used analytic models and fitting functions, please see Yung et al. (2023).

RESULTS
In this section, we show predicted properties of the galaxy population simulated with the Santa Cruz SAM with merger trees from the suite of gureft simulations.

Distribution functions
One-point distribution functions are a frequently used summary statistic for characterizing and comparing galaxy populations from observations and simulations.In Fig. 2, we present the stellar mass functions (SMFs) for galaxies across the full mass range that we can resolve between  = 8 to 17 by combining the results from the four gureft volumes.One can see from the excellent agreement in the regions where the different boxes overlap that, as shown previously in Gabrielpillai et al. (2022), the convergence is excellent (i.e. the SAM results are insensitive to the resolution of the merger trees as long as the merger history is well resolved).These results are compared to SMFs derived from past Hubble and Spitzer observations (Song et al. 2016).The flattening or turnover seen at the low-mass end of gureft-35 and gureft-90 is caused by incompleteness due to the mass resolution limit of the underlying cosmological simulations.The gureft-05 and gureft-15 boxes are designed to resolve low-mass halos and their merger histories, and thus here the turnover reflects the halo mass limit where atomic cooling becomes inefficient (we do not include processes that can cool gas below 10 4 K).This has been previously explored using the extended Press-Schechter (EPS)based merger trees (e.g.Somerville & Kolatt 1999) and a grid of halo masses as presented in Yung et al. (2019b), which is also shown in this comparison.As previously shown, our model predictions are in very good agreement with the observationally derived stellar mass functions at  = 8.We refrain from showing stellar mass functions for higher redshifts or for any of the recent JWST based studies, as the uncertainties on these stellar mass estimates are still extremely large.
Fig. 3 presents the rest-frame UV luminosity functions (UV LFs) between  = 8 to 17 for galaxies simulated by the SC SAM.These predictions are compared to the recent JWST observations by Finkelstein et al. (2022bFinkelstein et al. ( , 2023b,a) ,a) from CEERS (Bagley et al. 2023b), Leung et al. (2023) from NGDEEP (Bagley et al. 2023a), Casey et al. (2023) from COSMOS-Web (Casey et al. 2022), Donnan et al. (2022), Harikane et al. (2023), andBouwens et al. (2023a).In addition, we include pre-JWST measurements from Bouwens et al. (2014) and Finkelstein et al. (2015), and a compilation of results from Finkelstein (2016).These include observations from lensed fields from Livermore et al. (2017) and Bouwens et al. (2022), as well as the latest results derived from past Hubble and Spitzer Space Telescope observations presented by Oesch et al. (2018), Stefanon et al. (2019), Bowler et al. (2020), andFinkelstein et al. (2022a).We show the  ∼ 17 luminosity function estimates of Bouwens et al. (2022) and Harikane et al. (2023) as open points to illustrate how these published results compare with our model predictions.However, the single galaxy from the CEERS field (first discovered by Donnan et al. 2022) that went into the Bouwens et al. (2022) estimate has now been shown to be at  ∼ 5 (Arrabal Haro et al. 2023).The Harikane et al. (2023) estimate is based on two galaxies, the same CEERS galaxy now known to be at lower redshift, and a candidate from the Stephan's Quintet field (S5-z16-1).Similar to the CEERS object, S5-z16-1 has a bimodal photometric redshift solution with one peak at  = 16.4 and another at  = 4.9.The UV luminosity implied by the  = 16.4 solution would be extremely high ( UV = −21.6).One can see that even one object of this luminosity at this redshift would be in very strong tension with our model predictions.
In Appendix A, we provide the tabulated data for the SMFs and UV LFs by combining the predicted galaxies across the four simulated volumes within the mass (and luminosity) range where the galaxy populations are complete.This approach is similar to the one adopted by Vogelsberger et al. (2020) to produce JWST predictions from the suite of IllustrisTNG simulations (e.g.Nelson et al. 2018;Pillepich et al. 2018;Nelson et al. 2019).
As also shown in previous works Yung et al. (2019b), the rest-UV luminosity functions predicted by our models are in good agreement with observations at  = 8-10.Note that the predictions shown here do not include any dust attenuation, so the small excesses in our predicted UV LFs at the bright end at  ∼ 8 and 9 could be cured by invoking a moderate amount of extinction (see Yung et al. (2019b), which did include modelling of dust attenuation).We also note that the free parameters used in this work are based on the re-calibration presented in Gabrielpillai et al. (2022).In particular, the slope for the stellar feedback-mass outflow relation (see Yung et al. (2019a) for full description) has been slightly increased from  rh = 2.8 to 3.0, resulting in higher mass outflow rates in low mass halos, and hence lower masses and luminosities for the galaxies hosted by those halos.As a result, the faint-end (low mass-end) slopes for galaxies presented in this work are slightly shallower than those presented in previous works.This is expected and is in agreement with the controlled experiment shown in Yung et al. (2019a,b).By  = 11, we see a moderate discrepancy between our predicted UV LF and the observational measurements, with the number densities predicted by our models an order of magnitude or more lower than the observational estimates.The discrepancy is about a factor of 30 at  = 13.
The number density constraints estimated for observed galaxies presented in previous figures suffer from various uncertainties, in- cluding photometric errors, redshift errors, and field-to-field variance due to Poisson sampling statistics as well as galaxy clustering (field-to-field variance).Photometric errors can cause a so-called "Eddington bias", where the much more numerous intrinsically lowluminosity galaxies are "scattered" into the higher luminosity bins, causing a flattening of the high-luminosity slope of the luminosity function).The full uncertainty budget has not been fully accounted for in the published error bars.Similarly, the simulated outputs also carry uncertainties due to cosmic variance associated with our relatively small simulated volumes.
In order to better assess the level of tension between our theoretical predictions and the observational measurements, in Fig. 4 we show a "zoomed in" version of the  = 11 and  = 13 UV LF comparisons, where we focus on the luminosity range of the detected galaxy populations.Here we have selected the luminosity range where the simulated galaxies are best captured for each gureft box and combine them.For instance, at  = 12 and 13, galaxies with  UV ≤ −18 are sourced from gureft-90, and galaxies between −18 ≤  UV ≤ −13 are from gureft-35.
Note that we do not attempt to correct for redshift uncertainties (which may shift the galaxies in redshift within these bins) or catastrophic redshift errors (which may cause galaxies that are at much lower redshifts to be counted in this ultra high-z population), as these corrections remain very uncertain and are not straightforward to model.We experimented with including photometric noise in our predicted luminosity functions, and find that it introduces a minor source of uncertainty in this comparison, so we do not show it here.
We explore the potential impact on the number density of simulated galaxies due to cosmic variance.We show cosmic variance estimates from the cosmological simulation bluetides (Feng et al. 2016;Bhowmick et al. 2020), which is 400 h −1 on a side, ∼ 88 and ∼ 1492 times larger than the box sizes of gureft-90 and gureft-35, respectively.We also show estimates from the cosmic variance calculator of Trenti & Stiavelli (2008).A detailed discussion of how we obtained these estimates for each panel, and of uncertainties associated with them, can be found in Section 4. We have chosen the CEERS survey as a representative field for these estimates.We show a representative "total error" estimate as the sum in quadrature of the quoted error from FL23 and the larger of our cosmic variance error estimates.In their  ∼ 11 bin, the uncertainty due to Poisson sampling and photometric errors combined (a fractional error of ∼0.44 at  UV = −20) is comparable to that from cosmic variance (fractional error of ∼ 0.2 − 0.3).At  ∼ 14 and  UV = −20, the quoted fractional error is 100%, and the uncertainty due to cosmic variance is ∼ 0.3 − 0.77.
In Fig. 4, we also show the impact of theoretical uncertainties in the stellar initial mass function.For our fiducial calculations, we have adopted an IMF that is similar to the standard Chabrier (2003) IMF which is typical for galaxies in the nearby Universe.However, there are multiple reasons to think that the IMF might be top heavy in the early Universe (as discussed in more detail in Section 4).It has been suggested that stellar populations could be 2-10 times brighter in the UV if the IMF is top heavy (e.g.Raiter et al. 2010).The blue shaded regions in Fig. 4 show what happens to our predictions if we simply boost the luminosity of each galaxy in our simulation by a factor of 2-10.One can see that, at  = 11-13, a fairly modest "IMF boost" of a factor of four would bring our model predictions into reasonable agreement with the observational estimates.In the left panel, in addition to the  = 11 predictions, we also show the predicted UV LFs at  = 9 and 12 with the light brown dashed line above and below, respectively, as the observational sample of F23 spans a redshift range of  = 9.5-12.The observational measurements are the same as those shown in the corresponding panels in Fig. 3.In addition, we show two estimates of the 1 uncertainty on the number density due to cosmic variance with the blue and pink error bars (see text for a full description of how these were calculated).The grey bar shows a representative total error, including the quoted error bar from the observational study combined with the estimated cosmic variance in quadrature.The light blue shaded regions and lines show the effect of "boosting" the UV luminosity of all galaxies in our model by a factor of 2-10, as indicated on the plot label.This illustrates that a fairly modest boost of a factor of ∼ 4 could bring our predictions into agreement with the observational measurements.3, and the potential effects of UV stochasticity shown with different shades of pink as labelled on the plot panels.This effect is implemented by adding a stochastic component to the predicted UV magnitudes of individual galaxies in post-processing, assuming the random component follows a Gaussian distribution centred at zero with standard deviation .We find that bursty star formation with a scatter of  ≃ 1.5-2 is required to bring our fiducial model into agreement with observations at  = 11-13.
Although our model includes merger-triggered bursts, there could be short timescale star formation stochasticity due to ISM processes that we do not resolve or attempt to represent in our models.Given there are fewer bright objects, adding stochasticity causes more galaxies to scatter into the bright end of the UVLF, in an effect similar to Eddington bias.In order to explore the effect of UV stochasticity in a simple way, we add a stochastic component to the predicted UV magnitudes in post-processing.The stochastic component is assumed to follow a Gaussian distribution centred at zero for specified values of the standard deviation  UV .We repeat the process of adding the stochastic component and computing UV LFs for 2,500 independent realizations and report the median of the outcome.In Fig. 5, we show the effect of  UV = 0.5, 1.0, 1.5, and 2.0; and compare them to the same observational constraints shown in Fig. 3.We find that a relatively large value of  ≃ 2 is required to match the observations at  ∼ 11-13. .The predicted number density of galaxies in a fixed bin of  UV (bin width Δ UV = 1) as a function of redshift.The number density of luminous galaxies is predicted to evolve more rapidly than that for lower luminosity galaxies.
In Fig. 6 we show how the composite stellar mass function and rest-UV luminosity function evolve from  = 6 to 17.These are compiled by combining all of the gureft volumes.This figure highlights the relatively rapid predicted evolution of both the SMF and UVLF.This appears to be in tension with the current observational results, which suggest weaker evolution over cosmic time at the highest redshifts.Fig. 7 shows the same quantity but sliced in a different way -this shows the number density in a bin of  UV as a function of redshift.The number density of luminous galaxies is predicted to evolve more rapidly than that of lower luminosity galaxies, as has also been predicted (and observed) at lower redshifts (sometimes called 'downsizing').

Scaling relations at ultra-high redshift
Scaling relations are conditional relationships between two physical or observable properties of galaxies, and they are important diagnostics of the physics that shapes galaxy properties.In addition to rest-frame  UV and  * , physical properties such as star formation rate (SFR) and UV spectral slope ( UV ) can be inferred from the measured multi-band photometry with SED fitting techniques (e.g.Johnson et al. 2021;Carnall et al. 2018;Iyer et al. 2019;Boquien et al. 2019).In this subsection, we compare outputs from the SC SAM to scaling relation constraints inferred by observations.In these studies, physical properties such as stellar masses are obtained through SED fitting.It is important to keep in mind that these techniques rely on a complex modelling procedure, and the estimates can have large uncertainties due to corresponding uncertainties in the underlying simple stellar population models, and assumed priors regarding star formation history, chemical evolution, dust, etc. (e.g.Papovich et al. 2011;Tacchella et al. 2022b).Furthermore, these inferred properties are subject to uncertainties in distance.
Fig. 8 shows the  * - UV relation from our predicted SAM galaxies, where individual galaxies are shown by the data points and the median and 16th and 84th percentiles are shown with the cyan line and shaded area.These results are compared to results presented by Whitler et al. (2022) derived from NIRCam imaging in the CEERS field and the flexible Bayesian SED fitting code Prospector with a built-in 'continuity' prior.In addition, we include constraints derived from past Hubble and Spitzer observations by Tacchella et al. (2022b).These quantities are also based on the Prospector tool, assuming a 'bursty' version of a continuity prior.We note that while these galaxies span a wide range of redshift (8 <  < 11 for Whitler et al. (2022) and 9 <  < 11 for Tacchella et al. (2022a)) with relatively large uncertainties in photometric redshift, we show all observed galaxies across all panels in Fig. 8 for discrete redshift snapshots at  = 8, 11, and 13 from our simulations.The median of the  * - UV relation are overlaid in the last panel.It is intriguing that the model predictions for the relationship between stellar mass and rest-UV luminosity are in good agreement with the observational estimates at  = 8 and  = 11.In the  = 13 bin, there are very few model galaxies in the luminosity range of the observations, but the  observed galaxies seem to lie along a plausible extrapolation of the simulated relation for lower luminosity galaxies.Also notable is the rather weak evolution in the  * - UV over this redshift interval from 8 <  < 13, with galaxies predicted to be slightly more luminous for a given stellar mass at higher redshift.
Fig. 9 shows a comparison between the modelled and observed  UV and rest-UV slope  UV .For the simulated galaxies, we compute  UV = −0.4(FUV −  NUV )/log( FUV / NUV ) − 2 with magnitude in far UV (FUV, centred at 1600 Å) and near UV (NUV, centred at 2300 Å) bands.These results are compared to results from JWST observations reported by Topping et al. (2022) and Cullen et al. (2023).In addition, we also show results from Hubble and Spitzer observations reported by Tacchella et al. (2022b).The median of the  UV - UV relations from all three redshift snapshots are overlaid in the last panel.Some of the observed galaxies have significantly redder values of  UV than our models predict, which could be due to the presence of dust in some of these galaxies.Additionally, our models assume a metallicity floor of 10 −3 Z sun , and the BPASS spectral synthesis models do not include Pop III type tracks or SEDs.The extremely blue slopes reported by Topping et al. (2022) are an exciting hint that we may be witnessing the formation of Pop III stars in some of these objects.The broad range of UV slopes seen in the observations suggests that the galaxy population could be a mixture of Pop II and Pop III stars at these epochs.
Fig. 10 shows the relation between  * and K UV .The factor K UV is an empirically measured conversion factor that relates the rest-frame UV luminosity to the star formation rate SFR = K UV ×   (UV), where we consider the FUV luminosity and SFRs averaged over 100 Myr for this comparison.Similar to previous figures, we show the individual galaxies with data points and the median and 84th and 16th percentiles with cyan lines and shaded regions.We compare our simulations with values reported by Madau & Dickinson (2014), K UV = 0.72 × 10 −28 M ⊙ yr −1 erg −1 s Hz and K UV = 0.88 × 10 −28 M ⊙ yr −1 erg −1 s Hz as adopted by Kennicutt, Jr. (1998).We note that these values are adjusted from a Salpeter (1955) to Chabrier (2003) IMF by multiplying by a factor of 0.63 (Madau & Dickinson 2014).Since K UV is effectively the efficiency of producing observed FUV light for a given SFR, it is expected that its value will be sensitive to star formation and chemical enrichment histories.The medians of the  * -K UV relations at  = 8 to 17 are overlaid in the last panel.As expected, galaxies in the early universe contain younger stellar populations and are more efficient at producing UV light.Similarly, Fig. 11 shows the relation between metallicity of cold ISM gas ( cold ) and  * .
Additional scaling relations with dark matter halo mass are presented in Appendix B.

DISCUSSION
In this section, we discuss the implications of recent JWST observations in the context of theory and simulations.We also discuss the caveats and uncertainties in the results presented in this work.

Dark matter halos at ultra-high redshift
Halo assembly and merger histories form the backbone of galaxy formation simulations, especially for the semi-analytic modelling approach.Cosmological simulations are subject to tension between simulated volume and mass resolution, subject to available computational resources.While extremely large volume is required to  capture overdense environments where the very rare, massive halos are found, it is also critical to have sufficient mass resolution to properly resolve halos' assembly histories.While this tension is always present in modelling galaxy formation (since we can observe a much larger dynamic range of galaxy environments than we can currently simulate), there has been less attention directed at creating a suitable suite of N-body simulations for studying structure formation in the "ultra-high redshift" Universe ( > 6) -up until now.Establishing connections between halos identified across multiple simulation snapshots and constructing the full merger histories of halos also requires high temporal resolution.However, most stateof-the-art cosmological N-body simulations store a limited number of snapshots at ultrahigh redshift (likely a thoughtful decision made to optimize storage usage), and these are insufficient to reconstruct merger histories of these halos.For these reasons, many publicly available cosmological simulations do not deliver halo merger trees that can be used by SAMs to explore predictions for galaxies in the exciting redshift frontier that has recently been opened up by JWST (10 <  < 17).We designed the gureft simulation suite specifically to overcome these barriers.In a companion paper (Yung et al. 2023), we will present updated fitting functions describing the halo mass function and other useful halo property distributions in the ultra-high redshift universe from the gureft suite along with a comparison with previous halo mass functions that have been used in the literature.
We add that the mass range probed by the higher resolution boxes, gureft-05 and gureft-15, may be probing a regime that can be affected by uncertainties in the physical nature of dark matter.In this work, we assumed 'vanilla' ΛCDM.If the dark matter has a wavelike or "fuzzy" nature, this could substantially impact the lowmass end slope of the HMF (Menci et al. 2017;Hui et al. 2017;Kulkarni & Ostriker 2022, see Hui 2021 for a review).Additionally, if the dark matter is warm or self-interacting, this could also impact the abundances of low-mass halos at early times and their internal structures and formation histories (e.g.Menci et al. 2016;Lovell et al. 2018;Adhikari et al. 2022).The nature of dark energy could also impact structure formation at early times.Models with Early or Dynamical Dark Energy can predict higher abundances of low-mass halos at early times (Menci et al. 2020;Klypin et al. 2020;Menci et al. 2022).

Comparison with other model predictions
A comprehensive comparison between our predictions and those of the large number of other published models and simulations is beyond the scope of this work, but we comment briefly here on the general theory landscape and on a few key comparison points.First, we comment on the predictions of the "Semi-analytic Forecasts for JWST" paper series (Yung et al. 2019a(Yung et al. ,b, 2020a)), which adopted the same semi-analytic model presented here with slightly different parameters, but implemented within merger trees constructed using the Extended Press-Schechter formalism.We show in Fig. 2 and Fig. 3 that there is very good agreement between the Yung et al. predictions and those of the present work from  ∼ 8-13, with the small differences on the faint end of the UVLF being largely due to the re-calibration of the stellar feedback parameters as discussed in Section 3. At  > 13, our new and more robust calculations based on our new gureft suite predict number densities that are higher by an order of magnitude or more than the EPS based calculations.We note that this has little to no impact on any of our published results or conclusions, as we deliberately focussed on redshifts  ≲ 12 specifically because we did not trust the EPS merger trees at higher redshifts.
Comparisons with other model predictions are shown in Harikane et al. (2023) for  = 12 and  = 14-16 (see their Figure 16) and FB23 (see their Figure 14) for  ∼ 8-14.The general impression from these comparisons is that our predicted number densities at  ∼ 8-10 are generally consistent (within a factor of a few) with those of large cosmological hydrodynamic simulations such as SIMBA (Wu et al. 2020), Millennium-TNG (Kannan et al. 2023), andTHESAN (Kannan et al. 2022; see also Yung et al. (2019b) for a detailed comparison of our model predictions with simulation predictions available at that time out to  = 10).The hydrodynamic simulation FLARES and the semi-analytic model DELPHI predict number densities at  < 28.5 and  ∼ 11 that are higher than ours by about an order of magnitude and a factor of 3, respectively (F23).Our predicted number densities are also very similar to those of the semi-empirical model UniverseMachine (see Behroozi et al. 2020, for a detailed comparison).In summary, the significant under-prediction of  ≳ 10 galaxies observed by JWST in physics-based models is a fairly generic result and certainly not specific to our models.

Selection robustness, photometry, redshifts, completeness
The recent observations with JWST are providing confirmation of the measurements with Spitzer and HST probing out to  ∼ 9-10, and extending the detection of galaxies into a new redshift frontier at  ∼ 10-17.It is encouraging that the new measurements of UV LF at  ∼ 8-10 with JWST are highly consistent with the previously published estimates.Although the detection of galaxy candidates at "ultra-high' redshifts  ≳ 10 with JWST is very exciting, it is important to keep a few caveats in mind.First, many of these objects are candidates that have been selected via a photometric redshift estimation procedure.Some studies first apply a "Lyman break" selection criterion (a colour-colour selection), and then compute photometric redshifts only for the colour selected high redshift candidates.Other studies (e.g.FB23 and FL23) compute photometric redshifts for all of the galaxies in the sample.It is not uncommon for galaxies to have a broad or bimodal redshift posterior.In the published studies, galaxies are assigned to the redshift bin corresponding to the "best fit" redshift.However, the estimates of rest-frame UV luminosity are subject to the distance uncertainties inherent in the redshift uncertainties.Strong emission line features are another source of uncertainty.The presence of strong emission lines in galaxy spectra can affect the redshift estimates (e.g.Finkelstein et al. 2022b) and are not always available in SED fitting templates (e.g.Larson et al. 2022).Some of the "ultra high-z" candidates may turn out to be at much lower redshifts.For example, multiple studies (Naidu et al. 2022b;Donnan et al. 2022;Finkelstein et al. 2023b;Bouwens et al. 2023b) presented a candidate in the CEERS field with a formal best-fit redshift of 16.4, which has now been confirmed via NIRSpec spectroscopy to instead be at  = 4.9.Harikane et al. (2023) have also reported a very luminous object with a redshift estimate of 16.41 in the Stephan's Quintet field.Follow-up spectroscopy and millimeter interferometry will gradually confirm the true redshifts of this and other such objects that may be discovered in the near future through JWST imaging.
It is also important to keep in mind that the areas and volumes of the surveyed fields to date are relatively small, as are the numbers of objects that go into luminosity function estimates.The updated CEERS  ∼ 11 sample presented in FL23 contains 27 galaxies.The  ∼ 13 sample is even smaller.There are three (seven) candidates in the 13 <  < 15 (12 <  < 15) redshift range in the full CEERS field reported in the analysis of FL23.B23a report 4 total (2 robust) candidates in the 12 <  < 14 redshift range in the SMACS0723 field and 3 total (2 robust) candidates in the GLASS field.Both of these fields are lensing cluster fields, and although B23a assume that no magnification correction is needed, it is curious that the angular number density in these fields is almost an order of magnitude higher than that in the much larger (and presumably unlensed) CEERS field (see Table 2).This hints that there may be unaccounted for magnification or other systematic effects impacting the LF estimates from these fields.FB23, FL23, and B23a provide extensive discussions of the overlap between high redshift galaxy candidates selected by different studies to date.B23a classify the fraction of candidates in different redshift and luminosity bins that are "robust", "solid", and "possible", finding that the estimated number density at  = 13 differs by 1.5 orders of magnitude between the "solid" and the "robust" samples.FL23 show a comparison of the photometric redshift estimates from different studies, which overall is fairly encouraging, but only includes objects in the CEERS field up to  ∼ 11.5.FL23 additionally report that the estimated rest-frame UV magnitudes for the same object can have significant scatter from one study to another (by typically a few tenths of a magnitude, but as much as a magnitude), and some studies can show a systematic offset in the estimated magnitudes by up to 0.3 magnitudes.All of the published luminosity functions have been corrected for completeness, and completeness corrections can also vary from one study to another even for the same raw observational data.However, the published results are typically presented only where the completeness is better than ∼ 50%, so this is not expected to be a dominant source of error at present.
There are other observational studies in the literature presenting ultra-high redshift galaxy candidates and luminosity function measurements from early JWST observations (e.g.Rojas-Ruiz et al. 2020;Atek et al. 2022;Adams et al. 2022;Bagley et al. 2022;Naidu et al. 2022b;Castellano et al. 2023;Bouwens et al. 2023b;Labbé et al. 2023;Harikane et al. 2022).We omit these observations from our comparison, as they are typically based on smaller fields and in some cases were published before the updated instrument calibration.The FB23, FL23, and B23a studies provide a detailed discussion of the consistency of their measurements with these earlier works.
While we neglect dust throughout this work, in a scenario where dust enrichment happened quickly, the intrinsic UV luminosities will need to be even more boosted that shown in Fig. 4 to compensate for the attenuation effect.However, FL23 and Morales et al. (2023) show that the colours remain fairly blue throughout the  ∼ 9 to 15, disfavouring the scenarios raised by the Ferrara et al. (2023) and Mirocha & Furlanetto (2022) models.

Cosmic Variance
The fields that have been observed by JWST to date are relatively small, and therefore it is expected that field-to-field variation in num-Table 3. Cosmic variance estimates for representative magnitude and redshift bins.The area is given in arcmin 2 .  is the (completeness corrected) number of galaxies in the sample. ave,TS08 is the average bias output by the TS08 calculator.mag is the magnitude bin used in the B20 calculator. cv,TS08 is the fractional root variance estimated using the TS08 method, and  cv,B20 is the same quantity but estimated using the B20 method.ber density due to large scale structure (galaxy clustering) could be a significant (perhaps even dominant) source of uncertainty.In this work we follow a fairly common convention in separating the variance in the counts along different lines of sight into that due to Poisson point sampling and that due to galaxy clustering.We refer to the latter component alone as "cosmic variance".As is generally the case when a new discovery space for galaxy populations is opened up, our estimates of the magnitude of the cosmic variance for these populations is extremely uncertain, because we do not know how strongly these objects are clustered on larger scales.Due to these uncertainties, in this work, we used two different bracketing approaches to estimate the cosmic variance for the galaxy populations with  UV ≃ −20 at  ∼ 11 and  ∼ 14.
The first method relies on abundance matching.Here, we assume that each halo above a critical mass is occupied by one galaxy, and we ask what is the minimum halo mass that yield the observed galaxy number density at a given redshift.The clustering of these halos relative to the underlying dark matter density field provides an estimate of the "bias"  for the observed galaxies, which then allows us to compute the cosmic variance  cv for a given field area, geometry, and redshift extent ( 2 cv =  2  2 DM , where  DM is the variance of the dark matter density field) .We use the online calculator of Trenti & Stiavelli (2008, hereafter TS08) for these calculations3 .We note that this tool has the limitation that the adopted cosmology is slightly different from the Planck-compatible cosmology that we adopt in the rest of this work, and that the estimates of halo number density and bias rely on the Sheth & Tormen (1999) approximation, which is known to be inaccurate at these extreme redshifts (Yung et al. 2023).
The second method is based on hydrodynamic simulations of galaxy formation.Specifically, we make use of the COS-MIC_VARIANCE_AT_HIGHZ calculator of Bhowmick et al. (2020, B20), which is based on the predictions of the bluetides cosmological hydrodynamic simulations4 .However, these predictions rely on bluetides correctly predicting the clustering of the ultra-high-z galaxy population.Given that the UV luminosity functions predicted by bluetides at  ≳ 11 are about an order of magnitude lower than the nominal observational measurements, it is not clear how reliable these predictions are.However, the TS08 abundance matching approach and the B20 simulation based approach should provide a reasonable bracket on the cosmic variance estimates.
We provide estimates of the cosmic variance  cv using these two methods for representative redshifts ( = 11 and 14) and magnitudes in Table 3.We note that  cv is the root fractional variance as defined in Eqn. 1 of Somerville et al. (2004).As an illustrative example, we adopt the CEERS field with the completeness corrected numbers of galaxies in redshift bins 9.7 <  < 13 and 13 <  < 15 from FL23.
The uncertainty due to cosmic variance for smaller fields such as SMACS0723 (9.5 arcmin 2 ) and GLASS (7.0 arcmin 2 ) are somewhat larger (e.g. cv = 0.29 at 9 <  < 11 and  cv = 0.36 for 12 <  < 14 using the TS08 method).For simplicity, we assume that the footprints of the fields on the sky are square (i.e. have an aspect ratio of unity).While elongated fields with aspect ratios much smaller than unity do have reduced cosmic variance relative to a square field of the same area (Moster et al. 2011), we have verified that for the field geometries investigated here, the cosmic variance is reduced by only 3% for the  = 11 field and 1.4% for the  = 14 field if we adopt the actual aspect ratio.
As anticipated, the  cv estimates from the B20 method are significantly larger than those from TS08.This is because the implied host halo masses and therefore halo and galaxy biases in these two pictures are quite different.The average biases in the abundance matching picture, as obtained from the TS08 calculator, are given in Table 3 and are  ≃ 12-15.As we noted, B20 predicts a lower galaxy number density at these redshifts than the observations (but their predicted UV LFs at  = 11 and  = 13 − 14 are quite consistent with our SAM predictions).Taking the observational LF estimates at face value, the different number densities could be reconciled either by assuming that the galaxies occupy the same mass halos but are brighter (as in our boosted "top heavy IMF" model), or by assuming that galaxies of a given luminosity actually occupy lower mass halos, which are more abundant.The abundance matching approach effectively puts galaxies in the "right" mass halos to reproduce the observed number density.These are lower in mass, and less clustered, than what bluetides predicts.

What have we learned about the physics of galaxy formation?
With robust halo catalogs and merger trees spanning halos over a mass range of ∼ 10 5.5 -10 12.5 from  ∼ 20-6, we are able to compute predictions for the properties of ultra-high redshift galaxies over an unprecedented dynamic range.Our predictions are based on the well-established Santa Cruz semi-analytic models, which adopt a very standard set of physical processes, including atomic cooling, partitioning of galaxies' ISM into multiple phases, star formation and stellar feedback, chemical evolution, and black hole seeding, growth, and feedback.Although these models contain simplified representations of these physical processes, we have shown in the past that the predictions for many key galaxy properties are remarkably similar to those from more detailed (and much more computationally expensive) numerical hydrodynamic simulations (Yung et al. 2019a;Gabrielpillai et al. 2022Gabrielpillai et al. , 2023)).One of the puzzles of galaxy formation manifested in observations of nearby galaxies is why star formation is so inefficient and why galaxies and halos capture such a small fraction of available baryons (e.g.Moster et al. 2010;Behroozi et al. 2013c).The standard solution to this puzzle is that heating and winds from massive stars and supernovae eject baryons and make star formation inefficient in low mass halos, and similar processes associated with accreting black holes eject gas and prevent it from cooling in massive halos (e.g.Somerville & Davé 2015).Our models, like all large volume cosmological simulations, contain parameterized phenomenological treatments of these key "feedback" processes, as it is infeasible to simulate the full range of scales and physical processes.These parameters have been tuned by hand to reproduce key global observable (or semi-observable) properties of galaxies in the nearby Universe, such as their stellar-to-halo mass relation, cold gas fractions, stellar metallicities, etc.There is absolutely no guarantee that the parame-terizations of key physical processes such as stellar feedback scale correctly in terms of the physical properties that are actually relevant.Therefore it is actually a bit surprising that the models perform as well as they do out to such high redshifts ( ∼ 10).In this section, we ask what we learn from the apparently growing tension between the model predictions and the observational estimates at  ≳ 10.
We note that we have verified in the past that under our current implementation, black hole feedback has a negligible impact on the galaxy populations at these very high redshifts, so the main processes shaping galaxy properties in our models are the rate of gas accretion and cooling, the efficiency with which cold gas can form stars, and the efficiency of stellar driven outflows, which eject gas from the ISM and suppress future star formation.Taking the observational estimates at face value for purpose of this discussion, we consider four hypotheses for why our models might be predicting lower number densities of galaxies than the observations suggest: (i) The standard ΛCDM model does not predict early enough formation of halos massive enough to host the observed galaxies.
(ii) Cooling is inefficient (iii) Star formation is inefficient (iv) Stellar feedback is too strong Hypothesis i: Not Enough Early Halo Growth: Several other studies have already shown that the rate of halo growth and the number density of halos massive enough to host the observed galaxies is sufficient in the standard ΛCDM cosmology with Planck-compatible parameters.Mason et al. (2023) showed that a model with "maximally" efficient star formation efficiency (all gas that accretes into halos forms stars with 100% efficiency) can produce predicted UV LF that are consistent with or higher than the observational estimates up to  ∼ 14.Similarly, F23 showed that the Behroozi & Silk (2015) model, which is based on a similar ansatz of star formation tracing halo accretion rates, can match the CEERS observed number densities out to  = 13.In agreement with other studies, we therefore conclude that although it is still of course possible that dark matter and/or dark energy behave differently than assumed in the "vanilla" ΛCDM model, which would result in modified evolution of the formation rate of dark matter structures, this is not necessary in order to accommodate the existing observational measurements up to  ∼ 14.

Hypothesis ii: Inefficient Cooling:
The second hypothesis is that enough halos have collapsed and accreted baryons into their circumgalactic medium (CGM), but perhaps for some reason cooling from the CGM into the ISM is inefficient, so that not enough cold gas accumulates in the ISM to fuel adequate star formation.We examine this possibility in Fig. 12, which is shown for  = 11.We see that in our models, the gas is cooling into the ISM at a rate similar to the accretion rate into the halo, so cooling is very efficient.The most common halos at these epochs have virial temperatures that place them near the peak of the cooling curve, leading to cooling times that are short compared to their dynamical times.Thus this is likely to be a fairly robust conclusion, in spite of our rather simplified treatment of cooling.
Hypothesis iii: Star formation is inefficient: Third, we consider the possibility that gas has cooled and accreted efficiently into the ISM, but perhaps for some reason galaxies are inefficient at making stars out of that gas in our models.For example, in our model we assume that star formation can only take place in gas that has been labelled as "molecular" by our gas phase partitioning scheme.We have adopted a partitioning scheme based on results presented by Hypothesis iv: Stellar Feedback is too strong: Finally, we examine the impact of stellar feedback our predictions at  ∼ 11.In our models, a mass outflow rate from the ISM is computed based on a parameterized mass loading factor.The mass loading factor is assumed to be only a function of the halo maximum circular velocity  max and two free parameters, a normalization and a slope.These parameters are fixed to match  = 0 observations (primarily the stellar mass function) and are not adjusted for the high redshift predictions.From Fig. 12 (right panel), we can see that the predicted mass loading factors range from about 2 in the highest mass halos in our gureft-90 box at  = 11, to several hundred in the lower mass halos.This implies that for every solar mass per year of star formation, up to several hundred times that amount of gas is ejected from the galaxy and possibly from the halo.It seems clear that lower mass loadings for stellar driven winds would lead to more rapid build-up of stars and luminous galaxies at ultra-high redshifts.Thus we conclude that efficient stellar feedback is the primary model ingredient that is responsible for the shortfall between the predicted and observed number density of luminous ( UV ∼ −19 to −20) at  ≳ 11 in our models.Many other models and cosmological simulations similarly adopt efficient stellar and supernova driven winds to achieve the low observed stellar-to-halo mass ratios in low-mass halos in the nearby Universe.Dekel et al. (2023) propose a promising explanation for why stellar feedback might be ineffective in the very early Universe, when the free-fall time is shorter than the time needed for massive stars to produce winds and supernovae.We highlight this process as one that is crucial to investigate in more detail in future work.

Theoretical and modelling uncertainties
In the above sub-section, we attempted to quantify the roles of different physical processes within the framework of our existing models.
However, there are several other theoretical uncertainties that we have not touched on, which we summarize here.We note that the metallicity of the circumgalactic medium impacts how rapidly it cools, so chemical yields and the efficiency with which metals are ejected from, and circulate back into, the CGM will also impact the predicted cooling rates.Similarly, the chemical .This shows the gas depletion time, which is given by the gas mass divided by the SFR, for molecular gas (orange) and for the total cold ISM gas (atomic, molecular, and ionized; turquoise) at  = 11.The horizontal grey line shows the corresponding age of the Universe.Our models actually predict that the majority of galaxies at  ∼ 11, especially the more massive ones, have depletion times that are much shorter than the age of the Universe at this epoch.
enrichment of the ISM and the formation of dust (not modelled here self-consistently) will have a strong impact on the modelling of star formation.We emphasize that the star formation prescription we have adopted was based on radiation-hydrodynamic simulations that are more than a decade old at this point, and on local universe conditions.In reality, H 2 may simply trace the dense gas where star formation is efficient rather than being a pre-condition for star formation (e.g.Krumholz et al. 2011).At gas metallicities less than about 0.1 Z ⊙ , additional drivers of the thermal state of the ISM become important, including reduced photoelectric heating, cosmic ray ionization heating, and H 2 cooling (Bialy & Sternberg 2019).
A related issue is the stochasticity, or burstyness, of star formation.
If the process of star formation at early times is highly stochastic, then galaxies could experience short periods of time when their UV luminosities are boosted due to starbursts.In fact, there is mounting observational evidence that ultra-high redshift galaxies did have extremely bursty star formation histories (e.g.Dressler et al. 2023;Endsley et al. 2023a;Looser et al. 2023).Several modelling studies have shown that very bursty star formation could help to increase the number of luminous objects that would be observable at very high redshift and reduce the tension between model predictions and ultrahigh redshift observations (Mason et al. 2023;Shen et al. 2023;Sun et al. 2023a,b).We also find that adding a stochastic burst component in post-processing appears to be able to reconcile our fiducial model predictions with observations out to  ∼ 13.However, in agreement with other studies (e.g.Shen et al. 2023), we find that a rather large stochastic component ( UV ∼ 2, where  UV is the root variance of a Gaussian random deviate in UV magnitude) is needed.This is much larger than the stochasticity produced in the high-resolution radiation-hydrodynamic cosmological simulations of Pallottini & Ferrara (2023), which yield typical  UV ∼ 0.6.We are therefore inclined to think that stochasticity alone is not responsible for resolving the discrepancy.However, weaker feedback, higher star formation efficiency, and bursty star formation are all expected to be consequences of the same physical changes in the ISM in ultra-high redshift galaxies.In future work, we plan to attempt to model these effects self-consistently in a physically motivated manner.
Another important process is the suppression of accretion and cooling by the meta-galactic photoionizing background after the Universe is reionized.We adopt a highly simplified treatment of photoionization feedback in our models, in which we assume that the Universe was reionized everywhere instantaneously at  = 8.After reionization, the baryon fraction that accretes into low mass halos is reduced, according to the the results of Okamoto et al. (2008).However, in the real Universe, early forming, luminous objects in dense environments will create bubbles which will start to overlap and form larger reionized regions.We do not properly model the complex interplay between photo-ionization feedback from both local and meta-galactic radiation fields and stellar feedback.This will be increasingly important to model accurately as JWST begins to push to lower luminosity galaxies.We also note here that Harikane et al. (2023) propose that the high observed number density of ultra-high redshift galaxies could be explained by the lack of suppression of star formation via photo-ionization before reionization.Our models do not include any suppression of star formation by the UV background at  > 8, yet they still underpredict the observed counts -therefore we suggest that this cannot be a complete explanation.
Another open question is how dust forms and is destroyed in the ISM at these early epochs.As noted above, dust is important for the chemistry and cooling of the ISM, and also impacts the observed SED of galaxies through attenuation and re-emission.Models and simulations have been presented that more self-consistently track the formation and destruction of dust through multiple channels (e.g.Popping et al. 2017;Vogelsberger et al. 2020;Dayal et al. 2022), but this physics is not incorporated into our current models.This will be an important ingredient to include as we update the realism of our modelling of star formation and the ISM in future work.
Another possibility is that some of the UV light in some of the ultra-high-z galaxies could be contributed by an accreting supermassive black hole (AGN), which would provide a luminosity boost (Pacucci et al. 2022).Our current models include black hole seeding and accretion, and do not predict a significant contribution from AGN at these redshifts, but this is subject to the details of our seeding and accretion modelling.Volonteri et al. (2023) investigated this possibility, finding that a massive black hole seed (such as might form through the "direct collapse" mechanism) growing at close to the Eddington rate could be detectable with JWST, and would be difficult to disentangle from star formation powered galaxies via the available JWST colours.Candidate AGN at ultra-high redshift have been reported from early JWST observations (e.g.Ono et al. 2023).However, in current samples, the majority of the detected objects have extended (rather than point source) morphologies, suggesting that an AGN is not the dominant source of luminosity (e.g.Harikane et al. 2023).Future JWST observations will help to clarify the contribution of AGN to the detected ultra-high-z population, and place constraints on models of black hole seeding and accretion.
Perhaps one of the largest uncertainties at ultra high redshifts is the properties of the stellar populations at these epochs.There is a large literature on the first stars (Pop III) which formed out of primordial gas and how their properties may have differed from present day stellar populations (e.g.Bromm et al. 1999;Abel et al. 2000;Zackrisson et al. 2011;Visbal et al. 2018).Although there is general agreement that the IMF was likely to have been more top-heavy in the early Universe, due both to the higher CMB temperature and lower metallicity, there is no detailed consensus on the expected Initial Mass Function (IMF) of Pop III stars, and even less on how the IMF may have evolved as the Universe is gradually polluted by metals (e.g.Pop III.2 stars).Clearly this will have a potentially profound impact on the galaxy SED, the chemical enrichment, and the stellar feedback -none of the existing cosmological simulations that have been compared with the JWST observations self-consistently incorporate these effects.Harikane et al. (2023) suggest that adopting a top-heavy IMF in early galaxies could help reconcile the theoretical predictions with observed number densities.They show that Pop III.1 and Pop III.2 populations with top-heavy IMFs modelled with the stellar population code YGGDRASIL (Zackrisson et al. 2011) can produce 3-4 times more UV photons for a given SFR than standard assumptions, in part due to the strong emission lines excited by massive stars.Raiter et al. (2010) similarly show the expected change in the ratio of UV luminosity to SFR for several IMF variants that might arise in the conditions typical of the high redshift Universe, also finding an expected increase of a factor of 2-10.This possibil-ity has also been discussed by FB23, who showed using abundance matching that a factor of ∼ 2 boost in the UV luminosities could reconcile the differences with models.We showed a simple representation of how a top-heavy IMF would affect our model predictions in Fig. 4, by simply boosting the luminosity of each galaxy in our models by a factor of 2-10.A boost of a factor of four brought our predicted number densities into good agreement with the observations.Clearly, modelling the formation of Pop III stellar populations and the transition through Pop III.2 to Pop II, and self-consistently incorporating appropriate stellar population models, is an important next step in modelling the ultra-high redshift Universe.Some early studies have found extremely blue UV slopes in ultra-high redshift galaxies, which could be an indication of the presence of extremely metal poor or even Pop III stars (Topping et al. 2022), although we note that these can also be indicative of high escape fraction (e.g.Endsley et al. 2023b;Jung et al. 2023;Mascia et al. 2023a,b).Emission line diagnostics from future JWST spectroscopic observations will be able to shed light on the physical nature of the UV radiation sources (Cleri et al. 2023).

SUMMARY AND CONCLUSIONS
In this work, we presented predictions for galaxy populations at  = 8-17 from the Santa Cruz semi-analytic model of galaxy formation, implemented within dark matter halo merger trees extracted from a new suite of N-body simulations called gureft.gureft is a suite of four cosmological volumes that can resolve halos over a broad range of mass (5 ≲ log  h /M ⊙ ≲ 12).Furthermore, unlike many existing N-body simulations, gureft stores finely spaced snapshots over the redshift range 40 <  < 6 to ensure adequate temporal sampling of merger histories for galaxies out to ultra-high redshift.The resulting predictions have unprecedented dynamic range over the exciting  ≳ 10 redshift frontier that has recently begun to be probed by JWST.
We presented predictions for the stellar mass function for  * (3 ≲ log  * /M ⊙ ≲ 10) and for the intrinsic (dust-free) rest-frame UV luminosity function for −9 ≳ log  UV ≳ −24 between  = 8 to 17.We also presented predictions for the scaling relations between stellar mass  * , rest-frame UV luminosity  UV , UV slope  UV , and gas phase metallicity  gas at  = 8, 11, and 13.In an appendix, we presented predictions for the relationship between stellar mass and rest-UV luminosity and halo mass at these same redshifts.These model outputs are compared to recent JWST and selected past Hubble and Spitzer observations.We also presented a detailed discussion of the observational uncertainties, including the potential impact of cosmic variance.We reach the following main conclusions: • We find excellent agreement between our predicted UV LF and observational estimates at  ∼ 8-10.At  ∼ 11, our predicted UVLF is about a factor of 10-15 lower than observational estimates at  UV ≃ −20.At  = 13, our model predictions at  UV ≃ −20 are about a factor of 30 lower than the observational estimates.
• We estimate representative uncertainties in the observed number counts at  = 11 and  = 14 due to cosmic variance from large scale structure.We find that this can be a significant source of uncertainty, and that the estimates themselves are highly uncertain, due to our lack of knowledge about the host halo masses of the observed galaxies and their clustering.
• We find that a modest "boost" in the ratio between UV luminosity and stellar mass of about a factor of four would bring our model predictions into reasonable agreement with the existing ob-  The cyan line marks the median and shaded area marks the 16th and 84th percentiles for galaxies in bins of  h .The last panel summarizes the evolution of  UV - h relation.

Figure 2 .
Figure 2. Predicted high-to ultrahigh-redshift stellar mass functions from the Santa Cruz SAM using gureft merger trees.The stellar mass functions based on merger trees extracted from the four different resolution gureft boxes are shown with different colours, as indicated on the figure label.Results are compared to the estimated stellar mass function based on HST observations reported by Song et al. (2016) at  ≲ 8, and previously published models from Yung et al. (2020b, cyan dashed lines).The grey dashed horizontal line marks where one object is expected for a survey area similar to that of the full CEERS area (∼ 100 arcmin 2 ) with Δ = 1 centred around the redshift labelled for each panel.

Figure 3 .
Figure 3. Predicted high-to ultrahigh-redshift rest-frame UV luminosity functions from the Santa Cruz SAM using gureft merger trees.The UVLF based on merger trees extracted from the four different resolution gureft boxes are shown with different colours, as indicated on the figure label.These results are compared to a compilation of observational measurements at high redshift (see text) and ultra-high redshift reported by Donnan et al. (2022, D22), Harikane et al. (2023, H23), Finkelstein et al. (2023b, CEERS Epoch 1 only; hereafter FB23), Finkelstein et al. (2023a, full CEERS field; hereafter FL23), Finkelstein et al. (2022b, F22b), Bouwens et al. (2023a, B23a), Pérez-González et al. (2023, PG23), Leung et al. (2023, L23), and Casey et al. (2023, C23) as well as previous predictions from Yung et al. (2019a,  ≲ 10) and Yung et al. (2020b, 11 ≲  ≲ 15).The grey dashed horizontal line marks where one object is expected for a survey area similar to that of the full CEERS area (∼ 100 arcmin 2 ) with Δ = 1 centred around the redshift labelled for each panel.Measurements derived from past Hubble and Spitzer Space Telescopes based studies are shown in grey.See text for more details on the observational studies included in this comparison.

Figure 4 .
Figure 4.A more detailed look at the UV LFs at  = 11 (left) and  = 13 (right).The combined outputs across the four gureft boxes are shown by the brown line.In the left panel, in addition to the  = 11 predictions, we also show the predicted UV LFs at  = 9 and 12 with the light brown dashed line above and below, respectively, as the observational sample of F23 spans a redshift range of  = 9.5-12.The observational measurements are the same as those shown in the corresponding panels in Fig.3.In addition, we show two estimates of the 1 uncertainty on the number density due to cosmic variance with the blue and pink error bars (see text for a full description of how these were calculated).The grey bar shows a representative total error, including the quoted error bar from the observational study combined with the estimated cosmic variance in quadrature.The light blue shaded regions and lines show the effect of "boosting" the UV luminosity of all galaxies in our model by a factor of 2-10, as indicated on the plot label.This illustrates that a fairly modest boost of a factor of ∼ 4 could bring our predictions into agreement with the observational measurements.

Figure 5 .
Figure5.A more detailed look at the UV LFs at  = 10 (left),  = 11 (middle), and  = 13 (right), where the fiducial predictions from this work made with the gureft merger trees are shown in brown, observational compilations as shown in the corresponding panels in Fig.3, and the potential effects of UV stochasticity shown with different shades of pink as labelled on the plot panels.This effect is implemented by adding a stochastic component to the predicted UV magnitudes of individual galaxies in post-processing, assuming the random component follows a Gaussian distribution centred at zero with standard deviation .We find that bursty star formation with a scatter of  ≃ 1.5-2 is required to bring our fiducial model into agreement with observations at  = 11-13.

Figure 6 .]
Figure 6.The evolution of the stellar mass function (left) and dust-free rest-frame UV luminosity function (right) over the redshift interval  = 6 to  = 17, using the combined gureft volumes.Our models predict relatively strong evolution in both quantities.

Figure 8 .
Figure 8. Scaling relation for stellar mass vs. rest-UV magnitude ( * - UV relation) for simulated galaxies at  = 8, 11, and 13.The small data points show SAM galaxies from the gureft boxes.The cyan line marks the median and the shaded area marks the 16th and 84th percentiles for galaxies in bins of  * .These results are compared with stellar mass estimates from SED fitting based on JWST observations from Whitler et al. (2022, red) and Santini et al. (2023, green), and Hubble and Spitzer observations from Tacchella et al. (2022b, grey).The last panel summarizes the evolution of  * - UV relation.

Figure 9 .
Figure9.Rest-UV magnitude ( UV ) vs. rest-UV spectral slope ( UV ) relation for simulated galaxies at  = 8, 11, and 13.The small data points show SAM galaxies from the gureft boxes.The cyan line marks the median and the shaded area marks the 16th and 84th percentiles for galaxies in bins of  UV .These results are compared with measurements from JWST observations fromCullen et al. (2023, blue)  andTopping et al. (2022, red), and Hubble and Spitzer observations fromTacchella et al. (2022b, grey).The last panel summarizes the evolution of the  UV - UV relation.

Figure 10 .
Figure10.The ratio of star formation rate to rest-UV luminosity, K UV , as a function of stellar mass for simulated galaxies at  = 8, 11, and 13.The small data points show SAM galaxies from the gureft boxes.The cyan line marks the median and the shaded area marks the 16th and 84th percentiles for galaxies in bins of  * .These results are compared with nominal values adopted by past studies Kennicutt, Jr. (1998, grey dot-dashed line) andMadau  & Dickinson (2014, grey dotted line).The last panel summarizes the evolution of this relation from  = 8 to 17.

Figure 11 .
Figure11.Metallicity of cold ISM gas as a function of stellar mass for simulated galaxies at  = 8, 11, and 13.The small data points show SAM galaxies from the gureft boxes.The cyan line marks the median and the shaded area marks the 16th and 84th percentiles for galaxies in bins of  * .The last panel summarizes the evolution of this relation from  = 8 to 17.

Figure 12 .
Figure 12.Diagnostics of several key physical processes in the Santa Cruz SAM at  = 11.The green line in the leftmost panel shows the maximum accretion rate of baryons into dark matter halos,  b  halo , where   is the universal baryon fraction and  halo is the total halo mass accretion rate averaged over 1 Gyr.The blue points show the cooling rate of gas in the Santa Cruz SAM, illustrating that cooling is very efficient.The middle panel shows the SFR (blue) and the mass outflow rate from the ISM (brown), illustrating that the mass loading factor in the SAM is quite high, such that a gas is ejected from the ISM at a rate of 2 to 200 times the SFR.

Figure B3 .
Figure B3.Rest UV magnitude vs. halo mass ( UV - h ) relation at  = 8, 11, and 13.The data points show SAM galaxies from the gureft boxes.The cyan line marks the median and shaded area marks the 16th and 84th percentiles for galaxies in bins of  h .The last panel summarizes the evolution of  UV - h relation.

Table 1 .
A summary of ultra-high-redshift resources available in current stateof-the-art cosmological simulations, including the highest redshift where snapshots are stored, the number of snapshots available at  ≳ 6, and the dark matter particle mass.

Table 2 .
Observed angular number density (number per arcmin 2 on the sky) of galaxies per field at 12 <  < 14.