The signature of galaxy formation models in the power spectrum of the hydrogen 21cm line during reionization

Observations of the 21cm line of neutral hydrogen are poised to revolutionize our knowledge of cosmic reionization and the high-redshift population of galaxies. However, harnessing such information requires robust and comprehensive theoretical modeling. We study the non-linear effects of hydrodynamics and astrophysical feedback processes, including stellar and AGN feedback, on the 21cm signal by post-processing three existing cosmological hydrodynamical simulations of galaxy formation: Illustris, IllustrisTNG, and Eagle. Overall and despite their different underlying galaxy-formation models, the three simulations return similar predictions for the global 21cm rightness temperature and its power spectrum. At fixed redshift, most differences are attributable to differences in the history of reionization, in turn driven by differences in the build-up of stellar sources of radiation. However, the impact of astrophysics is imprinted in the 21cm power spectrum through several unique signatures. First, we find significant small scale ($k \geq 10\, \rm {Mpc}^{-1}$) differences between Illustris and IllustrisTNG, where higher velocity winds generated by supernova feedback soften density peaks and lead to lower 21cm power in TNG. Second, we find more 21cm power at intermediate scales ($k \approx 0.8\, \rm {Mpc}^{-1}$) in Eagle, due to differences in ionization driven by highly effective stellar feedback, leading to lower star formation, older and redder stellar populations, and thus lower ionizing luminosities. Though subtle, these features could allow future observations of the 21cm signal, in conjunction with other reionization observables, to constrain theoretical models for galactic feedback at high redshift.


INTRODUCTION
Hydrogen in the intergalactic medium (IGM) was progressively ionized by light from the first galaxies during the epoch of reionization (Barkana & Loeb 2007).This process began a few 100 Myr after the Big Bang (roughly  ≈ 15 − 30), and ended when the Universe was over 1 Gyr old, by  ≈ 5.3, as inferred in Planck VI (2018).Direct observations of galaxies at these high redshifts are challenging, requiring long integration times, techniques such as strong lensing (Atek et al. 2018), or new telescopes such as JWST.Our knowledge of early galaxy populations and the sources of cosmic reionization is correspondingly limited.
Because reionization is driven by the ionizing light of galaxies, the progress of reionization is coupled to the population of galaxies in the early Universe.Therefore, we can learn about these first galaxies by observing the gas in the IGM.For instance, studies of the Lymanalpha transmission in the spectra of distant quasars (e.g.Fan et al. 2006;Becker et al. 2013;Eilers et al. 2018;Yang et al. 2020;Bosman et al. 2021) inform us of the time-frame of reionization, as well as the topology of the ionized regions as reionization completes.However, these techniques will not allow us to probe far beyond  = 6 due to ★ E-mail: lewis@iap.frthe increasingly large column densities of neutral hydrogen.This is where the 21cm line of atomic hydrogen plays a crucial role.
The 21cm signal amplitude from a particular volume traces the density of neutral hydrogen gas, its temperature and velocity, as well as the local radiation field (see Furlanetto et al. 2006, for a review of the physics).Thus, the sky-averaged signal traces the global progress of reionization and constrains first light.Already, there has been a reported detection of the global signal for  > 10 by EDGES (Monsalve et al. 2017), though this is strongly contested by results from SARAS-3 (Bevins et al. 2022).At the same time, interferometry and 21cm tomography can probe the heterogeneity of reionization: experiments such as LOFAR (Mertens et al. 2020), MWA (Trott et al. 2022) and HERA (Abdurashidova et al. 2022b) have begun to provide upper limits on the power spectrum of the 21cm signal.Over the coming decades, new 21cm observatories will see first light (e.g. the Square Kilometer Array; Koopmans et al. 2015).
In order to interpret this new data, we must understand the astrophysical processes and underlying reionization scenarios that shape the 21cm signal.Existing theoretical work can be divided into several categories based on the approach.Analytical models predict the evolution of the global 21cm signal (such as Furlanetto et al. 2006;Mirocha et al. 2015) and various statistics thereof.Such models are computationally cheap, and can be used to explore a wide range of cosmological and universal astrophysical parameters.
Cosmological hydrodynamical simulations, on the other hand, directly model baryonic matter, the formation of stars, and the complex non-linear coupling of physical processes across spatial scales and matter components.In particular, the implementation of stellar and supermassive black hole (SMBH) formation, and the corresponding feedback onto gas, are essential to regulate star formation and reproduce realistic galaxies (e.g.Dubois et al. 2014;Schaye et al. 2015;Pillepich et al. 2018a).Star formation and feedback models can fundamentally alter, for example, the stellar mass to halo mass relationship, in turn modifying the number and distribution of ionizing sources that drive reionization, and shape the 21cm signal at  ≲ 10.Feedback from star formation and SMBHs can also vary locally and over time, significantly affecting the distribution of matter in and around galaxies (e.g.Ayromlou et al. 2022), modulating the matter clustering also on ≳ Mpc spatial scales (e.g.Springel et al. 2018), heating the gas (e.g.Zinger et al. 2020) and determining the gas velocity structure in and around galaxies and haloes (Nelson et al. 2019b;Pillepich et al. 2021).All this can manifest itself differently at different cosmic epochs.However, simulations that do not model radiation transfer on the fly must treat reionization in postprocessing, either with excursion set techniques (Hutter 2018;Hutter et al. 2019) or by full radiative transfer methods (e.g.Georgiev et al. 2022;Semelin et al. 2017;Bauer et al. 2015).As a result, running cosmological hydrodynamical simulations as well as variants to explore the relationship between the 21cm signal and different underlying galaxy formation models is more expensive than with analytical or semi-analytical models.
Cosmological hydrodynamical galaxy simulations must also account for the coupled physics between gas and radiation that are relevant for reionization and the 21cm signal.Heating from ionizing (UV and X-ray) photons directly impacts the strength of the 21cm signal, and also suppresses star formation in low mass haloes (Dawoodbhoy et al. 2018;Wu et al. 2019), again potentially leaving an imprint on the 21cm signal.This effect can be captured with fully coupled radiative transfer simulations, where the gas ionization and heating are tracked on the fly.Second, in order to self-consistently derive the 21cm signal before IGM heating, one must track the production and transfer of Lyman −  photons to account for the Wouthuysen-Field effect (Wouthuysen 1952;Field 1958).Finally, correctly modeling the formation of the first stars is needed to determine the 21cm signal at early times (e.g.Magg et al. 2022;Sartorio et al. 2023), and relies on following the formation of H 2 -cooled mini-haloes, subsequent photo-dissociation of the molecular gas, and the transition from Pop III to Pop II stars (Klessen & Glover 2023).
Ideally, simulations must track the radiative transfer of all of the relevant photons in a fully coupled radiation and hydrodynamics (RHD) simulation.To date, several simulation projects have done so, focusing on reionization in a representative volume (for most EoR statistics i.e. ≳ 100 3 Mpc 3 ; see Iliev et al. (2014)), such as: CROC (Gnedin 2014), CoDa (Ocvirk et al. 2016(Ocvirk et al. , 2020;;Lewis et al. 2022), Aurora (Pawlik et al. 2017), and THESAN (Kannan et al. 2021).These efforts are increasingly successful at reproducing constraints on the ionization of the IGM.One should also note the SPHINX simulations (Rosdahl et al. 2018;Katz et al. 2021), that focus on more detailed but smaller volumes, and have been leveraged to investigate e.g.Lyman−  emission during the EoR (Garel et al. 2021).Because of the high computational expense of fully coupled RHD, simulations of volumes large enough to fully account for the effects of cosmic variance on 21cm statistics do not yet resolve galaxies (e.g.Semelin et al. 2007).Others couple approximate RT methods to the hydrodynamics (e.g.Davies et al. 2023).
In this paper we aim to strike a middle ground, by relying upon existing large cosmological hydrodynamical simulations of galaxies and by post-processing them to model reionization.Our goal is to compare predictions for the 21cm signal of hydrogen during reionization.In particular, we use the outcome of three cosmological simulations (Illustris, IllustrisTNG and EAGLE, see Section 2) and contrast their predictions to quantify the link between different galaxy formation models and the 21cm signal.These simulations allow us to self-consistently account for the complex interplay between hydrodynamics and galactic feedback processes.
In Sec. 2 we present the simulations used in this work, our approach for modeling the ionization and heating of the IGM gas during reionization, and finally our assumptions when computing the brightness temperature of the 21cm line, and its power spectrum.Then, in Sec. 3 we showcase the predicted brightness temperatures, starting with maps, moving on to the average global signal, and finally the 21cm power spectra.To explain these findings, we study statistics about the IGM gas and galaxies in Sec. 4. We finish with our concluding remarks in Sec. 5.

Cosmological hydrodynamical simulations of galaxies
In this paper, we investigate the 21cm signal from three cosmological hydrodynamical galaxy simulations suites: Illustris, IllustrisTNG, and Eagle.We base our analysis on the similar, roughly 100 3 Mpc 3 , volumes.All three simulation projects feature models that account for the star formation and baryon feedback physics required to form reasonably realistic galaxies by  = 0.However, significant differences exist in the implementation, parametrization, and outcome of these models.Whereas these have been compared to each other and to observational constraints across a diverse set of galaxy and large-scale structure properties at low redshifts (e.g.Somerville & Davé 2015;Vogelsberger et al. 2020), here we focus on the effects of the different underlying galaxy-formation models on the resulting 21cm signals during the epoch of reionization.In fact, differences in the feedback models across these three simulations are expected to impact the temperature and density of gas in the environments of galaxies, in turn altering the 21cm signal from these regions.

Illustris
The original Illustris simulation (Vogelsberger et al. 2014;Sijacki et al. 2015;Genel et al. 2014), encompasses a volume of 110.7 3 Mpc 3 , resolved by 1820 3 particles and an equal number of gas cells at the initial conditions.Each dark matter particle (gas cell) has a mass of 6.29 × 10 6 M ⊙ (1.26 × 10 6 M ⊙ ).Dark matter and stars have  = 0 gravitational softening lengths of ∼1.4 kpc, while the gas softenings are adaptive down to ∼0.7 kpc.Illustris was performed with the moving-mesh code arepo (Springel 2010), which solves the coupled equations of gravity, galaxy astrophysics, and hydrodynamics in expanding universes by adopting, for the former, an N-body Tree-PM scheme and, for the latter, a Riemann solver on an unstructured Voronoi mesh.
The Illustris galaxy formation model (Vogelsberger et al. 2013;Torrey et al. 2014) includes the key physics thought to be responsible for galaxy formation, and we here summarise the main aspects, focusing on the implementation of star formation and feedback.Gas

Illustris
IllustrisTNG Eagle  cools radiatively, via hydrogen and helium collisional processes, bremsstrahlung, and inverse Compton cooling off the CMB (Katz et al. 1996), in addition to metal-line cooling.It is heated by a spatially uniform, though time-variable, background radiation field (Faucher-Giguère et al. 2009).
The unresolved dense star-forming ISM gas is modeled with a sub-resolution two-phase model, which partitions star-forming cells into cold and hot phases (Springel & Hernquist 2003).Stars can form from the cold phase via a stochastic generation of discrete stellar particles, above a threshold density of   = 0.1 cm −3 .These represent stellar populations that form simultaneously, following a Chabrier (2003) initial mass function (IMF).Galactic-scale winds due to supernova feedback are modeled with a kinetic decoupled wind scheme (Springel & Hernquist 2003), which is effective at regulating the star formation of lower mass galaxies.
SMBHs are seeded in all sufficiently massive dark matter haloes (≳ 7 × 10 10 M ⊙ ).They subsequently grow over time via smooth gas accretion and SMBH-SMBH mergers.Feedback from SMBHs is modeled via three channels: thermal (or quasar), mechanical (or radio), and radiative.Thermal or quasar-mode feedback heats the nearby gas, occurs for SMBHs with high accretion rates, and is time continuous.Mechanical or radio-mode feedback takes the form of energy injected into the halo by jet-inflated bubbles, is implemented for SMBHs with low activity states, and is time stochastic.Finally, the radiative feedback directly alters the local radiation field, reducing the effectiveness of cooling in nearby gas.

IllustrisTNG
The three IllustrisTNG simulations, TNG100 and TNG300 (Nelson et al. 2018;Springel et al. 2018;Marinacci et al. 2018;Naiman et al. 2018;Pillepich et al. 2018b) and TNG50 (Pillepich et al. 2019;Nelson et al. 2019b), are the successors to Illustris and are also based on the code arepo.Here we focus on the intermediate volume simulation, TNG100, which spans 110.7 3 Mpc 3 and which has the same initial conditions random seed as Illustris.TNG100 has 2 × 1820 3 dark matter particles and initial gas cells, with corresponding particle/cell masses of 7.5 × 10 6 M ⊙ and 1.4 × 10 6 M ⊙ (slightly different than Illustris because of the different values of the cosmological parameters -see Section 2.1.4).
The TNG galaxy formation model has several significant changes with respect to Illustris, such as the inclusion of magnetic fields (see Pillepich et al. 2018a, for an overview).Here we focus on the model differences that could affect the gas over the large scales pertinent to the study of the 21cm signal.
First, the low accretion state SMBH feedback model is replaced by a more local, kinetic feedback channel.The SMBH seed mass was also increased, compensating for the removal of the boost factor in the Bondi-Hoyle accretion rate (see Weinberger et al. 2017, for a full review of the modified AGN feedback and SMBH physics).The updated SMBH feedback model produces a more realistic massive galaxy population (Donnari et al. 2019;Rodriguez-Gomez et al. 2019;Nelson et al. 2019b).Second, the galactic winds generated by stellar feedback are updated in a number of aspects, including metallicity-dependent energetics, isotropy of the energy input at the injection scale, and the introduction of a redshift scaling of launch velocity.Overall TNG has stronger and faster winds, particularly at high redshift, producing important heating and enrichment around star-forming galaxies at early times (Pillepich et al. 2018a).
We also analyse "non-radiative" runs of the TNG100 simulation in which we deactivate much of the baryonic physics: cooling, star formation and feedback.This TNG100-NR simulation has the same initial conditions and resolution as TNG100, enabling direct comparisons.Its behaviour is closer to the assumptions made in analytical (and some semi-analytical approaches) regarding baryonic matter.
It also serves as a control for the physics that we wish to study in this paper.In this simulation, where there are no collisional or radiative cooling or heating processes, we avoid making any assumptions about the temperature of the gas.When required, we use the TNG100 stellar particles as sources in TNG100-NR.
Though many of the physical ingredients for galaxy formation are largely similar as in Illustris and TNG, their implementations are different.First, there is no sub-grid two-phase ISM model.Instead, star-forming gas follows an effective equation of state.The Chabrier (2003) IMF is also adopted, while the stochastic star-formation scheme relies on a metallicity dependent density threshold.Third, stellar feedback from Type II SNe is modelled by a stochastic thermal injection scheme as described in Dalla Vecchia & Schaye (2012), as opposed to the kinetic schemes of the other two simulations.This heats nearby gas to very high temperatures to produce effective outflows.Third, SMBHs are seeded in lower mass (> 10 10 M ⊙ ) haloes.Fourth, the feedback from SMBHs is also modelled by a single thermal channel (as opposed to three channels of Illustris and TNG), and is time integrated rather than continuous (Booth & Schaye 2009).
The data of these three simulations, both catalogs and particlebased data, are publicly available and described by Nelson et al. (2015), The EAGLE team (2017), Nelson et al. (2019a) for Illustris, EAGLE and TNG, respectively.In all three simulations, dark matter haloes and galaxies were identified using, consecutively, a Friendsof-Friends and the SUBFIND algorithms (Springel et al. 2001). 1 2.2 UV/X-ray background and temperature of the IGM Illustris, TNG and EAGLE are not fully coupled radiation and hydrodynamics (RHD) simulations and therefore they do not explicitly model the heating from ionizing photons (UV and X-ray) that originate from galaxies and SMBHs.Hydrogen reionization is implemented by activating a time-dependent, spatially-uniform ionizing background (UVB).In the case of Eagle, this is adopted from Haardt & Madau (2001) and turned on at  = 11.5 (Schaye et al. 2015).Illustris and TNG also include a UVB, from Faucher-Giguère et al. (2009), which is only turned on at  = 6.Therefore, prior to  = 6, in Illustris and TNG100 the IGM is cold ( ≪ 10 4 K), whereas any non-negligible UVB would act to heat this gas to ∼ 10 4 K.
To model more realistic gas temperatures at  > 6 on the large spatial scales of the IGM, we recalculate gas cell temperatures in postprocessing.To do so we re-run the TNG model cooling network for each gas cell, including the self-shielding from Rahmati et al. (2013), stopping when equilibrium is reached, i.e. when the relative change in temperature between cooling steps is smaller than 10 −4 .For low density IGM gas, this happens relatively quickly (in ≲ 10 Myr) compared to the time separation of snapshots (> 40 Myr), avoiding transitions between snapshots.As the heating is done in post-processing, we cannot consistently account for heating due to gravitational effects, and stellar and AGN feedback.Therefore, in cells that are already hotter than typical temperatures from photo-ionization (> 2×10 4 ), we retain the original simulation temperatures.
We verify this approach with a variation of the TNG100-3 simulation -a run of the same TNG100 volume with the same initial conditions, unchanged galaxy-formation model but with 64 (4) times worse mass (spatial) resolution -where the UVB was included for all  < 11.This resolution is sufficient to capture effects in the low-density IGM, where we modify temperatures.Adopting a power law for the temperature-density relation in low density gas ( H < 10 −3 cm −3 ), we find < 8% relative difference in temperatures between the full UVB run and our equilibrium approach.Furthermore, a comparison of the average IGM temperature of TNG, Illustris, and Eagle to observational constraints (which we do not show) returns a broad agreement (with e.g.Boera et al. 2019;Gaikwad et al. 2020), which is satisfactory for this comparative study (as it does not focus on modeling the most realistic Reionization).
This post-processing allows us to model the temperature in the IGM when heating has already finished ( ≲ 8 − 10, see Ciardi & Madau 2003).At higher redshifts, our simulations and postprocessing cannot account for some of the physical effects that are important in the early heating of the IGM.2 Consequently, throughout this paper, we refrain from making predictions for  > 8.

Brightness temperature of the 21cm line
In this work, we derive the differential brightness temperature of the 21cm line with respect to the CMB.The relevant expression is given by (Furlanetto et al. 2006): where  b is the differential brightness temperature (hereafter, simply referred to as the brightness temperature, for brevity),  s is the spin temperature of the gas,  CMB is the temperature of the cosmic microwave background,  is the cosmological redshift, and   0 is the optical depth at 21cm line center. is the wavelength of observation, which is cosmologically redshifted (i.e. =  0 /(1 + )).
Making several simplifying assumptions (see Furlanetto et al. 2006, for details), this can be rewritten as (Mesinger et al. 2011): where Ω  is the cosmological baryon density, Ω  is the cosmological density parameter for matter, z is cosmological redshift,  HI is the mass fraction of neutral hydrogen gas,  H is the gas density of hydrogen nuclei (and ⟨ H ⟩ its mean),  is the Hubble parameter, and   is the gas velocity gradient along the line of sight (LoS).We use the cosmological parameters of each simulation. H is taken from the simulation data. CMB is taken to be  CMB = 2.725(1+ ) (Furlanetto et al. 2006;Kannan et al. 2021), and   is computed using simulation gas velocities along the x direction.For  HI , we adopt a post-processing strategy described below (Section 2.4).
The velocity term of Eq. 2 can diverge if the LoS gas peculiar velocity gradient is negative and close to the Hubble expansion rate, leading to power spectra of the brightness temperature that can be artificially dominated by velocity effects.A possible remedy is to limit the values of   / (e.g.Mesinger et al. 2011), though this poses the question of which threshold to pick, and renders the resulting  b more complicated to interpret.Here, we circumvent this issue and adopt an approach similar to Mao et al. (2012) as implemented in the tools21cm software package (Giri et al. 2020).For every LoS ( direction3 ),  b in each cell of the uniform grid is sampled by 20 pseudo-particles.These particles are then moved along the LoS direction according to the local gas peculiar velocities (as directly measured from the hydrodynamical simulations) and Hubble expansion.The particles are then binned back onto the uniform grid, smoothing the initial  b values and mimicking the effects of redshift space distortion (RSD) on  b .We discuss these choices further in the section 4.4.
A common assumption is that, during reionization,  s ≫  CMB (e.g.Mesinger et al. 2011;Kannan et al. 2019), with the effect that the spin temperature term in Eq. 2 vanishes.However, in this paper, we wish to examine the effect of galaxy formation model choices on the power spectra of the brightness temperature.These can alter the temperature of gas around galaxies, e.g.due to stellar or SMBH outflows.This encourages us to use the gas temperature in our model of  s , in order to capture the differences from the different galaxy formation models. s can be expressed as follows (Eq.3): where   is the coupling coefficient for collisions,   is the coupling coefficient for the Wouthuysen-Field (WF) effect,  gas is the gas temperature, and  c is the colour temperature of the Lyman- line.
As we cannot compute   from first principles, we limit our study to  ≲ 8, when the heating of the IGM is finished,   ≫   ≈ 1, and  c ≈  gas , so that  s ≈  gas (e.g.Ciardi & Madau 2003;Furlanetto et al. 2006).We checked these assumptions by computing   ,   , and  s as in Gillet et al. (2021), and found negligible differences in our results.In TNG100-NR, we always assume  s ≫  CMB .

Deriving 𝑥 HI in post-processing
As mentioned in Section 1 and described in Section 2.3, the 21 cm brightness temperature directly depends on the fraction of neutral hydrogen gas and thus on its density.In TNG, Illustris, and Eagle, the fraction of neutral hydrogen gas in each resolution element is determined by a balance between collisional processes computed from the gas state, and photo-heating and photo-ionisation processes computed using a uniform ultra-violet background (UVB).This UVB represents the combined emission of all ionising sources.
In practice, the UVB is applied to all gas in a spatially uniform manner.Thus, in these simulations, the IGM is progressively ionized in a rather homogeneous way, while in reality the IGM reionizes due to nearby sources.Therefore, the first areas to reionize are situated close to the overdensities where galaxies form.To accurately model the 21cm signal during the EoR, we must take this into account, and post-process the simulations so that the ionized fraction of hydrogen gas reflects the source density.We carry out this post-processing with the CIFOG code (Hutter 2018).

CIFOG
CIFOG (Hutter 2018) is a tool for post-processing simulations in order to determine the ionization fractions of hydrogen and helium as they evolve throughout reionization.It is based on the excursion set formalism widely used in 21cm predictions (see e.g.Furlanetto et al. 2006;Mesinger et al. 2011).However, CIFOG allows the input of grids of gas density and ionizing photons rates as directly predicted or derived from hydrodynamical galaxy simulations, whilst retaining a low computational expense when compared to full radiative transfer.For our use case CIFOG allows post-processing of the simulations to capture varying star formation and galaxy formation models in order to assess their imprint on the 21cm hydrogen emission line.
For precise details about CIFOG's design and examples of its use, we refer the reader to Hutter (2018) and Hutter et al. (2020).
With CIFOG we generate outputs at the same redshifts for all three simulations.Since the 21cm signal is proportional to the neutral fraction of gas, and because the average neutral gas fraction evolves by several orders of magnitude during the final ≈ 100 Myr of reionization, we take CIFOG outputs separated by ≈ 20 Myr to sample rapid changes in the 21cm signal.

Eulerian grid
CIFOG only supports Eulerian input grids of size 2  (with  an integer).We also require an evenly sampled grid to compute the power spectrum of the simulation fields.We therefore construct a set of Eulerian grids for the density, temperature, and line of sight velocity fields of each snapshot, for each simulation.For the bulk of the analysis, we choose a grid of 1024 3 cells, giving a spatial resolution of roughly Δ  ≈ 100 kpc (depending on the simulation).This spatial resolution is adequate for resolving the large spatial scales that can be constrained by current or near-term future observations of the 21cm emission line of hydrogen (see e.g.Koopmans et al. 2015;Kolopanis et al. 2019Kolopanis et al. , 2022;;The HERA Collaboration et al. 2021).
We use the standard cubic-spline kernel with an adaptive size and deposit gas mass and mass-weighted gas properties accordingly into the Eulerian grid (following Nelson et al. 2016).For TNG and Illustris the kernel size is the spherical volume-equivalent radius of each Voronoi cell.For Eagle, we use the SPH adaptive smoothing length.Rarely, low density void regions are not sampled, and in these cases we infill the grid values from nearest neighbor values. 4ith CIFOG we save regularly spaced outputs of the  HI field, obtaining more CIFOG outputs than simulation snapshots.This is particularly true for  > 6 and in Eagle, for which we only have 3 snapshots covering the 6 ≤  ≤ 8 interval.When necessary we associate each  HI field to the closest in redshift snapshot.Where possible, we verified that our conclusions persist if we limit ourselves to the study of redshifts where we have a snapshot.This is possible because the evolution of  HI is the main driver of large scale evolution of the brightness temperature and power spectrum of the 21cm line during the epoch we study (e.g.Furlanetto et al. 2006).

Sources of ionizing radiation and escape fraction
We model the sources of ionizing radiation from star forming regions as follows, neglecting throughout radiation from SMBHs. 5o determine the hydrogen ionizing luminosities of stellar particles we use the BPASS v2.2.1 (Eldridge & Stanway 2020) stellar evolution model to tabulate ionizing luminosity per stellar mass as a function of stellar particle age and birth metallicity, including the effects of binaries.We include all ionizing wavelengths from the BPASS v2.2.1 stellar spectra, spanning 1.2397 × 10 4 eV ≥ ℎ ≥ 13.6 eV.
Due to the coarse resolution of our Eulerian grid, and the approximate treatment of radiative transfer performed by the CIFOG code, we must account for the absorption of ionizing photons by neutral HI gas as they travel from stellar sources to the IGM heuristically.We take the simplest choice and assume a constant fraction of all produced photons are absorbed by dense neutral galactic gas.The corresponding escape fraction is therefore an effective value, that does not evolve in time, or account for any dependencies on galactic properties (as increasingly suggested by simulations, for instance: Rosdahl et al. 2022;Kostyuk et al. 2022).Nevertheless, similar models are widely used, particularly in analytical and semi-analytical approaches (e.g.Dayal et al. 2020).We calibrate the escape fraction (  esc ) using the timing of reionization in TNG100, finding a reasonable result for  esc = 0.2 (  20% esc ).

Fiducial post-processing model
Putting together the methodologies described above, throughout this paper we adopt the following fiducial choices to obtain a prediction of the 21cm brightness temperature from the Illustris, TNG100 and Eagle simulations.We assume that the spin temperature ( s ) is coupled to the gas temperature ( gas ).We adopt ionizing luminosities of stellar populations from the binary BPASS V2.2.1 stellar evolution model (Eldridge & Stanway 2020).We assume a fixed escape fraction of 20%.We then derive the mass fraction of ionized gas in each cell, using the source and hydrogen density distributions from each simulation, with CIFOG.Our fiducial model for the brightness temperature of the 21cm line accounts for redshift space distortion (RSD) along the line of sight.
In Appendix A2 we examine the sensitivity of our 21cm brightness temperature calculation to all different model choices and assumptions discussed above.

Computing power spectra
Beyond its spatial average, the most informative and accessible summary statistic of the 21cm brightness temperature is its 2-point correlation function, or power spectrum.
To compute the power spectrum of the brightness temperature ( 2 b Δ 2 21 , or 21cm power spectrum hereafter), we first determine the mean of the perturbation subtracted  b .We then use the tools21cm package (Giri et al. 2020) to compute  2 b Δ 2 21 in Fourier space.The smallest scales we probe are limited by the grid resolution, while the largest are limited by the simulation box sizes.In our case this corresponds to the interval  ∈ [≈ 0.06 cMpc −1 , ≈ 58.49cMpc −1 ].However, the power spectra at the largest scales (smallest k) suffer from under-sampling, and those at the smallest scales (largest k) are susceptible to aliasing.Throughout the paper we plot a conservative range, where we have more than eight resolution elements contributing to the radially averaged power spectrum.This corresponds to  ∈ [≈ 0.11 cMpc −1 , ≈ 29.25 cMpc −1 ].
In order to investigate the differences between  2 b Δ 2 21 across simulations, we also decompose the dimensionless power spectra according to the individual terms of Eq. 2: where Δ 2 , is the power spectrum of the field X, and Δ 2 , is the cross power spectrum between fields X and Y.This allows us to compare the contribution of different simulation fields to the total  2 b Δ 2 21 , with two caveats.First, we do not compute 3rd order and higher terms, which can be important at small scales (e.g.Furlanetto et al. 2006;Georgiev et al. 2022).Second, since the velocity effects are not directly included in the computation of  b but accounted for by smearing  2 b Δ 2 21 , the contribution from the velocity term (i.e.redshift space distortion) is derived by a simple subtraction between the final  2 b Δ 2 21 and the  2 b Δ 2 21 before the correction for velocity effects is made.Thus, we do not separate between different velocity and cross-correlation velocity terms.
In Appendices A2 and A1 we examine the sensitivity of our power spectra calculations to different model choices and assumptions.Further, in Appendix A3 we study the convergence of our fiducial  2 b Δ 2 21 predictions with respect to the resolution of our Eulerian grid.

Maps of the 21cm brightness temperature
Figure 1 shows slices of differential brightness temperature | b |, where each panel is ≈100 Mpc a side, at  = 8, 7, 6 in TNG100, Illustris, and Eagle.Emission peaks occur at the overdensities in filaments and galaxies.As time progresses, more and larger regions of low  b appear.These are mainly driven by the formation and growth of ionized bubbles around collapsed structures.
There are many similarities among the simulations, particularly between TNG100 and Illustris as they share the same initial conditions.However, differences due to star formation and gas distributions are also apparent.The TNG100 and Eagle maps have more, smaller bubbles of low  b (ionized HI bubbles), when compared to Illustris, but also more bubbles overall.This suggests that the distribution of star formation amongst haloes differs strongly at these high red-shifts.Within these bubbles, one can distinguish faint filament-like emission features that correspond to dense underlying gas structures.

Evolution of the average 21cm brightness temperature
The left panel of Fig. 2 shows the redshift evolution of the average brightness temperature in TNG100, Illustris, and Eagle.The markers and thin lines show the evolution of  b according to our fiducial post-processing model (Section 2.4.4).For all three simulations,  b decreases over time as the volume filling factor of ionized hydrogen increases, and the total volume where  b ≈ 0 increases.
We predict a lower signal in Eagle across all redshifts.Illustris and TNG100 are similar at  = 8, while the signal decreases faster in TNG100 and is closer to the Eagle result for  ≲ 6. Differences in  b between TNG100 and Illustris arise from their unique feedback prescriptions.This is likely also the case for differences in the  b of Eagle, as they cannot be explained by differences in large scale density (closer than 2%).The shaded regions show a level of modeling uncertainty representing the breadth of all explored postprocessing models (described in Appendix A).Broadly, the Illustris and TNG100 models are the closest, whereas Eagle-based models of  b show the largest scatter in values, lying below the other two simulations for most model combinations.
The middle panel of Fig. 2, shows the evolution of the average  b as a function of the spatially-averaged neutral fraction (average  HI ).This accounts for differences in reionization histories across the simulations and source models.As expected,  b decreases rapidly with decreasing average  HI , and the three predictions of  b are strikingly close at fixed  HI , irrespective of the underlying hydrodynamical galaxy simulation.The Eagle model variants are again the most diverse from the fiducial model.Gillet et al. (2021) perform full RHD simulations, with sub-grid models for star formation and luminosities based on smaller, more resolved RHD simulations.They report a similar evolution of the brightness temperature as ours from Illustris, TNG100 and Eagle, particularly when compared to Eagle.On the other hand, we find significantly higher temperatures than the analytical result (in the saturated case) from Mirocha et al. (2017).However, when accounting for our different reionization histories, our results are close to their predictions near  = 6, and they are even closer to the RHD simulation results of Gillet et al. (2021).

Power spectra of the 21cm brightness temperature
Fig. 3 quantifies the fiducial power spectra of the 21cm brightness temperature,  2 b Δ 2 21 , and the fiducial reionization histories from TNG100, Illustris, Eagle, and TNG100-NR (see Section 2.1.2).We focus on the epoch between  = 8 and  ∼ 6.For all simulations and times, the 21cm power spectrum increases with decreasing spatial scale, reflecting the underlying matter density power spectrum and the high variance in the density of baryons at small scales.
Broadly, all the predicted 21cm power spectra show the same time evolution: as the first large (>1 Mpc) ionized bubbles form between  = 8 and  = 7, the large scale variance in the brightness temperature increases, driving an increase in the power of  b at large spatial scales, which reaches a maximum somewhere near  ≈ 6.5 ( HI ≈ 0.65).Beyond this point, the power drops at all scales as the volume filling fraction of ionized hydrogen climbs to unity (see e.g.Kannan et al. 2021, for recent similar findings).
Beyond this similar evolution of the 21cm power spectra across different simulations, the values across scales also appear relatively consistent with one exception: both TNG100 and Eagle have significantly less small scale power than Illustris and TNG100-NR.Finally, although the reionization histories of the simulations are similar (5.3 < ( HI = 0.1) ≤ 6 for the fiducial model), they are not identical.For instance, Eagle starts reionization earlier than the other boxes under our fiducial assumptions.This can been seen in the very low normalisation of 21cm power spectrum at  = 5.7 in Eagle with respect to the other simulations at the same redshift.
To directly compare the three simulations, the top row of Fig. 4 shows the 21cm power spectra at three redshifts (left to right), with all simulations in each panel (TNG100, Illustris, Eagle, and TNG100-NR).Overall and as qualitatively seen above, the 21cm power spectra are very similar across the board.Indeed, the changes produced by our various post-processing models and choices are of the same order or larger than the differences across simulations at fixed redshifts.Across scales and for the first two studied epochs (top left two panels), the different predictions for each simulation show similar trends and normalisation.In fact, the fiducial 21cm power spectra of TNG100 and Eagle are similar for most small scales.This is somewhat surprising as, although they are both recent galaxy formation simulations, they are only calibrated to produce realistic galaxy populations at  = 0.In addition, the small-scale signals of Illustris and TNG100-NR are similar.Both predict consistently higher 21cm power spectra for all models at the smallest scales we study.Since TNG100, TNG100-NR, and Illustris were run with the same initial conditions, these changes stem solely from the galaxy formation physical models and the choices made for their parameters.
The higher power on small spatial scales, according to Illustris and TNG100-NR, likely arises due to differences in the matter power spectrum of gas, that is the main contributing term to the 21cm power spectra at small scales (Furlanetto et al. 2006).Feedback simultaneously smooths out baryons at small scales within galaxies, while pushing a fraction of the smallest scale power to larger scales.Since the small-scale differences between TNG100 and Illustris must arise from the different prescriptions for star formation, and stellar and SMBH feedback, and since TNG100-NR was run without star formation or feedback and is very close at small scales to Illustris, then the high-redshift feedback in Illustris is clearly weaker than in TNG.These conclusions are consistent with the changes to the stellar feedback model between the Illustris and IllustrisTNG models.In particular, high-redshift galactic winds have higher initial velocities in TNG100 than in the original Illustris model (see Pillepich et al. 2018a, for more details).
In the top right panel ( = 5.7), reionization is almost complete in all the models: any large changes in the power spectra are set by different average  HI .Modulo this difference, which mostly affects the overall normalisation, the shape of the power spectra are similar.
We find our results to be quite similar across all scales to those of Hutter (2018) at  = 7.8, that use CIFOG to post-process N-body dark matter simulations.However, their fiducial prediction is closer to some of our model variants than to our fiducial model.At  = 6.3 the Hutter (2018) curve is much flatter than our predictions, so that we only agree at large scales.On the other hand, all of our predictions are compatible with the most recent observational upper limits (black arrows).However, with  2 b Δ 2 21 < 10 3 mK 2 (Abdurashidova et al. 2022a), current data is not yet constraining.Significant observational advances are required in order to discriminate across our results.
To account for reionization history differences among the simulations and models, we turn to the 2  row of Fig. 4 where the 21cm power spectra in each panel are shown at epochs with similar average neutral fractions (average  HI ).For all simulations and times, the predictions are closer, and the shaded areas representing our other post-processing models are much tighter and closer to our fiducial results.To enable quantitative comparisons, we show the same results in the 3   row, but normalised by the TNG100 outcome.We find that, overall, our various predictions are similar: all models are within less than a factor of two from the fiducial TNG100 predictions.TNG100 and Eagle are very similar at small scales, as commented upon above.In particular, the fiducial predictions from Eagle and TNG100 are roughly within 15% for  ≥ 2 Mpc −1 , i.e. at scales smaller than about 500kpc.At the same scales, and before  HI = 0.3, Illustris predictions are almost identical to TNG100-NR, and both have more power than Eagle and TNG100, by an amount that increases up to 200 % by  ≈ 20 Mpc −1 .
Near  ≈ 0.8 Mpc −1 , our fiducial Eagle predictions are about 30 % higher than those based on TNG100, TNG100-NR, or Illustris (when  HI ≲ 0.6).Depending on our post-processing model combinations, this difference is less or more pronounced.At the largest spatial scales, all simulations agree to within 50 %, except Illustris when  HI = 0.85, which has far more power than TNG100 and whose predictions are very dependent on our post-processing modeling choices -the shaded area covers 50 − 180 %.TNG100 and TNG100-NR are quantitatively similar at large scales.
When accounting for the reionization history, we find that our predictions are the most similar across all scales when average  HI = 0.3, with the majority of results lying within ≈ 40% of the fiducial TNG100 result.
Comparing to previous models, the predictions from other works are within a factor two of our results.The agreement with the Hutter (2018) predictions is improved by comparing results at fixed average  HI .Since the THESAN (Kannan et al. 2021) results are based on similar physical prescriptions as TNG100 (with the addition of fully coupled radiative transfer), the comparison between the THESAN and TNG100 predictions is informative: see the dashed black curves in the middle panels of Fig. 4. At average  HI = 0.6 the lowest TNG100 prediction is close to the THESAN outcome on large spatial scales.However, at progressively smaller spatial scales (larger ), the THESAN result is systematically 50 % lower than TNG100.At average  HI = 0.3, the agreement is better: at large scales the results between TNG100 and THESAN are interchangeable, whereas at the smallest scales the THESAN outcome is close to the lowest TNG100 prediction.However, near  ≈ 2 Mpc −1 , the THESAN predictions are again roughly 50 % lower.This larger power in TNG100 versus THESAN may be due to our use of CIFOG in lieu of a full radiative transfer (RT) method.Indeed Hutter (2018) shows differences of a similar degree between their CIFOG scheme and a full RT scheme at these scales and larger.However, coupling the TNG100 models to RT could also have significant effects on galaxy formation and the sources of reionization, potentially increasing the differences with respect to TNG100.We further expand on the comparison to other simulation-based models and observations in the Section 4.
Overall, our results paint a picture of relatively close predictions for the 21cm signals at  = 6 − 8 from recent cosmological hydrodynamical non-RT galaxy simulations.This is particularly true for Eagle and TNG100 at small scales, and in comparison to other theoretical works.At the same time, unique features from individual simulations are present (e.g. the bump at  ≈ 0.8 Mpc −1 in Eagle with respect to the other boxes), which are robust to our post-processing modeling choices.In the next section, we discuss these findings, in terms of the power spectra of the constituent fields of  b , with respect to galaxy properties, and as a result of our post-processing choices.

Contributions to the 21cm power spectrum
Fig. 5 shows the largest contributing terms of the dimensionless power spectrum of the 21cm brightness temperature, Δ 2 21 , that we compute (as per Eq. 2) alongside the fiducial predictions from each simulation, at three key moments of reionization.As before, in all four simulations, the fiducial Δ 2 21 (solid thick curves) increases with time at large scales.Here, we directly see the rise of the  HI term (dotted) at large scales between average  HI = 0.85 and  HI = 0.3, which drives a corresponding increase for the total 21cm power.
It has been extensively shown that the two most important components to the 21cm spatial correlations are the gas density power spectrum and the neutral fraction power spectrum (Furlanetto et al. 2006).Our results echo these findings: for all simulations and considering all cosmic epochs, the  HI term (dotted) is the most important at large scales ( < 2 Mpc −1 ) whereas the gas density term (dashed) dominates at smaller scales, especially towards the end of reionization.We find that the next most important contribution to our fiducial result comes from RSD (i.e. the velocity term; solid thin curves), which can contribute close to 10 % of the total power between  = 1 and 20 Mpc −1 , depending on the simulation.
This decomposition is only precisely valid if the correlations between the constituent fields of  b are null, which is the not the case.This is visible at certain scales, where the sum of the depicted components does not equal the fiducial result.Particularly at the smallest scales, the difference between the gas density power spectrum and Δ 2 21 is made up by the missing (negative) contribution from the higher order terms, that we omit in this work.However, this simplification does not affect a relative comparison between the simulations.
At  ≈ 1 Mpc −1 , there is more  HI power in the Eagle simulation than in TNG100 and Illustris: this contributes to the bump in the 21cm power spectrum in Eagle with respect to TNG100 near  ≈ 1 Mpc −1 ., according to TNG100, Illustris, Eagle, and TNG100-NR in our fiducial post-processing model.The colours denote the progression of the power spectra from  = 8 (light) to  ≤ 5.7 (dark).The associated colour bars also show the average neutral fraction at each corresponding redshift.Curves are thinner on scales that are compromised by aliasing or box size effects.For all simulations, the 21cm power spectrum first increases with time at large spatial scales; then, as a greater fraction of the volume is ionized, it decreases homogeneously over all scales.
At the largest scales, and at average  HI = 0.85, the Illustris  HI power is greater than in the other simulations, potentially resulting in the correspondingly larger  2 b Δ 2 21 in Illustris.At smaller spatial scales, when the gas density power spectrum is largest, we find that differences in the Δ 2 21 across simulations map to differences in the gas density power spectrum, at least for average  HI ≥ .35.Indeed, the spatial correlations of gas density and velocity in Illustris and TNG100-NR are similar over the scales where the simulations have comparable normalised 21cm power spectra.At the same time, the apparent agreement between the 21cm power spectrum from Eagle and TNG100 is actually a coincidence that arises from a lower gas density, but higher velocity, power spectrum in Eagle.The significant differences (up to a factor of 2) in the gas density and velocity power spectra at  ≥ 10 Mpc −1 in TNG100 and Illustris (and TNG100-NR), that share the same initial conditions, can only arise from the underlying differences in the baryonic and feedback physics models.The similarity of the Illustris and TNG100-NR results suggests that feedback in Illustris has a negligible impact on gas at these scales and high redshifts ( ≳ 6).With respect to TNG100 and Eagle, stellar feedback in Eagle may be more effective at ejecting matter from the central regions of galaxies, with even higher velocity   , from TNG100, Illustris, Eagle, and TNG100-NR across time.Coloured curves show the fiducial results, whereas the shaded areas indicate the spread of results using different post-processing models for  b (Section 2.4.4 and Appendix A2).Thinner curves denote scales where the predictions are affected by resolution or box size.Top row: power spectra for all four simulations at fixed redshifts.Middle row: power spectra grouped so that the average neutral fraction ( HI ) in every panel is similar.Bottom row: Same as the middle row, but normalising all power spectra by the TNG100 result to quantify the level of (dis)agreement.
winds than in TNG.This would explain the lower matter density and higher velocity power spectra in Eagle.In fact, for  ≤ 8 Mpc −1 , and particularly as reionization completes, the velocity term from Eagle has the most power (by a factor of ≲ 2), indicating the greatest variation in line of sight velocity gradients.
Finally, we can confirm that the different predictions of the largescale 21cm power spectrum from Illustris and TNG100 are due to differences in the large-scale ionization fraction.To directly compare the stellar source populations, we use the shared initial conditions of TNG100 and Illustris to perform another CIFOG run using TNG100 and our fiducial assumptions, but with the sources of ionizing radiation as predicted by Illustris (see Section 2.4.3).Fig. 6 shows the ratio of the 21cm power spectrum from this new run (named "Illustris sources") and from our fiducial TNG100 and Illustris predictions to the TNG100 result, when average  HI ≈ 0.6.The outcome of this test run is strikingly close to those from Illustris for scales  ≤ 7 Mpc −1 , demonstrating that it is the differences in the source population rather 21 , in our fiducial post-processing model and its contributing terms (as detailed in Eq. 2), for all simulations.To simplify this plot, we do not show any constituent cross terms of Δ 2 21 , which can however contribute significantly at some scales and ionization levels (e.g.Georgiev et al. 2022).We have grouped the simulations at redshifts when their average ionization fractions are similar.Solid thick curves show the fiducial results.Dashed curves denote the approximate contribution to Δ 2 21 from the gas density term, whereas dotted and solid thin curves represent the  HI term and the approximate contribution of the redshift-space velocity distortions (velocity term), respectively.x HI = 0.60 ± 0.1 Illustris TNG100 f 20.0% esc f 20.0% esc Illustris sources Figure 6.Relative 21cm power spectra for an experimental analysis of the TNG100 simulation, keeping all fiducial choices except we use the sources of ionizing radiation as predicted by Illustris.This test run is called "Illustris sources" (dashed dotted curves) and is compared to the fiducial Illustris and TNG100 predictions via the ratio of the 21cm power spectrum to TNG100.Each power spectrum is taken at an epoch where average  HI ≈ 0.6.The differences in the 21cm power spectrum at  ≤ 7 Mpc −1 in Illustris vs. TNG100 are caused by differences in the stellar populations rather than in the gas state.Thinner curves denote scales where the predictions are affected by resolution or box size.
than in the physical gas state that drive the most important differences in the 21cm power spectrum, for most scales.
Despite allowing for differences in the simulations' gas temperatures stemming from choices in the implementation of feedback, we find that accounting for temperature in our modeling does not significantly distinguish the 21cm power predictions of the simulations.This resemblance has two likely explanations: First, and foremost, we have limited ourselves to the post-heating epoch, so that the IGM gas in all three simulations is similarly hot.Second, it may be that we do not consider small enough scales for the temperature of the coolest, densest neutral clumps and the hottest outflows to significantly impact the 21cm power.

Galaxy properties predicted by the different simulations
To understand the role of different source populations in creating scale-dependent differences in the 21cm power spectra from simulation to simulation, we consider the galactic properties predicted by Illustris, TNG100 and Eagle and the underlying star formation and feedback models at play. Figure 7 shows several properties of galaxies as a function of dark matter halo mass at a representative redshift of  = 7.
First, we examine the stellar mass of galaxies (top left panel, panel A).Overall and for all simulations, the stellar mass increases between  h = 10 8 and 3 × 10 11 M ⊙ .This is expected as the more massive haloes are able to accrete and concentrate more gas in spite of stellar feedback, permitting more star formation.For the highest mass haloes ( h ≳ 2 × 10 10 M ⊙ ),  ★ is a power law function of  h .At these masses, the TNG100 and Eagle simulations are very similar, whereas Illustris galaxies have on average somewhat more stellar mass at fixed halo mass.For lower masses, stellar feedback is efficient at suppressing star formation, and this results in a knee in the average stellar mass to halo mass relation, giving way to a stronger decrease of stellar mass with decreasing halo mass.The mass at which this knee manifests itself arises from details in the physical models of each simulation. 6anel B shows the fraction of the total stellar mass contained in each halo mass bin.In other words, it shows which halo masses contain the most stellar mass.This is governed by two factors: the abundance of galaxies decreases with increasing mass, and stellar mass increases with halo mass.These competing trends give rise to a halo mass interval that forms the highest fraction of all stars.
The resulting curves are strikingly different for each simulation.For instance, stars in Illustris are more likely to form in > 10 10 M ⊙ haloes than in the other two simulations.Indeed, the halo mass bin that contains the highest fraction of stellar mass is 10 9 M ⊙ in TNG versus 2 × 10 9 M ⊙ in Eagle and 10 10 M ⊙ in Illustris.Further, the most star forming halo masses contain somewhat different fractions of the total stellar mass: ≈ 5 % in TNG100 and Illustris, but 8 % in Eagle.Despite the differences between TNG100 and Eagle, they have close fractions of stellar mass for  h ≳ 10 10 M ⊙ , and only differ for lower masses.Stars in TNG easily form in ≲ 10 9 M ⊙ haloes, while in Eagle these haloes form very little stellar mass, potentially related to resolution differences.For the highest mass haloes, all three simulations are almost identical.
Panel C shows the average mass fractions for gas (solid lines) and baryons (dotted lines).Each mass fraction is defined as the ratio between the gas mass (or baryon mass) and the total mass of all particles and cells associated with the SUBFIND sub-haloes (i.e.gravitationally-bound material).As before, TNG100 and Eagle are more similar to each other than to Illustris: the average baryon and gas fractions in Illustris haloes is nearly constant with halo mass.In contrast, the TNG100 and Eagle fractions are lower for all halo masses, and decrease until a few 10 9 M ⊙ , above which they rise with increasing halo mass.These differences between Illustris and TNG100 haloes are driven by physical models changes, including the non-monotonicity.In fact, there is no corresponding drop in the total halo gas fraction in TNG100-NR.This indicates that the halomass trends of panel C for TNG100 and Eagle are associated with cooling, star formation, and feedback.The latter can expel gas from the TNG100 and Eagle haloes, whilst heating their surroundings and slowing accretion of new gas from the IGM -predominantly at lower redshifts (e.g.Pillepich et al. 2018a).
The differing effectiveness of feedback between the simulations explains why the haloes that contain the most stellar mass are more massive in Illustris.For >3×10 9 M ⊙ , haloes in Eagle have the lowest average baryon and gas fractions, suggesting the strongest feedback effects.TNG100-NR actually has a lower gas fraction than TNG100, despite its absence of feedback.This may be due to the lack of gas cooling in TNG100-NR, slowing the accretion of gas into galaxies.These differences are reflected in the 21cm power spectra, e.g. in Fig. 5.The simulations with the highest gas fractions (Illustris and TNG100-NR) also have the largest small-scale power in the spectra of gas density.In TNG100 and Eagle, where the gas fractions are lower, the gas density power is also lower at small scales (Schneider & Teyssier 2015;van Daalen & Schaye 2015).We infer that stellar feedback in TNG100 and Eagle is capable of evacuating gas from galaxies out to ≤ Mpc scales (see Ayromlou et al. 2022, for a study of large-scale feedback-driven gas redistribution).

Connecting galaxy properties to the ionizing luminosity
We now move to connect the galactic stellar masses to their ionizing luminosities, in the right hand panels of Fig. 7.
Panel D shows the average galaxy ionizing luminosity (or  intr ).In our source modeling (Section 2.4.3), at fixed stellar population age and metallicitiy, the ionizing luminosity scales linearly with stellar mass.Thus, the average  intr increases with halo mass, roughly as  ★ .However, the conversion is not straightforward, because of the dependence on stellar ages and metallicities, which are affected by the detailed star formation and enrichment histories of individual galaxies.Indeed, the highest mass haloes in Illustris appear to have somewhat higher average  intr , with respect to the other simulations, than their  ★ alone suggests.
Panel E shows the fraction of the total ionizing luminosity produced in each halo mass bin.Qualitatively, this metric closely resembles the stellar mass counterpart (panel to the left).However, the halo masses that contain the largest fraction of overall stellar mass contain even larger fractions of the total ionizing luminosity.Further, these masses are slightly lower for the luminosity curves than for the stellar mass curves.As a result, the ionizing stellar photons overwhelmingly (roughly 50 %) come from ≈10 9 M ⊙ haloes in Eagle, whereas in TNG100 and Illustris the ionizing sources are more evenly distributed across halo masses.This implies different typical sizes and spatial distribution of ionized bubbles in Eagle with respect to the other two simulations, and could be associated with the higher  HI power seen in Eagle near  ≈ 0.8 Mpc −1 (Fig. 5).
To understand this difference with respect to the stellar mass fractions, we show the total halo ionizing luminosity per stellar mass (in units of number of photons with ℎ ≥ 13.6 eV s −1 M −1 ⊙ ), that we will call specific luminosity, in panel F. Across all three simulations, the specific luminosities of the lowest mass central galaxies are roughly 3 × 10 45 s −1 M ⊙ −1 .Between 3 × 10 9 M ⊙ and 3 × 10 10 M ⊙ , the specific luminosities of TNG100 and Eagle decrease to 10 45 s −1 M ⊙ −1 at 2 × 10 9 M ⊙ and 8 × 10 9 M ⊙ respectively.In this halo mass range, Illustris has significantly (≳ 0.4 dex) higher specific luminosity than the other two simulations.At higher mass, the TNG100 and Eagle specific luminosities increase to approximately 1.7 × 10 45 s −1 M ⊙ −1 at 10 11 M ⊙ .All this suggests that the different predictions of the 21cm power spectrum from the various simulations arise not just because of differences in the galactic stellar masses available to emit ionizing photons, but also from the properties of the stellar populations that have (on average) different specific luminosity.Panels G and H quantify the stellar mass-weighted, average stellar ages and metallicities of Illustris, TNG100 and Eagle galaxies.In all three simulations, galaxies are young, and have recently formed in the lowest mass haloes, while ages increase to a maximum near 10 10 M ⊙ , after which the ages either plateau (in Illustris) or decrease with increasing halo mass (TNG and Eagle), reaching a rough agreement for the highest mass haloes.Since the ionizing emissivity decreases rapidly with age, this propagates into differences in specific ionizing luminosity (panel F).In fact, the ordering of the age curves roughly gives the ordering of the specific ionizing luminosity curves, showing that stellar age is the main driver of differences in the ionizing luminosities of galaxies at fixed stellar mass.
The average age of TNG100 galaxies increases more rapidly at lower masses than in the other two simulations, leading TNG100 to have the lowest specific ionizing luminosities in low mass haloes.Between 10 9 M ⊙ and 5 × 10 9 M ⊙ , the average stellar age increases rapidly in all simulations.This is most pronounced in Eagle, leading its ionizing luminosities to drop.Beyond this point, the typical ages stagnate in Illustris, continue to rise in Eagle, and decrease in TNG100.Consequently the specific luminosity in TNG100 increases.For 10 10 M ⊙ and beyond, the specific luminosities in Eagle decrease rapidly, converging with the other simulations for massive haloes.These age comparisons map well to the differences between baryon fractions: the oldest populations have the lowest baryon fractions.When feedback suppresses star formation in a galaxy, the ionizing luminosity decreases via two channels.First, there is simply less stellar mass.Second, the missing stellar mass would have been young stars efficiently producing ionizing photons.
Panel H shows that the mass-weighted stellar metallicities of galaxies always increase with halo mass.This is readily understood as they x HI = 0.60 ± 0.1 Eagle TNG100 f 20.0% esc f 20.0% esc TNG populations Figure 8. Relative 21cm power spectra, comparing several numerical experiments.We compare our fiducial Eagle simulation result (solid green) to a case where we rescale the ionizing luminosity of the stellar particles as a function of galaxy halo mass, so that the average ionizing luminosity per stellar mass is the same as in TNG100.This test run is called "TNG100 populations (dashed dotted curves) and its 21cm power spectrum is compared to the predictions from the fiducial Eagle and TNG100 runs via a ratio, as in Fig. 6.Each power spectrum is taken at an epoch where average  HI ≈ 0.6.The higher 21cm power spectrum at ≈ 0.8 Mpc −1 of Eagle is partially explained by its older, redder, and more metal rich stellar populations.Thinner curves denote scales where the predictions are affected by resolution or box size.accrue more metals through successive stellar generations.For low and intermediate mass haloes ( halo ≲ 10 10 M ⊙ ), there are significant differences in the stellar metallicities.Illustris galaxies are more metal poor than in TNG100 and Eagle.However, the majority of stars in this galaxy mass range are metal poor ( absolute < 10 −3 ), falling below the minimum metallicities for which our stellar evolution model makes ionizing luminosity predictions.These differences therefore do not affect the ionizing emissivities of galaxies.At the same time, the metallicities of the higher mass haloes are fairly similar.Overall, this confirms that the differences in specific ionizing luminosity are mainly driven by differences in stellar population ages, as a result of differing stellar feedback models.

The impact of different stellar populations
To further investigate the impact of the stellar ages and metallicities for the 21cm power spectrum, we perform the following test.We make a variant of the fiducial model and CIFOG run for Eagle, where the ionizing luminosity of the stellar particles is scaled as a function of galaxy halo mass, so that the average ionizing luminosity per stellar mass is the same as in TNG100.In other words, we compensate for the luminosity-weighted effects of the differences in stellar population ages and metallicities between TNG100 and Eagle.
In Fig. 8, we compare this "TNG100 populations" run to the fiducial TNG100 and Eagle predictions.We find that the higher power bump in the power spectrum predicted by Eagle around ≈ 0.8 Mpc −1 is roughly half as strong in the "TNG100 populations" test.At the same time, it has roughly the same power as TNG100 at the smallest scales, whereas elsewhere it is interchangeable with the fiducial −0.04 Mpc −1 ) in TNG100, TNG100-NR, Illustris, and Eagle.We also show relevant upper limits from observations (black markers) and other hydrodynamical simulation-based predictions from the literature (black lines).The predictions of Reis et al. (2020) are represented roughly as a grey polygon.Simulation-based predictions lie within roughly 1 dex, but differ up to factors of several, depending on scale.
Eagle run.This confirms that scale-specific differences among our 21cm power spectrum predictions arise not only from differences in the stellar mass available to produce ionizing photons, but also in the typical ages (and metallicities) of the stellar populations in the different simulations.It also further shows that differences between the Eagle and TNG100 fiducial 21cm power spectra at ≈0.8 Mpc −1 arise from underlying choices in the baryon and stellar feedback models.

Comparison to previous work
Figure 9 shows the redshift evolution of the 21cm power spectrum,  2 b Δ 2 21 , at large scales predicted in our fiducial modeling by Illustris, TNG100, TNG100-NR and Eagle.Despite significant differences among the simulations and models, the large scale evolution of the 21cm power spectrum is similar across the board at  > 6.As time progresses there is a gentle rise caused by the formation of large ionized regions increasing the large scale variance of the 21cm signal.After  ≈ 5.7 the curves drop as the volumes become predominantly ionized.The large-scale 21cm signal from Eagle is slightly lower than in the other simulations.In Eagle and TNG100 reionization also occurs earlier than in Illustris and TNG100-NR.As our predictions lie at least 2 dex below the most constraining observational upper limits, the differences are currently too small to differentiate against.
Other results that use hydrodynamical simulations, either via postprocessing or by using fully-coupled RHD, find similar evolution of the large-scale 21cm power spectrum.In fact, all of these predictions can be confined in a area occupying less that one dex in 21cm power.This is striking.First, such simulations all adopt different implementations of baryonic and galactic astrophysics, and yet converge on what is to be expected on large spatial scales for the 21cm clustering.Second, this is not the case for other types of modeling, such as the semi-analytic works of Reis et al. (2020), who find a much greater diversity in large scale power, spanning close to 6 dex.In fact, the extremely high and low predicted large-scale 21cm power spectra arise when considering very late heating and very early reionization of the IGM (respectively), which are disfavoured if taking into account other constraints on the IGM and reionization, e.g. the electron scattering depth, and the optical depth in the Lyman- forest.In our case, we avoid early reionization scenarios by calibrating our source models for late reionization (as suggested by recent data e.g.Kulkarni et al. 2019a;Bosman et al. 2021) and adopt similar heating histories through our choices of UVB.

Implications
In this work we have quantified the differences among the predictions from state-of-the-art cosmological hydrodynamical simulations of galaxies.A general result arises: across all scales and considered epochs, our predictions for the global 21cm signal and the 21cm power spectra are within a factor of a few, even without accounting for the different reionization histories of the simulations.This is all the more impressive, as we have highlighted substantial differences amongst the galaxy populations and drivers of reionization in the different simulation volumes.When one considers the huge diversity of predictions from SAMs, that span many orders (e.g.Reis et al. 2020), our results appear to imply that many of these models lie outside of possible outcomes from current cosmological hydrodynamical simulations of galaxy formation.However, a large fraction of this diversity stems from different assumptions regarding the reionization and heating history of the Universe.Namely, our results are arguably in strong disagreement with models that heat up late, and/or reionize early.In addition, no observations at these high redshifts have been used to constrain the underlying galaxy-formation and feedback models adopted e.g. in Illustris, TNG and Eagle.
The largest differences among the simulations occur, not surprisingly, at small spatial scales (i.e. for  ≥10 Mpc −1 ) where the strength of supernova feedback models imprints a signature into the gas.At these scales, the signature of the escape fraction model (see App. A1) in the 21cm power is significantly smaller.As a result, future upper limits on the 21cm power spectrum at these scales can constrain high-redshift feedback models.Such effects are challenging to account for in the many analytical or semi-analytical works in the literature.In our approach, the resulting predicted differences in the 21cm signals are significant (up to a factor of two).However, they are also relatively modest compared to the large range of theoretical predictions.Thus, a significant leap in observations is needed to directly constrain these differences (e.g.Abdurashidova et al. 2022b;Koopmans et al. 2015;Kolopanis et al. 2022;Trott et al. 2021).Furthermore, at such small scales, it is more challenging to impose upper limits on  2 b Δ 2 21 (due to larger thermal noise at small scales; e.g.Abdurashidova et al. 2022a), limiting the feasibility of such a constraint in the near future.
We also identify an excess of 21cm power near  ≈ 0.8 Mpc −1 in Eagle with respect to the other simulations, due to topological differences in the  HI field.This arises from a large difference in the ionizing photon budget of haloes.In Eagle, most ionizing photons are produced in ≈10 9 M ⊙ haloes at  = 7, in much greater proportion than in Illustris or TNG.The result is differences in the typical ionized bubble sizes.We attribute these to feedback model choices, as Eagle galaxies have lower baryon fractions, and older stellar populations than TNG100 (the closest comparable simulation).Indeed the more efficient feedback in massive Eagle galaxies makes their stellar populations older, redder, and less ionizing.In fact, this latter manifestation of feedback appears to be the main driver of the  ≈ 0.8 Mpc −1 bump in 21cm power in Eagle.Although the relative difference in the resulting power spectra between the simulations is small (at most 1.5 times greater in Eagle), these scales are easier to observe.However, as we show in Appendix A1, changes to the escape fraction (and its dependence on halo mass) can also manifest similarly in the 21cm power.This complicates any interpretation of the 21cm signal at these scales.Indeed, the ionizing luminosities of galaxies are the product of stellar mass, ionizing efficiency, and escape fraction.Thus, a low escape fraction could be misconstrued as less efficient star formation, or stellar populations with low ionizing efficiencies.Fitting observations with a model that parameterizes all three of these aspects would therefore be challenging, as they are highly degenerate.At the same time, this highlights the potential importance of accounting for these ionizing efficiency effects: they may be key in providing more accurate ionizing escape fraction estimations from 21cm observations.

Limitations of the adopted methodology
Illustris, TNG100, and Eagle all have volumes of roughly 100 3 Mpc 3 .Although these are large volumes for galaxy formation simulations, it has been shown (Iliev et al. 2014) that volumes of at least ≈ 200 3 Mpc 3 are required to achieve convergence of ionized region sizes, and large-scale 21cm power spectrum statistics.On top of this, such volumes would allow us to predict measurements at larger scales, where upper limits on the 21cm power spectrum are more constraining.Although these larger volumes would be desirable for our study, we highlight the need to compromise between volume and resolution in order to be able to model galaxies realistically, in order to accurately capture the impacts of feedback.
At the same time, our works relies on post-processing the simulation-predicted fields, such as gas density and gas temperature, to determine the ionized fraction of hydrogen gas.This does not allow us to account for the reaction of the gas to photo-ionization and photo-heating in a self-consistent way, which has been shown to have an important impact on the IGM (D'Aloisio et al. 2020), and on the star formation of galaxies (Dawoodbhoy et al. 2018;Wu et al. 2019;Borrow et al. 2022).Our approach does not rely on solving the radiative transfer equations, but uses an approximate description using excursion set formalism with CIFOG (Hutter 2018).This method does not explicitly conserve the number of photons, leading to excessive ionization.Moreover, we do not track the evolving temperature of the gas, leading to inconsistencies.In particular,  b in neutral regions could be artificially low, due to heating that only occurs once the region is ionized (since the emission of the ionized regions is ≈ 0, the reverse situation is not problematic).In the future, full RT post-processing will enable additional physical effects to be captured, at the cost of additional computational expense.
Finally, in this work, we have focused exclusively on the ionizing radiation from stellar sources, which are thought to be the dominant drivers of reionization.As such, we do not model any ionizing contribution from AGN, although ionizing photons from AGN are accounted for in the simulation UVBs.However, they may play a role, especially in later reionization scenarios, when a higher density of luminous AGN are present.
We have also limited ourselves to times when the IGM has already been heated by a UV and X-ray background ( ≲ 8).The prior-heating stage has a profound impact on the 21cm signal (e.g.Pritchard & Furlanetto 2007;Fialkov et al. 2014;Ross et al. 2017), and extending to such times would require explicitly implementing and exploring separate X-ray source models, for AGN and X-ray binaries, for reionization and patchy (i.e.source driven) heating.This would imply handling X-ray ionization independently with their own source field and ionization cross sections, complicating our approach (for instance, see Sartorio et al. 2023, who study impact of X-ray binaries).Additionally, when moving to higher redshifts, the masses of the galaxies producing the most of the UV and X-ray emission decrease, a resolution challenge for the simulations considered here.Expanding to higher redshifts, when the heating of the IGM is not yet complete, may reveal other differences among the simulations.
Finally, in our current approach, we have also refrained from computing full light cones (as in Chapman & Santos 2019): our approach ignores scattering events, breaks down for large optical depths, and does not account for the width of the 21cm line.Further, our treatment of the gas peculiar velocity gradient remains approximate.That being said, we show that the velocity effects are comparatively small, and do not differ greatly between simulations, meaning that our conclusions regarding the velocities are unlikely to change considerably with a more realistic treatment of the velocities.Computing full light cones would also allow the creation of mock observations in a more straightforward manner.Full forward modeling including foreground effects would also allow stronger conclusions about the true feasibility of observing the features we discuss in this paper.For now, we reserve these steps for future studies.

SUMMARY AND CONCLUSIONS
In this work we have taken advantage of the rich baryonic physics of large cosmological hydrodynamical simulations of galaxies to predict the emission from the 21cm line of neutral hydrogen in the post-heating era ( ≈ 6 − 8).In particular, we have post-processed the outcome of three simulations -Illustris, IllustrisTNG (specifically TNG100), and Eagle in order to compare and contrast their predictions for 21cm observables.Our key results are as follows: • The predicted global 21cm signal and 21cm power spectra of Illustris, IllustrisTNG, and Eagle are similar, with differences up to a factor of a few at most.These differences are of the same order as the uncertainties in the post-processing modeling, even on small scales ( ≥ 5 Mpc −1 ).
• Variations between the predicted power spectra of the 21cm brightness temperature at  ≲ 8 are mainly driven by large-scale differences in the progress of reionization.
• On the smallest spatial scales,  ≥ 10 Mpc −1 , the three simulations have more significant differences, due to their different models for baryonic and feedback physics.Illustris predicts significantly more power than TNG or Eagle.This occurs due to stellar feedback differences, particularly the higher galactic wind velocities of TNG versus Illustris.The resulting outflows impact the gas densities and velocities around galaxies.At the same scales, the TNG100 and Eagle 21cm power spectra are similar.
• On intermediate spatial scales,  ≈ 0.8 Mpc −1 , we find that key aspects of the sources of ionizing radiation impart features.Eagle predicts a higher 21cm power spectrum, driven by more power in the  HI field.Compared to the other simulations, Eagle galaxies in haloes of ≈10 9 M ⊙ produce a larger fraction (≈ 50 % at  = 7) of the total ionizing radiation, influencing the topology of reionization.This is partly due to the stellar feedback in Eagle, which has overall stronger impacts than in TNG in the most massive haloes, leading to lower star formation and older, redder stellar populations.These 21cm power spectrum features are similar to those imprinted by different choices of the escape fraction.
• As a result, future measurements of the 21cm signal, in addition to other reionization observables, could constrain the details of feedback and the astrophysics of galaxy formation, at small scales (≥10 Mpc −1 ).At larger scales (≈ 0.8 Mpc −1 ), the signature of feedback is degenerate with the escape fraction.For a given cosmology, data therefore jointly constrains the source model, feedback, and other galactic astrophysics at these scales.
Our findings suggest that current state-of-the-art cosmological hydrodynamical simulations of galaxies, coupled with post-processing modeling, are viable and compelling tools to study the 21cm signal at large scales during the epoch of reionization.In fact, the impact of stellar feedback on the ages and metallicities, and thus ionizing emissivities, of stellar populations is often omitted in the modeling of the 21cm signal, which instead, for a given predicted 21cm signal, we find to be largely degenerate with the ionizing escape fraction model.Our findings therefore highlight the need for properly accounting for these effects in order to e.g.constrain the ionizing escape fraction from the current observed limits on 21cm power.On the other hand, future studies with fully coupled radiative transfer, as well as higher resolution hydrodynamical simulations with resolved interstellar medium and supernova feedback physics, will enable more quantitative predictions for the astrophysics within the 21cm signal.

DATA AVAILABILITY
Data directly related to this publication and its figures will be made available on request from the corresponding author.The Eagle, Illustris and TNG simulations are publicly available (Nelson et al. 2015;McAlpine et al. 2016;Nelson et al. 2019b).Illustris and TNG are accessible in their entirety at www.tng-project.org/data.
• Finally we also consider a model in which sources associated with the central galaxy of detected dark matter haloes are the only contributors of photons towards reionization.We assume that the photon escape fraction for these galaxies decreases with increasing host halo dark matter mass as reported by some recent simulations (see, for instance, Lewis et al. 2020;Rosdahl et al. 2022).Our model sets  esc for central galaxies in haloes of  h ≤ 5 × 10 9  ⊙ to 0.3.Beyond this threshold,  esc decreases with increasing halo mass according to a power law of exponent −0.5.This is the 'halo  esc ' model or  esc ( h ).
In each of these cases, we calibrated the different  esc models so as to yield a similar (within ≈ 50%) cumulative total emission of ionizing photons from  = 15 till  = 5.5 in TNG100.We then test these configurations by running them with CIFOG on TNG100 to ensure that their reionization redshift is similar.Finally, we apply the calibrated models to all the studied simulations.
Figure A1 shows the ratio of Δ 2 21 for the different considered  esc models in TNG, TNG100-NR, Illustris, and Eagle with respect to the fiducial choice (  20% esc ).Overall, the choice of source model affects the final 21cm power spectrum by roughly 40% or less, with the most dramatic changes occurring at larges scales.The amplitude of these variations is similar to the large scale differences we observe between simulations in Sec.2.3.
For the halo mass model (or  esc ( h ); dotted lines), we find moderate to large differences near  ≤ 1 Mpc −1 , with either a slight re-normalisation or a gentle decrease towards smaller scales.This is reminiscent of the results from the "TNG populations" (Fig. 8), and "Illustris sources" (Fig. 6).Indeed, by introducing the right halo mass dependence to the escape of ionizing photons, we can mimic the source field of a different simulation that has a different stellar mass to halo mass relation.
For the evolving model (or  esc (); dashed lines), and for  HI > 0.30, we find little difference at  = 1 Mpc −1 , with all simulations showing differences at larger and smaller scales.Whereas when  HI = 0.30, the evolving model is close to the constant model in TNG and Illustris (barring some re-normalisation).The scale dependent changes seen in the evolving model with respect to the constant model may seem surprising, as at fixed redshift, we do not alter the escaping luminosities amongst galaxies, but apply a common escape fraction to all sources.However, the driving sources of reioinzation evolve over time.Thus, by introducing a time evolution in  esc we can favour the time integrated contribution of specific halo populations, affecting the sizes and distributions of ionized regions.By  HI = 0.30, these inherited differences wash out, explaining the shift with respect to the constant model.

A2 Brightness temperature model variants
Throughout this work, we consider several model variants, in addition to our fiducial model: -model A: Simplest model, assumes a null line of sight velocity gradient, and  CMB ≈  s .
-model B: Model A, but  CMB ≠  s , and  s ≈  gas .
-fiducial: As described in 2. Model B, but accounting for the LoS velocity gradient from the gas (Note that for TNG100-NR we assume  CMB ≈  s ).results, that do not include any velocity effects (redshift space distortion along the LoS), are very similar to each other, but substantially different to the fiducial model that does account for these effects.
Accounting for velocity effects incurs up to a 20% reduction in power (near ≈10 Mpc −1 ) for all simulations, with the scale at which the maximum difference occurs being slightly different between simulations.At large scales ( ≤1 Mpc −1 ) and when average  HI < 0.6, velocity effects increase power in Eagle.As reionization progresses, velocities have a much greater large-scale effect in Eagle, increasing  2 b Δ 2 21 by up to ≈ 35% at the largest scales.Since the model A and B curves are indistinguishable, there are no significant effects from the temperature of the gas.At large scales, this is understandable as we purposefully place ourselves at times when the IGM has already been heated (and the temperature term in Eq. 2 is ≈ 0) However, at small scales this could appear surprising.Indeed, different stellar feedback prescriptions alter the density and velocities of the gas at such scales, therefore one could have reasonably expected to see differences arising from the hot gas expelled by supernova and AGN feedback.In fact, for there to be an appreciable difference, one would require the differences in hot winds to significantly alter the term 1 −  gas / CMB (i.e. for the unheated gas to be cool or close to the temperature of the CMB), and this may only occur at very small scales which are smoothed over in our predictions.

A3 Resolution and convergence
Figure A3 shows the ratio of  2 b Δ 2 21 at ⟨ HI ⟩ = 0.85, 0.50, 0.38 and at two different resolutions (256 3 and 512 3 ) with respect to the full resolution of 1024 3 cells (full curves).For each resolution, we perform the entire post-processing pipeline as described in Sec. 2. For all average  HI , the two highest resolution results agree to within 10% for  ≲ 5.0 Mpc −1 (accounting for noise and aliasing effects at the largest scales).At smaller scales (larger k), where the  2 b Δ 2 21 is dominated by the the gas density terms, the 512 3 and 1024 3 are different by up to 40% (at the smallest resolved scales).This makes intuitive sense, as lower resolution smooths the small scales and suppresses the variance in the gas density terms of  2 b Δ 2 21 , leading to lower power.Thus, were the grids higher resolution, one could expect higher small scale  2 b Δ 2 21 predictions.Seemingly, larger more resolved grids would be required (e.g. at least 4096 3 ) for our predictions to be converged at ≥10 Mpc −1 .This is confirmed by examining the dashed curves that show the power contribution of the hydrogen density.Indeed from  ≲ 5.0 Mpc −1 , there are > 20% differences in the hydrogen density power, explaining all the small scale resolution effects.The large scale differences in power arise because the 256 3 CIFOG output is at such a different redshift, and a different TNG100 snapshot was used to compute the power spectra.The remaining large scale discrepancies can be explained by differing ionization power, which is sensitive to changes in the average ionization of the various CIFOG outputs used for this comparison.

Figure 1 .
Figure 1.Evolution of the absolute differential brightness temperature of the 21cm line, for the TNG100, Illustris and Eagle simulations (top to bottom), from  = 8 (left) to  = 6 (right).All are computed with our fiducial post-processing model.Hydrogen becomes progressively more ionized in a spatially localised and heterogeneous manner, reflecting the distribution of ionizing photon sources.This causes the brightness temperature values (  b ) to decrease in correspondingly overdense regions.Within ionized regions, the remaining neutral gas in self-shielded areas emits a weak signal.Despite Illustris and TNG100 sharing the same initial conditions, their maps differ, showcasing the impact of the galaxy formation physics and feedback models in determining  b .

Figure 2 .
Figure2.Left: Time evolution of the mean differential brightness temperature of the 21cm line for TNG100 (blue), Illustris (red), and Eagle (green).Diamonds show our fiducial post-processing model, whereas the shaded areas encompass all our explored models.Also shown are the analytical prediction ofMirocha et al. (2017) in the saturated case, and the predicted mean signal from the full radiation and hydrodynamics simulation ofGillet et al. (2021).Middle: Average brightness temperature as a function of average  HI (used as a proxy for the progress of reionization; right).Right: Average  HI as a function of redshift in the fiducial model.The three simulations predict very similar  b at fixed  HI , whose evolution is largely set by the decrease of average  HI during reionization.In this figure, the results from other works appear truncated so as to represent a similar redshift range as in the left hand panel.

Figure 3 .
Figure3.From left to right and top to bottom: evolution of the power spectrum of the 21cm brightness temperature,  2 b Δ 2 21 , according to TNG100, Illustris, Eagle, and TNG100-NR in our fiducial post-processing model.The colours denote the progression of the power spectra from  = 8 (light) to  ≤ 5.7 (dark).The associated colour bars also show the average neutral fraction at each corresponding redshift.Curves are thinner on scales that are compromised by aliasing or box size effects.For all simulations, the 21cm power spectrum first increases with time at large spatial scales; then, as a greater fraction of the volume is ionized, it decreases homogeneously over all scales.

Figure 4 .
Figure 4. Direct comparison of the power spectra of the 21cm brightness temperature,  2b Δ 2 21 , from TNG100, Illustris, Eagle, and TNG100-NR across time.Coloured curves show the fiducial results, whereas the shaded areas indicate the spread of results using different post-processing models for  b (Section 2.4.4 and Appendix A2).Thinner curves denote scales where the predictions are affected by resolution or box size.Top row: power spectra for all four simulations at fixed redshifts.Middle row: power spectra grouped so that the average neutral fraction ( HI ) in every panel is similar.Bottom row: Same as the middle row, but normalising all power spectra by the TNG100 result to quantify the level of (dis)agreement.

Figure 5 .
Figure 5. Normalized 21cm power spectrum, Δ 221 , in our fiducial post-processing model and its contributing terms (as detailed in Eq. 2), for all simulations.To simplify this plot, we do not show any constituent cross terms of Δ 2 21 , which can however contribute significantly at some scales and ionization levels (e.g.Georgiev et al. 2022).We have grouped the simulations at redshifts when their average ionization fractions are similar.Solid thick curves show the fiducial results.Dashed curves denote the approximate contribution to Δ 2 21 from the gas density term, whereas dotted and solid thin curves represent the  HI term and the approximate contribution of the redshift-space velocity distortions (velocity term), respectively.

Figure 7 .
Figure 7.Average properties of the central galaxies in TNG100, Illustris, and Eagle as a function of halo mass, at  = 7. Thinner curves show the means in bins with fewer than two galaxies (in these bins, the individual galaxies are also shown).The areas around curves shows the central 90 % result obtained from bootstrapping the sample of galaxies.Starting from the top left: Panel A: Galaxy stellar mass -The vertical coloured lines show the minimum possible mass of a halo resolved by 100 dark matter particles in each simulation.Panel B: Ionizing photon luminosity from the stellar particles of central galaxies.Panel C: Fraction of total stellar mass in the central galaxy haloes.Panel D: Fraction of total ionizing luminosity from central galaxies.Panel E: Baryon, gas, and stellar mass fractions to total mass.Panel F: Ionizing photon luminosity per stellar mass -The black triangular marker shows the ionizing luminosity in our modeling of a young (1Myr), low metallicity ( absolute = 10 −3 ) stellar population of mass 10 6 M ⊙ .Panel G: Mass-weighted stellar ages, i.e. average galaxy-wide stellar ages weighted by the stellar-particle masses.Panel H: Mass-weighted stellar metallicity.

Figure 9 .
Figure 9. Redshift evolution of the 21cm power spectrum at large scales (0.2 +0.05 −0.04 Mpc −1 ) in TNG100, TNG100-NR, Illustris, and Eagle.We also show relevant upper limits from observations (black markers) and other hydrodynamical simulation-based predictions from the literature (black lines).The predictions ofReis et al. (2020) are represented roughly as a grey polygon.Simulation-based predictions lie within roughly 1 dex, but differ up to factors of several, depending on scale.

FigureFigure A1 .
FigureA2shows the ratio of  2 b Δ 2 21 for the different models in TNG,TNG100-NR, Illustris, and Eagle with respect to the fiducial model of each simulation.Each panel shows the results from the simulations when average  HI was similar.The model A and model B

Figure A2 .
Figure A2.Different  b models and their relative impact on the  2 b Δ 2 21 with respect to our fiducial model.Including redshift space distortion effects (the difference between model B and the fiducial model), has a substantial effect on the PS at large scales in Eagle and ≈1 Mpc −1 scales in all the simulations.

Figure A3 .
Figure A3.Comparison between 21cm brightness temperature, density, and ionization power spectra from the TNG100 simulation.In each panel, power is computed at redshifts where the average ionized fraction is similar.using three different underlying Eulerian grids of a different resolutions.The results are given in fractions of the 1024 3 results.

Table 1 .
Summary table of simulation characteristics, key differences in physical model assumptions and methods, and references for the important physical model aspects.