21cmEMU: an emulator of 21cmFAST summary observables

Recent years have witnessed rapid progress in observations of the Epoch of Reionization (EoR). These have enabled high-dimensional inference of galaxy and intergalactic medium (IGM) properties during the first billion years of our Universe. However, even using efficient, semi-numerical simulations, traditional inference approaches that compute 3D lightcones on-the-fly can take $10^5$ core hours. Here we present 21cmEMU: an emulator of several summary observables from the popular 21cmFAST simulation code. 21cmEMU takes as input nine parameters characterizing EoR galaxies, and outputs the following summary statistics: (i) the IGM mean neutral fraction; (ii) the 21-cm power spectrum; (iii) the mean 21-cm spin temperature; (iv) the sky-averaged (global) 21-cm signal; (vi) the ultraviolet (UV) luminosity functions (LFs); and (vii) the Thomson scattering optical depth to the cosmic microwave background (CMB). All observables are predicted with sub-percent median accuracy, with a reduction of the computational cost by a factor of over 10$^4$. After validating inference results, we showcase a few applications, including: (i) quantifying the relative constraining power of different observational datasets; (ii) seeing how recent claims of a late EoR impact previous inferences; and (iii) forecasting upcoming constraints from the sixth observing season of the Hydrogen Epoch of Reionization Array (HERA) telescope. 21cmEMU is publicly-available, and is included as an alternative simulator in the public 21CMMC sampler.

A popular alternative approach is to use emulators (e.g.Kern et al. 2017;Schmit & Pritchard 2017;Shimabukuro & Semelin 2017;Jennings et al. 2019;Ghara et al. 2020;Mondal et al. 2022;Bye et al. 2022a;Lazare et al. 2023).Once trained on a set of simulation outputs, an emulator can replace the expensive, on-the-fly simulation step in Bayesian inference: a single likelihood evaluation taking ∼ 0.1 s instead of ∼ 1 hour.As such, the computational cost is amortized, requiring only the initial database of simulations in order to perform subsequent, inexpensive inferences 1 .Of course, such amortized inference is restricted to the theoretical model that is used to train the emulator.Moreover, there is also the additional emulator error to account for, which can be non-negligible for high precision measurements and in corners of parameter space that are poorly sampled (e.g.Kern et al. 2017, Appendix B of HERA22).Nevertheless, emulators allow us to rapidly perform many inferences of the same model, testing the impact on the posterior of different likelihood choices, priors, and new data.Moreover, the emulator error is subdominant compared with current, relatively low S/N observations, such as the 21-cm power spectrum upper limits.
Here we present 21cmEMU 2 -a public emulator of several summary outputs from the semi-numerical code 21cmFAST 3 .These include (i) the volume-averaged hydrogen neutral fraction; (ii) the 21-cm power spectrum; (iii) the global 21cm brightness temperature; (iv) the neutral IGM spin temperature; (v) the ultraviolet (UV) luminosity functions (LFs); (vi) the Thomson scattering optical depth of CMB photons.Our emulator was trained on summary observables from the withHERA inference in HERA22, which sampled nine astrophysical parameters that characterize galaxy properties.As a result, our work presents a few important improvements over previous emulators.The unprecedented number of summary outputs allows us to include complementary, multi-wavelength probes of high- galaxies and the EoR when computing the likelihood.Moreover, our physically-motivated galaxy parametrization (Park et al. 2019) allows us to easily motivate different choices of priors.We will periodically update 21cmEMU to include new summary outputs and astrophysical models.
We showcase our emulator by re-analysing the HERA power spectrum upper limits published in HERA22.We also perform inferences including various combinations of the data, illustrating the constraining power of each probe on the posterior.One call of 21cmEMU takes ∼ 0.1 s (compared to ∼ 1 hour for 21cmFAST), with a typical inference finishing in a few hours.
This paper is organized as follows.In Section 2, we introduce the data used to train the emulator.In Section 3, we introduce the network and discuss its architecture, training procedure, and performance.In Section 4, we showcase applications of the emulator to EoR/CD inference problems.We conclude in Section 5. We assume a flat ΛCDM cosmology, with (Ω Λ , Ω  , Ω  , ℎ,  8 ,   ) = (0.69, 0.31, 0.049, 0.68, 0.82, 0.97), consistent with results from (Planck Collaboration et al. 2020).Unless stated otherwise, all lengths are in comoving units.
1 Another form of amortized inference is to train neural density estimators to fit the likelihood or likelihood / evidence ratio using simulated data (e.g.Papamakarios et al. 2018;Alsing et al. 2018;Cole et al. 2022).This is referred to as simulation-based inference (SBI), and has the additional benefit of not having to specify an explicit functional form for the likelihood.SBI has recently been applied to mock 21cm observations (Zhao et al. 2022a,b; 2 SIMULATED DATASET Our datasets for training and testing are taken from the withHERA inference from HERA22, using an increased number of livepoints (18k).This inference used the Multinest (Feroz et al. 2009) sampler in 21cmMC4 (Greig & Mesinger 2015, 2017, 2018) with a flat prior on all astrophysical parameters within the ranges shown in all of the corner plots (e.g. Figure 6).The likelihood was determined by current observations of the EoR history, galaxy luminosity functions and 21cm upper limits (discussed in detail in Section 4.1).
We use all of the Multinest outputs, including both accepted and rejected samples, resulting in 1.8M parameter samples.Of these, we randomly select 1.28M for training, 183k for validation and 330k for testing.The database is standardized (subtract the mean and divide by the standard deviation of each summary statistic) before being passed into the network for training.
Our datasets are generated with the public 21cmFAST v3 code (Mesinger & Furlanetto 2007;Mesinger et al. 2011;Murray et al. 2020).21cmFAST is a semi-numerical simulation code that operates under the assumption that dark matter halos host galaxies which source inhomogeneous, large-scale cosmic radiation fields.Matter density and velocity fields are generated using second-order Lagrangian perturbation theory (e.g.Scoccimarro 1998).Galaxy properties are assigned to dark matter halo fields using empirical scaling relations, following the parametrization in Park et al. (2019).The ionizing, X-ray and soft UV cosmic radiation fields sourced by these galaxies are computed with a combination of excursion set and direct integration along the lightcone.The ionization and thermal state of the IGM gas are then tracked with a set of coupled differential equations, allowing us to compute the various observables discussed below.For further details on the simulation code, the interested reader is directed to (Mesinger & Furlanetto 2007;Mesinger et al. 2011;Murray et al. 2020).Below we summarize the astrophysical parameters used as input to 21cmEMU, and the summary observables that are the corresponding output.

Galaxy model and astrophysical parameters
The input consists of nine parameters that characterize bulk galaxy properties.Two parameters (  * ,10 ,  * ) describe the stellar-to-halo mass relation (SHMR), which is a power-law for the faint galaxies (hosted by  ℎ ≲ 10 12  ⊙ halos) that dominate the cosmic radiation fields at  > 5 (e.g.Kuhlen & Faucher-Giguère 2012;Dayal et al. 2014;Behroozi & Silk 2015;Mutch et al. 2016;Sun & Furlanetto 2016;Yue et al. 2016): Here Ω  is the universal baryon energy density (as a fraction of the critical energy density), Ω  is the total matter (i.e.cold dark matter and baryon) energy density, and  * ≡  * ,10 is the stellar fraction, with  * ,10 corresponding to the fraction of galactic gas in stars normalized to the amount in a halo of mass  10 ≡ 10 10  ⊙ , and  * the power-law index.
Star formation is assumed to occur on a time-scale that goes with the Hubble time,  −1 (), (or analogously the dynamical time, which also scales with the Hubble time during matter domination): . (2) The characteristic star formation timescale,  * ∈ [0, 1], is another free parameter.
Star formation is suppressed in small mass halos due to inefficient gas cooling and/or feedback (e.g.Hui & Gnedin 1997;Springel & Hernquist 2003;Okamoto et al. 2008;Sobacchi & Mesinger 2013;Xu et al. 2016;Ocvirk et al. 2020;Ma et al. 2020).We account for this suppression by assuming only a fraction exp (− turn / ℎ ) of halos host active star forming galaxies.The characteristic halo mass scale below which the abundance of galaxies is exponentially suppressed,  turn , is another free parameter.
The specific X-ray luminosity escaping the galaxies is also taken to be a power-law in energy (e.g.Das et al. 2017),   ∝  −   , with the index   left as a free parameter.The luminosity is normalized via the soft band X-ray luminosity per unit SFR, another free parameter: where  0 , the last input parameter, is the minimum energy of X-ray photons capable of escaping their host galaxy.In summary, the nine input parameters are: (i)  * ,10 : normalization of the SHMR, defined at  ℎ =  10 .
(iv)  esc : power-law index of the ionizing UV escape fraction to halo mass relation.
(v)  * : characteristic star formation timescale, defined as a fraction of the Hubble time.
(vi)  turn /M ⊙ : characteristic mass below which halos become exponentially less likely to host an active star forming galaxy. (vii) : soft-band X-ray luminosity per unit SFR escaping the galaxies.
(viii)  0 /keV: minimum X-ray energy that can escape the galaxies.
(ix)  X : power-law index of the X-ray spectral energy distribution (SED).This simple parametrization is easy to interpret physically and is consistent with observations of the UV LFs as well as the scaling relations found in galaxy simulations and semi-analytic models.

Observational summaries
For a given set of cosmological and astrophysical parameters, 21cmFAST calculates the corresponding 3D lightcones of IGM properties.When performing inference, these lightcones are generally compressed into summary statistics that are compared directly with observations.Here we do not attempt to directly emulate the 3D lightcones of the various cosmological quantities, and instead only emulate the following summary observables (motivated by existing EoR/CD observations discussed in Section 4.1): (i)  HI () -the volume-averaged neutral fraction of hydrogen and helium as a function of redshift (also commonly referred to as the EoR history).
(ii)  b () -the volume-averaged (global) 21cm brightness temperature (e.g.Madau et al. 1997;Furlanetto 2006;Pritchard & Loeb 2012): , where  21 is the 21cm optical depth of the intervening gas,   ≡ / ρ − 1 is the baryon overdensity, with  being the baryon density, and  S and  R are the spin and background temperatures, respectively.We assume throughout that the radio background is provided by the CMB,  R =  CMB is the temperature of the CMB.We note that 21cmFAST computes the brightness temperature at each cell location, x, using the exact expression in the first line of the equation above; the second line is a Taylor expansion in the limit of  21 ≪ 1 that provides physical intuition.
(iii)  S () -the mean spin temperature of the neutral IGM as a function of redshift.The IGM spin temperature is only defined for neutral hydrogen that is outside of the cosmic HII regions that surround galaxies.Specifically, the volume average is performed over those cells in the simulation box with  HI ≥ 95 %.
(v) ( 1500 , ) -the non-ionizing UV luminosity function (UV LF), defined as the number density of galaxies per UV magnitude,  1500 , as a function of redshift.The ∼1500 Å rest frame luminosity is calculated from the SFR:  * ( ℎ , ) = K UV ×  UV , where K UV = 1.15 • 10 −28 M ⊙ yr −1 Hz s erg −1 assumes a Salpeter initial mass function (e.g.Madau & Dickinson 2014;Sun & Furlanetto 2016).The UV luminosity is related to the AB magnitude using (Oke & Gunn 1983): log      , where   is the Thompson scattering cross section and   is the electron number density calculated assuming hydrogen and helium are singly ionized at a fraction (1 −  Hi ) and that helium is doubly ionized at  < 3.
Although the last two quantities are computed analytically by 21cmFAST and are therefore reasonably fast, we still emulate them for two reasons.The first is to provide users of 21cmEMU with a standalone package.The second is that the analytic calculation is still slower than the emulator prediction time: emulation reduces the runtime from ∼ 1 s to ≲ 50 ms for a single parameter combination (≲ 1 ms per parameter set if in a large (≳ 100) batch), with a relatively low emulation error (see Section 3).
We use 84 redshift bins in the range  ∼ 5 − 35 for all summaries except the 21cm PS.For the 21cm PS we exclude high redshift bins that generally have a very weak signal, keeping 60 redshift bins spanning  ∼ 6 − 21, and 12  bins spanning  ∼ 0.04 − 1 Mpc −1 .

EMULATOR ARCHITECTURE AND PERFORMANCE
21cmEMU is implemented using Tensorflow (Abadi et al. 2015;Developers 2022), with an architecture consisting of (see diagram in Fig. 1): • one large block (8 layers with 1k nodes each) of fully-connected (dense) layers whose output is fed into all of the branches.
• one branch per summary observable.
Since the 21-cm power spectrum is a smooth function of wavemode and redshift (e.g.Fig. 3), it can be interpreted as a 2D image.Therefore we use a convolutional neural network (CNN) in the 21-cm PS branch and fully-connected layers in the other branches.The network trains on all of the summaries at once (i.e.multi-task learning), using a weighted sum of root mean squared error (RMSE) losses with one term per summary, where each observable has a different weight.We assign the largest weight to the 21-cm PS as it is higher dimensional with the largest dynamic range, and thus more difficult to learn.
In Figure 2, we show the total training and validation losses as a function of epoch in black and orange, respectively.We also show the learning rate schedule used during training with a dashed gray line.We see a smooth decline of the validation loss up to ∼ 100 epochs.Our final network is taken at the minimum of the validation loss, at epoch 150.The training takes about eleven GPU hours (∼3.5 min per epoch) with the full database (1.8M samples).
Below, we discuss the branch architecture and performance for each summary observable in turn, summarizing the results in Table 1.Throughout, we illustrate the emulator performance using examples from the test set, as well as the distributions of absolute differences (Abs Diff) and fractional errors (FE) over the entire test set.The latter two are defined for each observational summary, , as: where  true refers to the 21cmFAST direct simulation output and  pred is the corresponding 21cmEMU prediction.We compute the above averaged over different bins in  and/or different models in the test set, as described below.One drawback of the FE metric is that it can diverge to infinity as the denominator goes to zero.To avoid this, we use floors for the values of the denominator: log(Δ 2 21,floor ) = 0.1; T,floor = 5 mK, and xHI,floor = 10 −4 .The other summaries,  e ,  S , and UV LFs do not have a floor value.

The 21-cm power spectrum
The power spectrum branch consists of thirteen 2D convolution layers with wide (up to 7 redshift bins × 3 k bins) kernels and two upsampling layers that gradually build the (, ) PS image based on the output of the shared block, as seen in Figure 1.We use a pixelbased RMSE loss, weighted by the inverse of the estimated thermal noise corresponding to a 1000h SKA1-low observation (taken from Prelogović et al. 2022; for more details see Section 2.2.1 in that work).Weighting by the inverse of the noise forces the CNN to be more accurate in (, ) bins that are easier to observe: generally corresponding to lower redshifts and larger scales.
In Figure 3, we compare the emulator prediction for the 21-cm power spectrum with its corresponding target from 21cmFAST.We show a single sample from the test set, with the 21cmEMU prediction on the left and the 21cmFAST target in the middle panel.This sample was chosen as it has the closest median fractional error to that of the entire test set; thus it can be considered representative of the typical emulator performance.It is difficult to see a difference between the two PS with the naked eye.We the FE of this single sample in the rightmost panel.The FE is generally sub-percent, rising to ∼percent in regions of low power.
In these 2D images we clearly see the well-known trend of three peaks in the redshift evolution of the large-scale 21-cm PS and two peaks in the small-scale evolution (e.g.Pritchard & Furlanetto 2007).In general, the features evolve smoothly over (, ), showcasing why we use a CNN in the 21-cm PS branch of 21cmEMU.
We quantify the 21-cm PS prediction error in the top left panel of Figure 4.In the top sub-panel, we plot the redshift evolution of the PS amplitude at  = 0.1 Mpc −1 , with 21cmEMU predictions shown via dash-dotted lines and the corresponding 21cmFAST targets shown with solid lines.We chose to plot  = 0.1 Mpc −1 because the strongest constraints by current interferometers are around these scales; smaller scales are dominated by thermal noise and larger scales by foregrounds (e.g.Mertens et al. 2020;Trott et al. 2020;The HERA Collaboration et al. 2022a,b).The ten models plotted here were chosen at random from the test set.We again see that the differences between the emulator and "truth" are difficult to spot with the naked eye.
In the bottom sub-panel we show the absolute difference between each pair of curves in the top sub-panel, as well as the median absolute difference (dashed black line) and the 68% / 95% confidence limits (CL; dark / light gray) computed over the entire test set.We see that the median (68%) 21cmEMU absolute error at This translates to a median (68%) fractional error of 0.70% (1.0%)5 at this wavemode and 0.55% (2.4%) when averaged over all wavemodes.This is far below observational uncertainties in the near-term, thus justifying the use of an emulator.The error rises slightly at lower redshifts, owing to the broader distributions of possible PS, including very small values post reionization.In Appendix A, we show the evolution of the 21-cm power spectrum fractional error as a function of the input 9D astrophysical parameters.

The 21-cm global signal
The 21-cm global signal branch consists of seven fully-connected layers with 600 -1000 nodes each.We quantify the performance of 21cmEMU on the global signal in the top right panel of Figure 4. We show the redshift evolution of the global signal (top) and absolute differences (bottom) for the same ten random samples from the test set.
As for the 21-cm power spectra, the difference between the 21cmFAST calculation and 21cmEMU prediction is difficult to see with the naked eye and is generally ≲ 1 mK.We see from the bottom sub-panel that the 95% CL of the errors in the test set is also ≲ 1 mK.This translates to a median (68%) FE of 0.25% (0.82%).
We see from both the global signal and the PS that our training set spans a wide range of heating and ionization histories.This is due to the fact that we include both accepted and rejected livepoints of the HERA22 inference in the training set, in order to have the largest dataset possible.Extending beyond the ranges of the most likely models allows 21cmEMU to generalize beyond the HERA22 posterior distribution, accurately predicting even unlikely models that, e.g. have not reionized by  = 5.

The 21-cm spin temperature in the neutral IGM
The  S branch consists of five fully-connected layers with 400 nodes each.We quantify the network performance on the mean 21-cm spin temperature in the right panel of the middle row of Figure 4.In the top sub-panel, we show ten examples of the emulated spin temperature curve (dash-dotted line) and the corresponding true curves from the test set (solid line).In the bottom sub-panel of the plot, we show the absolute error for each of the ten examples, the median for the entire test set with the black dashed line, and the 68% / 95% CL regions in shaded in dark/pale grey as a function of redshift.We can see that the absolute difference is log T S,true /K − log T S,pred /K < 0.01 at 95% CL over most of the redshift range.The FE of the log of the mean spin temperature over the entire test set is 0.032 % and the 68% CL is 0.13 %.
We recall that the spin temperature is calculated by taking the global average over all cells in the simulation box that have  HI ≥ 95%.When there are no cells satisfying this condition, the spin temperature becomes undefined.We account for this by having the emulator predict the redshift at which the spin temperature becomes undefined. 6The emulator correctly predicts the exact redshift bin below which  S becomes undefined for 95.1% of the models in our test set, and is only one bin off (Δ ∼ 0.1) for 4.89% of the models.

The global history of reionization
The EoR history branch, like the spin temperature branch, consists of five fully-connected hidden layers with 500 nodes each.In the left panel of the middle row in Figure 4, we show the EoR histories of our ten parameter samples (top sub-panel), and the corresponding prediction error (bottom sub-panel).We see that the absolute differ- ences are ≲0.005 for 95% of the models in the test set.The FE is 0.0075% for the median and 0.095% at the 68% CL.

The CMB Thomson scattering optical depth
The Thomson scattering optical depth branch consists of three layers of 30 nodes each as it outputs only one number.We show the FE of the  e prediction in the lower right panel of Figure 4.The ten parameter samples are denoted with different color dots.Over the entire test set, we see a median fractional error of 0.1% and a 0.25% FE at 68% CL.There is a notable increase in the prediction error as well as its bin-to-bin variance toward higher values of  e .This is due to a small number of samples in this unlikely corner of parameter space: fewer than 1% of the models in the test set have  e > 0.11.

Galaxy UV luminosity functions
The LFs branch consists of five layers of 400 nodes each.The network outputs the LFs at four redshifts (z = 6, 7, 8, 10) and magnitude bins ranging from -20 to -10.In the lower left panel of Figure 4, we show the emulated and simulated LFs at  = 6 (the performance at the other redshifts is comparable).The hatched region denotes the range spanned by LF observations used in the inference in the following section.
We can see that the emulator is very accurate in the flat range spanned by the existing observations, while it is less accurate around the faint-end turnover.At all of the redshift bins, we have that the absolute difference log  true /Mpc −3 − log  pred /Mpc −3 < 0.1 over the majority of the magnitude range.
We provide an alternative setting in 21cmEMU that allows the user to skip the emulation and directly calculate the CMB optical depth and UV LFs using 21cmFAST.This improved accuracy however comes at the cost of a slower runtime: ∼ 700 ms per call compared with ≲ 50 ms using emulation.

Summary of 21cmEMU performance and context with other emulators
In Table 1, we summarize the performance of 21cmEMU for each summary in the first row, using the fiducial training set of 1.3M samples.In general, the median (68%) emulation fractional error is at the level of ≲ 0.5% (1%).The most accurate prediction is achieved with the EoR history, most likely due to the fact that it is a monotonic and smooth function, making it easier to learn.The least accurate summary is the power spectrum, which is understandable as it is two dimensional with the largest dynamic range.
It is difficult to directly compare the performance of 21cmEMU with other emulators of EoR/CD observables, due to their different astrophysical parametrizations and training set sizes.Nevertheless, at face value 21cmEMU's accuracy is better than achievable with state-ofthe-art emulators (e.g.Mondal et al. 2020;Bye et al. 2022b;Yoshiura et al. 2023).For example, comparing with the recent, bespoke 21-cm global signal emulator 21cmVAE (Bye et al. 2022b), we obtain a factor of 2.2 (1.5) lower median (95th percentile) RMS error (see their eq.1).Our median 21-cm PS FE is a factor of ∼ 10-100 lower than that of the bespoke PS emulators in Kern et al. 2017;Ghara et al. 2020, when compared over the same redshift/wavemode ranges.
This improvement in 21cmEMU over previous works could be attributed to several factors.Firstly, we have a training set of unprecedented size: 1.3M samples.This is orders of magnitude larger than used in previous works (generally ranging from thousands to tens of thousands).We quantify how 21cmEMU's accuracy changes with the training set size in the following section.
Secondly, the improvement in power spectrum emulation could be attributed in part to our novel CNN architecture.Previous 21-cm PS emulators used only fully-connected layers which are not as efficient in processing 2D images such as the PS.
Finally, the fact that 21cmEMU emulates many different observables allows the prediction of any one of these to be helped by the others.Indeed, we verified explicitly that the 21-cm PS emulation is improved when the other summary outputs are included in the loss (i.e. when all branches are trained together).In addition to improving performance, including multiple EoR/CD observables is extremely important in the current era where 21-cm observations are not strongly constraining.As we show in Section 4.2, complementary galaxy and EoR observations are needed to obtain a likelihood-dominated (as opposed to prior-dominated) posterior (see also HERA22).

Varying the size of the training set
Since 21cmEMU was trained on an uncharacteristically-large training set, it is useful to see how it performs with smaller training sets.To do so, we remove some models at random, retrain 21cmEMU on the reduced training set, and test its performance on the same test set.
In Figure 5, we plot the median FE in each summary as a function of the training set size.We normalize the FE so that unity corresponds to the fiducial, 1.3 M training set.We also explicitly list the performance using half of the database (640k samples), and 1% of the database (13k samples) in the middle and bottom rows of Table 1.

APPLICATION TO INFERENCE
In this section, we apply 21cmEMU to inference problems.We use the 21cmMC driver (Greig & Mesinger 2015), which now includes the option to use either 21cmFAST or 21cmEMU as the simulator.21cmMC incorporates three highly parallel samplers: EMCEE (Foreman-Mackey et al. 2013), Multinest (Feroz et al. 2009), and Ultranest (Buchner 2016(Buchner , 2019;;Buchner 2021); in this work we use the latter two as discussed further below.
First, we run the same inference as was previously run in HERA22 using 21cmFAST in order to see how emulator error affects the posterior.After this validation, we showcase the potential of 21cmEMU by performing several new inferences demonstrating: (i) how different observations are complimentary; (ii) the approximate impact of new, late-ending EoR constraints; (iii) the potential impact of upcoming H6C HERA observations.Each 21cmEMU inference took roughly a day on a GPU, compared with a few weeks on a cluster had we used 21cmFAST directly.

Comparison with direct simulation
We run the same inference as in HERA22 (10k livepoints) with 21cmEMU.Doing this allows us to directly compare the inference results between the two methods.
The likelihood in HERA22 incorporates four data sets: (i) Thomson scattering CMB optical depth -this term compares the Thomson scattering CMB optical depth from the proposed model with the one from the analysis of Planck Collaboration et al. ( 2020) by Qin et al. (2020), whose posterior is characterized by median and 68% credible interval (CI):   = 0.0569 +0.0081 −0.0086 .The likelihood function is a two-sided Gaussian.
(ii) The Lyman forest dark fraction -this term compares the mean neutral fraction at  = 5.9 with the upper limit of  Hi < 0.06 ± 0.05 at 68% CI obtained from QSO dark fraction (McGreer et al. 2015).The likelihood function is unity at  Hi ( = 5.9) < 0.06, decreasing as a one-sided Gaussian for higher neutral fraction values.
(iv) 21-cm power spectrum upper limits -this term accounts for HERA H1C 94 night observations at  = 8 and  = 10, presented in The HERA Collaboration et al. (2022b).The likelihood is the upper limit likelihood discussed in HERA22.
These individual likelihood terms are multiplied to obtain the total likelihood.When using 21cmEMU for inference, we add the median emulator error in quadrature to the measurement uncertainties for each corresponding likelihood term.
In Figures 6 and 7, we compare posteriors obtained using 21cmFAST (cyan) to that using 21cmEMU (orange).Both were run using the MultiNest sampler with the same number of livepoints (10k, yielding ∼ 60k posterior samples).In the lower left of Fig. 6 we plot the 1D and 2D marginal PDFs for our astrophysical parameters, while in the top right we plot 95% CI of some of the summary observations (see caption for details).In Fig. 7 we plot the corresponding spin temperature PDFs in the two HERA bands, which was one of the main results of the HERA22 paper.We note that the 21cmFAST and 21cmEMU posteriors are nearly identical, testifying that the emulation error is fairly negligible when performing inference using current data sets.The only notable difference is in the  * PDF, which is slightly broader when 21cmEMU is used as a simulator compared with 21cmFAST.We find no notable trends of the emulator error with this parameter, concluding the small difference could be due to stochasticity in sampling and/or a higher dimensional covariance of the emulator error.
In Figure 6 we also include a run using 21cmEMU and the same HERA22 likelihood, but with the UltraNest sampler (purple curves; 5k livepoints, yielding ∼ 70k posterior samples).Although broadly consistent with the previous two posteriors, it is evident that the choice of sampler (purple vs orange) results in a larger difference than the choice of simulator (purple vs cyan).In particular, the UltraNest posterior is more accurate towards the edges of the prior range, resulting in flatter posteriors at the edges: this behavior is also recovered using the EMCEE sampler as shown in Lazare et al. (2023).Moreover, UltraNest's vectorization makes it ∼ 10x faster when using an efficient simulator like 21cmEMU.Therefore, in subsequent sections we only show posteriors generated with UltraNest.
We remind the reader that the emulator was trained on the HERA22 nested sampling output.This inference took ∼400k core hours.Once trained however, the emulator performs amortized posterior estimation in only 225 core hours using Multinest or in 30 core hours using Ultranest.

Impact of different observations on the posterior
Having tested the emulator in the previous section, we now use it to perform multiple inferences that would be too costly with direct simulation.We begin by quantifying how the individual terms from the HERA22 likelihood discussed in the previous section affect the posterior.We do this by removing the terms one by one, and comparing the resulting posteriors in Figure 8.
In orange we show the full HERA22 posterior from the previous section, including all likelihood terms.In green, we remove the HERA power spectrum upper limit constraint.We see that the only consequence is that the   /SFR parameter becomes unconstrained.In the panel on the right, we can also see the 95% CI of the power spectrum and 21-cm global signal becoming wider around z ∼ 6−10.As discussed in HERA22, the 21-cm power spectrum limits is the only measurement sensitive to the IGM temperature during the cosmic dawn.
Next, if we remove constraints on the EoR history (here corresponding to the dark fraction and   likelihood terms), using only the UV LFs in the likelihood, we obtain the posterior shown in blue.We see that EoR history measurements allow us to set (lose) constraints on the ionizing escape fraction (here parametrized via  esc,10 and  esc ), which disappear completely when their corresponding terms are not included in the likelihood.Including only the UV LFs does disfavor very early reionization,  > 11, because the redshift evolution of the SFR density implied by UV LF observations is too steep to allow arbitrarily early EoR, even with escape fractions close to unity.
Finally we show the prior distribution in the space of UF LFs, 21-cm PS, 21-cm global signal, and EoR history in gray.We see that all of the posteriors in these spaces are significantly broader than the priors, and are thus likelihood dominated (i.e. are not sensitive to the prior choices).Moreover, each likelihood term adds complimentary information, highlighting the importance of combining observational datasets when interpreting the high-redshift Universe.

Impact of late reionization
Recent observations of the large-scale opacity fluctuations in the Lyman alpha forest (e.g.Becker et al. 2015;Bosman et al. 2018Bosman et al. , 2022) ) imply a late end to reionization  < 5.6 (Choudhury et al. 2021;Qin et al. 2021a).In this section, we explore how such new EoR history constraints would impact the previously-shown posterior.Unfortunately, the current version of 21cmEMU does not emulate the Lyman alpha forest, and so we cannot compute a likelihood directly in the observed space of Ly opacity fluctuations.Instead we take a more approximate approach, computing the likelihood in the space of EoR histories, i.e.  Hi ().To construct a likelihood in this space, we use the EoR history posterior from Qin et al. (2021a), who did in fact compute a likelihood from forward-modeled Ly opacities in addition to the dark fraction and  e observations.Specifically, we compute a new Late EoR posterior by replacing the dark fraction and  e likelihood terms with a Gaussian likelihood evaluated at three redshifts,  Hi (z = 6) = 0.25 ± 0.07,  Hi (z = 7) = 0.58 ± 0.1, and  Hi (z = 8) = 0.79 ± 0.09, ignoring any covariance between redshifts.Although this is obviously an approximation to computing the likelihood directly in the space of the observations, it suffices to qualitatively show the impact of new EoR history constraints.
In Figure 9, we show the previous (Fiducial) posterior in purple (71k samples) together with the new (Late EoR) posterior in orange (18k samples).Understandably, the corresponding recovered EoR history in orange is narrowly centered around the three points at =6, 7, 8 used to define the likelihood.As a consequence, the posterior of the Thomson optical depth also becomes more narrow, shifting toward lower values while still being within the range allowed by Planck observations.The resulting PDF of  esc,10 for Late EoR is also narrower, and shifted towards smaller values.Even the power law scaling of the escape fraction with halo mass,  esc , is constrained to within ± 0.3 (68% C.I.) for Late EoR, whereas the Fiducial posterior only sets a lower limit for this parameter.The remaining parameters are unaffected by the change to the Late EoR likelihood.
We also see that the recovered 21-cm large-scale PS for Late EoR is narrower at  < 8.The large-scale 21-cm PS during the EoR peaks around its midpoint (e.g.Lidz et al. 2007;Pritchard & Furlanetto 2007), which occurs at  ∼7-8.The HERA22 upper limits disfavor higher values of the 21-cm PS at  ∼ 8, but the tail towards small PS values seen in the Fiducial posterior (corresponding to small  Hi ), shrinks when moving to the Late EoR posterior.

Forecasts for HERA Phase II sixth-season observations
We now forecast parameter constraints that could be achievable with the sixth season of HERA observations, taken in 2022-2023 (Berkhout et al., in prep.) (Berkhout et al., in prep.).This season of observing used Phase II of the HERA instrument, spanning 50-230 MHz (omitting the FM band, 90-110 MHz), expanding coverage to Cosmic Dawn and late reionization with respect to Phase I (which was used for HERA22).While analysis of this season's data is ongoing, its broad characteristics are known (Dillon & Murray 2021): approximately 1300 hours of unflagged data over ∼ 150 nights, with an average of ∼ 148 unflagged antennas per night.Although further flagging will certainly occur during processing, this dataset will be HERA's most sensitive data release to date, by a significant factor.
We create a mock observation corresponding to this upcoming dataset.For the "true" cosmic signal, we use the Evolution of Structure (EOS) 2021 release (Muñoz et al. 2022).EOS2021 is a large simulation (1.5 cGpc per side with 1000 3 cells) made with 21cmFAST, with the goal of being our current "best guess" for the true cosmic signal.Although it used the same parametrization for galaxy scaling relations as is used here (see Section 2.1), the physical model of EOS2021 has a few notable differences.Instead of leaving  turn as a free parameter, EOS2021 explicitly calculated a local  turn (x, ) based on feedback from the local ionizing and photo-disassociating backgrounds, as well as the relative velocities of baryons and dark matter.Furthermore, EOS2021 explicitly accounted for putative PopIII star formation in the first, H 2 -cooled galaxies (MCGs, e.g.Tegmark et al. 1997;Abel et al. 2002;Bromm & Larson 2004;Haiman & Bryan 2006), which dominated the back-Figure 6.Comparison of posteriors obtained using 21cmFAST and 21cmEMU after performing an inference with the same HERA22 likelihood.The darker/dashed regions represent 68% CIs, while pale/thin dashed regions represent 95% CIs.The orange and purple posterior distributions are obtained using the MultiNest sampler (10k livepoints, ∼ 60k posterior samples), while the cyan posterior distribution is obtained using the UltraNest sampler (5k livepoints, ∼ 70k posterior samples).The median value and the 68% CIs of the 1D marginal PDFs are written above each column of the corner plot.In the panels on the top right, all highlighted regions correspond to 95% CIs.In the top middle panel, we plot the luminosity functions for redshifts 6,7, and 8.For the luminosity function likelihood, we use the data shown in black squares (Bouwens et al. 2015(Bouwens et al. , 2016;;Oesch et al. 2018).In the top right, we show a panel with three summary statistics, namely the redshift evolution of the 21-cm power spectrum at  = 0.13 cMpc −1 , the 21-cm global signal and mean neutral fraction, from top to bottom.The black squares in the power spectrum plot correspond to the two deepest limits for each HERA redshift band ( = 0.13 cMpc −1 at z ∼ 8 and  = 0.17 cMpc −1 at z ∼ 10).In the bottom plot, the black square denotes the upper limit on the average neutral hydrogen fraction obtained from the QSO dark fraction (McGreer et al. 2015).In the bottom right, we show the PDFs of the Thomson optical depth together with the Planck result used in the likelihood.The astrophysical parameter ranges shown in the corner plot correspond to the extent of the flat priors assumed for the inferences.ground radiation fields at  > 16, for their fiducial parameter choices.As a result of the models being different, 21cmEMU could result in a biased recovery of the EOS2021 signal; we quantify this below.
We use 21cmSense7 (Pober et al. 2013(Pober et al. , 2014) ) to obtain thermal and sample variance estimates of the HERA 6th season data, and describe our methodology and assumptions in App.B. We consider our sensitivity estimate to be realistic, with a few important caveats, for example the potential over-estimation of sensitivity when treating 'similar' baselines as identical (Zhang et al. 2018).The largest unpredictable caveat is of course the presence of instrumental systematics, for which we describe our approach in more detail below.
Radio telescopes, including HERA, impose their own signature on observations -dependent on the primary beam attenuation, antenna layout, channelization and other instrumental characteristics.The effect of this instrumental signature on the observed power spectrum is such that neighbouring Fourier modes are linearly 'mixed' via a 'window function' matrix (e.g.Liu & Tegmark 2011;Gorce et al. 2023).We calculated this window function using the hera_pspec8 package.We did not use the exact HERA beam as in Gorce et al. 2023.Instead, we used the Gaussian beam approximation which we deemed sufficient for this forecast (see Figure 7 in Gorce et al. 2023 for a comparison).Once we obtain the HERA window function, we matrix multiply it with the emulated model to properly compare with the forecast.
We perform inference using the EOS2021 cosmological signal with the sensitivity estimates from 21cmSense as the mock observation (see Fig. B2 in Appendix B).This inference takes about 30 GPU h to run to convergence with UltraNest.In Fig. 10 we show the resulting posterior (HERA 6th season in orange) together with the previous result (Fiducial in purple).In the top right panel we show the mock PS at  ∼ 0.16Mpc −1 as orange points with associated error bars.We see that based purely on the available S/N, the HERA sixth season data have the potential to detect the cosmic PS during the EoR (6 <  < 8).The 95% CI of the inferred PS (orange) go tightly around the data points.This unbiased recovery is reassuring, given the above-mentioned differences in the theoretical models used for the mock and forward-modeled data.Indeed, most of the "true" astrophysical parameters from EOS2021 (denoted with blue lines in the corner plot) are consistent with the recovered orange posteriors.Parameters governing X-ray heating,  X<2keV /SFR and  0 , are recovered with the lowest accuracy, with the true values residing outside of the 68% CI of their 2D PDF.This is understandable, because the 21cmEMU forward models do not include the additional radiation from  2 -cooling galaxies, which dominate the X-ray heating at  > 16.
Comparing to current constraints (Fiducial posterior in purple), we see that that HERA 6th season data have the potential to drastically improve our knowledge of the EoR.The HERA 6th season EoR history  Hi () is constrained to within ±0.05 (95% C.I.): a factor of ≳ 7 improvement over current limits.As a result, we can place strong constraints on the characteristic ionizing escape fraction,  ecs,10 , and its dependence on galaxy mass,   , which are almost completely unknown currently.
It is important to note that these two posteriors use a different form for the likelihood.For the HERA 6th season forecast, we assume that there are no residual systematics after processing of the HERA data.This is in contrast to the previous likelihoods, which assume that each -mode has a positive systematic whose prior amplitude is uniform and unbounded (cf.The HERA Collaboration et al. 2022b).In practice, assuming no residual systematics results in a two-sided Gaussian likelihood, corresponding to a 'detection', whereas the traditional likelihood has been a one-sided error-function corresponding to an 'upper-limit'.We make this choice as it is not straightforward to sample from the unbounded uniform prior for systematics when creating the mock data for the forecast.The resulting tighter parameter posteriors for the new data are therefore the result of an admixture of the new more sensitive data and the (effectively) more constrained priors on systematics.and UltraNest.The full posterior with all four probes is plotted in purple (exactly the same as the purple in Figure 6).In green, we show the posterior without the HERA power spectrum upper limits term.In blue, we additionally remove the neutral fraction and Thomson optical depth terms, leaving only the UV luminosity functions terms.On the top right half of the plot, we show the 95% CI of the same three posteriors but in the space of summary statistics: first the UV LFs, and then a panel with the 21-cm power spectrum, 21-cm global signal, and EoR history, top to bottom, and finally a panel with the Thomson optical depth.In grey, we plot the summary statistic 95% CI assuming a flat distribution across all nine astrophysical parameters which is what was used for the prior for the 21cmFAST inference.

CONCLUSION
Here we introduced 21cmEMU: a publicly-available emulator of several summary observables from 21cmFAST.We trained the emulator on 1.3M pseudo-posterior samples from the inference in HERA22.The input consists of a nine parameter model characterizing the UV and X-ray outputs of high redshift galaxies.The output consists of: (i) the 21-cm power spectrum as a function of redshift and wave-mode; (ii) the IGM mean neutral fraction as a function of redshift; (iii) the UV luminosity function at four redshifts 6, 7, 8, and 10; (iv) the Thompson scattering optical depth to the CMB; (v) the mean spin temperature as a function of redshift; and (vi) the 21-cm global signal as a function of redshift.The emulator predicts all of these quantities with under ∼ 2.4% error at 68% CL, and a computational cost that is lower by a factor of ∼10000 compared to 21cmFAST.
We varied the size of the training set, finding only a modest de- crease in performance (a factor of ∼2 decrease in the FE) as the number of samples was reduced from 1.3M to ∼100k.Below ∼100k samples, we saw a sharp drop in performance, with the fractional error increasing roughly as the inverse of the size of the training set.
We validated the emulator's performance in inference by comparing the posteriors obtained with 21cmEMU vs 21cmFAST using the same likelihood (taken from HERA22).We found a very modest difference between these two posteriors, further illustrating that the emulator error is negligible when performing inference using current data.
Next, we profited from the speed of our trained emulator to perform multiple inferences that would otherwise be very costly using direct simulation.First, we studied the constraining power of each term in our fiducial likelihood.We found that current observations are very complementary, with UV LFs constraining the stellar-to-halo mass relations, EoR history probes constraining the ionizing escape fraction, and the addition of 21-cm PS upper limits constraining the X-ray luminosity to SFR relation.
We also explored the impact of new EoR history constraints, driven by opacity fluctuations in the Lyman forest.These recent observa- ), here we demarcate its range during the EoR (i.e. 6 <  < 8 where the mock observations imply a detection).For more details about the panels, see the legend in Fig. 6.
tions imply much tighter constraints on the EoR history, finishing at  < 5.6 (e.g.Qin et al. 2021b;Choudhury et al. 2021).The inclusion of these new limits tightened the recovered constraints on the ionizing escape fraction and its scaling with halo mass.The impact on other parameters was modest.
Finally, we presented forecasts of parameter constraints achievable with ongoing 6th season phase II observations with the HERA telescope.Optimistically, we could expect a detection of the 21-cm PS at  ∼6-7.This would result in a dramatic improvement in the recovered timing of the EoR, allowing us to infer  Hi () to within ±0.05 (95% C.I.): a factor of ≳ 7 improvement over current limits.As a result, we could place strong constraints on the characteristic ionizing escape fraction and its dependence on galaxy mass, which are almost completely unknown currently.We cautioned however that this forecast is optimistic, mainly because it assumed there are no residual systematics in the processed data (see Appendix B for more details).
21cmEMU was trained on a database of summary observables where only one seed i.e. random set of initial conditions is available per combination of astrophysical parameters.In the future, we hope train the emulator on a database that samples many different seeds in order to emulate a full likelihood function rather than only approximate the mean as we do right now.This is important since Prelogović & Mesinger (2023) showed that using a single random seed when forward modeling can bias the inference results.
We make 21cmEMU publicly-available at https://github.com/21cmfast/21cmEMU, and include it as an alternative simulator in the public 21cmMC 9 sampler.We will periodically release updated versions, trained on the latest galaxy models and expanding the choice of summary outputs.21 .The colour of each bin in the 2D histogram is a weighted mean of the fractional error of the samples therein.On the diagonal, we show the 1D marginal density distribution of each astrophysical parameter in the test set.Note that the range of astrophysical parameters in the corner plot corresponds to the ranges taken for the flat prior of the inference used to generate the database.Δ = 122.07kHz is the channel width of the observation and  int = 300 sec is the coherently-averaged LST-bin size used in the analysis12 .Furthermore,  is a 'flag' function that takes the value 0 or 1 depending on the location of the 3D mode with respect to the foreground wedge (see below).
In this equation,  k ⊥ represents the number of samples of this angular scale observed coherently throughout the observing season (i.e.observations that are averaged together as visibilities).In 21cmSense, this is estimated by creating a grid on the ì  ⊥ plane, whose cells are the size of the primary beam of the instrument in Fourier-space (for HERA, this is 7 at 150 MHz), and binning the the baseline coordinates into this grid13 .In addition to the number of samples in a bin coming from different (redundant) baselines, we also have samples from the same baseline at different times.Here, samples at the same LST on different nights are averaged coherently, but samples at different LSTs are averaged incoherently (i.e. after forming power spectra).Currently, 21cmSense only has support for specifying the number of nights observed and the number of hours observed each night (thereby specifying the number of LST bins in conjunction with the LST bin duration).However, in realistic obser- vational programs, the same LST bins are not observed each night (whether due to the evolution of the sky throughout the season, or through flagging / data quality concerns).To partially account for this, we define a function  obs (LST) which counts the number of unflagged days observed over the season for any given 300-second-long LST bin (note that this accounts for flags of the entire observation, due to things like poor weather or correlator malfunctions, but not antenna-or channel-specific flags).To map this non-constant function of LST bin onto the schema available in 21cmSense, which assumes the same LST bins are observed each night, we set and  day =  LST × 300 sec.This achieves the same resulting thermal noise level, under the assumption that the sky temperature is constant over the LST bins.We use actual sixth-season HERA measurements for  obs , as shown in Fig. B1.We calculate  days,eff = 67.4 for coherent averaging and  day = 21 ℎ for incoherent averaging (i.e. the thermal noise from our observing pattern is equivalent to observing 253 300-second LST bins each for 67.4 days).Finally, we apply a further factor of  = 0.9 to  days,eff to broadly account for finer-scale flags applied during analysis that are unaccounted in the LST-bin observing pattern of Fig. B1.In summary, we have The line-of-sight modes observed depend on the channel width, as already defined, and also the bandwidth of the observation.While HERA Phase II observes 200 MHz of bandwidth from 50 -250 MHz, power spectra are estimated in smaller 'spectral windows' whose size is determined by a number of factors.Chiefly, the windows are as wide as possible, so as to include the largest scales where the signal is strongest, but are constrained by lightcone evolution (Trott 2016;Datta et al. 2012;Greig & Mesinger 2018) to be effectively smaller than ∼ 10 MHz.In practice, spectral windows are chosen to lie between strongly flagged channels (eg.FM band and Orbcomm), which means their width is not constant.Here, we use constant 20 MHz spectral windows, where we assume a Blackman tapering function is applied to each window to reduce the effective bandwidth to ∼ 10 MHz (and an appropriate scaling factor of 1.737 is applied to the final noise level).We calculate noise estimates for all spectral windows between 50-250 MHz, excluding the FM band between 90-110 MHz.
We use a model for  sky that is a power-law in frequency, with amplitude and spectral-index obtained from simulated auto-correlations of the diffuse sky, using the GSM (de Oliveira-Costa et al. 2008) and the HERA Phase II primary beam (Fagnoni et al. 2021)  We construct several estimates of the noise based on different effective observing arrays.The sixth season of HERA data observed with a maximum of 209 antennas simultaneously in any given night (of the total 350 antennas available).The bulk of these antennas observed consistently throughout the season, though a fraction of them were swapped in and out.In our estimates here, we assume that the same antennas observe consistently throughout the season, which is a reasonable approximation.Nevertheless, in practice, even though 209 antennas are being correlated at any given moment, some fraction of them are flagged over all channels (e.g.due to swapped polarizations, non-redundancies from physical effects such as feed displacement, or X-engine failures that affect a subset of antennas, etc.).The average number of antennas actually observing per-night throughout the season is as-yet unknown, though initial estimates place it at ∼ 150 antennas (Dillon & Murray 2021).Here we use  ants = 148, where the antennas are drawn randomly from the set of 209 antennas that actually observed throughout the season.In all cases, we use only baselines whose East-West length is greater than 15 m (i.e.we exclude North-South baselines, as their systematics are more difficult to filter out), and also only include baselines shorter than 150 m, similar to analyses of previous HERA seasons.
After obtaining the 3D sensitivity grid, we incoherently average into 1D spherical | |-shells with bins of width Δ | | .In this process, we flag out (| ⊥ |,  | | )-bins within the foreground 'wedge' (Liu et al. 2014a,b), defined by with || the baseline length (in meters) corresponding to a given  ⊥ , and  | | / a redshift-dependent cosmological factor converting bandwidth into cosmic distance.This corresponds to the 'horizon' limit of foregrounds in delay-space, plus a conservative buffer of 0.1 ℎ/Mpc (corresponding to the buffer used in first-season HERA analyses).
In addition to the thermal variance, cosmic-(or sample-) variance is added, proportional to a fiducial cosmological power spectrum,  2 theory divided by the number of LST bins and  ⊥ -modes in a spherical shell.We note that using the number of LST-bins is inspired by the idea that LST-bins should capture the entire duration of 'coherence', equal to roughly the beam-crossing time for an antenna.However, HERA is conservative in using shorter coherence times, resulting in many more LST-bins.This reduces the thermal sensitivity, but artificially reduces the cosmic variance estimated by 21cmSense.Nevertheless, since cosmic variance is generally a subdominant contribution to the total variance, this should not have a large effect on the results presented here.For the fiducial theoretical model, we here use the model from Muñoz et al. (2022).
We summarize the parameters used in Table B1 and show the full HERA phase II 6th season sensitivity forecast in Figure B2.
There are a few caveats to these estimates.Most importantly, baselines found within 7 UV-bins together are considered redundant, while in the HERA analysis only baselines within 10 cm are considered redundant.This will artificially increase thermal sensitivity estimates.Secondly, the sky temperature is considered constant over the LST bins.To minimize the effect of this limitation, we use a sky model that is based at the most-observed LST (7 hours).Thirdly, cosmic variance is reduced as the square root of the number of LST bins, instead of the number of independent 'fields' observed.This artificially increases the sensitivity from cosmic variance, though this should not have a large effect since this is the sub-dominant contribution on most scales and redshifts.Finally, in this forecast we did not decimate the -bins as was done in previous analyses.This results in some unaccounted covariance between -bins that would tend to over-estimate the sensitivity.We do not expect this to significantly affect the qualitative conclusions derived from the forecast.B1.Note that in practice, HERA decimates the -bins to avoid requiring non-diagonal covariance.Here have not done this, though the effect on results should be small.

Figure 1 .
Figure1.Schematic of the 21cmEMU architecture.Astrophysical parameters (top; c.f. Section 2.1 are inputted through a large block of fully-connected layers.The output from this shared block is then passed on into five blocks (much smaller than the shared block).The pink, blue, green, yellow, and orange fully-connected branches output the Thomson scattering optical depth, UV LFs, mean hydrogen neutral fraction, spin temperature, and global signal, respectively.The output from the shared block is also reshaped into an image and is passed into a 2D convolutional neutral network which outputs the 21cm power spectrum.The convolutions gradually build the PS image.The window size varies among the layers.The number of filters (yellow layers) decreases toward the end of the CNN.

Figure 2 .
Figure 2. Training loss (black line) and validation loss (orange line) as a function of training epoch.The learning rate curve is also shown with the dashed gray line and the corresponding right axis.

Figure 3 .
Figure 3.The spherically-averaged 21-cm power spectrum as a function of wavemode and redshift for a sample in the test set.The 21cmEMU prediction is shown on the left while the 21cmFAST result is on the right.This sample has a 21-cm PS FE that is roughly comparable to the median value of the whole the test set, and can thus be considered representative of the emulator performance.The rightmost panel shows the fractional error for this single sample.

Figure 4 .
Figure 4.A subset of summary outputs from 21cmEMU for ten random samples from the test set.Panels show: redshift evolution of the  = 0.1 Mpc −1 21-cm PS amplitude, redshift evolution of the mean 21cm brightness temperature, redshift evolution of the mean spin temperature in the neutral IGM, the CMB optical depth, UV LFs at  = 6, the EoR history (clockwise from upper left).Colors denote the astrophysical parameter sample with solid (dashed) lines corresponding to outputs from 21cmFAST (21cmEMU).In the bottom sub-panels, we show the absolute differences between the predicted and true quantities shown in the top sub-panels.Absolute differences of the ten random samples are shown with the corresponding colors, while the median absolute differences (FE in the case of   ) computed over the entire test set are shown with dashed, black curves.Dark (light) shaded regions enclose 68% (95%) CL.

Figure 5 .
Figure 5. Median fractional error of each summary as a function of the training set size.The FE is normalized so that unity corresponds to the fiducial, 1.3 M training set.

Figure 7 .
Figure 7.Comparison of the mean spin temperature distribution from 21cmFAST and 21cmEMU for each of the two HERA bands after performing an inference with the exact same likelihood.The credible intervals have been calculated using the highest posterior density method.The dark (light) cyan shaded region shows the 68% (95%) CI.The solid cyan line shows the distribution for 21cmFAST with 10k livepoints using MultiNest.The dashed orange line shows the same but for 21cmEMU.The dashed purple line shows the distribution for 21cmEMU but using the UltraNest sampler with 5k livepoints.

Figure 8 .
Figure8.Contribution of various likelihood terms to the total posterior.The corner plot on the left shows the 95% CI of three inferences, all run with 21cmEMU and UltraNest.The full posterior with all four probes is plotted in purple (exactly the same as the purple in Figure6).In green, we show the posterior without the HERA power spectrum upper limits term.In blue, we additionally remove the neutral fraction and Thomson optical depth terms, leaving only the UV luminosity functions terms.On the top right half of the plot, we show the 95% CI of the same three posteriors but in the space of summary statistics: first the UV LFs, and then a panel with the 21-cm power spectrum, 21-cm global signal, and EoR history, top to bottom, and finally a panel with the Thomson optical depth.In grey, we plot the summary statistic 95% CI assuming a flat distribution across all nine astrophysical parameters which is what was used for the prior for the 21cmFAST inference.

Figure 9 .
Figure 9. Same as Fig. 6, but comparing the fiducial posterior (purple) with one obtained by replacing the QSO dark fraction and   likelihood terms with a "Late EoR" likelihood denoted by the three points with error bars in the middle right panel (orange.The "Late EoR" likelihood is based on the inference results in Qin et al. 2021a, which included recent measurements of opacity fluctuations in the Lyman alpha forest.In the top right sub-panels, we show both the 68% (darker) and 95% (paler) C.I.

Figure 10 .
Figure 10.Forecasted constraints from mock HERA Phase II season six observations (see text for details) are shown in orange.The mock PS amplitudes at  ∼ 0.16 Mpc −1 are shown as orange points with error bars in the top right panel, together with current upper limits from HERA22.The parameters of the cosmological simulation used for the mock observation, EOS2021, are denoted with blue lines in the corner plot.We caution that the theoretical model used in EOS2021 and that used in 21cmEMU are somewhat different, as discussed in the text.As  turn in EOS2021 evolves with redshift (see Fig.5inMuñoz et al. 2022), here we demarcate its range during the EoR (i.e. 6 <  < 8 where the mock observations imply a detection).For more details about the panels, see the legend in Fig.6.

Figure A1 .
Figure A1.Distribution of the mean fractional error of the emulated log Δ 221 .The colour of each bin in the 2D histogram is a weighted mean of the fractional error of the samples therein.On the diagonal, we show the 1D marginal density distribution of each astrophysical parameter in the test set.Note that the range of astrophysical parameters in the corner plot corresponds to the ranges taken for the flat prior of the inference used to generate the database.

Figure B1 .
Figure B1.The number of times each 300-second LST bin was observed and un-flagged in HERA's sixth season, used for sensitivity estimates.Note that this accounts only for flags arising from strong effects that affect large swathes of the observed antennas and/or channels (eg.lightning storms, correlator outages), and further flags are applied in the downstream analysis.

Figure B2 .
Figure B2.HERA phase II 6th season sensitivity forecast obtained using 21cmSense with the parameters specified in TableB1.Note that in practice, HERA decimates the -bins to avoid requiring non-diagonal covariance.Here have not done this, though the effect on results should be small.

Table 1 .
Performance of the 21cmEMU network when trained on the full database, half of the database and 1% of the database.
Fagnoni et al. (2021)is not able to use different sky models for different LST-bins, so this choice represents the temperature for the most-observed LST bin.For  rcv , we use a frequency-dependent model based on electromagnetic simulations performed inFagnoni et al. (2021), interpolated by a cubic spline.This model is close to a power-law at low frequencies, with an amplitude of ∼ 600 K at 50 MHz and asymptoting to a const ∼ 60 K by 200 MHz.