Machine-learning recovery of foreground wedge-removed 21-cm light cones for high-𝑧 galaxy mapping

Upcoming experiments will map the spatial distribution of the 21-cm signal over three-dimensional volumes of space during the Epoch of Reionization (EoR). Several methods have been proposed to mitigate the issue of astrophysical foreground contamination in tomographic images of the 21-cm signal, one of which involves the excision of a wedge-shaped region in cylindrical Fourier space. While this removes the k-modes most readily contaminated by foregrounds, the concurrent removal of cosmological information located within the wedge considerably distorts the structure of 21-cm images. In this study, we build upon a U-Net-based deep learning algorithm to reconstruct foreground wedge-removed maps of the 21-cm signal, newly incorporating light-cone effects. Adopting the Square Kilometre Array (SKA) as our fiducial instrument, we highlight that our U-Net recovery framework is robust to instrumental limitations and noise. We subsequently evaluate the efficacy of recovered maps in guiding high-redshift galaxy searches and providing context to existing galaxy catalogues. This will allow for studies of how the high-redshift galaxy luminosity function varies across environments, and ultimately refine our understanding of the connection between the ionization state of the intergalactic medium (IGM) and galaxies during the EoR.


INTRODUCTION
The redshifted 21-cm spectral line has been identified as a promising probe of the early Universe and cosmic structure formation (e.g. Furlanetto et al. 2006; Morales & Wyithe 2010a; Pritchard & Loeb 2012; Liu & Shaw 2020). Ongoing and future experiments (LOw Frequency ARray (LOFAR); van Haarlem et al. 2013, Murchison Widefield Array (MWA); Bowman et al. 2013, Square Kilometre Array (SKA); Koopmans et al. 2015, Hydrogen Epoch of Reionization Array (HERA); DeBoer et al. 2017) will measure spatial fluctuations in the 21-cm signal from the Cosmic Dawn through the Epoch of Reionization (EoR). While several current experiments (LOFAR, MWA, HERA) have been optimally designed to measure the statistical properties of the 21-cm signal via the power spectrum, the upcoming SKA is configured to produce three-dimensional tomographic maps of the 21-cm signal over a wide range of redshifts. These maps will trace the evolution of the IGM's ionization state through the growth of ionized bubbles, or HII regions (e.g. Morales & Wyithe 2010b). While it is expected that galaxies were the primary source of ionizing photons during the EoR (e.g. Yan & Windhorst 2003; Bouwens et al. 2012; Finkelstein et al. 2012; Robertson et al. 2013), the properties of these source galaxies remain poorly constrained (e.g. Robertson et al. 2010). Leveraging the sensitivity of the 21-cm background to the complete galaxy population presents the opportunity to indirectly constrain galaxy properties via their impact on the IGM. Therefore, by connecting 21-cm tomography to galaxy surveys, we may identify the galaxies responsible for reionization and ultimately improve our understanding of galaxy populations during this epoch.
The primary obstacle to 21-cm tomography is astrophysical foreground contamination. Galactic and extra-galactic foregrounds can be up to three to four orders of magnitude brighter than the cosmological signal (Bernardi et al. 2009, 2010). While foreground emission is expected to be spectrally smooth, further complications arise due to the interferometric nature of observing instruments. Various methods have been proposed to resolve the issue of foreground contamination in the context of the EoR 21-cm signal (see Liu & Shaw 2020 for a summary). Several of these address the problem of mode-mixing, whereby the chromatic response of interferometers causes foregrounds to leak from lower to higher k_∥ modes (Fourier wavenumbers parallel to the line-of-sight; LoS). This process leads to the localization of foregrounds in a wedge-shaped region of Fourier space, known as the foreground wedge (Datta et al. 2010; Morales & Wyithe 2010b; Parsons et al. 2012; Vedantham et al. 2012; Trott et al. 2012; Hazelton et al. 2013; Pober et al. 2013; Thyagarajan et al. 2013; Liu et al. 2014a,b). The boundary of the wedge can be expressed as a function of the cylindrical Fourier space coordinates (k_⊥, k_∥) as

k_∥ ≤ k_⊥ sin(θ_FoV) D_C(z) H_0 E(z) / [c (1 + z)] ≡ k_⊥ tan(ϕ),   (1)

where E(z) ≡ √(Ω_m (1 + z)³ + Ω_Λ), D_C(z) is the comoving distance, θ_FoV is the angular radius of the field of view of the interferometer, Ω_m is the normalized matter density, and Ω_Λ is the normalized dark energy density. The equivalency in Equation (1) defines ϕ to be the angle between the wedge boundary and the k_⊥-axis (Fourier modes perpendicular to the LoS). A visual depiction of the foreground wedge is presented in Figure 1, illustrated in the two-dimensional plane of cylindrical Fourier space. While the complementary EoR Window in Figure 1 denotes the area of Fourier space that is not readily contaminated by foregrounds, restricting one's observations to this region ensures any cosmological information located within the wedge is not accessed. This method of foreground avoidance alone is thus unfavourable for 21-cm tomography, given all Fourier modes are necessary for high-fidelity imaging (Liu & Shaw 2020). Such complications motivate the consideration of alternative foreground mitigation techniques.
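To make the wedge geometry concrete, the minimal numpy sketch below builds the boolean mask implied by Equation (1). The cosmological parameter values, the simple trapezoidal comoving-distance integral, and the width of the low-k_∥ intrinsic-foreground strip are illustrative assumptions rather than the exact implementation used later in this work.

```python
import numpy as np

# Illustrative cosmology (assumed values, not necessarily the paper's exact parameters)
OMEGA_M, OMEGA_L, H0, C = 0.31, 0.69, 67.7, 2.998e5  # H0 [km/s/Mpc], c [km/s]

def E(z):
    """Dimensionless Hubble parameter E(z) appearing in Equation (1)."""
    return np.sqrt(OMEGA_M * (1 + z) ** 3 + OMEGA_L)

def comoving_distance(z, n=10000):
    """Comoving distance D_C(z) in Mpc via simple trapezoidal integration."""
    zs = np.linspace(0.0, z, n)
    return np.trapz(C / (H0 * E(zs)), zs)

def wedge_slope(z, theta_fov=np.pi / 2):
    """tan(phi): slope of the wedge boundary in the (k_perp, k_par) plane."""
    return np.sin(theta_fov) * comoving_distance(z) * H0 * E(z) / (C * (1 + z))

def wedge_mask(k_perp, k_par, z, dk_par_min=0.03):
    """True for modes contaminated by foregrounds: the wedge itself plus an
    illustrative low-k_par strip for intrinsic (spectrally smooth) foregrounds."""
    kperp, kpar = np.meshgrid(k_perp, k_par, indexing="ij")
    return (np.abs(kpar) <= wedge_slope(z) * kperp) | (np.abs(kpar) < dk_par_min)

# Example: fraction of modes lost to the wedge at z = 7.5 for a 128-voxel,
# 1.5 cMpc-resolution grid.
k = 2 * np.pi * np.fft.fftfreq(128, d=1.5)
print(wedge_mask(np.abs(k), k, 7.5).mean())
```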
Recently, deep learning-based approaches to foreground removal have been presented in the literature (e.g. Li et al. 2019; Makinen et al. 2021; Gagnon-Hartman et al. 2021; Bianco et al. 2023). In particular, Gagnon-Hartman et al. (2021), henceforth referred to as GH21, demonstrated the success of a U-Net-based deep learning algorithm in recovering Fourier modes that are obscured by foregrounds. Their algorithm does not rely on any knowledge of the foregrounds themselves, and enables the reconstruction of mock 21-cm images at fixed redshifts after all modes lying within the foreground wedge have been completely nulled out. Importantly, GH21 showed that the reconstruction of ionized bubbles was still possible even in the presence of such an aggressive foreground filter. More recently, Bianco et al. (2023) demonstrated an alternative U-Net-based architecture that was able to identify neutral and ionized regions in foreground-contaminated 21-cm maps. The work of Bianco et al. (2023) differs from GH21, however, in that Bianco et al. (2023) simulated foreground removal with a wide variety of methods, including Principal Component Analysis, foreground wedge removal, and polynomial fitting, to produce realistic foreground residuals. Importantly, both GH21 and Bianco et al. (2023) noted the applicability of such reconstructive algorithms for follow-up observations of ionizing sources, notably galaxies, located within recovered ionized regions.
In this study, we advance the work of GH21 in two primary ways. First, we modify the U-Net algorithm employed to improve recovery performance post-foreground wedge excision, and extend the recovery effort to include 21-cm maps with light-cone effects (Datta et al. 2014; Plante et al. 2014). Given that light-cone effects will be implicit in future observations of the EoR, this effort will provide a more realistic picture of the success of the U-Net recovery methodology presented in GH21. As the second and principal advancement, we demonstrate how one can use U-Net recovered 21-cm light-cones to direct high-redshift galaxy surveys and provide context to existing galaxy catalogues. The basis for our investigation stems from the prospect of obtaining foreground-contaminated tomographic maps of the 21-cm signal from experiments such as SKA. Applying the foreground removal algorithm and subsequent U-Net reconstruction framework we detail in this paper, one can identify the location, morphology, and size of ionized bubbles in 21-cm light-cones. To study the connection between ionized bubbles and photon-producing galaxies, one can use foreground-decontaminated maps to guide searches for these galaxies and supply information regarding the ionization environment of existing galaxy observations. Because 21-cm intensity mapping surveys such as SKA are designed to image an expansive field with a relatively limited angular resolution, they are unable to resolve individual galaxies at EoR redshifts. The required angular resolution is, however, well within the capabilities of current and next-generation galaxy surveys such as the James Webb Space Telescope (JWST; Gardner et al. 2006) and the Nancy Grace Roman Space Telescope (Roman; Akeson et al. 2019). Each of these instruments has a much higher resolution and smaller field of view compared to SKA, possessing sufficient sensitivity to identify and study individual galaxies during the EoR (Beardsley et al. 2015). Depending on the relative timelines of SKA observations and galaxy surveys, the utility of recovered 21-cm light-cones varies. For galaxy surveys completed prior to the availability of SKA observations, recovered light-cones may provide supplemental information to existing high-redshift galaxy catalogues. Conversely, following the operational window of SKA, recovered light-cones may be used to guide searches for galaxies located in ionized regions. In either case, characterizing the impact an imperfect U-Net recovery will have on the inferred luminosity functions is necessary.
In what follows, we will demonstrate how 21-cm images can be used in cooperation with high-redshift galaxy surveys to identify the ionization environment of galaxies during the EoR. In Section 2 we discuss the generation of 21-cm images and halo catalogues, and outline the introduction of instrumental and foreground effects used to corrupt the 21-cm images. The network architecture, training procedure, and recovery performance are presented in Section 3. In Section 4 we discuss the halo-to-galaxy connection and explore the efficacy of recovered 21-cm light-cones in the context of high-redshift galaxy mapping. We summarize our conclusions in Section 5.

21-cm Brightness Temperature Fields
To train, validate, and test our network, we generated a suite of cosmological simulations of the 21-cm brightness temperature field (ΔT_21) over a range of redshifts during the EoR. To generate these fields, we used the Python-wrapped version, py21cmFAST, of the semi-numerical cosmological simulation code 21cmFASTv3 (Mesinger et al. 2010; Murray et al. 2020). We generated two primary simulation databases: the first consisting of 21-cm brightness temperature fields evaluated at fixed redshifts (coeval-boxes), and the second consisting of 21-cm brightness temperature fields with light-cone effects (light-cones). We use the coeval-box database as a means of control to guide our intuition regarding our network's performance on the light-cone database, and to demonstrate an improvement in performance relative to GH21. Following GH21, we fix all of our simulations to have a spatial resolution of Δx = 1.5 cMpc. We set the dimensions of our coeval-box simulation volume to be 128×128×128 voxels and our light-cone simulation volume to be 128×128×768 voxels. We employ 21cmFASTv3's default astrophysical and cosmological parameter values when generating simulations. Importantly, each simulation is generated from a different random seed, ensuring that the initial conditions of the cosmological density field are unique to each realization. It should be noted that 21cmFASTv3 uses the Park et al. (2019) parameterization of galaxy properties, calibrated to high-z UV luminosity functions. The details of our galaxy modelling will be described further in Section 4.
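As a concrete illustration of how such fields can be produced, the snippet below sketches one coeval-box and one light-cone call with py21cmFAST. The user_params shown reproduce the 128-voxel, 192 cMpc geometry quoted above, all other parameters are left at the code's defaults, and the exact call signatures should be checked against the installed py21cmFAST version.

```python
import py21cmfast as p21c

# 128^3 voxels at 1.5 cMpc resolution -> 192 cMpc box, as used in this work.
user_params = {"HII_DIM": 128, "BOX_LEN": 192.0}

# One coeval box at a fixed redshift.
coeval = p21c.run_coeval(
    redshift=7.5,
    user_params=user_params,
    random_seed=1,   # a unique seed per realization
)
print(coeval.brightness_temp.shape)  # (128, 128, 128)

# One light-cone spanning the redshift range quoted below.
lightcone = p21c.run_lightcone(
    redshift=5.577,       # minimum redshift along the LoS
    max_redshift=8.943,   # maximum redshift along the LoS
    user_params=user_params,
    random_seed=1,
)
# The number of LoS slices depends on the redshift range and resolution.
print(lightcone.brightness_temp.shape)
```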
Our coeval-box database consists of 780 21-cm brightness temperature boxes, distributed between z = [4.50, 10.75]. This redshift interval corresponds to a range in the neutral hydrogen fraction of x_HI ≈ [0.00015, 0.95]. Of the 780 coeval-boxes, 680 are distributed uniformly between z = [6.75, 8.50] (or x_HI ≈ [0.25, 0.75]) with Δz = 0.25 spacings. The remaining 100 coeval-boxes are generated between z = [4.50, 6.50] and z = [8.75, 10.75] with Δz = 0.50 spacings, and are reserved exclusively for post-training out-of-distribution testing (see Section 3.2 for details). Our light-cone database consists of 350 21-cm brightness temperature light-cones, each extending from z = 5.577 to z = 8.943 along the LoS direction. In addition to our database of "true" 21cmFAST light-cones, we also generate a secondary database of 50 approximate light-cones. Approximate light-cones are produced by concatenating coeval-boxes generated from the same initial conditions (random seed) at a series of increasing redshifts; the resulting light-cones evolve very coarsely along the LoS. The coeval-boxes comprising these light-cones have the same shape and physical dimension as those in the coeval-box database (128 × 128 × 128 voxels, 192 × 192 × 192 cMpc³). To produce each approximate light-cone we concatenate 6 coeval-boxes along the LoS axis, such that the final shape agrees with those in our "true" light-cone database (128 × 128 × 768 voxels). The constituent coeval-boxes are generated at 6 intermediate redshifts within the "true" light-cones, selected at 128-voxel increments along the LoS starting from z = 5.786. This construction was chosen because the resulting light-cones were found to have an x_HI that matched closely with the x_HI of the "true" light-cones. The approximate light-cone database is necessary to facilitate the high-redshift galaxy recovery analysis discussed in Section 4 due to the intrinsic limitations of the halo-finding algorithm that is employed; we elaborate on this further in Section 2.2. In Section 3.3, we verify the validity of this approximation, confirming that the network's performance is consistent between "true" and approximate light-cones.
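The approximate light-cones can then be assembled by stacking same-seed coeval-boxes along the LoS, as in the minimal sketch below. Only the starting redshift of 5.786 and the composite redshifts quoted in Figure 2 are taken from the text; the highest segment redshift and the assumption that the LoS is the last array axis are illustrative.

```python
import numpy as np
import py21cmfast as p21c

user_params = {"HII_DIM": 128, "BOX_LEN": 192.0}
seed = 42  # one fixed seed per approximate light-cone realization

# Six segment redshifts: 5.786 and the composite redshifts of Figure 2 come
# from the text; the final (highest) value is an illustrative placeholder.
segment_redshifts = [5.786, 6.24, 6.74, 7.29, 7.90, 8.60]

segments = [
    p21c.run_coeval(redshift=z, user_params=user_params,
                    random_seed=seed).brightness_temp
    for z in segment_redshifts
]

# Stack the six (128, 128, 128) boxes along the LoS axis -> (128, 128, 768).
approx_lightcone = np.concatenate(segments, axis=-1)
```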

Halo Catalogues
While the previous section addressed the generation of 21-cm images, we must also generate a corresponding database of galaxies, such that we can establish the desired connection between 21-cm imaging experiments and galaxy surveys. To facilitate this, we first consider the dark matter halos within which galaxies form (e.g. Wechsler & Tinker 2018). To generate dark matter halo catalogues corresponding to our 21-cm brightness temperature fields, we use the 21cmFAST halo finder. The halo finder is run on each coeval-box's density field to return a list of the masses and spatial coordinates of halos within the simulation volume. By default, the 21cmFAST halo finder generates halo masses consistent with the Sheth-Tormen mass function (Sheth et al. 2001), and assumes a turnover mass M_Turn = 10^8.7 M_⊙. This quantity determines the minimum mass and total number of halos identified in the simulation volume. A more in-depth explanation of the 21cmFAST halo finder is provided in Mesinger & Furlanetto (2007). If we consider only Population II galaxies, the reionization history is weakly sensitive to M_Turn, because the star formation efficiency inferred from UV luminosity functions falls steeply as halo mass decreases (Mirocha et al. 2016; Park et al. 2019). If, however, the physics of star formation changes in low-mass halos, and/or there are new source populations at high-z, then our model is subject to change (e.g. Qin et al. 2020; Muñoz et al. 2022; Gessey-Jones et al. 2022). Indeed, early JWST results hint at departures from Hubble-era model predictions, at least at z ≳ 10.
The motivation for generating the secondary approximate light-cone database discussed in the previous section is that the 21cmFAST halo finder is configured to process coeval-boxes. Thus, we run the halo finder on an approximate light-cone's composite coeval-boxes to generate an effective light-cone halo catalogue. The left panel of Figure 2 illustrates the halo mass functions (HMFs) calculated from each of these halo catalogues, where each curve denotes a different random realization; the HMFs are grouped by the redshift of the coeval-box on which the halo finder was run. A transverse slice of a light-cone is shown in the right panel of Figure 2, overlayed with the corresponding halo catalogue. It is immediately obvious that nearly all halos fall within the ionized (white, x_HI = 0) regions of the binarized 21cmFAST brightness temperature field. This provides a preliminary indication that our methodology aimed at using ionization maps as guides for high-redshift galaxy observations is promising. The galaxy-recovery analysis we conduct in Section 4 investigates the specific nature of this relationship for U-Net recovered, foreground wedge-removed light-cones.
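A simple way to quantify the visual impression of Figure 2 is to look up each halo's host voxel in the binarized cube, as in the hedged sketch below. The halo positions are assumed to be in cMpc on the same grid as the cube, with ionized voxels equal to 0 following the binarization convention of Section 3.

```python
import numpy as np

def fraction_in_ionized(halo_pos_cmpc, binarized_cube, voxel_size=1.5):
    """Fraction of haloes whose host voxel is ionized.

    halo_pos_cmpc  : (N, 3) halo positions in cMpc
    binarized_cube : 3D array with 1 = neutral and 0 = ionized
    """
    idx = np.floor(halo_pos_cmpc / voxel_size).astype(int)
    idx %= np.array(binarized_cube.shape)          # wrap for the periodic box
    host = binarized_cube[idx[:, 0], idx[:, 1], idx[:, 2]]
    return np.mean(host == 0)
```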

Instrumental Noise and Foreground Removal
Having generated noiseless and foreground-free coeval-box and light-cone databases, we corrupt these images by introducing real-world instrumental and foreground effects. To begin, we subtract off the mean of the signal for each two-dimensional slice transverse to the LoS-axis, ⟨ΔT_21⟩(z). Doing so simulates the effect of observing the sky with an interferometer, whereby we measure fluctuations about the mean brightness temperature after discarding the null (b = 0) baseline that corresponds to the u = 0 mode, where u ≡ (u, v) is the Fourier dual to angular coordinates on the sky. Adopting SKA1-Low as our fiducial instrument, we calculate its uv-coverage and instrumental noise using the tools21cm Python package (Giri et al. 2020). The uv-coverage is computed from the specific antenna configuration of
the interferometer (SKA1-Low 2016) after assuming a particular observation strategy. In this work we use an integration time of t_int = 10 s, a daily observation time of t_obs,day = 6 hrs, and a total observation time of t_obs,tot = 2000 hrs. We multiply the Fourier transform of each LoS slice, ΔT_21(u, z), by the instrument's binarized uv-coverage (see Figure 3) at the corresponding redshift z, N_uv^bin(u, z), to filter out any uv-modes that are not sampled by the interferometer, producing the uv-coverage limited 21-cm brightness temperature field ΔT_21,uv-lim(x, z), where x is the transverse position vector.
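The uv filtering itself amounts to a two-dimensional Fourier transform, a multiplication, and an inverse transform per slice. A minimal sketch is given below, assuming the binarized coverage is supplied on the same grid with its zero-frequency pixel at the array origin.

```python
import numpy as np

def apply_uv_coverage(dT_slice, uv_mask_bin):
    """Keep only uv-modes sampled by the interferometer in one transverse slice.

    dT_slice    : 2D mean-subtracted brightness-temperature slice [mK]
    uv_mask_bin : 2D boolean array, True where N_uv > 0, zero-frequency at [0, 0]
    """
    dT_uv = np.fft.fft2(dT_slice)          # to uv-space
    dT_uv *= uv_mask_bin                   # binarized uv-coverage filter
    return np.real(np.fft.ifft2(dT_uv))    # back to the image plane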
The instrumental noise for each slice along the LoS-axis is computed using a frequency-dependent system temperature, T_sys(z). In uv-space we generate zero-mean complex Gaussian random noise on the sampled uv-modes in each LoS-slice, ΔT_N(u, z), with standard deviation

σ_N(u, z) = √2 k_B T_sys(z) / [A_ant √(Δν t_int N_uv(u))] [Jy],   (3)

where k_B is Boltzmann's constant, A_ant is the effective antenna area, Δν the frequency depth of each LoS-voxel, and N_uv is the total number of times a given uv-mode is sampled over t_obs,tot due to both redundant baselines and rotation synthesis. We then add the inverse-Fourier transform of this noise realization, ΔT_N(x, z), to ΔT_21,uv-lim(x, z),
producing the final noisy 21-cm brightness temperature field

ΔT_21,noisy(x, z) = ΔT_21,uv-lim(x, z) + ΔT_N(x, z).   (4)

With the noiseless and instrument-affected coeval-boxes and light-cones in hand, we simulate the removal of foreground-contaminated Fourier modes using two different algorithms. For our coeval-box database, we follow the wedge-removal procedure outlined in GH21, and for our light-cone database, we follow the wedge-removal procedure outlined in Prelogović et al. (2021). The former algorithm is as follows: (i) Fourier transform all three axes of a coeval-box, (ii) zero out Fourier modes located outside of the EoR Window (see Figure 1), (iii) inverse Fourier transform the result.
The latter algorithm is applied to our light-cone database (both "true" and approximate light-cones) and accounts for the redshift-dependence inherent to the wedge-boundary definition in Equation (1). This requires a recalculation of the wedge's footprint along the LoS-axis of the light-cone. Thus, for each slice along the LoS at comoving distance r_∥: (i) select the section of the light-cone in the range r_∥ ± Δ/2 for Δ = 192 cMpc, (ii) multiply the selected section by the Blackman-Harris (BH) tapering function along the LoS-axis, (iii) 3D Fourier transform the product and zero out all Fourier modes located outside of the EoR window, (iv) inverse Fourier transform the result, saving only the central slice.
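A minimal numpy/scipy sketch of this rolling procedure is given below. The EoR-window mask is reduced to a single wedge slope per slice through an assumed helper, slope_of_slice, and the section half-width and low-k_∥ cut are illustrative, so this is a sketch of the four steps above rather than the exact implementation.

```python
import numpy as np
from scipy.signal.windows import blackmanharris

def eor_window_mask(kx, ky, kz, slope, k_par_min=0.03):
    """True for modes inside the EoR window: above the wedge boundary and
    above the intrinsic-foreground strip at low k_par."""
    KX, KY, KZ = np.meshgrid(kx, ky, kz, indexing="ij")
    k_perp = np.sqrt(KX ** 2 + KY ** 2)
    return (np.abs(KZ) > slope * k_perp) & (np.abs(KZ) >= k_par_min)

def wedge_filter_lightcone(lc, slope_of_slice, voxel=1.5, half_width_vox=64):
    """Rolling wedge excision along the LoS (last axis), steps (i)-(iv) above."""
    nx, ny, nz = lc.shape
    win = blackmanharris(2 * half_width_vox)                 # (ii) BH taper
    kx = 2 * np.pi * np.fft.fftfreq(nx, d=voxel)
    ky = 2 * np.pi * np.fft.fftfreq(ny, d=voxel)
    kz = 2 * np.pi * np.fft.fftfreq(2 * half_width_vox, d=voxel)
    out = np.zeros((nx, ny, nz - 2 * half_width_vox))
    for i in range(half_width_vox, nz - half_width_vox):
        sec = lc[:, :, i - half_width_vox:i + half_width_vox] * win   # (i)+(ii)
        ft = np.fft.fftn(sec)                                         # (iii)
        ft *= eor_window_mask(kx, ky, kz, slope_of_slice(i))
        out[:, :, i - half_width_vox] = np.real(np.fft.ifftn(ft))[:, :, half_width_vox]  # (iv)
    return out

# e.g. filtered = wedge_filter_lightcone(lightcone, slope_of_slice=lambda i: 3.0)
```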
The foreground wedge removal algorithm outlined above reduces the dimension of light-cones along the LoS because the required Δ/2 buffer in step (i) causes an edge effect. Thus, the foreground wedge-removed light-cones that are passed to our neural network have dimension 128 × 128 × 512 voxels (corresponding to a volume of 192 × 192 × 768 cMpc³). Each of the aforementioned algorithms calculates the wedge boundary using the pessimistic assumption θ_FoV = π/2; this choice of θ_FoV maximizes the footprint of the wedge. Additionally, intrinsic foregrounds (see Figure 1) are assumed to span a width

Δk_∥ ≃ 2π H_0 ν_21 E(z) / [c (1 + z)² (ν_z_min − ν_z_max)],

where H_0 is Hubble's constant, ν_21 is the rest-frame emission frequency of the 21-cm line, and ν_z_min and ν_z_max are the redshifted 21-cm frequencies evaluated at the minimum and maximum redshifts z_min and z_max of the light-cone section considered, respectively. For the redshifts considered in this work, 0.05 ≳ Δk_∥ ≳ 0.03 cMpc⁻¹. In the case of coeval-boxes, where the redshift is constant across the entirety of the box, Δk_∥ is assumed to be 0.03 cMpc⁻¹; for the particular coeval-box volumes we consider, this amounts to removing the first k_∥ mode. Given a noisy 21cmFAST coeval-box or light-cone, the above algorithms produce what SKA1-Low would see after foreground-contaminated wedge modes have been excised. Figure 4 illustrates the distortions introduced by instrumental effects and the foreground removal procedure on 21cmFAST light-cones. Evidently, the morphology of structures both along and transverse to the LoS-axis is considerably deformed.

U-Net Architecture
The neural network architecture employed in this work is the same as the U-Net presented in GH21, which in turn draws heavily from the architecture presented in Isensee et al. (2019). A schematic of the U-Net we use is shown in Figure 5. The context modules (shown in light blue) in the downwards (left) path of the U-Net consist of successive 3 × 3 × 3 convolutional layers. In the upwards (right) path of the U-Net, upsampling modules (shown in orange) consist of a three-dimensional upsampling layer followed by a 3 × 3 × 3 convolutional layer, localization modules (shown in dark blue) consist of a 3 × 3 × 3 convolutional layer followed by a 1 × 1 × 1 convolutional layer, and segmentation layers (shown in grey) consist of a three-dimensional upsampling layer.
As a reminder, the problem our U-Net is configured to solve is one of two-class image segmentation.A well-trained network should thus be able to identify each voxel within a foreground wedge removed 21-cm map as neutral or ionized (1 or 0).While our network is not designed to produce exact binary outputs, a binarization filter is used when evaluating predictions on the testing set.Such external binarization was implemented to incentivize the network to produce near-binary outputs.We emphasize that this binarization filter is not part of the U-Net architecture, and is therefore not present in Figure 5.As in GH21, we find that recovery performance is insensitive to the binarization threshold (changing minimally when varying the threshold from 0 to 1), confirming that our network has been successfully incentivized.We implement the same binarization threshold of 0.9 as GH21 on the basis that it classifies even slightly ionized regions (as determined by our U-Net) as completely ionized.This process maps all voxels in the prediction with a value greater than or equal to 0.9 to 1 and less than 0.9 to 0.
During training, we pass normalized (between 0 and 1) foreground wedge-removed 21-cm maps as inputs and binarized 21-cm maps as labels to our network. For this binarization of labels (i.e., separate from the binarization of the predictions discussed previously), we use a threshold of ΔT_21 = 0 mK, such that neutral voxels, where ΔT_21 > 0, have a value of 1 and ionized voxels, where ΔT_21 = 0, have a value of 0. These binarized 21-cm maps act as the ground-truth comparison for the network, and are used to compute the loss during training and the recovery prediction statistics after training. We train our network using a modified binary dice coefficient (BDC) loss function,

L_BDC = − (2|G ∩ P| + ε) / (|G| + |P| + ε),

where G and P are the ground-truth and prediction arrays, ε is an additive parameter for numerical stability, and |...| denotes the cardinality of the respective array. During training ε is set to 1, which is comparatively small relative to the size of the arrays we are working with (e.g. |G| ≈ 10⁷). The BDC measures the voxel-wise agreement between the ground-truth and prediction and has demonstrated considerable utility in the fields of computer vision and biomedical image segmentation (Milletari et al. 2016; Jadon 2020).
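In code, the loss reduces to a couple of array sums. The numpy sketch below assumes the common soft-dice form in which ε enters both the numerator and the denominator; where exactly ε is added in the original implementation is an assumption here.

```python
import numpy as np

def bdc_loss(ground_truth, prediction, eps=1.0):
    """Negative binary dice coefficient between near-binary 3D maps.

    For binary arrays the intersection |G ∩ P| reduces to a voxelwise
    product-sum; eps (set to 1 during training) guards against division by zero.
    """
    intersection = np.sum(ground_truth * prediction)
    return -(2.0 * intersection + eps) / (np.sum(ground_truth) + np.sum(prediction) + eps)
```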
As a departure from the network configuration presented in GH21, we introduce minor modifications after performing hyperparameter tuning on a few key network parameters. These included the rate of spatial dropout in context modules, the degree of L2 regularization in convolutional layers, and the batch size. While the effects of spatial dropout were also explored in GH21, we introduced L2 regularization to combat model overfitting and improve training stability. With this regularization term, Ω, the total loss function (Equation 6) is L = L_BDC + Ω, and is therefore no longer bounded above by 0. At early epochs, a large Ω can thus overwhelm L_BDC, producing a large positive L. We found that applying L2 regularization with λ_L2 = 0.01 only to the kernel (not the bias) of our U-Net's convolutional layers, implementing a spatial dropout rate of 0.3, and a batch size of 3, produced the best results, minimizing the loss function and stabilizing the learning process.
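For concreteness, the snippet below shows how those tuned values might look in a Keras-style context module. It is an illustrative sketch under the assumption of a TensorFlow/Keras implementation, not the exact GH21 or present-work code.

```python
import tensorflow as tf

def context_module(x, filters):
    """Two 3x3x3 convolutions with kernel-only L2 regularization,
    followed by spatial dropout, using the tuned values quoted above."""
    reg = tf.keras.regularizers.l2(0.01)
    for _ in range(2):
        x = tf.keras.layers.Conv3D(filters, 3, padding="same", activation="relu",
                                   kernel_regularizer=reg)(x)
    return tf.keras.layers.SpatialDropout3D(0.3)(x)

# Training would then use a batch size of 3, e.g. model.fit(..., batch_size=3).
```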

U-Net Predictions
Sample two-dimensional slices of the U-Net's three-dimensional predictions on the noiseless and SKA-noisy approximate light-cone test sets are displayed in Figure 7.One should note that the noiseless and noisy datasets have the same noiseless ground-truth binarized 21-cm brightness temperature maps during training.Thus, all comparisons during training are made in reference to the same ground-truth, independent of the noise characteristics of the particular dataset.
To quantify the U-Net's recovery performance on each test set we use accuracy, precision, recall, and mean intersection-over-union (mean IoU) as metrics.To compute these statistics, we apply the binary classification scheme in Table 1 to the voxels of our predicted maps.
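Concretely, all four statistics (defined formally below) follow from the voxel counts of Table 1. A minimal numpy sketch, treating ionized voxels (zeros in the binarized maps) as the positive class, is:

```python
import numpy as np

def recovery_metrics(truth_bin, pred_bin):
    """Accuracy, precision, recall, and mean IoU from binarized maps
    (1 = neutral, 0 = ionized), with ionized as the positive class."""
    t_ion, p_ion = (truth_bin == 0), (pred_bin == 0)
    tp = np.sum(t_ion & p_ion)
    fp = np.sum(~t_ion & p_ion)
    fn = np.sum(t_ion & ~p_ion)
    tn = np.sum(~t_ion & ~p_ion)
    accuracy = (tp + tn) / truth_bin.size
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    mean_iou = 0.5 * (tp / (tp + fp + fn) + tn / (tn + fp + fn))
    return accuracy, precision, recall, mean_iou
```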
Using the classification presented in Table 1, accuracy is defined as

Accuracy = (TP + TN) / N_vox,   (7)

and indicates how often the predictions match their labels, where N_vox is the total number of voxels. Precision is defined as

Precision = TP / (TP + FP) = (N_vox correctly labelled as ionized) / (N_vox ionized in prediction),   (8)

and measures how many of the voxels the network labels as ionized are truly ionized, while recall,

Recall = TP / (TP + FN),   (9)

measures how many of the truly ionized voxels are labelled as ionized by the network. The final statistic, mean IoU, quantifies the degree of overlap between the ground-truth and prediction,

mean IoU = (1/2) [ TP/(TP + FP + FN) + TN/(TN + FP + FN) ].   (10)

We compute these statistics over each of the four test sets described in Section 3.2 as well as the two approximate light-cone datasets (noiseless and noisy). Figures 8 and 9 show the value of each statistic as a function of neutral fraction and redshift. Given the steadily increasing accuracy, precision, recall, and mean IoU curves in Figures 8 and 9, it is evident that the overall recovery performance of our network increases as redshift decreases, a trend consistent with that reported in Hassan et al. (2020). In Figure 8, the decline in performance toward the highest neutral fractions reflects the fact that recovering the small, isolated ionized bubbles present at the beginning of reionization is very difficult (and similarly for tiny neutral islands when reionization is almost entirely complete).
The redshift-dependent performance between these extremes can be explained intuitively by considering how the structures of interest in binarized 21-cm maps evolve with redshift. As reionization progresses, smaller, isolated bubbles merge to form larger ionized bubbles. These larger structures are more easily learned by our network, and thus are more consistently reproduced in predicted maps. Therefore, our network is well-suited to identify and reconstruct the location and general morphology of the largest ionized bubbles in the simulation volumes. This conclusion is quantitatively supported by considering the normalized cross-power spectra of the ground-truth and prediction arrays, defined as

N(k_⊥, k_∥) = G̃ P̃* / √( (G̃ G̃*)(P̃ P̃*) ),

where G̃ and P̃ are the three-dimensional Fourier transforms of the ground-truth and prediction arrays G and P, respectively. Implicit in this equation is the binning of each product (whether G̃P̃*, G̃G̃*, or P̃P̃*) into (k_⊥, k_∥) bins. In the case of a perfect reconstruction, where G = P, N = 1 for all (k_⊥, k_∥). The two-dimensional normalized cross-power spectrum provides a means of quantifying the fidelity of the network's recovery on different spatial scales along different axes. Further, because we excise all modes lying within the foreground wedge, N(k_⊥, k_∥) demonstrates explicitly how well our network is able to recover excised modes. Figures 10 and 11 show the normalized cross-power spectra for the noiseless and SKA-noisy coeval-box and light-cone predictions, respectively. The boundary of the foreground wedge is plotted overtop the spectra, such that all (k_⊥, k_∥)-modes lying between the boundary and the k_⊥-axis are reconstructed by the U-Net. There is clearly a successful reconstruction at some level (especially at low k ≡ √(k_⊥² + k_∥²)). While the recovered modes certainly are not perfect (and at very high k there is essentially no recovery at all), Figure 7 shows that the limited recovery is at least enough to accurately reconstruct the larger ionized regions.
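A hedged numpy sketch of how such a cylindrically binned normalized cross-power can be computed is given below; the bin edges and the use of the real part of the cross term are illustrative choices, not necessarily those used for Figures 10 and 11.

```python
import numpy as np

def normalized_cross_power(g, p, voxel=1.5, n_perp=20, n_par=20):
    """Cylindrically binned normalized cross-power N(k_perp, k_par) between a
    ground-truth cube g and a prediction cube p (LoS along the last axis).
    Empty (k_perp, k_par) bins return nan."""
    G, P = np.fft.fftn(g), np.fft.fftn(p)
    cross = np.real(G * np.conj(P))
    auto_g, auto_p = np.abs(G) ** 2, np.abs(P) ** 2

    kx = 2 * np.pi * np.fft.fftfreq(g.shape[0], d=voxel)
    ky = 2 * np.pi * np.fft.fftfreq(g.shape[1], d=voxel)
    kz = 2 * np.pi * np.fft.fftfreq(g.shape[2], d=voxel)
    KX, KY, KZ = np.meshgrid(kx, ky, kz, indexing="ij")
    k_perp, k_par = np.sqrt(KX ** 2 + KY ** 2), np.abs(KZ)

    perp_edges = np.linspace(0, k_perp.max(), n_perp + 1)
    par_edges = np.linspace(0, k_par.max(), n_par + 1)
    i = np.digitize(k_perp.ravel(), perp_edges) - 1
    j = np.digitize(k_par.ravel(), par_edges) - 1

    def binned(x):
        out = np.zeros((n_perp + 1, n_par + 1))
        np.add.at(out, (i, j), x.ravel())   # sum each product into its bin
        return out

    with np.errstate(invalid="ignore", divide="ignore"):
        return binned(cross) / np.sqrt(binned(auto_g) * binned(auto_p))
```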
The accentuated drop-off in recovery performance towards higher redshifts for the noisy datasets in Figures 8 and 9 is also noticeable in Figure 10, and is due in part to the compounding effect of a declining signal-to-noise ratio (SNR). This is a result of the redshift dependence inherent to Equation (3), and further explains the expanding offset between the performance statistics of the noiseless and noisy test sets in Figures 8 and 9, and the larger discrepancies between the noiseless and noisy spectra in Figure 10 at higher redshifts.
One should note that the prominent sawtooth structure apparent in Figure 9 arises due to the effects of binarizing light-cone slices that are generated by interpolating between successive coeval-boxes within the 21cmFAST light-cone generation algorithm.This structure is therefore not an artefact of the U-Net recovery, but rather a product of the binarization filter necessary to compute the recovery statistics.
One important takeaway from Figures 9 and 11 is that the network's recovery performance over the approximate light-cone datasets is consistent with the "true" light-cone datasets.This indicates that we have reasonably captured some of the more salient characteristics of the standard 21cmFAST light-cone when constructing light-cones from coeval-boxes with very coarse redshift spacings.
In summary, we have demonstrated that our U-Net can successfully reconstruct foreground wedge-removed coeval boxes across a wide range of redshifts.As a notable advancement to GH21, we have extended the U-Net's capability to process light-cones, which incorporate the redshift evolution that will be implicit in real observations.Further, we have demonstrated that the U-Net's recovery is still reliable when instrumental limitations and noise are accounted for.

GALAXY RECOVERY
Using the recovered approximate light-cones along with their corresponding halo catalogues, we now shift to a discussion of galaxy recovery. As alluded to previously, the utility of recovered light-cones is two-fold: (1) for galaxy surveys completed prior to the availability of tomographic 21-cm datasets, recovered light-cones will supplement existing galaxy catalogues with information regarding the ionization state of the surveyed regions; (2) once tomographic 21-cm datasets are available, light-cones may guide searches for galaxies in ionized regions. For example, if a collection of ionized regions is identified where galaxies have yet to be detected, proposals for follow-up observations with higher sensitivity may be scheduled to
probe for lower-luminosity galaxies. The size of recovered ionized regions may also be used to prioritize spectroscopic follow-up. For example, given Lyman-α emitters are expected to reside in large ionized regions during the EoR (e.g. Furlanetto et al. 2004), one may prioritize these regions for follow-up given their importance as a highly sensitive probe of reionization (e.g. Haiman 2002; McQuinn et al. 2007).
With respect to (1), it is of interest to quantify how well the galaxy luminosity function (LF) measured in the predicted ionized regions matches the LF of galaxies located in the ground-truth ionized regions. To summarize the implications of an imperfect light-cone recovery for the inferred galaxy LF, we compute a multiplicative correction factor to map from the inferred to the true galaxy LF.
To perform this analysis, we adopt the relationship between halo mass and rest-UV luminosity in Mirocha et al. (2016) (see Figure 12), a semi-empirical model calibrated to high-z luminosity functions from Bouwens et al. (2015), consistent with the Park et al. (2019) parameterization employed in 21cmFASTv3.The model assumes a double power-law relationship between halo mass and the efficiency of star formation, and a constant conversion factor between star formation rate and UV luminosity from the BPASS version 1.0 models (Eldridge & Stanway 2009) for solar metallicity stars forming at a constant rate for 100 Myr.Dust reddening is neglected in these models for simplicity.
Having converted the halo catalogues into galaxy catalogues using the aforementioned relation, we sort galaxies using an analogous classification scheme to that presented in Table 1: (i) true positive galaxies (TP gal ) are located in ionized voxels of the ground-truth and ionized voxels of the prediction, (ii) false negative galaxies (FN gal ) are located in ionized voxels of the ground-truth and neutral voxels of the prediction, (iii) false positive galaxies (FP gal ) are located in neutral voxels of the ground-truth and ionized voxels of the prediction, (iv) true negative galaxies (TN gal ) are located in neutral voxels of the ground-truth and neutral voxels of the prediction.
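The sorting itself is a direct voxel lookup. A minimal sketch, with galaxy positions assumed to be in cMpc on the light-cone grid and ionized voxels equal to 0, is:

```python
import numpy as np

def classify_galaxies(gal_pos_cmpc, truth_bin, pred_bin, voxel=1.5):
    """Sort galaxies into TP/FN/FP/TN classes from the ionization state
    (ionized = 0) of their host voxel in the ground-truth and predicted maps."""
    idx = np.floor(gal_pos_cmpc / voxel).astype(int) % np.array(truth_bin.shape)
    t_ion = truth_bin[idx[:, 0], idx[:, 1], idx[:, 2]] == 0
    p_ion = pred_bin[idx[:, 0], idx[:, 1], idx[:, 2]] == 0
    return {"TP": t_ion & p_ion, "FN": t_ion & ~p_ion,
            "FP": ~t_ion & p_ion, "TN": ~t_ion & ~p_ion}
```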
Using this classification scheme, we define the following LFs: (i) the ground truth ionized (GTI) LF, Φ_GTI, as the LF of galaxies located in ionized voxels of the ground-truth (TP_gal + FN_gal), (ii) the ground truth neutral (GTN) LF, Φ_GTN, as the LF of galaxies located in neutral voxels of the ground-truth (TN_gal + FP_gal), (iii) the predicted ionized (PI) LF, Φ_PI, as the LF of galaxies located in ionized voxels of the prediction (TP_gal + FP_gal), (iv) the predicted neutral (PN) LF, Φ_PN, as the LF of galaxies located in neutral voxels of the prediction (TN_gal + FN_gal), and (v) the global LF, Φ_Global, as the LF of all galaxies (TP_gal + TN_gal + FP_gal + FN_gal), irrespective of their location in an ionized or neutral voxel.

Figure 8. The accuracy, precision, recall, and mean intersection-over-union (mean IoU) computed using Equations (7)-(10) for the noiseless and SKA-noisy coeval-box test sets as a function of neutral fraction (and redshift). The green shaded region denotes the neutral fraction interval on which the network was trained, while data points outside of this region represent out-of-distribution samples used to evaluate the generalizability of the U-Net. Each data point represents the mean over 10 coeval-box realizations at the same redshift and is plotted with 1σ error bars. There is an evident drop-off in recovery performance as a function of increasing neutral fraction for both the noiseless and SKA-noisy test sets.

Figure 9. The accuracy, precision, recall, and mean intersection-over-union (mean IoU) computed using Equations (7)-(10) for the noiseless and SKA-noisy light-cone "true" (solid curves) and approximate (circular markers) test sets as a function of neutral fraction. The shaded regions indicate the 1σ error bands for each "true" statistic (computed over 50 light-cones). The sawtooth structure of the curves arises due to the effects of binarizing light-cone slices that are generated by interpolating between successive coeval-boxes in 21cmFAST.
The mean Φ_GTI, Φ_GTN, Φ_PI, Φ_PN, and Φ_Global computed over 50 noiseless and noisy approximate light-cone galaxy catalogues are presented in Figure 13. There is a notable discrepancy between the magnitudes of Φ_GTN and Φ_GTI (Φ_GTI ≫ Φ_GTN), given the vast majority of galaxies reside in ionized regions of the ground-truth maps (see Figure 2). While the gap between Φ_PN and Φ_PI is noticeably smaller, across both the noiseless and SKA-noisy datasets the majority of galaxies are still located in ionized regions of the recovered light-cones. This suggests that a galaxy search limited to the ionized regions of recovered light-cones will yield the largest fraction of the total galaxy population. Thus, we may optimize follow-up observations by allocating less observing time to regions with a lower probability of containing galaxies (neutral regions). To determine the redshift at which a targeted search is most efficient, we consider Figure 14, where the fraction of the simulated volume that is labelled as ionized by our network is plotted against the fraction of total galaxy counts located in the predicted ionized regions (the grey-dashed line represents the scenario where galaxies are randomly distributed in the simulation volume). Of the four redshifts considered in Figure 14, our proposed scheme of using recovered ionization maps as guides for galaxy searches is most efficient at redshift 7.90 for the noiseless dataset and redshift 7.29 for the SKA-noisy dataset (in the sense that these would maximize the number of galaxies found per volume searched).
If one is to use the ionized regions of recovered light-cones to determine the LF of galaxies in ionized regions during the EoR, the inferred LF would follow the definition of Φ_PI. Given we are interested in Φ_GTI, we define an absolute magnitude-dependent correction factor Θ(M_UV) to provide a mapping from the inferred LF to the true LF. This multiplicative correction factor is defined as

Θ(M_UV) = ⟨Φ_GTI(M_UV)⟩ / ⟨Φ_PI(M_UV)⟩,

where ⟨...⟩ denotes an average over our ensemble of simulations. This correction factor, computed using the LFs in Figure 13, is shown in the first row of Figure 15. To evaluate the generality of this correction factor, we compute the mean Φ_GTI and Φ_PI over an additional set of 50 light-cone galaxy catalogues, separate from those used to compute Θ(M_UV). These are shown in the second row of Figure 15, alongside the corrected PI LF, Φ_CorrPI(M_UV) = Θ(M_UV) × Φ_PI(M_UV). To evaluate how well Φ_CorrPI agrees with Φ_GTI, we compute the relative error between the two, shown in the third row of Figure 15. The range of limiting absolute magnitudes of various present and upcoming galaxy surveys at the relevant redshifts is plotted as vertical bars in all subplots of Figure 15; in this work, we consider representative JWST ultra-deep (UD), medium-deep (MD), wide-field (WF), and Roman galaxy surveys. Observing where the different survey thresholds intersect the curves in Figure 15 provides an indication of how significantly a survey's theoretically observable galaxy population will be impacted by Θ(M_UV). For example, given Roman is projected to observe only the brightest galaxies at EoR redshifts, this coincides with the domain where 1.2 ≳ Θ(M_UV) ≳ 1.0 and 1.7 ≳ Θ(M_UV) ≳ 1.0, for the noiseless and noisy light-cone galaxy catalogues, respectively. Conversely, surveys such as JWST-UD will be able to observe nearly the entire LF, requiring the application of a larger correction factor to the fainter end.
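In practice, Θ(M_UV) is simply the ratio of two ensemble-averaged, binned LFs. A minimal sketch of computing and applying it, with an assumed array layout of realizations by magnitude bins, is:

```python
import numpy as np

def correction_factor(phi_gti, phi_pi):
    """Theta(M_UV): ratio of the ensemble-mean GTI and PI LFs per M_UV bin.

    phi_gti, phi_pi : arrays of shape (n_realizations, n_magnitude_bins)
    """
    return np.mean(phi_gti, axis=0) / np.mean(phi_pi, axis=0)

# Applied to an independent set of catalogues:
# phi_corr_pi = correction_factor(phi_gti_train, phi_pi_train) * mean_phi_pi_test
```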
As redshift decreases, the vertical offset between Φ_GTI and Φ_PI in Figure 15 decreases as well, resulting in curves that nearly completely overlap by redshift 6.24. As a result, Θ(M_UV, z = 6.24) ≈ 1 across nearly the entire M_UV range. This trend is consistent across both the noiseless and SKA-noisy galaxy catalogues, and may be attributed to the overall improvement in U-Net recovery performance at lower redshifts. It is important to note that Θ(M_UV) > 1 for all four redshifts we consider. This is due to our network "under-ionizing": producing predicted light-cones with a greater neutral fraction than the ground-truth light-cones. Figure 7 demonstrates this explicitly, whereby there are fewer ionized (white) regions in the predictions compared to the ground-truth. As a result, fewer galaxies are located in predicted ionized regions. This discrepancy is increasingly prevalent at higher redshifts, explaining the growing amplitude of Θ(M_UV) in Figure 15 as a function of redshift. The underlying M_UV-evolution of Θ(M_UV) at higher redshifts suggests that our network more reliably recovers the ionized regions containing brighter galaxies (or more massive halos). Given our U-Net is well-suited to consistently identify the largest ionized regions, this indicates a relationship between galaxy luminosity and ionized bubble size, whereby the brightest galaxies (or most massive halos) reside in the largest ionized bubbles (although in more detailed models there can be situations where the brightest galaxies do not always reside in the centres of large ionized regions; see Mirocha et al. 2021). In principle, this relationship can be leveraged to further improve the performance of our network, but we leave this possibility to future study. The relative error between Φ_GTI and Φ_CorrPI in Figure 15 varies considerably at lower M_UV. This may be attributed to a higher variance in the number of extremely bright galaxies (or the most massive halos) present in each simulation volume. The growth of ΔΦ(M_UV) with increasing redshift may also be attributed to the higher variance in recovery performance presented in Figure 9.

CONCLUSIONS
In this paper, we have successfully expanded upon the U-Net recovery framework originally presented in GH21. As an important development to the work of GH21, we extend the U-Net's capability
to process 21-cm light-cones, more closely resembling the observations that will be available from interferometers such as SKA1-Low in the near future. In parallel with this, we perform hyperparameter optimization to improve the overall predictive performance of the U-Net algorithm. We demonstrate that our network is able to reliably identify the location and morphology of the largest ionized regions in foreground wedge-removed 21-cm images at EoR redshifts (see Figure 7). Our investigations underline that the U-Net recovery retains some level of reliability even when the instrumental limitations and noise of SKA1-Low are considered, exhibiting a manageable redshift-dependent downturn in predictive performance. We detail the U-Net's redshift-dependent performance as a function of various binary classification metrics and outline the extent of the U-Net's reconstruction effort across different spatial scales using the normalized cross-power spectrum of the ground-truth and prediction light-cones (see Figures 8, 9, 10, and 11).
As the principal advancement of this work, we establish a connection between the U-Net recovery framework and high-redshift galaxy catalogues. In doing so, we illustrate how U-Net recovered light-cones can provide information regarding the ionization state of the IGM surrounding the galaxies that will be surveyed by current and next-generation instruments such as JWST and Roman. We subsequently outline how the ionized regions of recovered light-cones may be used as a guide for follow-up observations of high-redshift galaxies. We additionally demonstrate how the luminosity function of galaxies located in the ionized bubbles of U-Net recovered light-cones can be corrected to recover the true LF of galaxies in ground-truth ionized regions. We provide estimates of the luminosity-dependent correction factor and evaluate the efficacy of a targeted galaxy search over a range of EoR redshifts (see Figures 14 and 15).
In future work, comparing the distribution of ionized bubble radii in the ground-truth and recovered light-cones will provide further quantitative insight into the U-Net recovery effort. This will allow for the practical study of the relationship between galaxies (and their properties) and the radii of the ionized regions in which they reside. Given that the simulation model we employ conventionally assumes an inside-out model of reionization, we acknowledge that alternative outside-in models are not considered in this work. Modifying the underlying simulation architecture to account for outside-in reionization (see Pagano & Liu 2020; Pagano & Liu 2021) may therefore drastically change the conclusions of our galaxy survey-related investigations (given galaxies would then preferentially reside in neutral regions). In this direction, future work may also benefit from a generalization to a variable set of astrophysical and cosmological parameters that more accurately reflects our current understanding of reionization. As such, we recognize that the validity of our results in this proof-of-concept study is somewhat limited to the fiducial set of parameters and processes assumed in our suite of 21cmFASTv3 simulations. Subsequent analysis may also benefit from incorporating the information present in high-redshift galaxy catalogues into the existing U-Net framework. Providing galaxy location information alongside foreground wedge-removed 21-cm images may serve to improve the overall reconstruction fidelity and is the subject of future work. The implementation of alternative machine learning-based models may also result in better reconstructions. In particular, the use of a probabilistic model (e.g. denoising U-Nets; Masipa et al. 2023) may improve the recovery of small-scale power where our deterministic U-Net is currently lacking.
Coupling next-generation 21-cm interferometers with upcoming high-redshift galaxy surveys will enable further insight into how the high-redshift galaxy luminosity function varies across ionization environments during the EoR.Developing novel data analysis frameworks that both mitigate astrophysical foreground contamination and exploit the complementarity of these two classes of observations will ultimately sharpen our understanding of the EoR and the sources that drive its evolution.

Figure 2 .
Figure 2. Left: Halo mass functions computed after running the 21cmFAST halo finder on coeval-boxes comprising the approximate light-cones at each of the composite redshifts z = [6.24, 6.74, 7.29, 7.90]. Right: A two-dimensional slice of a binarized 21-cm brightness temperature field at z = 7.29 overlayed with the corresponding halo field (orange). When binarizing the 21cmFAST brightness temperature field, we apply a strict binarization threshold of 0 mK, mapping all pixels with ΔT_21 > 0 to x_HI = 1. Black pixels represent regions that are completely neutral or partially ionized in the original 21-cm brightness temperature field, while white regions represent completely ionized regions in the original field. Note that nearly all halos fall in ionized regions of the map.

Figure 3 .
Figure 3. SKA1-Low uv-coverage at 167 MHz (z = 7.5), for 2000 hrs of total observation. The pixel intensity N_uv denotes the number of times a particular uv-mode is observed. White regions indicate uv-modes that are not observed by the interferometer (N_uv = 0). The baselines of SKA1-Low span a vast dynamic range and can thus probe a variety of uv-modes. Given that uv-modes with higher N_uv will have a higher signal-to-noise, and considering the concentration of N_uv at lower |u| in this plot, lower uv-modes are less noisy than higher uv-modes.

Figure 4 .
Figure 4. Sample noiseless and noisy 21-cm brightness temperature light-cones before and after foreground wedge removal. The "WF" superscript in the second and fourth rows denotes the foreground wedge-removed field. The left column displays a slice taken along the LoS-axis of the light-cone extending from z = 6.01 to z = 8.22. The right column displays a slice taken transverse to the LoS-axis at z = 7.5.

Figure 5 .
Figure 5. Block diagram of the U-Net.The dimension of the input image D is reduced in the downwards (left) path and increased in the upwards (right) path.

Figure 6 .
Figure 6. Upper: Training (solid) and validation (dashed) loss curves for each of the coeval-box datasets. Lower: Training (solid) and validation (dashed) loss curves for each of the light-cone datasets. The lack of a significant offset between any pair of training and validation curves is indicative of a well-trained, generalized model. The lack of any appreciable gradient in the slope of the loss curves at later epochs suggests we have hit the limitations of our network architecture and training regimen. Note that we have implemented an upper limit on the y-axes that masks the vertical extension of the learning curves where L > 0. This was done to reduce the dynamic range of the plot, increasing the scale of the relevant behaviour visible at later epochs.

Figure 7 .
Figure 7. Example network predictions along the LoS-axis (first column) and transverse to the LoS-axis (second column) for noiseless and SKA-noisy approximate light-cone datasets.The second and third rows show the U-Net's binarized predictions for the foreground-wedge removed noiseless and SKA-noisy light-cones, respectively.It is clear that our network has successfully reconstructed the general form of the large-scale structure in the ground-truth light-cone.Notably, the network is not able to reconstruct the correct shape and location of smaller scale ionized bubbles.

Figure 10 .
Figure 10. Mean normalized cross-power of the binarized ground-truth and U-Net predicted coeval-boxes at redshifts 6.75, 7.50, and 8.25. The boundary of the foreground wedge computed at each redshift using Equation (1) is overlayed in red. Note that we have limited the extent of the k_⊥-axis to ∼ 0.2 cMpc⁻¹ given N ∼ 0 for all modes beyond this cutoff.

Figure 11 .
Figure 11. Mean normalized cross-power of the binarized ground-truth and U-Net predicted light-cones. The boundary of the foreground wedge for redshifts 6.00, 7.11, and 8.22 is overlayed in red, white, and orange, respectively; these redshifts represent the minimum, median, and maximum wedge-angle used to excise the foreground wedge from light-cones using the algorithm described in Section 2.3. As in Figure 10, we have again limited the extent of the k_⊥-axis to ∼ 0.2 cMpc⁻¹ given N ∼ 0 for all modes beyond this cutoff.

Figure 12 .
Figure 12. UV luminosity to halo mass relation used to convert halo catalogues into galaxy catalogues. The dashed vertical line denotes the turnover mass used by the 21cmFAST halo finder (10^8.7 M_⊙).

Figure 14 .
Figure 14. The mean fraction of the total galaxy population present in the recovered ionized regions of noiseless (top panel) and SKA-noisy (bottom panel) approximate light-cones at redshifts 6.24, 6.74, 7.29, and 7.90.

Figure 15 .
Figure 15. First row: The ratios of Φ_GTI to Φ_PI for the noiseless and SKA-noisy datasets at z = [6.24, 6.74, 7.29, 7.90]. The range of limiting absolute UV-magnitudes for representative Roman, JWST-WF, JWST-MD, and JWST-UD galaxy surveys are plotted as grey bars. Their width is determined by the minimum M_UV that is detectable at the minimum and maximum redshifts we consider, given the fiducial limiting apparent magnitudes of 32, 30.6, 29.3, and 26.5 for the JWST-UD, JWST-MD, JWST-WF, and Roman galaxy surveys, respectively (Mason et al. 2015). All galaxies with an M_UV less than the left edge of each survey's grey bar are luminous enough to be detected by the survey at all redshifts we consider. Second row: Noiseless and SKA-noisy GTI (dashed line) and PI (dot-dashed line) LFs from an additional set of 50 light-cone galaxy catalogues, plotted alongside the corrected PI (CorrPI, solid line) LF computed using the correction factors in the first row. Third row: The relative error between Φ_GTI and Φ_CorrPI for the noiseless and SKA-noisy datasets.
Figure 1. A qualitative picture of the footprint of foreground contamination relevant to 21-cm radio interferometers. While intrinsic foregrounds uniformly contaminate low k_∥ modes, foreground leakage beyond this region leads to the formation of the foreground wedge, parameterized by the wedge angle ϕ. The EoR Window denotes the region of Fourier space where foregrounds are suppressed, in principle allowing for a clean measurement of the 21-cm signal.


Table 1 .
Voxel classification scheme used to compute performance statistics.