Deep learning approach for identification of HII regions during reionization in 21-cm observations

The upcoming Square Kilometre Array (SKA-Low) will map the distribution of neutral hydrogen during reionization, and produce a tremendous amount of 3D tomographic data. These images cubes will be subject to instrumental limitations, such as noise and limited resolution. Here we present SegU-Net, a stable and reliable method for identification of neutral and ionized regions in these images. SegU-Net is a U-Net architecture based convolutional neural network (CNN) for image segmentation. It is capable of segmenting our image data into meaningful features (ionized and neutral regions) with greater accuracy compared to previous methods. We can estimate the true ionization history from our mock observation of SKA with an observation time of 1000 h with more than 87 per cent accuracy. We also show that SegU-Net can be used to recover various topological summary statistics, such as size distributions and Betti numbers, with a relative difference of only a few per cent. These summary statistics characterise the non-Gaussian nature of the reionization process.


INTRODUCTION
The Epoch of reionization (EoR) is a period of great importance in the study of structure formation and evolution in the Universe. During this period, the predominately cold and neutral intergalactic medium (IGM) transitioned to a hot and ionized state due to the appearance of the first luminous cosmic sources. These sources, which may have been star-forming galaxies and quasi-stellar objects (QSOs), produced the ionizing photons, which over a period of approximately 500 million years completed the reionization of the Universe Zaroubi 2012;Ferrara & Pandolfi 2014).
This period is one of the least understood epochs in the history of the Universe, due to the lack of direct observations. Indirect constraints have been put on the reionization process based on observations of the Lyman-forest (e.g. Fan et al. 2006;McGreer et al. 2011McGreer et al. , 2014, the number density of Lyman-emitters (e.g. Ota et al. 2008;Ouchi et al. 2010;Robertson et al. 2015), high-z quasar spectra (e.g. Schroeder et al. 2013;Totani et al. 2016;Davies et al. 2018;Greig et al. 2019) and the measurement of the Thomson scattering optical depth towards the cosmic microwave background (CMB) (e.g. Komatsu et al. 2011;Aghanim et al. 2020).
The ground state of neutral hydrogen atom can produce a signal through a spin-flip transition, which is known as the 21-cm signal. This signal will be a unique signature of EoR (e.g. Madau et al. 1997;Furlanetto et al. 2006). When observed, this 21-cm signal would have redshifted to radio band of the electromagnetic spectrum. Various ★ Contact e-mail: M.Bianco@sussex.ac.uk radio experiments, such as Low Frequency Array 1 (LOFAR; e.g. van Haarlem et al. 2013), Murchison Widefield Array 2 (MWA; e.g. Tingay et al. 2013) and the Hydrogen Epoch of reionization Array 3 (HERA; e.g. DeBoer et al. 2017), have been trying to detect this signal. Recently these facilities have provided useful upper limits on the 21-cm power spectrum (e.g. Mertens et al. 2020;Trott et al. 2020) that have been used to derive constraints on the properties of reionization (e.g. Ghara et al. 2020;Ghara et al. 2021;Mondal et al. 2020;Greig et al. 2020a,b).
The 21-cm signal during the EoR will be highly non-Gaussian and therefore the power spectrum will not give a full statistical characterisation of it (e.g. Mellema et al. 2006;Ichikawa et al. 2010;Watkinson & Pritchard 2015;Majumdar et al. 2018;Giri et al. 2019c). In the coming years, the Square Kilometre Array 4 (SKA) will be built. The low-frequency component of the SKA will be sensitive enough to detect the 21-cm signal produced during EoR and create images of its distribution on the sky (Mellema et al. 2013;Wyithe et al. 2015;Koopmans et al. 2015). These images contain more information about our Universe as the detection of the signal at different observed frequencies depict the distribution of neutral hydrogen at a given time during the EoR.
SKA-Low will observe a sequence of such 21-cm images from different redshifts that will constitute a three-dimensional set of data known as a tomographic dataset. The evolution of the 21-cm signal can be seen along the redshift axis. See for example Giri (2019) for more description about tomographic 21-cm images. The reionization process is driven by growing H regions, often referred to as bubbles (e.g. Furlanetto et al. 2004). As the sources of ionizing photons reside inside them, observing these bubbles and their evolution will be interesting. Numerous studies have provided various methods to detect and study properties of H bubbles (e.g. Datta et al. 2007;Zackrisson et al. 2020;Mason & Gronke 2020). We can also study the properties of reionization with 21-cm images (Giri et al. 2018(Giri et al. , 2019a. However, tomographic images from SKA-Low will be prone to instrumental limitations, such as noise, limited resolution and foreground contamination (e.g. Koopmans et al. 2015;Ghara et al. 2016). In the field of image processing, methods that can classify objects or features in images into meaningful segments are known as 'image segmentation' methods. Giri et al. (2018) implemented an image segmentation method to classify neutral and ionized regions in 21-cm images in the presence of instrumental limitations and demonstrated that key properties of reionization can be derived from such observations. Artificial intelligence (AI) and deep learning methods are capable of learning patterns in image data and identifying interesting regions. Image segmentation based on AI is quite popular in the field of data analysis and has been applied to study objects with complex visual form contained in big data (Long et al. 2014). In recent years, several papers made use of machine learning techniques for a range of problems in astrophysics (e.g. Lee 2019; Giri et al. 2019b;Yoshiura et al. 2020;Chen et al. 2020) and cosmology (e.g. Jeffrey et al. 2020;Sadr & Farsian 2020;Guzman & Meyers 2021). In the case of reionization, several of these methods are aimed to either remove foreground emission (Li et al. 2019;Makinen et al. 2021;Villanueva-Domingo & Villaescusa-Navarro 2021), emulate reionization simulations (e.g. Kern et al. 2017;Schmit & Pritchard 2018;Jennings et al. 2018;Cohen et al. 2020;Ghara et al. 2020) or constrain reionization history (e.g. Shimabukuro & Semelin 2017;Chardin et al. 2019;Mangena et al. 2020;Shimabukuro et al. 2020) and its astrophysical inputs (e.g. Sullivan et al. 2017;Gillet et al. 2019;Hassan et al. 2020).
In this work, we present a new approach for the identification of the distribution of H regions in 21-cm images using a deep learning method named U-shaped convolutional neural network (U-Net), which is specially designed for image segmentation and feature extraction (Ronneberger et al. 2015). In our case, we adapt this network for processing our image data, which are mock observations of the 21-cm signal during the EoR. The method will segment the images into ionized and neutral regions. We call this framework SegU-Net.
This paper is organised as follows. In § 2 we present how we generate the simulated data sets used for this work. In § 3 we describe the design of our neural network, including the error estimation. In § 4 we discuss its application to our simulated SKA-Low data sets, considering a range of summary statistics such as the mean ionization fraction, power spectra and topological quantities such as size distributions and Betti numbers. In § 5 we test our framework on various instrumental noise levels, and in § 6 we test it on a data set produced from a fully numerical reionization simulation. We discuss and summarize our conclusions in § 7.

21-CM SIGNAL
For any deep learning based method, we need a data set containing a sample of all the possible scenarios, known as the training set. In § 2.1, we describe the reionization simulation code that we use to create the training set. The observable for radio telescopes observing

Reionization simulation
To train our network, we require a large set of simulations that represent the 21-cm radio signal for a wide range of redshift during reionization and different assumptions about the astrophysical sources of ionizing radiation. To do so we employ py21cmFAST, the Python wrapped version of the semi-numerical cosmological simulation code 21cmFAST Murray et al. 2020). The code computes the evolution of the matter density field using the Zel'dovich approximation (Zel'Dovich 1970). The ionization field and the corresponding 21-cm differential brightness temperature are then calculated from the matter density distribution based on the excursion set formalism (Furlanetto et al. 2004;Mesinger & Furlanetto 2007), which considers a region to be ionized when the fraction of collapsed matter fluctuation exceeds a mass threshold. The ionization fraction HII ( ) at a position is given as, where is the ionizing efficiency of high redshift galaxies and coll ( s , min ) is the fraction of collapsed matter within radius s that can form haloes with mass greater than min . coll is calculated at every pixel varying s within 0 and mfp . The maximum value of coll is used in Equation 1. mfp implements the effect of a finite mean free path for ionizing photons in the ionized IGM.
The cosmological parameters considered in this work are based on WMAP 5 years data observation (Komatsu et al. 2009) and consistent with Aghanim et al. (2020) results. We assume a flat ΛCDM cosmology with the following parameters, Ω Λ = 0.73, Ω = 0.27,

Differential brightness temperature
Radio interferometry based telescopes record the differential brightness temperature b while observing the redshifted 21-cm signal. b depends on position on the sky and redshift and can be given as (e.g. Mellema et al. 2013 where HI , b , CMB and s are neutral fraction, baryon density contrast, CMB temperature and spin temperature respectively. Previous studies have shown that our Universe will be heated before reionization begins (e.g. Pritchard & Furlanetto 2007; Ross Green contours indicate the boundary between neutral and ionized regions after reducing the resolution to an observation with a maximum baseline of = 2 km and matching frequency resolution. Bottom left: the 21-cm signal at simulation resolution. Top right: The 21-cm signal plus noise realisation at simulation resolution for an observing time of 1000 hours. To mimic the effect of the lack of a zero baseline, the mean signal has been subtracted. Bottom right: The noisy 21-cm image after smoothing to the resolution to an observation with a maximum baseline of = 2 km and matching frequency resolution. This is an example of a smoothed box slice used during the network training. The solid black line shows the same contour as in the top left panel. et al. 2017, 2019). Therefore we assume s CMB throughout this work, which is known as the spin saturated approximation and is relevant at lower redshift 12 (e.g. Furlanetto et al. 2004;Furlanetto 2006). In the spin saturated approximation scenario, the differential brightness signal is always in emission ( b ≥ 0 mK) and locations with b = 0 mK correspond to H regions.

Mock 21 cm observation
In order to train SegU-Net for application to actual observations, we need a training set of mock observations. We create these mock observations by simulating the b using the methods described in previous sections and adding instrumental effects, such as the absence of zero baselines, limited resolution and noise. We follow the methods in Ghara et al. (2016) and Giri et al. (2018) for mimicking the expected effects of SKA1-Low.
We consider a simulation volume of (256 Mpc) 3 and an intrinsic resolution of Δ = 2 Mpc for simulating the signal. This intrinsic resolution corresponds to an angular aperture of Δ = 0.777 arcmin and a frequency depth of Δ = 0.124 MHz along the line of sight at = 7. As an example, in Figure 1, we show a coeval cube slice of the neutral fraction field and b field in the top left and bottom left panels, respectively. These slices are taken from the epoch when the universe was about 50 per cent ionized. For each b coeval cube, we assume one axis as the line of sight or frequency direction and subtract the mean signal from each frequency channel, such that this could be considered as a sub-volume from the 3D tomographic data set. We consider this simulation as our reference throughout the results analysis in §4, its astrophysical parameters are given in Table 2.
We simulate the instrumental noise using the method given in Giri et al. (2018) and implemented in Tools21cm 5 (Giri et al. 2020). We change the noise seed for each new member of the training set so that the network is trained on different noise realisations and we list our assumed parameters for the telescope setup in Table 1. In the top right panel of Figure 1, we show a slice from the simulated noise cube produced from 1000 hours of observation with SKA1-Low at simulation resolution. When we add this noise to our simulated signal at the simulation resolution, we cannot discern any feature of the signal as the noise is several orders of magnitude higher than the signal. Therefore we reduce the resolution of the noisy signal in the field-of-view direction by smoothing with a Gaussian kernel with full-width at half maximum (FWHM) of 0 (1 + )/ , where is the maximum baseline. For example, = 2 km corresponds to a resolution of 2.905 arcmins at redshift ≈ 7 and 3.631 arcmins at redshift ≈ 9 respectively. In the frequency direction we reduce the resolution by convolving with a top-hat bandwidth filter of a width matching the FWHM of the angular smoothing in comoving units. This width corresponds to 0.462 MHz at redshift ≈ 7 and 0.551 MHz at redshift ≈ 9 respectively. In the bottom right panel of Figure 1 we show a slice from our noisy signal at this reduced resolution. At this resolution, the smallest H regions seen in the top left panel of Figure 1 can no longer be discerned. However, we can still identify the larger H regions.
To illustrate what we can achieve with these images, we apply the same smoothing to the neutral fraction field and apply a threshold of th = 0.5 to label neutral/ionized regions. We refer to the smoothed and then binarised neutral fraction field as the ground truth. We use this field to compare the accuracy of the recovered binary field throughout our paper. We want to point out that this is different from the ground-truth of the original reionization simulation as the limited resolution of the radio telescope will limit the observation of small scale features. Then, we over-plot the boundaries of these ionized regions, the neutral fraction slice and signal slice in top-left and bottom-right panels of Figure 1 respectively.

Training and testing set
For our training set, we randomly sample the astrophysical simulation parameters by a normal distribution, such that the ionizing efficiency of high-redshift galaxies is sampled with N ∼ (52.5, 20), the mean free path of ionizing photons mfp with N ∼ (12.5 Mpc, 5 Mpc) and the (logarithmically-spaced) minimum virial temperature for halos to host star-forming galaxies min vir with N ∼ (4.65, 0.5). The choice of these values is such that for a majority of the samples most of the reionization history ( V HI from 0.9 to 0.1) falls within the redshift interval 9 to 7. The redshift is randomly sampled with a uniform distribution U ∼ [7,9]. The initial conditions of the cosmological density field are changed for each simulation. This helps us avoid the impact of cosmic variance on our trained model. With the list of all the parameter values, we produce 10,000 mock observations of the 21-cm signal. Out of these mock observations, we use 15 per cent as the so-called network validation set. This validation set is used during the training method to provides an unbiased evaluation of the network model fit.
Eventually, we will use SegU-Net on actual 21-cm image observations. Here we rely on an additional 300 mock observations as the testing or prediction set. Just as for the training set, the parameter values are randomly chosen. We call this the 'random' testing set. The training process is blind to the prediction set. Apart from the above testing set, we create an additional simulation with fixed values of astrophysical parameters (given in Table 2). We have chosen these values such that between = 9 and 7 reionization proceeds from V HI ≈ 0.9 to 0.1. We call this set the 'fiducial' testing set. Since the signal evolves as reionization progresses in this testing set, it better An example of a smoothed cube slice from the C 2 RAY simulation on the right employed to test the stability and reliability of the network. This slice is for = 8.06 and corresponds to a volume-averaged neutral fraction of V HI = 0.38. As for the training set, at simulation resolution, we subtracted the mean signal in the frequency direction from the differential brightness temperature. We then added simulated instrumental noise for the observed time of 1000 hours and smoothed the signal with the same baseline as SKA1-Low. Left panel: the binary field recovered with our neural network. In red/blue, the prediction performed with our network and the green contour shows the boundary between neutral and ionized region. The same contour is shown with a solid black line on the right panel for comparison. mimics the upcoming 21-cm observations. With this testing set, we will test SegU-Net's capability to capture the evolution of structures and recover the binary field from untrained data in § 4.

Fully-numerical simulations testing set
To train SegU-Net, we relied on 21cmFAST for creating the training set. However, our Universe may not exactly be described by this semi-numerical code. If our neural network has learnt to find structures in 21cmFAST simulations only, then we cannot use it for SKA observations. To ensure that the neural network is not over-fitted, we consider a different reionization simulation code to build the mock observations.
We first simulate the matter density field and track the evolution of cosmic structures by using the CUBEP 3 M -body code (Harnois-Déraps et al. 2013). The simulation is carried out in a volume of (349Mpc) 3 with 64 billion particles. Dark matter haloes down to a mass of 10 9 is found at various redshift using the spherical average halo-finder (Watson et al. 2013), meanwhile haloes with masses between 10 8 and 10 9 are implemented with a sub-grid method (Ahn et al. 2015). We use the same cosmology that is given in § 2.1.
We then employ the C 2 RAY radiative transfer (RT) code (Mellema et al. 2006) to simulate the cosmic reionization. C 2 RAY requires the matter density field in a 3D grid. Therefore the distribution -body particles are put in 3D grids with a smoothed particle hydrodynamic method (e.g. Shapiro et al. 1996;Mao et al. 2019). This grid has spatial resolution of Δ = 2.1 Mpc and a 166 3 mesh-grid. Sources ionizing photon production rate per unit time is proportional to the mass of the hosting halo halo such that.
where is the proton mas and Δ = 11.53 Myr is the stars lifetime. The efficiency factor of sources is defined as = ★ esc where ★ is the star formation efficiency, esc is the photons escape fraction and is the stars ionizing photon production efficiency per stellar atom. The efficiency factor for halos with masses halo < 10 9 is set to = 2. For the the lower mass halos it is initially set to = 8.2. When their environment becomes ionized (above 10 percent), their The orange arrow indicates a 2D convolutional layer, followed by batch normalization and ReLU activation. Pooling operations followed by dropout layer are indicated with green arrows. The blue arrow indicates an up-sampling layer by transposed 2D convolutional layer and with a red arrow the closing layer, a 2D convolution followed by a sigmoid activation function. The descending path on the left side divides the resolution of the image after each pooling operation and doubles the channel dimension after each convolution. On the other hand, the expansion path doubles the spatial dimension at each up-sampling operation and decreases the channel dimension after concatenation with its counterpart layer in the descending path. efficiency is reduced to = 2 to account for radiative feedback. C 2 RAY outputs the hydrogen ionization field at a time interval of 11.5 million years. For more details on the RT and -body simulations methods, see Iliev et al. (2012) and Bianco et al. (2021).
We derive the differential brightness temperature b from the ionization field and the density using Equation 2. We select four outputs, which are at redshifts = 7. 96, 8.06, 8.17, 8.28, 8.40, 8.52, 8.64, corresponding to a volume averaged neutral fraction of V HI = 0.17, 0.29, 0.42, 0.57, 0.70, 0.81, 0.90, respectively. The simulated b from these epochs are converted into mock observations using the procedure outlines in § 2.3. We use these mock observations as a testing set.
The right panel of Figure 2 shows a slice of the calculated b for redshift = 8.06 ( V HI = 0.38 at simulation resolution). Similar to the bottom right panel of Figure 1, we add the instrumental noise corresponding to a 1000 h observation and smooth the signal to a resolution corresponding to a maximum baseline = 2 km. The black contours correspond to the boundary between neutral and ionized regions. These boundaries are derived from the simulated neutral fraction field at the same resolution as the b data set.

U-NET FOR 21-CM IMAGE SEGMENTATION
Here we describe our machine learning method for identifying ionized and neutral regions in noisy 21-cm images and our approach to estimate the uncertainty of its results in § 3.1 and § 3.2 respectively.

Our network, SegU-Net
Our segmentation network 6 is based on the U-Net framework first introduced by Ronneberger et al. (2015). U-Net consists of two likewise symmetric paths, an encoder operator that contracts the image and a decoder operator that expands the extracted features. The encoder corresponds to a classical convolutional neural network (CNN). This 6 https://github.com/micbia/SegU-Net CNN aims to reduce the size of the input image in such a way that only information of the most interesting features remains. A series of concatenated convolution operations (layers) returns a low dimensional latent space (or latent vector) that contains information about these extracted features. In Appendix A we provide a visual representation of the low dimensional latent space for the example case of a sphere. We show a schematic representation of the U-Net in Figure 3. The left part of the U-shape in the diagram and the bottom layer represents the encoder and the low dimensional latent space respectively. For a detailed discussion of CNNs, we refer the reader to Mehta et al. (2019), and for examples of employing CNNs to infer cosmological and astrophysical parameters in the context of reionization to Gillet et al. (2019) and Hassan et al. (2020). In our case, the information in the latent space (or latent vector) of U-Net (bottom layer) is expanded by a decoder into a binary map of the same size as the input image. The right part of the U-shape of the diagram in Figure 3 represents the decoder. The decoder gradually increases the spatial resolution of the latent vector with an up-sampling operation (transposed convolution) until we obtain the same dimension of the input image. After each up-sampling step, the output is combined with the corresponding encoder layer with the same dimension. We illustrate this further in Appendix B with an example.
Even though each of our image data sets is 3D, SegU-Net is trained on 2D slices. We identify structures in 3D image data by running on every slice along the third axis. Tests show that the method is not sensitive to the choice of the third axis. When compared to a neural network trained on 3D data, we found that our approach is computationally less expensive without loss of accuracy. Therefore the U-Net architecture described in this work is only applied to 2D image data.
The structure of the encoder layers consists of two convolutional blocks followed by a 2D max-pooling layer (MaxPool) of size 2x2 and a 5 per cent rate dropout layer (Drop). This regularization technique randomly shuts down a portion of the layer neurons to avoid over-fitting (Hinton et al. 2012;Srivastava et al. 2014). The convolutional block (ConvBlock) consists of a 2D convolution layer (Conv2D) with 3x3 kernel size. We add a layer that normalizes the previous input layer over the batch sample to avoid over-fitting (BN) (Ioffe & Szegedy 2015) and as an activation function we employ a Rectified Linear Unit (ReLU) activator (Jarrett et al. 2009;Glorot et al. 2011), ConvBlock=Conv2D+BN+ReLU. This layer structure is repeated for a total of four levels (Encoder-Level). At each step, the dimension of the input image is halved by the max-pooling operation. The number of feature channels is doubled by the convolutional layer, Encoder-Level=2*ConvBlock+MaxPool+Drop. The decoder structure is somewhat similar to the encoder. We replace the pooling operation with a transposed 2D convolution (TConv2D) (Dumoulin & Visin 2016; Zeiler & Fergus 2013), that has an opposite scaling effect on the resolution and channel size. This layer output is then concatenated (CC) with the corresponding encoder level to preserve the features extracted in the contracting path. This step is followed by a dropout layer and two convolutional blocks, Decoder-Level=TConv2D+CC+Drop+2*ConvBlock. The final output consists of a 2D convolutional layer followed by a sigmoid activation. Our network has a total of 23 2D convolutional layers distributed on four down-and up-sampling scaling levels and a bottom layer, for a total of approximately 2.5 million trainable parameters. In Figure 3, we show our best performing network and label the shape of the output from each intermediate hidden layer of this network. More details are provided in Appendix A and B.
During our training process, the hyperparameters of the network are learnt by minimizing a loss function. We employ the balanced cross-entropy (BCE) (Salehi et al. 2017), where ∈ {0; 1} is the pixel-wise ground truth,ˆthe predicted value, the batch size, which is our case is of size 32 and the parameter = =0 is the average volume neutral fraction of the batch. In our context, at early/late stage of reionization the statistical weight of the ionized/neutral pixels are under-represented. This situation is known in data science as a problem affected by "class unbalanced" data. To deal with this we use the above loss function which has been shown to be well suited for segmentation problems that are affected by class unbalanced data (Cui et al. 2019). We further used the Adaptive Moment Estimator Adam (Kingma & Ba 2014), an optimized stochastic gradient descent algorithm for error minimization. The initial learning rate, the step size of the rate of convergence that minimizes the loss function, is set to 10 −3 . We trained the network using 2 GPUs, and it took approximately 1,500 wall clock hours.

Uncertainty estimation on SegU-Net
One of the main drawbacks of machine learning is that it is unable to quantify uncertainties and confidence intervals for its predictions, and only recently attempts have been made (Charnock et al. 2020;Hortúa et al. 2020) to include error estimation. However, this has not yet been generally implemented for U-Nets. Additionally, if not well optimized, neural networks are prone to over-fitting and tend to be biased. Therefore, we have developed an error estimation procedure to be used during the prediction process. This procedure gives our network additional power by providing a pixel by pixel error map.
Image manipulations, such as zooming, shifting along an axis, flipping axes and rotation along an axis, are commonly performed on 2D or 3D image training data to increase the number of samples (Simonyan & Zisserman 2015;Szegedy et al. 2015). This technique is known as time-test augmentation (TTA) of data (Perez & Wang 2017;Wang et al. 2020). Here we use this approach to estimate the error on the final result. We perform several copies (∼ 100) of the same sample during the prediction process through image manipulations. These manipulated copies are then independently processed by SegU-Net. Each of the recovered binary fields is transformed back. We calculate the final result as the average of these fields and the per-pixel standard deviation to estimate the error for each pixel.
An example of the pixel per pixel error map can be seen in Figure 4 (right-most panel). We will discuss this figure further in § 4.1. This simple method provides our neural network with an uncertainty estimation for each labelled pixel.

RESULTS
Once the network is trained, we want to estimate how well it recovers the binary field from noisy 21-cm images. To do so, we include in our analysis the state-of-the-art Super-Pixel method presented in Giri . SLIC groups similar pixels in images into "super-pixels". These Super-Pixels are then classified into neutral and ionized ones to get the final map containing the identified features. In previous studies, this method has been shown to be superior compared to other methods, such as putting a simple threshold to the mean signal (used in e.g. Kakiichi et al. 2017), the k-means method (used in e.g. Giri et al. 2018) or the maximum deviation method (used in e.g. Gazagnes et al. 2021). The Super-Pixel method proves to be quite efficient in recovering the binary fields from noisy 21-cm images. The summary statistics extracted from those are accurately reproducing the ones obtained using the simulation data sets. As shown by Giri et al. (2018), the choice for the number of super-pixels depends on the simulation box size and resolution. In our case, we tested for a few values between 500 and 7000. We noticed that above the value of 5000, the algorithm becomes more computationally expensive without yielding a substantial increase in the segmentation accuracy. Hence we employ 5000 super-pixels.

Visual comparison
To start, we show a visual comparison of slices in Figure 4. We compare the predicted binary field recovered by the Super-Pixel method (left-most panel) and SegU-Net (central panel) with the ground truth (green contours in both panels). As explained in § 2.3 the ground truth is the boundary of ionized regions extracted from the simulation neutral fraction field at the same resolution by putting a threshold of 0.5. The red and blue pixels represent neutral and ionized pixels, respectively. In the right-most panel, we show the pixel-error estimated from SegU-Net with a colour bar. The error is determined by calculating the standard deviation of the same pixel from the differ- Table 3. Summary of the Matthews correlation coefficient score (in per cent) of our two test sets for the two feature identification methods.

2).
SegU-Net shows better precision in recovering shapes of the ionized regions compared to the Super-Pixel method. As expected, most of the network uncertainty is located at the boundaries of neutral regions or between two large ionized bubbles when these are percolating, and the gap is getting narrower. This uncertainty has a direct bearing on small neutral islands of a few Mpc scale, residing in vast ionized regions. Moreover, larger uncertainties, std ≥ 0.25 are located around narrow ionized regions protruding into large neutral regions (e.g. in Figure 4 right-most panel, at coordinates ∼ 140 Mpc and ∼ 125 Mpc). This behaviour suggests that the uncertainty mainly depends on the contrast between the local neutral and ionized regions. The network selects regions in the image based on the largest gradient in the 21-cm signal intensities to recover the binary field. Therefore, we expect larger uncertainties for reionization scenarios in which the contrast in the 21-cm intensities are relatively small.

Correlation coefficient
To compare the predicted ionized fields from the 21-cm images mathematically, we use the Matthews correlation coefficient (MCC) (also known as coefficient) defined as: where TP and TN are the total numbers of neutral and ionized pixels recovered correctly, respectively. FP is the total numbers of pixels incorrectly guessed as neutral and FN is the total numbers of pixels incorrectly guessed as ionized. In our case, a positive/negative result corresponds to the neutral/ionized case since the quantity 1 in our binary fields indicates the neutral condition and 0 the ionized. Thus, MCC is a useful metric to correlate binary fields. In Figure 5 we show the MCC estimated from the fields segmented into ionized and neutral regions in our testing sets. In the left panel, we provide a scatter plot of MCC values against the reionization history ( v HI ) for the 'random' testing set. We indicate the redshift of the realization by the colour of the points and respective confidence interval with an error bar. We show the number of samples in our training set at a different neutral fraction in an inset panel. After a first attempt, we realized that to overcome the unbalanced class problem requires a better representation of the early ( v HI ≈ 1) and late stages of reionization ( v HI ≈ 0). For this reason, we increased the number of training samples for these stages. Therefore the distribution of samples against neutral fraction has a bimodal shape with peaks at approximately v HI ≈ 0.1 and 0.9. As a result, the value for the overall prediction data set ( Figure 5, left panel) is about 87 per cent for SegU-Net (blue dashed line) and 62 per cent in the case of the Super-Pixel method (orange dashed line). The noise level increases with redshift. Therefore the score is slightly less accurate for redshift ≥ 8.25 with an 85 per cent accuracy, meanwhile higher for lower redshift ≤ 7.75 with 88 per cent. In the future, we consider increasing the proportion of the training data with high redshift to decrease this performance dissimilarity. The same trend is present in the case of the Super-Pixel method, with an accuracy of 60 per cent and 63 per cent, respectively.
In the right panel of Figure 5, we compare the MCC values from SegU-Net (blue line with circles) with that from the Super-Pixel method (orange line with squares) for our 'fiducial' simulation. As we already know from Giri et al. (2018), the Super-Pixel method performs best for V HI ≈ 0.5 and deteriorates towards earlier and later stages of reionization. The reason for this behaviour is that during these stages, structures are usually smaller and, therefore, more difficult to identify. With SegU-Net we are able to overcome this problem by employing a specifically designed BCE loss function (Equation 4) during the validation process after each training epoch. Therefore, the average value for the 'fiducial' simulation is about 91 per cent for SegU-Net (blue dashed line) and 70 per cent in the case of the Super-Pixel method (orange dashed line). In Table 3, we summarise the score for the two test sets.

Average neutral fraction
After identifying the ionized regions, we can determine the volumeaveraged neutral fraction v HI , which quantifies the reionization history. In Figure 6 we show the volume-averaged neutral fraction v HI, predicted as calculated from the recovered binary fields extracted by the two methods. In the left panel we show the v HI, predicted from the SegU-Net outputs against the true volume-averaged neutral fraction v HI, true for our 'random' testing set. The colour of the points indicates the redshifts. The black dashed line indicates v HI, predicted = v HI, true . Except for a few points, all the points lie on or near the black dashed line.
In the right panel of Figure 6 we compare the results of V HI, predicted derived with the Super-Pixel method (orange line with squares) and SegU-Net (blue line with circles) for our 'fiducial' simulation. Again, the black dashed line represent V HI, predicted = V HI, true . In the case of our neural network all results lie within the half standard deviation (0.5-) of the true value (gray dashed lines). With Super-Pixel 2) stages of reionization respectively. On each panel, we show the size distributions from the binary fields of the 'fiducial' simulation recovered by SegU-Net (blue line) and its respective confidence interval (blue shadow). Black dashed lines and orange lines give the size distributions of the ground truth and binary field recovered by the Super-Pixel method. At the bottom of each size distribution panel, we show the relative deviation from the true binary field distribution. method, this is true only from V HI ≈ 0.5 to 0.85. The recovered neutral fraction is either underestimated at V HI > 0.6 or largely overestimate for V HI < 0.4.

Size Distributions
From the 3D tomographic data that will be produced with the upcoming SKA experiment, we will be able to study the size distribution of neutral or ionized region during the EoR. ionized regions are often called bubbles and whereas neutral regions are referred to as islands. The Bubble and Island size distributions (BSDs and ISDs) are useful to derive the properties of reionization and its evolution (Xu et al. 2017;Giri et al. 2019a). Several approaches were presented to calculate this distribution (Lin et al. 2016;Kakiichi et al. 2017;Giri et al. 2018). In this work, we employ the Mean-Free-Path method (MFP; Mesinger & Furlanetto 2007;Giri et al. 2018) to calculate the size distribution ( d d ) of recovered neutral (ISD) and ionized field (BSD). Previous works have demonstrated that this method should be preferred since the calculated size distributions are almost unbiased (Lin et al. 2016;Giri et al. 2018).
In the left and right columns of Figure 7 we show the ISDs, respectively BSDs of the binary fields recovered with SegU-Net (blue line) and Super-Pixel method (orange line) compared to the ground truth (black dashed line). The Super-Pixel method performs best when the simulation is halfway through the reionization process v HI = 0.5 (central panel). However, it is considerably less accurate compared to SegU-Net. We show the relative difference with the ground truth in the plots below the ISDs and BSDs. The blue shaded region shows the error on each of the size distributions determined by SegU-Net. In both the ISD and BSD case, the main difference between the two recovered distribution occurs at the earlier v HI = 0.8 (top) and later v HI = 0.2 (bottom) stages of reionization. SegU-Net shows a relative difference of a few per cent while the distributions determined from the Super-Pixel segmentations show relative differences up to 10 per cent for large sizes.

Dimensionless Power Spectra
The dimensionless power spectrum of the neutral field is defined as Δ 2 x x = 3 x x ( )/2 2 , where x x is the auto-power spectrum that quantifies the fluctuations due to the distribution of neutral regions. These fluctuations contribute to the 21-cm power spectrum that is observed with radio interferometric telescopes. See for example Furlanetto et al. (2006) and Lidz et al. (2007) for descriptions of the fluctuations of the 21-cm signal. In this section, we study the Δ 2 x x estimated from the neutral fields recovered from various methods.
In Figure 8, we consider the 'fiducial' simulation at three stages of reionization, which are V HI = 0.8 (top panel), V HI = 0.5 (central panel) and V HI = 0.2 (bottom panel). At the mid-point of reionization (central panel), the Super-Pixel method performs well at large scales < 0.2 Mpc −1 with a relative difference within 25 per cent for lower k-values. The Δ 2 x x of the neutral field recovered by the Super-Pixel method at early and late times have the correct shape but differ in magnitude. The Δ 2 x x of the neutral field recovered by SegU-Net match the ground truth well at all three stages of reionization. The network maintains a maximum difference compared with the ground truth, of a few tens of per cent at all scales. For 0.5 Mpc −1 , the network uncertainty interval grows to 25-50 per cent relative difference. Figure 8. Dimensionless power spectra of the neutral field from the fiducial simulation as recovered by our network (blue line) and its respective confidence interval (blue shadow). Compared at early, middle and late stage of reionization (from top to bottom V HI = 0.8, 0.5, 0.2) with the same quantity derived from the ground truth (black dashed line) and the Super-Pixel method (orange line). At the bottom of each panel, we show the relative difference compared to the ground truth for both cases, the network and Super-Pixel method. Figure 9. Comparison of the topology of the identified regions with Betti numbers estimated from the original neutral field (black dashed line), the SegU-Net (blue line with circles) and the Super-Pixel method (orange line with squares), for the case of our 'fiducial' simulation. The top, middle and bottom panels give 0 , 1 and 2 respectively. The Betti numbers recovered with SegU-Net matches the ground truth better than those recovered with the Super-Pixel method.

Betti numbers
During reionization ionized bubbles form, grow and connect with each other to form a complex topology . Various studies have proposed topological descriptors for this distribution, such as Euler characteristics (e.g. Friedrich et al. 2011) and Betti numbers (Elbers & van de Weygaert 2019;Giri & Mellema 2021;Kapahtia et al. 2021). Giri & Mellema (2021) pointed out that Betti numbers contain more information compared to the Euler characteristics. Therefore in this section we study the zeroth 0 , first 1 and second 2 Betti number (Betti 1870) of the binary 3D maps recovered by the two feature identification methods. 0 , 1 and 2 describe the number of isolated ionizing regions, tunnels and isolated neutral regions, respectively. In the top, middle and bottom panels of Figure 9, we show the 0 , 1 and 2 values estimated from the recovered binary fields of our 'fiducial' model at V HI between 0.1 and 0.9. The black, blue and orange curves represent the Betti numbers calculated from the ground truth, recovered field with SegU-Net and Super-Pixel method, respectively. In line with the results for the other quantities discussed above, we find that the topology recovered with SegU-Net is much closer to the ground truth than the one recovered by the Super-Pixel method.

TESTS ON DIFFERENT INSTRUMENTAL NOISE LEVELS
We have trained and tested SegU-Net for one specific noise level, corresponding to the theoretically expected noise for obs = 1000 h with the current design of SKA-Low. However, in practice, the noise level may differ from this, either because the observing time or telescope design is different from our assumptions or simply because the theoretical noise level is not achieved due to complications with the calibration. Therefore, it is important to test to which extent the performance of our network is sensitive to the noise level in the actual data. To change the noise level we choose different observing times, one shorter ( obs = 500 h) and one longer ( obs = 1500 h). The former case corresponds to a noise level √ 2 higher than used in the training set and the latter to a noise level which is √︁ 2/3 lower. In the left panel of Figure 10, we show the coefficient of the recovered binary field against the volume-averaged neutral fraction V HI . We compare the prediction on the reference simulation for the higher ( obs = 500 h, green line with squares) and lower noise case ( obs = 1500 h, red line with triangles) with the one using the noise level employed during the training and validation process ( obs = 1000 h, blue line with circles). It is evident from the plot that although the noise level does impact the accuracy of the results, we still achieve approximately the same level of precision as in our test case, as commented in § 4.2. In fact, the overall average accuracy, indicated with horizontal dashed lines in Figure 10, on the simulation of reference is 89 per cent for the higher noise case (green dashed line) and slightly better, 92 per cent, for the lower noise case (red dashed line). In both cases, there is a drop in performance down to 88 per cent accuracy during the early stages of reionization V HI > 0.7, due to the redshift dependency of the simulated noise.
We also want to test how far we can push our SegU-Net trained on data with obs = 1000 h instrumental noise to identify structures in the presence of a higher or lower noise level. In the right panel of Figure 10, we plot the coefficient at different observation times obs , for three different stages of reionization in our reference simulation, namely for volume-averaged neutral fractions V HI is 0.2 (blue line with squares), 0.5 (orange line with circles) and 0.8 (green line with of the recovered binary field against its volume-averaged neutral fraction. We compare the prediction set for a high noise level ( obs = 500 ℎ, green line with squares) and low noise level ( obs = 1500 ℎ, red line with triangles) against the noise level employed during the training ( obs = 1000 ℎ, blue line with squares). Horizontal dashed lines of the respective colour represent the MCC average score of the reference simulation. Right panel:, the evolution of the MCC score for increasing observation time for a set of mock observations with volume-averaged neutral fraction of V HI = 0.2 ( = 7.310), 0.5 ( = 8.032) and 0.8 ( = 8.720), respectively in blue, orange and green color. In the same panel, an inset plot shows the signal-to-noise ratio (SNR = 21 / noise ) of 21-cm images at a resolution corresponding to a maximum baseline of 2 km we achieve for different observation times.
triangles), corresponding to redshift = 7.310, 8.032 and 8.720. This plot shows that our network performs well for obs 500 h, where 0.85. The spike in the curve for V HI = 0.8 at obs = 1000 h is due to the fact that this is the noise level for which the network was trained.
To put our noise level into perspective, the inset plot in the right panel of Figure 10 shows the signal-to-noise ratio (SNR) achieved for different observation times. The SNR is defined as 21 / noise (e.g. Kakiichi et al. 2017), where 21 and noise are the standard deviations of the 21-cm signal and noise respectively at the resolution corresponding to a maximum baseline of 2 km. From this we conclude that a good performance, with the same accuracy as the 'random' testing set ( 0.85), requires a SNR 3.

TESTS ON A FULLY NUMERICAL SIMULATION
We applied our network to mock cubes calculated with the C 2 RAY code, presented in § 2.3.2, with a spatial resolution close to the 21cmFAST simulations employed in the training process, 2 Mpc, in order to obtain the same level of noise per pixel. A visual comparison of the recovered binary field, similar to the results in § 4.1, is shown in Figure 2. In the left panel, the red/blue colour indicates the network prediction We go through the same process presented in § 4. The score with SegU-Net is represented by the violet dots with error-bars on the right panel of Figure 5, from left to right we have redshift = 7. 96, 8.06, 8.17, 8.28, 8.40, 8.52 and 8.64 corresponding to a universe with volume-averaged neutral fraction of V HI = 0. 17, 0.29, 0.42, 0.57, 0.70, 0.81 and 0.90, green squares represent the score obtained with the Super-Pixel method. As we can see, our neural network is performing with similar accuracy as for the prediction set of semi-numerical simulations as discussed in § 4.1. For V HI ≈ 0.55 SegU-Net performs slightly better than the Super-Pixel method. The Super-Pixel method shows a drop in accuracy at the late ( V HI < 0.5) and early ( V HI > 0.8) stages of reionization.
We do the same comparison with the recovered volume-averaged neutral fraction V HI , in Figure 6 right panel, the green error-bar points are the same data as mentioned above. As we can see, also for the C 2 RAY simulation, SegU-Net recovered quantity resides within the 0.5-confidence interval (violet dots with error-bars). For the Super-Pixel results, this is true only for V HI = 0.57, 0.70 and 0.81, with approximately the same precision as SegU-Net in the case of V HI = 0.57 and slightly better results for V HI = 0.70.

DISCUSSION & CONCLUSIONS
This work has developed a convolutional neural network based on the U-Net architecture, which can be used to segment redshifted 21cm image observations into neutral and ionized regions. We have shown that this application of deep learning provides a fast and stable method that significantly improves the identification of ionized/neutral regions during the epoch of reionization over previously proposed methods. To train our network, we employ a large set of simulated mock observations of the 21-cm signal. Our image segmentation network, SegU-Net, also contains an uncertainty estimator. This uncertainty estimator is a simple but efficient application of the test-time augmented (TTA) technique. With this uncertainty estimator, our network can create a pixel by pixel error map during the prediction process. The pixel by pixel error map can later be used to determine the error in any quantity derived from the segmentation.
Once the network has been trained, the binary field's extraction is swift. In our case for simulations of volume (256 Mpc) 3 and meshgrid of 128 3 , a run in serial on a Intel® Core™i7-6500U CPU @ 2.5 GHz processor and using a 16 Gigabytes of RAM takes between 5 to 10 seconds. Including the pixel-error map calculation increases the computing time to approximately 10 minutes. For comparison, the Super-Pixel method typically requires several minutes to extract the binary field, where the actual time depends on the image pixel resolution and the number of Super-Pixels employed. The computational speed of our network thus makes it particularly useful as a component in a Bayesian statistical inference framework to perform EoR parameter estimation using tomographic statistics (e.g. Gazagnes et al. 2021).
We compare the accuracy of our approach with a feature finding method from the field of image processing, the so-called Super-Pixel method, which Giri et al. (2018) showed to be superior over simple thresholding methods. The results show that our neural network can identify neutral regions in the mock observations at least as well and often much better than the Super-Pixel method. We show a visual comparison and the resulting pixel per pixel error map tested on our 'fiducial' model. This error map gives valuable insight into the parts of the image that are hard to recover and helps us check for over-fitting.
We studied the accuracy of a range of derived quantities from the recovered binary fields, comparing the performance of SegU-Net with the Super-Pixel method. These quantities are the volumeaveraged ionization fraction -the evolution of which provides the reionization history, the size distribution of the ionized (BSD) and neutral (ISD) regions, the dimensionless power spectra of the recovered binary fields and the three Betti numbers, which quantify the topology of the segmented data sets. For all quantities, we find that the SegU-Net results are more accurate than the Super-Pixel results, especially for the early and late stages of reionization, where the Super-Pixel method often struggles to produce accurate results.
Machine learning methods generally are sensitive to the properties of the training set. Therefore, we tested SegU-Net on input data with different properties than the training set. First, we analysed the performance on data sets with different noise levels than the network was trained. We found that SegU-Net yields accurate results for data sets in which the noise level is characterised by an observing time of obs > 500 ℎ, which approximately corresponds to an SNR 3. Second, we applied the network to mock observations calculated from the results of a fully numerical reionization simulation, rather than the semi-numerical simulations used to train the network. We find that SegU-Net achieves the same level of accuracy when applied to this data set and therefore is not sensitive to the type of simulation employed during the training process.
We want to point out that similar efforts are being made by Gagnon-Hartman et al. (2021). They focus on reconstructing the segmented maps of ionized and neutral regions in the context of foreground mitigation using the foreground avoidance method (e.g. Liu & Tegmark 2011;Pober et al. 2014), and also consider the possibility of doing so with instruments that are not optimised for imaging such as HERA. We include the effect of instrumental noise and study in great detail the summary statistics of the reconstructed binary maps and the dependency of the results on the noise level. In the future, we will extend our study to include the impact of foreground mitigation strategies while recovering the summary statistics.
Here we assumed the spin temperature to be saturated ( S CMB ). However, it is possible to have such a scenario where this assumption fails, especially during the time when reionization starts. In the future, we will evolve our SegU-Net to identify H regions in such scenarios. Even though our network is built to identify H regions, U-Net architecture can be trained to identify any interesting feature. Before reionization started, the luminous sources heated the IGM and left its impact on the 21-cm signal (e.g. Ross et al. 2017Ross et al. , 2019. The U-Net architecture can also be trained to identify these heated regions.

APPENDIX A: SEGU-NET HIDDEN LAYER OUTPUTS
We test SegU-Net to see if it can recover the binary field for a simple case, namely a single spherical neutral region. We assume a uniform density field at = 8.032 and calculate the differential brightness temperature with Equation 2, adding noise corresponding to obs = 1000 h and reducing the resolution to correspond to a maximum baseline of = 2 km. In Figure A1, we show the input image of the sphere (left panel) and the corresponding recovered binary field by SegU-Net (right panel). The black contour in the left panel and the green contour in the right panel show the true boundary of the sphere. For this test SegU-Net achieves an accuracy of 98 per cent.
In Figure A2 we show the output of the bottom hidden layer of SegU-Net, which is the last layer of the left part of the U-shaped in Figure 3. The colour coding is such that blue correspond to negative, red to positive and white to zero values. This output gives a visual representation of the low dimensional latent space of our network encoder. In our case, this consists of 256 images, where each corresponds to a convolutional kernel and contains information about the image's extracted features. The encoder path contracts the input image from an original mesh size of 128 2 down to 8 2 . The information contained in the latent space is then expanded by SegU-Net decoder into a binary map of the same size as the input image (see right panel of Figure A1).

APPENDIX B: SKIP CONNECTION BETWEEN ENCODER AND DECODER LEVELS
The main advantage of a U-shaped network is that it overcomes the bottleneck limitation encountered by auto-encoder networks (a classical encoder/decoder architecture) by adding interconnections between the descending (encoder) and ascending (decoder) paths (Long et al. 2014;Ronneberger et al. 2015). These connections allow feature representations to pass through the bottleneck (bottom layer) and avoid loss of information due to contraction.
In Figure A3, we show a visual example of interconnections between the encoder (left part of the U-shape) and the decoder (right part). The top panel shows a schematic representation of our network architecture, and the bottom part displays a visual output of three hidden layers for the test case of a sphere. The left-most panel (connected by a green dashed line) shows the output of the second convolutional block in the encoder's second level. This block consists of 32 kernels with a mesh size of 64 2 . At this level, the shape and form of the input image are still visible. The centre panel (connected by a red dashed line) shows the result of the second to last up-sampling operation of the decoder. The number of kernels and mesh size match with the corresponding encoder layers. The skip connection merges the encoder and decoder-level output for a total of 64 images with mesh 64 2 . The right-most panel (connected with a black dashed line) shows the concatenation after a convolutional block. The effect of the up-sampling operation is still visible. Figure A1. Test SegU-Net on a spherical ionized region. Left panel: slice through the input image. The colour map shows the differential brightness temperature, and the black contour shows the boundary between neutral and ionized regions. Right panel: the recovered binary field with SegU-Net. The green contour represents the same boundary again. The identified neutral and ionized regions are indicated in red and blue, respectively.