Measuring the topology of reionization with Betti numbers

The distribution of ionised hydrogen during the epoch of reionization (EoR) has a complex morphology. We propose to measure the three-dimensional topology of ionised regions using the Betti numbers. These quantify the topology using the number of components, tunnels and cavities in any given field. Based on the results for a set of reionization simulations we find that the Betti numbers of the ionisation field show a characteristic evolution during reionization, with peaks in the different Betti numbers characterising different stages of the process. The shapes of their evolutionary curves can be fitted with simple analytical functions. We also observe that the evolution of the Betti numbers shows a clear connection with the percolation of the ionized and neutral regions and differs between different reionization scenarios. Through these properties, the Betti numbers provide a more useful description of the topology than the widely studied Euler characteristic or genus. The morphology of the ionisation field will be imprinted on the redshifted 21-cm signal from the EoR. We construct mock image cubes using the properties of the low-frequency element of the future Square Kilometre Array and show that we can extract the Betti numbers from such datasets if an observation time of 1000 h is used. Even for a much shorter observation time of 100 h, some topological information can be extracted for the middle and later stages of reionization. We also find that the topological information extracted from the mock 21-cm observations can put constraints on reionization models.


INTRODUCTION
The 21-cm signal produced by the hyperfine transition of neutral hydrogen is one of the most promising signals for probing the period from the history of our Universe when it transitioned from a cold and neutral state to a hot and ionised state (e.g. Furlanetto et al. 2006;Morales & Wyithe 2010;Zaroubi 2013). This transition period is known as the epoch of reionization (EoR), and ended around redshift ≈ 5.5-6 Keating et al. 2019). The 21-cm signal produced during EoR will be redshifted and found at the low frequency end of the radio band. It traces the distribution of neutral hydrogen gas present in the intergalactic medium (IGM). Observing this signal will therefore shed light not only on the progress of reionization but also on a range of other issues such as the formation of first compact structures (e.g. Barkana 2016).
The global signal experiment, Experiment to Detect the Global EoR Signature (EDGES; Bowman & Rogers 2010), has claimed a detection of the sky averaged 21-cm signal from ≈ 17 (Bowman et al. 2018). This detection is debated as its nature questions our current understanding of the early Universe (e.g. Hills et al. 2018;Singh & Subrahmanyan 2019). See Tashiro et al. (2014); Barkana (2018); Fialkov et al. (2018); Muñoz & Loeb (2018); Feng & Holder (2018) and Fialkov & Barkana (2019) for few possible theories that can explain the Bowman et al. (2018) observation. Other global experi-★ E-mail: sambit.giri@gmail.com ments, such as the Large-aperture Experiment to Detect the Dark Age (LEDA; Price et al. 2018) the Dark Ages Radio Explorer (DARE; Burns et al. 2017) and SARAS2 (Singh et al. 2017) are also trying to detect the global 21-cm signal. As we wait for an independent experiment to confirm the Bowman et al. (2018) observation, we will assume the standard cosmology in this study.
There also exists numerous efforts to detect the fluctuations in the signal using interferometric radio telescopes, such as the Low Frequency Array (LOFAR; e.g. van Haarlem et al. 2013), the Murchison Widefield Array (MWA; e.g. Wayth et al. 2018) and Hydrogen Epoch of Reionization Array (HERA; e.g. Deboer et al. 2015). Even though the fluctuations have not been detected, upper limits have been placed on the 21-cm power spectrum (e.g. Mertens et al. 2020;Trott et al. 2020) that have started to put constraints on the reionization process (e.g. Ghara et al. 2020;Mondal et al. 2020;Greig et al. 2020a,b). In the near future, the low frequency element of the Square Kilometre Array (SKA-Low; e.g. Mellema et al. 2013;Koopmans et al. 2015) will start observing the 21-cm signal produced during the EoR. SKA-Low is expected to be sensitive enough to not only detect the 21-cm power spectrum but also to produce images (e.g. Mellema et al. 2015;. The 21-cm signal produced during EoR is highly non-Gaussian and therefore cannot be completely characterised by the 21-cm power spectrum (e.g. Majumdar et al. 2018). Therefore we need to explore other summary statistics that are sensitive to the non-Gaussian information contained in the signal. Examples of such summary statistics that have been studied previously are the bispectrum (Majumdar et al. 2018;Watkinson et al. 2019), the position-dependent power spectrum (Giri et al. 2019b), size distributions (Kakiichi et al. 2017;Giri et al. 2018aGiri et al. , 2019a and fractal dimensions (Bandyopadhyay et al. 2017).
Yet another probe of the non-Gaussian information in the 21-cm signal is a summary statistics that describes the topology of the signal. Already Gnedin (2000) used the topology of the H (or ionised) regions to heuristically describe the EoR to consist of three stages, namely the preoverlap, overlap and post-overlap stage. Subsequent studies quantified the topology using the genus or its equivalent the Euler characteristic ( = 2 − 2 ) of the neutral hydrogen distribution (e.g. Gleser et al. 2006;Lee et al. 2008) or the ionization fraction field (Friedrich et al. 2011). These works could build on the extensive literature about the use of (or ) on the large scale structure (e.g. Gott et al. 1986;Gott et al. 1992;Matsubara 1994;Matsubara 1996;Schmalzing & Buchert 1997;Park et al. 2005).
The Euler characteristic is also equivalent to the third of the Minkowski functionals 3 and some works have considered the full set of these topological quantities ( 0 -3 , Friedrich et al. 2011;Yoshiura et al. 2017;Chen et al. 2019). Based on the evolution of the Minkowski functionals the latter study proposed a five stage description of the reionization process.
The Euler characteristic ( ) for a three-dimensional dataset can be written as where the terms on the right hand side are the number of isolated objects, tunnels and cavities. These terms are also known as the Betti numbers. In this paper we explore the information contained in the Betti numbers of the distribution of ionized regions during reionization. It is obvious that they contain more information compared to (e.g. Van De Weygaert et al. 2011). Specifically, they allow us to study the hierarchical aspect of the topological features of ionised bubbles as reionization proceeds. We consider how the Betti numbers evolve during the different stages of reionization, how they can describe the state of the IGM and whether the additional information they contain makes them more useful than the more commonly studied Euler characteristic/genus.
During reionization the distribution of ionised bubbles will imprint its topology on the observable 21-cm signal. Therefore we will also discuss the prospects of measuring the Betti numbers of 3D structures in the upcoming 21-cm observations with SKA-Low, considering both resolution and noise effects. As far as we know there exists only one previous work that considered the Betti numbers in the context of reionization. Kapahtia et al. (2019) only studied the two Betti numbers for two dimensional (2D) 21-cm images. However, reionization is a three dimensional (3D) process and SKA-Low will produce 3D data cubes consisting of images at a range of frequencies.
We will therefore study the three Betti numbers associated with the 3D topology.
There exists even more metrics to study the topology, such as the persistence homology (e.g. Edelsbrunner & Harer 2008;Zomorodian & Carlsson 2005) that is gaining attention in cosmology (e.g. Sousbie 2011;Sousbie et al. 2011;Pranav et al. 2017;Makarenko et al. 2018). In persistence homology, the evolution of each -dimensional holes in the analysed field is tracked. Elbers & Van De Weygaert (2019) have used persistence homology to study the H bubble network. We defer the exploration of persistence homology of 21-cm datasets to future studies.
The paper is structured as follows. In the next section, we describe the topological concepts used throughout the paper. Section 3 explains explains our large-scale reionization simulations and the = (1, −1, 0) (right), giving Euler characteristics = 2 and = 0, respectively. methods used to construct the 21-cm signal from simulation outputs. In Section 4 and 5, we discuss the properties and evolution of the Betti numbers of the ionisation and 21-cm signal fields respectively. We discuss the detectability of the Betti numbers in 21-cm observations of upcoming radio telescopes in Section 6. In the final section, we summarise our findings.

TOPOLOGICAL MEASURES
In this section, we describe the methods we use to measure the topology of our data. In Section 2.1, we explain the metric we will use to quantify the topology of any field. The fields we analyse in this study are given in form of digital data. Therefore we define an estimator for measuring the topology of digital data in Section 2.2.

Betti numbers
In this study, we quantify the topology of structures in our field with the Betti numbers ( ), which are topological invariant quantities (Betti 1870). This means they remain unchanged under transformations, such as translation, rotation and deformation.
is defined as the number of dimensional holes in the structure we analyse. Our datasets are three-dimensional and therefore their topology can be defined with three Betti numbers where each have a simple and intuitive meaning. The zeroth Betti number ( 0 ), first Betti number ( 1 ) and second Betti number ( 2 ) probe the number of isolated objects, tunnels and cavities, respectively.
In its simplest form, the reionization process can be pictured as a network formed by bubbles of ionised hydrogen gas. In Fig. 1 we show examples of structures formed by spherical bubbles. The left and right panels show projections of 3D datasets containing one and five spherical bubbles respectively. As they both consist of a single connected structure, both datasets have 0 = 1. The dataset in the left panel has no tunnel, hence its 1 = 0. The five bubbles in the right panel form a torus, which contains a tunnel, giving 1 = 1. The projection does not reveal the inner structure of the spheres. If we assume that the single sphere in the left panel has a hollow centre, then this single cavity will give 2 = 1. If we assume that the spheres in the right hand panel are all filled, this gives 2 = 0.
In previous works, the Euler characteristics ( ), which describes the integral of the Gaussian curvature over the surface of the structure, has been used to study the topology of reionization (e.g. Friedrich et al. 2011;Yoshiura et al. 2017). is related to the Betti numbers as (e.g. Pranav et al. 2019), (2) (also see Eq. 1). As the Betti numbers are components of , two datasets can have the same but different sets of Betti numbers. For example, the dataset shown in the right panel of Fig. 1 has = {1, −1, 0} so = 0, which is the same as for the case when there is no structure. Such a situation is quite plausible in the bubble network produced by reionization and therefore Betti numbers should be better than for distinguishing the topologies of different reionization models. We refer the readers to Hatcher (2002) and Edelsbrunner & Harer (2008 for more mathematically rigorous descriptions of these topological quantities.

Estimators for digital data
All the fields that we work with in this study are digital data, which is a discrete, discontinuous representation of a given field. In order to capture the topology of the structures in our field, we need to decompose the structures into well defined geometrical components in such a way that the topological properties of the structure are retained. We achieve this by constructing a cubical complex (e.g. Kaczynski et al. 2004;Wagner et al. 2012), which is composed of points, line segments, squares and cubes, for any given field. We refer the readers to Wagner et al. (2012) and Giri et al. (2019a) for more information about our method to construct and store our cubical complex.

Betti numbers
We follow the method described in Gonzalez-Lorenzo et al. (2016) to estimate the Betti numbers. Each isolated object is a group of connected pixels in our dataset. We use the connected-component labelling module in package 1 (van der Walt et al. 2014). We refer inquisitive readers to Fiorio & Gustedt (1996) and Wu et al. (2005) for the algorithm used to label all the connected regions. This method needs less computation time compared to the classical Hoshen-Kopelman algorithm (Hoshen & Kopelman 1976). It is similar to the friends-of-friend algorithm used to find clusters in N-body simulations (e.g. Press & Davis 1982;Davis et al. 1985). The number of regions labelled gives the value of 0 .
From the Alexander duality of any topological complex of three dimensions, we can find the 2 by estimating the 0 of the complementary field of the dataset (e.g. Hatcher 2002;Gonzalez-Lorenzo et al. 2016). The remaining first Betti number 1 is not calculated directly, but derived from the Euler characteristics ( ) of the dataset (as described in Section 2.2.2) according to 1 = 0 + 2 − . This method of calculating the 1 was introduced by Gonzalez-Lorenzo et al. (2016). We have tested our estimator on a Gaussian random field (GRF). The results are presented in Appendix A.

Euler characteristics
Once we have constructed the cubical complex for any given digital data, we can estimate the Euler characteristic as, where d is the dimension of the dataset. is the number of cubes of dimension in the constructed cubical complex. In our dataset, dimensions of points, line segments, squares and cubes are 0, 1, 2 and 3 respectively. See appendix A of Giri et al. (2019a) for more details.

SIMULATION OF 21 CM SIGNAL
To test the usefulness of the Betti numbers we make use of simulated data. This section describes our methodology for simulating reionization and the 21-cm signal. Section 3.1 describes the large scale reionization simulations we use and Section 3.2 outlines how we calculate the observable 21-cm signal from them. Finally we explain how we include telescope effects, such as the limited angular resolution and telescope noise in Section 3.3.

Reionization simulations
We use two different types of reionization simulations for our study, fully numerical radiative transfer simulations and so-called seminumerical simulations which rely on sophisticated form of photon counting.

Fully numerical simulations
The fully numerical simulations use two steps. We first simulate the evolution of the matter distribution using the -body code CUBEP 3 M (Harnois-Déraps et al. 2013). The -body simulation is run in a cosmological volume of 714 comoving Mpc (cMpc) with 6912 3 particles. We use such a large cosmological volume as they are important to capture the large scale fluctuations (Giri et al. 2019b;Ghara et al. 2020) and the large scale topological structure (Giri et al. 2019a) and are also a match to the Field of View of SKA-Low. See Iliev et al. (2014) for a discussion about the importance of the size of cosmological volumes for reionization simulations. Our -body simulation run outputs snapshots at each 11.5 megayears (Myrs). We obtain the density field by assigning mass to a cubic grid with 300 cells along each direction by smoothing with a smooth-particle-hydrodynamics (SPH) kernel (Monaghan 1992;Iliev et al. 2014). The cosmological parameters used here are Ω m = 0.27, Ω k = 0, Ω b = 0.044, ℎ = 0.7, s = 0.96 and 8 = 0.8. These values are consistent with the results from Wilkinson Microwave Anisotropy Probe (WMAP) (Komatsu et al. 2011) and Planck (Planck Collaboration et al. 2016 collaborations. The reionization process is driven by sources of ionizing photons, which form due to the collapse of baryonic matter in dark matter haloes. We determine the position and mass of dark matter haloes using the spherical overdensity halo-finder discussed in Watson et al. (2013). We identify haloes with masses halo ≥ 10 9 M · . In these haloes, gas can cool through excitation of atomic hydrogen and they are relatively unaffected by radiative feedback (Sullivan et al. 2018). We follow the convention in Dixon et al. (2016) and call these haloes high-mass atomic-cooling haloes (HMACHs). We model the sources by assigning the ionizing photon production rate, , to be proportional to the total mass in haloes within that cell, .
is given as, where , p and coll are the source efficiency, mass of proton and total collapsed mass fraction in HMACHs, respectively. To simulate the state of gas in the intergalactic medium, we next post-process the snapshots from CUBEP 3 M with the fully numerical radiative transfer code C 2 -RAY (Mellema et al. 2006a). C 2 RAY employs the short characteristics ray tracing method to solve the radiative transfer equation (Raga et al. 1999;Lim & Mellema 2003). The ray-tracing is performed up to a comoving distance of 71 Mpc from each source. This limit is meant to approximate the effect of the presence of optically thick absorbers in the IGM that are unresolved in our N-body simulation. This value is consistent with the measurement of the mean free path by Songaila & Cowie (2010). For more details about the simulations we refer the reader to Iliev et al. (2006), Mellema et al. (2006b) and Dixon et al. (2016).
In this work, we have run two scenarios (FN1 and FN2) using the methods described above. FN1 and FN2 use values of 50 and 40 respectively. We show the history of these reionization scenarios in Fig. 2. As the sources are less efficient in FN2 compared to the FN1 scenario, reionization is delayed in the latter. The values of the source parameters are listed in Table 1. The Thompson scattering optical depth calculated from these simulations are within the 95 per cent credible interval of the constraints from Planck Collaboration et al. (2018). The estimated are also given in Table 1.

Semi-numerical simulations
The fully numerical simulations are computationally expensive. Therefore to study the impact of source parameters on the topology measured with the Betti numbers, we use the fast semi-numerical code 21 FAST (Mesinger et al. 2011). In this code the density field is constructed by evolving the initial density field using second-order Lagrangian perturbation theory (2LPT; Scoccimarro 1998). 21 -FAST builds the ionisation field using the excursion-set approach developed in Furlanetto et al. (2004). A cell in the simulation volume is labeled as ionised when it satisfies the following condition, where coll ( , s , min ) is the fraction of matter inside a sphere of radius s that has collapsed into haloes of mass larger than min (e.g. Bond et al. 1991;Sheth & Tormen 1999). is a again the ionizing efficiency describing the amount of ionizing photons produced by the collapsed mass. This efficiency factor is in principle the same as the one defined in Eq. 4.
We have run three semi-numerical reionization scenarios (SN1, SN2 and SN3) by varying the source and sink parameters in 21 -FAST. The values of the parameters are given in Table 1. We also show the reionzation history of of these models in Fig. 2. Scenario SN1 has the same parameters as FN1. Their reionization histories are not identical as SN1 does not explicitly include the photon losses due to recombinations whereas FN1 does. In 21 FAST these losses are assumed to be accounted for in the value of (see for e.g. eq. 2 in Greig & Mesinger 2015).

Redshifted 21 cm signal
When the 21-cm signal is observed with an interferometry based radio telescope, the recorded observable is known as the differential brightness temperature. The observed differential brightness temperature b of the 21-cm signal, which is given as (e.g., Mellema et al. 2013 where HI and are the fraction of neutral hydrogen and the density fluctuation respectively. s is the excitation temperature of the two spin states of the neutral hydrogen, known as the spin temperature. CMB ( ) is the CMB temperature at redshift and p is the primordial helium abundance.
Here we will work with the assumption of high spin temperature s CMB . During the EoR this is a reasonable assumption as even relatively low levels of X-ray radiation can raise the gas temperature above the CMB temperature (e.g., Pritchard & Furlanetto 2007;Pritchard & Loeb 2012). In the high spin temperature limit, b is independent of s .
We construct the 21-cm signal at a given redshift from Eq. 6 using the corresponding three-dimensional neutral fraction and matter density cubes from the C 2 -RAY and CUBEP 3 M simulations. Each of these 3D datasets represents the characteristics of the signal at the corresponding redshift and therefore called coeval cubes (CCs). We transform these CCs into image cubes by assuming one of the axes to be the frequency axis and the other two position in the sky. This procedure implies that we neglect the two line of sight effects: redshift space distortions (see e.g. Jensen et al. 2013a) and the light cone effect (see e.g. Datta et al. 2012) which would introduce minor distortions along the line of sight. We use our publicly available analysis code, T 21 2 (Giri et al. 2020), to simulate the redshifted 21-cm signal cubes.

Mock SKA-Low observation
In this section, we explain our methods for constructing mock observation with SKA-Low.

Limited resolution
As our simulated 21-cm signal cubes are represented as digital data, they have an intrinsic resolution below which we do not any information. The simulation resolution of our dataset is Δ ≈ 2.4 cMpc. Table 1. Reionization simulation parameters of various source models. We give the ionizing source parameters, which are ionizing efficiency ( ), minimum mass ( min ) and mean free path of ionizing photons ( mfp ). The Thompson scattering optical depth ( ) derived from each simulations are also given here. This simulation resolution corresponds to different angular scales on the sky at different , which can be given as, where c ( ) is the comoving distance to redshift . Similarly the frequency scales also vary over redshifts, which can be given as, where 0 is the rest frequency of the 21-cm line, ( ) the Hubble parameter at redshift and the speed of light. For = 8, these equations give values Δ = 0.881 arcmin and Δ = 0.133 MHz. Apart from the simulation resolution, we will also be limited by the resolution of the telescope. As SKA-Low is an interferometry based radio telescope, it observes the sky in so-called space, which corresponds to the Fourier transform of the sky. We refer the readers to Rohlfs & Wilson (2013) for details about radio telescope observations.
A radio interometric telescope comprises of radio antennae placed at different locations. Each antennae pair forms a baseline and it observes the signal at an angular scale proportional to the length of the baseline. For observing the 21-cm signal this angular scale can be written as, where is the length of the baseline in cm. The first generation of SKA-Low will be sparsely filled with antennae outside a diameter of 2 km around the centre (e.g. Mellema et al. 2013). Therefore a 21-cm dataset produced with baselines longer than 2 km will be quite noisy. In this work we will approximate the synthesized beam of SKA-Low by a Gaussian with a full-width half maximum (FWHM) given by Eq. 9 with = 2 km. For = 8 this gives Δ telescope ≈ 3 arcmin. An interferometric radio telescope also has a smallest baseline which sets the largest scales that it can image. To implement this effect, we subtract its average value from each slice of the 21-cm signal dataset along the frequency (or redshift) axis. This effect means that interferometric images lack absolute calibration of the signal. Finally, to attain isotropic resolution in our dataset we convolve the signal along the frequency axis with a tophat filter of the same physical width as the Gaussian beam. For = 8 this gives Δ ≈ 0.5 MHz.

System noise
The 21-cm observations with SKA-Low will also be prone to instrumental noise. In order to simulate this noise, we follow the approach in Ghara et al. (2017) and Giri et al. (2018b). We model the ef- fects using the currently available configuration 3 of the first phase of SKA-Low, which has 512 antennae stations each with a diameter of 35 m. We present the telescope parameters that we use in this study in Table 2. We explore two observation time ( obs ). One is an optimistic case of obs = 1000 h and other is a pessimistic case of obs = 100 h. At simulation resolution, the system noise at 1000 h and 100 h will be very high. As the this noise is uncorrelated while the signal is correlated, the signal-to-noise ratio (SNR) will go up when the resolution is decreased. As discussed earlier, SKA-Low will have a limited resolution defined by the maximum baseline. Previous studies have shown that we can get interesting image data with 1000 h observation at a resolution corresponding to = 2 km (e.g. Mellema et al. 2015;Ghara et al. 2017;Giri et al. 2018bGiri et al. , 2019a. However the SNR will be much lower for a 100 h observation at the same resolution. To achieve a similar SNR, we reduce the resolution by a factor of 2 for the 100 h mock observations.
We would like to note that the noise model we use assumes perfect calibration. In reality it is not always possible to reach this theoretical noise level. Therefore To achieve the same SNR values as in our example with the real SKA-Low might require longer observation times or use of a lower resolution.

BETTI NUMBERS OF IONISATION FIELD
The formation, merger and evolution of ionised bubbles during reionization results in a complex morphology of H regions. In this section we investigate the Betti numbers estimated from the ionisation field in our simulations.  Table 3 for the fit parameters.  Fig. 3 shows the evolution of Betti numbers estimated from the ionisation field of our FN1 simulations by defining structures using a threshold of HII = 0.5. These Betti numbers trace the evolution topology of H regions during reionization. When reionization starts we see an increase in 0 . This increase in 0 is due to the appearance of ionised bubbles around the sources emitting ionising photons. As time passes, more sources form and ionise the hydrogen gas in the IGM around them. In the initial stages of reionization we not only see appearance of new ionised bubbles but also overlap of bubbles. A comparison between the value of 0 and the number of sources shows that such overlap occurs right from the start of reionization. The total number of bubbles will decrease when multiple ionised bubbles 3 The configuration is described in document SKA-TEL-SKO-0000557

Evolution during reionization
Rev 1 and the positions of the individual stations in document SKA-TEL-SKO-0000422 Rev 2, both retrievable at https://astronomers. skatelescope.org/documents/ overlap. When the rate by which H regions start to overlap becomes larger than the formation rate of new isolated ionised bubbles, the value of 0 will start to decrease. In our model, we observe this turn-around at v HII ≈ 0.2. The value of 0 keeps decreasing as reionization proceeds and converges to unity at around v HII ≈ 0.7. At v HII ≈ 0.1, 1 starts to increase. 1 measures the number of tunnels and these tunnels can be formed by overlapping bubbles (see Section 2.1 for a simple example). Therefore the growth of 1 is a sign of substantial bubble overlap modulating the topology of the ionisation field. As reionization progresses, the number of tunnels increase until it reaches a maximum value beyond which 1 starts to decrease. In the FN1 simulation this peak is reached around v HII ≈ 0.55. Unlike 0 , the value of 1 remains much larger than 1 until the end of reionization.
Cavities inside H regions start forming during the middle stages of reionization, as indicated by the slow rise of 2 , starting around v HII ≈ 0.2. We can visualise these cavities as isolated neutral regions. As reionization progresses initially large connected neutral regions break up and form multiple isolated neutral regions. The value of 2 attains a maximum around v HII ≈ 0.9 after which it falls to zero once reionization completes. In any inside-out reionization scenario such as FN1, these later isolated neutral regions trace the cosmic voids. Giri et al. (2019a) noted that the number of isolated neutral regions in the later stages of reionization are relatively rare compared to the number of isolated ionised regions at the start of reionization. This is confirmed by comparing the maximum values of the 0 and 2 curves, which shows that the latter is about 3.5 times lower.
From the three curves we obtain three clear maxima, as well as the crossing points between 0 and 1 as well as between 1 and 2 . As 1 enters the calculation of the Euler characteristic with a different sign than 0 and 2 , the above evolutionary patterns cannot be distinguished when considering . It is for example impossible to separate the early rise of the number of tunnels and the decrease of the number of isolated regions due to overlap. At v HII ≈ 0.35 0 ≈ 1 . However the small positive contribution from 2 at that time will push the location of = 0 to a slightly later stage.

Connection between Betti numbers and percolation
These crossing points of Betti curves appear to be connected to the formation of the percolation clusters. Furlanetto & Oh (2016) showed that a large network of overlapping ionised bubbles spanning the entire simulation volume forms during the early stages of reionization. This network is known as the percolation cluster. A similar percolation event 4 takes place towards the end of reionization when the last network of neutral regions spanning the entire volume disappears. As pointed out by Furlanetto & Oh (2016) this type of percolation behaviour is a hallmvark of phase transitions. Fig. 4 shows the percolation curves, i.e. the volume fraction of H regions contained in the percolation cluster, for all our simulations. Here we focus on the percolation curve of the FN1 scenario, the percolation curves of other scenarios are discussed in Section 4.5. We see that the appearance of the percolation cluster coincides with the time when 1 surpasses 0 . Likewise we find that the disappearance of the neutral percolation cluster coincides with the time when

Parametrization of the evolution of Betti numbers
The evolution curves for each of the Betti numbers for scenario FN1 have strikingly regular shapes. After inspection of the other scenarios we find that the shapes appear to be independent of the source model driving the reionization process. We will discuss these additional reionization scenarios in Section 4.5. The curves for 0 and 2 seem to match a log-normal shape, whereas 1 appears to have a Gaussian shape. We therefore propose the following parametric forms for each Betti number, where , and are the fitting parameters. models the height of the curves. and define the peak position and width of the curves. We show the fits to the Betti numbers curve with black solid lines in Fig. 3. The fit parameters are listed in Table 3.
The fact that simple functional forms can be used to describe the evolution of the Betti numbers for reionization is another advantage over the Euler characteristic for which no simple fit appears to exist.

Dependence on simulation resolution
The topology of H regions is sensitive to the resolution of our simulation volume. Friedrich et al. (2011) showed the impact of simulation resolution on the Euler characteristics. In this section, we study how the Betti numbers depend on the simulation resolution.
We show the Betti numbers curves estimated from the FN1 simulation volumes at different resolutions in Fig. 5. The solid curve represents the Betti numbers estimated from the dataset at the intrinsic resolution of our simulation (Δ ≈ 2.4 Mpc). The dashed and dash-dotted lines represent their values estimated after 4-cell and 8-cell smoothing respectively. The 4-cell and 8-cell smoothing approximately correspond to the resolution of SKA-Low observations at ≈ 9 when maximum baselines of 2 km and 1 km are used respectively.
When we fit these curves with the parametric form given in Eq. 10, 11 and 12, we find that the peak positions and widths of the curve do not change as the resolution degrades. The resolution only affects the amplitude parameter for all the Betti numbers curve. This decrease in amplitude is expected as the number of features contained in the dataset decreases when the resolution is degraded. The fact that the peak positions and widths are less affected adds to the attractiveness of the Betti numbers as topological quantities.

Comparing the topology of different reionization scenarios
In this section we compare the topology of various reionization models using the Betti numbers. The top panels of Fig. 6 show the Betti curves for our fully numerical simulations. The FN1 and FN2 simulations use source efficiencies = 50 and 40 respectively. Therefore reionization is delayed in FN2 (see Fig. 2). However, even though the reionization history is different the peak positions of the Betti curves are quite similar. The 0 gives the number of isolated ionised regions and at early times, this number will be proportional to the number of active ionising sources and not the efficiency of those sources. Note however that 0 will not be exactly equal to the number of sources as some regions are formed by clustered sources. Fig. 2 shows that FN2 reaches a similar stage Δ ≈ 1 later than FN1. The number density of ionising sources will not differ that much between those two epochs. Therefore the 0 curves for both these models are quite similar. However, the topology does differ, as evidenced by the 1 and 2 curves. The FN2 simulation contains more tunnels during the middle stages of reionization and more cavities from about the middle until near the end of reionization. Using all the curves together, we can distinguish between these two models.
To test the robustness of our conclusions, we also study the topology of reionization in a set of semi-numerical simulations. As seminumerical simulations are fast, we can easily vary the simulation parameters and observe its impact on the Betti numbers. In the bottom panel of Fig. 6, we show the Betti numbers curves for our three semi-numerical simulations (see Table 1). SN1 and SN3 use the same min value (10 9 ) for dark matter halos, which means that reionization is driven by the same source population. Therefore the 0 curves for SN1 and SN3 overlap with each other. The min used in SN2 is 10 8 . As a result SN2 contains more ionising sources compared to SN1 and SN3. We see the imprint of this on the 0 curve, which reaches a higher amplitude for the SN2 simulation.
The SN1 and SN2 simulations assume mfp to be 71 Mpc where as SN3 simulation assumes it to be 20 Mpc. SN1 and SN3 otherwise have the same source parameters. mfp describes the largest distance ionising photons can travel from a source. At the early times, the value of mfp does not have a significant impact on the topology as most H regions are smaller than mfp . However, once larger regions form during the middle and late stages of reionization, a smaller value for mfp will inhibit the growth of ionized regions. This will allow more tunnels to survive, as evidenced by the difference between the 1 curves of SN1 and SN3. The largest impact of mfp is associated with the final phases of reionization as can be seen from the 2 curves. The values are much higher for SN3, indicating that imposing a short mean free path leads to the formation of many more small neutral islands.
As pointed out in Section 4, the rise of 1 in simulation FN1 marks the appearance of the percolation cluster. The percolation curves for all our reionization models can be found in Fig. 4. We find that for all models, percolation happens roughly at the epoch when 1 crosses 0 confirming our hypotesis. The Betti numbers curves from all our reionization models, both fully numerical and semi-numerical, show the characteristic forms defined in Section 4. We present the values of the fit parameters for all models in Table 3. We conclude that these fitting formulae Eq. 10, 11 and 12 provide a useful description of the evolution of Betti numbers in reionization simulations.

BETTI NUMBERS OF 21-CM SIGNAL
In this section we will study the evolution of Betti numbers estimated from the 21-cm signal ( b ). As explained in Section 3.2, this signal depends on neutral fraction HI ( ) = 1 − HII ( ) , density and spin temperature fields (see Eq. 6). For our study we assume the Universe has been heated before reionization begins causing the spin temperature factor to saturate to 1 as s CMB . Even with this assumption, the b will still contain properties of both the ionisation and density fields.
We first study the topology of structures by putting a threshold  ( th b ) on the b field. We measure the topology of structures with pixel values below a given value of th b . In Fig. 7, we show colour plots of the Betti numbers estimated from the b field of the FN1 simulation against the values of th b (y-axis) and v HII (x-axis). For high values of th b , the topology is mostly defined by the fluctuations in the density field. However the Betti numbers will not exactly trace the topology of the density field as reionization will reduce the signal in the (predominantly) dense regions. For low th b value, the structures trace the topology of ionisation field. For th b 10 mK, the Betti numbers estimated from b fields are similar to that estimated from HII field of FN1 simulation.
For different reionization scenarios we will obtain different colour plots from the one shown in Fig. 7. However, it is difficult to compare scenarios with these plots as they will be affected by instrumental limitations such as lack of an absolute calibration of the signal, limited resolution and noise. In the next section, we propose a method Table 3. The values of parameters when we model the Betti numbers using Eq. 10, 11 and 12. The fit parameters for HII are Betti number curves estimated from the field at simulation resolution. The fit parameters for b are Betti number curves estimated from the mock SKA-Low 21-cm images. We give the fit parameters corresponding to mock observations at obs = 1000 h and = 2 km. The fit parameters given in brackets correspond to mock observations at obs = 100 h and = 1 km. to explore the topological properties of reionization using mock 21cm observations for the future SKA-Low telescope.

BETTI NUMBERS OF 21-CM IMAGES OBSERVED WITH RADIO TELESCOPE
As discussed in the previous section, the structures in the ionisation field will be imprinted on the 21-cm signal. Therefore we can study the morphology of H regions if we can identify these regions in 21-cm images. As the future SKA-Low has been designed to be able to produce such images (Mellema et al. 2015), we will consider mock SKA-Low observations in this section and study our ability to estimate the Betti numbers from such observations. Identifying the ionised (or neutral) regions in 21-cm images is a non-trivial problem. Giri et al. (2018b) compared various methods taken from the field of image processing and found that the superpixel method works the best in identifying ionised (or neutral) regions in 21-cm images. In the superpixel method, connected pixels with similar properties are grouped together. These pixel groups are known as superpixels. We then construct the histogram from the mean intensity values of all the superpixels. From this histogram we find the threshold for the superpixels that classifies them into ionised and neutral superpixels. See Giri et al. (2018b) for a detailed description of the method.
In Section 6.1, we consider mock SKA-Low observations with infinite observation time as we want to first learn what we can achieve in this most optimal case with our method. In this situation, the 21cm images will be free from noise and only be affected by the limited resolution and the lack of absolute calibration. In Section 6.2 we study the impact of telescope noise on our analysis strategy. Fig. 8 shows the impact of SKA-Low resolution on the estimated Betti numbers. The solid and dashed lines represent the Betti numbers calculated from the HII datasets degraded to a resolution corresponding to a maximum baseline ( ) of 2 km and 1 km respectively. These curves differ slightly from the ones shown in Fig. 5 as the resolution corresponding to a certain value of is redshift dependent. Even though the peak amplitude decreases as the resolution is degraded, the peak position of all the Betti number curves estimated from HII does not change. This behaviour agrees with our findings in Section 4.4.

Limited resolution
The dash-dotted and dotted lines in Fig. 8 represent the Betti numbers estimated from the b fields at a resolution corresponding to = 2 km and 1 km respectively. The peak position of 0 for the b field observed at = 2 km matches the peak positions of the 0 curves derived from the HII fields. However, for the lower resolution case of = 1 km the 0 derived from the b fields attains its peak at a much earlier epoch. This is because the ionized regions are not very large at these early times and for the lower resolution the superpixel method struggles to separate the ionization and density fluctuations. This effect is in fact also present for the = 2 km case where it causes the non-smooth increase of 0 .
The middle panel of Fig. 8 shows that the 1 curves for b field are slightly biased towards larger v HII compared to the ones for HII field. We also observe that the peak amplitude of 1 curves for b field decreases and the width of the curve increases compared to the ones for HII field. At early times ( v HII 0.2), the 1 determined from HII field is close to zero for all our reionization scemarios (see Section 4.5). However our method gives non-zero values for 1 determined from b field. This anomaly is again caused as the superpixel method struggles to separate ionization and density fluctuations during these early stages of reionization.
The right-most panel of Fig. 8 shows the 2 curves for b field. These curves are also slightly biased towards larger v HII compared to the ones for HII field. Even though the resolution corresponding to = 1 km smooths away more features compared to = 2 km, the superpixel method identifies similar number of neutral regions as these neutral regions are relatively large structures (Giri et al. 2019a). Therefore the 2 curves for both resolution cases overlap with each other.

Telescope noise
The SKA-Low observations will contain instrumental noise, which we model using the method described in Section 3.3.2. Here we study the Betti numbers estimated from noisy 21-cm images. We consider two observation times, which are obs = 1000 h and 100 h. The 21cm images observed at obs = 1000 h is degraded to a resolution corresponding to = 2 km. In order to get similar SNR, we use = 1 km for the 21-cm images observed with obs = 100 h.
In the left-most, middle and right-most panels of Fig. 9, we show from left to right the 0 , 1 and 2 curves estimated from b field. As can be seen from these curves our method can reconstruct the topology of reionization most reliably for epochs v HII 0.4 from the noisy 21-cm images. For earlier times the curves are too noisy. In Section 4, we found that the topology of reionization during the early stages is mostly traced by the 0 values. These are very difficult to extract from the noisy 21-cm images as most ionized regions are small scale features which are indistinguishable from noise. The 1 and 2 values are more sensitive to the topology of reionization during the middle and late stages of reionization respectively. We observe good reconstruction of 1 and 2 curves in Fig. 9 derived from noisy 21-cm images, with the exception of the 1 values for the = 2 km, 1000 h observation which are severely underestimated.
As the 0 curves at v HII 0.4 are noisy, we fit Eq. 10 to the these curves for v HII 0.4 only. We list the fit parameter values in Table 3. Even though we cannot see any clear peak position in the 0 curves, the fit for the mock observations obs = 1000 h does give us good values for 0 and 0 . For the mock observations with obs = 100 h, no reasonable fit could be made. We conclude that to study the 0 evolution from 21-cm images, a long observation time is needed.
Similarly, we fitted parameters of Eqs. 11 and 12 to the 1 and 2 curves for v HII 0.4. These fit parameters are also given in Table 3. The peak position and width of the 1 curves derived from the noisy 21-cm images match quite well with the ones for the noiseless case. However, their amplitude is lower than for the ideal case. The

Comparing different reionization models
In this section, we compare Betti number curves for all our reionization models when observed with SKA-Low. The top panel of Fig. 10 shows the curves derived from mock observations at obs = 1000 h and = 2 km. We find that we can extract the correct evolution the topology of reionization in all the models. These curves are more reliable at epochs v HII 0.4. Using all the Betti numbers together, we can distinguish between the models. In Table 3, we give the parameter values while fitting all these curves with Eq. 10, 11 and 12. The derived and values are quite close to the values derived from the HII field.
We also compare the Betti number curves for all our reionization models in our more pessimistic scenario. The bottom panel of Fig. 10 gives the curves derived from mock observations at obs = 100 h and = 1 km. Even though the curves are noisy, we can discern the expected evolution of the Betti numbers for each model. These curves are more reliable at epochs v HII 0.5, making it difficult to derive a fit for the evolution of 0 . In Table 3 contains the fit parameters for the 1 and 2 curves. As the late neutral islands tend to be large, we find that we can measure the topology of reionization at late times quite well. As noted above, this can even be done using relatively short observations times with SKA-Low.

SUMMARY
In this work we explored a new method to study the three dimensional topology of H regions during reionization, namely the three Betti numbers. In the perspective of the distribution of ionized regions the zeroth Betti number ( 0 ) gives the number of isolated H regions, the first Betti number ( 1 ) gives the number of neutral tunnels through ionized regions and the second Betti number ( 2 ) gives the number of neutral islands embedded in ionized regions.
We investigated the evolution of the Betti numbers in a small set of fully numerical and semi-numerical simulations and found a  characteristic pattern in which each of the Betti numbers first rises and then falls, each attaining a maximum value at different stages of reionization. When reionization starts, most of the ionised bubbles are isolated and 0 is the largest of the three numbers. It initially rises as the number of ionising sources, including individual and clustered sources, grows. However, as existing ionized regions grow and new ones continue to form, they will start to overlap, leading eventually to a decrease of 0 .
When substantial overlapping of ionised bubbles starts, more and more of these connected regions will be crossed by neutral tunnels and we consequently observe a rise in the values of 1 . The time when the decreasing 0 and the increasing 1 cross corresponds to the appearance of the ionized percolation cluster (Furlanetto & Oh 2016). 1 reaches its maximum value at the middle stages of reionization. During this stage, isolated neutral regions, some of them the remnants of broken up tunnels, become more common and therefore the value of the second Betti number ( 2 ) starts to increase. The time when the 1 and 2 curves cross corresponds to the disappearance of the neutral percolation cluster. During the later stages of reionization 2 reaches its peak value after which it drops as the last neutral islands disappear.
We find that the characteristic evolution of the Betti numbers with the volume-averaged ionisation fraction ( v HII ) of the Universe is quite well fitted by simple functions, a normal or Gaussian function for 1 and log-normal functions for 0 and 2 . Each of these is determined by three parameters specifying the peak amplitude ( ), peak position ( ) and width ( ) of the curves. Using our suite of reionization simulations, we showed that we can distinguish reionization models by their sets of Betti curves.
On the basis of these results we conclude that the Betti numbers provide a better way to describe the topology of reionization than the much better studied genus or Euler characteristic. Not only do the Betti numbers represent a set of intuitively clear morphological concepts in the context of ionized regions, they also display characteristic evolutionary patterns in which they each distinguish different stages of reionization, and which also appear to fit simple functional forms. Furthermore, their evolution shows a connection with the percolation process characteristic of reionization. As the Euler characteristic can easily be calculated from the Betti numbers, there appears to be no advantage to studying the topology of reionization with it.
The redshifted 21-cm signal probes the distribution of neutral hydrogen gas during reionization. Observations with the future SKA-Low will deliver 3-D tomographic datasets consisting of sky images of the 21-cm signal at different frequencies. We explored the prospect of deriving the topology of H regions from such datasets. To identify ionized regions in the 21-cm data we use the superpixels method introduced in Giri et al. (2018b). We find that we can retrieve the topology of reionization quite well for middle and late stages of reionization ( v HII 0.4). At early times, most H regions are small and difficult to identify in the mock 21-cm datasets due to a combination of low resolution and instrumental noise. We tested our method for two observation times, 1000 h and 100 h. For both cases we could retrieve the evolution of 1 and 2 better than the one for 0 . We therefore expect that 1 and 2 in practice will be more robust measures of topology than 0 . All Betti numbers curves derived from the mock 21-cm observations are noisy due to the imperfections of the observational data which do not allow a perfect identification of H regions. However when we fit 1 and 2 curves with our three parameter models, the intrinsic values of the model parameters are retrieved quite well. Retrieving the model parameters for 0 is more challenging for the reasons explained above.
We have also compared the topologies determined from mock 21-cm observations constructed from our suite of reionization simulations. We can clearly distinguish between these models for a 1000 h observation with SKA-Low. Determining the evolution of the Betti numbers with a 100 h observation is more difficult. However, we can still distinguish between our models using 1 and 2 . As iso-lated neutral regions are relatively large structures (Giri et al. 2019a), the values of 2 estimated from the mock observations are the least affected by limited resolution and noise.
In this paper we have explored a small set of five reionization simulations, produced by two different codes. One might wonder about the validity of some of our conclusions, e.g. regarding the shapes of the Betti curves. If reionization proceeded inside-out, starting in the densest regions where most of the photons are produced, then the evolution of the Betti numbers will approximately follow the shapes of the Betti curves for a Gaussian Random Field (GRF), see Appendix A. This means that each Betti number will peak at a different time, in the order 0 , 1 , 2 . As our simulated and observed volumes contain many structures, we also expected the Betti curves to vary smoothly. It is harder to prove that generic shapes we found in all our simulations are truly universal. For a GRF all three Betti curves appear to be approximately Gaussian, so the log-normal shapes we found for 0 and 2 are indicative of the fact that ionization and density are only partially correlated in simulations. We consider our proposed fitting functions to be a useful working hypothesis until cases are found where this approach breaks down.
Our study has used a number of simplifying assumptions which we would like to briefly discuss here. The first is that we have calculated the differential brightness temperature assuming the high spin temperature limit. If X-ray heating is not as efficient as generally as-sumed, this assumption may not be globally valid during the earlier stages of reionization. However, as we have seen, the early stages of reionization will anyway be challenging to study topologically as it will be difficult to identify ionized regions in the data.
We have also neglected the various line-of-sight effects when constructing our mock 21-cm observations. These are the light-cone (LC) effect (e.g. Datta et al. 2012) and redshift space distortions (RSD) (e.g. Jensen et al. 2013b). The former effect is caused by the evolution of the signal along the frequency direction. If we would analyse 21-cm datasets with a large frequency range ( 10 MHz), we would see more overlapping regions at the high frequency end than at the low frequency end. This will affect the topology of H regions and therefore the Betti numbers. However, this can be avoided by choosing a narrower frequency range. The RSD effect displaces the signal from its cosmological redshift due to peculiar velocities of gas. This process mostly alters the amplitude of the signal and has a relatively minor impact on the morphology of H regions (Giri et al. 2018a) Therefore we expect only a minor impact on the Betti numbers, specially at the resolution of the telescope. We defer a detailed study of the impact of these line-of-sight effects to the future.
A potentially larger challenge is the impact from bright foreground signals which are many times brighter than the 21-cm signal we want to analyse (Jelic et al. 2008). A good foreground mitigation strategy is crucial to extract the 21-cm signal from radio observation. However, these methods are not perfect and therefore will leave residuals in the 21-cm images. If our structure identification methods struggles to distinguish between these residual foreground and the H regions, then studying the topology of reionization would become difficult. In such a case, we have to develop a foreground mitigation method that preserves the topological information. A detailed study of the impact of foreground residual is beyond the scope of this paper.

APPENDIX A: TOPOLOGY OF GAUSSIAN RANDOM FIELD
In this section we use a Gaussian random field (GRF) to validate our Betti numbers estimator for digital data. Our GRF is constructed from a (300) 3 data cube filled with Gaussian random numbers. Due to discretization, our data cube will not be a perfect GRF. Therefore, following Park et al. (2013) we smooth this data with a Gaussian filter with a FWHM of 5 cell widths.
We identify structures by putting a threshold on the intensity values. All the pixels below the threshold constitute the structure whose topology we want to study. For any field A the thresholds will scan through all the intensity values. Therefore we will get a set of Betti numbers corresponding to a set of threshold values. The set of thresholds is defined such that A =
The solid curve in Fig. A1 shows the Euler characteristic ( ) curve of the GRF. The curve is an even function with two peaks and one Figure A1. The Euler characteristics (solid, black) and Betti numbers 0 (dashed, blue), 1 (dotted, red) and 2 (dash-dotted, green) of a Gaussian random field calculated for various threshold values . trough which follows a fixed analytical form (see Doroshkevich 1970;Tomita 1990). Giri et al. (2019a) already showed that the curve derived by our estimator for digital data matches this analytical form. Fig. A1 also shows the Betti numbers as a function of threshold as determined with our estimator (see Section 2.2). The dashed, dotted and dash-dotted curve are 0 ( ), 1 ( ) and 2 ( ) respectively. We see that the left peak of the ( ) curve is caused by the peak in 0 or a large number of components compared to tunnels and cavities. Similarly the right peak in is associated with a large value of 2 or a large number of cavities compared to components and tunnels. The trough in the ( ) curve is due to the value of 1 or a large number of tunnels compared to components and cavities. The Betti curves for a GRF estimated with our method agree with those found in previous works. As far as we know no analytical forms exist for these ( ) curves. See Park et al. (2013) and Pranav et al. (2019) for a more extensive description of the Betti numbers of GRFs. This paper has been typeset from a T E X/L A T E X file prepared by the author.