Inferring the properties of the sources of reionization using the morphological spectra of the ionized regions

High-redshift 21-cm observations will provide crucial insights into the physical processes of the Epoch of Reionization. Next-generation interferometers such as the Square Kilometer Array will have enough sensitivity to directly image the 21-cm fluctuations and trace the evolution of the ionizing fronts. In this work, we develop an inferential approach to recover the sources and IGM properties of the process of reionization using the number and, in particular, the morphological pattern spectra of the ionized regions extracted from realistic mock observations. To do so, we extend the Markov Chain Monte Carlo analysis tool 21CMMC by including these 21-cm tomographic statistics and compare this method to only using the power spectrum. We demonstrate that the evolution of the number-count and morphology of the ionized regions as a function of redshift provides independent information to disentangle multiple reionization scenarios because it probes the average ionizing budget per baryon. Although less precise, we find that constraints inferred using 21-cm tomographic statistics are more robust to the presence of contaminants such as foreground residuals. This work highlights that combining power spectrum and tomographic analyses more accurately recovers the astrophysics of reionization.


INTRODUCTION
The formation of the first stars and galaxies through gravitational instability of small density fluctuations, several hundred million years after recombination, marks the beginning of a key phase transition in the Universe history referred to as the Cosmic Dawn. These first light sources emitted photons with enough energy to ionize the neutral hydrogen, which propagated in the Intergalactic medium (IGM) and progressively reionized the entire Universe. This latter era is called the Epoch of Reionization (EoR), where the transition from the Cosmic Dawn is rather ill-defined (see, e.g Ciardi & Ferrara 2005;Morales & Wyithe 2010;Pritchard & Loeb 2012;Furlanetto 2016;Dayal & Ferrara 2018, for reviews). Many unknowns remain about the nature of theses sources, and especially their efficiency in re-ionizing the surrounding neutral-hydrogen environment. Over the past decade, both observational and theoretical studies have suggested that a population of low-mass, very starforming galaxies with an average escape fraction of ionizing photons of esc = 0.1 − 0.2, were the dominant suppliers of ionizing photons during the EoR (Ouchi et al. 2009;Bouwens et al. 2012; ★ E-mail: s.r.n.gazagnes@rug.nl Robertson et al. 2013;Dressler et al. 2015;Finkelstein et al. 2019;Mason et al. 2019;Trebitsch et al. 2020). Nevertheless, the contribution of brighter sources is still actively discussed (Fontanot et al. 2012(Fontanot et al. , 2014Robertson et al. 2015;Madau & Haardt 2015;Mitra et al. 2018;Naidu et al. 2020;Dayal et al. 2020). Further observational studies are required to constrain the escape fraction of these sources during reionization. Directly measuring esc at high redshifts is very challenging because of the limited mean free path of these photons. Hence, indirect proxies are needed to estimate the ionizing efficiency of the sources of reionization, and some progress has been recently made in that direction using analogs at lower redshift (e.g Schaerer et al. 2016;Verhamme et al. 2017;Chisholm et al. 2018Chisholm et al. , 2020Henry et al. 2018;Wang et al. 2019;Cen 2020;Gazagnes et al. 2018Gazagnes et al. , 2020Izotov et al. 2020;Katz et al. 2020).
The detection of the 21-cm line resulting from the spin-flip of the neutral Hydrogen electron can provide complementary information to investigate the nature of ionizing sources and the propagation of ionizing fronts. First-generation interferometers, such as the Murchison Widefield Array 1 (MWA; Tingay et al. 2013;Bowman et al. 2013), the Low-Frequency Array 2 (LOFAR; van Haarlem et al. 2013), the Giant Metrewave Radio Telescope (GMRT) 3 (Swarup et al. 1991), or the Precision Array for Probing the Epoch of Reionization 4 (PAPER; Parsons et al. 2010), aim at detecting the 21-cm fluctuations during the EoR. While no detection has been achieved yet, they yield tremendous improvements in our understanding of interferometric observations. Recently, improved upper limits on a statistical detection of the 21-cm fluctuations (at the 2 level) using the power spectrum have been published in the literature: Trott et al. (2020) reported Δ 2 21 ≤ (43 mK) 2 at k = 0.14 h Mpc −1 and = 6.5 using 110 hours of MWA observations and Mertens et al. (2020) found Δ 2 21 ≤ (73 mK) 2 at k = 0.075 h Mpc −1 and = 9.1 using 141 hours of LOFAR observations. The latter results already put some light on the astrophysics of reionization by ruling out exotic scenarios incompatible with this upper limit (Greig et al. 2020a;Ghara et al. 2020;Mondal et al. 2020).
The progress with current instruments, but also the development of the second-generation interferometers such as the Hydrogen Epoch of Reionization Array 5 (HERA; DeBoer et al. 2017) and the Square Kilometre Array 6 (SKA; Mellema et al. 2013;Koopmans et al. 2015) will progressively tighten our understanding of the EoR by either yielding deeper limits on the power spectrum of the 21-cm fluctuations, or directly imaging it with redshift. Additional insights can be extracted from the non-Gaussian part of the 21-cm fluctuations, which are not encoded in the power spectrum. Over the past decades, a large number of theoretical studies have introduced new statistical formalism to extract information from the 21-cm signal. Shimabukuro et al. (2016), Majumdar et al. (2018), Watkinson et al. (2019) or Hutter et al. (2020) have used the 21-cm bispectrum, and shown that it could provide valuable insights about the ionization topology, and the size distribution of the ionized structures. Similarly, Gorce & Pritchard (2019) have shown that comparable insights could be extracted using 3-points correlations functions. Banet et al. (2020) recently used quantiles statistics to investigate the skewness and kurtosis of the 21-cm intensity probability distribution function (PDF) and shown that these non-gaussianity measurements can trace different reionization scenarios.
Finally, HERA and SKA should have enough sensitivity to map the 21-cm fluctuations and provide images of the evolution of the ionizing fronts during the EoR. The use of these 21-cm tomographic images has already been shown useful to analyze the size statistics of the ionized or neutral regions (Kakiichi et al. 2017;Giri et al. 2018a;Giri et al. 2019), the morphology of the brightness temperature fluctuations (Chen et al. 2019;Kapahtia et al. 2019), or the topology of the ionizing field (Elbers & van de Weygaert 2019). All these studies have emphasized that alternative approaches to the power spectrum could provide very valuable insights to understand the astrophysics of cosmic reionization, and could be used to break the degeneracy between models with similar power spectra (Kakiichi et al. 2017).
However, further analysis is required to assess how these approaches can be incorporated within a 21-cm analysis tool to quantify their robustness in inferring the astrophysical information lying in the 21-cm fluctuations. In this paper, we investigate how statistics extracted from 21-cm tomographic images can be included in a 21-cm EoR inference framework. To do so, we extend 21CMMC (Greig & Mesinger 2015, a Markov Chain Monte Marco (MCMC) analysis tool, originally designed to quantify reionization properties using the power spectrum of the 21-cm fluctuations. 21CMMC has been widely used over the past years to investigate the theoretical constrains set by different 21-cm experiments (Greig & Mesinger 2015Park et al. 2019), break the degeneracy between different reionization topologies (Binnie & Pritchard 2019), propose optimal designs for 21-cm instruments (e.g with the SKA; Greig et al. 2020b), or rule out astrophysical models incompatible with the recent LOFAR upper limit (Greig et al. 2020a). In this work, we adapt it to assess the constraints set by 21-cm tomographic statistics and investigate their robustness compared to theoretical results obtained using the power spectrum. To achieve this, we create realistic mock observations using the point spread function and noise profile of the SKA1-Low telescope configuration, assuming 1000 hours of observations at redshifts 10, 9, and 8. We use DISCCOFAN (Gazagnes & Wilkinson 2019) to extract the number-count and the morphological pattern spectra (Maragos 1989;Wilkinson & Westenberg 2001;Urbach et al. 2007;Westenberg et al. 2007) encoding the shape characteristics of the individual ionized regions. These statistics provide an intuitive approach to investigate the evolution of the ionizing field at different redshifts, and to infer the sources and IGM properties of the process of reionization.
This work is organized as follow: Section 2 details the implementation of 21CMFAST, the semi-numerical code embedded in 21CMMC that simulates the redshifted 21-cm signal during the EoR. Section 3 explains the mathematical description of the statistics extracted from the 21-cm tomographic images, and Section 4 analyzes how these statistics vary with redshift, different reionization scenarios, or are impacted by the characteristics of the interferometer. Section 5 details the 21CMMC setup, and Section 6 shows the results of our MCMC analysis. Section 7 further assesses how 21-cm tomographic statistics robustly trace the underlying astrophysics of the reionization and can combined with power spectrum analyses. Finally, Section 8 summarizes our main conclusions. In this paper, we assume a ΛCDM Universe with cosmological parameters values Ω Λ = 0.692, Ω b = 0.048, Ω m = 0.308, 0 = 67.8 km s −1 Mpc −1 and 8 = 0.81.

SIMULATIONS AND INSTRUMENT SENSITIVITY
We use 21CMMC (Greig & Mesinger 2015), a Bayesian parameter estimation tool which combines a Markov Chain Monte Carlo (MCMC) algorithm with 21CMFAST , to explore how the shape of the ionized regions extracted from 21-cm tomographic images can be used to infer the properties of the sources of reionization. Section 2.1 details the 21CMFAST parametrization and Section 2.2 describes the interferometer characteristics adopted in this work.

21CMFAST
21CMFAST 7 is a publicly available semi-numerical code that simulates the evolution of the 21-cm signal during the Cosmic Dawn and the EoR using an analytical model introduced in Furlanetto et al. (2004a). Its implementation is detailed by Mesinger et al. (2011). 21CMFAST first generates realizations of the evolved baryonic and velocity density field using the Zel'dovich approximation (Zel'Dovich 1970) at different redshifts. The brightness temperature contrast, b , which is proportional to the 21-cm signal intensity and evaluated relative to the temperature of the Cosmic Microwave Background (CMB), is then computed as (Furlanetto et al. 2006 where ion is the global ionized fraction of the IGM in the universe, b is the fractional overdensity in baryons, (z) is the Hubble parameter, d /d is the comoving gradient of the comoving velocity along the line of sight, S is the spin temperature, and CMB is the CMB temperature. b is evaluated at a redshift , such as = 0 / − 1 where 0 is the rest frame 21-cm frequency (1420 MHz). We include the effects of peculiar velocities and assume that the spin temperature is above the CMB temperature ( S >> CMB ) such that, in the absence of foregrounds or instrumental effects, the presence of ionized structures directly relates to the absence of 21cm signal ( b = 0 mK). This assumption, while considered valid during most of the reionization (6 < < 10, Chen & Miralda-Escudé 2004;Baek et al. 2010), could break down during the early stages, or around under-dense regions where the IGM is not sufficiently heated. We discuss the impact of this hypothesis in Section 7.3.
The ionized regions are recovered using an excursion set formalism that compares the number of ionizing photons per time-step to the density of baryons in regions of decreasing scales . A given cell, , is identified as ionized if it fulfills the following criterion: where coll ( , , , min ) represents the collapse fraction estimated within a given scale around the cell at a redshift , and depends on the minimum mass of the star-forming halo min (Press & Schechter 1974). The parameter characterizes the ionizing efficiency of the sources and is further detailed in Section 2.1.1. 21CMFAST additionally includes partially ionized cells by setting their ionizing fraction to the value of coll ( , , , min ) when reaches the minimum scale. The focus of this study is to explore how the morphological properties of the ionized regions can be used to infer the properties of the ionizing sources. For this proof-of-concept work, we restrict ourselves to a reionization parametrization with three parameters: the ionizing efficiency of the reionization sources ( ), the mean free path of ionizing photons ( mfp ), and the minimum virial temperature of dark halos hosting star-forming galaxies ( min vir ). We detail each of them in the following sub-sections.

The ionizing efficiency
The ionizing efficiency of the galaxies during the EoR is defined as where esc is the escape fraction of ionizing photons, ★ is the fraction of galactic gas in stars, is the number of ionizing photons produced per baryon within a star, and rec is the typical number of recombinations of a hydrogen atom. Equation (3) assumes a constant ionizing efficiency for all galaxies formed in halos with sufficient mass ( > min ). We note that this results in a sharp drop of the non-ionizing UV-luminosity function for halos with mass < min (see Figure 1 in Greig & Mesinger 2015).
Throughout, we consider a range of ionizing efficiencies from 1 to 250. As discussed in Greig & Mesinger (2017), this range explores a large variety of EoR models, with sources having both very low and large ionizing efficiency. The choice of strongly affects the evolution of the EoR, such that increasing will lead to a faster reionization process. Nevertheless, its impact also depends on the minimal virial temperature min vir , since the combination of both these parameters sets the average ionizing budget during reionization.

The minimum virial temperature min vir
The collapse fraction coll is the fraction of mass in a given volume enclosed in halos of individual mass larger or equal to min . In 21CMFAST, min is defined through the minimum virial temperature of star-forming halos, min vir , and expressed as (Barkana & Loeb 2001): with the mean molecular weight, Ω m the evolved Ω m at redshift ,and Δ c = 18 2 + 82 − 39 with d = Ω m − 1. As mentioned above, the choice of min vir impacts the position of the sharp drop in the UV luminosity function ( = 0 if vir < min vir ). We adopt the same prior as used in Greig & Mesinger (2017), such that min vir varies from 10 4 to 10 6 K. The lower limit of vir corresponds to the minimum temperature to trigger efficient atomic cooling (Barkana & Loeb 2001;Kimm et al. 2017), while the upper limit is chosen to be consistent with the host halo mass of high redshift Lyman break galaxies observations (Kuhlen & Faucher-Giguère 2012;Barone-Nugent et al. 2014).

The mean free path of photons mfp
The growth of the ionized regions is a dynamic process, which mainly depends on the recombination rate of the hydrogen atoms and the propagation distance of the ionizing photons. The balance between both processes sets the physical size of the ionized bubbles. The current 21CMMC version explicitly computes in-homogeneous recombinations. However, we use a simplification which assumes a maximum horizon for the ionizing photons in the IGM. This parameter is denoted by mfp and fixes the maximum smoothing scale used in Equation (2). Its impact is most significant when the ionized regions size becomes closer to mfp (Greig & Mesinger 2017). Small values of the mean horizon of the ionizing photons (e.g. < 10 Mpc) can delay the late stage of reionization. On the other hand, Greig & Mesinger (2017) note that values larger than 15 Mpc have little impact on the power spectrum of the 21-cm fluctuations because the fusion of the ionized regions is then the dominant growing process. We adopt a flat prior of mfp ∈ [5,25] Mpc, similar to Greig & Mesinger (2015.

Interferometer characteristics
We create mock observations using the point spread function (PSF) and the expected noise from 1000 hours of observations with SKA1-Low 8 . The current configuration of SKA1-Low consists of 512 stations, where 224 of them are randomly distributed in a central circular core of 500 meters in radius. The remaining 288 stations are placed in 36 clusters located in three spiral arms out to a radius of 50 km from the central core. The noise maps are computed using the python package 21 9 (Giri et al. 2020) which uses the formalism detailed in Ghara et al. (2017) and Giri et al. (2018a). The system noise per visibility is a Gaussian random variable with mean zero, and variance 2 defined as where is the Boltzmann constant, sys is the telescope temperature which accounts for the receiver and sky temperature ( sky ∝ −2.55 ), A eff is the effective collective area of the individual receivers, Δ is the frequency resolution of the data and Δ is the integration time. sys and A eff are fixed by the SKA1-Low technical design. Additionally, we adopt Δ = 10 seconds per visibility, 6 hours of observing per day ( day obs ), and a total observation time ( tot obs ) of 1000 hours. The noise maps are obtained by generating the uv maps (u, v, ) using the baseline distribution in the gridded uv plane. Then, a noise cube is derived in the Fourier domain using Gaussian distributed values with mean zero and variance 2 for both the real and imaginary parts. The cells in the uv-plane that are not sampled are set to 0, and we divide the noise values in the remaining bins by 1 √ (u,v, ) . Finally, the noise values are further scaled down by a factor tot obs / day obs . In reality, the baseline distribution varies in frequency such that a distinct uv map should be generated for each channel. However, in practice, these differences are small when the frequency bandwidth of the observation is relatively narrow. Consequently, we generate noise cubes using a single uv distribution which corresponds to the central frequency of the observation. Table 1 summarizes the interferometer characteristics and observation parameters for this work.

THE MORPHOLOGY OF THE IONIZED REGIONS
The power spectrum is a well-defined approach to analyze the excess of the 21-cm signal at different scales and already provides 8 https://astronomers.skatelescope.org/wp-content/uploads/2016/09/SKA-TEL-SKO-0000422_02_SKA1_LowConfigurationCoordinates-1.pdf 9 https://github.com/sambit-giri/tools21cm valuable clues about the evolution of the ionization fronts during the EoR. However, it is blind to the non-Gaussianities of the 21-cm fluctuations (Furlanetto et al. 2004b), while the latter can provide important information to understand and constrain the underlying physical processes during reionization . A wide range of novel approaches, using higher-order and topological statistics, have been investigated to extract the non-Gaussian information lying in the 21-cm fluctuations (see Section 1 and Greig 2019). In this work, we define a new approach based on a pure morphological description of the ionized regions. This method provides a simplistic but intuitive description of the shape of the ionized regions, which can be efficiently extracted from noisy and PSFconvolved 21-cm image cubes. Thus, it is well suited for a Bayesian framework analysis which requires comparing thousands of models in a reasonable amount of time. Section 3.1 introduces these 21-cm tomographic statistics, and Section 3.2 describes DISCCOFAN, a massively parallelized image processing tool designed to extract the morphology of the structures observed in image and image-cubes.

Morphological attributes
Several studies already highlighted theoretical differences in the morphology of the ionized regions for diverse reionization models (e.g Kakiichi et al. 2017;Giri et al. 2018a;Kapahtia et al. 2019;Gorce & Pritchard 2019). In this work, we use the morphological pattern spectra (Maragos 1989) of the ionized regions extracted from 21-cm image-cubes. This mathematical formalism is based on scale-invariant morphological attributes derived from the momentof-inertia matrix of each ionized region. This approach has been previously introduced for detection and extraction of anomalies (aneurysms and stenoses) in blood-vessels (Wilkinson & Westenberg 2001;Urbach et al. 2007;Westenberg et al. 2007). For a discrete case (e.g. quantized images or volumes), the moment of inertia matrix of a three dimensional object is defined by where [ , , ] are the coordinates of the voxels belonging to , [¯,¯,¯] are the coordinates of its center of mass, and is its volume. This formalism is derived from the continuous case where = ∭ 2 . The factor /12 is introduced to account for the moment of inertia of the individual cubic voxels. From and its eigenvalue decomposition, we extract four moments-invariant attributes encoding different morphological properties of a three dimensional geometrical structure: the Elongation E, the Flatness F , the Non-compactness N , and the Sparseness S.

Non-compactness: N is defined as
where Tr( ) is the trace of the moment of inertia tensor . The unit of is consistent with a length L to the power 5 while the volume scales with L 3 , hence the ratio Tr( ) 5/3 is scale-invariant. N reaches a minimum value of 0.25 for perfectly spherical objects and increases for complex asymmetric structures.
Elongation and Flatness: E and F are derived using the ratio of the eigenvalues of , referred as 1 , 2 and 3 with | 1 | ≥ | 2 | ≥ | 3 |, such that The eigenvalues represent the variance of the coordinates of an object along its main axes, scaled by its volume. In other words, they probe the spatial expansion of a three dimensional structure along the directions of maximal growth, such that | 1 | = | 2 | = | 3 | for perfectly spherical objects or | 1 | > | 2 | = | 3 | for cylindrical shapes. Thus, E and F refer to the divergence between the object growth in different directions. Combined, they provide information about the eccentricity of a geometrical shape. From the description above, it comes trivially that spherical objects have E = F = 1, cylindrical or cigar-shaped objects have E > 1 and F ≈ 1, and more complex tri-axial shapes have E and F larger to 1. The analysis of E and F is useful to reveal a dominant direction in the growth of the ionized regions, and therefore provide information whether the propagation of the ionizing radiation is isotropic.

Sparseness: S is defined as
It corresponds to the ratio of the expected volume computed using the coordinate variance along the three main axes (product of the three eigenvalues), versus the actual volume, obtained by summing over all the voxels belonging to the object. Broadly speaking, S relates to the filling factor of an object. It reaches 1 for filled spheres or ellipsoids, 1.127 for solid rectangular blocks, and increases as the structures become more porous.
Overall, these four morphological attributes provide different information about the shape of the ionized regions. E and F highlight the the structures eccentricity, while N and S describe the shape complexity, through its degree of asymmetry and porosity. Additionally, in all this paper, we also consider the ionized regions volume (V), defined by the number of individual cells belonging to each object O multiplied with the cell spatial resolution, and the number of ionized regions observed, bub . Both have been shown to be reliable probes of the underlying astrophysics of reionization (e.g. Kakiichi et al. 2017;Giri et al. 2018a).

Extracting morphological attributes with DISCCOFAN
To extract the number and the morphological pattern spectra of the individual ionized regions from 21-cm tomographic images, we use DISCCOFAN 10 (DIStributed Connected COmponent Filtering and ANalysis; Gazagnes & Wilkinson 2019, Gazagnes & Wilkinson, submitted), a recent method that combines image processing and mathematical morphology techniques to efficiently find, select, and analyze the connected structures in two and three dimensional data sets. Contrary to classical pixel-based approaches, DISCCOFAN works on a region-based representation of the data referred to as a component tree (Salembier et al. 1998). During the first step (the flooding), the algorithm goes through all the voxels and group them in connected components. The typical clustering rule to define a connected component is to select all the neighbors' voxels above a given threshold set. The nested relations between the connected components at different intensity levels are stored in a tree structure, which allows us to perform complex multi-scale operations on this optimized hierarchical representation. The use of similar treetechniques has been successfully used to improve the detection of faint-objects in optical astronomical surveys (Teeninga et al. 2016). We only consider the cases of binary three dimensional cubes where the ionized structures have been extracted beforehand using an independent thresholding technique (see Section 4.3). These tools provide a powerful framework to analyze the full range of 21-cm intensities, and we plan to explore this aspect in future works.
DISCCOFAN can process very large data-sets (> 100 Gigapixels; Gazagnes & Wilkinson, submitted). It is massively parallelized using shared and distributed memory techniques, thus is fast and efficient to analyze the properties of connected structures in any data. It takes ≈ 10 seconds to create a model on a 128 3 box, while it takes less a second to extract the morphological pattern spectra. This asset makes it well suited to be incorporated in a Bayesian inferential framework, where thousand of models need to be processed in a reasonable time.

MORPHOLOGY OF THE IONIZED REGIONS
In this section, we investigate the evolution of the morphological pattern spectra of the ionized regions as a function of redshift (Section 4.1), different reionization scenarios (Section 4.2), and the impact of the expected noise profile and PSF of SKA1-Low on the recovered statistics (Section 4.3).

Evolution during reionization
We first investigate the evolution of the morphological attributes (V, E, F , S, and N ) during reionization using simulated 21-cm maps.
We generate co-eval boxes of 512 3 cells with 2 Mpc resolution using 21CMFAST with the set of parameters = 30, min vir = 5.10 4 , and mfp = 15 Mpc (referred as the model). This model characterizes a cosmic reionization dominated by a population of several sources with a relatively low ionizing efficiency (Mesinger et al. 2016;Greig & Mesinger 2017). We consider three different redshifts corresponding to snapshots where the ionizing fraction ( ion ) in the box is successively 0.1, 0.3 and 0.7. This choice investigates the evolution of the ionized regions morphology for three different reionization stages where the universe is mostly dominated by (1) individual ionized regions; (2) ionized filaments resulting from the fusion of the ionized structures; and (3) neutral islands spanning through a main ionized percolated region which fills most of the universe (Chen et al. 2019). Figure 1 11 shows the resulting distributions of the five morphological attributes for the three snapshots as a corner plot. The diagonal and lower panels show respectively the the one and twodimensional marginalised probability density functions (PDFs). The contours in the lower panels enclose successively 68, 95, and 99.7 % of the ionized regions distributions. Additionally, we highlight the morphological properties of the largest ionized region in each box using diamonds. The largest scales are typically more sensitive to sample variance. However, resolving the properties of individual ionized regions with SKA is expected to be simpler as their size grows larger (Ghara & Choudhury 2020 reionization process using the morphological properties of these large structures. We observe that the typical size of the ionized regions shrinks as reionization progresses, while the largest objects grow larger. The probability to observe several large ionized structures decreases because more and more of these regions fuse into a larger percolated cluster, spanning the whole box, with boundaries that are artificially infinite. Figure 1 shows that a large ionized regions of ≈ 10 7 Mpc 3 is already present at ion = 0.1, suggesting that a percolated object already formed at this early stage. This is consistent with Furlanetto & Oh (2016) who found that the formation of a percolated cluster typically happens when the universe is still only mildly ionized ( ion between 0.1 and 0.2).
The distributions of E and F also provide some interesting insights about the growth of the ionized regions. The elongation and flatness are large when the ionized regions are small, but decrease as their volume gets larger, with E and F tending closer to 1. This suggests that the growth mechanisms of the small structures are likely anisotropic, but the fusion of the individual regions equalizes the spatial extension of the larger merged structures. However, the E and F values derived in the largest ionized region should be taken with caution. Indeed, the box has a finite length, such that the maximal spatial extension of an object is limited by the size of the simulation. Hence, the percolated cluster will always have E ≈ F ≈ 1, because it spans over the whole box.
Interestingly, the two dimensional PDF of F versus E highlights that only few objects have E and F = 1 which supports that the shape of the ionized structures strongly diverges from the idea of spherical bubbles. On the other hand, objects with the largest values of E (F ) have F (E) closer to 1, suggesting that these regions are more extended along a specific direction, and resemble cylindrical or filamentary structures. This outcome is somewhat expected from inside-out reionization, as implemented in 21CMFAST (Furlanetto, Hernquist, and Zaldarriagan (FZH) model;Furlanetto et al. 2004a). In such a model, the star formation is predominantly located on the high-density filaments such that these regions are ionized first. Thus, the ionized regions grow and merge along with these filamentary structures, which likely explains the observed shape. Additionally, the distributions of E and F remain fairly constant as reionization progresses, which might indicate that these quantities probe independent IGM properties, such as the structure of the underlying density field. Recent studies have shown that the power spectrum should be able to provide information about the topological property of the reionization (outside-in or inside-out) because it can indirectly probe the correlation between the ionization and density field with the high-density regions (Binnie & Pritchard 2019;Pagano & Liu 2020).
Finally, Figure 1 highlights that larger objects are typically more porous and less compact than smaller ionized regions. This is somewhat expected since the formation of the largest objects is mostly driven by the fusion of individual ionized regions, such that the resulting structures are more complex and porous. The values of S and N decrease as reionization evolves because the neutral regions within the ionized filaments become progressively ionized. This can be seen as the 99.7% contours enclose a narrower range of S and N for ionized regions with the same volume at any redshift.
This effect is also particularly noticeable on the properties of the percolated cluster in each box. As reionization progresses, these objects become more compact and "filled" because the remaining neutral patches are progressively ionized. Hence, even when the universe is predominantly ionized, comparing the morphology of these percolated clusters might still provide valuable insights about the properties of the ionizing sources. This is further discussed in the next section.
Overall, Figure 1 emphasizes that the evolution of the morphology of the ionized regions probes the reionization history. Mainly, early reionization stages form complex, eccentric, and overall porous ionized structures due to the fusion of the individual ionized regions along the high-density filaments of the underlying density field. As reionization progresses, these structures continue to grow as their ionizing fronts are propagating. However, their maximal size becomes limited by the merging mechanisms, such that they are more likely to fuse early with the percolated cluster. In the next section, we compare these statistics for two different reionization scenarios.

Morphological spectra for different reionization scenarios
Understanding how the neutral hydrogen was ionized during to reionization requires one to constrain the properties of the sources that dominantly contributed to the ionizing budget of the EoR. Studies generally investigate the impact of different populations of reionization sources, labeled through their ionizing efficiency, the typical mass of the host halos, or the hardness of their X-ray spectral energy distribution (SED). We use the and models, previously introduced in Mesinger et al. (2016) and Greig & Mesinger (2017), and defined by the following sets of parameters: • : = 30, min vir = 5.0 × 10 4 , and mfp = 15 Mpc.
The model, already introduced in Section 4.1, characterizes a reionization history dominated by a large population of galaxies with a relatively low ionizing efficiency. On the other hand, the model has larger and min vir , and is characterized by fewer sources with larger ionizing efficiency. By construction, these models have fairly similar reionization histories (see Figure 2 in Greig & Mesinger 2017) and match both the inferred evolution of the cosmic star formation rate (SFR) density using the extrapolated observed luminosity functions of Bouwens et al. (2015) and the electron scattering optical depth, = 0.058 ± 0.012 from Planck Collaboration et al. (2016). These two scenarios have strong implications for the patchiness of reionization and the morphology of the observed ionized regions, and provide a first glance at what differences would we observe for an EoR driven by AGNs or by fainter galaxies (Robertson et al. 2015;Finkelstein et al. 2019). We note that these models are not unique to explain reionization, as recent studies have shown that "intermediate" scenarios, driven by brighter galaxies, could very well match the current observational constraints (e.g Lyman damping wings of quasars and galaxies at > 7) (Naidu et al. 2020). Nevertheless, they are interesting for astrophysical parameter forecasting frameworks to explore which information is needed to favor or rule out specific scenarios.
In Figure 2, we investigate the distribution of the morphological attributes of the ionized regions for these two models using a snapshot with ion ≈ 0.2. This ionizing fraction corresponds to redshift 9 and 8.5 for the and , respectively. Similarly to Figure 1, the contours plotted in the lower panels include successively 68, 95, and 99.7 % of the distributions and we highlight the properties of the largest (percolated) ionized region in the box with diamonds.
We note several modest differences between the observed distributions for both models. The size distribution of the ionized regions suggests that larger ionized structures have formed when reionization is dominated by brighter sources. Additionally, the distributions of S and N show that the model produces more sparse and asymmetric ionized regions. We note that these differences are more significant in the tails of these distributions, suggesting that outliers are important to discriminate between these different reionization scenarios.
The distributions of E and F stay roughly similar for both models. As mentioned above, this might suggest that this property is independent of the reionization scenario as it probes the correlation between the ionization field and the underlying density field. This property should prove useful to investigate the global topology of reionization or more complex reionization models, as discussed in Section 4.1.
The morphology of the percolated object in each model provides similar insights. It appears more porous and less compact in the model. This is expected since a population of fainter sources form many individual ionized regions, whose fusions result in more porous and larger structures compared to a reioniza- tion scenario driven by fewer brighter sources. Interestingly, both percolated clusters have similar size, suggesting that their morphological properties can provide important additional insights on the nature of the underlying sources.
Overall, these results highlight that the most significant differences between these two reionization scenarios are related to the size, sparseness, and compactness of the ionized regions. Additionally, while we did not discuss it in this section, the total number of observed ionized regions should also provide additional information to compare different models. Several studies showed that we should observe fewer ionized regions for scenarios with brighter sources, or with a higher minimum mass of the star-forming halos (Kakiichi et al. 2017;Giri et al. 2018a). Nevertheless, these differences are more complicated to extract from realistic observations, as instrumental effects tend to smooth the contours of the ionized regions and suppress the information lying in the smaller scales.

Impact of the point spread function and noise
In this section, we investigate the robustness of the morphological properties of the ionized regions extracted from realistic 21-cm tomographic observations with SKA1-Low. To do so, we simulate noise cubes using the package 21 (which follows the procedure defined in Section 2.2), and add them to the simulated b boxes. Because the root mean square (rms) of the noise is too large compared to the 21-cm signal, we additionally smooth the data by applying a Gaussian smoothing in the spatial direction and a top-hat filter in the frequency direction such that the full width at half maximum corresponds to a maximum baseline of 2 km (similarly to Giri et al. 2018a,b). This is because most of SKA's collecting area is at baselines smaller than 2 km. This additional smoothing increases the S/N of the observation, but also smooths out the contours of the ionized regions. Throughout, we assume perfect foreground removal, but we investigate the impact of potential foreground residuals in Section 6.4. . Two dimensional slices extracted from the 512 3 cells co-eval 21-cm signal cube at ≈ 9 ( ion = 0.20) for the model. From left to right: the simulated brightness temperature intensity map with an intrinsic resolution of 2 Mpc (0.730 arcmin) resolution; the 21-cm signal observed with noise corresponding to the maximum baseline of 10 km with SKA1-Low; the 21-cm signal observed after applying a smoothing kernel in space and frequency corresponding to a maximum baseline of 2 km (≈ 2.9 arcmin or 7.9 Mpc resolution); the extracted ionized regions using the Triangle thresholding approach (see details in Section 4.3). approach to extract ionized regions in noisy interferometric observations using a super-pixel thresholding approach (SLIC; Achanta et al. 2012). They show that this method performs better than typical segmentation approaches when the ionization fraction of the box is larger than 0.1. Nevertheless, this method is relatively timeconsuming, making it less attractive when including it within a Bayesian inference framework such as 21CMMC. We choose a different thresholding approach, referred to as the Triangle thresholding (Zack et al. 1977). This strategy is based on the PDF of the pixel intensities in the image. It finds the optimal threshold value by constructing a line between the maximum and the minimum values and derive the intensity level where the distance between the histogram and the line is maximal. This technique is found to be particularly effective for bi-modal distributions. Hence, it is well suited to extract the ionized structures from noisy observations because the latter should imprint a peak on the PDF of the 21-cm intensities for observations well within the EoR. Nevertheless, this might be more complex for the very early stages where the ionizing fraction is too low to robustly identify these regions (Kakiichi et al. 2017;Giri et al. 2018b). Overall, we found that this method was giving good performance up to ion > 0.05, and that it gives similar performance as the SLIC approach (see Appendix A). The choice of our particular approach is motivated by its better computational efficiency (it takes 1 second using Triangle thresholding compared to 10 seconds using SLIC to process a box of 128 3 cells).
To investigate the effects of noise and resolution on the ionized region morphology, we compare the results obtained in Section 4.1 for the model using a snapshot where ion = 0.2. We display on Figure 3 the impact of noise on the simulated three dimensional image cubes, by showing from left to right the two dimensional slices corresponding to the simulated 21-cm image cube; after adding the noise; after smoothing using a kernel corresponding to a maximum baseline of 2 km; and the recovered ionized regions using the Triangle thresholding.
In Figure. 4, we compare the recovered distributions of the ionized regions morphological attributes in the noise-free case and after including the noise and smoothing, using the same corner plots as in Figures 1 and 2. Typically, instrumental effects tend to smooth the observed distributions of the morphological properties. We note that this effect is more significant for the size distribution of the ionized regions, which peaks at larger object sizes. This is expected from previous studies which showed that smoothing strongly affects the observed distribution of object sizes, because it artificially merges ionized regions that are originally disconnected, and thus indirectly decreases the number of individual ionized regions observed (Kakiichi et al. 2017;Giri et al. 2018a).
The contours of the 2D PDF of E and F are slightly larger, suggesting that they appear more eccentric than in the simulated maps. We note that this might be because we use a different smoothing procedure in the spatial and frequency space, with respectively a Gaussian and top-hat kernel, which could accentuate their extension along specific directions.
Additionally, the ionized regions extracted from realistic observations are overall less porous and more compact, especially regarding the smallest structures, since the PSF of the instrument, and the additional 3D smoothing will smooth the contours and the shapes of the ionized regions.
Interestingly, the percolated cluster has a smaller volume, but larger values of E, F , S and N , suggesting that the cumulative effects of noise and smoothing increase the peculiar morphology of this large structure. This is likely because, at this stage, the percolated cluster is already very sparse, such that the instrumental effects accentuate its porosity and overall complexity.
While the size distribution is slightly shifted towards larger objects, Figure. 4 shows that the four other shape statistics are fairly accurately recovered on noisy low-resolution observations. We note however that the impact of smoothing strongly depends on the reionization stage, and on the contrast of the 21-cm signal fluctuations which can vary for different reionization scenarios. Consequently, accurately analyzing the impact of the instrumental effects of the recovered ionized morphology should be done as a function of multiple reionization stages and different reionization scenarios. However, the performance of the inference framework will indirectly reveal whether the extracted statistics are robust enough against these effects.

21CMMC SETUP
21CMMC 12 is a MCMC sampler designed to explore the astrophysical parameter space of the Cosmic Dawn (also sometimes called the Epoch of Heating; EoH) and EoR. It includes a modified version of the python module C (Akeret et al. 2013) which is built on the top of the python module (Foreman-Mackey et al. 2013) using an affine invariant ensemble sampler (Goodman & Weare 2010). 21CMMC employs a streamlined version of 21CM-FAST (see Section 2.1) to efficiently simulate the 21-cm brightness temperate fluctuations for different sets of astrophysical parameters. It has been designed to prepare forthcoming observations and already used in many studies to quantify the constraints and degeneracies among the reionization model astrophysical parameters (e.g. Greig & Mesinger 2015Park et al. 2019). In this work, we use it to explore the use of 21-cm tomographic statistics as an alternative to the power spectrum to recover the EoR astrophysical parameters. In Section 5.1, we summarize the original implementation of 21CMMC based on the power spectrum of the 21-cm observations. We detail, in Section 5.2, the new likelihood function implemented in 21CMMC, which uses the ionized regions statistics extracted from 21-cm images. Finally, in Section 5.3, we present the mock observations and general setup used in this work.

21CMMC using the power spectrum
In 21CMMC, the likelihood function for the 21-cm power spectrum (Δ) is defined as a 2 statistics. It is expressed as where Δ obs and Δ mod are the power spectrum of the observation and the model, respectively, is the Fourier mode, the number of independent Fourier modes included (8 in this work), and Δ is the 21-cm power spectrum uncertainty defined as where thermal is the thermal noise, and variance obs and variance mod are the sample variance of the observation and model, respectively. In this work, thermal is estimated by deriving the uv coverage for a 1000h observation with SKA1-Low (see parameters in Table 1), simulating the thermal noise using a SEFD of 2500 Jy at the central frequency of the observation, and generating the corresponding thermal noise power spectrum and uncertainty 13 . We note that, in Greig & Mesinger (2015, the authors included an additional modelling uncertainty, typically fixed to 20% of the mock power spectrum value, to account for the semi-numerical approximations compared to fully numerical codes (Zahn et al. 2011;Hutter 2018), and differences in the radiative transfer equations implementation (Iliev et al. 2006). Nevertheless, in this work, we aim to investigate and compare the performance of a parameter inferential approach using the ionized regions morphological pattern spectra and using the 21-cm power spectrum. Hence, including such error term for the power spectrum would require an equivalent for the 21-cm tomographic statistics, which is not trivial to define. Consequently, we choose to exclude this modelling uncertainty for both cases. Additionally, the 8 independent bins are sampled in logarithmic space in the interval [0.1, 1] Mpc −1 . The lower bound of this interval corresponds to the foreground corruption limit while the upper bound is fixed by the thermal-noise limit.

21CMMC using tomographic statistics
Defining a likelihood function to compare the distribution of the morphological attributes of the ionized regions is not trivial. We need to compare two distributions of bub , 5-dimensional vectors ì, such that ì = [V, E, F , S, N ], and bub is the number of ionized regions extracted in the 21-cm images 14 . Comparing 5-dimensional distributions can be computationally costly and penalize the MCMC run if the time taken to compute the likelihood value is too large with respect to the time required to simulate the models. Additionally, the problem is complex because bub significantly fluctuates for different reionization scenarios. Nevertheless, this latter aspect, if handled properly, can also provide additional constraints during the inference, because the number of ionized regions should be closely connected to the parameters of the ionizing sources (Kakiichi et al. 2017;Giri et al. 2018a).
Hence, we choose to follow the approach from Vegetti & Koopmans (2009) to define our likelihood function. The authors used a strategy that divides the likelihood function into two independent factors: the first one accounting for the likelihood of observing a given number of objects using a Poissonian distribution, and the second to compare the properties of these objects.
We define a morphological likelihood function, L (morphology), using two separate factors: a Poissonian factor that compares the number of ionized regions between the model and the observation, and a distance factor that compares the two distributions of five dimensional vectors, independently of the number of objects in the distributions. In Appendix C, we provide a general expression to define a likelihood function suitable to compare -by-distributions, where is the dimension of the statistics used, and the number of objects that might vary for different sets of parameters. For our case ( = 5), the final likelihood expression is: 13 https://gitlab.com/flomertens/ps_eor 14 In practice, we only keep the ionized regions that have a volume larger than 10 Mpc 3 . This is because lower scales are heavily affected by the noise and the resolution of the instrument.
where mod bub and obs bub are the number of ionized bubbles in the observation and in the model, respectively, is a regularization parameter, d maha is the Mahalanobis distance (Mahalanobis 1936), and ì obs and ì mod are the five dimensional vectors carrying the five morphological attributes for each individual ionized region observed in the observation and model. A complete description of this likelihood function can be found in Appendix C, and we only briefly describe the general idea here. As mentioned above, the fac- obs bub obs bub ! assumes that the fluctuation of the number of ionized regions observed follows a Poissonian distribution.
The second factor provides a way to compare the distribution of 5D vectors, by finding, for each ionized region in the observation, the ionized structure in the sampled model with similar morphological attributes. This is done by extracting the pair of vectors ì mod and ì obs such that the Mahalanobis distance between them is minimal.
The choice of the Mahalanobis distance is motivated by the fact that classical distance metrics, such as the Euclidean distance, are not suited for high-dimensional applications (Aggarwal et al. 2001). Additionally, the Mahalanobis distance accounts for the covariance of the 5-dimensional distribution of morphological attributes observed (the distributions are normalized to unit variance for each parameters), such that it provides an unbiased metric that is suitable to compare the structures in different distributions of morphological attributes.
The minimum Mahalanobis distances are computed, elevated to the power five (i.e number of dimensions), and summed for all the ionized regions in the observation. In other words, this can be understood as minimizing the total volume enclosed between the ionized region morphological distribution in the observation and model in a five dimensional space. This approach should favor models whose ionized regions reproduce the observations closest.
The factor mod bub obs bub ensures that this second term is normalized to the number of ionized regions in the observations (see details in Appendix C). Finally, the parameter regulates the weight given either to the Poisson factor or to the normalized distance metric that compares the 5D distributions. Optimally, should be included as an additional parameter during the MCMC inference, such that its best value is defactorined during the sampling process. However, adding a parameter would further increase the complexity and computational time of the 21CMMC run. Hence, in this work, we choose to fix it a priori by performing several test-cases to find an arbitrary optimal value. We find that fixing to 0.001 provides the best inference results, and we use this value for all results shown in this paper.
We note that the function defined in Eq (12) is not a likelihood function, but rather an approximate penalty function, which is typically suited for regularized maximum likelihood estimation problems. This is because the second factor is not a "proper" statistics, contrary to the Poisson factor. Hence, it comes with some caveats, such as the absence of well-defined error terms related to the variance of the 21-cm tomographic statistics with respect to noise or sample variance. Nevertheless, by construction, can be considered as the inverse variance of the distribution, and therefore, act as a proxy for the error. Additionally, this approach still provide valuable insights to understand whether and how 21-cm tomographic statistics can be included in a Bayesian inference framework, and combined to power spectrum analyses to optimize future observational studies. We will refine this framework in future studies to account for the variance of the 21-cm tomographic statistics. In Section 6, we discuss the inference performance of (1) using only the ionized regions number count (the Poissonian factor), (2) using only the ionized regions morphology (the distance factor), or (3) using both. Increasing or decreasing simply shifts the result of (3) towards (2) or (1), respectively.

Mock observations
Extracting and analyzing the ionized regions from noisy 21-cm images is typically more computationally expensive than deriving the power spectrum. Hence, we choose to focus on relatively small boxes of 128 3 cells (250 3 Mpc 3 ). We use two sets of mock observations, corresponding to the and the models (Mesinger et al. 2016), to compare the constraining power of the ionized regions morphological pattern spectra and power spectrum. Additionally, we combine observations at three different redshifts, 10, 9, and 8. The mock power spectra are extracted using the simulated Fourier visibilities and combined with the total theoretical uncertainty computed using the procedure detailed in Section 5.1. We then create realistic 21-cm mock images including the SKA1-Low PSF, a random noise realization, and the additional 3D smoothing, and extract the morphological attributes of the ionized regions using the approach detailed in Section 4.3.
The models are sampled using a different set of initial conditions (i.e different realization of the density field) on the same grid size as the mock observations. Studies using the power spectrum typically use a larger box size to create the mock observations, while the models are produced on a smaller grid (but keeping the same cell resolution) (e.g. Greig & Mesinger 2015. Nevertheless, 21-cm tomographic statistics are not independent of the box size used (e.g. the number of ionized structures will increase for larger boxes), thus, we keep the same grid for both the models and the mock observations. We note that our experiment is likely more affected by sample variance due to this relatively small box size, and we further discuss this point in Section 6.3.
As detailed in Section 4.3, both the noise and the PSF impact the number and morphological attributes of the ionized regions. Consequently, these instrumental effects must also be included when sampling the parameter space during the inference process, to consistently compare models with observations. To accelerate the convergence of the MCMC analysis, we keep the noise realization fixed for each model sampled (but chosen different than for the mock observations). We performed several test-cases to ensure that the choice of a particular noise realization has little impact on the final inference results. The stopping criterion in 21CMMC is defined relative to a certain number of sample iterations, rather than to a convergence criterion. Therefore, we test for the convergence of the Markov chains using two tests. The first one is the Gelman-Rubin diagnostic (Gelman & Rubin 1992), which computes the sample mean and variance from multiple chains, and check whether they are similar enough to indicate approximate convergence. Additionally, we also use the Geweke diagnostic Geweke (1991), which compares the similarity of the mean and variance of segments from the beginning and end of a single chain. We note that these two tests estimate the convergence of the chains but are no proof of actual convergence.
They are usually used to prove a failure to converge, rather than guaranteeing that the chains converged to a global minimum.
Overall, running 21CMMC using the 21-cm tomographic statistics, 3000 iterations and 50 walkers takes around 8 days on a machine with 80 CPUs. Using only the power spectrum with the same setup takes approximately a factor two less time. Assessing the performance of this approach on larger observations will be crucial to understand the full potential of 21-cm tomographic statistics. To do so, using emulators of the tomographic 21-cm image cubes, such as in Chardin et al. (2019) and List & Lewis (2020), might prove useful to extend this work.

RESULTS
This section presents the inference results obtained with 21CMMC. Section 6.1 details the recovered parameter intervals using the 21cm tomographic statistics for both the and mock observations. Section 6.2 compares these results to the constraints inferred using the power spectrum of the 21-cm fluctuations. Finally, Section 6.3 explores the impact of using different initial conditions, and Section 6.4 investigates the consequences of including simulated foreground residuals, as one of the dominant systematic errors, on these results.

Inference using the ionized regions number-count and morphology
We first assess the performance of the 21-cm tomographic inference approach to recover the parameters of the sources of cosmic reionization. In Figure 5 and Figure 6, we compare the outputs of 21CMMC for the and models, respectively, considering three different cases based on the likelihood function defined in Eq. (12): (1) using the number of ionized regions observed (i.e the Poissonian factor); (2) using the morphology of the ionized regions (i.e the distance factor); and (3) combining (1) and (2). The results are shown as corner plots 15 , with the diagonal panels providing the normalized marginalized 1D PDFs of the three parameters, and the lower panels representing the 2D likelihood contours, at the 1 and 2 level (thick and thin lines, respectively). Additionally, Table 6.1 shows the recovered median values and the associated 16 th and 84 th percentile errors, assuming that our framework, and in particular the factor 1/ , decently traces the variance for the attribute values (see discussion in Section 5.2). In the following sub-sections, we detail the results of each inference case.

Comparing only the number of ionized regions
Using the number of ionized regions information already provides remarkably good constraints for the recovered ionizing efficiency and the minimal virial temperature for both reionization scenarios. We note that the median of the posterior distributions are slightly offset compared to the fiducial parameters (around +15 for and +0.2 for log 10 ( min vir ) for the , and around −25 for for the model), but these fairly accurate intervals already support that this information is useful to infer information about the ionizing sources. This is not unexpected since both the 15 Corner plots from Figures 5, 7 Table 2. The median values, and the associated 16 th and 84 th percentile errors of the recovered astrophysical parameters, , log 10 ( min vir ), and mfp for the two different reionization scenarios. We assumed 1000h of observations with SKA1-Low for three different epochs at = 10, 9 and 8. We compare the inference results using: only ionized regions number count; only the ionized regions morphology; both the ionized regions number-count and morphology; and the power spectrum.
values of and min vir drive the number of ionized regions. As discussed in Sect. 4.2, models with lower min vir have fewer ionized regions because the ionizing sources can only exist within the densest halos. Similarly, a larger will boost the number of ionizing photons produced by the sources, accelerating the pace at which ionized bubbles grow, and thus increasing the merging rate during the early stages. In both cases, the observed number of individual ionized regions should decrease. Consequently, this information can already rule out a large number of models that have diverging mod bub and obs bub . The 2D likelihood contours in Figure 5 show the degeneracy between the ionizing efficiency and the minimum mass of the star-forming halos. In the model, the recovered number of ionized regions does not significantly vary for models with larger and log 10 ( min vir ), while for the model, large fluctuations of the ionizing efficiency of the sources still produce a similar number of ionized regions, but small variations of min vir have a significant impact on obs bub . The degeneracy between and min vir is expected because they both impact the number of ionized regions in each model, which depends on the criterion used in Equation (2) ( × coll ) that sets the ionized fraction in each cell. In Section 7.1, we further demonstrate that the 2D likelihood contours of and min vir almost exactly follow the isocontours of the average × coll for a given reionization scenario.
We note that the number of ionized regions does not provide enough information to accurately infer the mean free path of the ionizing photons for both the and models. While the mfp fiducial value is within the recovered interval, the large fractional errors show that this parameter is not tightly constrained. In theory, mfp regulates the maximum scale to which an ionized region can grow around the ionizing sources, and becomes important only when the ionized regions grow larger than this value. Greig & Mesinger (2017) emphasized that its impact on the power spectrum is less significant when its value is larger than ∼15 Mpc because the merging of the ionized bubbles becomes the most dominant growth mechanism. Our result suggests that only models with values lower than 8 Mpc seem to be robustly ruled out. Overall, the impact of mfp fluctuations is likely too complex to recover for our experimental setup given the limited box size and low-resolution aspect of the data sets (see dicussion in Section 6.3).

Comparing only the ionized regions morphology
Comparing the morphological properties of the ionized regions also provides valuable information to constrain and min vir for both reionization models. For the model, the recovered ionizing efficiency of the sources is slightly offset (around +27), while the minimum virial temperature is more robustly inferred. On the other hand, is better constrained for the case, but min vir interval is shifted by around −0.14 (logarithmic). The recovered values are still at ±3 , however, given the particular definition of the likelihood function implemented, the robustness of these uncertainties should be taken with caution. Overall, these results suggest that the morphological properties of the ionized structures also provide independent information to recover the properties of the underlying reionization sources. The accurate inference of min vir is slightly surprising since the virial temperature regulates the number of ionizing sources formed by selecting the mass threshold of the halos that can form such sources. Therefore, it is unclear how this parameter is connected to the morphological properties of the sources. Nevertheless, the distance factor in Equation (12) ensures that the models with the best likelihood values have similar morphological pattern spectra than in the observation. Hence, models with different min vir values might be efficiently ruled out because they do not produce ionized regions with the same morphological spectra. In Section 7.1, we actually show that the ionized regions morphology is indirectly pre-determined by the average ionizing budget per baryon, which depends on both the values and min vir . We note in Figure 5 that the morphology of the ionized regions does not help to place a robust constraint on mfp , but can only be used to rule out models with mfp < 8 Mpc. In theory, fluctuations in mfp impact the volume of the ionized structures, such that using the size distribution of the ionized regions should provide information to infer this IGM property. Nevertheless, as mentioned already in Section 6.1.1, these differences might be too complex to accurately measure given our experimental setup.

Combining the number-count and morphology of the ionized regions
Finally, combining the number count and the morphological attributes of the ionized regions provides the best inference results for the joint set of values of and min vir for both the and models. The posterior distributions peak closer to the fiducial values, and have smaller 16 th and 84 th percentile errors. On the other hand, it does not improve the inferred interval of mfp , where the 2D likelihood contours are typically extended over the whole mfp prior range. In Section 6.3, we highlight that the recovered interval is strongly affected by the set of initial conditions. Nevertheless, we expect this outcome to be different for experiments where both the observation and the models are sampled on a larger grid because fluctuations in the mfp values should be more significantly observable based on the morphological properties or the number of the ionized structures.
Overall, Figures 5 and 6 highlight that the combination of the number and morphological attributes of the ionized structures extracted from 21-cm images provides an alternative approach to the power spectrum to disentangle different EoR scenarios, and constrain properties of the ionizing sources. This first test-case paves the way for more complex, and deeper explorations of these alternative approaches using Bayesian inference frameworks.

Comparison with the power spectrum
Several studies (e.g Greig & Mesinger 2015Park et al. 2019;Binnie & Pritchard 2019) showed that a Bayesian statistical framework using the power spectrum would already provide tight constraints on the EoR for different reionization parameters or models. However, the power spectrum is only a limited statistic, because it does not encode the phase information from the Fourier modes, nor the information about non-Gaussianities or non-ergodic effects. In theory, 21-cm tomographic images give a more complete description of the evolution of the ionizing fronts and probe additional astrophysical properties through the non-gaussianities of the 21-cm signal. In this section, we simply explore how the constraints inferred using the 21-cm tomographic statistics compare to the ones set by the power spectrum of the 21-cm observations. In Figures 7 and 8, we compare the inferred 1D PDFs and 2D likelihood contours when using the power spectrum or the ionized regions number count and morphological spectra for the and models, respectively. Additionally, the corresponding medians and 16 th and 84 th percentile errors can be found in Table 6.1. Overall, the power spectrum provides tighter constraints on and min vir compared to the 21-cm tomographic statistics. These results are not unexpected since, as shown in Appendix B, the theoretical uncertainties on the mock power spectrum used in this work are relatively small at most of the modes sampled, which strongly rules out all models that do not closely reproduce its shape. Additionally, the power spectrum is extracted directly from the simulated brightness temperature fluctuations in the Fourier space, while the ionized morphological attributes are extracted from noisy, lower resolutions, image cubes. Hence, these tomographic statistics are more sensitive to the quality of the 21-cm images, which depends on the additional processing steps, such as the choice of the smoothing kernel, the uv weighting scheme, or the robustness of the ionized regions extraction method.
We note that using the power spectrum only provides a fairly accurate interval for mfp for the case, but not for the case where the median recovered value is at more than 3 than the fiducial value. This is likely because our experiment is significantly impacted by sample variance due to the small box sizes used. In Figure 9, we show the spherically averaged power spectrum extracted at redshift = 8 from the mock observation of the set of parameters, the power spectrum from the sampled model with the same fiducial set of parameters (but different initial conditions), and the power spectrum from the model with the fiducial and min vir but mfp = 22. It highlights that the power spectrum of the model with the same set of fiducial parameters significantly differs at the low modes (which probe the large scales) compared to the mock observation, such that models with a larger mfp (22 Mpc) better reproduce the power spectrum of the mock observation. Typically, the mfp fluctuations dominantly affect the large scales (as shown in Figure 2 of Greig & Mesinger 2015), which can be strongly affected by statistical variance in boxes with limited size (Kaur et al. 2020). This caveat is further investigated in Section 6.3, where we compare the outputs of 10 MCMC runs with different sets of initial conditions.
Overall, these results suggest that using the power spectrum provides tighter constraints on the ionizing efficiency and the minimum mass of the star-forming halos compared to statistics derived on noisy, lower resolution 21-cm images. While this might question the use of these statistics to infer the astrophysics of reionization, we show in Section 6.4 that 21-cm tomographic statistics are more robust to the presence of residual artifacts in the observation. Additionally, we note that we compared, but never combined both statistics within the same inference analysis. This is because these approaches are not entirely independent from each other since they both encode for example the typical scale of the ionized structures. We discuss in Section 7.2 how alternative 21-cm tomographic or higher-order statistics could be further combined to a power-spectrum analysis to improve the inference results.

Impact of different initial conditions
In 21CMFAST, the set of initial conditions determines the underlying density field, which is then used to compute the brightness temperature fluctuations at different redshifts. In this work, we fixed the initial conditions of the sampled models a priori, but choosing it different than the ones used to create the mock observations. A more robust inference framework would consist in varying the initial conditions as we sample the models during the inference, nevertheless, this is computationally costly as it requires to re-compute the evolved density field for each different set of parameters. As already mentioned in the previous section, the choice of the initial conditions can lead to significant differences when the box size is limited. Recently, Kaur et al. (2020) quantified the error using a mock power spectrum extracted from a box of length 1.1 Gpc and showed that boxes larger or equal than 250 Mpc are needed to achieve convergence within 1 level of the noise. For our experiment, both the mock observation and the models are extracted on boxes with this limited physical size, such that we might be more sensitive to this statistical variance. Hence, in this section, we aim to quantify the impact of sample variance on the previous results by running ten different 21CMMC runs using ten different sets of initial conditions for the sampled models. Because running these tests is computationally expensive (around seven days per run), we perform this test only for the model, and use a lower number of iterations for the MCMC runs.
The top and bottom panels of Figure 10 show the results of our experiment using the ionized region number count and morphological spectra, and using the power spectrum, respectively. For clarity, we only show the 1D posterior PDFs, since the 2D likelihood contours do not provide useful information. The black line in each figure represents the average of the ten 1D PDFs. We note that these averaged PDFs are not physically motivated, as they do not represent the combination of ten independent results (for example    (purple). The differences at low k (large scales) for the same set of parameters but different initial conditions suggest that our experiment is significantly impacted by sample variance. by observing ten different fields), but simply summarize the typical impact of varying initial conditions. Additionally, we report in Appendix D the median and 16 th and 84 th percentiles for all ten cases.
Overall, the top panels in Figure 10 support the findings from Section 6.1. The ionized morphological and number count information provide fairly accurate constraints for and min vir , such that the interval of median values recovered from the posterior distributions in each case is between 28.28 and 49.38 for the former, and between 5.0 × 10 4 and 9.0 × 10 4 K for the latter. Interestingly, these values are all within 1.5 of the results presented in Section 6.1, suggesting that systematic errors due to sample variance are consistently contained within the width of the recovered parameter interval from independent runs. We also note the degeneracy between the ionizing efficiency and the minimum virial temperature, already highlighted in Section 6.1, such that a larger recovered is typically paired with a larger min vir . The origin of this degeneracy is further discussed in Section 7.1.
Additionally, the top panels in Figure 10 emphasizes that the width and the peak of the posterior distribution of mfp significantly vary depending on the set of initial conditions when using the likelihood function based on the number and morphological spectra of the ionized regions. On average, models with very low or very large mfp seem slightly more disfavored, but the large percentile errors prevent us from setting a robust constraint to this IGM property in all individual cases.
When considering the power spectrum, Figure 10 emphasizes that varying the initial conditions has little impact on the recovered ionizing efficiency and minimum virial temperature. The interval of the lower and higher recovered median values is within 29 and 31 for and 4.7 × 10 4 and 5.5 × 10 4 for min vir , and all results are within 1.5 from the interval obtained in Section 6.1. This suggests that the constraints on both parameters are very robust against statistical variance. Nevertheless, this is not the case for mfp . The ten posterior distributions have significant differences depending on the set of initial conditions used, such that the average interval of accepted values is within ∼ 14 and 25 Mpc. This is consistent with Greig & Mesinger (2015) who show that this parameter has a stronger impact on the growth of the ionized regions for small values, while  Table 3. The median values, and the associated 16 th and 84 th percentile errors for the three astrophysical parameters, , log 10 ( min vir ), and mfp for the modified model ( = 30, log 10 ( min vir ) = 4.7 and mfp = 15 Mpc) perturbed with a 3D Gaussian Random Field (see Figure 11). it leads to little difference when larger than ≈ 15 Mpc. The results of the ten runs suggest that this parameter is adjusted to match the realization of the large scale structure from the mock observation.
As discussed before, this caveat can be solved by using a larger grid for both the mock observations and the models sampled, to reduce the statistical variance impacting the large scales. Additionally, it shows that 21-cm tomographic statistics, albeit extracted from noisy and lower resolution images, are still fairly robust against sample variance effects. However, the constraints inferred using the power spectrum seems more precise than using the number and morphological properties of the ionized regions. Nevertheless, we show in the next section that the power spectrum is likely more sensitive to the presence of foreground residuals compared to statistics extracted from 21-cm images, and thus could be less accurate.

Impact of foreground residuals
In Sections 6.2 and 6.3, we showed that 21-cm tomographic statistics provide an alternative approach to the power spectrum to disentangle different reionization scenarios. However, using the power spectrum sets tighter constraints on the parameters of the reionization sources. As we already discussed, this is likely because the 21-cm tomographic statistics only account for the properties of the ionized regions, and are extracted on lower resolution data sets that require additional processing steps. However, 21-cm signal images provide a direct measure of the ionized structures and their properties, while the power spectrum is an averaged statistics. Consequently, the presence of diffuse structures locally have a limited impact on the shape of the ionized regions, while it will affect the power spectrum on large scales. In this section, we investigate this effect using simulations of three dimensional Gaussian random fields (GRF) having a power spectrum consistent with a power-law k with = −2.0 (consistent with observations of large scale diffuse Galactic emission using the Amsterdam-ASTRON Radio Transient Facility and Analysis Centre (AARTFAAC); Gehlot 2019) and a brightness temperate variance of 28.9 mK. We add these GRF to the three mock observations of the model at redshifts 10, 9 and 8. These GRFs can be seen as diffuse structures that have not been correctly removed during the foreground subtraction. Figure 11 shows a 2D slice of one of the 3D GRF realization, and the mock power spectrum extracted from the modified observations. The contaminated power spectra are shifted towards larger values, and this effect is stronger for larger bins. This is expected since adding a GRF will enhance the variance if the field is uncorrelated with the 21cm field. We note that our simulation of the GRF assumes that it fluctuates similarly in the frequency and spatial domain. The latter is not very realistic since these diffuse residuals are typically synchrotron spectra, which are frequency coherent. Nevertheless, it still provides a decent test-case to investigate the impact of foreground contamination.  Figure 10. Top: the 1D marginalized PDFs for the three astrophysical parameters , log 10 ( min vir ) and mfp using the likelihood function based on the number count and morphology of the ionized regions and the mock observations. Bottom: same but using the likelihood function based on the power spectrum. We compare ten MCMC runs corresponding to ten different sets of initial conditions (thin color lines). The black thick line corresponds to the average of the ten runs.  Figure 12 shows the result of the MCMC runs using either the number of ionized structures observed and their morphology or the power spectrum of the 21-cm observations. Additionally, we report in Table 6.4 the median values and associated 16 th and 84 th percentile errors. Figure 12 shows that the recovered and min vir are shifted towards larger values for both cases. This suggests that the presence of diffuse structures likely suppress the information lying in the small scale structures, such that we recover fewer but brighter sources. Overall, this bias is smaller when using the statistics extracted on 21-cm images. The recovered intervals for and log 10 ( min vir ) are offset by about +10 and +0.1 compared to the constraints obtained in Section 6.1, and the fiducial values are still within 1.5 of the posterior distributions suggesting that the inferred astrophysics is still consistent with the mock parameters. On the other hand, the intervals recovered using the power spectrum are shifted by more than 3 (+40 and +0.47 for and log 10 ( min vir ), respectively), which could lead to an incorrect astrophysical interpretation. This larger bias can be explained by the fact that the power spectrum is a spherically averaged metric, hence globally sensitive to contamination affecting the contrast of the 21-cm intensities. Typically, adding a GRF to the observation will always bias the power spectrum upward if the field is uncorrelated to the 21-cm field. On the other hand, it acts like a correlated noise in the tomographic images, with both up and down fluctuations with zero mean, and thus do not necessarily bias the shape of the PDF of the 21-cm intensities, such that the number and morphological attributes of the ionized regions can still be robustly extracted using the same approach.
We note that we must remain careful when interpreting the derived uncertainties from the 16 th and 84 th percentile errors. As mentioned in Section 5.2, the likelihood function for the 21-cm tomographic statistics does not include additional variance terms aris- Ionized regions number count and morphology Power Spectrum 1σ 2σ Faint galaxies model fiducial parameters Figure 12. The 1D marginalized PDFs (diagonal panels) and 2D likelihood contours (lower panels) of the three astrophysical parameters , log 10 ( min vir [ ]) and mfp (Mpc) for the model assuming 1000h observations with SKA at redshifts, 10, 9 and 8. The dashed lines show the fiducial parameters ( = 30, log 10 ( min vir [ ]) = 4.7 and mfp = 15). We compare the outcomes of 21CMMC using the power spectrum or the number and morphological spectra of the ionized regions.
ing from the expected fluctuations of the morphological attributes due to sample variance or thermal noise. Similarly, including an additional 20% modelling uncertainty for the power spectrum, as it was done in Greig & Mesinger (2015), might result in larger recover parameter intervals, such that the fiducial values are within 3 . Nevertheless, this would not affect the observed biases of the recovered intervals, which itself supports that 21-cm tomographic statistics are likely more robust to the presence of foreground residuals.
We briefly note that, similar to the results shown in Sections 6.1, 6.2 and 6.3, mfp is still poorly constrained in both cases. This is expected because of the extra level of complexity of this test.
Overall, the interest of 21-cm tomographic statistics to further constrain the EoR is thus well illustrated by this section. These results highlight that they likely provide more accurate, although less precise, statistics than the power spectrum regarding the presence of potential foreground contamination. This is key information for future observational studies, and we discuss this further in Section 7.2.

DISCUSSION
We further discuss here the results presented in Section. 6. Section 7.1 investigates the connection between the morphological pattern spectra of the ionized regions and the astrophysics of the reionization. Section 7.2 discusses the potential of 21-cm tomographic statistics for future observational studies. Finally, Section 7.3 considers the limitations of our experiment.

The ionized regions morphology probes the average ionizing budget per baryon
In this section, we investigate which properties, among the number of ionized regions or their morphological attributes, contain most of the constraining power to infer the astrophysical parameters. To do so, we extract these statistics for a large number of models sampled within the parameter space used in this work, and using a grid interval of 5, 0.1, and 2, for , log 10 ( min vir ), and mfp , respectively. In total, this corresponds to 11781 models for which we created realistic mock observations at = 10, 9 and 8, and extracted the ionized regions number count, volume, sparseness, non-compactness, elongation, and flatness. Figure 13 shows the color maps corresponding to these six statistics as a function of and log 10 ( min vir ). To do so, we marginalize the values over mfp by taking the mean over all 11 mfp realizations for each pair of and log 10 ( min vir ). Except for the number of ionized regions, we focus on the average of the morphological attributes over all the ionized structures. We note that this is slightly different from the likelihood framework (Eq (12)) which compares the distributions in a point by point manner. Additionally, we show on the figure the isocontours (dashed black lines) of the average number of ionizing photons emitted in the IGM per baryon at a given redshift. Typically, the number of ionizing photons emitted in the IGM per baryon, ion/b , is a time-dependent quantity, which relates both to the ionizing efficiency of the sources (considered independent from time or halo mass in this work), and the minimum .
Hence, the average number of ionizing photons emitted in the IGM per baryon at a given redshift , referred to as ion/b, , can be approximated using the mean collapse fraction in the box (averaged over all voxels). In 21CMFAST, the collapse fraction is estimated for different excursion sets corresponding to different scales . Nevertheless, by construction, the average coll should match the Press-Schechter collapse fraction ( P−S coll ) defined for a redshift and a minimum halo mass min . Thus, we derive ion/b, at a given redshift using: The isocontours in Figure 13 correspond to the evolution of ion/b, as a function of and log 10 ( min vir ). The white regions in the different panels are the sets of parameters where the ionized fraction in the box is too low to robustly estimate the number of ionized regions (typically ion < 0.05). These models have large min vir such that too few ionizing sources are formed and ion/b, is less than 0.05. On the other hand, models with ion/b, > 1.5 have few objects, an average volume larger than 10 6 Mpc 3 , and E and F close to 1, suggesting that the universe is dominated by a single percolated ionized region. The models with the most "extreme" morphological attributes (large E, F , N or S) are located around the isocontour ion/b, = 1. The peculiar morphology of the ionized regions in these models is explained by the fact that this isocontour traces a reionization stage just before percolation. At this point, the ionizing field consist of a few large regions (a few tens in number, with an average volume between 10 2 − 10 3 Mpc 3 ), strongly porous and eccentric, which are on the brink to fuse into a single, percolated cluster spanning the whole box.
Overall, Figure 13 emphasizes that the number of ionized regions and their morphology are strongly correlated with the isocontours of ion/b, . Nevertheless, despite a few exceptions, these statistics remain fairly constant along these isocontours, which they are only sufficient to constrain the value of ion/b, that contains the correct astrophysics, but might not be enough to disentangle the correct pair of and min vir on this contour. On the other hand, Figure 13 shows that the shape and location of the isocontours of ion/b evolve with redshift. Hence, combining the information about the ionized regions morphology at different redshifts can indirectly trace the time evolution of the ionizing budget per baryon, and thus provides additional information to recover and min vir . We test for this hypothesis by deriving the relative difference in ion/b , noted err ion/b , for each pair of ( , min vir ), relatively to the ion/b of the fiducial models. Hence, err ion/b is defined as: where mock model ion/b, and ion/b, ( , min vir ) are the average number of ionizing photons emitted in the IGM per baryon at a given redshift for the fiducial model ( or ), and for a given pair of and min vir , respectively. Hence, err ion/b traces the difference in the time evolution of the ionizing budget for all the models sampled relative to the fiducial models considered in this work. We present the evolution of ( err ion/b ) as a function of and min vir for the or cases in Figure 14, and additionally display the 2D likelihood contours obtained for both models in Section 6.1 (shown in Figures 5 and 6) with black solid lines. We note that the 2D likelihood contours of the inference analysis are consistent with the regions having the lowest ( err ion/b ) which characterize the parameter space where the difference in the time evolution of the ionizing budget is minimum. Additionally, the degeneracy between these parameters observed in Section 6.1 is directly consistent with the shape of these regions. Therefore, this analysis highlights that the evolution of the ionized regions number count and morphological spectra at different redshifts likely provides the strongest constraints on and min vir , because it traces ion/b . Thus, it supports that these tomographic statistics give robust and independent insights about the physical process of reionization. Koopmans et al. (2015) and Mellema et al. (2015) pointed out the potential of the synergy between the power spectrum and alternative approaches based on 21-cm tomographic images. Recent studies have further shown that the use of higher-order, topological and morphological statistics could provide valuable additional information to disentangle the astrophysics of the EoR (Section 1 and Greig 2019). Several of them emphasized that these approaches would be especially useful to break the degeneracies between several astrophysical models having a similar power spectrum (e.g. Kakiichi et al. 2017;Giri et al. 2018a;Kapahtia et al. 2019). In this work, we showed how these 21-cm tomographic statistics could be used in a Bayesian inference framework to disentangle two different reionization scenarios, and that these constraints are theoretically more robust to the presence of residual foregrounds than when using the power spectrum (see Section 6.4). Hence, independently combining both approaches can already provide valuable insights into the presence of contamination in the observation if their outcomes are inconsistent.

Combining power spectrum and tomographic analyses
This first test-case supports the importance of using both approaches to achieve the most robust scientific results and paves the way for further studies investigating the synergy between power spectra and tomographic analyses. In this work, we focused on an intuitive formalism related to the shape or number of ionized regions and emphasized that these statistics trace the time evolution of the ionizing budget per baryon, thus providing already crucial insights on the astrophysics of the EoR. The use of more advanced methods, encoding, for example, a topological analysis of the ionization field (Elbers & van de Weygaert 2019) could help to extend this analysis further. Additionally, alternative statistics that only encode the non-gaussianities of the 21-cm signal, such as the bi-spectrum (e.g Watkinson et al. 2019;Hutter et al. 2020) or the skewness and kurtosis of the 21-cm PDF (Banet et al. 2020), could be more easily combined to the power spectrum within the same MCMC analysis, because they provide independent information. This is less trivial for morphological or topological analysis of the ionizing field, where additional steps would be required to ensure that the information they provide is truly disconnected from the power spectrum.
Hence, further theoretical work is required to investigate the optimal strategy to combine both approaches, to prepare for upcoming observational studies with SKA and HERA.

Limitations of using tomography
To investigate the use of the 21-cm tomographic statistics to recover the astrophysics of the EoR, we made a few assumptions to either simplify the computation of the 21-cm signal or its analysis. One might thus ask: how dependent are our results from the typical assumptions made? We discuss here the potential caveats arising from these assumptions.
In this work, we ignored the effects of redshift space distortion (RSD). RSD artificially compresses and expands the high and low-density regions, respectively (Kaiser effect;Kaiser 1987). This effect has been studied in Giri et al. (2018a), where the authors showed that it shifts the ionized bubbles size distribution towards smaller values. This is expected from the typical inside-out reionization topology where the ionized regions grow first on the highest density regions, thus are compressed due to the RSD effects. In Giri et al. (2018a), the authors highlight that accounting for these effects has a limited impact on the final results. However, if the data were affected by RSD, they must be included in the simulations to consistently compare the distribution of the number-count and morphological attributes between the observation and the models.
Additionally, our experimental setup considers co-eval cubes for both the mock observations and the models. However, true observations consist of lightcones, such that the line of sight axis can cover a large range of redshifts. The impact of lightcone effects have been discussed several times in the literature (e.g. Datta et al. 2012Datta et al. , 2014Greig & Mesinger 2018). Giri et al. (2018a) investigated its impact on the ionized bubble size distribution (BSD), and showed that it shifts the ionized regions size towards larger values. However, Datta et al. (2014) showed that the impact of lightcone effects can be mitigated by using a limited bandwidth observation, less than Δ = 10 MHz (Δ = 0.5). Thus, similarly to the RSD effects, we do not expect this issue to be a major obstacle for future 21-cm tomographic analyses.
Another concern is related to the absence of physically motivated foregrounds in our mock observations. Perfect foreground removal is typically assumed in all theoretical papers discussing 21cm tomographic statistics. This is because, in theory, foregrounds properties are significantly different (e.g smooth, or statistically independent) from the properties of the 21-cm fluctuations. Nevertheless, in practice, perfectly removing all foreground contamination is a very complex task, and artifacts can remain after applying foreground removal methods. In Section 6.4, we briefly discussed the possible impact of foreground residual but limited our analysis to a specific case using Gaussian random fields with a given power spectrum shape. This test-case already showed that statistics extracted on 21-cm images are likely more robust to the presence of these artifacts than when using the power spectrum. However, additional studies with more complex physical foregrounds models, are needed to further investigate their impact on power spectra or tomographic analyses. We note however that significant improvements have been made regarding foreground removal algorithms. For example, Chapman et al. (2013) used a theoretical experiment based on the LOFAR sensitivity to show that a Generalized Morphological Component Analysis approach (GMCA; Bobin et al. 2013) would already perform well to recover the 21-cm images from foreground contaminated observations. In fact, Hothi et al. (2020) show that the use of Gaussian Regression analysis tools (GPR Mertens et al. 2018) performs better than GMCA when including the PSF and the noise in the mock observations. Hence the use of already available tools, and the development of futures foreground removal methods, optimal for tomographic applications, will be the key to mitigate the caveats arising from these contaminants.
Finally, the results presented in this work are strongly dependent on the validity of the S >> CMB (post-heating) hypothesis. Indeed, the ionized regions in the 21-cm tomographic images can be robustly extracted, even in the presence of noise or additional smoothing, because they imprint a clear peak in the low-end distribution of the 21-cm intensities. However, this is more complex when the spin temperate is not well heated above the CMB temperature because the presence of low-intensity structures also arise from regions where S = CMB . Getting rid of this assumption would allow us to probe higher redshifts, constrain epochs where the EoR and the Epoch of Heating overlap (e.g. as in Greig & Mesinger 2017), and more generally, explore more complex reionization models. Nevertheless, it would require to adopt a different approach to probe the evolution of the morphological information lying in the 21-cm fluctuations, for example, by considering the 21-cm field morphological spectra as a function of several threshold sets within the 21-cm intensity range (similar to Kapahtia et al. 2019). Such strategy would be simple to implement using the connected component tree framework of DISCCOFAN, but would require to re-define the likelihood function in 21CMMC, which is not a trivial task for this approach. Consequently, dismissing this assumption is likely the more challenging obstacle to overcome to extend this analysis to higher redshifts.
Overall, this work paves the way for further theoretical studies investigating the robustness of 21-cm tomographic statistics using more complex reionization parametrization and observations including both physically motivated foregrounds and additional instrumental artifacts. The ability of 21-cm images to probe the astrophysics, as shown in Section 6.1, and their apparent better robust-ness to residual contamination (Section 6.4), support that they are promising for future observational studies.

CONCLUSIONS
We have examined the use of statistics extracted from 21-cm tomographic images to constrain the properties of the reionization sources using a Bayesian statistical inference framework. We defined an intuitive formalism based on the number count and morphological pattern spectra of the ionized regions, encoding their eccentricity, porosity and asymmetry. We implemented a likelihood function within 21CMMC (Greig & Mesinger 2015), a Markov Chain Monte Carlo analysis tool, to quantify the astrophysical constraints set by these 21-cm tomographic statistics. We used a reionization model with three parameters: the ionizing efficiency of the sources ( , assumed constant), the mass of the star-forming halos derived from the minimum virial temperature ( min vir ), and the mean free path of the ionizing photons in the IGM ( mfp ). We created mock observations of 128 3 cells with 2 Mpc resolution (0.735 arcmin) based on two reionization scenarios dominated by ( = 30, min vir = 5.0 10 4 K, and mfp = 15 Mpc), and ( = 200, min vir = 3.0 10 5 K, and mfp = 15 Mpc), similarly as in Greig & Mesinger (2017). We included the noise profile and point spread function of the SKA1-Low, assuming 1000 hours of observation, a maximum baseline of 2 km, and three snapshots at redshift 10, 9,and 8. Our results can be summarized as follow: • The morphological properties of the ionized regions provide valuable information to trace the reionization history (Section 4.1) or differences between different reionization scenarios (Section 4.2). Additionally, these morphological differences are more significant in the largest structures, such as the ionized percolated region that typically dominates the late reionization stages. This supports that crucial astrophysical insights can already be extracted from the morphology of a single, large ionized structure.
• The ionized regions number-count and morphological spectra, incorporated in a Bayesian inference framework, can disentangle reionization scenarios with different ionizing efficiency of the sources of reionization and minimum mass of the star-forming halos (Section 6.1). This is because these statistics indirectly probe the time evolution of the average number of ionizing photons emitted in the IGM per baryon, which is fixed for a given pair of and min vir (Section 7.1). This supports the use of 21-cm tomographic statistics as an alternative approach to the power spectrum to recover the astrophysics of the ionizing sources.
• Using the power spectrum of the 21-cm fluctuations provides the tightest constraints for and min vir , and is slightly less sensitive to different sets of initial conditions than the number and morphology of the ionized regions (Sections 6.2 and 6.3). Nevertheless, statistics extracted from realistic mock 21-cm tomographic observations are more robust to the presence of diffuse foreground residuals in the observations than using the power spectrum (Section 6.4). This is shown by the presence of a larger bias in the recovered parameter intervals when using the power spectrum ( Figure 12). Comparing the outcomes of power spectra and tomographic analyses is useful to check the consistency of the recovered astrophysical parameters, and can provide valuable information about the presence of residual structures in the observation (Section 7.2).
• The mean free path of photons, mfp , can not be accurately inferred using our experimental setup, both when using the power spectrum or the tomographic statistics (Section 6.2). This is because the mock observations and the models sampled have a limited box size (250 Mpc), such that the large scales, where fluctuations in mfp have a significant impact, are dominated by statistical variance (Figure 9). This is supported by the results of Section 6.3, where we showed that varying the initial conditions has a significant impact on the recovered mfp value, both for the power spectrum and the tomographic statistics.
This work emphasizes that 21-cm tomographic statistics can be successfully incorporated within a Bayesian statistical inference tool, and provide a robust alternative approach to the power spectrum. Additionally, it highlights the importance of combining both approaches in upcoming observational studies with the SKA to achieve the best scientific results. In further exploratory studies, we plan to assess the robustness of these results using more complex reionization parametrizations, by removing the post-heating assumption ( S >> CMB ), and generating mock observations including physically motivated foreground contamination. Overall, this paper supports that the use of 21-cm images and, in general, the morphological properties of the ionized structures, provide key information to the physical process of reionization.

APPENDIX A: COMPARISON OF METHODS TO EXTRACT THE IONIZED REGIONS FROM MOCK TOMOGRAPHIC OBSERVATIONS
In Section 4.3, we briefly mentioned the technique used in this work to extract ionized regions from noisy observations. We provide here more details about the different thresholding methods. Recently, Giri et al. (2018b) proposed a new approach to optimally identify the ionized regions from realistic observations, using a superpixel thresholding approach, SLIC, introduced in Achanta et al. (2012). While this method shows good performance for their application, the SLIC approach is computationally expensive, which makes it less suitable for Bayesian inference frameworks.
On the contrary, we found the Triangle thresholding approach (Zack et al. 1977) to perform well on the typical noisy image cubes we deal with in this work. This thresholding method works on the histogram of the values in the image. It constructs a line between the maximum and minimum brightness value in the histogram, and computes the distance between the line and the histogram for all values in the interval. The threshold is derived as the brightness value where the distance is maximal.
In Figure A1, we compare the performance of four approaches, the SLIC method, the k-means clustering used for example in Kakiichi et al. (2017), the Triangle thresholding applied on the 3D cube (Triangle3D), and the same approach but applied on the 2D slices, and the final threshold taken as the median of list of thresholds obtained that way (Triangle2D). To do this, we used several co-eval cubes of 128 3 cells with ≈ 2 Mpc resolution, with different neutral fractions in the box. We added a noise cube corresponding to 1000 hours observations with SKA1-Low at = 10, and smoothed the final cube assuming a resolution corresponding to maximum baseline of 2 km (the rms of the noise after smoothing is 2.5 mK). The left panel compares the recovered volumetric ionizing fraction rec with respect to the fiducial ionizing fraction in the box. The middle panel shows the evolution of the correlation coefficient ( Cramer 1946) between the recovered binary fields using the different methods, and the binary field from the fiducial simulations extracted from the ionization fraction box taking > 0.5 as a threshold to determine if a pixel is ionized or not. The right panel shows the execution time to extract the ionized regions as a function of the fiducial ion .
Overall, the correlation coefficient shows that the Triangle2D approach performs better than the other methods, while its execution time is lower than 0.1 second. Interestingly, using the triangle thresholding approach on the 2D slices and taking the median works better than applying it directly on the 3D data. SLIC performs slightly better than Triangle2D, but is two order of magnitude slower. Note that the performance of SLIC is not as good as in Giri et al. (2018b), where the authors report e.g a of 0.6 at = 0.9, while we derived = 0.45 for the same ionizing fraction. However, the authors have used different box size (250 3 cells with 1.38 cMpC resolution), hence results are not directly comparable. Therefore, the Triangle approach is more adapted for our framework, but we urge caution about generalizing these results to other applications. Figure B1 shows the 21-cm power spectrum of the ( = 30, log 10 ( min vir [ ]) = 4.7 and mfp = 15 Mpc) and ( = 200, log 10 ( min vir [ ]) = 5.48 and mfp = 15 Mpc) mock observations (Mesinger et al. 2016) at the three co-eval redshifts used in this work. The total uncertainty on each power spectrum is shown by the shaded areas around the power spectra, and is based on 1000 hours of observations with SKA1-Low, derived using the method described in Section 5.1. As seen in this figure, the theoretical uncertainty is particularly small for most of the bins between 0.1 and 0.4 Mpc −1 . This suggests that differences between several reionization models are particularly significant at the large scales (low ), and that this information can be used to disentangle different reionization scenarios. However, as discussed in Section 6.4, the presence of contaminants, such as residual diffuse structures in the observation, can have a strong impact on the observed power spectrum, biasing it upward if the structures are uncorrelated to the 21-cm field.

APPENDIX C: LIKELIHOOD FOR N-DIMENSIONAL DISTRIBUTIONS
In this section, we provide a general framework to define an approximate likelihood estimator suitable to compare two distributions of vectors in a N-dimensional space, with N typically larger than 3. This formalism accounts for the fact that might fluctuate for different observations, and its fluctuations relate to the parameters of the models to be inferred. We define two distributions A and B, with A and B observed structures, respectively. The first factor of the likelihood function is a Poissonian factor that compares A to B , assuming that A is the number of structures observed, and B is the expectation value derived from the model: The second factor must provide a similarity metric to compare the distribution of N-dimensional vectors from A and B. The strategy consists in finding, for each vector ì A in A, the most similar vector ì B in the distribution B. This can be achieved using the  Mahalanobis distance metric, (Mahalanobis 1936), which is particularly suited to compare points from N-dimensional distributions Aggarwal et al. (2001). We then sum over all the minimum Mahalanobis distances for the vectors in the distributions A, which can be expressed as A arg min ∈ B (d N maha (ì A, , ì B, )). Intuitively, this can be understood as looking how closely the distribution B represents the observations in A. Since the Poisson factor accounts for fluctuations in the number of structures in A and B, we need to ensure that this second factor is independent to the number of structures observed in the data and model. To picture this normalization problem, we consider the factor d N maha as a volume in a N-dimensional space. The total volume occupied by the observations in B should be divided by B , such that the volume per observation in the distribution is proportional to 1/ B . Thus, if B gets larger than A , the sum of N ℎ , which encodes the minimum volume between the observations in A and B, will reach lower values because of 1/ B dependence. To compensate for this effect, we multiply the sum of d N maha by B such that the final factor is independent from B . Similarly, we add a factor 1/ A to ensures that the sum of the distances is also normalized with respect with the number of objects in the distribution A. Hence, this second factor, referred to as the distance factor, is expressed as: We additionally include a regularization factor to adjust the rela-tive importance of each term, such that, larger and smaller values respectively increase and decrease the weight on the factor that compares the N-dimensional vectors. The final expression of our maximum likelihood estimator function is defined as: We note that this equation is not strictly speaking an exact likelihood function. It suffers a few caveats, as it is not normalized, and does not encode potential systematics related to the variance of the measurement of the N-dimensional vectors in A and B. The presence of the regularization factor makes it closer to a regularized function. However, it provides a suitable framework for comparing the distributions of N-dimensional vectors, where can significantly fluctuate, and additionally includes the information lying in the distribution outliers, which are typically not accounted for in classical distribution metrics (e.g. mean or standard deviation). This strategy can be less effective when N is large and the vectors of observations extracted from A or B are subject to noise, because distributions with outliers will be penalized by the factor N ℎ . However, we also tested different cases using K ℎ , with K < N, to reduce this caveat. We found that the best results were achieved when K = N.