Bubble mapping with the Square Kilometre Array – I. Detecting galaxies with Euclid, JWST, WFIRST, and ELT within ionized bubbles in the intergalactic medium at z > 6

The Square Kilometre Array (SKA) is expected to provide the first tomographic observations of the neutral intergalactic medium at redshifts z > 6 and pinpoint the locations of individual ionized bubbles during early stages of cosmic reionization. In scenarios where star-forming galaxies provide most of the ionizing photons required for cosmic reionization, one expects the first ionized bubbles to be centred on overdensities of such galaxies. Here, we model the properties of galaxy populations within isolated, ionized bubbles that SKA-1 should be able to resolve at z ≈ 7–10, and explore the prospects for galaxy counts within such structures with various upcoming near-infrared telescopes. We find that, for the bubbles that are within reach of SKA-1 tomography, the bubble volume is closely tied to the number of ionizing photons that have escaped from the galaxies within. In the case of galaxy-dominated reionization, galaxies are expected to turn up above the spectroscopic detection threshold of JWST and ELT in even the smallest resolvable bubbles at z ≤ 10. The prospects of detecting galaxies within these structures in purely photometric surveys with Euclid, WFIRST, JWST, or ELT are also discussed. While spectroscopy is preferable towards the end of reionization to provide a robust sample of bubble members, multiband imaging may be a competitive option for bubbles at z ≈ 10, due to the very small number of line-of-sight interlopers expected at these redshifts.


I N T RO D U C T I O N
In the currently favoured view of galaxy-dominated reionization, large ionized bubbles in the intergalactic medium (IGM) will first appear around overdensities of galaxies, progressively grow, and finally coalesce (for recent reviews, see Loeb & Furlanetto 2013;Barkana 2016;Mesinger 2016;Dayal & Ferrara 2018). Upcoming observations of the redshifted 21 cm signal from the neutral IGM will open a new window on this process, and existing constraints from the high-redshift galaxy luminosity function, from the cosmic microwave background radiation and from quasar absorption systems, can be used to forecast the viable range of 21 cm signals from E-mail: erik.zackrisson@physics.uu.se neutral hydrogen in the reionization epoch (e.g. Kulkarni et al. 2016;Greig & Mesinger 2017a;Hassan et al. 2017;Mirocha, Furlanetto & Sun 2017).
While current interferometers are limited to detecting the 21 cm signal in a statistical sense (for instance the 21 cm power spectrum), phase one of the Square Kilometre Array (hereafter SKA-1) will be able to resolve physical scales down to 5-10 comoving Mpc in the plane of the sky and the corresponding physical scales along the line-of-sight (frequency) direction at z ≈ 6-10 (Mellema et al. 2015;Wyithe, Geil & Kim 2015;Datta et al. 2016;Mondal, Bharadwaj & Datta 2018;Mondal et al. 2019). This will for the first time allow tomography (three-dimensional imaging) of the 21 cm signal.
It is already well established that 21 cm data correlated with galaxy surveys can provide powerful constraints on reionization scenarios (Wyithe & Loeb 2007b;Lidz et al. 2009;Wiersma et al. 2013;Park et al. 2014;Hasegawa et al. 2016;Hutter et al. 2016;Sobacchi, Mesinger & Greig 2016;Vrbanec et al. 2016;Hutter, Trott & Dayal 2018). However, most of the studies in this field have focused on the prospects of wide-field (and therefore comparatively shallow) galaxy surveys, whereas relatively little effort has been devoted to the prospects of deep, small-field surveys that focus on uncovering the galaxy content of individual ionized bubbles [but see Beardsley et al. 2015;Geil et al. 2017, for discussions on how to combine MWA/HERA/SKA data with Wide-Field Infrared Survey Telescope (WFIRST) and James Webb Space Telescope (JWST) data this way].
Here, we will use relatively simple simulations to explore what one can hope to learn by combining an SKA-1 measurement of the dimensions of an individual ionized IGM bubble with a photometric/spectroscopic galaxy survey of its content using upcoming telescopes like the JWST, Euclid, WFIRST, or the Extremely Large Telescope (ELT). By mapping the galaxy content of individual, relatively isolated bubbles, it may be possible to assess the ionizing photon budget within these regions and constrain the role of the galaxies detected inside, in a more direct way than what can currently be done for the photon budget of ionized regions surrounding z 6 Lyman α emitters (e.g. Bagley et al. 2017;Castellano et al. 2018;Yajima, Sugimura & Hasegawa 2018), as both the total ionizing photon budget and the contribution from galaxies not exhibiting detectable Lyman α emission within such regions tends to remain ambiguous.
What galaxies are expected inside regions of the Universe that reionize early? Depending on the redshift, size, and isolation of such structures, these regions may be highly biased and could in principle contain galaxies with properties that deviate substantially from those in the average galaxy population at the same redshift. Throughout this paper, we will however adopt the conservative assumption that the galaxies clustered within 21 cm bubbles exhibit higher number densities but properties otherwise identical to those in the field population at the same epoch. This zeroth-order estimate can then serve as a benchmark for more detailed simulations in future efforts.
In Section 2, we explain how mapping the galaxy populations within ionized regions of the IGM at z 7 can provide constraints on the role of galaxies in the emergence of these structures. Using seminumerical simulations of galaxy-dominated reionization, we, in Section 3, predict the relation and scatter between the number of ionizing photons emitted from galaxies within a bubble and the resulting volume of that structure, as a function of redshift. The detection limits for galaxies within these structures are explored in Section 4. In Section 5, a number of simplifications adopted in this work are discussed. We also comment on the prospects of using populations of bubble galaxies to constrain early assembly/environmental bias and to place combined constraints on the luminosity function of bubble galaxies and on the time-integrated mean escape fraction of ionizing photons from these objects. Section 6 summarizes our findings.

T H E P H OTO N B U D G E T O F I O N I Z E D B U B B L E S
Considering a spherical ionized region of comoving radius r and volume V ion = (4/3)π r 3 , and ignoring the effect of recombinations inside it, the relationship between the comoving ionized volume and the total number of ionizing photons N ion, tot that has ever been emitted into the IGM in this region can be expressed as where n H is the average comoving number density of hydrogen atoms in the IGM. In scenarios where star-forming galaxies provide the bulk of ionizing photons required for cosmic reionization, N ion, tot corresponds to the total number of ionizing photons that have ever escaped from galaxies within the bubble.
Equation (1) suggests that if SKA-1 is able to identify individual, highly ionized IGM bubbles and also estimate their volume (V ion ), it may be possible to place a constraint on the integrated number of ionizing photons N ion, tot emitted from galaxies within this structure. Formally, the N ion, tot constraint inferred from equation (1) will be a lower limit, since a greater number of ionizing photons will be required once recombinations are considered. However, this N ion, tot estimate is, for reasonable assumptions on the IGM clumping factor, expected to be accurate to within a factor of a few (e.g. McQuinn et al. 2007;Sobacchi & Mesinger 2014).
The number of ionizing photons emitted by the galaxy population into a specific region of the IGM is determined by the number of ionizing photons produced, modulo the escape fraction of these photons. Under the assumption of an invariant stellar initial mass function (IMF), the number of ionizing photons produced is, in turn, related to the total mass in stars produced in this region. However, neither the total mass in stars nor the total number of ionizing photons produced within a region are directly observable. All one can hope to detect is individual galaxies in the bright-end tail of the galaxy population within this volume. In what follows, we will explain how these quantities are related.
While IGM bubbles grow gradually, with galaxies in different mass and luminosity regimes contributing to N ion, tot at different times, a constraint on N ion, tot may nonetheless be converted into a rough estimate on the number of galaxies expected within that bubble at the epoch from which we detect the 21 cm signal. This is possible since the instantaneous, rest-frame 1500 Å UV luminosity L UV (i.e. in the non-ionizing part of the UV; redshifted into the nearinfrared at z > 6), which traces recent star formation (over the past 10 8 yr) within a galaxy is predicted to be correlated with the total stellar mass ever formed in that system and in all the progenitors that have merged into it. This stems from the generic simulation prediction that z > 6 galaxies on average have star formation/accumulation rates that increase over time (e.g. Finlator, Oppenheimer & Davé 2011;Jaacks, Nagamine & Choi 2012;Dayal et al. 2013;Shimizu et al. 2014;Ma et al. 2015;Zackrisson et al. 2017).
The number of ionizing photons N ion, i emitted from a single galaxy i into the IGM over its past star formation history up to the point in time when it is observed (t obs ) can be expressed as whereṄ ion (t) is the production rate of the number of ionizing photons in this galaxy at time t and f esc (t) describes the temporal evolution of the escape fraction of ionizing photons into the IGM.
If we define f esc as the N ion -weighted mean f esc over the past history of the galaxy, equation (2) simplifies to The total number of ionizing photons produced by a whole population of galaxies N ion, tot in a volume V ion can then be derived by integrating over galaxies of all UV luminosities, L UV : where (L UV ) describes the luminosity function of galaxies in this ionized region (in units of galaxies per volume per unit 1500 Å luminosity) -which is going to have a much higher scaling than the galaxy luminosity function in the field. In this equation, we have for simplicity assumed that all galaxies have the same f esc (this assumption is relaxed in Section 5) and that N ion, i only depends on L UV . If we take N ion, i to be known then equation (4) indicates how an estimate on N ion, tot (provided by SKA-1, via the bubble volume V ion in equation 1) can be used to constrain the galaxy population ( (L UV )) within the bubble and the time-integrated escape fraction of ionizing photons f esc from the bubble galaxies. We may, for an individual ionized IGM bubble of a given size, conversely also provide a rough estimate on the number of galaxies that are expected to lie above some UV luminosity detection within this bubble, given an assumption on the relative shape or slope of the galaxy luminosity function within this bubble and on the likely value of f esc . This allows us to assess the prospects of detecting bubble galaxies with some of the telescopes that are expected to be operational in the SKA-1 era, which we set out to do in the following sections.
In reality, N ion, i will vary substantially from galaxy to galaxy of the same observed L UV due to differences in star formation history, metallicity, and dust attenuation, and we will in Section 4 use galaxy spectral energy distribution (SED) models coupled to galaxy simulations in an attempt to quantify the distribution of N ion, i /L UV , i.e. the total number of ionizing photons produced over the momentary UV luminosity, and its impact on the relation between galaxy counts and the ionizing photon budget.

The smallest ionized bubbles detectable with SKA-1
As shown by e.g. Mellema et al. (2015), Wyithe et al. (2015), and Datta et al. (2016), SKA-1 should be able to identify individual ionized IGM bubbles of angular diameters down to 5 arcmin 1 at z ≈ 6-10, which corresponds to a spherical bubble radius of 6-7 cMpc or a spherical volume of 1000 cMpc 3 . Ionized bubbles of this size are most readily detected using the matched filtering technique in the Fourier domain proposed by Datta et al. (2007Datta et al. ( , 2008Datta et al. ( , 2012Datta et al. ( , 2016 and Majumdar et al. (2011Majumdar et al. ( , 2012. This technique optimally combines the complete three-dimensional 21 cm signal from H I outside the bubble using a matched filter. This method also takes advantage of the fact that noise is uncorrelated in the Fourier domain, whereas it is correlated in the image domain, thereby resulting in a higher signal-to-noise ratio for a given bubble size than methods based on imaging (e.g. Mellema et al. 2015;Kakiichi et al. 2017;Giri et al. 2018) or one-dimensional 21 cm spectra (e.g. Geil et al. 2017).

Bubble simulations
To explore how tightly coupled V ion may be expected to be to N ion, tot , we use a set of seminumerical simulations for reionization, which are identical to those presented by Mondal, Bharadwaj & Majumdar (2017). These simulations involve three major steps: (a) First, we simulate the matter distribution at different redshifts using a publicly available particle mesh N-BODY code 2 and assume that hydrogen follows this underlying matter field; (b) Next, we identify collapsed structures in this matter distribution using a publicly available halo finder 3 based on the Friends-of-Friend (FoF) algorithm (Davis et al. 1985); (c) We then assume a model for the sources of ionization hosted by these collapsed haloes and generate an ionizing photon field using a publicly available seminumerical code. 4 A general assumption in our ionizing source model is that the number of ionizing photons that are produced by these sources is proportional to their host halo mass M halo . We use the constant of proportionality n ion (dimensionless) as a parameter for our simulations. This quantity (also known as the ionization efficiency) combines a number of reionization parameters e.g. the star formation efficiency, the fraction of ionizing photons escaping into the IGM, the number of ionizing photons per baryons produced, etc. For a detailed discussion on this, we refer the readers to section 2.3 of Choudhury, Haehnelt & Regan (2009). Finally, we use this ionizing photon field and the matter density field under an excursion set formalism (Furlanetto, Zaldarriaga & Hernquist 2004) to identify ionized regions within the hydrogen distribution (e.g. Mesinger & Furlanetto 2007;Zahn et al. 2007). Our method of simulating the ionization fields during reionization is similar to that of Choudhury et al. (2009), Majumdar et al. (2014, Mondal et al. (2015), and Mondal, Bharadwaj & Majumdar (2016).
The N-body simulation that we use here has a comoving volume of V = [215 cMpc] 3 , corresponding to ∼1.3 • on the sky for 7 < z < 10, with a 3072 3 grid of spacing 0.07 cMpc and a particle mass of 1.09 × 10 8 M . Thus the smallest dark matter halo that we can resolve is 1.09 × 10 9 M (assuming a minimum of 10 particles required to form a halo). Once we have identified the haloes, we then map the matter and the ionizing photon density fields on a grid which is eight times coarser than our original N-body simulation resolution (i.e. on a 384 3 grid). These coarser fields are then used to implement the excursion set formalism. We identify a grid point as neutral or ionized at a certain stage of reionization, by smoothing and comparing the hydrogen density and the photon density fields using spheres of different radii starting from a minimum radius of R min (the coarse grid spacing) to R mfp (mean free path of the ionizing photons). A specific grid point is considered to be ionized if for any smoothing radius R (R min ≤ R ≤ R mfp ) the photon density exceeds the neutral hydrogen density at that grid point. For the simulation shown here we have used n ion = 23.21 and R mfp = 20 Mpc (which is consistent with Songaila & Cowie 2010) at all redshifts. These Relation between the volumes of ionized IGM bubbles V ion at z = 11-7 and the number of ionizing photons N ion, tot that have escaped from galaxies and into the IGM within each such region. The black crosses indicate the size of the 1σ scatter in each bin. The different panels feature the neutral IGM fraction of our default reionization scenario along with the best-fitting N ion -V ion relation at each redshift. For bubbles sufficiently large to be resolved by SKA-1 (V ion 1000 cMpc 3 in the case of spherical bubbles), the 1σ range in N ion at fixed V ion is always limited to a factor of <4.
values of the parameters ensure that reionization ends at z ≈ 6 and we obtain a Thomson scattering optical depth τ = 0.057, which is consistent with Planck Collaboration XLVII (2016). We have used the Planck + WP best-fitting values of cosmological parameters m = 0.3183, = 0.6817, b h 2 = 0.022032, h = 0.6704, σ 8 = 0.8347, and n s = 0.9619 (Planck Collaboration XVI 2014).
Once ionization maps have been generated at a set of redshifts, we once again make use of an FoF algorithm on these gridded ionization maps to identify individual ionized regions. In this FoF algorithm, we identify any cell having a neutral fraction x H I ≤ 10 −4 as ionized. This reionization model and numerical machinery results in several tens to hundreds of ionized IGM bubbles above the SKA-1 tomographic limit (volume 1000 cMpc 3 ) at z ≈ 7-9 within our simulated volume. Rescaling these bubble counts to the volume covered by the planned 100 deg 2 deep SKA1-LOW survey  would result in ≈7 × 10 4 , 1 × 10 4 , and 1 × 10 3 such bubbles per (z) = 1 at z ≈ 7, 8, and 9, respectively. Hence, deep surveys with SKA-1 has the potential to detect substantial numbers of such bubbles up to fairly highly redshifts, although we stress that the exact numbers would depend on the details of the reionization scenario.
In Fig. 1, we plot the number of ionizing photons N ion, tot that have gone into various ionized bubbles of volume V ion at z = 7-11 in our simulations. Due to large density fluctuations on small scales, there is substantial variation (by more than one order of magnitude) in the number of ionizing photons that have been used to produce the smaller bubbles (V ion ∼ 10 1 -10 2 cMpc 3 ). However, as bubbles approach the SKA-1 resolution limit (V ion 10 3 cMpc 3 ), the effects of density fluctuations tend to even out, leaving a 1σ range that corresponds to a factor of <4 in the required N ion, tot for a fixed comoving V ion . This suggests that SKA-1 measurements of the volumes of ionized bubbles may relatively tight limit on the number of ionizing photons that have been emitted into the IGM within these regions, thereby allowing for constraints on the properties of the galaxy populations within these structures. Section 5.3 features a brief discussion on how these results are affected by different assumptions on the ionization efficiency.
The best-fitting N ion, tot -V ion relation varies slightly between the different redshift snapshots, but combining the simulation data from all snapshots with significant numbers of V ion 1000 cMpc 3 bubbles gives the average relation: We will adopt this relation in the following sections to predict the number of galaxies required to produce a bubble of a given volume.
Note that in an exact inside-out reionization scenario, analytically one would expect the power index in equation (5) to be ∼1, which is consistent with the results from our simulations. The scatter in the power-law index from panel to panel of Fig. 1 is mainly due to the spatial fluctuations in the hydrogen number density, clustering of the sources, and non-conservation of the ionizing photon numbers in the later part of the reionization (Choudhury & Paranjape 2018).

H OW T H E I O N I Z I N G P H OTO N B U D G E T W I T H I N I G M B U B B L E S I S T I E D TO P RO P E RT I E S O F B U B B L E G A L A X I E S
The results presented in Section 3 suggest that the volumes of the isolated bubbles that SKA-1 will be able to resolve are strongly coupled to the number of ionizing photons that have been emitted throughout the previous history of these regions. In Section 5.3 we also demonstrate that this number is relatively insensitive to how the production of ionizing photons is distributed across the halo population.
In our fiducial simulations, ionized bubbles of the smallest size that SKA-1 can hope to resolve (V ion ∼ 10 3 cMpc 3 ) include ∼1000 dark matter haloes of mass 10 9 M , which represents a reasonable ballpark estimate of the total number of galaxies expected within these structures (but please note that only a very small fraction of these will be sufficiently bright to be detected). However, the exact number of haloes or galaxies needed to produce the required number of ionizing photons will depend on how efficient these are in emitting ionizing photons into the IGM. If the galaxies produce very few ionizing photons (e.g. because of intermittent star formation) or if only a small fraction of the ionizing photons enter the IGM (due to low f esc ), then only very extreme matter overdensities, with more haloes and more galaxies, will be able to produce resolvable bubbles. Vice versa, if galaxies are highly efficient in emitting ionizing photons into the IGM, then resolvable bubbles will contain fewer haloes and galaxies.
We note that, under the assumption of an invariant stellar IMF, the N ion, tot parameter is closely tied to the total mass M stars locked up in stars. For the set of simulated galaxies and the spectral evolutionary model adopted in this paper (see Section 4.1), the approximate relation is In principle, this stellar mass could be locked up within a single galaxy, but this would require a very extreme scenario. If we assume f esc ≤ 0.1 and consider a galaxy that starts forming stars somewhere in the z ≈ 15-20 range and manages to do so at a constant star formation rate (SFR) thereafter, then the ≈1 × 10 71 ionizing photons required to produce a V ion ≈ 10 3 cMpc 3 bubble (equation 5) by z ≈ 10 would correspond to a total stellar mass of ≥1 × 10 11 M , an SFR ≥ 500 M yr −1 , and a dust-free UV luminosity M UV −25.0. Such bright, high-mass galaxies are not yet known at z 8, and to bring such objects in agreement with the brightest galaxies known in this redshift range (Calvi et al. 2016;Stefanon et al. 2019) would require >2 mag of UV dust attenuation. In this section, we will therefore assume that the ionized IGM bubbles that SKA-1 can resolve contain a population of galaxies, rather than a single object that somehow formed in isolation, and proceed to discuss the details of how the ionizing photon budget of a bubble translates into estimates of the number of detectable galaxies within this structure.

Past production of ionizing photons tied to the rest-frame UV luminosity
By aiming a telescope with near-IR capabilities (e.g. Euclid, JWST, WFIRST, ELT) at the same area of the sky surveyed by SKA-1 for 21 cm emission at z > 6, we can detect the rest-frame UV 5 (λ 1216 Å) light from galaxies in these structures. Throughout this paper, we will quantify the UV luminosity L UV of z > 6 galaxies using the monochromatic luminosity or flux at a rest-frame of 1500 Å. The UV luminosity measured this way reflects the recent SFR over the past ∼10-100 Myr (e.g. Boquien, Buat & Perret 2014). For star formation histories stretching over several billions years, as in the case of low-redshift galaxies, this would not be a good proxy for the total stellar mass or the total number of ionizing photons ever produced by this object, since the prior SFR could have either been much higher or much lower than in the epoch from which we detect its light.
However, simulations of reionization-epoch galaxies generically predict that z > 6 galaxies should experience semicontinuous star formation, often with SFRs increasing over time for the more massive ones (e.g. Finlator et al. 2011;Jaacks et al. 2012;Dayal et al. 2013;Shimizu et al. 2014;Ma et al. 2015). Semicontinuous star formation, coupled to the limited time span since the onset of star formation (a few hundred Myr) in the z > 6 galaxy population, limits the variations one can expect in the ratio between N ion , the cumulative number of ionizing photons a galaxy has produced in the past (either in situ or within smaller galaxies that have merged into this galaxy by the redshift at which it is observed), and L UV . Lowmass galaxies may well experience more stochastic star formation activity (e.g. Mutch et al. 2016;Ma et al. 2018), and consequently larger variations between N ion and L UV , but the greater number density of such objects also means that such variations may largely average out over a bubble population that contains large numbers of galaxies. For the interested reader, Appendix A features a more thorough description of how this N ion /L UV parameter is tied to the prior star formation history.
By combining the Shimizu et al. (2016) simulations for z = 7 and z = 10 galaxies with the stellar population spectra produced with the STARBURST99 model (Leitherer et al. 1999) under the assumption of the Kroupa (2001) universal IMF and Geneva stellar evolutionary tracks with high mass-loss, Calzetti et al. (2000) dust attenuation, and nebular emission as in Zackrisson et al. (2017), we predict the distribution of N ion /L UV ratios as a function of total stellar mass M stars ≥ 10 6.5 M in Fig. 2 for z = 7 and z = 10. For models with a standard stellar IMF, the LyC escape fraction f esc has no significant impact on the 1500 Å luminosity and here it has been set to f esc = 0.
As seen in Fig. 2, the N ion /L UV ratio displays galaxy-to-galaxy variations by factors of a few at the highest masses, but varies by more than two orders of magnitude among the lowest mass galaxies resolved (log 10 (M stars /M ) ≈ 6.5) due to large temporal fluctuations in star formation activity within these objects. Such low-mass galaxies are expected to contribute most to the ionizing photon budget within an ionized IGM bubble, but are also present in larger numbers, which means that summing the fluctuating N ion, i contributions for the whole population of bubble galaxies still results in a fairly well-constrained N ion, tot .
The mean N ion /L UV ratio (solid line) also evolves slightly with M stars and reaches its highest value for the smallest M stars due to the increasingly stochastic SFRs of such objects. High N ion /L UV ratios are produced by galaxies that have experienced a high SFR in the past, but are observed in a phase when the star formation activity is very low, leading to a near-constant N ion set by the prior activity and a fading L UV due to the ageing stellar population.
To put these N ion /L UV ratios into context, a very bright M UV ≈ −20 (L UV ≈ 6 × 10 40 erg s −1 Å −1 ) galaxy at z = 10 with N ion /L UV ≈ 6 × 10 28 photons erg −1 s Å would have produced 4 × 10 69 ionizing photons over its lifetime, which -by itself -is insufficient (by more than an order of magnitude) to produce an ionized bubble that SKA-1 can resolve (requires ∼10 71 ionizing photons) even in the case of f esc ≈ 1. For f esc ≈ 0.1, it would take ≈300 such galaxies to produce a detectable bubble. However, given the shape of the halo mass function or the galaxy luminosity function, it is far more likely that an even larger number of much fainter galaxies is present within these structures.
While it is possible that N ion /L UV ratios even larger than seen in Fig. 2 may be relevant for galaxies below the resolution limit of the simulation used, the N ion /L UV ratio cannot fluctuate without bounds. In the absence of stellar IMF variations, the lower limit would be set by a newborn stellar population (age ≈1 Myr) which for a STARBURST99, Z = 0.004 stellar population of the type adopted here is log 10 N ion /L UV ≈ 27.1 photons erg −1 s Å, whereas the upper limit would be set by an instantaneous-burst population (a.k.a. a single or simple stellar population) with an age equal to the age of the Universe. At z = 7 and z = 10, this limit would be at log 10 N ion /L UV ≈ 30.5 and 31.1 photons erg −1 s Å, respectively. These theoretical limits are indicated by dashed lines in Fig. 2.
In the following text, however, we will assume that the N ion /L UV ratio follows a base-10 lognormal function with μ ≈ 29.34 (28.89), σ ≈ 0.46 (0.34), and arithmetic means N ion /L UV ≈ 3.7 × 10 29 (1.1 × 10 29 ) photons erg −1 s Å at z = 7 (z = 10). While this approach fails to capture the evolution of the mean N ion /L UV ratio with mass evident from Fig. 2, this does not have any substantial impact on the distribution unless assumptions on f esc places very large weight on galaxies in some particular mass range (see Section 5). At z = 7, the mean N ion /L UV ratio varies by a factor of ≈4 across the range of galaxy masses considered, and limiting the contribution to the bubble ionization to some very narrow current mass range could in principle alter the result by up to this factor. At z = 10, the corresponding factor is ≈2. Given the large total number of galaxies ( 1000; see Section 4.2) that in our model are expected to inhabit the ionized IGM bubbles that SKA-1 can detect, the impact of the mass evolution of the object-to-object scatter in N ion /L UV around the mean is negligible compared to the mass evolution of the mean N ion /L UV itself.

Total number of galaxies per bubble
To predict galaxy number counts within individual IGM bubbles, we will adopt the simplifying assumption that galaxies within an ionized IGM bubble exhibit higher number densities but are otherwise similar to field galaxies at the same redshift (see Section 5.1 for a discussion on this). We adopt the relative shape of the z ≈ 7 and z ≈ 10 UV luminosity functions by Bouwens et al. (2015), extended down to M UV = −14, and randomly sample the scatter predicted in the case of the Shimizu et al. (2016) simulation in Fig. 2. We then calculate the number of galaxies necessary to produce the N ion, tot ≈ 1 × 10 71 photons (equation 5) required to obtain a V ion ≈ 10 3 cMpc 3 ionized bubble by z ≈ 7 and z ≈ 10. Following this procedure, we obtain N galaxies f esc ≈ 260 galaxies at z ≈ 7 and ≈1300 at z ≈ 10.
Hence, for f esc ≈ 0.1, we would expect a total of ≈2600 galaxies in a bubble resolvable by SKA-1 at z ≈ 7, and ≈13 000 galaxies at z ≈ 10. The value is higher at z ≈ 10 due to a combination of lower N ion /L UV and differences in the luminosity function. The scatter in N ion /L UV just affects these estimate at the ≈ 10 per cent level compared to adopting a constant N ion /L UV throughout the whole galaxy population.
While we have here adopted M UV = −14 as the faint cut-off of the z = 7-10 luminosity function, observations of lensed fields have indicated that it may in fact extend several magnitiudes fainter than this before turning over (e.g. Bouwens et al. 2017;Livermore, Finkelstein & Lotz 2017). The effect of adopting a fainter cutoff limit would boost the total number of bubble galaxies, thus further reducing the effects of galaxy-to-galaxy scatter in N ion /L UV . For instance, assuming that bubble galaxies are forming down to M UV = −10 (while keeping the same luminosity function shape) would boost the total number of galaxies by a factor of ≈30, but has a much smaller effect on the number of detectable galaxies, as will be demonstrated in the next section. The factor of ≈30 is smaller than would be expected from a simple extrapolation of the luminosity function to fainter magnitudes, since this extension alters the ionizing photon flux budget and requires a different absolute scaling of the luminosity function.

Galaxy detection limits
Only a small fraction of the galaxies present within an ionized IGM bubble ( 1 per cent by number) are likely to appear above the detection threshold of near-IR telescopes within the foreseeable future.
To provide quantitative estimates for the number of detectable galaxies, we consider both photometric detections with Euclid, WFIRST, and JWST plus spectroscopic detections with ELT/MOSAIC and JWST/NIRSpec. The pros and cons of these two detection methods are described in more detail in Section 4.5, but the basic difference is that spectroscopic surveys are less prone to line-of-sight interlopers, whereas photometric surveys in principle can probe further down the galaxy luminosity function within the bubble. Below, we describe the various detection limits we consider in the discussion on detectability of bubble galaxies. When assessing the detection of emission lines, we have chosen to be conservative and therefore ignore the Ly α line. Even though the ionized IGM in SKA-selected bubbles may well allow a favourable transmission factor of Ly α photons through the IGM, scattering and extinction within the galaxies may still render this line very weak for many of these objects.
Euclid 6 is a 1.2 m telescope scheduled for launch in 2022 with optical and near-IR imaging capabilities that can also do 1.1-2.0 micron slitless spectroscopy (resolution λ/ (λ) = 250). While Euclid will provide a survey of 15 000 deg 2 , Euclid deep fields of about 40 deg 2 in total will also be observed, with 5σ broad-band detection limits in the optical of m AB ≈ 27 mag and m AB ≈ 26 in the YJH bands. Galaxy candidates at z > 6 can be singled out through drop-out criteria in multiband surveys of this type, by requiring these candidates to be undetected in all filters that sample their spectra at wavelengths shortward of the redshifted Ly α break, yet detected in one or several filters on the longward side of the break. Throughout this paper, we will assume that a sufficient drop-out criterion is met if an object is undetected at the 2σ level shortward of the Ly α break, yet detected at 5σ or more in at least one filter on the other side. For the multiband imaging surveys considered in this paper, we neglect any minor variation in flux detection thresholds among the different near-IR bands, and therefore simply adopt the 5σ limit as the effective drop-out detection threshold (m AB ≈ 26 in the case of Euclid). The line detection limit for Euclid is estimated at ≈5 × 10 −17 erg cm −2 s −1 (Marchetti, Serjeant & Vaccari 2017). For galaxies at z 7, Euclid can cover lines in the rest-frame UV up to λ ≤ 2500 Å, which basically covers He II (1640 Å), C IV (1549 Å), O III] (1666 Å), C III] (1909 Å). However, these lines are in general expected to remain undetectable at m AB ≈ 26 mag for a ≈5 × 10 −17 erg cm −2 s −1 spectroscopic detection limit (e.g. Shimizu et al. 2016), which means that Euclid will only detect z 7 objects as faint as m AB ≈ 26 mag through imaging. Hence, we only consider an m AB ≈ 26 mag photometry threshold of this telescope.
WFIRST 7 is a 2.4 m telescope scheduled for launch in the mid-2020s, which is envisioned to be equipped with imaging capabilities in the 0.48-2.0 micron range and a slitless spectroscopy mode covering 1.00-1.95 micron. WFIRST will carry out wide-field surveys (2200 deg 2 in the high-latitude survey), but also deep field 6 http://sci.esa.int/euclid/ 7 https://www.nasa.gov/wfirst (≈20 deg 2 ) observations with expected imaging and spectroscopy detection limits of m AB ≈ 28 mag and ≈1 × 10 −17 erg cm −2 s −1 , respectively. Here, we consider only the imaging detection limit, since the line flux detection limit will effectively be much brighter than the m AB ≈ 28 mag limit (as predicted by the simulations of Shimizu et al. 2016).
James Webb Space Telescope 8 (JWST), scheduled for launch in 2021, is a 6.5 m telescope that will be able to do extremely deep imaging and spectroscopy in the 0.6-5 micron range and will hence have access to the rest-frame optical lines from z ≈ 7-13 galaxies that neither Euclid nor WFIRST will. The downside is the much smaller field of view, which is 3.6 arcmin × 3.4 arcmin for the JWST/NIRSpec spectrograph and 4.4 arcmin × 2.2 arcmin for JWST/NIRCam (imaging or spectroscopy), which implies ≈2-3 fields to cover just the smallest ionized bubbles that SKA-1 can resolve (≈5 arcmin across). If we consider a total of ≈20 (≈100) h of exposure time to cover a single ionized bubble with either photometry or spectroscopy, we arrive at a point-source detection limit of m AB ≈ 29 (30) mag for NIRCam imaging in two shortwavelength channel (0.6-2.2 μm) filters across two fields (although photometry in two long-wavelength channel filters at 2.5-5 μm would also be achieved simultaneously). The corresponding limits for spectroscopy are redshift-dependent, because the most suitable emission line ([O III] (5007 Å) at z = 7 is out of JWST/NIRSpec range at z = 10. Using the Shimizu et al. (2016) model to predict the emission-line strengths for galaxies with a given UV continuum flux, we instead base the z = 10 limit on the [O II] (3727 Å) line at z = 10. For observing programmes of either ≈20 h or ≈100 h across two fields in resolution R ≈ 1000 mode, this results in S/N ≈ 5 line detection limits of 3 × 10 −19 erg s −1 cm −2 or 1.3 × 10 −19 erg s −1 cm −2 , which corresponds to galaxies of m AB ≈ 28.3 or ≈29.2 mag at z = 7, but m AB ≈ 26.5 or m AB ≈ 27.5 at z = 10. The continuum detection limits are approximately the same if the C III] (1909 Å) line is targeted instead of [O II] at z = 10.
Extremely Large Telescope 9 (ELT) is the largest ground-based optical/near-IR telescope under construction and will have first light around 2025. The currently planned ELT instrumentation does not allow for wide-field imaging, so we do not consider this option in this paper. However, the planned MOSAIC instrument, which is expected to be operational towards the end of the 2020s, is expected to be capable of multi-object spectroscopy at 0.9-1.8 micron over a 7 arcmin diameter field. Hence, MOSAIC can cover the smallest ionized bubbles detected by SKA-1 in just one field and should, in a total of 40 h of observing time, be able to detect rest-frame UV lines at z ≈ 7 with S/N = 5 at ≈1 × 10 −19 erg cm −2 (Evans et al. 2015). By targeting the C III] (1909 Å) line at z = 7 and the C IV (1549 Å) at z = 10, this corresponds to galaxies with m AB ≈ 29.0 mag at z = 7 and ≈28.25 mag at z = 10, based on predictions from the Shimizu et al. (2016) models. However, we stress that pre-imaging at this depth will be required to select the spectroscopic targets for ELT/MOSAIC, and this imaging cannot easily be performed by ELT itself given currently planned instrumentation.

Detectable galaxies per bubble
In Fig. 3, we show the number of galaxies expected above these various detection limits for three different options concerning the time-integrated, photon-number-weighted LyC escape fraction Figure 3. Number of galaxies expected above different limiting rest-frame UV absolute magnitude within a V ion ≈ 1000 cMpc 3 ionized bubble (set by the requirement that ≈1 × 10 71 ionizing photons need to be emitted into the IGM) at z = 7 (left) and z = 10 (right), for f esc = 0.05 (blue stripes), 0.1 (green stripes), or 0.2 (red stripes). The width of the stripes is set by the predicted standard deviation in the galaxy number counts between individual bubbles, caused by the random sampling of the adopted luminosity function with faint cut-off at M UV = −14. Even in the most pessimistic case ( f esc = 0.2), one expects to detect tens of galaxies above the deepest detection limits. However, even the most optimistic predictions ( f esc = 0.05) indicate that Euclid could well be blind to galaxies in the smallest ionized IGM bubbles that SKA-1 may resolve. f esc . Here, we have assumed that the bubble galaxies follow a luminosity function with the same relative shape (but different scaling) as the Bouwens et al. (2015) z ≈ 7 and z ≈ 10 UV luminosity function, extended down to M UV = −14. Under the assumption that f esc has no mass/luminosity dependence (see Section 5.6 for a discussion on this), the number of galaxies expected above the various detection threshold is given by the differently stripes, for f esc = 0.05, 0.1, and 0.2.
How would these results change if we assume that the luminosity function retains its shape faintward of M UV = −14? Our computational machinery indicates that, at fixed f esc , one expects to detect a factor of ≈2 fewer galaxies at z = 7 (a factor of ≈3 at z = 10) within an ionized bubble if the luminosity function is extended down to M UV = −10. The conversion can simply be done by shifting all the plotted galaxy counts down by this factor. It should however be noted that this parametrization assumes that galaxies all the way down to M UV = −10 display N ion, tot /L UV ratios that follow a lognormal distribution with parameters similar to those presented in Section 4.1. However, galaxies as faint as M UV = −10 may have total stellar masses as low as M stars ∼ 10 5 M , which is significantly below the resolution limit of our simulations. The intermittent star formation episodes expected in such low-mass systems may well cause a shift in the mode of the distribution, which can be explored with higher resolution simulations in the future.
There are a few things to note from Fig. 3. For reasonable values of f esc (≈0.05-0.2), considerable numbers of potentially detectable galaxies are expected within each bubble at both z = 7 and z = 10, and this number scales with 1/ f esc , since a higher f esc means that fewer galaxies are required to provide the ionizing photons needed to form the bubble. Even in the most pessimistic case shown ( f esc = 0.2), one expects to detect a handful of galaxies at z = 7 with WFIRST photometry and several tens of galaxies with either JWST photometry programme. Spectroscopy with ELT or JWST can also produce ∼10 bubble galaxy detections.
At z = 10, JWST spectroscopy fares somewhat worse than ELT spectroscopy because the intrinsically brighter [O III] line that gave JWST an edge at z = 7 have been redshifted out of JWST range. The pessimistic limits at z = 10 places several tens of galaxies above the detection limit of the JWST imaging surveys, and a few galaxies above the threshold of either WFIRST imaging or ELT spectroscopy.
The detection prospects for Euclid are considerably worse, and we conclude that Euclid may largely be blind to drop-out galaxies in the smallest ionized IGM bubbles that SKA-1 can resolve.
The difference between the z = 7 and z = 10 cases is that, due to the lower N ion /L UV at z = 10 (see Section 4.1) and a slightly different shape of the adopted luminosity function, a larger number of galaxies is required at z = 10 than at z = 7 to produce a bubble of a given size, provided that f esc is kept fixed. The detection limits are also slightly shifted to brighter UV luminosities at z = 10.
Because f esc is here assumed to be independent of UV luminosity (see Section 5 for a discussion on this), only a minor fraction (e.g. ≈7-20 per cent for WFIRST, but up to ≈20-50 per cent for the deepest JWST photometry limits) of the ionizing photons that have contributed to the ionization of the IGM in the bubble are accounted for by galaxies above the detection limits in the case where the luminosity function is truncated at M UV = −14.
As an alternative to the procedure used to generate the galaxy count predictions of Fig. 3, in which the relative shape of the observationally determined field luminosity functions was used as a basis for populating IGM bubbles with galaxies until a fixed ionizing photon budget had been reached, one may instead start from the halo mass distribution. In Fig. 4, we start from the dark matter haloes predicted within the ≈1000 cMpc 3 bubbles predicted by the fiducial reionization simulations of Section 3.2 at z = 7 and then adopt the fitting function presented by Inoue et al. (2018) for the relation (and Gaussian scatter) between halo mass and UV continuum luminosity to attach galaxy fluxes to each halo. If we furthermore adopt the N ion /L UV distribution of Section 4.1, we would arrive at f esc ≈ 0.15 for these galaxies. However, the predicted number of bright galaxies (M UV < −19) in Fig. 4 is several times lower than would be expected from even f esc ≈ 0.2, in Fig. 3, whereas the number of galaxies at the faintest detection levels (M UV ≈ −17) in most cases are similar to what one estimate for f esc ≈ 0.15 through interpolation in Fig. 4. This is primarily because the procedure of relating UV  Figure 4. Same as the left-hand panel of Fig. 3, but with predictions based on halo catalogues from five different realizations of ≈1000 cMpc 3 bubbles drawn from the seminumerical simulations of Section 3.2, coupled to an analytical recipe for coupling halo mass to UV luminosity. Each orange stripe corresponds to the predicted galaxy counts for one bubble, with the stripe width representing the standard deviation in galaxy number counts stemming from the scatter in the relation between halo mass and UV luminosity. This procedure gives rise to a luminosity function for the bubble galaxies that differs in shape from the one assumed in Fig. 3, with fewer galaxies that would be detectable above the brighter detection limits but mostly a similar number at the faintest detection limit as the predictions for f esc ≈ 0.2 in Fig. 3. However, one of the bubbles has managed to reach ≈1000 cMpc 3 volume with a significantly lower number of total ionizing photons than the rest, implying much lower galaxy counts. luminosities to halo masses in Fig. 4 results in a luminosity function for bubble galaxies with a different shape than that assumed in Fig. 3, with more very faint galaxies at the expense of bright ones. Moreover, this procedure also gives rise to substantial bubble-tobubble variations in predicted galaxy counts, due to differences in halo mass distributions within bubbles of similar volume. To illustrate this, Fig. 4 features galaxy predictions for five randomly selected ≈1000 cMpc 3 bubbles. One of these bubbles stands out in having reached this volume despite a much smaller combined dark halo mass and ionizing photon production than the rest, resulting in a galaxy population that would only be detectable at the very faintest detection limits considered.
As further discussed in Section 5.1, more sophisticated simulations, meeting observational constraints on the properties of z 7 galaxies in overdense regions, will be required to gauge which of the two approaches, Figs 3 or 4, results in the most realistic estimates.

Photometric or spectroscopic selection?
Bubble galaxies may be identified either in an imaging/photometry survey or through spectroscopy. For a given telescope and a fixed total observing time, photometry will typically reach deeper, but drop-out criteria (or photometric redshifts based on an SED fit from multiband data) have the drawback of not allowing very accurate redshift information. For a typical broad-band drop-out criterion, the redshift error will be (z) ≈ 1. This should be compared to the size of ionized IGM bubbles, which for a spherical 1000 cMpc 3 bubble will cover a line-of-sight depth that at z = 6-10 will be (z) ≈ 0.03-0.06. Hence, an imaging survey runs the risk of misidentifying galaxies located in the foreground or background as Figure 5. Schematic figure illustrating the issue of interlopers in a dropout survey (here assumed sensitive to galaxies within a redshift interval of (z) = 1) towards an ionized bubble at either redshift z = 7 or z = 10. The number of galaxies inside an ionized bubble of fixed size detectable by an imaging survey are expected to change just by factors of a few between z ≈ 7 and z ≈ 10, whereas the number of interlopers in the line-of-sight (z) = 1 volume is expected be significantly higher at lower redshift. Hence, a photometric survey becomes less risky when aimed at a bubble at higher redshift (here z ≈ 10) than when aimed at a bubble at the end of reionization (z ≈ 7). bubble members. A spectroscopic survey, on the other hand, would only need a relatively low spectral resolution of R = λ/ (λ) 200 to reach the redshift accuracy required to identify the bubble membership of a given galaxy through the detection of an identified emission line.
How substantial is the risk of misidentifications in a photometric survey? This depends on the redshift of the bubble targeted. From the detection limits in Fig. 3, we see that the number of detectable galaxies above a fixed luminosity, within a bubble of fixed size, changes by no more than a factor of ≈3 between z ≈ 7 and z ≈ 10 if a constant f esc is assumed. At the same time, the ambient number density of galaxies above a certain threshold luminosity drops by an order of magnitude between these redshifts (Bouwens et al. 2015). This leads to a situation (schematically illustrated in Fig. 5) where the number of interlopers in a drop-out survey towards a given bubble will be much greater towards the end of reionization (z ≈ 7) than at earlier stages (z ≈ 10).
Another way to understand this is to note that ionizing a ∼1000 cMpc 3 volume requires a similar ionizing photon budget, and hence a similar collapsed mass, at both redshifts (Fig. 1). However, a much higher peak in the density field is needed for such a structure to collapse and form stars by z ≈ 10 than at z ≈ 7. Such high peaks are correspondingly rarer, and their overdensity compared to their environment is much greater. For a bubble with a line-ofsight depth of (z) ≈ 0.05, the volume probed by a broad-band imaging survey (line-of-sight resolution (z) ≈ 1) at z = 7-10 will be more than 20 times larger than that of the bubble, but the average galaxy number density in this volume is also likely to be much lower, since the overdense regions tend to be the first to reionize. If we adopt the cosmic average for the number density of galaxies in the line-of-sight volume outside the bubble, the Bouwens et al. (2015) luminosity function predicts that there should at z ≈ 7 be ≈25 interloper galaxies at M UV ≤ −19.5 in the (z) ≈ 1 cylindrical volume projected against a 1000 cMpc 3 bubble. This is larger than the number of bubble galaxies for all the f esc cases considered at z = 7 in Fig. 3. However, at z = 10, the number of interloper galaxies at M UV ≤ −19.5 (approximately the WFIRST photometry limit at this redshift) is ≈1, which means that photometric interlopers within the (z) = 1.0 volume selected by drop-out criteria are unlikely to be a problem, since the number of bubble galaxies predicted in Fig. 3 is expected to be several times higher than this.
A consistency check of this conclusion, based on halo statistics instead of luminosity functions, is presented in Appendix B.
We caution, however, that a purely photometric survey may also be prone to interlopers from much lower redshifts than indicated by the (z) = 1 volume considered above, because very strong optical emission lines can give the appearance of a Ly α break, unless the selection is based on detection on several filters shortward of the break.

Galaxy assembly bias in overdense regions
Throughout this paper, we have -as a first baseline approach -adopted the assumptions that the relative shape of the galaxy luminosity function at z ≥ 7 is independent of environment, that the ionized bubbles of the IGM simply feature a scaled-up version of the field luminosity function at the same redshift, and that there are no significant differences between the properties of galaxies residing in a matter overdensity and those in the field. However, overdense regions that reionize early are, by definition, not typical and may well have galaxy populations quite different from the cosmic average at this epoch.
At low redshifts, it is well established that the luminosity function and the ratio of red-to blue-sequence galaxies change with environment (e.g. McNaught-Roberts et al. 2014). Changes in galaxy properties with environment have observationally been traced up to z ≈ 3 (e.g. Grützbauch et al. 2011), but exactly how early such differences get imprinted in the galaxy population remains an open question (for a review, see Overzier 2016).
In simulations, both the shape of the dark halo mass function and the properties of individual haloes of a given mass (in terms of accretion rate, spin, concentration, and shape) are predicted to be affected by the overdensity of the environment (Lee et al. 2017), and increased merger rates, galaxy interactions, and the feedback from ionizing radiation produced within overdense regions may further augment changes in galaxy properties compared to the field population.
A common expectation is that feedback from an ultraviolet background may quench star formation in low-mass dark matter haloes (e.g. Mesinger & Dijkstra 2008;Sobacchi & Mesinger 2013;Maio et al. 2016). In an ionized bubble, this could potentially alter the shape of the galaxy luminosity function at the faint end, or alternatively affect the typical ionizing emissivity of faint galaxies. To first order, this can be treated as an effective truncation of the luminosity function (below which galaxies do not contribute ionizing photons) similar to the different faint luminosity function extensions that we have considered in this paper (M UV limit -14 to -10). However, if external feedback also affects the star-forming properties of massive galaxies (for a scenario of this type, see Susa 2008), then changes to the shape of the luminosity function at the bright end may also occur. Assembly bias (the notion that the statistical properties of galaxies depend on properties other than halo mass) could also manifest itself in other, more complex ways. Indeed, some simulations have indicated differences in specific SFRs, galaxy mass functions, metallicities, and dust content in overdense regions compared to field at z ≥ 6 (Yajima et al. 2015;Sadoun et al. 2016).
As samples of z > 7 galaxies grow larger, it should be possible to observationally test for such environmental effects by, for instance, studying the slope of the bright end of the luminosity function as a function of clustering. A couple of notable overdensities of galaxies at z 7 have already been discovered in deep HST surveys -a z ≈ 7 overdensity of 17 Lyman break galaxy candidates, out of which three are confirmed Lyman alpha emitters (two with consistent redshifts) in an area a few arcminutes across (Castellano et al. 2016(Castellano et al. , 2018, and a z ≈ 8.4 overdensity of up to eight Lyman break galaxy candidates, out of which one is a confirmed Lyman alpha emitter, in an area just a 10 arcsec across (Ishigaki, Ouchi & Harikane 2016;Laporte et al. 2017). Provided that a fair fraction of these Lyman break galaxies are at the redshift of Lyman alpha emitters, the brightness distribution of Lyman break galaxies in these regions is roughly consistent with what our models predict for ionized regions of a scale that SKA-1 should be able to resolve. Recently, an overdensity of 12 Lyman α emitters at z ≈ 6.6 was discovered by Harikane et al. (2019), but these appear to cover a volume that is considerably larger than covered by our simulations. By targeting such overdensities with deep photometric and spectroscopic JWST observations it may be possible to observationally constrain feedback and environmental effects within some of the most extreme overdensities in the reionization epoch. A few years down the line, many more such overdensities are also expected in to be uncovered in WFIRST deep field observations, which are expected to cover ∼100 times the solid angle of correspondingly deep HST surveys.
In future efforts, it would be worthwhile to study the properties of simulated galaxies drawn directly from ionized IGM regions of reionization simulations to study the effects that assembly bias and galaxy feedback are expected to have on galaxy counts within such structures. Recently, Geil et al. (2017) used such simulations to study how the sizes of ionized bubbles correlate with the luminosity of the brightest galaxy within the bubble. Their study focuses on ionized IGM bubbles at z ≈9-11 that are factors of a few smaller in radius than the SKA-1 tomographic resolution limit than we have adopted, but if we apply their best-fitting relation between bubble radius and brightest bubble galaxy at z ≈ 10 to bubbles of radius 2-3 cMpc (similar to their mean bubble size at this redshift) to derive the UV luminosity of their brightest galaxy in these structures, and then scale the number of such galaxies (M UV = −18.3 to −19.5) up by factors of ≈9-30 to match the volume of our ≈1000 cMpc 3 bubbles, we get a good match with our f esc = 0.2 predictions for this luminosity range at z = 10 in Fig. 3. While this indicates that there is agreement for the average bubble, the extreme tail of their distribution contains galaxies that are much brighter than any of our predictions in Fig. 3 -for example, their very largest bubble (radius ≈4.8 cMpc) at z ≈ 11 contains a brightest galaxy that attains M UV = −22.5 (albeit under the assumption of zero dust attenuation) which seems to completely dominate the UV luminosity of its bubble (as seen in their fig. A1), violate our assumption of a relatively well-sampled bubble luminosity function and lie closer to the monolithical galaxy case discussed in Section 4. Whether even rarer and more extreme objects could exist that produce individual ionized IGM bubbles sufficiently large to be detectable with SKA-1 warrants further study.

Population III stars and active galactic nuclei
Our current treatment assumes that ionizing photons from young stars represent the only substantial contribution to the ionizing photon budget within 21 cm bubbles. However, if objects or mechanisms other than star-forming galaxies and their associated Lyman continuum radiation contribute to bubble growth, this could of course affect the outcome.
Population III stars are expected to emit a large fraction of their radiation at Lyman continuum energies, owing both to high stellar surface temperatures at zero metallicity and potentially a top-heavy stellar IMF (e.g. Schaerer 2002). However, Population III stars in minihaloes are only likely to dominate the cosmic SFR density at z 15 (e.g. Maio et al. 2010) and are not expected to play any major, global role during late stages of cosmic reionization (e.g. Kulkarni et al. 2014). Even so, one could envision such stars to have a significant effect locally, in regions of delayed and highly concentrated Population III star formation.
It has been argued that Population III galaxies may form in z < 15 H I cooling haloes that have managed to remain chemically pristine (Stiavelli & Trenti 2010), but current simulations suggest that such systems are unlikely to attain total stellar masses higher than ∼10 6 M (e.g. Yajima & Khochfar 2017;Inayoshi, Li & Haiman 2018). Based on the Population III galaxy SED models of Zackrisson et al. (2011), such systems cannot, over the expected lifetimes of ∼10 6 -10 8 yr, produce more than ∼10 70 ionizing photons even under the assumption of an extremely top-heavy stellar IMF (typical stellar mass ∼100 M ). Since ∼10 71 photons are required to produce a bubble detectable with SKA-1, such rare, exotic galaxies cannot contribute significantly to the photon budget, even if they exhibited extreme Lyman continuum leakage (f esc ≈ 1).
This leaves black hole accretion as the most likely mechanism to rival the ionizing flux from star-forming galaxies within bubbles close to the SKA-1 resolution limit. A single quasar can easily produce ∼10 71 ionizing photons in as little as ∼10 7 yr (e.g. Maselli et al. 2007). An active, high-luminosity quasar would usually be readily identifiable as such from spectroscopy, but an accreting black hole that has contributed early on and then has turned dormant may not be. Is it then possible that SKA-1 may detect ionized IGM bubbles that appear completely devoid of galaxies when probed with upcoming near-IR telescope, because the quasar responsible for the bubble is in an inactive phase? This could potentially happen for the most shallow limits considered in Fig. 3 but seems unlikely for the deepest ones. Using scaling relations from Wood & Loeb (2000), a supermassive black hole radiating at the Eddington luminosity for ∼10 7 yr with f esc = 1.0 would need to have a mass of ∼10 8 M to seriously impact the ionizing photon budget of a 1000 cMpc 3 bubble. For a bulge-to-supermassive black hole mass ratio of 0.1 at z ≤ 10 (Targett, Dunlop & McLure 2012), this requires a host galaxy with a stellar mass of at least 10 9 M , which is expected to have a luminosity M UV < −19 and therefore be in the detectable range of many of the bubble surveys considered in Fig. 3. Moreover, statistics on galaxy counts within a few bubbles of similar size can put constraints on scenarios including such transient, stochastic contributions to bubble growth; and the sharpness and morphology of the 21 cm profile of the bubble may also reveal contributions from X-ray photons that could be inconsistent with a normal stellar population (e.g. Tozzi et al. 2000;Wyithe & Loeb 2007a;Pacucci et al. 2014;Ghara et al. 2016;Kakiichi et al. 2017).

The ionization efficiency of haloes
The fiducial reionization simulations used to produce N ion, tot -V ion predictions of Fig. 1 are based on the assumption of a fixed ionization efficiency n ion for all haloes above a minimum halo mass of M halo, min = 1.09 × 10 9 M . Raising (lowering) the adopted ionization efficiency alters the reionization history by shifting the completion of reionization to higher (lower) redshifts (see fig. 1 of Greig & Mesinger 2017b), but this effect can be offset by simultaneously raising (lowering) M halo, min . Through this degeneracy, current observational constraints on cosmic reionization are consistent with ionization efficiencies that differ by an order of magnitude (Greig & Mesinger 2017a;Monsalve et al. 2018).
One could perhaps suspect that our results concerning the connection between the volume (V ion ) of ionized bubbles and the number of ionizing photons produced therein (N ion, tot ) would strongly depend on the overall scaling of n ion or its halo mass dependence. However, while changing the relation between n ion and M halo does affect the reionization history of the simulations, we find that the bubble population remains strongly correlated with the overall ionization state of the Universe [i.e. the global neutral fraction (x H I ) quoted in different panels of Fig. 1 for our standard parameter set], and that both the average N ion, tot -V ion relation and the scatter around this relation remains almost the same for V ion ≥ 1000 cMpc 3 despite rather substantial alterations of the relation between n ion and M halo . Specifically, by redistributing the production of ionizing photons across the halo population, while still retaining the same reionization history (x H I (z)) as in Fig. 1, we find that the scatter at V ion ≥ 1000 cMpc 3 is insignificantly altered when either the minimum halo mass for ionizing photon production is raised to M halo, min = 10 10 M , when a random uniform scatter that corresponds to a factor of 20 variation in ionizing efficiency is added to each halo, or when the relation between the ionizing photons produced by a halo(∝ M n halo ) is altered from n = 1 to n = 1.41.

The impact of small-scale density variations
A shortcoming in the seminumerical simulations that we have used to obtain the relation between V ion and N ion, tot in Section 3 is that this machinery does not take small-scale spatial variations of hydrogen density into account when calculating the recombination rate. Instead, the recombination rate is assumed to be uniform throughout the IGM (independent of density). This ignores the effect of the clumping factor and hence the relatively rapid recombinations that may take place in high-density, non-star-forming regions known as damped Lyman α absorbers (DLA) or Lyman limit systems (LLS). These DLAs and LLSs are expected to be dense enough to self-shield themselves from the ionizing background created by the galaxies and quasars and work as the sinks of ionizing radiation within an ionized bubble. Under the standard model of structure formation in our Universe, these systems are expected to be more abundant than the haloes that are capable of hosting galaxies that we assume produces majority of the ionizing photons. A failure to properly consider such small-scale, high-density systems therefore likely leads us to underestimate the N ion, tot required to produce bubbles of a given volume V ion .
To accurately take into account the effects of these absorbers when predicting the ionization topology and the corresponding ionizing photon budget would require reioniziation simulations spanning a very large dynamic range. On one hand they should have mass resolutions of the order of Jeans scales to model the recombination processes inside these sub-Mpc objects and on the other hand they should be able to simulate large volumes (of the order of Gpc) to be able produce the bubble size distribution and the corresponding large-scale fluctuations in the signal. Recent results from theoretical modelling (Schaye 2001;Kaurov & Gnedin 2015), high-resolution but sub-Mpc size simulations (Park et al. 2016) and their sub-grid adaptation in the large-scale seminumerical simulations (Sobacchi & Mesinger 2014) suggest that non-uniform recombinations in the IGM would cost two to three or more photons per ionized hydrogen atom by the end of the reionization era. A direct and obvious implication of this on the N ion, tot versus V ion plot shown in Fig. 1 will be an increment in the amplitude of the power-law fit at all stages of reionization. This would also increase the scatter in the plot at the low V ion end of the plots. However, as we approach V ion ∼ 1000 Mpc 3 (our adopted SKA-1 detection limit), one would expect this scatter to die down substantially due to the effect of averaging over large volumes. This should be explored more carefully in future simulations.

Multiple ionizations from each ionizing photon
Our current treatment assumes that each hydrogen-ionizing (Lyman continuum) photon emitted from stars will ionize exactly one hydrogen atom. However, ionized gas may itself emit ionizing photons through free-bound and free-free transitions, effectively resulting in multiple ionizations from a single stellar Lyman continuum photon. The overall impact of this depends on the shape of the ionizing stellar continuum and the gas temperature, but is not expected to boost the effective ionizing emissivity of galaxies by more than at most a factor of 1.6 under realistic conditions (Inoue 2010).

The escape fraction of ionizing photons
Throughout this paper, we have assumed the same f esc for all bubble galaxies, but one can also envision this parameter evolving as a function of UV luminosity, halo mass, or stellar mass (e.g. Yajima, Choi & Nagamine 2011;Kimm & Cen 2014;Paardekooper, Khochfar & Dalla Vecchia 2015;Xu et al. 2016;Sharma et al. 2017). This will affect the number of galaxies required to emit a fixed number of ionizing photons into the IGM, but the detectable number of bubble galaxies may still be tied to the time-integrated, photon-number-weighted f esc of the whole bubble population. This population-wide f esc quantity basically represents the ratio of the total number of ionizing photons that have ever escaped into the IGM over the total number produced within the bubble, and the results of Fig. 3 still hold as long as f esc is interpreted this way. For example, in the case where the z = 7 luminosity function is considered to extend to M UV = −14, a scenario in which the very faintest galaxies with M UV > −16 have f esc = 0.25 and the rest have f esc = 0 produces a population-wide f esc ≈ 0.1 and the same number of detectable galaxies as predicted by f esc ≈ 0.1 in that figure. Similarly, a scenario with the same luminosity function in which bright galaxies with M UV ≤ −19 have f esc = 0.25 and fainter galaxies have f esc = 0.0 corresponds to a populationwide f esc ≈ 0.05 and the same number of detectable galaxies as indicated by that line.
In principle, it would seem that using SKA-1 to estimate the volume of an ionized bubble and then simply counting the galaxies brighter than a specific M UV detection threshold within that structure would result in an observational constraint on the f esc parameter. However, this would only be possible in so far as both the shape of bubble galaxy luminosity function above and below the M UV detection limit and the N ion /L UV parameter are under control, i.e. that assembly/environmental biases of the type discussed in Section 5.1 are either negligible or can be accurately modelled. Moreover, the measurement error on the SKA-1 volume would also need to be 50 per cent to make this measurement meaningful -a very difficult endeavour indeed.

Studying larger bubbles
In this paper, we have focused on ionized bubbles of size ≈1000 cMpc 3 , which is at the limit of what SKA-1 can hope to detect. In future works, it would also be interesting to consider larger bubbles, since these would be easier to detect in the 21 cm data due to higher signal-to-noise ratio. Providing a census on the galaxy content of such structures with near-IR telescopes would on the other hand be more challenging, since the observing time required to survey the galaxy content of such bubbles to the same depth scales with the projected area of the bubble in the plane of the sky. Hence, a spherical bubble that has a volume a factor of 5 times larger requires a factor of 5 (2/3) ≈ 3 more observing time if the same magnitude limit is to be reached. On the other hand, larger structures are also likely to contain brighter galaxies, which would make the most extreme members detectable with telescopes wide-field survey telescopes like Euclid.

C O N C L U S I O N S
Our results can be summarized as follows: (i) Our seminumerical simulations indicate that -for ionized IGM bubbles sufficiently large to be resolved by SKA-1 (volume 1000 cMpc 3 ) -there is a relatively tight relation between the volume of ionized IGM bubbles at z = 7-10 and the number of ionizing photons that have escaped from the galaxies within. For bubbles of volume 1000 cMpc 3 , the 1σ bubble-to-bubble scatter in this relation is a factor of <4 (Section 3).
(ii) The total number of ionizing photons required to produce an ionized IGM bubble sufficiently large to be resolved by SKA-1 (volume 1000 cMpc 3 ) at z > 6 is 1 × 10 71 . This can be converted into a rough constraint on the minimum total stellar mass that has formed within such a structure (equation 6), and -using additional assumptions on the properties of the bubble galaxiesestimates on the number of galaxies detectable within that structure at the redshift where the bubble is detected (Section 4).
(iii) Using conservative assumptions on the shape of the luminosity function of bubble galaxies, the prior star formation history of these objects and their time-integrated, photon-number-weighted mean Lyman continuum escape fractions, we predict that the deepest spectroscopic surveys with JWST or ELT of SKA-detected ionized bubbles at z = 7-10 are expected to turn up at least a few bubble galaxies (and in many cases far more), even in the case of the smallest resolvable bubbles. The same also holds for purely photometric (multiband-imaging) observations with JWST or WFIRST. However, Euclid may only be able to detect galaxies within ionized bubble with volumes an order of magnitude higher than the SKA-1 detection limit (Section 4.4). Detailed detection predictions are presented in Figs 3 and 4.
(iv) Although spectroscopic observations (with JWST or ELT) may be required to survey the smallest SKA-detected bubbles towards the end of reionization (z ≈ 7) without ending up with excessive numbers of line-of-sight interlopers, photometric surveys (with WFIRST or JWST) may be competitive at higher redshifts (z ≈ 10) due to the smaller number of field-galaxy interlopers expected at redshifts close to that of the bubble (Section 4.5).
(v) If large numbers of bubble galaxies are detected, this could in principle allow for combined constraints on the luminosity function of bubble galaxies and the time-integrated, photon-numberweighted mean escape fraction f esc of ionizing photons from these objects into the surrounding IGM (Section 5.6). For instance, if one adopts our baseline assumption -which seems consistent with the current observational picture of the z 7 galaxy populationthat galaxies in ionized IGM bubbles are not significantly different (in terms of star formation history and relative shape of the luminosity function) from field galaxies at the same redshift, it should be possible to derive a lower limit on f esc by adopting a very faint extension of the luminosity function when assessing the galaxy contribution to the ionizing photon budget. However, effects related to galaxy assembly bias and environmental feedback in overdensities could complicate this procedure (Section 5.1). Such effects may manifest themselves as differences between bubble and field galaxies in terms of spectral properties, luminosity function shape, and in the ratio between luminous and dark halo mass. The detection of such signatures, which would shed new light on the properties of galaxies that form in extreme environments of the z 7 Universe, may be uncovered by the bubble galaxy observations themselves (e.g. spectral and dynamical properties uncovered through JWST and ELT spectroscopy; luminosity function constraints through WFIRST or JWST photometry). Early indications (prior to the detection of any ionized IGM bubbles with SKA-1) of such effects at z 7 may also come from studies with ALMA, JWST, and WFIRST of currently known overdensities of z 7 galaxies, since some of these structures are likely located in ionized IGM bubbles that SKA-1 will eventually be able to resolve though 21 cm tomography. The theoretical understanding of such differences between galaxies in overdensities and the field, and how they affect the emergence of ionized IGM bubbles, must ultimately come from numerical simulations geared to this specific problem. s Å) 12 11 10 9 8 z = Figure A1. Demonstration of how three parametric star formation scenarios for z ≥ 7 galaxies translate into different predictions on the evolution of the N ion /L UV ratio. The time axis indicates the age of the Universe from 300 (z ≈ 14, when star formation is assumed to start) to 750 Myr (z ≈ 7). Left: Star formation histories of the three models. The yellow represents a galaxy where the SFR drops by a factor of ≈30 after 100 Myr, the orange line a galaxy where the SFR remains constant for 350 Myr and then rises by a factor of ≈30, and the blue line represents a galaxy where the SFR remains constant throughout 450 Myr. Right: Prediction of how the N ion /L 1500 ratio (cumulative number of ionizing photons produce divided by momentary UV 1500 Å luminosity) evolves for the three star formation scenarios of the left-hand panel. In the constant SFR scenario, the N ion /L 1500 gradually increases, but drops in SFR (yellow line) lead to higher ratios whereas boosts in SFR (orange line) lead to lower ratios. In these models, a constant metallicity of Z = 0.004 is assumed and dust attenuation of L 1500 is neglected.

A P P E N D I X B : B U B B L E G A L A X I E S V E R S U S L I N E -O F -S I G H T I N T E R L O P E R S I N P H OTO M E T R I C S U RV E Y S
In Section 4.5, we use an argument based on luminosity functions to argue that photometric surveys in the direction of ionized IGM bubbles are less likely to pick up line-of-sight interlopers for bubbles at z ≈ 10 compared to z ≈ 7. However, one may also reach the same conclusion by directly comparing the number of high-mass haloes present within simulated ionized bubbles, like the one presented in Section 3, to those in the ambient field along a column spanning (z) ≈ 1. In this case, the outcome will depend on the detailed recipe for relating ionzing photon fluxes to haloes of a given mass. In the simulations discussed in Section 3, bubbles with volume ≈1000 cMpc 3 contain ≈20-50 haloes with virial mass ≥10 10 M h −1 at z ≈ 7-10. The ambient field will on the other hand contain ≈600 such haloes in the (z) ≈ 1 volume along the line of sight at z = 7, yet only ≈10 at z = 10. This lends further support to the notion that the fraction of line-of-sight interlopers may be very high in photometric bubble surveys at z ≈ 7, but much smaller at z ≈ 10. This calculation of the ambient halo density is based on the halo mass function derived from the Bolshoi-Planck and MultiDark-Planck