The reionising bubble size distribution around galaxies

Constraining when and how reionisation began is pivotal for understanding when the first galaxies formed. Lyman-alpha (Ly$\alpha$) emission from galaxies is currently our most promising probe of these early stages. At z>7 the majority of galaxies detected with Ly$\alpha$ are in candidate overdensities. Here we quantify the probability of these galaxies residing in large ionised bubbles. We create (1.6 Gpc)$^3$ reionising intergalactic medium (IGM) simulations, providing sufficient volume to robustly measure bubble size distributions around UV-bright galaxies and rare overdensities. We find $M_{\rm UV} \lesssim -16$ galaxies and overdensities are $\gtrsim$10-1000x more likely to trace ionised bubbles compared to randomly selected positions. The brightest galaxies and strongest overdensities have bubble size distributions with highest characteristic size and least scatter. We compare two models: gradual reionisation driven by numerous UV-faint galaxies versus more rapid reionisation by rarer brighter galaxies, producing larger bubbles at fixed neutral fraction. We demonstrate that recently observed z~7 overdensities are highly likely to trace large ionised bubbles, corroborated by their high Ly$\alpha$ detection rates. However, the z~8.7 association of Ly$\alpha$ emitters in EGS and GN-z11, with Ly$\alpha$ at z=10.6, are unlikely to trace large bubbles in our fiducial model -- 11% and 7% probability of>1 proper Mpc bubbles, respectively. Ly$\alpha$ detections at such high redshifts could be explained by: a less neutral IGM than previously expected; larger ionised regions at fixed neutral fraction; or if intrinsic Ly$\alpha$ flux is unusually strong in these galaxies. We discuss how to test these scenarios with JWST and the prospects for using upcoming wide-area surveys to distinguish between reionisation models.


INTRODUCTION
The reionisation of intergalactic hydrogen in the Universe's first billion years was likely caused by photons emitted from the first galaxies, and is thus intimately linked to their nature (e.g., Stark 2016;Dayal & Ferrara 2018;Mesinger 2019).Constraining the reionisation process thus enables us to infer properties of these first luminous sources, importantly giving us information about the earliest generations of galaxies which are too faint to observe directly, even with JWST.In the past decade, substantial progress has been made in measuring the timing of the late stages of reionisation.The electron scattering optical depth to the CMB indicates reionisation was on-going at  ∼ 7−8 (Planck Collaboration et al. 2020) and the attenuation of Lyman-alpha (Ly, 1216 Å) photons by neutral hydrogen in the intergalactic medium (IGM) in the spectra of  ∼ > 5 quasars and galaxies implies the IGM was almost entirely ionised by  ∼ 5.5 − 6 (McGreer et al. 2015;Lu et al. 2022;Qin et al. 2021;Bosman et al. 2022) but that the IGM was significantly neutral (volume-averaged ★ E-mail: tingyi-lu@nbi.ku.dk neutral fraction  ∼ > 0.7) just ∼ 300 Myr earlier at  ∼ 8 (e.g., Davies et al. 2018b;Hoag et al. 2019;Mason et al. 2019b;Bolan et al. 2022).
While we are beginning to reach a consensus on when the end stages of reionisation occurred, we still do not understand how it happened.Which sources drove it and when did it start?The onset of reionisation provides pivotal information about the onset of star formation.Simulations predict the first reionised regions grow around overdensities (e.g., Furlanetto et al. 2004b;Zahn et al. 2007;Trac & Cen 2007;Mesinger & Furlanetto 2007;Ocvirk et al. 2020;Hutter et al. 2021;Qin et al. 2022), but, while there are strong hints (Castellano et al. 2016;Tilvi et al. 2020;Hu et al. 2021;Endsley & Stark 2022;Jung et al. 2022;Larson et al. 2022), this is yet to be robustly confirmed observationally.Furthermore, the ionising emission properties of high-redshift sources are still highly uncertain, and, with current constraints on the reionisation timeline alone, there is a degeneracy between reionisation driven by numerous low mass galaxies with low ionising emissivity (e.g.ionising photon escape fraction ∼ 5%), and rarer bright galaxies with high ionising emissivity (e.g., Greig & Mesinger 2017;Mason et al. 2019a;Finkelstein et al. 2019;Naidu et al. 2020).However, the clustering strength of the dominant source population has a large impact on the expected size distribution of ionised bubbles (e.g., McQuinn et al. 2007a;Mesinger et al. 2016a;Hassan et al. 2018;Seiler et al. 2019).Thus, identifying and measuring large ionised regions at early times provides vital information about the reionisation process.
Before we will detect the 21-cm power spectrum (e.g., Pober et al. 2014;Morales & Wyithe 2010) the most promising tool to study the early stages of reionisation and the morphology of ionised regions is Ly emission from galaxies, which is strongly attenuated by neutral hydrogen (e.g., Malhotra & Rhoads 2006;Stark et al. 2010;Dĳkstra 2014;Mesinger et al. 2015;Mason et al. 2018a).If reionisation starts in overdensities we expect a strong increase in the clustering of Lyemitting galaxies (LAEs) in the early stages of reionisation (McQuinn et al. 2007b;Sobacchi & Mesinger 2015;Hutter et al. 2015).Strong evidence of enhanced clustering has not yet been detected in widearea Ly narrow-band surveys (e.g., Ouchi et al. 2017), likely because these surveys have mostly observed at  < 7, when the IGM is probably still < 50% neutral (e.g., Mason et al. 2019a;Qin et al. 2021) and thus the clustering signal is expected to be weak (Sobacchi & Mesinger 2015).
However, spectroscopic studies of  > 7 galaxies selected by broad-and narrow-band imaging in smaller fields have yielded tantalizing hints of spatial inhomogeneity in the early stages of reionisation.In particular, an unusual sample of four UV luminous ( ∼ −22) galaxies detected in CANDELS (Koekemoer et al. 2011;Grogin et al. 2011) fields (three of which are in the EGS field) with strong Spitzer/IRAC excesses, implying strong rest-frame optical emission, were confirmed with Ly emission at  ≈ 7.1 , 7.3, 7.7, and 8.7 (Zitrin et al. 2015b;Oesch et al. 2015;Roberts-Borsani et al. 2016;Stark et al. 2017).Furthermore, Ly was recently detected at  = 10.6 in the UV-luminous galaxy GNz11 (Bunker et al. 2023).The high detection rate of Ly in these UV bright galaxies is at odds with expectations from lower redshifts, where UV-faint galaxies are typically more likely to show strong Ly emission (e.g., Stark et al. 2011;Cassata et al. 2015).
This may imply that these galaxies trace overdensities which reionise early, or that they have enhanced Ly emission due to young stellar populations and hard ionising spectra, or, more likely, a combination of these two effects (Stark et al. 2017;Mason et al. 2018b;Endsley et al. 2021;Roberts-Borsani et al. 2022;Tang et al. 2023).Photometric follow-up around some of these galaxies has found evidence they reside in regions that are ∼ > 3× overdense (Leonova et al. 2022;Tacchella et al. 2023).Furthermore, spectroscopic follow-up for Ly in neighbors of these bright sources has proved remarkably successful: to date, of the ∼ 30 galaxies detected with Ly emission at  > 7, 14 of these lie within a few physical Mpc of three UV luminous galaxies detected in the CANDELS/EGS field at  ≈ 7.3, 7.7 and 8.7 (Tilvi et al. 2020;Jung et al. 2022;Larson et al. 2022;Tang et al. 2023).Do these galaxies reside in large ionised regions?Due to the high recombination rate at  ∼ > 10 large ionised regions require sustained star formation over ∼ > 100 Myr (e.g., Shapiro & Giroux 1987), thus detection of large ionised regions at early times would imply significant early star formation.
Assessing the likelihood of detecting Ly emitting galaxies during reionisation requires knowledge of the expected distribution of ionised bubble sizes around the observed galaxies.Previous work has focused on predicting the size distribution of all ionised regions during reionisation, as is required for forecasting the 21-cm power spectrum (e.g., Furlanetto & Oh 2005;Mesinger & Furlanetto 2007;Geil et al. 2016;Lin et al. 2016).However, as galaxies are expected to be biased tracers of the density field (e.g., Adelberger et al. 1998;Overzier et al. 2006;Barone-Nugent et al. 2014), these ionised bubble size distributions likely underestimate the expected ionised bubble sizes around observable galaxies.The 21-cm galaxy cross-power spectrum for different halo masses (e.g.Lidz et al. 2009;Park et al. 2014) reflects the typical ionised region size around different halo masses.However, the size distributions of ionised regions were not discussed in previous works.Mesinger & Furlanetto (2008b) show the Ly damping wing optical depth distributions around galaxies of various masses at  ∼ 9, finding the most massive halos have the lowest optical depth with smallest dispersion in optical depth, which corresponds to being hosted by larger bubble sizes with smaller variance in bubbles compared to lower mass halos, though that work did not model the UV magnitude of the halos and only presented optical depths for halos  ℎ < 2 × 10 11  .The correlation between galaxy properties and their host ionised bubbles has been explored in some semi-analytic simulations (Mesinger & Furlanetto 2008b;Geil et al. 2017;Yajima et al. 2018;Qin et al. 2022), finding that more luminous galaxies are likely to reside in large ionised bubbles.However, these studies have been restricted to small volumes, (100 cMpc) 3 , simulations with only a handful of UV-bright galaxies and overdensities, so Poisson noise is large, or models of cosmological Strömgren spheres which do not account for the overlap of bubbles (Yajima et al. 2018).
In this paper we create robust predictions for the size distribution of ionised bubbles around observable ( −16) galaxies.We create large volume (1.6 cGpc) 3 simulations of the reionising IGM using the semi-numerical code 21cmfast (Mesinger et al. 2011).With these simulations, we can robustly measure the expected bubble size distribution around rare overdensities and UV-bright galaxies ( −22 or  halo 10 11  ) to compare with observations.We assess how likely the observed  > 7 associations of Ly emitters are to be in large ionised bubbles, finding that while  ∼ 7 observations are consistent with our current consensus on the reionisation timeline, Ly detections at  > 8 are very unexpected.We further demonstrate how different reionising source models produce very different predictions for the bubble size distribution at any neutral fraction.We discuss the prospect of using upcoming wide-area surveys to distinguish the reionising source models based on our bubble size distribution predictions by observing a large number of overdensities to chart the growth of the first ionised regions.
This paper is structured as follows: we describe our simulations in Section 2, we present our results on the bubble size distributions as a function of galaxy luminosity and overdensity, and compare with observations in Section 3. We discuss our results in Section 4 and conclude in Section 5. We assume a flat ΛCDM cosmology with Ω  = 0.31, Ω Λ = 0.69, ℎ = 0.68 and magnitudes are in the AB system.

METHODS
In the following sections, we describe our simulation setup and analysis framework.In Section 2.1 we describe our reionisation simulations.In Section 2.2 we describe how we populate simulated halos with galaxy properties and in Section 2.3 we describe how we measure the ionised bubble size distribution using the mean free path method and the watershed algorithm.

Reionisation simulations
To study the link between galaxy environment and reionisation, we use the semi-numerical cosmological simulation code, 21cmfast M UV [cMpc] [cMpc] Figure 1.Slices from our simulations at  = 0.2, 0.5, 0.7, 0.9 for Gradual (upper panel) and Rapid (lower panel).White regions show ionised gas and black regions show neutral gas.We show 1.5 cMpc slices in a 300 × 300 cMpc region of our (1.6 cGpc) 3 coeval cubes.We plot galaxies in this slice, color-coded by  , in the leftmost column.Here we only show galaxies with −22 ≤  ≤ −16 for demonstration purposes.
v21 (Mesinger et al. 2011;Sobacchi & Mesinger 2014;Mesinger et al. 2016b).21cmfast first creates a 3D linear density field at high redshift, which is evolved to the redshift of interest using linear theory and the Zel'dovich approximation.The ionisation field (and other reionisation observables such as 21cm brightness temperature) is then generated using an excursion-set theory approach assuming an ionisation-density relation and a given reionisation model.In this way, 21cmfast can quickly simulate reionisation on large scales (> 100 Mpc), with a simple, flexible model for the properties of reionising galaxies.
Here we briefly describe the creation of ionisation boxes before proceeding to our simulation setups, and refer the reader to Mesinger et al. (2011);Mesinger et al. (2016b) for more details.For a density box at redshift, , a cell (at position x) is flagged as ionised if where  coll (x, , ,  min ) is the fraction of a collapsed matter residing in halos larger than a minimum halo mass,  min , inside a sphere of radius , and  rec is the average cumulative number of recombinations. is an ionising efficiency parameter: where   and  esc are the input reionisation parameters, the number of ionising photons per stellar baryon, and the ionising photon escape fraction, respectively. * is the fraction of galactic gas in stars. b is the fraction of baryons inside the galaxy.While we expect a variation of these parameters with halo mass and/or time (see e.g., Wise et al. 2014;Kimm & Cen 2014;Xu et al. 2016;Trebitsch et al. 2017;Lewis et al. 2020;Ma et al. 2020), simply changing  and  min can encompass a broad range of scenarios for reionising source models and thus produce different reionising bubble morphologies (e.g., Mesinger et al. 2016a).High values of  and  min correspond to reionisation dominated by rare, massive galaxies, which require a larger output of ionising photons to produce a reionisation timeline consistent with observations, while low  and  min values correspond to reionisation driven by numerous faint galaxies with weaker ionising emissivity.
In this paper, we simulate large-scale boxes of dark matter halos and the IGM ionisation field in order to produce robust bubble size distributions as a function of galaxy properties with minimal Poisson noise, using two different reionising source models.We produce (1600 cMpc) 3 coeval boxes at  = [7, 8, 9, 10], with a grid size of 1024 pixels, resulting in a resolution of ∼ 1.6 cMpc.We generate a catalogue of dark matter halos from the density fields associated with these boxes using extended Press-Schechter theory (Sheth et al. 2001) and a halo-filtering method (see Mesinger & Furlanetto (2007) for full description of the method) which allows us to generate halos with accurate halo mass function down to M 10 8 .We use identical initial conditions (and thus density field and halo catalogue at each redshift) for all of our models, so in our analysis below we can isolate the impact of the reionisation source model on the bubble size distribution in different galaxy environments.
We create ionisation boxes spanning  = 0.1 − 0.9 (Δ =0.1), using Equation 1, for two reionising source models, similar to the approach of Mesinger et al. (2016a), which span the plausible range expected by early galaxies: (i) Gradual: Reionisation driven by faint, low mass galaxies down to the atomic cooling limit ( min = 5 × 10 8  ,  −11.0).Reionisation driven by numerous faint galaxies leads to a gradual reionisation process, where the IGM can begin to reionise very early.We show in Figure 1 that the ionised regions in this model start slowly and gradually grow and overlap.We use this as our fiducial model.
(ii) Rapid: Reionisation driven by rarer bright galaxies ( min = 10 10  ,  −19.5).As massive galaxies take more time to assemble, reionisation starts later and the morphology is characterised by rarer, larger ionised regions at fixed neutral fractions.
For each model, at each redshift, we vary  so as to compare different reionisation morphologies at the same  .In the end, we create a total of 72 simulations: 4 (redshift) ×2 (reionisation model) × 9 ( ) ionisation boxes, and 4 (redshift) halo catalogues.In addition, in Sec.3.4, to compare our simulations with observations, we expand the  range at the high- end to  =[0.85,0.90,0.95]at  = 9 for the two models.
Example slices of the ionisation field from the two sets of simulations are shown in Figure 1.This clearly shows that the Rapid model has larger, rarer bubbles compared to the Gradual model at fixed  .Underdense regions are more likely to be ionised in the Gradual model.This is because in the Gradual model, faint galaxies, which live in a wider density range, are able to ionise the IGM.While in the Rapid model, only bright, more massive galaxies, which most likely only live in overdensities, can ionise the IGM.
Figure 2 shows potential reionisation timelines of the two reionisation models, for demonstration purposes only.To produce example reionisation histories for our two models we follow the standard procedure (e.g., Robertson 2010) and generate an ionising emissivity from the product of the halo mass density, integrated down to the two mass limits described above, and an ionising efficiency, .We alter  for both models to fix the redshift of the end of the reionisation to  ∼ 6.The Gradual model has an earlier onset of reionisation and slower redshift evolution of  compared to the Rapid model.We note that as we use coeval boxes we do not assume a model reionisation history in this work, rather we will use non-parametric reionisation timeline inferred by Mason et al. (2019b) from independent constraints on the IGM neutral fraction, including the Ly equivalent width distribution (Mason et al. 2018a(Mason et al. , 2019a;;Hoag et al. 2019), Ly emitter clustering (Sobacchi & Mesinger 2015), Ly forest dark pixels fraction (McGreer et al. 2015), and QSO damping wings (Davies et al. 2018a;Greig et al. 2019) and the Planck Collaboration et al. (2020) electron scattering optical depth.

Galaxy population model
To populate halos with realistic galaxy properties we use a conditional UV luminosity to halo mass relation, to assign UV luminosities, with intrinsic scatter, to our halo catalogue.We follow Ren et al. (2019) and assume UV magnitudes at a given halo mass are drawn from a Gaussian distribution with dispersion  and median  , ( h , , ): The dispersion was originally introduced to explain scatter in the Tully-Fisher relation (Yang et al. 2005).It is a free parameter in our model, and following Whitler et al. (2020) we assume  = 0.5 mag.Ren et al. (2019) found that this value is consistent with observed luminosity functions over  ∼ 6 − 10, and this value is also consistent with the expected variance due to halo assembly times (Ren et al. 2018;Mason et al. 2023).Whitler et al. (2020) found that this scatter has only a minor impact on the transmission of Ly from galaxies in the reionising IGM, so we do not expect it to significantly change the relationship between galaxy luminosity and the size of the ionised bubbles they reside in.2015), and QSO damping wings (diamonds, Davies et al. 2018a;Greig et al. 2019) observations.The grey line with shaded region is the reionisation timeline and its 16-84 percentile inferred using the aforementioned observations (Mason et al. 2019a).In the following we will use this grey posterior for  for comparing to observations as a function of redshift, the Rapid and Gradual models are shown just to illustrate how these models differ when the ionisation efficiency is fixed (see Section 2.1).
the UV luminosity function.Ren et al. (2019) showed that above  ℎ ∼ > 10 12  a flattening is required in  , ( h ) to maintain consistency with the observed UV LFs -which can be thought of as a critical mass or luminosity threshold for star formation.Given that our halo catalogue contains only a small number (0.001% of the total catalogue) of > 10 12  halos at  ∼ 7, and far fewer at  > 7 due to the steepness of the halo mass function, we do not consider this flattening.We thus use the  , ( h , ) relations from the Mason et al. (2015) UV luminosity function model as the median UV magnitudes for Equation 3. Our resulting luminosity functions are consistent with  ∼ 7 − 10 observations over the range where observations are currently magnitude complete: −22 ∼ <  ∼ < − 17 (e.g., Bouwens et al. 2021, see Appendix A).

Measuring bubble sizes
We measure the size of ionised regions,  ion , using both the meanfree-path (MFP) method (Mesinger & Furlanetto 2007) and the watershed algorithm (Vincent & Soille 1991), an image segmentation algorithm which was first applied to reionisation simulations by Lin et al. (2016).Lin et al. (2016) tested a range of approaches for estimating the sizes of ionised bubbles in simulations and determined these two methods were optimal compared to other techniques in the literature because they most accurately recover input ionised bubble size distributions, can account for overlapping bubbles, and produce sizes corresponding to a physically intuitive quantity.Other commonly used approaches for modelling the bubble size distribution, i.e. the excursion set formulation (Furlanetto et al. 2004b;Furlanetto & Oh 2005) or approaches which grow cosmological Stromgren spheres around halos (e.g., Yajima et al. 2018) will underestimate the largest bubble sizes because these approaches do not include the effect of overlapping bubbles.
Here we describe these two methods, and their advantages and limitations.We will discuss how our resulting bubble size distributions compare to works using other methods in Section 4.1.

Mean free path (MFP)
This method was first used to measure ionised bubble sizes by Mesinger & Furlanetto (2007).It is essentially a Monte-Carlo raytracing algorithm, which enables us to measure a probability distribution for ionised bubble sizes by estimating the distance photons travel before they encounter neutral gas.We randomly choose a starting position (or the position of a galaxy, as described later), if the cell is fully ionised, we measure the distance from that position to where we encounter the first neutral or partially ionised cell at a random direction.Given our simulation resolution, the smallest bubble size we can measure is ∼ 1 cMpc.If the position is neutral, we set  ion = 0 cMpc.We measure bubble sizes over the full simulation volume by sampling the distance to neutral gas from 10 5 random positions and sightlines to build bubble size distributions for our simulations.
In Section 3.1 we will show the bubble size distribution as a function of galaxy  , to estimate the sizes of ionised bubbles around observable galaxies.For this, we use the mean free path method as defined above, but start our measurements at the position of each galaxy in the simulation box.
We also will measure the bubble size distribution as a function of galaxy overdensity to compare with current observations.Galaxy overdensity depends on the dark matter density of the underlying field (e.g.Cole & Kaiser 1989;Mo et al. 1997;Sheth et al. 2001):  = (1 + ), where  is the number density of galaxies observed in a field,  is the mean cosmic number density,  is the bias, and  is the dark matter density in the field.Since 21cmfast populates halos and calculates  HI based on galaxy number density via the excursion set formulation (Furlanetto et al. 2004b), we expect a strong relation between bubble size and galaxy overdensity (e.g., Mesinger & Furlanetto 2007).
We define the observed overdensity, /  , as the number, , of galaxies brighter than a given limit in a survey volume relative to the number expected in that volume based on the average in the whole simulation box,  .To measure overdensity using our galaxy catalogue, described in Section 2.2, for a mock survey, we discard galaxies with  >  UV,lim , where  UV,lim is the UV magnitude limit in an observed overdensity.While the galaxy catalogue and  boxes are generated from the same density field, as described in Section 2.1, galaxies are given sub-grid positions, thus to compare the overdensity and  fields we convert the resulting galaxy catalogue into a galaxy number count grid of the same size as the  grids.Then we convolve the galaxy number count grid with a 3-D kernel of the survey volume and divide the value of each cell by  , the mean number count per cMpc 3 in the halo box, to obtain the overdensity in each cell.
Cells in the resulting overdensity box correspond to positions with a given overdensity above the magnitude limit within the volume.We can then carry out an analogous procedure to that described above using the mean free path algorithm to find the bubble size distribution as a function of overdensity using the mean free path method, by starting in positions of a given overdensity.

Watershed algorithm
This method was first used to measure ionised bubble sizes in reionisation simulations by Lin et al. (2016).It is an image segmentation algorithm which treats constant values of a scalar field as contour lines corresponding to depth in a tomographic map, which it then "floods" to break up the images into separate water basins (Vincent & Soille 1991).
We use the implementation of the watershed algorithm in skikit-image (van der Walt et al. 2014).We apply the algorithm to 3D binary  cubes.We first apply the 'distance transform' to calculate the Euclidean distance,   of every point to the nearest neutral region (if the point is neutral then the distance is zero).We invert the distances to 'depths':   → −  .Centres of bubbles are then local minima in the depth cube and the bubble boundaries are identified by flooding regions starting from the local minima, and marking where regions meet -these are contours of constant depth   .
As with any image segmentation algorithm, the identification of local minima will lead to over-segmentation, as every local minimum will be marked as a unique bubble, even if it is overlapping with a larger one, thus a threshold must be used to avoid this.We follow the prescription of Lin et al. (2016) and use the 'H-minima transform' to essentially 'fill in' small basins.We identify basins with a relative depth of ℎ from the local minimum to the bubble boundary and set   →   + ℎ for these regions, reducing the depth of the local minima.After the H-minima transform, we can again identify bubbles as above and see that large bubbles are correctly identified.This process may remove small isolated bubbles which had depth < ℎ.These can be added back in manually using the initial segmentation cube.
The H-minima threshold ℎ is a free parameter, we use ℎ = 2.5, which is fixed so that the resulting bubble size distribution is comparable to that obtained with the MFP method above and bubbles do not suffer too much from over-segmentation.We solve for the value of ℎ by minimising the Kullback-Leibler divergence (KL divergence;Kullback 1968) between the watershed bubble size distribution and MFP bubble distribution in a (500 cMpc) 3 sub-volume of our simulation.We obtain a cube with the cells corresponding to unique bubbles labelled.From this we can calculate the volume of each bubble and calculate the size as the radius of each bubble assuming they are spherical:  = (3/4) 1/3 The watershed algorithm is a more computationally intensive method than the MFP method, and requires some tuning of the ℎ threshold, so we predominantly use the MFP approach.However, the watershed algorithm has a significant advantage in that it can measure the absolute number of bubbles in a volume.It is also possible to use it to directly connect galaxies and their host bubbles.We will use it in Section 3.5 to make forecasts for the number of large bubbles expected in upcoming wide-area surveys.

RESULTS
Previous works have focused on simulating the global bubble size distribution, in order to produce predictions for 21-cm experiments (e.g.Furlanetto & Oh 2005;Mesinger & Furlanetto 2007;Geil et al. 2016;Lin et al. 2016).Some 21cm-galaxy cross correlation studies (e.g.Lidz et al. 2009;Park et al. 2014) calculate the correlation scales for various halo masses but do not directly calculate the bubble size distribution.Here we focus on the expected bubble size distribution around observable galaxies, which are likely to be more biased density tracers, and thus we expect are likely to trace the largest bubbles.
In Section 3.1 we present the bubble size distribution as a function of galaxy UV luminosity, and in Section 3.2 we show the bubble size distribution as a function of galaxy overdensity.The impact of different reionising source models on the bubble size distribution is discussed in Section 3.3.We demonstrate in Appendix B that our results do not significantly depend on redshift.In Section 3.4 we use our simulations to interpret recent observations of Ly emission in overdensities at  ∼ > 7, and we make predictions for upcoming widearea observations in Section 3.5.

Bubble size distribution as a function of UV luminosity
To first order, UV luminosity traces dark matter halo mass and thus density (e.g., Cooray & Milosavljevic 2005;Tempel et al. 2009;Mason et al. 2015).We thus expect the brightest galaxies to reside in the most massive halos in overdense regions, and therefore these galaxies are likely to sit in large bubbles which reionised early.
We quantify this in Figure 3, where we show the size distribution of ionised bubbles around galaxies of a given UV luminosity as a function of the volume-averaged IGM neutral fraction,  , in our simulations, compared to the bubble size distribution in the full volume.This is essentially the distribution at the mean density,  = 0. We measure the distribution of bubble sizes in 4  bins:  = −16, −18, −20, −22, with Δ = 0.1.We show our fiducial Gradual simulation but will compare it to the Rapid simulation in Section 3.3.
In contrast to previous literature we also include the fraction of galaxies (or randomly selected pixels for our full volume bubble size distribution) which are in neutral regions in our simulation.We mark these fractions with arrows in Figure 3.These sources may reside in ionised bubbles below our resolution limit (∼ 1 cMpc for bubble radius).Including these occurrences in our bubble size distribution leads to important insights about the environments of galaxies as we discuss below.We note that the 'full volume' bubble size distribution excluding neutral cells and those below our resolution limit is equivalent to the bubble size distributions presented in previous literature (e.g., Furlanetto & Oh 2005;Mesinger & Furlanetto 2007).
Figure 3 shows that as  decreases, the bubble size distributions shift to higher values, which is expected as ionised regions grow.Compared to the bubble size distribution in the full volume, we see three important features of the bubble size distributions which we describe below.
First, while the bubble size distribution in the full volume has a high fraction of bubbles with  1 cMpc, observable galaxies ( −16) are > 10 − 1000× more likely to be in bubbles rather than neutral regions.This is because galaxies are biased tracers of the density field and therefore trace ionised regions more closely.At the end stages of reionisation,  ∼ < 0.5, we find only ∼ < 10% of observable galaxies are in small ionised or neutral regions below our resolution limit.This is consistent with the idea of the 'post-overlap' phase of reionisation (e.g., Miralda-Escudé et al. 2000), where the majority of galaxies lie within ionised regions and only voids remain to be ionised.
We see a strong trend with UV luminosity, where the brightest galaxies are always least likely to be in small ionised or neutral regions, while UV-faint galaxies have a more bimodal bubble size distribution.The proportion of UV-faint galaxies in small ionised or neutral regions is high early in reionisation: but declines rapidly from ∼ 60% at  ∼ 0.9 to ∼ 10% at  ∼ 0.5 for  ∼ > − 18 galaxies.This is driven by the clustering properties of the UV-faint galaxies as we discuss below.This may explain the low detection rate of Ly in UV-faint galaxies at  ∼ > 8 (Hoag et al. 2019;Mason et al. 2019a;Morishita et al. 2022) compared to the higher detection rate in UV bright galaxies seen by Jung et al. (2022) at the same redshift.
Second, we see that the bubble size distribution around observable galaxies peaks at a similar size for all  bins, which indicates that, on average, these galaxies are in the same bubbles.This peak, corresponding to the mean size of ionised regions, has been described as a 'characteristic' scale,  char (e.g., Furlanetto & Oh 2005).In the following we refer to the mean size of ionised regions as the characteristic size.We see that the characteristic scale of ionised regions increases by over two orders of magnitude during reionisation2 .However, we do find an increasing characteristic scale as a function of UV luminosity: galaxies brighter than  ∼ < − 20 are expected to reside in bubbles ∼ 1.5 − 2× larger than the characteristic bubble scale in the full volume.
Finally, we see that the width of the bubble size distribution decreases as galaxy UV luminosity increases.This is due to the clustering of galaxies: UV bright galaxies are more likely to be in overdense regions which will reionise early, whereas UV faint galaxies can be both 'satellites' in overdense, large ionised regions, or 'field galaxies' in less dense regions which remain neutral for longer (see also Hutter et al. 2017Hutter et al. , 2021;;Qin et al. 2022) This figure demonstrates that UV-faint galaxies will have very significant sightline variance in their Ly optical depth, and highlights the importance of using realistic bubble size distributions for inference of the IGM neutral fraction (see also, e.g.Mesinger & Furlanetto 2008a;Mason et al. 2018b).

Bubble size distribution as a function of galaxy overdensity
In this section, we investigate the distribution of bubble sizes as a function of galaxy overdensity, /  .This distribution should directly reflect how structure formation affects reionisation.
As described in Section 2.3 an observed galaxy overdensity, /  , depends on the survey depth and volume.For our investigation here we explore expected overdensities within a medium-deep JWST observation within 1 NIRISS pointing (or 1/2 of the NIRCam field-of-view), aiming to simulate observations similar to those obtained by the JWST/NIRISS pure-parallel PASSAGE survey (Malkan et al. 2021).We thus use a survey limiting depth of  AB = 28 and area 4.84 sq.arcmin with a redshift window of Δ = 0.2.This corresponds to [ ,lim ,  survey ]= [-19, 2014 cMpc 3 ] at  = 8 ± 0.1.We follow the procedure described in Section 2.3 to create a cube of /  using these survey parameters, and then select 200,000 cells3 to measure the bubble size distribution as a function of overdensity.
Figure 4 shows the bubble size distributions for /  ≈ 5, /  ≈ 10, and /  15, along with the bubble size distribution in the full volume as a function of  , for the Gradual model.As in Section 3.1 we see the clear trend that the bubble size distributions increase to higher values as the universe reionises, but we can now identify where the reionisation process begins.We can see that the most overdense regions reionise first and inhabit the largest ionised We also show the bubble size distribution from the full volume as a thick grey line in each simulation.The fractions of galaxies in  < 0.8 cMpc bubbles (below our resolution limit) or neutral cells are marked with arrows.Each panel shows a different volume-averaged IGM neutral fraction,  .As the neutral fraction decreases, the bubble size distributions shift to higher values, as expected as bubbles grow as reionisation progresses.With increasing UV luminosity, the probability that a galaxy resides in big bubbles increases.
bubbles.As in Section 3.1, we investigate three clear trends in the bubble size distribution as a function of galaxy overdensity.
First, overdense regions start and finish carving out ionised bubbles earlier compared to regions at the mean density.We see a much larger proportion of overdense regions already in  ion > 1 cMpc bubbles early in reionisation.We find at  = 0.9, when only 10% of the total IGM volume is ionised, ∼ > 30% of the /  ≥ 10 regions are already in  ion > 1 cMpc bubbles.By  = 0.5, all of the /  ≥ 10 regions are in  ion > 1 cMpc bubble.This demonstrates that early in reionisation, we expect only the strongest overdensities to trace large ionised regions.
Second, ionised bubbles around overdense regions are larger than the characteristic bubble size in the full volume, particularly in the early stages of reionisation.At  = 0.8, the characteristic bubble size around /  ≥ 10 regions is  ion ∼ 10 cMpc, which is ∼ 2× larger than the mean bubble size in the full volume at that time, and large enough for significant Ly transmission (e.g., Miralda-Escude 1998; Mason & Gronke 2020;Qin et al. 2022).Detection of Ly in a highly neutral universe is thus not unexpected if the LAEs are in highly overdense regions.
The mean bubble size of overdense regions grow more slowly than that of less overdense regions.In the early stage of reionisation, bubbles around the most overdense regions grow in isolation and do not merge with similarly sized bubbles because most overdense regions are far away from each other.By contrast, bubbles created by less overdense regions are more likely to grow rapidly by merging with other bubbles.
Finally, again we see the bubble size distributions are broad, but that the strong overdensities have the narrowest distribution of bubble sizes because they are guaranteed to trace ionised environments, whereas less dense regions can be isolated, and therefore in smaller bubbles, or contained within large scale overdensities in large bubbles.

Bubble size distribution as a function of reionising source model
In the previous sections we have shown the bubble size distribution using only our fiducial Gradual faint-galaxies driven, reionisation model.Here we demonstrate how the bubble size distribution changes if instead reionisation is driven by rarer, brighter galaxies in our Rapid model.We show the bubble size distributions for the two models as a function of the IGM neutral fraction in Figures 5 and 6 for galaxies of given  and galaxy overdensities.
Both models have qualitatively similar bubble size distributions but the Rapid model predicts much large bubble sizes at fixed neutral fraction, particularly at the earliest stages of reionisation.A key prediction of the Rapid model is the existence of large (∼ 30 − 100 cMpc) bubbles at the earliest stages of reionisation,  ∼ 0.9, in order to fill the same volume with ionised hydrogen around the more biased ionising sources.
First, galaxies in the Gradual model are more likely to reside in neutral IGM at the beginning of reionisation, compared to galaxies in the Rapid model.At  = 0.9, ∼ 80% of the  = −16 galaxies have bubble sizes no greater than 1 cMpc in the Gradual model.By contrast, in the Rapid model, only ∼ 60% of  = −16 galaxies are  1 in a 4.84 arcmin 2 area (∼ (13cMpc) 3 ) with a survey limit of  A = 28, for  /  ≈ 5, 10 and ≥ 15, where  = 0.84.We also show the bubble size distribution from the full volume as a thick grey line in each simulation.The fractions of galaxies in  < 0.8 cMpc bubbles (below our resolution limit) or neutral cells are marked with arrows.Each panel shows a different  .More overdense regions host larger  ion early at high  .As  decreases,  ion of less overdense regions begins to catch up and ends up having similar bubble size distribution to those of the most overdense regions.This agrees with the general reionisation picture, that overdense regions are reionised first.
in such neutral regions at the same  .At the mid-point of reionisation ( = 0.5), UV-faint galaxies ( = −16) in the Gradual model (∼ 9%) are half as likely to be in small ionised/neutral regions compared to UV-faint galaxies in the Rapid model (∼ 20%).This is because the early ionised regions in the Rapid model are concentrated around the most overdense regions, compared to a more uniform coverage of bubbles seen in the Gradual model (see Figure 1).In the Rapid model isolated faint galaxies cannot create  ion > 1 cMpc bubbles around themselves, because reionisation is dominated by  ∼ < 19.5 galaxies in this model.Therefore isolated faint galaxies remain in  ion < 1 cMpc bubbles even at the mid-point of the reionisation.
Second, galaxies in the Rapid model blow out big ionised bubbles early in the reionisation.However, the bubble sizes do not grow as rapidly as those in the Gradual model.At the beginning of reionisation, the characteristic bubble size in the Rapid model is  char ≈ 10 cMpc.In the Gradual model, the characteristic bubble size ∼ 3× smaller: no more than 3 cMpc.By the late stages of reionisation ( = 0.1), the mean bubble sizes in the Rapid model are ∼ 300 cMpc.However, in the Gradual model, the mean bubble size has grown twice as rapidly, reaching ∼ 200 cMpc.The different evolutionary trends reflect the different bubble-merging histories of the two models.In the Gradual scenario, many faint galaxies create small ionised bubbles and soon merge together to form big bubbles.In the Rapid model, big bubbles form early, however, but due to the rarity of bright ionising galaxies, bubbles are less likely to merge and immediately double in size compared to those in the Gradual model.
We can see from this comparison that there can be a degeneracy between the Gradual and Rapid model.If we find evidence of a large (> 10 cMpc) bubble at high redshift, it could be explained by a bright-galaxies-driven reionisation at a high neutral fraction, or by the faint-galaxies-driven reionisation but with a lower neutral fraction.However, independent information on the reionisation history and/or information from the dispersion of bubble sizes along multiple sightlines could break this degeneracy.We discuss this in Section 4.2.

Interpretation of current observations
In this section, we use our simulations to interpret some recent observations of Ly emission at  ∼ > 7 in candidate overdensities.Here we aim to establish if the enhanced Ly visibility in these regions can be explained by the sources tracing an ionised overdensity, and how likely that scenario is given our consensus timeline of reionisation and either of our two reionisation models.
We focus on observations of  ∼ > 7 Ly emission from galaxies in 6 regions in the sky, in candidate overdensities: the COSMOS field at  ≈ 6.8 (Endsley & Stark 2022), BDF field at  ≈ 7.0 (Vanzella et al. 2011;Castellano et al. 2016Castellano et al. , 2018;;Castellano et al. 2022), EGS field with two regions at  ≈ 7.7 and 8.7 (Oesch et al. 2015;Roberts-Borsani et al. 2016;Tilvi et al. 2020;Leonova et al. 2022  x HI = 0.7 10 0 10 1 10 2 10 3 10 4 10 3 10 2 10 1 10 0 x HI = 0.9 x HI = 0.9 Bubble size, R [cMpc] Bubble size distribution, dP/dlogR Figure 5. Bubble size distributions as a function of UV luminosity for  = −16, −22, for the Gradual (solid lines) and Rapid (dashed lines) reionisation models.We also show the bubble size distribution in the full volume as a thick grey line for each simulation.The fractions of galaxies in  < 0.8 cMpc bubbles (below our resolution limit) or neutral cells are marked with arrows.Each panel shows a different volume-averaged IGM neutral fraction.We see that the bubble size distributions are broader for the Rapid models than for the Gradual model at  ∼ > 0.5.The bubble size distributions of the Rapid model peak at  ion 10cMpc since as early as  = 0.9.By contrast the distributions of Gradual models start with  ion 6 cMpc at  = 0.9, and gradually evolve to converge with the Rapid models as IGM becomes more ionised.x HI = 0.7 10 0 10 1 10 2 10 3 10 4 10 3 10 2 10 1 10 0 x HI = 0.9 x HI = 0.9 m AB < 28, 4.84 arcmin 2 , z = 8 ± 0.1 Bubble size, R [cMpc] Bubble size distribution, dP/dlogR Figure 6.Bubble size distributions as a function of overdensity for  /  = 5, > 15, for the Gradual (solid lines) and Rapid (dashed lines) reionising source models.We also show the total bubble size distribution as a thick grey line in each simulation.The fractions of galaxies in  < 0.8 cMpc bubbles (below our resolution limit) or neutral cells are marked with arrows.Each panel shows a different volume-averaged IGM neutral fraction.In the Rapid model we see that bubble size distribution of  /  > 7 already shows little bimodality at  = 0.9.Galaxies in  /  > 15 regions are mostly in bubbles of  ion > 7.In contrast, in Gradual model even galaxies in  /  > 5 regions are in bubbles of  ion < 7.  > 7 associations of Ly emitters in our Gradual and Rapid simulations (black lines).Grey lines show the range of probabilities in the full simulation volume.We use the IGM neutral fractions expected at these redshifts (Mason et al. 2019b).The plot is discussed in Section 3.4 and a summary of our simulation setup is given in Table 1.The bubble size distributions for all these fields is shown in Figure C1.It is highly likely for the observed  ∼ 7 Ly emitting galaxies to reside in large ionised bubbles.At  8, large bubbles are unexpected even in overdensities.area around the  = 10.6 galaxy GNz11 (Oesch et al. 2016;Bunker et al. 2023;Tacchella et al. 2023).
To compare with observations at known redshifts, we will switch from comoving to proper distance units.Due to the incompleteness of the observations, here we aim to create bubble size distributions for regions in our simulations that are approximately similar to those observed.Our simulations are coeval boxes at  = 7, 8, 9, 10, so in the following we use the box closest in redshift to the observations, but use the observed redshift to fix assumed IGM neutral fractions and to calculate physical distances.We demonstrate in Appendix B that our bubble size distributions do not depend significantly on redshift.
We create mock observations assuming the same area as the observed overdensities.For the regions which only have photometric overdensities, due to the large redshift uncertainties in the photometric overdensities, but motivated by the Δ ∼ < 0.2 redshift separation of Ly emitters in all of these regions, we assume a redshift window of Δ = 0.2 for our mock observations.This corresponds to ∼ 5 − 8 pMpc at  ∼ 7 − 10.In all cases we will assume the observed overdensity of Lyman-break galaxies to be the same in that smaller volume as in the true observed volume.This means we are likely overestimating the true overdensity, in that case our estimated probabilities can be seen as upper limits.
Using these assumed volumes we then use the method described in Section 2.3.1 to convolve our galaxy field with the volume and depth kernel of the observations to create a cube of overdensity matched to each observation setup.We then select regions in the overdensity cube which match the observed overdensity estimates.
We assess the probability of the observed overdensities lying in ionised regions > 1 pMpc in radius, which would allow ∼ > 30% of Ly flux to be transmitted at the rest frame Ly line centre (up to ∼ 50% transmission for emission 500 km s −1 redward of linecentre, e.g., Mason & Gronke 2020;Qin et al. 2022).While the true Ly detection rate will depend on the flux limit of the survey and the Ly flux emitted by the galaxies, before attenuation in the IGM, this threshold gives us a qualitative approach with which to interpret the observations.We defer full forward-modelling of Ly observations to a future work.At the redshift of each observation we assume a non-parametric estimate of the IGM neutral fraction,  , inferred by Mason et al. (2019b) described in Section 2.1.We then calculate the final bubble size distribution by marginalising the bubble size distribution at each  over the inferred  distribution.We also measure the characteristic bubble sizes,  char , for the observed overdensities, which indicates the mean size of ionised regions above our resolution limit around the overdensities.
We present a summary of our simulation setups to compare to these observations in Table 1, the resulting probability of each region residing in a large ionised region in Figure 7, and  char .The full bubble size distributions are described in Appendix C.

𝑧 ∼ 7 overdensities in COSMOS and BDF fields
In the COSMOS field, Endsley & Stark (2022) detected Ly in 9/10  −20.4, ≈ 6.8 galaxies in a 140 pMpc 3 volume.Using these spectroscopic confirmations, they estimate the lower limit of the over-density of this region is 3.They estimate that individual galaxies in this field can create ionised bubbles  ion ∼ 0.69 − 1.13 pMpc.Taking into account the /  ∼ 3 overdensity and the ionising contribution from  < −17 galaxies, they estimate an ionised bubble radius of  ion ∼ 3 pMpc in this volume.
We predict that almost 100% of regions this overdense at  ≈ 0.5 are in > 1 pMpc bubbles and the characteristic bubble size is  char = 6.4 pMpc at  ≈ 0.4.The high LAE fraction detected by Endsley & Stark (2022) is thus consistent with being a typical ionised region in our Gradual model.In the Rapid model, we predict even larger bubble sizes around this overdensity: the characteristic bubble size is  char = 11.2 pMpc, thus high Ly transmission would also be expected.In both cases we would expect an excess of Ly detections in neighbouring UV-faint galaxies.
In the BDF field, Vanzella et al. ( 2011) and Castellano et al. ( 2018) detected 3  ∼ 7.0 LAEs, with  = [−21.1, −20.4, −20.4].Two of the galaxies have a projected separation of only 91.3 pkpc, and the third is 1.9 pMpc away (Castellano et al. 2018).The photometric overdensity of  ∼ 7 Lyman-break galaxies within ∼ 3.86 arcmin 2 around these galaxies is 3−4× times higher than expected (Castellano et al. 2016).Based on the star formation rate and age of the galaxies, and assuming a uniform IGM  = 0.5 surrounding the sources, Castellano et al. (2018) estimated individual bubble sizes of the two galaxies at ∼ 2 pMpc separation to be  ion < 0.8 pMpc.We use a 56 arcmin 2 survey area (corresponding to the BDF field, where the sources have an angular separation of 6 arcmin, Vanzella et al. 2011) at  = 7 ± 0.1, which is > 3× overdense (Castellano et al. 2016).We see in Figure 7 that we expect nearly all regions (∼ 84% in our fiducial Gradual model) with this galaxy overdensity to be inside > 1 pMpc ionised bubbles, enabling significant Ly escape.
The non-detection of Ly with equivalent width > 25 Å in twelve surrounding UV-faint galaxies in this region may thus be surprising, but could be explained by a number of reasons, as discussed by Castellano et al. (2018).For example, even given the predicted most likely bubble size of 4 pMpc the fraction of transmitted Ly flux may be only ∼ 60% for galaxies at the centre of the bubble (Mason & Gronke 2020), thus with deeper spectroscopy Ly may be detected.It could also be possible that infalling neutral gas in this region resonantly scatters Ly photons emitted redward of systemic (which look blue in the rest-frame of the infalling gas, e.g., Santos 2004;Weinberger et al. 2018;Park et al. 2021).As UV-faint galaxies are likely to be low mass, and thus have a lower HI column density in the ISM compared to UV-bright galaxies, they may emit more of their Ly close to systemic redshift, making it more easily susceptible to scattering by infalling gas.Measurements of systemic redshifts for the galaxies in this region may help explain the complex Ly visibility.Furthermore, given the large photometric redshift uncertainties of the Castellano et al. (2016) sample, it could also be possible that the actual galaxy overdensity associated with the three LAEs is smaller, thus the expected bubble size is smaller and the faint galaxies may not lie in the same bubble as the detected LAEs.
We calculate the bubble size distributions using a setup similar to the results of Leonova et al. (2022): an area of 4.5 arcmin 2 at  = 7.7 ± 0.1, with a limiting UV magnitude  > −19.5.The result is shown in Figure 7, assuming the neutral fraction  ( = 7.7) = 0.76 +0.05 −0.09 (Mason et al. 2019b).We find ∼ 40% of regions this overdense are in large ionised bubbles in the Gradual model and 70% of regions in the Rapid model.We conclude this region is likely consistent with our consensus picture of reionisation.
In the Abell 2744 field, Morishita et al. (2022) found no Ly detections of 7  ≈ 7.89,  > −20 galaxies.These galaxies are within a circle of radius ∼60 pkpc.This area is /  ∼ 130 overdense for galaxies with  > −17.5 (Ishigaki et al. 2016).Morishita et al. (2022) estimated bubble sizes of  ion ∼ 0.07 − 0.76 pMpc for individual galaxies, based on their ionising properties derived from rest-frame optical spectroscopy with NIRSpec.We generate the bubble size distributions for a region of > 130× overdensity of  ∼ < − 17.5 galaxies within a volume of (0.9 cMpc) 3 .A bubble size of  ion ∼ 1pMpc or larger is unexpected for regions as overdense as this in our Gradual model at  ∼ 0.8: we find ( > 1pMpc) = 0.27.
The redshifts of sources in the EGS and Abell2744 fields are very similar.However, Ly has only been detected in the EGS field.We can see in Figure 7 and Table 1 that our predicted bubble size distributions for EGS are shifted towards higher bubble sizes than in Abell 2744.Although the Abell 2744 region is overdense in UV-faint galaxies, the volume of this region is very small, thus there may not be sufficient ionising emissivity to produce a large-scale ionised region.Thus non-detection of Ly in this overdensity is not surprising.

𝑧 ∼ 9 − 11 overdensities in EGS and GOODS-N fields
The highest redshift association of LAEs in the EGS field is a pair at  ≈ 8.7 (Zitrin et al. 2015b;Larson et al. 2022), which lies ∼ 4 pMpc apart.The photometric overdensity around these LAEs has been estimated to be /  ∼ 3 − 5 (Leonova et al. 2022).We calculate the bubble size distributions using a setup similar to the results of Leonova et al. (2022): an area of 27 arcmin 2 (corresponding to ∼ 6 HST/WFC3 pointings between the two sources) with Δ = 0.2, with a limiting UV magnitude  > −19.5.
At  = 8.7 the inferred IGM neutral fraction is  = 0.93 +0.02 −0.15 .We predict the probability of finding LAEs at  ≈ 0.9 should be extremely low: in the full simulation volume in our fiducial Gradual model, we obtain ( > 1pMpc) = 0.01 and there is < 0.2% probability of finding a bubble with  ion > 4 pMpc.Around regions as overdense as that observed we find ( > 1pMpc) = 0.11.Thus in our fiducial model, we find it is extremely unlikely that the  ≈ 8.7 LAE pair in the EGS field are in one large ionised region.
The visibility of Ly therefore implies some missing aspect in our understanding of this system.If  is lower, there will be a higher chance to find LAEs: we obtain ( > 1pMpc) = 0.17 in regions this overdense if  = 0.8, so  will need to be substantially lower to find a high probability of large ionised regions.Alternatively, in the Rapid model, ( > 1pMpc) = 0.42 for such an overdensity, and the bubble size distributions at  = 0.8 − 0.6 peak at  ion 3 pMpc: the two LAEs could be in one large ionised bubble.Finally, the Ly visibility of these galaxies could be boosted by high intrinsic Ly production as suggested by their other strong emission lines, and potential contribution of AGN (Stark et al. 2017;Tang et al. 2023;Larson et al. 2023), and facilitated transmission in the IGM if the Ly flux is emitted redward of systemic (e.g., Dĳkstra et al. 2011;Mason et al. 2018b).
Finally, Bunker et al. (2023) have detected Ly in GN-z11 at  = 10.6, in the GOODS-N field (Oesch et al. 2016).9 fainter galaxy candidates ( AB ≈ 29) at similar redshift are found within a (10 cMpc) 2 square centred at GN-z11 (Tacchella et al. 2023).We estimate the overdensity of  AB < 29 ( < −18.6),  = 10 ± 0.1 galaxies in this field using our  = 10 UV LF, finding that this region is ∼ 23× overdense.We obtain ( > 1pMpc) = 0.07 and 0.48 in the Gradual and the Rapid model, respectively.It is thus extremely unlikely that all of the  ∼ 11 galaxies are in one  > 1pMpc ionised region that allows significant Ly transmission, in our fiducial Gradual model.
We find  char = 0.5 and 1.1 pMpc in the Gradual and the Rapid model, respectively.The characteristic bubble size is slightly smaller than the largest distance of galaxies from GN-z11 in this field (∼ 0.6 pMpc) estimated by Tacchella et al. (2023) from photometric redshifts, implying that most of these galaxies could reside in the same (small) ionised region.
In summary, our simulations demonstrate that the regions discussed above at  ∼ 7 are extremely likely (> 90%) to be in large ionised bubbles, given their large estimated overdensities.We also find it likely ( ∼ > 40%) that the EGS region at  ≈ 7.7 is in a large ionised bubble.However, at higher redshifts we find it very unlikely that the  ≈ 8.7 Ly-emitters in EGS and the  ≈ 10.6 galaxies in GOODS-N, including GNz11, are in large ionised regions (∼ 11% and ∼ 7% respectively) in our fiducial Gradual model.
If the actual overdensities of these regions are smaller than the photometrically estimated values, we will find it even more unlikely for these galaxies to reside in large ionised bubbles, strengthening our result.These results clearly demonstrate the importance of measuring the IGM neutral fraction at  ∼ > 8 and distinguishing between reionising source models (e.g., Bruton et al. 2023), and of understanding intrinsic Ly production and escape in the ISM in these galaxies (Roberts-Borsani et al. 2022;Tang et al. 2023).

Forecasts for future observations
Anticipating upcoming large area surveys at  ∼ > 7 we make forecasts for the expected number of large bubbles in the JWST COSMOS-Web survey (Casey et al. 2022), the Euclid Deep survey (Euclid Collaboration et al. 2022;van Mierlo et al. 2022), and the Roman High-Latitude Survey (Wang et al. 2022).These surveys will detect tens of thousands of UV-bright  > 7 galaxy candidates which could be used to constrain the underlying density field and pinpoint early nodes of reionisation.
The identification of ionised regions in these large surveys has the potential to distinguish between reionisation models.We sample simulated volumes equivalent to the survey areas (0.6 sq.degrees for COSMOS-Web, 53 sq.degrees for Euclid-Deep) at  = 8 ± 0.1 and use the watershed algorithm (Section 2.3.2) to identify individual bubbles in these volumes.As we describe below the bubble size distribution in a Euclid Deep-like survey volume will suffer minimal cosmic variance, thus our Euclid forecast can be rescaled to forecast for the the Roman High-Latitude Survey (2000 sq.deg).We show in Appendix B that the expected bubble sizes, in comoving units, do not depend strongly on redshift, so our results can be easily shifted to other redshifts without expecting significant differences.As discussed above, to reduce over-segmentation we use the H-minima threshold when calculating the bubble sizes using the watershed algorithm, this sets an effective resolution of 3 cMpc.
In Figure 8 we plot our predicted 'bubble size function' down to this resolution limit: the number density of ionised bubbles as a function of bubble size, for our Gradual and Rapid model at  = [0.5, 0.7, 0.9].The number of bubbles with  ∼ > 10 cMpc can be considered a proxy for a cluster of Ly-emitting galaxies, as ∼ 30 − 50% of Ly flux should be transmitted through regions this large (Mason & Gronke 2020).As the neutral fraction decreases, as above, we expect to find an increasing number of large ionised regions, and the number of small ionised regions decreases as bubbles overlap.Figure 8 again shows the clear difference in the predicted number and size of ionised regions for the different reionisation models, as discussed in Section 3.3.We mark the survey volume of COSMOS-Web and Euclid Deep, (120cMpc) 3 and (530cMpc) 3 at  = 8, respectively, as horizontal lines.The survey volume of the Roman High-Latitude survey (not shown) is (1816cMpc) 3 .We note that when  ∼ < 0.7 we expect a significant fraction of bubbles with  ∼ > 50 cMpc (see Figures 3 and 4).COSMOS-Web is thus unlikely to capture the full extent of large ionised bubbles.Kaur et al. (2020) demonstrated that a simulated volume of >(250 cMpc) 3 is required for convergence of the 21-cm power spectrum during reionisation, so it is likely that a similar volume must be observed to be able to robustly measure the bubble size distribution, thus we expect the Euclid Deep and Roman High Latitude surveys can robustly sample the full bubble size distribution.
However, to detect  > 10 cMpc ionised bubbles, requires not only a large survey volume, but sufficient survey depth to detect the high redshift UV-bright galaxies which signpost large ionised regions.Only the Roman Space Telescope (RST) (Akeson et al. 2019) is likely to be able to carry out bubble counting.Zackrisson et al. (2020) study the number of galaxies within a V ion = 1000 cMpc 3 bubble that can be detected with upcoming photometric surveys with instruments such as Euclid, JWST, and RST.They found that the Euclid Deep survey can barely detect one  ≈ −21 galaxy in that volume at  > 7 given its detection limit, meaning that identifying large overdensities will be challenging.By contrast, a ≈ 20 deg 2 deep field observation by RST could detect ∼ 10  ∼ < − 18.5 galaxies at  = 7 − 10 in a V ion = 1000 cMpc 3 volume.The wide survey area (≈ (400 cMpc) 3 at  ∼ 8 ± 0.2) and survey depth of a RST deep field observation will allow us to identify the UV-bright galaxies which trace large ionised regions.Deeper imaging or slitless spectroscopy around the UV-bright sources, for example with JWST to confirm overdensities, followed by Ly spectroscopy of these regions would enable estimates of the number density of ionised bubbles.
These results demonstrate that counting the number of overdensities of LAEs in a volume can be a useful estimate of the bubble size distribution, as they will probe ionised regions ∼ > 1 pMpc, and thus  (especially at  > 0.5).For example, in our fiducial Gradual model we expect no  > 10 cMpc bubbles in the COSMOS-Web volume when  = 0.9.This implies detection of clusters of LAEs in this volume at a given redshift would indicate  < 0.9 (or a reionisation morphology similar to our Rapid model).We expect tens of large bubbles in this volume when  < 0.7.However, multiple sightline observations, for example, a counts-in-cells approach can be a more efficient tool to recover the distribution than a single area survey (e.g., Mesinger & Furlanetto 2008b).
However, bubble size functions in a COSMOS-Web-like survey volume have high cosmic variance, making it challenging to measure  precisely.In Figure 9 we plot the median number of bubbles that can be observed by a COSMOS-Web-like survey using 50 realisations along with the 16-84 percentile number counts for our Gradual model.The variance is large enough to make the bubble size functions at  = 0.5 − 0.7 indistinguishable.We do not plot the variance for the Rapid model for clarity, but when taking that into x HI = 0.5 x HI = 0.7 x HI = 0.9 Gradual Rapid  x HI = 0.5 x HI = 0.7 x HI = 0.9 Gradual Rapid

DISCUSSION
In the following section we compare our results to those obtained from other simulations (Section 4.1) and discuss the implications of our results for the reionisation history and identifying the primary sources of reionisation (Section 4.2).

Comparison to other simulations
In this work, we characterise the bubble size distributions around typically observed reionisation-era galaxies for the first time over the full timeline of reionisation.Previously, only the total bubble size distribution has been modelled as a function of the neutral fraction (e.g.Furlanetto & Oh 2005;Mesinger & Furlanetto 2007;McQuinn et al. 2007a;Seiler et al. 2019).
In principle, the full bubble size distribution measured in this work should agree with previous works of similar reionisation setups.However, as seen in Figure 10 our mean bubble size is significantly larger than the characteristic bubble size modelled by Furlanetto & Oh (2005).Our bigger size comes from our use of the mean-free-path (MFP) method which is capable of taking into account the size of overlapped bubbles.The Furlanetto & Oh (2005) model underestimates the typical bubble size because they calculate bubbles via the excursion set formalism: as found by Lin et al. (2016), this method can underestimate bubble sizes by an order of magnitude.Our mean bubble size is comparable to those in works which use the mean free path approximation (e.g., Mesinger & Furlanetto 2007;Seiler et al. 2019), modulo minor differences due to different assumptions for the ionising source population, as expected looking at the difference between our Gradual and Rapid models.
Other works have explored the correlation between ionised bub-ble size and galaxy luminosity.For example, Geil et al. (2017) and Qin et al. (2022) presented results from the DRAGONS simulation (Poole et al. 2016), finding more luminous galaxies are more likely to reside in large ionised bubbles, and that UV-faint galaxies have a large scatter in their host bubble size, consistent with our results in Section 3.1.However, these works only investigated a single redshift, IGM neutral fraction and reionising source model.Furthermore, the DRAGONS simulation is only (100 cMpc) 3 , meaning that it does not contain large numbers of rare overdensities and UV-bright galaxies (it contains only 2 galaxies as bright as GNz11, Mutch et al. 2016) and thus their predicted bubble sizes around UV-bright galaxies were subject to substantial Poisson noise.Yajima et al. (2018) presented a model for the sizes of ionised bubbles around galaxies by modelling cosmological Stromgren spheres around each galaxy (e.g., Shapiro & Giroux 1987;Cen & Haiman 2000), finding more massive and highly star-forming galaxies (and therefore more luminous) lie in larger ionised bubbles than low mass galaxies.However, this model does not take into account the overlapping of ionised regions, which can happen very early during reionisation (e.g., Lin et al. 2016) and thus their bubble sizes will be underestimated.
Our results in Section 3.1 highlight the importance of considering the expected ionised bubble size as a function of  calculated using the MFP method.Previous works which used the characteristic bubble size predicted by Furlanetto & Oh (2005) will thus be underestimating the size of ionised bubbles around observed galaxies at fixed neutral fraction.Jung et al. (2020) estimated the ionised bubble size required to explain the drop in Ly transmission in the GOODS-N field at  ∼ 7.6, and compared this bubble size to the Furlanetto & Oh (2005) characteristic bubble size as a function of neutral fraction to estimate  ∼ 0.49 ± 0.19.Our results in Figure 10 imply that this approach will lead to an underestimate in  .This likely explains the discrepancy between the neutral fraction estimated by Jung et al. (2020) and that inferred by Bolan et al. (2022) ( = 0.83 +0.08 −0.11 ) at a similar redshift, which was obtained by sampling sightlines in inhomogeneous IGM simulations.

Implications for the reionisation history and identification of primary ionising sources
Our results demonstrate that the visibility of Ly emission at  > 8 is unexpected given our consensus timeline for reionisation.The visibility of Ly therefore implies some missing aspect in our understanding of reionisation.As discussed in Section 3.4 there are three possibilities: (1)  is lower than previously inferred; (2) reionisation is dominated by rarer sources providing larger, rarer bubbles; (3) these galaxies have high intrinsic Ly production (Stark et al. 2017;Tang et al. 2023) and facilitated transmission in the IGM if the Ly flux is emitted redward of systemic (e.g., Dĳkstra et al. 2011;Mason et al. 2018b).These scenarios should be testable with spectroscopic observations in the field of high redshift Ly-emitters.The most important first step is confirming if the large regions really are ionised.As the Ly damping wing due to nearby neutral gas strongly attenuates Ly close to systemic velocity, detecting Ly with high escape fraction (estimated from Balmer lines) and very low velocity offset would be a key test to infer if the sources lie in large ionised regions.The  > 8 LAEs that have been detected so far have Ly velocity offset > 300 km s −1 (Tang et al. 2023;Bunker et al. 2023), thus the large ionised regions cannot be confirmed, but spectroscopy of the fainter galaxies (which are more likely to emit Ly closer to systemic velocity, Prieto-Lyon et al. 2023) in these overdensities could be used to confirm large bubbles.These observations are now possible with JWST/NIRSpec, which can also importantly spectroscopically confirm overdensities.Excitingly, recent observations have discovered strong Ly at low velocity offsets at  > 7, implying large ionised regions (Tang et al. 2023;Saxena et al. 2023), and we will discuss quantitative constraints on the sizes of ionised regions in a future work.
We have also shown the bubble size distribution around observable galaxies depends on both the average IGM neutral fraction  and the reionising source model.As the characteristic bubble size evolves strongly with  (Figure 10), we may be able to constrain the reionisation history by simply counting overdensities of LAEs as a function of redshift.Trapp et al. (2022) recently used observed overdensities of LAEs to place joint constraints on the IGM neutral fraction and underlying matter density of those regions.That work is complementary to our approach in that it demonstrates a strong link between the overdensity of a region and the expected size of the ionised region around an overdensity.
In Sections 3.3 and 3.5 we show that the reionising source models have a strong impact on the predicted number of galaxies in large ionised bubbles early in reionisation.Finding evidence for a high number density of large ionised regions ( ∼ > 10 cMpc) at high redshift would thus provide evidence for reionisation driven by rare, bright sources.However, it is clear from our work that characteristic bubble sizes in different reionisation models at different  can be degenerate, so focusing purely on observing overdensities likely to reside ionised regions will not be able to break this degeneracy.
As seen clearly in Figure 1, the Rapid model is characterised by biased, isolated large bubbles, thus it is much more likely that galaxies outside of overdensities will still be in mostly neutral regions early in reionisation in this scenario (see Figure 6).Thus to measure  and fully break the degeneracy between reionisation morphologies requires observing a range of environments over time during reionisation.For example, observing the Ly transmission from multiple sightlines to galaxies at different redshifts (e.g., Mesinger & Furlanetto 2008b;Mason et al. 2018c;Whitler et al. 2020;Bolan et al. 2022) and the 21-cm power spectrum as a function of redshift (e.g., Furlanetto et al. 2004a;Geil et al. 2016).

CONCLUSIONS
We have produced large-scale (1.6 Gpc) 3 simulations of the reionising IGM using 21cmfast and explored the size distribution of ionised bubbles around observable galaxies.Our conclusions are as follows: (i) Observable galaxies ( < −16) and galaxy overdensities are much less likely to reside in neutral regions compared to regions at the mean density.This is because galaxies are the source of reionisation.
(ii) The bubble size distribution around UV-bright ( < −20) galaxies and strong galaxy overdensities is biased to larger characteristic sizes compared to those in the full volume.
(iii) At all stages of reionisation we find a trend of increasing characteristic host bubble size and decreasing bubble size scatter with increasing UV luminosity and increasing overdensity.
(iv) As shown by prior works, we find the bubble size distribution strongly depends on both the IGM neutral fraction and the reionising source model.The difference between these models is most apparent in the early stages of reionisation,  > 0.5: if numerous faint galaxies drive reionisation, we expect a gradual reionisation with numerous small bubbles, whereas if bright galaxies drive reionisation we expect a more rapid process characterised by larger bubbles biased around only the most overdense regions, with sizes > 30 cMpc even in a 90% neutral IGM.
(v) We use our simulations to interpret recent observations of galaxy overdensities detected with and without Ly emission at  7. We find the probability of finding a large ionised region with  ion > 1 pMpc, capable of transmitting significant Ly flux, at  ≈ 7 − 8 is high ( ∼ > 40 − 93%) for large-scale galaxy overdensities, implying that Ly-emitting galaxies detected at these redshifts are very likely to be in large ionised regions.
(vi) We find a very low probability of the  ≈ 8.7 association of Ly emitters in the EGS field and the  = 10.6 galaxy GNz11, also detected with Ly emission, to be in a large ionised bubble (∼ 11% and ∼ 7%, respectively).The Ly detections at such a high redshift could be explained by either: a lower neutral fraction ( 0.8) than previously inferred; or if UV bright galaxies drive reionisation, which would produce larger bubbles; or if the intrinsic Ly production in these galaxies is unusually high.
(vii) We make forecasts for the number density of ionised bubbles as a function of bubble size expected in the JWST COSMOS-Web survey and the Euclid Deep survey.Our fiducial model predicts no ionised regions > 10 cMpc in the COSMOS-Web volume unless  < 0.9, with tens of large bubbles expected by  < 0.7, though with large cosmic variance.We find Euclid and Roman wide-area surveys will have sufficient volume to cover the size distribution of ionised regions with minimal cosmic variance and should be able to detect the UV-bright galaxies which signpost overdensities.Deeper photometric and spectroscopic follow-up around UV-bright galaxies in these surveys to confirm overdensities and Ly emission could be used to infer  and discriminate between reionisation models.
Our simulations show that in interpreting observations of  > 6 galaxies it is important to consider the galaxy environment.We showed the bubble size distribution around observable galaxies and galaxy overdensities can be significantly shifted from the bubble size distribution over the whole cosmic volume.This motivates using realistic inhomogeneous reionisation simulations, or at least tailored bubble size distributions to interpret observations.
Our results imply that the early stages of reionisation are still very uncertain.Identifying and confirming large ionised regions at very high redshift is a first step to understanding these early stages, and thus the onset of star formation.This is now possible with deep JWST/NIRSpec observations which could map the regions around  > 8 Ly emitters.The detection of Ly with high escape fraction and low velocity offset from other galaxies in the observed  > 8 overdensities could confirm whether the Ly emitters at  > 8 are tracing unexpectedly large ionised regions (e.g., Tang et al. 2023;Saxena et al. 2023). .Bubble size distributions,  / log 10 , as a function of redshift for  = 7 (solid lines) and  = 8 (dashed lines) and  = 9 (dotted lines) for  = −16, −22 at  = 0.5.We also show the total bubble size distribution as a thick grey line in each redshift.We see a minimal difference at different redshifts, but higher redshifts show slighter higher bubble sizes as we discuss in Appendix B.

APPENDIX A: MODEL UV LUMINOSITY FUNCTION
In Figure A1 we demonstrate that our model for assigning UV magnitudes to simulated halos (Section 2.2) reproduces observed UV luminosity functions over  ∼ 7 − 10 as required for our study.We note the apparent turnover at  ∼ > − 16 is not physical, but arises due to enforcing a halo mass cut-off at  halo = 5 × 10 9  in our catalogue due to memory restrictions.

Figure 3 .
Figure 3. Bubble size distributions as a function of UV luminosity for  = −16, −18, −20, −22.We also show the bubble size distribution from the full volume as a thick grey line in each simulation.The fractions of galaxies in  < 0.8 cMpc bubbles (below our resolution limit) or neutral cells are marked with arrows.Each panel shows a different volume-averaged IGM neutral fraction,  .As the neutral fraction decreases, the bubble size distributions shift to higher values, as expected as bubbles grow as reionisation progresses.With increasing UV luminosity, the probability that a galaxy resides in big bubbles increases.

Figure 4 .
Figure4.Bubble size distributions as a function of galaxy overdensity at  = 8 ± 0.1 in a 4.84 arcmin 2 area (∼ (13cMpc) 3 ) with a survey limit of  A = 28, for  /  ≈ 5, 10 and ≥ 15, where  = 0.84.We also show the bubble size distribution from the full volume as a thick grey line in each simulation.The fractions of galaxies in  < 0.8 cMpc bubbles (below our resolution limit) or neutral cells are marked with arrows.Each panel shows a different  .More overdense regions host larger  ion early at high  .As  decreases,  ion of less overdense regions begins to catch up and ends up having similar bubble size distribution to those of the most overdense regions.This agrees with the general reionisation picture, that overdense regions are reionised first.

‡Figure 7 .
Figure7.Probability in our models of finding a bubble size > 1 pMpc around regions similarly overdense to the observed  ∼ > 7 associations of Ly emitters in our Gradual and Rapid simulations (black lines).Grey lines show the range of probabilities in the full simulation volume.We use the IGM neutral fractions expected at these redshifts(Mason et al. 2019b).The plot is discussed in Section 3.4 and a summary of our simulation setup is given in Table1.The bubble size distributions for all these fields is shown in FigureC1.It is highly likely for the observed  ∼ 7 Ly emitting galaxies to reside in large ionised bubbles.At  8, large bubbles are unexpected even in overdensities.

Figure 8 .
Figure8.Number density of ionised bubbles for a range of  and for our Gradual (solid) and Rapid models (dashed), calculated using the watershed algorithm.We show the inverse of the survey volume for COSMOS-Web (120 cMpc) 3 and Euclid Deep (530 cMpc) 3 as horizontal lines, marking the number density where one bubble is expected in that volume.

Figure 9 .
Figure 9. Number of ionised bubbles we predict from multiple realisations of a COSMOS-Web-like survey for our Gradual and Rapid models at a range of neutral fractions.The lines show the median number counts and the shaded regions are the 16-84 percentile of the number counts, demonstrating the large cosmic variance in this volume.
Figure B1.Bubble size distributions,  / log 10 , as a function of redshift for  = 7 (solid lines) and  = 8 (dashed lines) and  = 9 (dotted lines) for  = −16, −22 at  = 0.5.We also show the total bubble size distribution as a thick grey line in each redshift.We see a minimal difference at different redshifts, but higher redshifts show slighter higher bubble sizes as we discuss in Appendix B.
The median relation  , ( h , , ) is set by calibration to

Table 1 .
Mason et al. (2019a)f  ∼ > 7 associations of Ly emitters used in our simulations.Using the non-parametric reionisation history posteriors byMason et al. (2019a)including constraints from the CMB optical depth, quasar dark pixel fraction and measurements of the Ly damping wing in quasars and galaxies.We calculate  ( > 1 pMpc) by marginalising  ( > 1 pMpc| ) over the  posterior at each redshift inferred byMason et al. (2019a).
Furlanetto & Oh (2005)acteristic' bubble sizes as a function of  for our simulations compared to previous work.We show the mean size of ionised regions in this work (black) for the Gradual and Rapid models (solid and dashed lines respectively), and the characteristic size of ionised region in Furlanetto & Oh (2005) (blue).As discussed in Section 3.3, characteristic sizes of ionised regions in the Rapid model are much larger than those in the Gradual model at fixed  .The excursion set formalism used byFurlanetto & Oh (2005)can underestimate the sizes of ionised regions by over an order of magnitude as it does not account for overlapping regions.
account, we cannot discriminate between the bubble size functions of Gradual and Rapid with a COSMOS-Web-like survey.