The Physics of Indirect Estimators of Lyman Continuum Escape and their Application to High-Redshift JWST Galaxies

Reliable indirect diagnostics of LyC photon escape from galaxies are required to understand which sources were the dominant contributors to reionization. While multiple LyC escape fraction ($f_{\rm esc}$) indicators have been proposed to trace favourable conditions for LyC leakage from the interstellar medium of low-redshift ''analog'' galaxies, it remains unclear whether these are applicable at high redshifts where LyC emission cannot be directly observed. Using a library of 14,120 mock spectra of star-forming galaxies with redshifts $4.64 \leq z \leq 10$ from the SPHINX$^{20}$ cosmological radiation hydrodynamics simulation, we develop a framework for the physics that leads to high $f_{\rm esc}$. We investigate LyC leakage from our galaxies based on the criteria that successful LyC escape diagnostics must i) track a high specific star formation rate, ii) be sensitive to stellar population age in the range $3.5-10~$Myr representing the times when supernova first explode to when LyC production significantly drops, and iii) include a proxy for neutral gas content and gas density in the interstellar medium. ${\rm O}_{32}$, $\Sigma_{\rm SFR}$, M$_{\rm UV}$, and H$\beta$ equivalent width select for one or fewer of our criteria, rendering them either necessary but insufficient or generally poor diagnostics. In contrast, UV slope ($\beta$), and ${\rm E(B-V)}$ match two or more of our criteria, rendering them good $f_{\rm esc}$ diagnostics (albeit with significant scatter). Using our library, we build a quantitative model for predicting $f_{\rm esc}$ based on direct observables. When applied to bright $z>6$ Ly$\alpha$ emitters observed with JWST, we find that the majority of them have $f_{\rm esc} \lesssim 10\%$.


INTRODUCTION
Various astrophysical (e.g.Keating et al. 2020;Kulkarni et al. 2019;Becker et al. 2021) and cosmological (Planck Collaboration et al. 2020) probes indicate that the Universe had transitioned from a neutral to an ionized state by the redshift interval 5 ≲  ≲ 6.However, the onset of reionization and the sources responsible for it, as well as the neutral fraction evolution during the transition remain uncertain.Both upper ( ∼ 20) and lower ( ∼ 15) limits on the onset of cosmic dawn are provided by the Cosmic Microwave Background (e.g.Heinrich & Hu 2021) and star formation histories (SFHs) of high-redshift galaxies (e.g.Laporte et al. 2021).Although often model-dependent, neutral fraction evolution constraints can be derived from the decreasing prevalence of Ly emitters (e.g.Pentericci et al. 2011;Stark et al. 2010;Mason et al. 2018;Jones et al. 2023;Goovaerts et al. 2023), the damping wings of high-redshift quasars (e.g.Ďurovčíková et al. 2020;Davies et al. 2018;Greig et al. 2019), and the opacity of the Ly forest (e.g.Fan et al. 2006;Bosman et al. 2022).However, only limited observational constraints exist on the sources responsible for reionization.
Understanding the sources of reionization is of key importance.The topology of reionization is strongly affected by the source model, ★ nicholas.choustikov@physics.ox.ac.uk which not only impacts the shape and amplitude of the 21 cm signal, (e.g.Zaldarriaga et al. 2004;McQuinn et al. 2007;Kulkarni et al. 2017), but also controls which dwarf galaxies and filaments are regulated by radiation feedback (Katz et al. 2020a).Furthermore, the temperature the IGM reaches during reionization is sensitive to the spectral energy distribution (SED) of the sources responsible.
Some debate exists about whether these sources are active galactic nuclei (AGN) or star-forming galaxies.AGN-dominated models of late reionization have been suggested to explain observational constraints such as the low optical depth to Thompson scattering (e.g.Madau & Haardt 2015;Chardin et al. 2017;Grissom et al. 2014;Torres-Albà et al. 2020).Furthermore, recent observations have identified numerous AGN-candidates at high redshift (Larson et al. 2023;Maiolino et al. 2023;Fujimoto et al. 2023), suggesting that there may be more of them than previously expected (Greene et al. 2023).In contrast, models (Faucher-Giguère 2020) based on the local X-ray background (Parsa et al. 2018), the fact that helium reionization was completed significantly later at  ∼ 3 (e.g.Furlanetto & Oh 2008) and an apparent drop in the AGN luminosity function with increasing redshift (Kulkarni et al. 2019) (although this is based on observations taken before the launch of JWST) all indicate that AGN had a sub-dominant effect on the reionization history of the universe.This agrees with other results indicating that AGN only became a dominant source of ionizing photons at redshifts of  ∼ 4 (e.g.Dayal et al. 2020;Trebitsch et al. 2021).
As a result, Star-forming galaxies are often considered the primary candidates for providing the bulk of the LyC photons for reionization (e.g.Robertson et al. 2015;Livermore et al. 2017;Naidu et al. 2020).Within the galaxy population, it is generally assumed that reionization was driven by dwarf galaxies due to the steep observed faint-end slopes of the high-redshift UV luminosity function (e.g.Bouwens et al. 2022a;Harikane et al. 2023).However this latter assumption is highly dependent on the amount of LyC photons that are produced per unit star formation (or UV luminosity) as well as the fraction of LyC photons that leak (  esc ) as a function of mass (or UV luminosity).
The former can be estimated from stellar population synthesis models (e.g.Leitherer et al. 1999;Stanway & Eldridge 2018).While uncertainties in the ionizing photon production rate exist due to systematic differences between stellar population models (e.g.binaries, rotation, IMF, etc.), the escape fraction is far less constrained.This is due to the fact that it emerges from complex, highly non-linear physics (describing e.g. the state of the ISM) and, more importantly, it cannot be directly detected during the epoch of reionization due to the increasingly neutral IGM (Inoue et al. 2014).For these reasons, constraints on  esc are derived indirectly, for example, by observing samples of low-redshift LyC leaking galaxies that are considered "analogs" of those that form during the epoch of reionization (e.g.Izotov et al. 2018a;Flury et al. 2022a), by directly modelling LyC leakage with cosmological radiation hydrodynamics simulations (e.g.Paardekooper et al. 2015;Rosdahl et al. 2018), or by correlating galaxies with Ly forest transmission (e.g.Kakiichi et al. 2018).
The number of observational measurements of  esc are rapidly growing.LyC photons are directly detectable with space-based facilities, such as at  ∼ 0.3 with the cosmic origins spectrograph on HST (Green et al. 2012;Leitherer et al. 2016) or at even higher redshifts with AstroSat (Saha et al. 2020).Ground-based and space-based observatories have pushed the redshift frontier of LyC measurements to  ≳ 3 (e.g.Vanzella et al. 2010;Steidel et al. 2018;Fletcher et al. 2019;Saxena et al. 2022).The Low Redshift Lyman Continuum Survey (LzLCS, Flury et al. 2022a;Flury et al. 2022b) in particular has significantly increased the total number of low-redshift galaxies with detected LyC emission.However, it remains unclear whether these "analogs" are truly representative of the high-redshift galaxy population (Katz et al. 2022b;Brinchmann 2022;Schaerer et al. 2022b;Katz et al. 2023c).Moreover, it is not always clear how to generalize results from observations of individual objects or surveys with complex selection functions to the general population of high-redshift galaxies.
Numerical simulations provide a complementary framework to understand the physics of LyC leakage.However, self-consistently modelling the production and transfer of LyC photons through a resolved, multiphase ISM remains a technically challenging problem.Nevertheless, simulations of individual or a few high-redshift galaxies are commonplace (e.g.Kimm et al. 2017;Trebitsch et al. 2017;Kimm & Cen 2014;Ma et al. 2020), though these suffer from similar generalizability arguments.Larger volume or full-box cosmological radiative transfer simulations that resolve the ISM for thousands of galaxies are now becoming technically feasible (e.g.Xu et al. 2016;Rosdahl et al. 2018).SPHINX 20 (Rosdahl et al. 2022) represents such an effort where the connection between  esc and various galaxy properties (such as stellar and halo mass, UV luminosity, star formation rate (SFR), specific star formation rate (sSFR), metallicity, etc.) can be studied across a sample of > 10, 000 galaxies per snapshot between 4.64 ≤  ≲ 15.
In this work, we address the applicability of various indirect, observationally-developed, diagnostics of LyC escape on a statistical sample of simulated high-redshift galaxies that are likely to be observable with JWST.We develop a physically motivated model for the conditions that need to be met to ensure both high  esc and a simultaneous significant production of LyC photons.Using mock observations from Version 1 of the SPHINX Public Data Release (SPDR1), we discuss how various indirect diagnostics fit within our framework to elucidate the physics of why a leakage indicator is successful (or not).We intend our results to be immediately applicable to the large samples of JWST galaxies currently being observed at  > 6 (e.g.Curtis-Lake et al. 2022;Finkelstein et al. 2022;Treu et al. 2022;Matthee et al. 2022a).
This work is organized as follows.In Section 2 we outline the numerical methods behind our new SPHINX 20 data-set.Section 3 presents and tests our new generalised framework for identifying LyC leaking galaxies.In Section 4 we use our framework to contextualise and explain known diagnostics and use our data-set to predict escape fractions from JWST spectra.Finally, caveats are given in Section 5 and we conclude in Section 6.

NUMERICAL SIMULATIONS
Due to the observational challenges of both detecting LyC radiation and being limited to individual lines of sight, we employ state-ofthe-art numerical simulations to understand both the physics driving LyC leakage and the observational signatures of a high escape fraction.More specifically, we use the SPHINX 20 cosmological radiation hydrodynamics simulation (Rosdahl et al. 2022).This simulation is ideal for our purposes because the volume (20 3 cMpc 3 ) is large enough to sample a wide diversity of galaxy stellar masses (10 4  ⊙ − 10 10  ⊙ ) and properties, while the spatial and mass resolution1 allow us to simultaneously resolve many of the low mass haloes that may be contributing to reionization as well as much of the multiphase ISM in the simulated galaxies.Full details of the simulation are described in (Rosdahl et al. 2018(Rosdahl et al. , 2022)).
We select a subset of observable galaxies from SPHINX 20 in the snapshots at 𝑧 = 10, 9, 8, 7, 6, 5, 4.64, that have 10 Myr-averaged SFRs ≥ 0.3 M ⊙ yr −1 .The SFR threshold is designed to mimic a flux limited survey and allows us to study a select group of epoch of reionization galaxies from SPHINX 20 that are most likely to be detectable by deep JWST observations 2 .The total sample contains 1,412 galaxies and in Figure 1 we show the SFR versus stellar mass for our entire sample, along with the main sequences for the subset of galaxies at  = {4.64,7, 10} in orange, red, and purple respectively.Due to the SFR threshold, the main sequence follows a curve, calculated here using a moving average.The fact that LyC leakers fall around the main sequence at their redshift has been discussed at length before.Interested readers are directed to Katz et al. (2023b).
Hereafter, data displayed will contain galaxies from each of these redshift bins.
For each galaxy we compute a total intrinsic SED by summing emission from the star particles and gas cells.Stellar emission follows the BPASS v2.2.1 SED (Stanway & Eldridge 2018), and is computed based on the mass, age, and metallicity of each star particle.Line emission from each gas cell is computed using an updated version of the method presented in Katz et al. (2022a).For each gas cell, we calculate the total ionizing flux.For cells that host star particles, we sum the contributions from all star particles within the cell 3 .For gas cells without star particles, we use the ionizing fluxes directly from the RT solver in the simulation.
Like all cosmological simulations, SPHINX 20 is limited by spatial 2 We use intrinsic UV magnitudes and best-fit cosmological parameters from Planck Collaboration et al. (2020) to find that the dimmest galaxies at  = 4.64 and 10 have intrinsic apparent magnitudes of 30.9 and 32.3 respectively.Current JWST NIRCAM surveys therefore have sufficient depth to image the majority (87%) of these galaxies (e.g.Eisenstein et al. 2023;Casey et al. 2023;Bagley et al. 2023).Comparisons between mock SPHINX 20 and JADES photometry are shown in Figure 15 of Katz et al. (2023a), showing that this is a good comparative sample. 3This assumes that the local star particles dominate the ionizing photon budget of the gas cell.This assumption may fail when a cell has neighbours with much higher star formation efficiencies.
resolution, which in this case can pose a problem for the few radiation fronts from individual star particles which are not completely resolved.In the case of such un-resolved Stromgren spheres, we need to be careful as the temperature of the gas cell ends up being an average between those of the ionised and neutral phases.This leads to a lower effective gas temperature in the H II region, which primarily impacts collisionally excited lines (although in many cases not their ratios) and to a lesser-degree, recombination lines, while having almost no affect on IR fine structure lines or resonant scatter.In order to mitigate this issue, we apply a sub-resolution model.First, we identify all cells that host star particles where the Stromgren radius ( S ) is unresolved (i.e. S < Δ/2).For cells without an unresolved Stromgren sphere, we proceed as normal and line emissivities are calculated with CLOUDY v17.03 (Ferland et al. 2017) as these cells are unaffected by the issues discussed above4 .We tabulate a grid of one-zone slab models varying the gas density, metallicity, ionization parameter, and electron fraction.All models are iterated to convergence and the shape of the SED varies with metallicity but is assumed to have an age of 10 Myr5 .To calculate the emission from unresolved Stromgren spheres, we run a second grid of CLOUDY models assuming a spherical geometry, varying stellar age, metallicity, gas density, and total ionizing luminosity.Stellar age is computed as the ionizing luminosity-weighted average stellar age in each gas cell.Finally, we use the line emissivities as calculated by the appropriate spherical CLOUDY model, which now implicitly uses the corrected temperature.By definition, radiation from the unresolved Stromgren sphere does not leak to surrounding gas cells, meaning that this fix does not affect them.For a further discussion, the reader is directed to Katz et al. (2023a).The total intrinsic emission line luminosity of a galaxy is then the sum of all gas cells within the virial radius.
To better compare with observations we also account for dust scattering and absorption.Following Katz et al. (2022b), we use RASCAS (Michel-Dansac et al. 2020) and the effective SMC dust model from Laursen et al. (2009) to attenuate the SEDs and line emission.Since the attenuation depends on viewing angle, we employ the peeling algorithm (Yusef-Zadeh et al. 1984;Zheng & Miralda-Escudé 2002;Dĳkstra 2017) to compute the full dust-attenuated SED for ten different uniformly distributed directions.Hence our full data set consists of 14,120 simulated spectra, though we will in some cases discuss angle-averaged versions of these quantities.We then use these spectra to extract dust-attenuated observables, including line luminosities, equivalent widths, UV spectral index (, by fitting to   ∝   around 1500 Å), UV attenuation given by the Balmer decrement (E(B − V)), UV magnitude (M UV ), and the star formation rate surface density (Σ SFR ).In order to compare like-for-like, Σ SFR is measured by a SFR converted from H luminosity (Kennicutt 1998) as well as the dust attenuated half-light radius at 1500 Å.
We have also post-processed the simulation to measure  esc for all galaxies in our sample.Angle-averaged LyC escape fractions are calculated with RASCAS (Michel-Dansac et al. 2020) by ray-tracing LyC emission from star particles (see Rosdahl et al. 2022).While this value can be used to measure the instantaneous contribution of each halo to reionization, it cannot be easily measured with observations due to both the wavelength coverage and the deep exposure times needed.For this reason, we measure a second value of  esc for each line of sight using the H luminosity such that (Izotov et al. 2016): This allows us to make a more fair comparison with observational surveys such as LzLCS (Flury et al. 2022b) where such a method is used.A comparison between the two  esc measurements is shown in Figure 2.While there is a strong correlation between the two quantities for the highest values of  esc , in general, the H method tends to over-predict the true value.This is due to the fact that the normalization constant of 4.86 × 10 −13 which is commonly used in the literature (e.g.Matthee et al. 2022a) is not fully representative of the value in our simulation 6 .This bias will not qualitatively impact the general trends we find between  esc and various observational quantities.Henceforth, any reference to  esc invokes the angle-averaged value derived by RASCAS, while  H esc refers to the value derived by Equation 1.

A GENERALISED FRAMEWORK FOR LYC LEAKAGE
Numerous diagnostics for identifying galaxies with high  esc have been suggested in the literature (see Section 1); however the vast majority have been shown to be "necessary but insufficient" conditions for LyC leakage (Flury et al. 2022b).We therefore aim to provide a more general framework describing the conditions necessary for galaxies to both produce and leak a significant amount of ionizing LyC radiation.
In order for a galaxy to meaningfully contribute to reionization it must simultaneously be producing copious amounts of ionizing 6 By fitting intrinsic H to total ionizing flux for our galaxies we find a value of 4.05 × 10 −13 .However, in order to best compare to observational methods we use the theoretical value.(e.g.Schaerer 2003) photons and a fraction of those photons must be able to leak into the low-density IGM where the recombination timescale is longer than the Hubble time.While simulations have yet to quantitatively agree on the average escape fractions of galaxies as a function of various galaxy properties (e.g.mass, Ma et al. 2020;Rosdahl et al. 2022), qualitatively they nearly all find that  esc is a feedback-related quantity (e.g.Trebitsch et al. 2017).Energetic feedback from stars (e.g.supernovae (SN)) disrupt the ISM, creating ionized channels through which photons leak into the IGM.Thus, based on such a scenario we propose that in order to produce a significant amount of LyC leakage there needs to be: • A strong burst of star formation such that a significant quantity of ionizing photons are produced.
• Either a significant reduction in the neutral gas content of the ISM such that photons can leak relatively isotropically (densitybounded case) or a creation of ionized channels (ionization-bounded case).This is achieved through photoionization and mechanical feedback.
• A timescale synchronization such that stars continue producing significant amounts of LyC photons after feedback has disrupted the ISM.
Therefore, we argue that a good diagnostic to identify LyC leakers should simultaneously encapsulate: (i) High sSFR so that significant numbers of ionizing photons are produced and feedback has a chance of disrupting the ISM.
(ii) A stellar population age ≳ 3.5 Myr such that SN have had time to explode but ≲ 10 Myr so that the LyC production rate remains high.
(iii) A proxy for neutral gas content and the ionization state of the ISM to identify when feedback has efficiently coupled to gas in order to create ionized channels or disrupt/ionize the entire medium.
In Figure 3 we demonstrate that each of these conditions alone are insufficient to select a sample of only LyC leaking galaxies.In the left panel we show  esc versus sSFR (coloured by total stellar mass) and while many of the leakers have high sSFR, in general there is no trend between the two quantities (see also Rosdahl et al. 2022).Similar behaviour is also seen in observations (e.g.Flury et al. 2022b;Saxena et al. 2022).The centre panel shows  esc versus mean stellar age weighted by LyC luminosity (to highlight the age of stars which contribute to LyC flux).In order to reach an escape fraction of 20% (above the dashed orange line), the age must be ≳ 3.5 Myr as indicated by the left-most vertical red line.However, selecting by age alone clearly results in significant contamination.Our final criteria is a representative proxy for the state of the ISM.It is difficult to find a proxy that only contains information about the ISM and not the SFR or age as these are highly coupled.Nevertheless, for demonstrative purposes we have chosen a composite parameter where we multiply the angleaveraged UV attenuation E(B − V), (which has already been shown by Saldana-Lopez et al. (2022) to empirically correlate with  esc ) by the gas density weighted by the intrinsic [O II] 3727 luminosity7 .While there is clearly a strong trend and all galaxies with  esc > 20% have a value log 10 ( ISM ) ≲ 1.3, there are many non-leakers with such low values as well.These findings are reinforced by the histogram above each subplot showing the fraction of galaxies with (right), all coloured by stellar mass.Escape fractions of 20% and 5% are marked with dashed and dotted horizontal orange lines, respectively.The top panels show a histogram of the fraction of galaxies with  esc > 5% in each bin with more than 5 such galaxies.Systems with the highest escape fractions tend to have high sSFR, fall within 3.5 Myr and 10 Myr (indicated by red lines), low neutral gas attenuation parameter, and have low stellar masses.However, each of these requirements is a necessary but insufficient diagnostic for LyC leakers.
escape fractions higher than 5% (i.e.above the dotted orange line) in each bin containing more than five such leakers.
It is clear that none of these three conditions alone are sufficient for identifying a contamination-free sample of LyC leakers.However, together, they provide a robust framework for identifying LyC leakers.
In Figure 4 we plot angle-averaged  ISM versus sSFR for SPHINX 20 galaxies with 3.5 ≤ ⟨Stellar Age⟩ LyC /[Myr] < 10.The galaxies with high  esc are biased towards having high sSFR and low  ISM .By selecting galaxies with we generate a sample based on angle-averaged quantities that is biased towards having high  esc .In the bottom panel of Figure 4 we show a cumulative distribution function for the escape fractions of galaxies that satisfy our selection criteria.This consists of 227 galaxies (16% of our sample), ∼ 74% of which have  esc > 5%.These leakers account for ∼ 65% of all such leakers in our sample.This can be contrasted with LzLCS where only 12/66 galaxies (18%) have  esc > 5%.While the efficiency of our selection is not yet perfect, our framework provides a theoretically motivated model with minimal contamination from galaxies with low  esc .Despite the effectiveness of such a selection method, it is important to note that thus far this discussion has focused on galaxy properties that are not directly observable.Nevertheless, for completeness, in Figure 5 we plot  esc versus halo mass, stellar mass, metallicity, 10 Myr-averaged SFR, 100 Myr-averaged SFR, and the sSFR surface density.As a sub-sample of SPHINX 20 galaxies, the data set presented here displays many of the trends described in Rosdahl et al. (2022) relating to which galaxy properties correlate with  esc .However, there are subtle differences because we have selected only star-forming galaxies.There is a minor tendency for  esc to decrease with increasing halo mass and stellar mass, consistent  2022).Because of our cut in SFR, we do not sample a downturn in  esc that is seen at lower stellar and halo masses in some simulations (Ma et al. 2020;Rosdahl et al. 2022).
It is well established observationally that gas-phase metallicity scales with stellar mass (e.g.Lequeux et al. 1979;Tremonti et al. 2004).Such a trend holds in SPHINX 20 and for this reason, the trend of  esc with metallicity is also similar to that of  esc with stellar mass.Likewise, there exists a star formation main sequence that correlates stellar mass and SFR (e.g.Brinchmann et al. 2004;Salim et al. 2007), as shown for our sample in Figure 1.Hence we see similar behaviour between SFR 10 and  esc as for stellar mass and  esc .Though we see the same broad behaviour for SFR 100 (albeit with more scatter), low longer-term SFR seems to better select for LyC leakers.Furthermore, the prevalence of star formation burstiness (using SFR 10 /SFR 100 ) in LyC leakers has already been discussed in depth using the same data-set.For this, the reader is directed to Katz et al. (2023b).
While few trends exist between these fundamental galaxy properties and  esc , the histogram of each sub-Figure shows that there are certain regions of parameter space where one is more likely to find LyC leakers.For example, there are a higher fractions of leakers at low virial masses, stellar masses, and gas metallicities, indicating that the conditions needed for leakage are more often accessible in these environments.Similar behaviour is also observed in LzLCS where even though a galaxy property may not be predictive of the value of  esc , the detection fractions of LyC emitters may be higher when certain conditions are met (e.g.high O 32 , high EW(H), etc.).JWST data will be key for determining whether such conditions are more often met in the high-redshift Universe compared to locally (e.g.Katz et al. 2023c;Cameron et al. 2023).2) to produce a sample highly enriched with leakers, 74% of which have  esc > 5% is shown as the red line.(Bottom) Cumulative distribution for escape fractions for galaxies that satisfy our selection criteria.For comparison, the red star shows that 82% of galaxies selected to be part of LzLCS have  esc < 5%, while 26% of SPHINX 20 galaxies within these criteria have  esc < 5%.

DISCUSSION
In this section, we contextualize numerous indirect observational diagnostics for  esc and develop a new model that can be used to quantitatively infer  esc from high-redshift observations.

Contextualising Existing Diagnostics
Using the theoretically motivated model for LyC leakage presented in Section 3 we proceed to predict whether individual indirect  esc diagnostics suggested in the literature work well based on whether they match our three criteria.

O 32
There remains significant debate in the literature on the applicability of O 32 as a diagnostic of  esc .While 1D ISM models (e.g.Jaskot & Oey 2013; Nakajima & Ouchi 2014) and certain observations (e.g.Faisst 2016;Izotov et al. 2018b) indicate a strong correlation between O 32 and  esc , numerous theoretical models (e.g.Katz et al. 2020b;Barrow et al. 2020) and other observations (Bassett et al. 2019;Nakajima et al. 2020;Flury et al. 2022b) demonstrate that there are complications with viewing angle, ionization parameter, and metallicity such that high O 32 is perhaps a necessary but insufficient condition for LyC leakage.Within our framework, O 32 is particularly complex.
As an ionization parameter diagnostic (e.g.Kewley & Dopita 2002), high O 32 is likely indicative of high sSFRs as seen in the top-left panel of Figure 6.There is significant scatter in the relation based on geometric effects, as well as variations with metallicity and other ISM properties, nevertheless, the highest observed values of O 32 in our simulation traces the highest sSFR, and hence O 32 satisfies our first criteria.
O 32 is expected to strongly vary with stellar cluster age due to the evolution of the ionizing sources (both in terms of brightness and spectral shape).This behaviour is shown in Figure 2 of Barrow et al. (2020) and is sensitive to the chosen SED model as well as the presence of Wolf-Rayet stars.The galaxies with the highest O 32 also have the youngest LyC luminosity-weighted stellar ages (see the top-second panel of Figure 6).O 32 peaks at ages < 3.5 Myr and thus fails our second criterion as it preferentially traces objects that have yet to surpass the SN time scale.Similarly, O 32 is not an indicator of neutral ISM density and only marginally traces dust, thus also failing our third criteria (see the top-third panel of Figure 6).For these reasons, by itself, O 32 is not a good predictor of  esc .
In Figure 7 we show the observed and de-reddened O 32 versus  H esc,obs .As expected, no obvious trend emerges.We find reasonably good agreement with LzLCS (red points) in that there is a preference for galaxies with high  esc to be biased towards high O 32 , but high O 32 does not imply high  H esc,obs .Hence this confirms our previous assertion that high O 32 is a necessary but insufficient condition for high  esc .We argue that the reason why there is a bias for leakers to have high observed O 32 is two-fold.First there is a clear, albeit with significant scatter, correlation between O 32 and sSFR which represents one of our three conditions.Second, for galaxies older than 3.5 Myr, high ionization parameter can help make feedback more efficient.Nevertheless, there are a few galaxies with high  esc and lower O 32 compared to the typical SPHINX 20 galaxy.The majority of these galaxies appear as light blue points on Figure 7 because they have the oldest stellar ages.Such galaxies tend to be intrinsically much fainter than the galaxies with high  esc and high O 32 and represent the population of post-starburst Remnant Leakers described in Katz et al. (2023b).
We highlight that there seems to be a preferred O 32 between 3 and 10 where galaxies are most likely to have significant LyC leakage.This range corresponds to the typical values of O 32 that a galaxy reaches when it has evolved past the SN timescale, and can be easily seen in the top histogram of Figure 6.The exact details of this peak are sensitive to Wolf-Rayet modelling as well as dust treatment, ISM gas densities, and star formation model in the simulation; all of which will need to improve before we are confident in the robustness of this particular scale.

Spectral Index 𝛽
Recently, Chisholm et al. (2022) have suggested that the UV slope, , is a strong indicator of  esc , such that galaxies with bluer  have higher  esc . is a readily observable quantity in the high-redshift Universe as it can be estimated for large samples of galaxies from both photometry or more accurately with JWST spectroscopy (e.g.Topping et al. 2022;Endsley et al. 2022;Cullen et al. 2023).Hence, should it be a good diagnostic of  esc ,  may be an exciting probe of LyC leakeage directly in the EoR.While  is not necessarily a strong indicator of sSFR, in order to have a steep slope, the young stellar population must outshine the older stellar populations in the galaxy.We empirically find (middleleft panel of Figure 6) that galaxies with the bluest  (i.e.< −2.5) are also very strongly biased towards having the highest sSFRs.A high sSFR does not guarantee a blue  and there exists strong scatter due to dust; however as an  esc indicator,  satisfies our first criteria.
The UV slope is predicted to stay approximately constant for the first 10 Myr of evolution (Stanway et al. 2016).Hence a blue  does not reveal much about the stellar population age apart from the fact that it may still be producing significant quantities of ionizing photons.Thus  marginally satisfies our second criteria in that it picks out galaxies that can contribute to reionization but we expect some contamination from low  esc galaxies with stellar ages younger than 3.5 Myr (middle-second panel of Figure 6).
Finally, the observed  is strongly sensitive to dust.While not a density indicator,  can easily select for galaxies with very low E(B − V) and hence  also marginally satisfies our third criteria.For these reasons, in a metal-enriched environment we expect  to be a relatively good indicator of  esc , albeit with significant scatter.
In Figure 8 we plot  H esc,obs versus the observed  for SPHINX 20 galaxies and we find a strong trend that high  H esc,obs galaxies tend to have bluer  (as can be clearly seen in the middle histogram of Figure 6).We do find systems that also have redder  and high  H esc,obs but some of this is a sight-line effect and likewise, Remnant Leakers, which do not contribute meaningfully to reionization, will populate this region of the plot.The orange points and dashed orange line (Middle) Same as above, but for spectral index .Values of  < −2.5 are highlighted (dashed orange line) as they fulfill all three criteria, empirically selecting for high sSFR, ideal stellar age, and lower neutral gas densities in the ISM.These galaxies are the strongest leakers, as can be seen in the histogram.(Bottom) Same as above, but for the UV attenuation E(B − V).Values of E(B − V) < 0.01 are highlighted (dotted orange line) as they appear to fulfill 2/3 criteria, empirically selecting for galaxies with the correct mean stellar population age and low  ISM .
show both the data and relation predicted by Chisholm et al. (2022) which is in very good agreement with our median relation (blue).
We emphasize that this result holds only when metals/dust are present as without dust obscuration, our third criterion would not be satisfied.Indeed the galaxies with the lowest  esc at the bluest  are the most metal-poor galaxies in our simulated sample.Our model naturally predicts that this could become problematic at / ⊙ < 0.01.Here we have assumed that the dust-to-gas mass ratio scales linearly with metallicity; however, observations show that dust content may fall off as a power-law with decreasing metallicity (e.g.Rémy-Ruyer et al. 2014).In that case, we expect that the critical metallicity where  no longer becomes a good  esc indicator occurs at a higher metallicity than / ⊙ = 0.01.Hence to be conservative, we advocate that observed  is likely to be a good  esc indicator at / ⊙ > 0.1 and the details at lower metallicity (and the ability to apply this relation in the epoch of reionization) are sensitive to how the dust-to-gas mass ratio scales with metallicity and the timescales of dust formation at high-redshift.

E(B − V)
The problem of whether dust attenuation strongly affects  esc is not completely understood.Chisholm et al. (2018) suggest that even small dust attenuation removes significant numbers of ionizing photons.However, simulations have shown that dust tracks neutral hydrogen which has a much more important impact on  esc (Katz et al. 2022b).Therefore, it is natural to explore the use of UV attenuation (e.g.E(B − V), derived from the Balmer decrement) as a potential  esc diagnostic.Saldana-Lopez et al. ( 2022) found a strong anticorrelation between the UV dust-attenuation and LyC escape fraction for galaxies in the LzLCS sample, suggesting that LyC leakers tend to have a dust-poor ISM.Unsurprisingly, we find no significant dependence of E(B − V) on sSFR (bottom-left panel of Figure 6), with any residual relationship introduced by the fact that stellar mass is the denominator of sSFR and high-mass galaxies tend to be more metal-enriched and thus have more dust.E(B − V) does not pass our first criterion.Interestingly, we find that galaxies outside our stellar age criterion tend to have  significantly more UV attenuation (bottom-second panel of Figure 6).This points to the fact that SNe are able to destroy dust (in our case by destroying neutral hydrogen) through mechanical feedback (e.g.Priestley et al. 2021).Therefore, E(B − V) marginally satisfies our second criterion.Finally, it is clear that (by construction), there is a strong correlation between E(B − V) and  ISM (bottom-third panel of Figure 6).Therefore, we conclude that E(B − V) should be a good indicator of LyC leakage.
In Figure 9 we show the observed  esc as a function of E(B − V) compared to LzLCS galaxies from Saldana-Lopez et al. (2022).We find a strong trend between the two quantities.Furthermore, galaxies with lower ⟨ H ⟩ [OII] exhibit less scatter.However, it is likely that such comparisons are sensitive to the dust model used in the simulation.Similarly, Saldana-Lopez et al. ( 2022) assume a uniform dust screen which is not representative of the dust distribution in our simulation.2020) assumed that galaxies with the highest Σ SFR also have the highest  esc , which can result in a reionization scenario dominated by the most massive galaxies.
As SFR is the numerator of Σ SFR , there is unsurprisingly a strong correlation between sSFR and Σ SFR (top-left panel of Figure 10).This is due to the weaker dependence of galaxy size on stellar mass (Kawamata et al. 2018;Bouwens et al. 2022b).As with previous diagnostics, we similarly find significant scatter due to the variable impact of dust on the H emission and projected galaxy size as a function of sight line.Σ SFR undoubtedly passes our first criterion.
In contrast we find that the highest values of Σ SFR tend to cor- for each bin that contains at least 5 such galaxies.Though Σ SFR correlates weakly with sSFR (as both depend explicitly on SFR), it selects for young stellar ages and does not trace  ISM .Therefore, Σ SFR by itself is not a reliable diagnostic for the LyC escape fraction.(Middle) Same as above, but for H equivalent widths.EW(H) traces sSFR very well, but greater values select for stellar ages < 3.5 Myr and do not trace the state of the ISM.We therefore expect no strong relation with the LyC escape fraction, though values of 100 Å are weakly preferred.(Bottom) Same as above, but for M UV .We find that M UV does not trace the sSFR, but bright magnitudes select for the correct stellar ages.However, M UV shows no dependence on ISM density.As a result, leakers show a weak bias towards brighter magnitudes, but M UV is not a useful indicator of the LyC escape fraction.
respond to galaxies with ages younger than the SN timescale (topsecond panel of Figure 10), although there are a significant number of galaxies that have ages > 3.5 Myr and Σ SFR > 10 M ⊙ /yr/kpc 2 (the canonical value reported in Flury et al. (2022b) as the threshold for strong leakage).Galaxy size in our simulations is also relatively stable beyond 10 Myr.Thus, Σ SFR alone does not satisfy our second criterion and may be biased towards galaxies with too young stellar ages.Finally, we find no relation between Σ SFR and the state of the ISM (top-third panel of Figure 10).We then might expect that galaxies with high  esc might also have high Σ SFR due to the correlation with sSFR, but we expect there to be no strong correlation.
Indeed in Figure 11 we find no trend between the quantities.The relation suggested by Naidu et al. (2020) does not envelope our data, nor does it represent the LzLCS galaxies, which are consistent with those presented here.Similarly, the assumption by Sharma et al. ( 2017) that all galaxies with Σ SFR > 0.1 M ⊙ /yr/kpc 2 have  esc = 20% is clearly an inaccurate representation of both SPHINX 20 galaxies and LzLCS.Interestingly, in our SFR-limited sample, the lowest halo mass galax-ies have the highest Σ SFR > 0.1 M ⊙ /yr/kpc 2 as they tend to fall above the main sequence of star formation.Flury et al. (2022b) do note a trend between Σ SFR and  esc .It is possible that one emerges due to their selection criteria which are not fully representative of the general galaxy population.Hence trying to reproduce their correlation coefficient with SPHINX 20 may result in the correct value for the wrong reasons.Finally, we find that by selecting with Σ sSFR one is more likely to find a galaxy with significant leakage, which can be seen by comparing the histogram in Figures 5 and the top histogram of 10.This is in contrast with findings in the LzLCS, which report that the addition of stellar mass did not improve selection power (Flury et al. 2022b).

H𝛽 Equivalent Width
The connection between H equivalent width (EW(H)) and  esc remains debated.Green Pea galaxies are among the most studied low-redshift galaxy populations that contain a significant number of  2020) is shown in orange.We find that SPHINX 20 galaxies agree well with LzLCS observations, while disagreeing with the proposed relation of Naidu et al. (2020).
LyC leakers (e.g.Izotov et al. 2016Izotov et al. , 2018b,a) ,a) and these systems often exhibit extreme emission line ratios and equivalent widths (e.g.Yang et al. 2017).In contrast, as the escape fraction approaches 100%, EW(H) should tend towards zero as none of the ionizing photons are absorbed (Zackrisson et al. 2017).While LzLCS find no strong trend between EW(H) and  esc , galaxies with high  esc tend to also have high EW(H).EW(H) is a very strong tracer of sSFR (middle-left panel of Figure 10) as Balmer lines have long been known to track SFR and the strength of the continuum is sensitive to total stellar mass (e.g.Kennicutt 1998).However, like O 32 , the highest values of EW(H) trace ages < 3.5 Myr and there is no correlation between EW(H) and the state of the ISM (middle-second and middle-third panels of Figure 10) Hence we expect no strong correlation between EW(H) and  esc ; although, the connection with sSFR would explain the trend seen in LzLCS that the LyC leaker fractions increases with EW(H).We note that similar to our findings for O 32 , there appears to be a characteristic EW(H) ∼ 100Å for which galaxies tend to show elevated  esc (see the middle histogram of Figure 10).However, for the same reasons as before it is difficult to claim a robust value.Flury et al. (2022b) recently reported a weak correlation between  esc and M UV , in agreement with other observations that indicate LyC leakers tend to be lower mass, fainter galaxies (Steidel et al. 2018;Pahl et al. 2021).In contrast, (Rosdahl et al. 2022;Saxena et al. 2022) find no relation between between  esc and M UV , although they agree that lower luminosity galaxies are likely the sources of reionization.

M UV
In general, M UV is a weak indicator of sSFR (bottom-left panel of Figure 10) while high M UV could also indicate ages < 3.5 Myr (bottom-second panel of Figure 10).Similarly we find no correlation  SPHINX 20 galaxies.We find that when all masses are considered, M UV is not a reliable diagnostic for the LyC escape fraction.
between M UV and ISM state (bottom-third panel of Figure 10) so our model would predict no correlation between M UV and  esc as shown in Rosdahl et al. (2022).Figure 12 demonstrates this, comparing our galaxies to those of the LzLCS (Flury et al. 2022b).Here, we find that the majority of this scatter is introduced by a strong dependence of M UV on stellar mass.Figure 13 explores this further, by comparing M UV with M * .Points are coloured by  esc such that brighter galaxies at fixed stellar mass are biased towards having higher  esc .When stellar mass is fixed, high intrinsic M UV correlates with a high sSFR.However, like the observed  at fixed stellar mass, observed M UV becomes a strong indicator of dust attenuation which represents one of the tracers of our combination parameter on ISM state.Thus at fixed stellar mass the observed M UV satisfies two out of three criteria with scatter introduced due to stellar age.This demonstrates that sample selection is key for the emergence of trends between  esc and various galaxy properties.

A New Combined Diagnostic
Inspired by the success of our theoretical selection criteria in Figure 4 (top), we now aim to reproduce the ability of a three-point criterion at isolating systems with high LyC leakage, albeit with quantities which are directly observable.Furthermore, we can make use of the fact that several quantities satisfy multiple criteria to find the best set of observables from which to construct our diagnostic.
Given the fact that galaxies with blue dust-attenuated  marginally satisfy criteria 1 and 2, while strongly satisfying 3 (see middle row of Figure 6) we choose to start here.We continue with our secondstrongest individual diagnostic, E(B − V), which weakly indicates criterion 2 and strongly satisfies criterion 3 (see bottom row of Figure 6).Finally, within this set of diagnostics we note that the most loosely constrained is mean stellar age, for which we now select the H-to-FUV flux ratio, given the fact that it has been previously shown to We find that more massive galaxies tend to be more luminous in the UV, and that for a fixed galaxy mass bin, brighter galaxies tend to have higher LyC escape fractions.Therefore, at constant stellar mass, M UV can be used as a LyC diagnostic.
indicate stellar age both observationally (Weisz et al. 2012) and in simulations (Sparre et al. 2017), though this is by no means settled (c.f.Rezaee et al. 2022).Therefore, we predict that this set of three diagnostics can be used to reliably select a sample of galaxies which are greatly enriched with LyC leakers.
Figure 14 shows this, plotting line-of-sight  as a function of E(B − V) for all galaxies with log 10 (H/L 1500 ) < 1.68 coloured by  esc , with galaxies with  esc > 20% shown as larger points.Here, we find that by selecting galaxies with we generate a sample that is highly enriched with LyC leakers.Namely, this reduced set consists of 1931 observations (16.9% of our sample), ∼ 67% of which have  esc > 5%.This accounts for 62% of all such galaxies in our sample.In contrast, the theoretical selection criteria given in Figure 4 is 74% enriched by such galaxies, accounting for 65% of the overall sample.We stress that this is a theoretically motivated set of directly observable diagnostics which successfully selects for the majority LyC leakers in our sample.

Predicting 𝒇 esc for bright Ly𝜶 emitters observed with JWST
There remains debate in the literature over the contribution of bright Ly emitters (typically defined as having  Ly > 10 42.2 erg/s and EW(Ly) > 25Å) to reionization.For example, based on lowerredshift stacks, Naidu et al. (2022) assumed that bright Ly emitters with low peak velocity separation and line-centre flux all have escape  3) to produce a sample of highly enriched leakers, 67% of which have  esc > 5%.Those selected by this set of criteria account for 62% of the total population of such systems in our sample.
fractions of 20%.Furthermore, Matthee et al. (2022b) showed that if half of bright Ly emitters have an escape fraction of 50%, then one can match observational constraints on the neutral fraction evolution.
In contrast, based on a small stack of  > 7 galaxies, Witten et al. (2023) inferred that bright Ly emitters have  esc ≲ 10%.One of the primary advancements of JWST compared to earlier observations is the ability to obtain high resolution spectra of the rest-frame UV and optical for large numbers of galaxies at  > 6.While our previous discussion has focused on which indirect indicators are likely to identify enriched samples of LyC leakers, here we use the observable properties of galaxies to quantitatively predict the value of the escape fraction, particularly for a sample of high-redshift Ly emitters.A similar exercise was performed in Maji et al. (2022); however, the main difference here is that we focus only on observable quantities such that our models are immediately applicable to available galaxy spectra.
We consider eight observable quantities: , E(B − V), H, EW(H), M UV , R 23 , O 32 , and the half-light radius measured at 1500 Å.We then run a generalized linear model on scaled data9 using L1 regularization to limit the number of needed measurements and maintain simplicity so that the model is interpretable.To avoid over-fitting, we have split10 the data such that the model is trained on 80% and the remaining 20% is used for validation.Six of the eight initial parameters have non-zero coefficients, with both EW(H) and the half-light radius proving irrelevant for our model.Using these six Values for all coefficients are listed in Table 1.The median absolute error on the training and validation sets are identical at 0.39 dex indicating that the model generalizes well.This can also be seen in Figure 15, where we show predicted  esc as a function of true  esc for our sample.
The coefficients provide insight into how each parameter correlates with  esc .For example, the coefficients for  and E(B − V) are strongly negative indicating that blue UV slopes and low dust content are strong indicators of leakage.Counter-intuitively, O 32 negatively scales with  esc in our model.We believe this is due to the fact that once galaxies with blue UV slopes and low dust are selected, O 32 becomes an age indicator and lower O 32 indicates older ages.We emphasize that this anti-correlation exists only in the framework that contains these other parameters.
As a first application, we apply this relation to JADES-GS-z7-LA, a  = 7.3 Ly emitter (Saxena et al. 2023a) recently discovered as part of the JWST JADES GTO program.Based on a variety of spectroscopic features, the authors concluded that the current value of  esc is not substantially high, despite the high Ly  esc .Using our model, we predict a value of 3% which is considerably lower than the value needed to inflate an ionized bubble such that Ly is not completely attenuated by the IGM (Saxena et al. 2023a).Hence it is more likely that faint nearby dwarf galaxies (such as those which have already been spectroscopically confirmed, see Figure 5 of Witstok et al. 2023) are likely responsible for the local ionized bubble.Furthermore, Equation 4 has been used to infer  esc for 16 faint Ly emitters at  > 5.8 (Saxena et al. 2023b).
Beyond  ∼ 9.5 [O III] 5007 drops out of NIRSpec; however, GN-z11, a spectroscopically confirmed  = 10.6 galaxy was recently discovered to be a bright Ly emitter (Bunker et al. 2023).Because H, O 32 , and R 23 have not been measured for this galaxy, we create a custom model using H and [O III] 4363 rather than [O III] 5007.We find an escape fraction of 11%, significantly greater than the 4% reported as the Ly escape fraction.While it is theoretically difficult to obtain a LyC escape fraction greater than that of Ly and similarly this is rarely observed (e.g.Izotov et al. 2022;Verhamme et al. 2017), there is undoubtedly scatter in our model, which more than accounts for this discrepancy.IGM attenuation can also play a role in extinguishing the observed Ly.Hayes & Scarlata (2023) argue that the Ly escape from GN-z11 into the IGM is as high as 50%, which makes our measured LyC escape fraction fully consistent.Similarly the SED fit for GN-z11 shows a marginal   = 0.17.If we input this into our model (rather than the fiducial parameters that assumed no dust as suggested in Bunker et al. 2023), the estimated LyC escape fraction decreases to 6%.It is important to note that the [O III] 4363 line contains slightly different information than [O III] 5007 on account of it being an auroral line and therefore may be a useful tool in searching for LyC leakers.However, we have chosen not to include it in our general model on account of the fact that it is generally difficult to detect, particularly at high redshifts (e.g.Laseter et al. 2023).
In general, our model seems to point to bright Ly emitters having LyC escape fractions ≲ 10%.This result agrees with Witten et al. (2023) but contradicts Naidu et al. (2022) where it was assumed that bright Ly emitters with low peak velocity separation and linecentre flux all have  esc = 20%.The origin of this difference seems to be related to UV slope.Within the context of our model,  is the strongest indicator of  esc .We re-emphasize that this seemingly also holds true for low-redshift "analogs" (Chisholm et al. 2022).The  values between the assumed high and low  esc galaxies in Naidu et al. (2022) are formally consistent within the scatter, with a slight preference for the higher alleged  esc sample to be more blue.Using the stacks they provide, we have estimated  esc for their two samples and found values of 4% and 0.4% for the stacks with low peak velocity separation and high line-centre flux and high peak velocity separation and low line-centre flux, respectively.Applying the model from (Chisholm et al. 2022), which only depends on , we derive escape fraction values of 5% and 3%, respectively.The difference in  esc in our model is primarily driven by the E(B − V) difference between the two samples.
Our result does not indicate that bright Ly emitters are insignificant for reionization.Despite their estimated lower escape fractions, their intrinsic production of ionizing photons is very high, and if 5% of the ionizing photons leak, this may represent an important contribution to the emissivity budget.For the galaxies considered in Matthee et al. (2022b) to dominate reionization one would need to increase the assumed  ion to reconcile the lower escape fractions.Future JWST observations will undoubtedly provide new constraints on both the  esc of Ly emitters and  ion .

Comparison with Other Simulations
SPHINX 20 has not been the only attempt to study escape fractions in a cosmological context.Numerous works have been carried out studying how ionizing photons leak out of galaxies (e.g.Xu et al. 2016;Barrow et al. 2020;Trebitsch et al. 2021;Rosdahl et al. 2022;Hassan et al. 2022).Our approach however has focused primarily on observational signatures.This is important, as we can directly study mock-observed galaxy properties, accounting for the anisotropic nature of LyC leakage.
Simulations regularly show that  esc is sensitive to stellar age.For example, haloes with stellar populations younger than 5 Myr in the FIBY simulations have higher escape fractions across a larger solid angle (Paardekooper et al. 2015).FIRE-2 find a lag between the timing of a star burst and an increase in  esc , due to the time needed for feedback to clear channels (Ma et al. 2020), similar to what we find in SPHINX 20 .They also found a telltale geometry when  esc is high.Star-forming regions are surrounded by an accelerated, dense gas shell.Within this shell, young stars with ages 3 − 10 Myr are able to ionize low column-density channels through which radiation can leak.Kimm & Cen (2014) also find evidence for lags between star formation and high  esc .However, in their model, the lag was 10 Myr.This highlights the sensitivity of this exact time delay to the SN model being used.In Kimm & Cen (2014), star particles undergo SNe exactly 10 Myr after their birth, following Schaller et al. (1992).In contrast, the feedback recipe followed in SPHINX 20 includes a number of staggered SNe more accurately representing a single stellar population (Kimm et al. 2015).Specifically, these begin as early as ∼ 3 Myr.
The need for ISM disruption has also been explored at great length.Ma et al. (2020) find that star particles in galaxies with high escape fractions tend to be situated in regions with low column densities out to the virial radius.This is also corroborated by the findings of Trebitsch et al. (2017) (see Figure 14).Moreover, Paardekooper et al. (2015) show that the neutral gas column density within 10 pc of a source is the defining quantity of escape fractions.Kimm & Cen (2014) agree, finding that galaxies with the lowest escape fractions tend to have the highest optical depths out to 100 pc and that the location of feedback is of vital importance.Particularly, the inclusion of runaway OB stars increases average escape fractions, due to the fact that these stars tend to move to lower density regions where the efficiency of feedback is greater.In our model, local gas density is included in our  ISM parameter.Because we have weighted the gas density by the [O II] 3727 luminosity, we specifically pick out the densities in star-forming regions.The anti-correlation we find between  ISM and  esc in SPHINX 20 is in agreement with these previous models.

CAVEATS
Like all numerical simulations, SPHINX 20 employs a series of subgrid models for star formation, feedback, and ISM processes that could impact our results.For example, SPHINX 20 samples a distribution of SN time scales rather than assuming a fixed value which is why we find a critical time scale of 3.5 Myr for LyC leakage to begin.While assuming a delay time distribution is likely more realistic than a fixed value, a different delay time distribution would undoubtedly change which galaxies in SPHINX 20 are leakers, shifting the exact stellar age dependence stated.Likewise, our model for star formation assumes a variable efficiency of conversion from gas to stars.Changing this value will impact the clustering of stars and SN as well as the gas densities near young star particles.This will simultaneously affect the intrinsic emission line luminosities, the observed values (through a changing amount of dust as it is tied to the gas), and the efficiency of SN and radiative feedback.
Since SPHINX 20 does not follow the formation and distribution of dust, we have employed an effective model where the dust-tometal ratio is fixed and dust primarily tracks neutral gas (Laursen et al. 2009).In contrast, observations show that the dust-to-gas mass ratio decreases following a power-law as a function of metallicity (Rémy-Ruyer et al. 2014).While the dust is often a sub-dominant contribution to the optical depth to ionizing photons at these redshifts (e.g.Katz et al. 2022b) and therefore does not impact our escape fractions, the dust model does affect emission line luminosities and UV slopes.This motivates the study of IR lines as a probe of  esc (e.g.Katz et al. 2020b;Ramambason et al. 2022) as they are significantly less sensitive to dust content.
In this work, we have employed the BPASS SED which crucially extends the period over which ionizing photons are released due to binary interactions, compared to other SEDs.Similarly, one could change the model for Wolf-Rayet stars or include X-ray binaries which would similarly impact emission line fluxes.Our models also assume that metal abundance ratios match that of solar, whereas observations demonstrate that they likely vary as a function of metallicity.This will impact gas cooling and the state of the ISM as well as the emission line luminosities.For this reason, we have only worked with oxygen emission lines and their ratios which are likely well captured by our assumptions.
While currently the state-of-the-art for full-box reionization simulations, SPHINX 20 remains subject to limited spatial and mass resolution.This most importantly manifests in our inability to always fully resolve the Stromgren spheres of star particles.We have attempted to remedy this by using complex post-processing methods.However, the gas cells are still limited to have a fixed density on ∼ 10 pc scales.Subgrid density structure will impact emission line fluxes as well as dust attenuation and potentially change the impact of pre-SN feedback.
SPHINX 20 does not resolve the ISM to the same degree as simulations such as those in Kimm et al. (2019Kimm et al. ( , 2022)).In particular, Kimm et al. (2019) suggest that radiation feedback begins to disrupt the ISM of giant molecular clouds to the point of allowing LyC leakage at around 2 Myr, before SNe begin.In such a scenario, the window for effective LyC leakage would begin even earlier, inviting the need for further work with higher resolution simulations and better ISM and dust physics.However, it remains unclear whether gas outside the immediate molecular cloud prevents LyC photons from escaping into the IGM.
SPHINX 20 also neglects some potentially important physical processes such as stellar winds and cosmic rays.This could help lower the local densities around star particles and provide local metal enrichment.Both can impact emission line luminosities and the escape fraction.Likewise, we have neglected the nebular continuum.While properties such as O 32 and Σ SFR are unaffected, the nebular continuum can reduce observed equivalent widths and make  appear redder.This is unimportant when  esc ∼ 100%.Furthermore, the highest EW galaxies in our sample are non-leakers because they have not yet reached the SN time scale and it is these galaxies where the nebular continuum will be the most important.
Finally, SPHINX 20 does not include AGN and therefore captures neither the contribution of their hard radiation spectra nor of their feedback on the LyC escape fraction.As a result, it is important to note that the model derived in Section 4.3 should be used with caution on observations of galaxies with a confirmed AGN presence.In these cases, it is reasonable to expect that active feedback will clear ionized channels and therefore increase line-of-sight LyC escape fractions along the outflows, thus increasing the angle-averaged value for these galaxies.We leave such discussions to future work.
Despite these caveats, SPHINX 20 has been successful in reproducing numerous observations of the high-redshift Universe such as the UV luminosity function (Rosdahl et al. 2022) and Ly luminosity function (Garel et al. 2021).The agreement we find with LzLCS is a promising sign that in many ways SPHINX 20 provides an adequate representation of the physics that leads to LyC leakage.

CONCLUSIONS
We have post-processed a sample of 1,412 star-forming galaxies in the SPHINX 20 cosmological radiation hydrodynamics simulation with CLOUDY and RASCAS to produce a diverse library of 14,120 simulated and dust-attenuated high-redshift galaxy spectra.These galaxies have been specifically selected to potentially be bright enough to be observable with JWST.Using this data-set, which represents part of SPHINX Public Data Release v1, we presented a new generalised framework for observational signatures of LyC leakage.Specifically, we argue that a good diagnostic to identify galaxies with significant LyC leakage should: • Track high sSFR; • Select for stellar populations with ages 3.5 Myr ≲ ⟨Stellar Age⟩ LyC ≲ 10 Myr; • Include a proxy diagnostic for neutral gas content as well as the state of the ISM.This framework can successfully identify samples of galaxies that are highly enriched with LyC leakers.By applying our method to existing indirect  esc diagnostics, we can predict the reasons why each diagnostic will be successful (or fail) and why.For example, we find that high O 32 is a necessary but insufficient criterion for high escape fractions, due to the fact that it traces high sSFR, but also selects for galaxies with dense, dusty ISMs with stellar populations that are too young to disrupt it.Observed UV slope, , is empirically found to marginally satisfy two and strongly satisfy one criteria for / ⊙ > 0.01.It is thus a relatively good diagnostic for the escape fraction.Similarly, E(B − V) is found to satisfy 2/3 criteria and thus traces  esc reasonably well, albeit with significant scatter.In contrast, galaxy properties such as EW(H), Σ SFR , Σ sSFR , M UV , sSFR, and M * are all found to be poor indicators of  esc if used in isolation as they satisfy one or none of our criteria.
We can also satisfy all three criteria with multi-dimensional diagnostics.Selecting galaxies with log 10 (H/L 1500 ) < 1.6 while combining  and E(B − V) (with Equation 3) produces a sample of galaxies of which 67% have  esc > 5%, accounting for 62% of all such galaxies in our data-set.Similarly, we have constructed a generalized linear model that utilizes spectral properties of galaxies to quantitatively predict  esc (see Equation 4).Applying our model to high-redshift Ly emitters observed with JWST, we find LyC escape fractions less than or equal to the observationally estimated Ly es-cape fractions.Our results suggest that bright Ly emitters tend to have LyC escape fractions ≲ 10%.
Though our framework for the physics of indirect estimators of LyC escape has been tested by a robust data-set of mock observations, we recognise that such mock data are dependent on the nebular emissivity and dust absorption models used, inviting future work on better-resolved cosmological simulations with more realistic subgrid physics.The hardest of our three criteria to select for is the correct mean stellar population age, suggesting that this needs to be explored in the context of, for example, SED fitting.Nevertheless, our framework highlights the potential of existing and future JWST data to understand the physics of LyC escape and cosmological reionization.

Figure 1 .
Figure 1.Star formation rate (averaged over 10 Myr) as a function of stellar mass.Each data-point is coloured by the LyC escape fraction, with those with fractions above 10% enlarged.This shows that LyC leakers tend to fall above the main sequence fitted for galaxies at  = {4.64,7, 10} (orange, red, and purple respectively), representing starburst periods.

Figure 2 .
Figure 2. Line-of-sight H-derived LyC escape fractions as a function of true LyC escape fraction for SPHINX 20 galaxies, coloured by the gas metallicity.Metal-rich systems tend to over-predict  esc as compared to the true value.

Figure 3 .
Figure 3. LyC escape fraction as a function of specific star formation rate (left), ionizing flux-weighted mean stellar age (center) and angle-averaged composite neutral gas attenuation parameter  ISM = E(B − V) × ⟨n H ⟩ [OII](right), all coloured by stellar mass.Escape fractions of 20% and 5% are marked with dashed and dotted horizontal orange lines, respectively.The top panels show a histogram of the fraction of galaxies with  esc > 5% in each bin with more than 5 such galaxies.Systems with the highest escape fractions tend to have high sSFR, fall within 3.5 Myr and 10 Myr (indicated by red lines), low neutral gas attenuation parameter, and have low stellar masses.However, each of these requirements is a necessary but insufficient diagnostic for LyC leakers.

Figure 4 .
Figure 4. (Top) angle-averaged neutral gas parameter  ISM as a function of specific star formation rate for galaxies with mean stellar ages between 3.5 Myr and 10 Myr coloured by LyC escape fraction.Systems with  esc > 20% are enlarged.A selection criteria (given by Equation2) to produce a sample highly enriched with leakers, 74% of which have  esc > 5% is shown as the red line.(Bottom) Cumulative distribution for escape fractions for galaxies that satisfy our selection criteria.For comparison, the red star shows that 82% of galaxies selected to be part of LzLCS have  esc < 5%, while 26% of SPHINX 20 galaxies within these criteria have  esc < 5%.

Figure 5 .
Figure 5. LyC escape fraction as a function of halo virial mass (top-left), stellar mass (top-center), gas metallicity (top-right), 10 Myr averaged SFR (bottom-first), 100 Myr averaged SFR (bottom-center), and sSFR surface density (bottom-right) coloured by the LyC luminosity-weighted mean stellar age.Where available, LzLCS data is shown in red for comparison.For each quantity, a histogram is given for the number density of galaxies with  esc > 5% in each bin (shown by the dashed orange line).

Figure 6 .
Figure 6.(Top) Observed dust-corrected (by the Balmer decrement) O 32 as a function of sSFR (left), LyC luminosity-weighted mean stellar age (second) and neutral gas attenuation parameter  ISM (third) coloured by the LyC escape fraction.The histogram (right) shows the fraction of galaxies for a given O 32 with esc > 5% for each bin which contains at least 5 such galaxies.There is a preference for leakers to have 3 < log 10 (O 32 ) < 10.Higher values of O 32 correlate with high sSFR, yet select for younger stellar populations and do not trace the neutral gas content of the ISM.Hence, O 32 by itself does not reliably predict  esc .(Middle) Same as above, but for spectral index .Values of  < −2.5 are highlighted (dashed orange line) as they fulfill all three criteria, empirically selecting for high sSFR, ideal stellar age, and lower neutral gas densities in the ISM.These galaxies are the strongest leakers, as can be seen in the histogram.(Bottom) Same as above, but for the UV attenuation E(B − V).Values of E(B − V) < 0.01 are highlighted (dotted orange line) as they appear to fulfill 2/3 criteria, empirically selecting for galaxies with the correct mean stellar population age and low  ISM .

Figure 7 .Figure 8 .Figure 9 .
Figure 7. LyC escape fraction as a function of observed and de-reddened O 32 , coloured by the ionizing luminosity weighted mean stellar age in each galaxy.Over-plotted are observational results from the LzLCS.We find that LyC escape fractions peak qualitatively at values of log 10 (O 32 ) between 3 and 10.Larger values of O 32 are dominated by young stellar populations which have yet to disrupt the ISM, producing low escape fractions.
4.1.4Σ SFR Star formation rate surface density is perhaps intuitively a good indicator of  esc .Since  esc is predicted to be feedback-regulated (e.g.Trebitsch et al. 2017; Kimm et al. 2017), concentrated star formation may help increase the local efficiency of mechanical feedback, creating optically thin, low-density channels in the ISM.With limited empirical data, Sharma et al. (2017); Naidu et al. (

Figure 10 .
Figure10.(Top) Observed Σ SFR (as defined in Section 2) as a function of sSFR (left), LyC luminosity-weighted mean stellar age (second) and neutral gas attenuation parameter  ISM (third) coloured by the LyC escape fraction.The histogram (right) shows the fraction of galaxies for a given Σ SFR with  esc > 5% for each bin that contains at least 5 such galaxies.Though Σ SFR correlates weakly with sSFR (as both depend explicitly on SFR), it selects for young stellar ages and does not trace  ISM .Therefore, Σ SFR by itself is not a reliable diagnostic for the LyC escape fraction.(Middle) Same as above, but for H equivalent widths.EW(H) traces sSFR very well, but greater values select for stellar ages < 3.5 Myr and do not trace the state of the ISM.We therefore expect no strong relation with the LyC escape fraction, though values of 100 Å are weakly preferred.(Bottom) Same as above, but for M UV .We find that M UV does not trace the sSFR, but bright magnitudes select for the correct stellar ages.However, M UV shows no dependence on ISM density.As a result, leakers show a weak bias towards brighter magnitudes, but M UV is not a useful indicator of the LyC escape fraction.

Figure 11 .
Figure 11.LyC escape fraction as a function of Σ SFR coloured by stellar mass.LzLCS data is shown in red, while the relation suggested by Naidu et al. (2020) is shown in orange.We find that SPHINX 20 galaxies agree well with LzLCS observations, while disagreeing with the proposed relation ofNaidu et al. (2020).

Figure 12 .
Figure12.LyC escape fraction as a function of line-of-sight M UV coloured by the stellar mass.Data from LzLCS is shown in red, agreeing with bright SPHINX 20 galaxies.We find that when all masses are considered, M UV is not a reliable diagnostic for the LyC escape fraction.

Figure 13 .
Figure 13.Line-of-sight observed UV magnitude for each mock observation as a function of the galaxy stellar mass, coloured by the LyC escape fraction.We find that more massive galaxies tend to be more luminous in the UV, and that for a fixed galaxy mass bin, brighter galaxies tend to have higher LyC escape fractions.Therefore, at constant stellar mass, M UV can be used as a LyC diagnostic.

Figure 14 .
Figure14.Line-of-sight measurements of  as a function of E(B − V) for all observations of galaxies in our sample with observed log 10 (H/L 1500 ) < 1.6 coloured by the true LyC escape fraction.Systems with escape fractions greater than 20% are enlarged.A selection has been made (red, given by Equation3) to produce a sample of highly enriched leakers, 67% of which have  esc > 5%.Those selected by this set of criteria account for 62% of the total population of such systems in our sample.

Figure 15 .
Figure 15.Histogram of predicted  esc as a function of true  esc for our entire sample, as estimated by the generalized linear model given by Equation 4 with parameters from Table 1.Points are coloured by the number of sightlines in each bin.The one-to-one relation is shown in red.

Table 1 .
Coefficients and constants required to solve Equation 4 to quantitatively predict the value of  esc from an observed spectrum.