Elevated ionizing photon production efficiency in faint high-equivalent-width Lyman-alpha emitters

While low-luminosity galaxies dominate number counts at all redshifts, their contribution to cosmic Reionization is poorly understood due to a lack of knowledge of their physical properties. We isolate a sample of 35 z~4-5 continuum-faint Lyman-alpha emitters from deep VLT/MUSE spectroscopy and directly measure their Halpha emission using stacked Spitzer/IRAC Ch. 1 photometry. Based on Hubble Space Telescope imaging, we determine that the average UV continuum magnitude is fainter than -16 (~0.01 L_star), implying a median Lyman-alpha equivalent width of 249 Angstroms. By combining the Halpha measurement with the UV magnitude we determine the ionizing photon production efficiency, xi_ion, a first for such faint galaxies. The measurement of log (xi_ion [Hz/erg]) = 26.28 (+0.28; -0.40) is in excess of literature measurements of both continuum- and emission line-selected samples, implying a more efficient production of ionizing photons in these lower-luminosity, Lyman-alpha-selected systems. We conclude that this elevated efficiency can be explained by stellar populations with metallicities between 4e-4 and 0.008, with light-weighted ages less than 3 Myr.


INTRODUCTION
Although recent observations have provided ever more certainty about the timing and duration of the last significant phase transition in the Universe, cosmic Reionization (e.g Planck Collaboration et al. 2018;Bañados et al. 2018), much faint galaxies are numerous at all redshifts (e.g. Atek et al. 2018), few direct observational (spectroscopic) constraints on the efficiency of their ionizing photon production exist. This difficulty is primarily due to the observability of spectral features, particularly those in the rest-frame optical: ground-based spectroscopy can only cover features such as H α in the near-IR until z ≈ 2.8. H α in particular is crucial to understand the contribution of galaxies to Reionization as it is directly related to the intrinsic production rate of ionizing photons (compared to the resonantly-scattered Lymanα). The ratio of the H α flux to the flux of non-ionizing (UV) photons, when combined with the escape fraction of ionizing photons and the number density of galaxies, can be used to determine the total production rate of ionizing photons in the early Universe.
Although detecting H α is currently not possible with traditional spectroscopy for most systems at z > 2.8 (and detections of H β are often infeasible due to its faintness in low-luminosity galaxies), photometric techniques have been developed to measure H α and other rest-frame-optical emission lines in longer wavelength imaging data. For example, the existence of bright, high-equivalent width (EW) optical emission lines in z 4 galaxies is inferred by measuring strong excesses in broad-band Spitzer /IRAC Ch.1 (3.6 µm) and/or Ch. 2 (4.5 µm) photometry (e.g. Shim et al. 2011;González et al. 2012;Labbé et al. 2013;Smit et al. 2014;Roberts-Borsani et al. 2016;Rasappu et al. 2016). These studies have demonstrated that the typical rest-frame EWs of [O iii] and H α in ∼ L galaxies at high z often exceed 300Å. Moreover, the highest-EW sources at z > 7 have ionizing photon production efficiencies that are a factor of 2.5 higher than the "canonical" value, implying a diversity in the ionization properties of the full galaxy population and that strong line emitters could be important contributors to cosmic Reionization (Matthee et al. 2017b;Stark et al. 2017).
The vast majority of the total galaxy population, namely those at sub-L UV luminosities, are much more difficult to understand at these redshifts. The low spatial resolution (FWHM > 1.5 arcsecond) and broad filter widths (> 6000Å) make shallow IRAC spectro-photometry for individual objects useful only when they are isolated and have relatively bright emission lines. Photometric stacking, though, can be used in order to detect fainter emission lines so long as there are no nearby contaminating sources, or those sources can be accurately modeled. As shown in Lam et al. (2019), this work can be extended to faint (M UV > −18; 0.05 L at z = 4) galaxies when using MUSE spectroscopy and ultradeep Spitzer /IRAC imaging with new techniques for modeling contamination from nearby sources (Labbé et al. 2015).
Moving to even fainter sources is necessary to fully understand the "budget" of ionizing photons in the early Universe. Sources fainter than M UV ≈ −17 do not appear in the deepest broad-band imaging taken with Hubble unless they have been gravitationally lensed. Yet, an abundant population of sources with M UV ≈ −15 (0.01 L ) at z ≈ 2.9 − 6.7 have recently been discovered spectroscopically via Lymanα emission in un-targeted surveys with the MUSE spectrograph (Bacon et al. 2017;Maseda et al. 2018). Their bright Lyman-α emission and faint UV continuum (detected only in stacks) implies that they have extreme Lyman-α EWs, in excess of 150Å. Their number density is consistent with a simple extrapolation from the general population of starforming Lyman-α-emitters at these redshifts. Their high EWs can only be produced in stellar populations with ages less than 10 Myr and metallicities less than a few per cent Z (e.g. Raiter et al. 2010;Hashimoto et al. 2017a, cf. Neufeld 1991, physical conditions which are also conducive to efficiently producing ionizing photons. Here we aim to study the ionizing photon production efficiency in these high-EW Lyman-α emitters (LAEs) by using stacked Spitzer /IRAC photometry to measure the H α emission. We can put these results into context by comparing to the numerous studies at similar redshifts that have probed more luminous galaxies: LAEs (e.g. Harikane et al. 2018;Lam et al. 2019), H α emitters (HAEs; Matthee et al. 2017a), and continuum-dropouts (e.g. Bouwens et al. 2016). Our values will also be compared to larger samples of "typical" ≈ 0.3−3 L star forming galaxies at z ≈ 2 (e.g. Shivaei et al. 2018), and theoretical stellar population models for the evolution of ξ ion with physical properties.
This article is organized as follows. In Section 2 we describe the MUSE and IRAC datasets used for this study. In Section 3 we discuss how the data are used to determine the H α luminosity and hence ξ ion , with a discussion of these results given in Section 4. Finally, in Section 5 we summarize the primary results and give an outlook for future studies. We adopt a flat ΛCDM cosmology (Ω m = 0.3, Ω Λ = 0.7, and H 0 = 70 km s −1 Mpc −1 ) and AB magnitudes (Oke 1974) throughout.

DATA
Using data from the MUSE spectrograph (Bacon et al. 2010) on the Very Large Telescope, we select high-EW LAEs from the MUSE UDF survey (Bacon et al. 2017;Inami et al. 2017). This survey covers approximately 9 square arcminutes to a depth of 10 to 30 hours at optical wavelengths (4750 − 9300Å). The un-targeted nature of MUSE spectroscopy means that objects do not need to be pre-selected based on their continuum properties, and we can search the data cubes for spectral features such as emission lines without requiring the detection of a continuum counterpart. From these catalogs, we only consider LAEs in the range 3.829 < z < 4.955 with confident redshifts (i.e. CONFID of 2 or greater), where H α emission lies within the IRAC Ch. 1 bandpass.
In order to select a sample of high-EW LAEs, we must select not only on the Lyman-α flux but also on the UV continuum level. Using the aperture flux measurements from Maseda et al. (2018), based on the combined 11-band Illingworth et al. (2013) and Oesch et al. (2018) reductions, we select only MUSE LAEs from the Inami et al. (2017) catalog that have no >3-σ HST detections in bands redwards of Lyman-α. To wit, objects detected only in the HST band that contains Lyman-α are still included in the sample. This results in a sample of 41 objects. This selection differs from that presented in Lam et al. (2019), which is also based on Lyman-α-emitters from the MUSE UDF survey, as they require continuum detections for individual objects. The threshold of 3-σ compared with 5-σ as in Maseda et al. (2018) is to restrict ourselves to only the highest-EW sources by limiting the acceptable detection limit in the UV contin-uum (the average F850LP 3-σ magnitude limit in the field is approximately 30.0; Illingworth et al. 2013). We note that the qualitative conclusions of this work are independent of this choice.
A key addition to the HST photometry is near-infrared observations with Spitzer /IRAC. Here we utilize the combination of data from the "GOODS Re-ionization Era wide-Area Treasury from Spitzer " (GREATS, PI: I. Labbé; M. Stefanon, in prep.) as well as all previous IRAC observations taken over the larger area around the UDF (Labbé et al. 2015;Lam et al. 2019). The average exposure time of the IRAC Ch. 1 imaging is 167 hours (26.6 magnitude using the photometric procedure of Labbé et al. 2015, 5-σ), with some regions in the UDF receiving 278 hours of coverage. In IRAC Ch. 2 the exposure time is similar (average 139 hours, deepest 264 hours, 26.5 magnitude).
From our candidate sample of 41, we manually remove 6 objects with severely contaminated IRAC photometry, and those sources with significant residual flux in the cleaned images (see Section 2.1). This leaves a final sample of 35 high-EW LAEs with a median redshift of z = 4.52, for which HST/ACS F775W cutouts are shown in Figure 1. This represents 19 per cent of all LAEs in this redshift range from the MUSE UDF catalogs (Inami et al. 2017, and Bacon et al. in prep.). Based on the UV continuum slope measured for these galaxies described in Section 3.4 and the median Lyman-α flux from the PSF-weighted MUSE spectra (which does not include extended emission; Inami et al. 2017), the average rest-frame Lyman-α EW of the sample is 259Å 1 .

De-blended IRAC photometry
The large spatial point spread function (PSF) of IRAC (FWHM ≈ 2 arcsec) presents a challenge when dealing with compact sources in crowded fields. This is especially true in deep imaging data, where the wings of the PSF from neighboring sources often overlap. To correct for this effect, a higher spatial resolution image can be combined with a model PSF in order to de-blend the photometry in a crowded field. In this case, we use HST F850LP imaging as the highresolution prior image and follow the procedure of Labbé et al. (2015) and Lam et al. (2019). We create a model IRAC flux distribution for all continuum-detected sources in the field, which we then subtract from the IRAC data. We are therefore left with a residual image that should only contain flux from the primary (continuum-undetected) source All 12 × 12 arcsecond cutouts are visually inspected for defects, which can be caused by e.g. poorly-modeled bright sources or can occur in exceptionally crowded fields. Any object that is deemed to have contaminated photometry due to model residuals is removed from the sample. We note that these contamination effects are independent of the intrinsic properties of the primary source. The resulting "clean" sample has residuals in the range of 0.01 to 0.3 nJy (cf. the mean from the "contaminated" sources of 0.6 nJy), determined via the standard deviation of pixel values within an annular region centred on the source and extending from 1.5 to 3 arcseconds.

DETERMINATION OF ξ ION
The three ingredients required to measure ξ ion are the flux of H α (a proxy for the intrinsic ionizing photon production rate), the flux of the UV, and the escape fraction of ionizing photons: where Q(H 0 ) is the intrinsic rate of ionizing photons with units of s −1 and the intrinsic UV luminosity, L UV,int , has units of erg s −1 Hz −1 . Below we outline the determinations of each of these quantities from our data.

H α from IRAC stacking
We create a three-dimensional array of the cleaned IRAC cutouts for the LAEs, each of which are centered on the peak of the Lyman-α flux. We calculate the mean of the array at each spatial pixel, incorporating a sigma-clipping procedure with a threshold of 2-σ in order to ensure that individual bright pixels do not dominate the mean. This "stacking" procedure results in a two-dimensional image, representing an average of the input sources. As the IRAC PSF varies across the field due to the combined nature of the GREATS dataset (see Section 3.2 of Labbé et al. 2015), we also create a 3D array of the local PSFs at the position of each LAE. This stack of PSFs is combined in the same way as the stack of the science data arrays, using the same mask derived from sigma-clipping. We fit this PSF to the stacked image to determine the total flux, under the assumption that our sources are unresolved compared to the 2 arcsecond IRAC PSF (cf. the HST images). The fit includes uncertainties in the flux determined from stacking the IRAC noise images in the same way as the data. This stacking procedure is used to measure the photometry and create the images shown in Figure 2.
For the remainder of this work, we deal with the distribution of the output IRAC Ch. 1 and Ch. 2 fluxes using random subsets of the 35 individual LAEs. Namely, we create a series of 10,000 data arrays made up of a random set of 35 LAEs, where we sample the full set with replacement. Each of these arrays is analyzed as above to determine flux in Ch. 1 and Ch. 2. This bootstrap procedure allows us to measure the distribution of H α fluxes within the sample while taking into account photometric uncertainties.
We determine the H α fluxes for each of the 10,000 bootstrap iterations by subtracting the Ch. 2 flux from the Ch. 1 flux, assuming Ch. 2 probes the underlying stellar continuum (see Section 3.1.1). We attribute all of the excess flux in Ch.1 to Hα emission: in the models of Gutkin et al. (2016), for metallicities 10 per cent Z the ratio of [N ii] ([S ii]) to H α never exceeds 0.01 (0.1) over a wide range in physical ISM conditions. While a detailed determination of the metallicities of these systems is beyond the scope of this paper and will require additional spectroscopic data, we stress that only models of normal stellar populations with extremely young ages (< 3 Myr) and/or low metallicities (Z < 0.008; 0.4 Z ) can reproduce such large Lyman-α  EWs (Hashimoto et al. 2017a,b, and Section 4). Similar conclusions are drawn by Trainor et al. (2016) for the faintest z ≈ 2.5 LAEs, where they have spectroscopic access to strong optical emission line ratios.

The continuum level around H α
In order to properly measure the emission line flux from a photometric broad-band measurement, we need to establish the local spectral continuum level. For the aforementioned studies that perform measurements of H α EW at high-z, the continuum level at the position of H α is typically established with photometric detections in the bands bluewards/redwards of H α, such as the K s -band or IRAC Ch. 2. In our case, the ground-based K s -band data (from zFOURGE; Straatman et al. 2016) are not deep enough for reliable detections of the continuum flux bluewards of H α.
As the IRAC Ch. 2 stacks are deeper and are subject to similar systematics as the Ch. 1 data, we use them to constrain the continuum redwards of H α directly. We assume a flat continuum slope in f ν (van der Wel et al. 2011). Any residual contamination, if present in both Ch. 1 and Ch. 2, would not bias our determination of the line flux (and hence ξ ion ) since this is determined via the difference between both measurements. Furthermore, residual contamination would lower the measured H α EW for a spectrum that is flat in f ν . We also assume no contribution to the Ch. 2 flux from emission lines such as [S iii], which is present for the z < 4.51 subset of LAEs. As shown in Lam et al. (2019), [S iii] λ9069 can be strong in LAEs, with EW values in excess of 100Å. Constraints on [S iii] do not currently exist for galaxies as faint as the LAEs probed here, but at metallicities below 0.3 Z the ratio of [S iii] to H α is predicted to be less than 0.05 (Charlot & Longhetti 2001). Furthermore, any contribution of [S iii] to the IRAC Ch. 2 photometry would mean that we are over-estimating the continuum level and hence under-estimating the strength of H α (and ξ ion ).
We determine a mean H α EW for the sample of 632Å. The 68 per cent confidence interval, weighted by the signalto-noise of each bootstrap measurement of the H α EW, is between 210 and 1600Å. The large range in EW measurements is primarily driven by the lower signal-to-noise in the IRAC Ch. 2 stacks, which is typically a factor of 1.9 lower than that derived in IRAC Ch. 1. In the case of measurements at low-EW (i.e. < 210Å), the typical signal-to-noise in the EW determination is 2.1.  the H α luminosity is 1.2 M yr −1 (0.4 − 2 M yr −1 ; 68 per cent) when using the Murphy et al. (2011) conversion. This is larger than the implied UV-based SFR of 0.1 M yr −1 , likely due to the fact that H α emission probes star formation on shorter timescales than the UV, namely < 10 Myr, which is important given the implied young ages for these systems: see Section 4.

UV luminosity from HST stacking
Following the same procedure as for the IRAC stacks, for each bootstrap sample we create stacks of the HST images in order to determine M UV ; for more details on this procedure, see Maseda et al. (2018). For the majority of galaxies in our sample, ACS/F775W probes the rest-UV continuum directly. However, for the highest-z objects this band contains flux from Lyman-α: 3 of the 35 objects have Lyman-α at a position where the throughput of F775W is greater than 33 per cent. The major results of this paper are consistent within the errors (< 1-σ) when these higher-z sources are included. Throughout, we assume that F775W probes the UV continuum alone for our sample. Any potential contamination to this flux from Lyman-α emission would mean we are underestimating ξ ion . Additionally, an extrapolation assuming a UV continuum slope of -2.5 from the measured F850LP magnitude is consistent with the measured F775W flux (see Section 3.4). Such a continuum slope is expected for faint LAEs (Hashimoto et al. 2017a), and therefore this implies F775W is a good tracer of the rest-frame UV continuum. We convert the measured M UV into a luminosity assuming the median redshift for the LAEs in the bootstrap iteration.

Ionizing photon escape fraction
The escape fraction of ionizing photons relates the observed ionizing photon flux with the intrinsically-produced ionizing photon flux: For brevity, we will refer to f ion esc as f esc throughout the remainder of the text. This parameter is difficult to measure directly as Q(H 0 ) is not a readily-observable quantity. Studies such as Ono et al. (2010) suggest using SED fitting to the broadband photometry to estimate f esc indirectly. At z = 6 − 7 they find f esc to be consistent with zero and constrain it to be < 0.6. Using a similar method, Harikane et al. (2018) fit a relation to f esc versus EW Lyα , finding a value consistent with zero for rest-frame EWs in excess of 100Å.
We therefore assume f esc = 0 throughout, but note that a higher value of f esc will result in a larger ξ ion .

Dust attenuation
The true amount of dust extinction in these systems is difficult to establish with SED fitting as even in stacked images we only observe the rest-UV portion of the galaxies in addition to the H α emission. Many studies, particularly at these redshifts z ≈ 4 − 5, instead rely on a measurement of the UV continuum slope β and assume a dust law, as in Meurer et al. (1999). We measure a median β = -2.43 from power-law fits to the rest-UV photometry from each of the bootstrap iterations, with a 68% confidence interval from -3.00 to -2.04, from a power-law fit to the stacked HST photometry for the full stacked sample. We defer a detailed treatment of the β slopes of the full sample of high-EW LAEs to a forthcoming paper, but we would like to highlight that the intrinsic stellar UV continuum slope for systems with the highest Lyman-α EWs must be even bluer than the observed value due to the contribution of nebular continuum emission (Raiter et al. 2010).
For β < -2.23, Meurer et al. (1999) estimate zero dust correction. As our best-estimate of β (and 68 per cent of all bootstrap iterations) is below this, we determine that we do not need to include an additional term for dust attenuation when measuring line fluxes. Although our measurement is uncertain due to the intrinsic faintness of the sample, we a priori expect such a correction to be negligible based on results from (UV-brighter) LAEs at similar or higher redshifts (e.g. Stark et al. 2015;Hashimoto et al. 2017a;Harikane et al. 2018) and the result from Trainor et al. (2016) which shows an anti-correlation between nebular reddening and EW Lyα . Similarly, Tang et al. (2019) measure the dust attenuation via the Balmer decrement for high-EW [O iii]-and H α-emitters, finding negligible extinction for the highest-EW objects. Finally, such a result is also expected when considering the implied low stellar masses and gas-phase metallicities required to power the Lyman-α emission (Garn & Best 2010).

The sample distribution of ξ ion
For each bootstrap iteration, we convert the H α flux into a luminosity assuming the median redshift for the LAEs that contributed to the stack. We then convert the luminosity (without a dust correction, as explained above) into the ionizing photon production rate Q(H 0 ) assuming Case B recombination at a temperature of 10 4 K according to: as in e.g. Murphy et al. (2011). We can therefore calculate ξ ion according to the usual formula given in Equation 1. The weighted mean of the distribution from the bootstrap iterations assuming zero f esc is log 10 (ξ ion / Hz erg −1 ) = 26.28, with a 68 per cent confidence interval from 25.89 − 26.56, weighted by the individual uncertainties on each measurement.
In Figure 3, we show our estimate of ξ ion versus M UV , compared to literature samples of line-selected samples (HAEs and LAEs Matthee et al. 2017a;Harikane et al. 2018;Lam et al. 2019) and continuum-selected galaxies ) at similar redshifts, as well as "normal" M > 10 9 M , L ≈ 0.3−3 L star-forming galaxies at z ≈ 2 from Shivaei et al. (2018). We use their value derived from an SMC extinction curve for consistency with the other studies presented here. The bottom panel shows the relationship between the measured luminosities of H α and the rest-frame-UV, which are the two components that go into ξ ion ; any trend in ξ ion versus M UV could simply be because the UV luminosity goes into both the ordinate and the abcissa. While the information content in both panels is the same, the axes in the top panel are necessarily correlated.
While all of the LAE-and LBG-derived data points are within 2-σ (assuming the bin widths correspond to a measurement uncertainty on L UV ) of the Shivaei et al. (2018) relation or its extrapolation in either L Hα or L UV , the point derived here is more significantly above the relation for L Hα and L UV . Therefore, relative to their UV luminosity, the MUSE high-EW LAEs are more efficient at producing ionizing photons than the general population of more luminous LBGs and LAEs at these redshifts.

The Lyman-α escape fraction
As discussed in Section 3.3, it is difficult to directly measure f esc in these galaxies, although based on other samples we expect it to be low. We can, however, measure the escape fraction of Lyman-α photons:

DISCUSSION
Qualitatively, several studies have found a trend towards having higher ξ ion values in galaxies with bluer UV continua, a proxy for galaxies with more dominant young stellar populations (Duncan & Conselice 2015;Bouwens et al. 2015bBouwens et al. , 2016. However, Matthee et al. (2017a) show that this is a product of using the UV slope itself (or galaxy SEDs that predominantly trace the rest-UV) as a proxy for dust attenuation. This is also sensitive to the assumed dust model, as can be seen in Figure 4 of Shivaei et al. (2018) where using different dust laws can produce a marginally positive or a negative correlation between ξ ion and M UV . Nevertheless, Matthee et al. (2017a) find that ξ ion increases with decreasing UV luminosity, increasing specific star formation rate, and increasing H α EW in their sample of z ≈ 2 H α emitters.
In Figure 4 we compare the equivalent width of H α with the derived ξ ion value for the emission line-selected samples. In addition, we plot the fit to the relation from Tang et al.
(2019) derived from z ≈ 1−2 "extreme emission line galaxies" (EELGs), which are selected purely on the basis of high-EW optical emission lines ([O iii] and/or H α). These systems have gas-phase metallicities < 0.3 Z and mass doubling times on the order of 100 Myr (Maseda et al. 2013(Maseda et al. , 2014. While these physical properties are broadly similar to those of literature samples of LAEs at higher redshifts, the offset compared with the high-EW LAEs presented here argues for a difference in ξ ion at a fixed age. This is due to the fact that the EW of Balmer lines scales inversely with the age of the stellar population in these types of systems (e.g. van der Wel et al. 2011). Here and throughout, we specifically refer to the light-weighted age, which is dominated by the most recent generation of star formation: an older (> 1 Gyr) stellar population could contribute to the total mass of the galaxy but would not contribute significantly to the UV or H α luminosities. The highest-EW objects in the Tang et al. (2019) sample, with ages < 10 Myr, have the highest values of ξ ion . As these galaxies and the high-EW LAEs presented here have little to no dust attenuation (see Section 3.4), any correlation between ξ ion and age is not driven by using the rest-UV to correct for dust attenuation. At a fixed H α EW, our sample presents a larger value of ξ ion than the literature LAE/HAE results, implying a lower gas-phase metallicity. However, in all samples a trend with higher ξ ion at younger ages is found. Qualitatively, for a constant star formation history ξ ion is roughly constant for the first 1 Myr before monotonically declining until the population is approximately 1 Gyr old, . The error bars are not independent as a higher H α EW necessarily implies a higher ξ ion . The relationship between the EW of H α and ξ ion is consistent between the literature samples, with a strong trend to higher ξ ion at higher H α EWs. As the H α EW is inversely proportional to the age, this smooth trend suggests that younger ages are the main driver of elevated ξ ion . Our elevated value at fixed H α EW suggests that these LAEs have a lower (gas-phase) metallicity than other samples presented in the literature (see Figure 5).
when it plateaus again ( Figure 5). This is related to differences in the star-formation timescales probed by H α and the UV, and the effect is strongest between 1 and 10 Myr (or longer when binary stellar evolution is included in the models, e.g. Stanway et al. 2016). For an instantaneous burst of star formation, the value of ξ ion drops strongly after 1 Myr. At a fixed age (and hence UV luminosity) for both star formation histories, decreasing metallicity results in an increasing ξ ion or H α luminosity. So, while young ages result in higher ξ ion values for the (bursty) HAEs compared to the general population of star-forming galaxies, the observed offset in ξ ion at fixed H α EW (and hence fixed age) for the high-EW LAEs shown in Figure 4 suggests that the metallicities must be lower and/or f ion esc must be higher. Independent constraints on the metallicity from e.g. rest-optical spectroscopy will be required to determine the contribution of each.
In Figure 5, we show the predictions from the Raiter et al. (2010) stellar population models for the evolution of the H α, Lyman-α, and UV luminosities at different metallicities for a constant SFR and an instantaneous burst (their "cs5" and "is5" models, respectively). These models use the stellar tracks, atmospheres, and prescriptions for nebular (line and continuum) emission from Schaerer (2003). For the nebular emission, they assume ionization-bounded nebulae with constant electron temperature and densities, and that all photons are absorbed inside the H ii regions (i.e. f ion esc = 0). We consider models with a Salpeter (1955) initial mass function, with a high-mass cutoff of 500 M .
At each age and for a given star formation history, the fixed-metallicity models predict our observable quantities: ξ ion , the EW of Lyman-α, and the UV continuum slope β. We can compare these predicted values to the posterior distributions derived from our bootstrapping procedure and derive a (relative) likelihood for each model. This grid of likelihoods is shown in Figure 6. While we do not have independent constraints on the star formation history, in both cases the most likely models have metallicities of 0.001 (4×10 −4 − 0.008) and ages less than 3 Myr. Lower metallicity models are also permitted with older ages, up to 300 Myr. Although the specific metallicity threshold is dependent on our choice of stellar population model and IMF (cf. Stanway et al. 2016), our median value of ξ ion is difficult to reproduce with any current set of models even when the high-mass cutoff of the IMF is set to 500 M (T. Nanayakkara in prep.). This could be related to the lack of very hard ionizing photons in these models which struggle to reproduce observations of high-z, low-metallicity galaxies (e.g. Nanayakkara et al. 2019;Kewley et al. 2019). Trainor et al. (2016) determine a maximum ξ ion value of 10 25.6 Hz erg −1 at 7 per cent Z from their sample of z ≈ 2.5 LAEs, some of which have Lyman-α EWs in excess of 100Å and have UV magnitudes fainter than −18. Based on their measurement of a high ionization parameter across their full sample, they conclude that age alone does not drive the elevated ξ ion that they measure and hence these galaxies should be producing ionizing photons in a steady state. Indeed, even at 100 Myr the models shown in Figure 5 with metallicities below 3 per cent Z are elevated with respect to the canonical value. Metallicity can indeed play a role in having an elevated ξ ion , as the extreme Lyman-α EWs require metal-poor (< 0.02 Z ) stellar populations with ages of 10 Myr or less ( Figure 5 and Ono et al. 2010;Hashimoto et al. 2017a).
The EW of Lyman-α is observed to scale with f Lyα esc (which is correlated with f ion esc , albeit with large scatter Dijkstra et al. 2016), and for our sample of high-EW LAEs we would expect to have f Lyα esc values close to 1 (Sobral & Matthee 2019). However, we do not observe such large values of f Lyα esc : as shown in Section 3.6, the mean value is 0.217 assuming Case B recombination (cf. Raiter et al. 2010). This matches the value of 0.3 found in Trainor et al. (2015) despite that sample having a mean Lyman-α EW of only 43Å. Interestingly, recent observations of an EELG at z ≈ 1.8 by Erb et al. (2019) find f Lyα esc to be 0.097, which is also low for its Lyman-α EW. Jaskot et al. (2019) do not find a strong correlation between H α EW, Lyman-α EW, and f Lyα esc in "green pea galaxies," which have extreme ionization fields that are plausibly similar to the galaxies presented here. They note that the highest H α EWs reflect high intrinsic Lyman-α production and hence a large escape fraction is not required to produce the observed EWs of Lyman-α. In fact, the Sobral & Matthee (2019) Raiter et al. (2010) models to reproduce our observations as a function of age and metallicity for a constant star formation history (left) and an instantaneous burst (right). As each combination of metallicity, age, and star formation history produces an estimate for each of our three measured quantities (ξ ion , Lyman-α EW, and β; Figure 5), we can calculate a likelihood for that combination by comparing the predictions to our posterior probability distributions for each quantity. The total likelihood is then the product of the three individual probabilities, scaled to a maximum of 1. As shown by the colored contours, the most likely models have metallicities of 4×10 −4 − 0.008 and ages less than 3 Myr, regardless of the star formation history. We cannot, however, conclusively rule out more metal-poor stellar populations with older ages (> 10 Myr for constant star formation or > 3 Myr for an instantaneous burst).
values measured here (259Å; median). In addition, Smith et al. (2019) find that the escape of Lyman-α photons lags behind the star formation activity by several tens of Myrs, hence our selection of the youngest star formation episodes via high-EW Lyman-α could preferentially select periods of relatively low escape.

Alternatives to Star Formation
So far we have only considered star formation as the source of ionizing photons in these systems. Overall, with the current data it is difficult to determine what is the main source of ionizing photons inside these galaxies. As previously mentioned, existing models of normal stellar populations can generally reproduce our observables (i.e. Figure 5). Similar conclusions are drawn by Trainor et al. (2016) for z ≈ 3 LAEs, where they use strong optical emission line ratios to demonstrate that star-formation is the dominant source of ionizing photons. In addition, we have demonstrated in Maseda et al. (2018) that the undetected sample plausibly represent the high-EW extension of the distribution of LAEs at these redshifts, a population which is not dominated by AGN. Thus, we do not believe that AGN are the only or even the dominant contributor to these systems. However, we cannot conclusively rule out some contribution, either as the primary source in a subset of objects or a secondary source in all objects. Further spectroscopy is required in both the restframe-UV and restframe-optical.

OUTLOOK AND CONCLUSIONS
In this work, we have identified a population of 35 z ≈ 4 − 5 high-EW LAEs using deep MUSE spectroscopic data in the UDF, with a median Lyman-α EW of 249Å. By combining this with ultra-deep Spitzer /IRAC photometry, we have measured H α emission with an EW of 632Å (210 − 1600 A; 68 per cent) from these systems, and used it to calculate the ionizing photon production efficiency, ξ ion (Equation 1). Our primary conclusions are as follows: • Using Spitzer /IRAC photometry to ≈ 200 hour depth and sophisticated techniques for source deblending, we detect H α emission in stacked data for objects with intrinsic UV magnitudes fainter than −16 (Figure 2).
• The mean value of ξ ion for these high-EW LAEs is 10 26.28 Hz erg −1 for an escape fraction of zero, with a 68 per cent confidence interval from 10 25.89 − 10 26.56 Hz erg −1 (Section 3.5). At a UV magnitude of −15.7, this value is a factor of 8.7 in excess of the "canonical" value from literature studies of emission line-selected and continuum-selected samples at similar and higher redshifts, and also higher than the most extreme values for z ≈ 7 galaxies from Stark et al. (2017) or local galaxies from Chevallard et al. (2018).
• While the values of ξ ion from the literature are consistent with a constant value at all M UV (e.g. Lam et al. 2019), our observation lies significantly above this relation ( Figure  3). This naturally follows from selecting objects with bright Lyman-α luminosities (and hence ionizing photon production rate) compared to the UV continuum, i.e. selecting on the EW of Lyman-α.
• Based on the observed trend between H α EW and ξ ion , as well as models of the evolution of ξ ion from stellar population synthesis, we determine that an elevated H α luminosity compared with the UV luminosity is likely a natural consequence of a gas-phase metallicity between 4×10 −4 and 0.008 and an age younger than 3 Myr, regardless of the star formation history (Figures 5 and 6).
Typical galaxies selected via the Lyman-break technique (e.g. Bouwens et al. 2017) require detections in the rest-UV, and hence are biased against the youngest (< 10 Myr) galaxies which necessarily have not produced much UV flux. This is true for LAE selections as well (e.g. Lam et al. 2019), even in samples derived from narrow-band imaging (e.g. Harikane et al. 2018), as the relative depth of the emission line detection threshold with respect to the continuum detection threshold prevents these studies from finding the youngest systems. Our MUSE Lyman-α selection, on the other hand, is closer to an H α selection considering it is also a measurement of (emergent) ionizing photons; by requiring non-detections in the rest-frame-UV, we preferentially select the youngest systems. With a selection based on maximal line emission and minimal continuum, it is logical that we have found a population of galaxies with elevated values of ξ ion compared with older, more massive galaxies.
We expect that observing continuum-faint LAEs with lower line EWs would result in a lower determination of ξ ion as these systems could have older stellar populations, higher metallicities, or both. Indeed, lower-z observations of ξ ion show significant scatter at low luminosities and stellar masses (M. Paalvast et al. submitted), which is potentially due to a distribution in the ages, star-formation histories, and metallicities in this regime. These galaxies are currently unobservable in emission at high-z, but it is possible to measure the product of f esc and ξ ion for systems detected in absorption down to M UV ≈ −16 (Meyer et al. 2019) which could select a population of galaxies that are distinct from LAEs.
Various hydrodynamical simulations predict that the star formation histories of low-mass galaxies in the early Universe is episodic in nature, with periods of star formation followed by more quiescent phases (e.g. Shen et al. 2014;Muratov et al. 2015). Although the precise duty cycle of these star formation episodes is unknown and could vary with properties such as the galaxy halo mass (van der Wel et al. 2011), a single galaxy could undergo multiple episodes of efficient ionizing photon production without significantly building up its stellar mass (e.g. Domínguez et al. 2015). Although a phase with extreme Lyman-α EW may not occur during the lifetime of every galaxy, depending on their star formation and metal enrichment histories, all galaxies should produce an excess of ionizing photons at young ages for several Myrs. Further work is required to fully characterize the duty cycle of intense star-formation episodes at high-z, but the steep faint-end slope of the UV luminosity function and the observed high number density of high-EW LAEs implies that a significant number of such events should be taking place across cosmic time: at z = 4 − 5, high-EW LAEs as selected here represent 6.5 per cent of the full galaxy population at a UV magnitude of −16, based on the Bouwens et al. (2015a) luminosity functions. Such a large cumulative number of these episodes could have a significant contribution to cosmic Reionization, especially as LAEs have higher values of f ion esc compared to continuum-selected samples (e.g. Trainor et al. 2015;Marchi et al. 2018) and have number densities at z > 6 that are high enough that they can plausibly contribute significantly to Reionization (Drake et al. 2017). As stated above, understanding the distribution of Lyman-α EWs and the relation between LAEs and the general galaxy population at these redshifts is critical, along with the duty cycle, to fully understand the potential contribution of these faint sources to Reionization.
A detailed discussion of the implications for Reionization is beyond the scope of this work. Many studies combine the measured UV luminosity density ρ UV , derived from UV luminosity functions, with ξ ion and f ion esc in order to infer the ionizing emissivity N ion (e.g. Bouwens et al. 2015b). As we show in Figure 3, ξ ion can vary with the UV luminosity (e.g. Duncan & Conselice 2015) and the dust content or metallicity of the galaxy (Shivaei et al. 2018). Additionally, ρ UV is not well-determined down to M UV < −15 as the majority of the observational constraints come from strong gravitational lensing where the associated systematic uncertainties are large (Bouwens et al. 2017;Atek et al. 2018). Finally, the relationship between LAEs and the general population of galaxies, particularly at these extremely low luminosities, is unclear (e.g. Mesinger et al. 2015).
While these observations are the first step in understanding the ionizing photon production efficiency in such extreme systems, our method necessarily determines the average properties: individual objects could differ significantly. Similar high-EW LAEs at z < 4 could be studied from the ground in the near-IR with detections of H β in extremely long integrations with current instruments, or with future near-IR capabilities provided by the next generation of Extremely Large Telescopes. With the advent of new spacebased facilities such as JWST, however, these measurements can be done out to higher redshifts using the brighter H α emission line. Combined with deep rest-UV imaging and spectroscopy to potentially measure f ion esc , we will obtain a more complete picture of the production and escape of ionizing photons from the abundant low-luminosity population of galaxies in the early Universe.