The NIRVANDELS Survey: a robust detection of $\alpha$-enhancement in star-forming galaxies at $z\simeq3.4$

We present results from the NIRVANDELS survey investigating the gas-phase metallicity ($\mathrm{Z}_{\mathrm{gas}}$, tracing O/H) and stellar metallicity ($Z_{\star}$, tracing Fe/H) of 33 star-forming galaxies at redshifts $2.95<z<3.80$. Based on a combined analysis of deep optical and near-IR spectra, tracing the rest-frame far ultraviolet and rest-frame optical respectively, we present the first simultaneous determination of the stellar and gas-phase mass-metallicity relationships (MZRs) at $z\simeq3.4$. In both cases, we find that metallicity increases with increasing stellar mass ($M_{\star}$), and that the power-law slope at $M_{\star} \lesssim 10^{10} \mathrm{M}_{\odot}$ of both MZRs scales as $Z \propto M_{\star}^{0.3}$. Comparing the stellar and gas-phase MZRs, we present direct evidence for super-solar O/Fe ratios (i.e., $\alpha$-enhancement) at $z>3$, finding $\mathrm{(O/Fe)}\simeq (2.54 \pm 0.38) \times \mathrm{(O/Fe)}_{\odot}$, with no clear dependence on $M_{\star}$.


INTRODUCTION
The metal content of galaxies is affected by past and current star formation, gas accretion and galactic winds, and therefore constrains all aspects of the cosmic baryon cycle. Of particular interest is the evolution of galaxy metallicity with cosmic time, and the scaling relations between metallicity and other galaxy properties, most notably stellar mass ( ★ ). Accurately determined, these relations and their redshift evolution serve as powerful tests of theoretical models of galaxy evolution (Maiolino & Mannucci 2019). Current constraints on galaxy metallicities primarily come from observations of strong nebular emission lines emitted at rest-frame optical wavelengths ( 3500 − 7000Å) . When measured from integrated spectra of star-forming galaxies, the ratios of these emission lines are sensitive to the gas-phase oxygen abundance (O/H) of galactic H regions (Kewley et al. 2019). Using this technique, gas-phase metallicities ( g ) have been measured for sizable samples of galaxies from the local Universe out to 4 (e.g. Tremonti et al. 2004;Sanders et al. 2020a). Based on these observations, it is now wellestablished that g is primarily correlated with ★ at all redshifts (i.e., the mass-metallicity relationship, or MZR) with strong evidence for a secondary dependence on star-formation rate (SFR) or molecular gas fraction (i.e., the 'fundamental metallicity relationship', or FMR; Mannucci et al. 2010).
An alternative approach to measuring galaxy metallicities at > 2 utilizes observations of the stellar continuum at far-ultraviolet wavelengths (FUV; 1000 − 2000Å). In contrast to nebular emission-line ★ E-mail: fc@roe.ac.uk measurements, these estimates of the stellar metallicity ( ★ ) are sensitive to the photospheric iron abundance (Fe/H) of O-and B-type stars within galaxies (Leitherer et al. 2010). Early efforts to determine stellar metallicities from FUV spectra at high redshift were limited by small samples sizes (e.g., Halliday et al. 2008;Sommariva et al. 2012). However, in Cullen et al. (2019), building upon this early work, we utilized the large number of ultra-deep rest-frame FUV spectra provided by the VANDELS survey (McLure et al. 2018) to publish the first determination of the stellar mass-metallicity relationship at > 2. We found that the stellar mass-metallicity relationship at 3.5 has a similar shape but lower overall normalization when compared to the local relation, mirroring the redshift evolution observed in the gas-phase relation (see also Calabrò et al. 2021).
Access to deep rest-frame FUV and optical spectra at > 2 (from ground and space-based optical and near-IR spectroscopy respectively), offers a unique opportunity to move beyond single-element abundances, enabling the study of abundance ratios at early cosmic epochs. Specifically, the combined analysis of FUV+optical spectra allows for the simultaneous determination of g and ★ , tracing the ratio of oxygen to iron (O/Fe) in young stars and the surrounding ISM (Steidel et al. 2016). The O/Fe ratio is of interest because it is a sensitive probe of the star-formation and chemical enrichment history of galaxies. Typical O/Fe ratios at high redshift are expected to be enhanced relative to the solar value due to the fact that the element abundance ratios in relatively young star-forming systems will be dominated by core-collapse supernova (CCSNe) yields (Maiolino & Mannucci 2019). Indeed, this result has already been reported in the literature. Based on a simultaneous rest-frame FUV + optical analysis of 30 star-forming galaxies at 2.4, Steidel et al. (2016) found O/Fe ≈ 4 − 5 × O/Fe . Similar levels of O/Fe-enhancement have been reported for individual galaxies at 2.3 in more recent studies (Topping et al. 2020a,b).
The observed O/Fe enhancement (also referred to asenhancement) has a number of important implications. Firstly, it places robust constraints on the typical star-formation histories at early cosmic epochs, confirming results from previous photometric analyses that the stellar populations at high redshift are typically < 1 Gyr old (Reddy et al. 2012). Secondly, a deficit of Fe relative to O will mean that the stellar ionizing spectrum of typical high-redshift galaxies is harder at fixed oxygen abundance compared to galaxies in the local Universe. This relative hardening of the ionizing continuum at fixed O/H offers a natural explanation for the observed offset of > 2 star-forming galaxies relative to 0 galaxies in the common line ratio diagrams (e.g., the BPT diagram; Topping et al. 2020a;Runco et al. 2021). Non-solar /Fe ratios will also force us to re-think current stellar population techniques when applied to high-redshift galaxies. Almost universally, current stellar population synthesis models assume solar abundance ratios; accurate analyses of FUV-optical spectra at > 6 with JWST will require new models allowing for non-solar abundance ratios.
In this paper, we expand upon a number of previous works at 2.5 (Steidel et al. 2016;Topping et al. 2020a,b) and present a simultaneous analysis of FUV and optical spectra for a sample of 33 star-forming galaxies drawn from the VANDELS survey at 3.4. Combining ultra-deep optical VIMOS/VANDELS spectroscopic observations (tracing the rest-frame FUV) with MOSFIRE − and −band near-IR follow-up (tracing the rest-frame optical), our analysis provides the first investigation of g and ★ for galaxies at > 3, and we present the first estimates of O/Fe at these redshifts. In addition, we present both the stellar and gas-phase MZRs for our sample, tracing O/Fe as a function of stellar mass.
The structure of this paper is as follows. In Section 2 we discuss our combined VANDELS+MOSFIRE spectroscopic dataset and describe our final 3.4 galaxy sample (referred to throughout the rest of this paper as the NIRVANDELS sample). In Section 3 we outline the methods used to determine ★ and g from the rest-frame FUV and optical spectroscopy respectively. In Section 4 we present our determination of the stellar and gas-phase MZRs at > 3 along with an estimate of the typical O/Fe ratios of our sample. In Section 5 we discuss some of the implications of our results before summarizing our main conclusions in Section 6. Throughout this paper, metallicities are quoted relative to the solar abundance taken from Asplund et al. (2009) which has a bulk composition by mass of * = 0.0142. We assume the following cosmology: Ω = 0.3, Ω Λ = 0.7, 0 = 70 km s −1 Mpc −1 .

Rest-frame UV VANDELS sample and observations
The star-forming galaxy sample presented in this paper was initially drawn from the VANDELS ESO public spectroscopic survey (Pentericci et al. 2018;McLure et al. 2018). VANDELS is an ultra-deep, optical, spectroscopic survey of the CANDELS CDFS and UDS fields (Grogin et al. 2011;Koekemoer et al. 2011) undertaken using the VIMOS spectrograph on ESO's Very Large Telescope (VLT). The three categories of VANDELS targets were massive passive galaxies at 1.0 ≤ ≤ 2.5, bright star-forming galaxies at 2.4 ≤ ≤ 5.5 and fainter star-forming galaxies at 3.0 ≤ ≤ 7.0, with the main focus being star-forming galaxies at > 2.4 (85% of targets). Observations were obtained using the VIMOS medium-resolution grism covering the wavelength range 4800Å < obs < 10000 Å at a resolution of = 580 (with 1.0 slits) and a dispersion of 2.5 Å per pixel. At the redshifts of interest for our study (2.95 ≤ ≤ 3.80) the VIMOS spectra cover rest-frame UV emission at 1000 − 2000Å, a wavelength range sensitive to various continuum and emission-line features that trace the properties of young, massive stellar populations in star-forming galaxies (Cullen et al. 2019(Cullen et al. , 2020Calabrò et al. 2021). The observations and reduction of the VIMOS spectra are described in detail in the VANDELS data release papers (Pentericci et al. 2018;Garilli et al. 2021). The selection of VANDELS targets for near-IR follow-up with MOSFIRE is described below.

Rest-frame Optical MOSFIRE sample and observations
In order to characterize simultaneously the properties of massive stars and ionized gas in 3.4 star-forming galaxies, we selected a sample of galaxies from the VANDELS survey for near-infrared spectroscopic follow-up observations with the Multi-object Spectrometer for Infrared Exploration (MOSFIRE; McLean et al. 2012) on the Keck I telescope. The requirement for strong rest-frame optical features to fall within windows of atmospheric transmission translates into discrete allowed redshift ranges for targets for ground-based near-infrared spectroscopic follow-up, including 2.95 ≤ ≤ 3.80 for ∼ 3 targets and 2.09 ≤ ≤ 2.61 for those at ∼ 2. For most of our MOSFIRE mask design (i.e., for masks observed in November 2019), we prioritized VANDELS galaxies with AB ≤ 25.5, and robustly measured redshifts (e.g., characterized by redshift flags 3, 4, 9, and 14, as defined in Pentericci et al. 2018) at 2.95 ≤ ≤ 3.80. Slightly higher priority was given to sources with measured C ] 1907, 1909 rest-UV nebular emission. VANDELS sources with robustly measured redshifts (same redshift flags as above) at 2.09 ≤ ≤ 2.61 were targeted with lower priority, and, finally, VANDELS sources with AB > 25.5 and either 2.95 ≤ ≤ 3.80 or 2.09 ≤ ≤ 2.61 were considered the lowest priority. A slightly different priority scheme was used for the gs_al1 mask, which was designed a year earlier, prior to the completion of the final VANDELS redshift measurements. Accordingly, for this mask, galaxies at AB ≤ 25.5 and robustly measured redshifts (again, as defined above) at either 2.95 ≤ ≤ 3.80 or 2.09 ≤ ≤ 2.61 were prioritized, followed by VANDELS targets with redshifts yet to be measured. In attempting to optimize the number of VANDELS sources per MOSFIRE pointing that satisfied the above criteria, we found that the best mask configurations contained 15-20 such targets.
We obtained Keck/MOSFIRE -, and -band rest-frame optical spectra for the selected VANDELS targets in the GOODS-S and UDS fields. (1.0 ), the typical seeing in our MOSFIRE observations (0.5 -0.6 ) was also better than the typical seeing for the VANDELS VIMOS observations (0.7 ).

3
within the three masks. The MOSFIRE observations are summarized in Table 1. We reduced the raw data to produce two-dimensional science and error spectra using the pipeline described in Kriek et al. (2015). We then optimally extracted one-dimensional science and error spectra from the two-dimensional spectra. Flux calibrations and slit-loss corrections for each filter were applied as described in Kriek et al. (2015) and Reddy et al. (2015). Of the 50 VANDELS sources targeted, we obtained rest-frame optical spectra that yielded robust, science-grade redshifts for 35 sources at 2.95 ≤ ≤ 3.8 and 10 sources at 2.09 ≤ ≤ 2.61. In this paper, we focus exclusively on the ≥ 2.95 sources.

Rest-frame optical emission line fluxes and redshifts
Measurements of rest-frame optical emission-line fluxes were obtained by fitting Gaussian profiles to the flux-calibrated onedimensional MOSFIRE spectra. The 1 uncertainty on each line flux was estimated by perturbing the spectra 500 times according to the error spectra, remeasuring the line flux, and taking the standard deviation of the resulting distribution. The absolute flux calibration of the MOSFIRE spectra is accurate to within 18% and the relative calibration between -, and -band filters is accurate to within 13% (Kriek et al. 2015). These line flux measurements therefore provide robust estimates of absolute line luminosity (e.g., for estimating SFRs) and cross-filter line ratios (e.g., [O ]/[O ]). The same error simulations were used to determine line centroids and errors, and the associated redshift and redshift uncertainty for each emission line. The individual line redshifts were combined using a weighted average to determine the final spectroscopic redshift and its associated uncertainty.

Stellar masses and star-formation rates
We estimated stellar masses using multiwavelength photometry from the VANDELS photometric catalogs (McLure et al. 2018), taking into consideration the fact that some photometric filters could be contaminated by rest-frame optical emission-line flux. At high redshift ( 2), the large equivalent widths of the rest-frame optical emission lines (e.g., typical rest-frame [O ] 5007 equivalent-width values of ≥ 300Å at 10 10 M ; Reddy et al. 2018) can contaminate broadband photometry and result in an overestimation of galaxy masses. In order to derive more accurate stellar masses, we first corrected the -, and -band photometry for the emission-line fluxes measured from the MOSFIRE spectra. To perform this correction we constructed a model emission-line-only spectrum for each galaxy based on the line fits described above. The flux contributed by emission lines to the -, and -band photometry was then determined by integrating the model over the appropriate filter profiles, and this flux was subtracted from the original photometry. Corrections to the -and -band photometry ranged from −0.3 < Δ /mag < 0.0 (median Δ = −0.07 mag) and −1.8 < Δ /mag < 0.0 (median Δ = −0.35 mag) respectively.
The emission-line corrected photometry was modeled using FAST++ 2 (Schreiber et al. 2018), an SED-fitting code closely based In fact, the VIMOS and MOSFIRE spectroscopic observations presented here probe comparable intrinsic regions of our target galaxies. on the original FAST software (Kriek et al. 2009). We adopted Conroy et al. (2009) flexible stellar population synthesis models and assumed solar metallicity 3 , a Chabrier (2003) initial mass function (IMF), constant star-formation histories and the Calzetti et al. (2000) dust attenuation curve. The redshift was fixed to the measured spectroscopic redshift. These SED-fitting parameters were chosen to facilitate a direct comparison between our results and the recent study of gas-phase metallicities at 3.3 by Sanders et al. (2020a). The resulting fits yielded an estimate of the galaxy stellar mass ( ★ ), star-formation rate (SFR SED ), stellar continuum dust attenuation E(B-V) ★ ) and a model of the stellar continuum. Using the stellar continuum model we corrected the H line fluxes for underlying stellar absorption resulting in a median increase of 3% to the original flux values.

Dust-correcting nebular emission line fluxes
In order to determine accurate optical line ratios for estimating gasphase metallicities, the observed line fluxes need to be corrected for nebular extinction. Ideally, the nebular dust correction is determined directly using the Balmer decrement. However, this requires the detection of both the H and H emission lines and is therefore not possible for our 3.4 sample. Instead, we used the best-fitting value 3 Although our analysis later in the paper (Section 4) places the -element abundance of our sample at 0.3 − 0.5 Z and the Fe/H abundance at 0.1−0.2 Z , adopting solar metallicity models is crucial for dust-correcting the nebular emission lines using the relation derived by Sanders et al. (2020a) (a different metallicity would bias the best-fitting continuum dust attenuation, see Secion 2.3.3). This metallicity assumption will inevitably introduce some bias into the stellar mass determination (see e.g., Theios et al.   of the stellar continuum attenuation to dust-correct the emission-line fluxes for each galaxy. We employed the calibration between stellar attenuation, SFR SED , redshift and nebular extinction described in Sanders et al. (2020a), given by: This calibration is based on observations of galaxies at 2.3 with both E(B-V) neb (measured via the Balmer decrement) and E(B-V) ★ , and yields an unbiased estimate of E(B-V) neb with an intrinsic scatter of 0.23 magnitudes and is directly applicable to our sample since we assume the same SED-fitting parameters used in the derivation of the calibration. Based on the estimate of E(B-V) neb given by equation 1, the observed emission line fluxes were corrected for reddening assuming the Cardelli et al. (1989) extinction law 4 .
To assess the reliability of the nebular dust correction we derived star-formation rates from the dust-corrected H line fluxes assuming an intrinsic ratio of H /H = 2.86 and applying the Hao et al. (2011) H -SFR conversion modified for a Chabrier (2003) IMF 5 . In Fig. 1 we show the resulting comparison between the H -derived SFRs and the original photometrically-derived SFR estimate for galaxies with a > 2 detection of the H line. The agreement is generally excellent, with a median offset of 0.06 dex in log(SFR) and a scatter of = 0.3 dex, where is derived from the median absolute deviation (MAD) ( = 1.4826 × MAD).

The effect of emission lines on SED-derived stellar masses
and star-formation rates at 3.4 In many instances, studies that rely on stellar mass estimates at 2.95 ≤ ≤ 3.8 do not have access to spectroscopic line flux measurements and as a result lack accurate corrections to the -and -band photometry. It is therefore interesting to assess the effect of contamination from nebular emission lines on the derived stellar masses and SFRs at these redshifts. In Fig. 2 we compare the values of ★ and SFR(SED) estimated from corrected and uncorrected photometry. If ★ and SFR are derived using uncorrected -and -band photometry we find that, on average, log( ★ ) is overestimated by 0.26 ± 0.16 dex and log(SFR) is underestimated by 0.11 ± 0.10 dex. These offsets are consistent with previous results in the literature. In particular, the overestimation of ★ is a well-known effect of the contamination of broad-band photometry by high-equivalent width rest-frame optical emission lines at high redshift (e.g., Schaerer et al. 2013;Amorín et al. 2015;Onodera et al. 2016). As can be seen from redshift star-forming galaxies closely follows the Cardelli et al. (1989) Fig. 2, the effect is strongest at low ★ where the equivalent widths are largest (e.g. Reddy et al. 2018). Indeed, this bias due to optical line contamination is expected to be ubiquitous at > 2 due to the known evolution towards larger emission-line equivalent widths as a result of increasing specific star-formation rates and lower metallicity (Mármol-Queraltó et al. 2016;Reddy et al. 2018). However, as can be seen from Fig. 2, unbiased estimates of ★ and SFR can be obtained by simply excluding the photometric bands that are known to include strong optical emission lines (in the case of 3 galaxies, the -and -bands). With the contaminated bands excluded, the offsets in log( ★ ) and log(SFR) are consistent with zero (−0.06 ± 0.13 dex and 0.05 ± 0.15 dex respectively). In our particular case, the fact that we can recover these properties reliably in the absence of photometric anchors at rest-frame optical wavelengths (between 3000Å−6000Å) is due to the data at longer wavelengths provided by the Spitzer IRAC 3.6 m and 4.5 m imaging (covering rest-frame wavelengths between 8000Å−10000Å). In the absence of accurate line flux corrections, and with photometric data redward of the -band, our results suggest that − for the SED fitting assumptions described above − simply excluding contaminated photometric bands from the SED-fitting process yields unbiased ★ and SFR estimates that are consistent with the emission line-corrected values to within 10 − 15%.

The z ∼ 3.4 NIRVANDELS sample
Our final sample was drawn from the 35 galaxies with spectroscopic redshifts in the range 3.0 ≤ ≤ 3.8 (Table 1). We identified one galaxy in which the presence of active galactic nuclei (AGN) ionization was indicated by strong emission from the high-ionization species N 1238, 1242, C 1548, 1550 and He 1640 in the rest-frame UV spectrum. This galaxy was removed from our sample. No further AGN were identified based on rest-frame UV and optical spectral properties. Furthermore, for the remaining sample, we ruled out the presence of significant AGN ionization based on their mid-IR SED shapes and X-ray properties (see McLure et al. 2018). We also removed one galaxy in which the only optical emission feature detected was [O ] 3726, 3729. There were two reasons for this decision. Firstly, it is not possible to determine a gas-phase metallicity from the [O ] 3726, 3729 doublet alone, and therefore this object could not be included in our individual galaxy analysis. Secondly, in our rest-frame optical stacking analysis (described in Section 2.4.1 below) we required the detection of the [O ] 5007 line in order to normalise the spectra, which was not available for this object.
Our final NIRVANDELS sample therefore consisted of 33 galaxies with secure spectroscopic redshifts in the range 2.95 ≤ ≤ 3.8. Each object in the sample has MOSFIRE spectra in the -and -bands covering a number of rest-frame optical emission lines sufficient for deriving gas-phase metallicity ( g ), and a VIMOS/VANDELS restframe FUV spectrum from which the stellar metallicity ( ★ ) can be determined.
In Fig. 3  2) plotted against stellar masses derived from the original, non-corrected, photometry in which the and bands are included (open red circles) and excluded (filled black circles). The bottom panel shows the same for SED-derived star-formation rates. In each panel the grey solid line is the oneto-one relation and median error bars are shown in the top left-hand corner. The mean, and standard deviation, of the offset from the one-to-one relation is indicated in the legend. The true, emission line corrected, values of ★ and SFR can be reasonably recovered by simply excluding the contaminated and bands are excluded from the fit. the NIRVANDELS galaxies compared to the full VANDELS sample of N = 791 star-forming galaxies in the redshift range 3.0 ≤ ≤ 3.8. The stellar masses of the full sample have been derived using the same SED fitting procedure described in Section 2.3.2. As the photometry for the full sample cannot be corrected for emission-line contamination, we have excluded the -and -band photometry from these fits for the reasons described in Section 2.3.4 (all objects benefit from longer-wavelength Spitzer IRAC data). The properties of the sample with respect to the full 3.4 star-forming galaxy population are discussed further in Section 2.4.2.

Composite spectra
Our study is focused on determining stellar metallicities ( ★ ) and gas-phase metallicities ( g ), and, wherever possible, we estimated these quantities on an individual galaxy basis. However, in order to include galaxies for which such measurements were not possible, we also made use of stacked spectra. In order for an object to be included in the stacking sample, we required coverage of all rest-frame optical emission lines in the MOSFIRE spectra ([O ] 3726,3729,[Ne ] 3870, H and [O ] 4959, 5007). In total, 5 objects were removed from the stacking sample due to fact that the H line was not covered by the detector 6 . The final stacking sample therefore contained only 28/33 galaxies, and is fully representative of the complete sample.
In this paper, we focus on the correlation between metallicity and stellar mass, and we therefore constructed stacks in two bins of ★ : high-★ and low-★ , split at the median ★ of the stacking sample ( ★ = 10 9.4 M ). We employed the method described in Cullen et al. (2019) to produce FUV stacks from the VANDELS data and refer readers to that paper for further details. For the MOSFIRE spectra, we first converted each spectrum into luminosity density units using its spectroscopic redshift and corrected for nebular dust attenuation using E(B-V) neb assuming the Cardelli et al. (1989) Milky Way extinction curve. To avoid biasing the stacks in favour of the brightest objects we normalised each spectrum using its measured [O ] 5007 luminosity 7 . The normalised spectra were then re-sampled onto a common rest-frame wavelength grid, and the final stacked spectrum was produced by taking the median value at each wavelength. The error at each wavelength was calculated via a bootstrap re-sampling of the individual values. Finally, the normalised stacks were converted back into absolute luminosity density units by multiplying by the median [O ] 5007 luminosity of the contributing galaxies.
Due to the fact that the optical emission line profiles in the stacked spectra were not necessarily well-described by a single Gaussian profile, line luminosities for the stacks were determined via direct integration after subtracting off the local stellar continuum. The stacked H line luminosities were corrected for stellar absorption using the median correction of the galaxies contributing to the stack. These correction factors were relatively small, at the 3% level.
To estimate uncertainties on the various properties derived from the stacked spectra (line ratios/luminosities, ★ , g ) we employed a re-sampling methodology. The ★ values of galaxies in the stacking sample were perturbed according to their 1 uncertainties and the high-★ and low-★ bins re-populated with replacement. The E(B-V) neb values for each galaxy were also perturbed according to their 1 uncertainties. Finally, the 1D spectra were perturbed using their error spectra. The stacks were then re-constructed using the methodology described above and the various quantities of interest were remeasured. This process was repeated 500 times. Uncertainties , normalised stellar mass distribution (centre) and SFR -★ relation (right) for the star-forming galaxies in our NIRVANDELS sample. In the left-hand and centre panels, the open blue histogram represents all galaxies at 3.0 < < 3.8 in the VANDELS parent sample (N=980) and the filled grey histogram shows the subset of galaxies with MOSFIRE follow-up analysed here (N=33). In the right-hand panel the black filled circular points are galaxies with a H detection and therefore a SFR estimate based on the H flux; the open circular points represent galaxies without a H detection and therefore a SED-derived SFR estimate. The median uncertainty on ★ and SFR is displayed in the lower right corner. The square data points show the high-★ and low-★ stacks (see Section 2.4.1). The small background data points shown the full VANDELS sample. The solid and dashed red lines show two paramterizations of the SFR-★ relation at = 3.3 derived by Sanders et al. (2020a) and Speagle et al. (2014) respectively. on all quantities measured from stacked spectra were then calculated using the 16th and 84th percentiles of the resulting distributions.

A representative sample?
If the results of this study are to be applied in general to star-forming galaxies at 3.4, it is necessary to demonstrate that our sample is not a highly-biased subset of the general galaxy population at this epoch. In particular, when considering the mass-metallicity relation, it is important to assess whether the galaxy sample is biased in terms of SFR at fixed ★ . Numerous studies in the literature, both at = 0 and higher redshifts, have found strong evidence for an anticorrelation between SFR and metallicity at fixed ★ (i.e, the FMR; Mannucci et al. 2010;Sanders et al. 2018). Therefore, in order to avoid SFR biases, the galaxies in our sample should be representative of the 'star-forming main sequence' (SFR-★ relation) at 3.4. In the right-hand panel of Fig. 3 we show the location of the NIR-VANDELS galaxies on the SFR-★ plane compared to two literature determinations of the SFR-★ relation at 3.4 (Speagle et al. 2014;Sanders et al. 2020a). We also show the underlying distribution of all VANDELS galaxies at 2.95 ≤ ≤ 3.80 to illustrate the typical range of SFRs observed at fixed ★ . Above ★ 10 9.4 M (the median stellar mass of our sample), the scatter of the individual NIRVAN-DELS galaxies about the main sequence is consistent with that of the underlying population and there is no evidence that our sample is strongly biased in terms of SFR at these stellar masses. Moreover, the high-★ stack is consistent with the Speagle et al. (2014) main sequence relation within 1 . Therefore, for galaxies with ★ 10 9.4 M (our high-★ sample), we conclude that the results presented in this paper are applicable to the general 3.4 star-forming population.
At ★ < 10 9.4 M , the individual NIRVANDELS galaxies predominantly fall above the literature relations. The low-★ stack also sits above the main sequence by 0.3 dex. Although the stack is formally consistent with the Speagle et al. (2014) relation within 2 , we clearly cannot rule out the possibility that our sample is biased towards high SFR at the lowest stellar masses. This in turn could lead to a bias in the MZR. Based on the results of Sanders et al. (2018) the g bias at fixed ★ due to an offset from the main-sequence can be estimated as Δ g −0.15 × Δlog(SFR). An offset of 0.3 dex therefore represents a bias of −0.05 dex in g 8 . We chose not to correct for this potential bias in this paper because (i) the magnitude of the offset is similar to the typical uncertainties on our individual g measurements; and (ii) the main sequence offset in itself is not highly significant (< 2 ). Nevertheless, we caution that at ★ 10 9.4 M our sample may include a small bias towards objects with elevated SFRs and lower metallicities.

DETERMINATION OF GALAXY METALLICITIES
The primary focus of this study is the determination of gas-phase metallicities ( g ) and stellar metallicities ( ★ ) as a function of galaxy stellar mass at 3.4. Below we describe in detail how each of these parameters was measured from the spectroscopic data.
Before going into the details of these methods, it is worth again emphasizing that g and ★ , as measured in this paper, are not sensitive to the same element abundances (see also Section 1). Gas-phase metallicities, determined from rest-frame optical nebular emission lines, are sensitive to the elements that act to cool the 10 4 K ionized gas, namely oxygen (O/H). On the other hand, stellar metallicities derived from rest-frame FUV spectra are sensitive to the elements that dominate the FUV opacity of massive stars, namely iron (Fe/H) (Leitherer et al. 2010). We will discuss the implications of this difference in detail in Section 4.3.

Determination of the gas metallicity (Z g )
Estimates of the gas-phase metallicity ( g , or O/H) were derived using the ratios of rest-frame optical emission lines measured in the MOSFIRE spectra. In the redshift range investigated  . From left to right: VANDELS rest-frame FUV spectrum, MOSFIRE -band spectrum, and MOSFIRE -band spectrum for the N = 6/33 individual galaxies in our sample for which were able to estimate ★ . These galaxies have a median SNR per resolution element >= 5 in the VANDELS rest-frame FUV spectrum. In each panel, both 1D and 2D spectra are shown in the lower and upper portion of the panel, respectively. For the 1D spectra, the observed spectrum is shown in black, with the best-fitting model over-plotted in blue and the error spectrum shown in orange. For the VANDELS rest-frame FUV 1D spectra, regions of the spectrum dominated by ISM absorption lines/nebular emission lines are masked out (lighter shading) as these wavelength regions are not included in the stellar continuum fits used to determine ★ . In the MOSFIRE panels, dotted vertical lines indicate the positions of the nebular emission lines used to determine  are available ). Using these calibrations, we calculated the best-fitting value of g via a 2 minimization using: .
(2) where = 12 + log(O/H) = log( g /Z ) + 8.69, the sum over represents the line ratios used, R obs,i is the logarithm of the -th observed line ratio, R cal,i ( ) is the predicted value of R at from the Bian et al. (2018) calibrations, obs,i is the uncertainty on R obs,i , and cal,i is the calibration uncertainty. The best-fitting g solution is found by minimizing the value of 2 in equation 2. Uncertainties were estimated by perturbing the observed line ratios by their 1 error values and re-calculating g 500 times. The 1 uncertainty on g was derived from the 68th percentile width of the resulting distribution.
In practice, not all lines were detected for a given object. In this case, the minimum requirement for a metallicity estimate was the detection of the [O ] Shapley et al. 2017) but in practice there were no instances in which this was the only line ratio available 9 .
For each galaxy we utilised the maximum number of line ratios possible, which resulted in 6/33 metallicities determined using all three line ratios, 11/33 using O 32 and [O ] 5007/H , and 4/33 using O 32 alone. Using more line ratios improves the constraints on g but does not bias the solution on average ). The remaining 12/33 galaxies do not have individual g determinations; in all cases this was due to a non-detection of the [O ] 3726, 3729  line. For both the low-★ and high-★ composite spectra all lines were detected. The gas-phase metallicities estimates resulting from this procedure are listed in Tables 2 and 3. Examples of the nebular emission line spectra for individual objects are shown Fig. 4 and a comparison between the two composite spectra are shown in Fig. 5.

Determination of the stellar metallicity (Z ★ )
Stellar metallicities ( ★ , or Fe/H) were estimated from the rest-frame FUV spectra using a full spectral fitting technique within the wavelength range 1221−2000 Å as described in Cullen et al. (2019Cullen et al. ( , 2020. The method involves fitting stellar population synthesis models to all portions of the FUV spectra dominated by stellar continuum emission (avoiding regions contaminated by ISM absorption lines and/or nebular emission lines). The metallicity of the model ( ★ ) is constrained while marginalising over three nuisance parameters related to dust attenuation. Below we briefly describe this technique but refer readers to Cullen et al. (2019) for full details.
For the stellar population models we adopted the Starburst99 (SB99) high-resolution WM-Basic models described in Leitherer et al. (2010). To construct the models we assumed constant star formation over timescales of 100 Myr and adopted the weaker-wind Geneva tracks with stellar rotation and single-star evolution at the following metallicities: ★ = (0.001, 0.002, 0.008, 0.014, 0.040). The models were fitted using the nested sampling algorithm implemented in the python package dynesty (Speagle 2020). The free parameters in the fit were ★ and three nuisance parameters that define the shape of the FUV attenuation curve using the parameterisation described in Salim et al. (2018). As the models are provided at five fixed ★ values, we linearly interpolated the logarithmic flux values between the models in order to generate a model for any ★ value within the prescribed range. The models were then convolved to the resolution of the VANDELS spectra and appropriately re-sampled.
We used a log-likelihood function of the form, where is a constant, is the observed flux, ( ) is the model flux for a given set of parameters , and is the error on the observed flux. The likelihood was computed using only those wavelength pixels free from ISM absorption or nebular emission-line contamination. For this purpose we adopted the 'Mask 1' windows defined in Steidel et al. (2016). The dynesty algorithm provides estimates of the posterior probability distributions for each of the free parameters in the fit. For a given fit, the best-fitting ★ value was calculated from the 50th percentile of the resulting ★ posterior distribution and the uncertainty derived from the 68th percentile width.
Deriving stellar metallicities for individual galaxies is generally more difficult than deriving gas-phase metallicities due to the requirement for a high S/N detection of the continuum. In this paper, we only report ★ values for the six individual galaxies for which the average S/N per resolution element of the VANDELS spectrum is ≥ 5 in the relevant wavelength range. At lower S/N, we find that the typical 1 uncertainties on ★ are > 50%. Moreover, below a certain threshold S/N estimates of ★ can become biased. For example, Topping et al. (2020b) find that unbiased estimates of ★ require S/N ≥ 5.6 per resolution element (five of the six galaxies with individual ★ estimates satisfy this slightly higher threshold). Stellar metallicity estimates resulting from this procedure are listed in Tables 2 and 3, and examples of fits to the individual and stacked FUV spectra are shown in Figs. 4 and 6. In Fig. 6 we also show a zoom-in of continuum-normalised versions of the stacks in the region of the '1719' index, recently advocated by Calabrò et al. (2021) as a useful ★ -sensitive index in the FUV. In agreement with the results of Calabrò et al. (2021), we find that the higher metallicity, high-★ , stack shows increased absorption in the '1719' index region.

Trends with emission-line ratios
Since the gas-phase metallicity is estimated directly from emissionline ratios, we first examine the empirical trends between the observed dust-corrected line ratios, stellar mass and redshift. In Fig. 7 ) as a function of ★ for our sample. Also shown is an independent dataset of star-forming galaxies at 3.3 from Sanders et al. (2020a), and a sample of local star-forming galaxies drawn from the Sloan Digital Sky Survey (SDSS). The SDSS galaxies are taken from the Andrews & Martini (2013) sample as described in Sanders et al. (2020a). To facilitate a direct comparison with our 3.4 galaxies, the SDSS line ratios have been corrected, where possible, for diffuse ionized gas (DIG) emission 10 . DIG is not associated with H regions and therefore biases nebular emission line flux measurements in the integrated spectra of galaxies at 0 (Oey et al. 2007;Sanders et al. 2017;Vale Asari et al. 2019). A DIG correction is not required for the 3.4 sample because the DIG contribution becomes negligible at the typical star-formation surface densities of high-redshift galaxies (Sanders et al. 2017;Shapley et al. 2019). Fig. 7 clearly demonstrates that each of the three line ratios increases with decreasing stellar mass. The negative correlation between line ratio and ★ applies to both the galaxies at 3.4 and the local SDSS sample. Based on the assumption that the galaxies lie on the upper-branch of the [O ]/Hg relation 11 , these trends indicate that g increases with increasing ★ . Crucially, this correlation holds irrespective of exactly which emission-line calibration is adopted (e.g. Maiolino et al. 2008 line ratios, which are based on a much larger sample of N = 245 galaxies drawn from the star-forming main sequence, provides further evidence that our sample is not a highly-biased subset of the 3.4 11 star-forming galaxy population at 3.4. As an aside, we also note that the position of our sample in the [O ]/H -★ plane overlaps with the star-forming galaxy region of the Mass-Excitation (MEx) diagnostic diagram proposed by Juneau et al. (2014), providing further evidence that our sample is not strongly contaminated by AGN excitation.
For  Sanders et al. (2020a) 3.4 relation within the uncertainties. We note that the reason the individual detections lie predominantly above the stacked value is due to selection effects (i.e., the stacks also contain objects that are not individually detected in [Ne ] and/or [O ]). This selection effect is also present in the other two line ratio diagrams, but does not affect the general trends described above.
It can also be seen from Fig. 7 that all of the 3.4 line ratios are elevated with respect to SDSS galaxies at fixed ★ . Again, this empirical trend strongly suggests an evolution towards lower g at higher redshift, at all stellar masses. However, when interpreting the redshift evolution of emission lines, the known evolution towards more extreme H region conditions at high redshift must also be considered (e.g., Steidel et al. 2014;Shapley et al. 2015). Most recent results, as well as the results presented in this paper, suggest that the primary cause of this evolution is the harder ionizing spectra emitted by oxygen-enhanced, low-metallicity (i.e., iron-poor), stars at high redshift (e.g., Steidel et al. 2016;Strom et al. 2017;Topping et al. 2020b, see Section 4.3 for further discussion). As a consequence, not all of the observed redshift evolution in line ratios can be attributed purely to changes in g (e.g., Cullen et al. 2016). Nevertheless, these empirical line-ratio diagrams provide useful qualitative indications of the likely evolution of g with ★ and redshift, which we discuss quantitatively below.

The ★g relation
The NIRVANDELS 3.4 gas-phase mass-metallicity relationship (MZR g ) is shown in Fig. 8. It can be seen that the NIRVANDELS galaxies follow a clear MZR g , evident for both the individual galaxies and the stacks. The difference in log( g ) between the high-★ and low-★ stacks is 0.25 ± 0.08, representing an increase in g of 1.78±0.35 across 1 dex in stellar mass ( 10 9 −10 10 M ). It is also clear from Fig. 8 that the NIRVANDELS sample is fully consistent with the Sanders et al. (2020a) MZR g determination at 3.3. This is unsurprising given the similar ★ -dependence of the emissionline ratios illustrated in Fig. 7. Given this excellent agreement, we do not attempt to refit our new data here. The functional form of the Sanders et al. (2020a) MZR g (converted into units of log( g /Z ) 12 ) is, log( g /Z ) = (0.29 ± 0.02) 10 − (0.28 ± 0.03), where 10 = log( ★ /10 10 M ). It can also be seen from Fig. 8 that the slope of the 3.4 MZR g is also fully consistent with the low-mass slope at 0 (0.28 ± 0.01; Sanders et al. 2020a). As a result, there is a constant offset of −0.36 dex in log( g ) (a factor 0.44 in g ) between = 0−3.4 in the relevant 12 The conversion between log( g /Z ) and the widely-used 12+log(O/H) is simply 12 + log(O/H) = log( g /Z ) + 8.69. mass range ( 10 8.5 − 10 10.5 M ). We note that the size of this offset does critically depend on the g calibration used; to this end, the 0 relation of Sanders et al. (2020a) was derived using a local calibration that should be non-biased with respect to g measurements at > 1 derived from the Bian et al. (2018) calibration, effectively accounting for the harder ionizing radiation field at high redshifts discussed above in Section 4.1.1, and later in Section 4.3 (see also Sanders et al. (2020a) for full details). Finally we note that, although not shown in Fig. 8, a number of other previous studies at > 3 have derived ★g relations with a lower overall normalization due to the use of a different metallicity calibration (e.g., Maiolino et al. 2008;Mannucci et al. 2009;Troncoso et al. 2014;Onodera et al. 2016, see Sanders et al. 2020a for a full discussion); crucially, however, the slope of these other literature relations are generally consistent with the results presented here, indicating that the scaling of metallicity with stellar mass is independent of the chosen calibration.

The stellar mass-metallicity relation at z 3.4
The NIRVANDELS 3.4 stellar mass-metallicity relationship (MZR ★ ) is also shown in Fig. 8. Again, the individual galaxies and stacks appear to follow a clear MZR ★ . The evolution in log( ★ ) between the low-★ and high-★ stack is 0.18 ± 0.10 dex (a factor of 1.51 ± 0.35). In contrast to MZR g , the trend for individual objects is less clear due to the fact that the error bars on the individual measurements are large, and the number of objects much reduced.
Nevertheless, these individual measurements are formally consistent with the trend observed in the stacks.
Also shown in Fig. 8 is a determination of MZR ★ derived from the much larger sample of N = 681 VANDELS galaxies presented in (Cullen et al. 2019). The ★ values in Cullen et al. (2019) were derived using − and − band photometry that had not been corrected for optical emission line contamination, and used a set of SED-fitting assumptions different to the ones that are assumed in this paper. Therefore, the relation shown in Fig. 8 is a re-derivation of the Cullen et al. (2019) MZR ★ with ★ values derived excluding -and -band photometry and using the same methodology described in Section 2.3.2. This relation is in good agreement with our NIRVANDELS data, and is essentially consistent with the original relation shifted to lower ★ . The functional form of this MZR ★ is given by, log( ★ /Z ) = (0.27 ± 0.06) 10 − (0.68 ± 0.04). (5) where 10 = log( ★ /10 10 M ).
There are two striking aspects to the comparison between MZR ★ and MZR g in Fig. 8. Firstly, MZR ★ is offset to lower values at all stellar masses, by −0.4 dex. This is a result of the fact that g traces O/H while ★ traces Fe/H and will be discussed in detail in Section 4.3 below. Secondly, the slopes of both relations are consistent: d(log )/d(log ) 0.3. Moreover, we note that the low-mass slope of the MZR ★ measured in local star-forming galaxies is 0.38 (Zahid et al. 2017) 13 . This result, combined with the results discussed in Section 4.1.2, imply that the power-law slope of the ★ − relation is very similar for both massive stars and H region gas, and is also not strongly redshift-dependent. This in turn would suggest that the physical processes responsible for determining the power-law slope of the ★ − relations below the turnover do not evolve strongly, out to at least 4 (a lookback time of 12 Gyr).

Enhanced O/Fe ratios at high redshift
As derived in this paper, ★ traces Fe/H in massive stars while g traces O/H in ionized gas (see Section 3). Crucially, the two quantities are linked via the fact that ionized gas exists in the vicinity of massive stars, and is therefore likely to have very similar chemical abundance properties. Indeed, due to the short lifetimes of Oand B-type stars, the ionized gas is probably the remnant of their original stellar birth clouds. In this sense, O/H in ionized gas should also be representative of O/H in massive stars. Therefore, to a reasonable approximation, combining optically-derived estimates of g with FUV-based estimates of ★ traces the O/Fe abundance ratio of the massive stellar populations in star-forming galaxies (Steidel et al. 2016;Topping et al. 2020b). More generally, because O is an − process element, the observed O/Fe enhancement traces the enhancement of elements, and is also referred to as − enhancement.
If the NIRVANDELS galaxies have solar-like abundance ratios, we would expect that g = ★ . However, it is clear from Fig. 8 that g > ★ across the full range of ★ . That is, the O abundances of the NIRVANDELS galaxies are consistently larger than the Fe abundances, implying enhanced O/Fe ratios relative to the solar value.  . g versus ★ for star-forming galaxies at = 2 − 3. Small square data points show the individual galaxies in our sample with a measurement of both quantities. Large filled square data points represent the low-★ and high-★ stacks. The grey circular data points show the data for star-forming galaxies at 2.3 presented in Topping et al. (2020b). The various lines show constant g / ★ ratios as indicated by the labels, with the solid line representing the one-to-one relation.
with each other within 1.2 , with an average value of 0.41 ± 0.06. Assuming the average value of the two stacks, we find thatthe offset translates into an enhancement of O/Fe relative to the solar value of (O/Fe) 2.57 ± 0.38 × (O/Fe) . Within the uncertainties, our results indicate that this offset is constant as a function of ★ between 10 9 − 10 10 M .
This result is clarified further in Fig. 9 where we plot, on a linear scale, ★ against g (in units of solar metallicity). It is clear from Fig. 9 that both stacks sit well above the one-to-one relation that would indicate a solar O/Fe ratio (red solid line). Moreover, each of the four individual galaxies with a measurement of both ★ and g also sit above the one-to-one relation. Given the large uncertainties the individual results are not highly significant (< 2 ); nevertheless, based on the more robust stacked measurements, Figs 8 and 9 present strong evidence for enhanced O/Fe ratios in typical star-forming galaxies at > 3 with ★ 10 10 M .
We note that the evidence presented here for O/Fe enhancement is consistent with other findings in the literature at slightly lower redshift ( 2.4, see below), although our results represent the first direct demonstration that (i) O/Fe enhancement exists at > 3, and ii) is not strongly dependent on stellar mass below 10 10 M .

Comparison with the literature
The most directly comparable works in the literature are the analyses of Steidel et al. (2016) and Topping et al. (2020a,b) both of whom performed a similar joint FUV+optical spectral analysis to the one presented in this paper. Steidel et al. (2016)

13
(O/Fe) to > 5 × (O/Fe) . A direct comparison to the Topping et al. (2020b) results is shown in Fig. 9 where it can be seen that the basic result (i.e., g > ★ ) is fully consistent with the result presented here. However, the median enhancement in the Topping et al. (2020b) sample (≈ 4.5 × (O/Fe) ) is clearly larger than the value we derive (≈ 2.5 × (O/Fe) ). We discuss the main reason for this difference in more detail in the following below. In addition to the direct method presented here, other indirect methods based on reconciling photoionization models with observed optical emission-line ratios have been used to investigate O/Fe enhancement at high redshift. In these methods, ★ is constrained mapping Fe/H to the shape of the ionizing continuum spectra used to generate the predicted emission line ratios. Using such an indirect approach, Strom et al. (2018) found (O/Fe) 2.6 × (O/Fe) for a sample of 150 main-sequence star-forming galaxies at 2.3. Similarly, Sanders et al. (2020b) also found evidence for significant O/Fe enhancement in a small sample of extreme emission line galaxies with direct temperature-based g estimates at = 1 − 4.
We note that although the precise value of the enhancement varies between studies -due to a combination of selection effects, and systematics related to measuring both g and ★ -the basic result remains consistent. That is, star-forming galaxies at high redshift are enhanced in O/Fe, with (O/Fe) 2 × (O/Fe) . Moreover, as discussed below, there is no known systematic that can easily explain away this result.

A discussion of systematic effects
The observed (O/Fe)-enhancement is subject to systematic uncertainties affecting the measurement of both g and ★ . Sytematics related to the measurement of g are well-known and have been discussed extensively in the literature (e.g. Kewley & Ellison 2008). The Bian et al. (2018) calibration used in this paper is arguably the most reliable calibration for galaxies at high redshift, having been built using local analogues of high-redshift galaxies; furthermore, it is known to be consistent with the (currently small) sample of 2 galaxies for which direct-method O/H estimates are available ). The Bian et al. (2018) calibration returns systematically larger estimates of g compared to the commonly-adopted calibration of Maiolino et al. (2008), which has typically been the calibration of choice when determining g at > 3 (Mannucci et al. 2009;Troncoso et al. 2014;Onodera et al. 2016). However, even assuming the Maiolino et al. (2008) calibration, the g values reported here only decrease by a factor 1.4, resulting in (O/Fe) 1.8 × (O/Fe) . We find similar offset using the updated version of these calibrations described in Curti et al. (2017). In addition, we find that another commonly-used calibration based on the [O ], [O ] and H lines -the calibration of Kobulnicky & Kewley (2004) -returns values consistent with Bian et al. (2018) for our sample. In summary, all standard high-redshift g calibrations produce estimates of g that imply (O/Fe) > (O/Fe) .
Another crucial systematic related to g is the choice of fundamental abundance scale. The Bian et al. (2018) calibration (and indeed the vast majority of empirical g calibrations) is tied to temperaturebased metallicities measured from auroral lines (often referred to as 'direct' method metallicities). However, there is a well-known discrepancy between g determined via the direct method and g measured using an alternative method based on oxygen recombination lines (the RL method). Estimates of g using the RL method are typically 0.24 dex higher than g estimated using the direct method, with this so-called abundance discrepancy factor (ADF) being roughly constant at all metallicities (Peimbert 1967;Esteban et al. 2014). It is worth nothing that the results of Steidel et al. (2016) discussed above are tied to the RL abundance scale, and this in part explains their larger (O/Fe) estimate. Reasons for favoring the RL abundance scale are discussed in detail in Steidel et al. (2016) and Strom et al. (2017). Again, however, accounting for the ADF would only serve to increase g (by factor 2) and therefore strengthen the observed (O/Fe)-enhancement in our sample.
With respect to ★ , the major source of systematic uncertainty is the choice of SPS model used to fit to the observed stellar continuum. Along with the SB99 models used in this paper, the Binary Population and Spectra Synthesis (BPASS) models (Eldridge et al. 2017;Stanway & Eldridge 2018) are also commonly used in the analysis of high-redshift galaxies. The primary way in which the BPASS models differ from SB99 is via the inclusion of massive binary star evolution, which can have a strong effect on the predicted UV spectrum, particularly at low metallicities. In Cullen et al. (2019) we found that the BPASSv2.1 models return systematically lower ★ values compared to SB99 by 0.1 dex (see also Chisholm et al. 2019). Fitting our current sample with the latest BPASSv2.2 models (including models at stellar metallicities down to 0.1% solar) yields log( ★ )= −3.41 ± 0.15 dex and log( ★ )= −2.61 ± 0.10 dex for the low-★ and high-★ stacks respectively. This represents a much larger offset with respect to the SB99 value for the low-★ stack (Δ log( ★ ) −0.6 dex) but is consistent with the ★ values reported in Topping et al. (2020b) ( 5% solar). We conclude that the systematically lower metallicities derived using the BPASS models are the primary cause of the difference between our data and those of Topping et al. (2020b) in Fig. 9. Crucially, however, use of the BPASS models would only decrease the measured ★ and hence increase the implied (O/Fe)-enhancement of our sample.
Finally, another concern is the depletion of O onto dust grains in the ISM, which would cause g to be an underestimate of the true O/H in stars. Steidel et al. (2016)  In summary, while the exact value of (O/Fe)-enhancement is clearly subject to number of systematic uncertainties, with current estimates in the range (O/Fe) ≈ 2 − 10 × (O/Fe) , we find no systematic effects that are large enough to explain away the basic result. Accordingly, we can state with some confidence that the abundance ratios in typical high-redshift star-forming galaxies are -enhanced.

DISCUSSION
In this section, we focus on the implications of the main result of this work: O/Fe enhancement (or -enhancement) in high-redshift star-forming galaxies. As discussed in previous works (e.g., Steidel et al. 2016;Cullen et al. 2019), O/Fe enhancement is actually expected for galaxies with constant/rising star-formation histories and relatively young ages (< 500 Myr − 1 Gyr). This is a result of the fact that while O is produced predominantly via core-collapse supernovae (CCSNe), Fe is produced via a mixture of both CCSNe and Type−Ia SNe. Therefore, as a result of the delayed onset of Type−Ia SNe after the start of star formation, young galaxies will initially be under-abundant in Fe relative to O, only reaching the solar ratio on timescales 1 Gyr (e.g., Maoz & Mannucci 2012;Maiolino & Mannucci 2019). Given the substantial evidence to suggest that star-forming galaxies at > 2 are characterised by constant/rising SFHs and ages 1 Gyr (Reddy et al. 2012;Topping et al. 2020b), we therefore expect enhanced O/Fe ratios at this cosmic epoch. Below we discuss how the observed (O/Fe)-enhancement might relate to local observations, and its consequences with respect to excitation conditions in high-redshift galaxies.

Abundance patterns compared to Milky Way stars
It is interesting to consider the link between high-redshift stellar populations and local stellar populations that we know (based on their age) formed at high redshift. For example, if (O/Fe)-enhancement is ubiquitous among galaxies at > 2, then we would expect to observe enhanced abundance ratios in local stars that formed at these cosmic epochs (i.e., in stars with ages 10 Gyr). In the subsequent discussion, we use the following definitions for consistency with local conventions:  (Tolstoy et al. 2009). Current galactic chemical evolution models − constrained by observations of stars in the Milky Way − predict an initial plateau at [O/Fe] 0.6 ( Fig. 10; e.g., Kobayashi et al. 2020) for stars in our Galaxy. Other estimates of CCSNe yields give similar values, typically within the range [O/Fe] 0.6 − 0.75 depending on the assumed stellar metallicity and IMF (e.g., Limongi & Chieffi 2003;Chieffi & Limongi 2004;Nomoto et al. 2006 Kobayashi et al. (2020), highlighting these trends.
The model of Kobayashi et al. (2020), and other so-called 'Galactic Chemical Evolution Models,' are constrained by observations of Milky Way stellar abundance ratios. For example, studies of FKG stars in the solar neighborhood have revealed a tight correlation between [ /Fe] and age, with stars on the knee/plateau having formed 10 − 12 Gyr ago (i.e. = 2 − 4; Haywood et al. 2013Haywood et al. , 2019. These older, high-[ /Fe] populations, are predominantly located in the thick disk and halo (Bensby et al. 2003;Haywood et al. 2013;Zhao et al. 2016;Kobayashi et al. 2020). Evidence also exists for a distinct high− population in the bulge at large scale heights (Bensby et al. 2013;Lian et al. 2020), and is thought to indicate an intimate link between the bulge and disk populations (Haywood et al. 2018). To illustrate these various populations, we show in Fig. 10 the position of a selection of F-and G-type stars drawn from the thin disk, thick disk and halo of the Milky Way using data from Zhao et al. (2016) and Amarsi et al. (2019), as well as a selection of micro-lensed dwarf and K-giant stars drawn from the bulge taken from Bensby et al. (2013). As can be seen from Fig. 10, stars in the thick disk and halo are found to have enhanced [O/Fe] while the stars with solar-like abundances are drawn predominantly from the thin disk. A similar sequence is also evident for the bulge population, with stars at higher [O/Fe] having larger scale heights (Amarsi et al. 2019).
It can also be seen from Fig. 10 Haywood et al. 2018Haywood et al. , 2019. This similarity suggests that these local 10 − 12 Gyr old stellar populations likely formed from similar material (in terms of enrichment properties) to the massive O/B-type stellar populations we observe in the rest-frame FUV spectra of galaxies at = 3.5 (lookback time 11.7 Gyr). The basic interpretation of this result would be that our = 3.5 observations probe the formation of the stellar populations that dominate the largest scale heights (thick disk, subsets of the halo and bulge) in spiral galaxies at = 0. We note that the potential systematic effects discussed in Section 4.3.2 do not affect this observation. In fact, converting our g measurements to a RL abundance scale would shift [O/Fe] by +0.24 dex, somewhat improving the overlap between our data and the Milky Way halo/bulge population. These results can also be compared to the element abundance ratios of damped Lyman− systems (DLAs) at > 2, which also exhibit -enhanced abundance ratios at low metallicities ([Fe/H] 1.5) (e.g. Cooke et al. 2015;De Cia et al. 2016 Fig. 10 at similar metallicity. Based on their observations, Cooke et al. (2015) argue that the typical star-formation and chemical enrichment histories of DLA's are not representative of the galaxy population that formed the Milky Way stellar halo. Following the same argument, our results indicate that stellar populations drawn from the main sequence of star-formation at 3.4 are representative of at least a fraction of MW halo stars.
Improved observations of [Fe/H]−[O/Fe] at high redshift have the potential to provide futher constraints on the typical star-formation and chemical evolution histories of early galaxies. For example, the combined FUV+optical observations presented here can be performed at = 2−4, and offer the opportunity to observe the evolution of [Fe/H] − [O/Fe] across ≈ 2 Gyr of cosmic time (and a wide range in stellar mass), placing unique constraints on the early evolutionary pathways of galaxies and providing unique clues as to the connection between various local and high-redshift stellar populations.

Excitation conditions at high redshift
The observed (O/Fe)-enhancement in 3 NIRVANDELS galaxies also has important implications for our understating of the excitation conditions in high-redshift H regions. Below, we briefly examine our results in relation to the much-discussed offset between highredshift and local star-forming galaxies in classical line ratio diagnostic diagrams. In addition, we explore whether (O/Fe)-enhancement can resolve the previously-reported difficulty in reproducing > 3 emission-line ratios using standard photoionization modelling analyses.

The evolution of optical emission line ratios with redshift
It has been known for some time that high-redshift star-forming galaxies are offset from local star-forming galaxies in various nebular emission-line diagnostics diagrams (most notably the N2 BPT diagram; e.g., Steidel et al. 2014;Shapley et al. 2015). A number of physical explanations have been invoked to explain this effect, including an evolution in ionization parameter and ISM pressure Kashino et al. 2017), N/O abundance ratios (Masters et al. 2014;Shapley et al. 2015;Sanders et al. 2016) and harder ionizing spectra (Steidel et al. 2016;Strom et al. 2017;Sanders et al. 2020a;Topping et al. 2020b). In Fig. 11 Sanders et al. (2017) and Jeong et al. (2020), our H sample is drawn from the catalogue of Pilyugin & Grebel (2016) and supplemented with additional data from Croxall et al. (2016) and Toribio San Cipriano et al. (2016). We choose to use local H regions (as opposed to SDSS spectra) to avoid the problem of DIG contamination in the integrated spectra of local star-forming galaxies (e.g. Sanders et al. 2017).
It can be seen from Fig. 11 that, in agreement with much of the previous literature, we find that the 3.4 line ratios are, in general, systematically offset from the median values at = 0. The results presented in this paper, as well as previous observations of (O/Fe)-enhancement at > 2, would suggest that one reason for this offset is due to the fact that, at fixed oxygen abundance, starforming galaxies at > 3 have harder ionizing continuum spectra (i.e., lower [Fe/H]) compared to star-forming galaxies in the local Universe. Indeed, Topping et al. (2020a,b) have shown explicitly how the increase in the excitation conditions within H regions due to (O/Fe)-enhancement is directly related to the observed N2 BPT offset in 2.4 galaxies. Our results do not rule out contributions from other physical effects, but highlight the importance of a harder ionizing continuum at high redshift (at fixed O/H).

Photoionization models
Another interesting question is whether -under the assumption ofenhanced chemical abundances -current stellar-population synthesis models, combined with photoionization modelling of H regions, can accurately reproduce the observed > 3 nebular emission-line ratios. Previous literature results have highlighted the fact that photoionization models generally struggle to reproduce the observed ([O ]+[O ])/H ratios in galaxies at > 3 (e.g. Onodera et al. 2016). However, these previous comparison have generally been based on the assumption of assumed solar abundance ratios, coupling the metallicity of the ionizing stars to the metallicity of the gas (i.e., ★ = g , or (O/Fe)=(O/Fe) ).
In Fig. 11, we also show the results from a photoionization modelling analysis of the R 23 − [O ]/[O ] diagram, in which we have relaxed this assumption and decoupled the stellar and gas-phase metallicities. Full details of the photoionization modelling analysis are given in Appendix A. It can be seen from Fig. 11 that, even when accounting for a harder ionizing continuum at fixed O/H, the photoionization models still fail to reproduce the full range of observed emission-line ratios. Crucially, the offset between models and observations is evident (although smaller) even when assuming the harder ionizing continuum spectra predicted by the BPASSv2.2 models 14 . These results indicate that, even when accounting for a harder ionizing continuum at fixed O/H (i.e., -enhancement), the shape of the ionizing spectrum predicted by current SPS models is still not sufficient to account for the full range typical emission line ratios observed at > 3.
The fact that the BPASSv2.2 models lie closer to the observed line ratios clearly indicates that increasing the strength of the ionizing continuum at fixed ★ helps to alleviate this problem, and that an insufficiently hard ionizing continuum is likely to be contributing to the discrepancy. Indeed, this conclusion has also frequently been reached with respect to modelling the He emission line in metal-poor starforming galaxies across a large range of redshifts (e.g., Plat et al. 2019;Saxena et al. 2020). Additionally, the offset may be related to the fact that most photoionization models treat galaxies as single H regions, when in reality the observed spectra are a line luminosityweighted averages of many H regions, each with distinct physical properties. Our analysis adds to the growing body of literature that suggests that the full range of emission line ratios observed at highredshift are still not fully captured by simple photoionization models (even when including the effect of -enhanced abundance ratios). Solving this problem will be crucial for accurately interpreting of the rest-frame optical spectra of galaxies at > 4 with JWST.

SUMMARY AND CONCLUSIONS
In this paper, we have presented the results of an analysis of the gas-phase and stellar metallicities of N = 33 typical star-forming galaxies at 3.0 ≤ ≤ 3.8 drawn from the VANDELS survey. Using gas-phase metallicity measurements derived from rest-frame optical emission line ratios and stellar metallicity measurements derived from rest-frame FUV continuum fitting, we have investigated the scaling of both parameters with galaxy stellar mass. In addition, we have directly compared the gas-phase and stellar metallicities of our sample to investigate evidence for (O/Fe)-enhancement in typical 14 The effects of binary star evolution in the BPASS models results in a significant boost in the ionizing flux at fixed ★ with respect to the Starburst99 models, particularly at ★ < 0.3 Z (see e.g., Stanway et al. 2016). star-forming galaxies at 3.4. Finally, we have discussed how our observations may be related to chemical abundance ratios of stars in the Milky Way. The main results of this study can be summarized as follows: (i) The three rest-frame optical emission-line ratios covered by our

MOSFIRE -and -band observations ([O ]/H , [O ]/[O ], and [Ne ]/[O ]
) all show a negative correlation with ★ . The slope of the 3.4 line ratio correlations are consistent with those in the local Universe, but with a higher normalization at fixed ★ . These qualitative trends are to be expected if gas-phase metallicities increase with ★ and decrease with redshift.
(ii) Converting the dust-corrected line ratios into gas-phase metallicities ( g , or O/H), we observe a clear gas-phase mass-metallicity relationship (MZR g ) using the calibration of Bian et al. (2018). The relation has a gradient of d(log g )/d(log ★ ) 0.3, with the average value of g increasing from 0.2 Z to 0.7 Z across the stellar mass range ★ 10 8.5 − 10 10.5 M .
(iii) We derived stellar metallicities ( ★ , or Fe/H) for our sample based on full FUV continuum fitting in the wavelength range 1250 − 2000 Å (rest-frame). As with g , we observe an increase in ★ with increasing ★ . The slope of the MZR ★ relation is consistent with the MZR g relation, but ★ is offset to lower values at fixed stellar mass. We find ★ increases from 0.08 Z to 0.25 Z across the stellar-mass range ★ 10 8.5 − 10 10 M .
(iv) We find an approximately constant offset between g and ★ at fixed ★ , implying enhanced O/Fe relative to the solar value (i.e., -enhancement), with (O/Fe) = 2.54 ± 0.38 × (O/Fe) . This result is consistent with the -enhancement observed in galaxy samples at slightly lower redshift ( 2.4; e.g., Steidel et al. 2016;Topping et al. 2020b) and is robust against known systematics. Within the uncertainties of our observations, there is no strong evidence that (O/Fe)-enhancement is a function of ★ at ★ < 10 10 M .
( In the Milky Way, the knee is predominantly populated by old stars consistent with a formation redshift of ∼ 2 − 4 (i.e., 10 − 12 Gyr old) found at large scale heights above the plane of the disk. Improved constraints on ★ and g at high redshift could shed light on the link between high-redshift stellar populations and the structural components in local galaxies using chemical abundance patterns.
(vi) Finally, we demonstrate that our results are consistent with previous results in the literature suggesting that the observed evolution in rest-frame optical emission-line ratios with redshift is caused, at least in part, by an increase in the hardness of the stellar ionizing continuum at fixed oxygen abundance (e.g. Steidel et al. 2016;Topping et al. 2020b). We also show, via a photoionization modelling analysis accounting for (O/Fe)-enhancement, that the full range of observed line ratios at > 3 remain difficult to reproduce with standard models. Resolving this tension between the models and observations will be crucial for accurately interpreting upcoming spectroscopic data of high-redshift galaxies from JWST.

ACKNOWLEDGMENTS
FC acknowledges the support of the UK Science and Technology Facilities Council. AC and MT acknowledge the support from grant