A novel approach to correcting $T_e$-based mass-metallicity relations

Deriving oxygen abundances from the electron temperature (hereafter the $T_e$-method) is the gold-standard for extragalactic metallicity studies. However, unresolved temperature fluctuations within individual HII regions and across different HII regions throughout a galaxy can bias metallicity estimates low, with a magnitude that depends on the underlying and typically unknown temperature distribution. Using a toy model, we confirm that computing $T_e$-based metallicities using the temperature derived from the [O III] $\lambda$4363/$\lambda$5007 or [O II] $\lambda\lambda$7320,7330 / [O II] $\lambda\lambda$3727 ratio ('ratio temperature'; $T_{\rm ratio}$) results in an underprediction of metallicity when temperature fluctuations are present. In contrast, using the unobservable 'line temperatures' ($T_{\rm line}$) that provide the mean electron and ion density-weighted emissivity yield an accurate metallicity estimate. To correct this bias in low-mass galaxies, we demonstrate an example calibration of a relation between T_ratio and T_line based on a high-resolution (4.5 pc) RAMSES-RTZ simulation of a dwarf galaxy that self-consistently models the formation of multiple HII regions and ion temperature distribution in a galactic context. Applying this correction to the low-mass end of the mass-metallicity relation shifts its normalization up by 0.18 dex on average and flattens its slope from 0.87 to 0.58, highlighting the need for future studies to account for, and correct, this bias.


INTRODUCTION
The gas-phase oxygen abundance ('metallicity' hereafter) of a galaxy is a fundamental property that encodes a wealth of information on its star formation, enrichment, and assembly history. Galaxy metallicity exhibits a tight correlation with stellar mass, the mass-metallicity relation (MZR; e.g. Lequeux et al. 1979, Tremonti et al. 2004, with scatter in this relation correlating with star formation rate (e.g. Ellison et al. 2008;Mannucci et al. 2010) and gas mass (e.g. Bothwell et al. 2013). Measurements of the slope and normalisation of this relation are interpreted as the collective effect of metal-enriched outflows and metal-poor inflows (e.g. Larson 1974;Dalcanton et al. 2004;Tremonti et al. 2004), and used to constrain astrophysical models that predict the chemical enrichment and assembly histories of galaxies (e.g. Ma et al. 2016;De Rossi et al. 2017;Torrey et al. 2019).
However, the exact shape of the MZR remains debated, as its slope and normalisation is sensitive to the technique used to measure metallicity (e.g. Kewley & Ellison 2008;Curti et al. 2020;Yates et al. 2020). Using recombination lines (RL) is often considered the most robust metallicity estimate, due to its weak dependence on gas temperature and density. But it relies on measuring multiple faint metal lines, which is possible in the local neighbourhood of the Milky Way (e.g. Peimbert et al. 1993;Esteban et al. 2002) but prohibitive for extragalactic samples. In contrast, strong-line methods rely only on ratios of the brightest emission lines and can therefore be widely applied. However, they need to be calibrated to more direct measurements (e.g. Curti et al. 2017) or theoretical predictions (e.g. ★ E-mail: alex.cameron@physics.ox.ac.uk McGaugh 1991; Kewley et al. 2019), each associated with dificultto-quantify systematic uncertainties (Kewley & Ellison 2008).
As a result, the method (Peimbert 1967) has been the de facto gold standard for measurements of the MZR (e.g. Andrews & Martini 2013, hereafter AM13;Curti et al. 2020). It relies on detecting auroral emission lines (e.g. [O ] 4363) to estimate , enabling us to derive metallicities solely from line emissivities based on collision strengths from quantum mechanical calculations (e.g. Aggarwal & Keenan 1999;Palay et al. 2012;Storey et al. 2014). Detecting such faint auroral lines is observationally challenging, but possible at lowredshift for large samples of individual low-metallicity galaxies (e.g. Izotov et al. 2019) or stacks of higher-metallicity galaxies (Curti et al. 2017). Further, the advent of JWST now allows us to make such detections at high-redshift (e.g. at = 8.5; Katz et al. 2022a;Curti et al. 2022). The central role played by the -method to constrain chemical evolution across cosmic time makes it paramount to pinpoint its potential biases and losses of accuracy.
In fact, -method metallicities are typically lower than RL techniques by as much as ∼ 0.3 dex in observations where both can be computed (e.g. Liu et al. 2001;García-Rojas & Esteban 2007). A promising explanation for these offsets is the presence of inhomogeneous temperatures for the emission-line emitting gas, which then biases the -method metallicity measurements low due to the exponential scaling of line emissivities with temperature (Peimbert 1967;Stasińska 2005;Bresolin 2008;Pilyugin et al. 2012; Section 2). Such temperature inhomogeneities can originate within individual H regions due to their internal structure (see Peimbert 2019 for a review), but also across a galaxy, as the multiple H regions integrated into a galactic spectrum are subject to a diversity of ISM densities, temperatures and feedback conditions. Although -metallicities can be Standard T e method Figure 1. Toy-model demonstrating how temperature fluctuations bias the -method when applied to a uniform metallicity system. From left to right, lower panels show the discrepancy between the true metallicity and that derived from Equation 1 for: a homogeneous cloud with uniform temperature and using ratio (the inferred from [O ] 4363/ 5007), a cloud with temperature fluctuations and using ratio , and a cloud with temperature fluctuations and using line (Equation 2). Temperature fluctuations are either normally (blue) or lognormally (orange) distributed (insets). The standard -method (centre panel) always underpredicts metallicity when temperature fluctuations are present, with the deficit depending on the exact shape of the temperature distribution. In contrast, when the cloud is homogeneous or when line is used, the metallicity prediction is accurate. debiased by measuring multiple temperatures for the same system to estimate the underlying distribution (Peimbert & Costero 1969;Peimbert et al. 2004), such observations are typically restricted to nearby, individual H regions and are rarely feasible for distant galaxies.
Pinpointing the importance of galactic-scale temperature fluctuations is thus essential to calibrate the true magnitude of this longstanding bias and recover accurate extragalactic -metallicities. In this paper, we introduce a new physically-motivated model to achieve this aim. Rather than estimating temperature fluctuations from measurements of local H regions (e.g. Kobulnicky et al. 1999) or photoionization models (Stasińska 1990;Garnett 1992), we use a highresolution (≈ 4.5 pc) simulation of an isolated dwarf galaxy with multifrequency radiation-hydrodynamics coupled to non-equilibrium metal and molecular chemistry. This allows us to self-consistently predict the distribution of H region temperatures, across an entire galaxy undergoing star formation and feedback episodes. We estimate the resulting bias in metallicities (Section 3) and how it affects the derived MZR (Section 4).

HOW TEMPERATURE FLUCTUATIONS AFFECT -BASED METALLICITIES
To gain insights into the importance of sub-resolution temperature fluctuations, we construct a toy model that derives -metallicities from O and O emission lines which are generally readily available for extragalactic studies. Ionized oxygen and hydrogen trace each other in the ISM due to their near-identical ionization potentials and a strong coupling from charge exchange reactions. Assuming in these ionized regions that the O + and O ++ ionization states dominate, the metallicity is: The key to this calculation is the measurement of the temperature. It is typical to assume 3 = 4 and adopt the ratio temperature ( ratio ) measured from the [O ] 4363/[O ] 5007 ratio. Then 1 = 2 are either measured from [O ] auroral lines, or derived by assuming a relation between the O and O temperatures (e.g. Pérez-Montero 2017). Indeed, applying Equation 1 with ratio to a single parcel of gas with uniform temperature provides an unbiased estimate of the metallicity, which depends only on atomic physics (Figure 1, left).
However, when multiple parcels of gas with a range of temperatures are present, we can introduce the 'line temperature', line (e.g. Stasińska 1978): [OIII] λ5007 [OIII] λ4363 [OII] λλ3727 where is the electron density, is the number density of species with a transition that gives rise to an emission line with wavelength , and is the emissivity of that line. line encodes the average emissivity of each emission line across the range of probed temperatures, weighted by the local electron and ion densities. This quantity is related to the intrinsic 'ionic temperature' (the mass-weighted average temperature of a given ion), however line is more directly related to observed emission line fluxes as it additionally encodes information of how emissivity scales with temperature. The crux of the bias described in Section 1 is that, since the emissivity of H , [O ] 3727, and [O ] 5007 scale differently with temperature, each line has a different line differing from ratio . Hence using ratio in Equation 1 can result in an incorrect metallicity.
To gauge the magnitude of this effect, we construct a system comprised of 1,000 gas parcels, all with constant metallicity of 12 + log 10 (O/H) = 8.48, randomized electron densities between 10 −1 − 10 3 cm −3 , and randomized temperatures. We construct 28 systems assuming either a normal ( = 10 3 K) or lognormal ( = 0.3 dex) temperature distribution, each with 14 different mean temperature values ranging from 7,000 K to 20,000 K. Within each parcel, the gas temperature is the same for each ion. We then obtain the total luminosities of and H by summing the emergent luminosities from each gas parcel. 2 Applying the -method, we compute the 'measured' metallicity of the system by deriving ratio from [O ] 4363/ 5007 and inputting this into Equation 1. The centre panel of Figure 1 shows that metallicities derived using ratio systematically underpredict the true value, and the magnitude of the bias strongly depends on both the shape of the temperature distribution (blue versus orange) and its mean temperature (left to right). Since mean temperature is often correlated with galaxy metallicity, temperature fluctuations can affect relative metallicity measurements, and impact the shape of the MZR.
In contrast, if we adopt the corresponding line for each in Equation 1, we obtain an unbiased metallicity measurement, regardless of the underlying temperature distribution (Figure 1 right panel). However, line is not a directly measurable quantity. Given that the sub-resolution temperature distributions in real galaxies are much more complicated than the simple distributions presented here, this motivates a theoretical approach to predicting what distribution the temperatures in a realistic galaxy might take. Thus, we now present a new, physically-motivated exploration of the temperature distribution in a simulated isolated dwarf galaxy. Based on this, we outline a novel method to convert ratio to line and debias -derived metallicities.

QUANTIFYING TEMPERATURE FLUCTUATIONS IN A SIMULATED DWARF GALAXY
We now make use of novel numerical methods that allow us to directly predict the temperature distribution of H regions within a simulated galaxy, and estimate a relation between ratio and line .

Simulation methods
Our simulations achieve a resolution of 4.5 pc across the ISM, resolving the structure of large H regions. Using the out-ofequilibrium temperatures, ionization states, and electron densities self-consistently produced by the simulation, we solve for equilibrium level populations to predict emission line luminosities. We follow Katz et al. (2022c) to compute Balmer lines according to the most recent atomic data in CHIANTI (Dere et al. 2019), while oxygen line luminosities are calculated with PyNEB (Luridiana et al. 2015). We simulate the same isolated dwarf galaxy as in Katz (2022) with a halo mass of 10 10 M , a circular velocity of 30 km s −1 , a disk gas mass of 3.5 × 10 8 M , and an initial central metallicity of 10 −1 (see Katz 2022, Section 4 for further details). We then evolve these initial conditions using the RAMSES-RTZ code (Katz 2022) and the PRISM interstellar medium (ISM) model for cooling and heating (Katz et al. 2022b). We improve the resolution by a factor 4 compared to Katz (2022) to better resolve H regions, and run the simulation for twice as long.
Radiation hydrodynamics is self-consistently computed in eight energy bins (Kimm et al. 2017, Table 2), assuming a reduced speed of light ( sim = /100), and is coupled to the non-equilibrium chemistry of eight ionization states of O, seven of N, six of C, and six of Si, Mg, Fe, S, and Ne, and all ionization states of H and He. We account for star formation, stellar feedback and enrichment from massive stars (Katz 2022, Section 4), and apply a Haardt & Madau (2012) UV background which is exponentially suppressed at > 10 −2 cm −3 . We employ a fixed cosmic-ray background ionization rate of cr = 10 −16 s −1 H −1 . We highlight that ISM conditions are sensitive to the choice of feedback model (e.g. Roca-Fàbrega et al. 2021), and that future works are required to assess the extent of these uncertainties and how they affect the robustness of derived temperature fluctuations in simulated galaxies. Similarly, uncertainties associated with numerical metal mixing could affect the distribution of metals and their temperature distributions. In particular, adaptive mesh refinement (AMR), as used in this study, can over-mix the fluid (e.g. Wadsley et al. 2008) and smooth out metal (and temperature) distributions. This would, however, decrease temperature fluctuations across the galaxy, making results in this section conservative with respect to this issue.

Abundance discrepancy in our simulated galaxy
To provide the cleanest test of the -method, we work only with the intrinsic luminosities so as to not introduce uncertainties due to dust attenuation. We assume that we detect [O ]  We evolve the galaxy for 400 Myr. After the initial starburst relaxes from the initial conditions, the star formation rate (SFR) fluctuates in the range 10 −2 < SFR/M yr −1 < 10 −1 . These temporal fluctuations are reflected on the galaxy-integrated emission line luminosities needed for the -method (Figure 2, top panel). We further show in  the bottom panel of Figure 2 the ion mass-weighted temperature distributions for O , O , and H at the end of the simulation, and emission-line maps (Figure 3). At a given time, the dynamic ISM environment generates an extended ion temperature distribution with non-trivial shape, and an irregular emission line spatial distribution. These temperature fluctuations will cause a bias in the -method. [O ] 5007 and H using the self-consistent simulated distribution of temperatures, electron and ion densities, and compare the derived metallicities between the two in Figure 4. Using ratio results in a systematic underprediction with an average deficit of 0.21 dex, ranging between at 0.1−0.5 dex during the course of the simulation (colours). These offsets have comparable magnitude to discrepancies between and RL metallicities (García-Rojas & Esteban 2007), and between galaxy-integrated measurements compared to combining their individual H regions (Kobulnicky et al. 1999).
A possible avenue to correct such abundance discrepancy is to use the 2 -method (Peimbert & Costero 1969), in which ion temperatures are measured from multiple different tracers to better estimate their underlying temperature distribution. However, detecting multiple independent temperature tracers is highly challenging beyond the local Universe, limiting the applicability of this approach for extragalactic studies. Instead, in the next section we introduce a novel approach in which we suggest that ratio can be converted to line which, as discussed in Section 2, results in an unbiased metallicity measurement. We foresee that this approach will be more widely applicable to extragalactic studies where most observations have at most one temperature tracer. (3) We emphasise that these relations are derived from a single simulation of an isolated dwarf galaxy. It remains unclear how these generalise to galaxies of different masses, metallicities, and star formation rates, and how sensitive they are to the choice of feedback model. A detailed quantification on how to generalise the coefficients in Equation 3 across the whole galaxy population is beyond the scope of this work, and we stress that our determinations should not be applied blindly to large datasets. However, our simulation currently provides the most realistic estimate of temperature fluctuations across dwarf galaxies -we thus extrapolate them to a range of stellar masses in the next section to illustrate how accounting for temperature fluctuations could reshape the low-mass end of the MZR, strongly motivating future studies extending our analysis.
We note that the relations described in Equation 3 could also be derived in terms of ratio (O ). Indeed, at higher masses and metallicities, the [O ] auroral lines may become the most accessible temperature diagnostic (e.g. Sanders et al. 2022). However, given that the [O ] 4363 / 5007 ratio is the most widely used temperature diagnostic in extragalactic studies (especially in the dwarf galaxy regime), we have chosen to base the relations in Equation 3 on ratio (O ). Consideration of an O based line − ratio relation is deferred to future work that will study higher mass and more metalenriched galaxies.

DISCUSSION AND CONCLUSIONS
We have shown how temperature fluctuations within an H region and throughout a galaxy can bias -method metallicities low, with a magnitude that depends on the shape and mean of the underlying distribution. While the existence of this bias is well established (e.g. Peimbert 1967;Stasińska 2005), estimating its magnitude and correcting it in extragalactic samples is challenging (e.g. Peimbert 1967, AM13). We propose a new solution based on deriving a relation between the observable ratio (that can be derived from auroral-tonebular line ratios) to the unobservable line , which we demonstrate provides a much more accurate temperature estimator for the method (Figure 1).
We calibrate this relation using high-resolution (4.5 pc) numerical simulations of a dwarf galaxy with RAMSES-RTZ that couple non-equilibrium metal chemistry to multi-frequency radiation hydrodynamics to estimate the temperature distributions across the warm-ionised gas in the galaxy. Applying the standard -method to simulated data underpredicts the representative metallicity of H regions at all times, by 0.21 dex on average and up to 0.5 dex ( Figure 4). Nonetheless, each line temperature tightly correlates with ratio from [O ] ( Figure 5), offering us the opportunity to debias metallicity estimates.
To illustrate the importance of accounting for unresolved temperature fluctuations when determining the MZR, we use the stacked SDSS spectra from AM13 for stellar mass bins below 10 8.5 M , and apply the -method from the [O ] Figure 6 shows the resulting metallicities (black points), with the best fit (black line) using the functional form in AM13 (their Equation 5). The original fit to the MZR from AM13 (red line) shows minor differences due to their different atomic data, a different relation 3 to convert between (O ) and (O ), and their more extended stellar mass range for the fit.
We then use Equation 3 to estimate line for each emission line before deriving the metallicity from Equation 1 and recalculate the MZR (blue points). The metallicity increases at all stellar masses, with the stellar-mass dependent offset reaching ∼ 0.3 dex for lower . MZR for SDSS galaxies based on the standard method using ratio (black), our updated method using the estimated line (blue), and the fit from AM13 (red). The slope of the MZR flattens and the normalization increases using our new approach to debiasing the measurement. mass bins, and flattens the measured slope of the MZR from 0.87 to 0.58.
Further work is needed to establish the robustness of applying Equation 3 derived from a single dwarf galaxy to the general sample from AM13 (as well as to other, potentially less biased, samples), however this experiment emphasizes that a careful quantification and correction of biases in metallicity measurements is essential to robustly interpret the MZR. We advocate for larger suites of comparable simulations than used in this study, across a larger range of stellar masses and metallicities, and in a cosmological context. Refining the relations presented in Equation 3 from such a suite of simulations could aid interpretations of observations in a manner comparable to how 1D photoionisation models (from e.g. CLOUDY or MAPPINGS) have historically been used.
Finally, other metallicity diagnostics that are readily-available for extragalactic samples, in particular theory-calibrated strong-line methods, have shown offsets compared to metallicities derived using -based methods (e.g. AM13). We plan in future work to use similar simulations as used in this work to explore the physical nature of these offsets and propose new corrections to improve the robustness of metallicity estimates.