Does the Fundamental Metallicity Relation Evolve with Redshift? I: The Correlation Between Offsets from the Mass-Metallicity Relation and Star Formation Rate

The scatter about the mass-metallicity relation (MZR) has a correlation with the star formation rate (SFR) of galaxies. The lack of evidence of evolution in correlated scatter at $z\lesssim2.5$ leads many to refer to the relationship between mass, metallicity, and SFR as the Fundamental Metallicity Relation (FMR). Yet, recent high-redshift (z>3) JWST observations have challenged the fundamental (i.e., redshift-invariant) nature of the FMR. In this work, we show that the cosmological simulations Illustris, IllustrisTNG, and EAGLE all predict MZRs that exhibit scatter with a secondary dependence on SFR up to $z=8$. We introduce the concept of a"strong"FMR, where the strength of correlated scatter does not evolve with time, and a"weak"FMR, where there is some time evolution. We find that each simulation analysed has a weak FMR -- there is non-negligible evolution in the strength of the correlation with SFR. Furthermore, we show that the scatter is reduced an additional ~10-40% at $z\gtrsim3$ when using a weak FMR, compared to assuming a strong FMR. These results highlight the importance of avoiding coarse redshift binning when assessing the FMR.


INTRODUCTION
The metal content of galaxies provides key insights into galaxy evolution.Stellar winds and supernovae explosions eject metals formed in stars into the interstellar medium (ISM).Metals then mix via galactic winds (e.g., Lacey & Fall 1985;Koeppen 1994) and turbulence (e.g., Elmegreen 1999) within the disc while pristine gas accretion from the circumgalactic medium (CGM) and outflows dilute the metal content (e.g., Somerville & Davé 2015).Thus, the metal content (metallicity) of the gas within a galaxy is sensitive to such processes, providing a window into the evolutionary processes within a galaxy (Dalcanton 2007;Kewley et al. 2019;Maiolino & Mannucci 2019).
Evidence for the sensitivity of metal content to the gas dynamics within a galaxy is perhaps most clearly seen within the relationship between the stellar mass of a galaxy and its gas-phase metallicity.This mass-metallicity relationship (MZR) describes a relationship of increasing metal content in galaxies with increasing stellar mass ★ E-mail: alexgarcia@virginia.edu † ARC DECRA Fellow (Tremonti et al. 2004;Lee et al. 2006).At low stellar masses, the MZR relationship is well-described as a power-law, whereas at high masses (log[ * / ⊙ ] > 10.5) the MZR plateaus (e.g., Tremonti et al. 2004;Zahid et al. 2014;Blanc et al. 2019).Furthermore, at a fixed stellar mass, low (high) metallicity galaxies have systematically elevated (depressed) gas masses (Bothwell et al. 2013;Scholte & Saintonge 2023) and SFRs (Ellison et al. 2008;Mannucci et al. 2010).The inverse relationship between a galaxy's metal content and SFR (or gas content) at a fixed stellar mass has been seen in the gasphase in observations (e.g., Lara-López et al. 2010;Bothwell et al. 2016;Alsing et al. 2024;Yang et al. 2024) and simulations (e.g., De Rossi et al. 2017;Torrey et al. 2018) as well as for stellar metallicities in simulations (De Rossi et al. 2018;Fontanot et al. 2021;Garcia et al. 2024;Looser et al. 2024) and recent observations (Looser et al. 2024).This secondary dependence on SFR and gas content is qualitatively well-described with basic competing physical drivers: (i) as new pristine gas is accreted onto a galaxy, it drives galaxies toward higher gas fractions, higher star formation rates (SFRs), and lower metallicities, while (ii) galaxies will persistently tend to consume gas and produce new metals, driving galaxies toward lower gas fractions, lower SFRs, and higher metallicities (e.g., Davé et al. 2011;Dayal et al. 2013;Lilly et al. 2013;De Rossi et al. 2015;Torrey et al. 2018).
It is therefore expected that secondary dependence would remain present for galaxies across a wide redshift range given the ubiquity of these physical drivers.At higher redshift the MZR has been seen to persist (albeit with a lowered overall normalisation e.g., Savaglio et al. 2005;Maiolino et al. 2008;Zahid et al. 2011;Langeroodi et al. 2023) along with the secondary dependence on SFR (e.g., Belli et al. 2013;Salim et al. 2015;Sanders et al. 2018Sanders et al. , 2021)).Critically, it has been put forth that a single, redshift-invariant plane can be used to describe both the general evolution of the MZR as well as the secondary correlations (Mannucci et al. 2010).This single surface/relation that can describe the metallicity of galaxies over a wide mass and redshift range is referred to as the fundamental metallicity relation (FMR).Despite the success of characterising galactic metallicities at  ≲ 2.5 (∼ 80% of cosmic history), Mannucci et al. (2010) report some evidence for deviations from the FMR at  > 3. JWST observations have recently corroborated the existence of deviations from the FMR at  > 3 (Heintz et al. 2023;Curti et al. 2023;Langeroodi & Hjorth 2023;Nakajima et al. 2023).
To remain truly redshift invariant, the FMR must capture two distinct features of the MZR simultaneously: (i) the existence of a secondary relationship with SFR at fixed redshift, and (ii) the redshift evolution (or lack thereof) in the normalisation.It is therefore possible that a change in either the MZR's secondary correlation with SFR or the redshift evolution of the normalisation of the MZR (or perhaps a combination thereof) may indicate FMR evolution.Many of the previously mentioned studies investigating high-redshift galaxy populations apply a  ∼ 0 calibrated FMR to higher redshift data (e.g., Mannucci et al. 2010;Wuyts et al. 2012;Belli et al. 2013;Sanders et al. 2021;Curti et al. 2023;Langeroodi & Hjorth 2023;Nakajima et al. 2023).However, it is unclear how to effectively decouple (and subsequently interpret) the observed evolution at high redshift in these frameworks.Some work has been done up to this point observationally looking at higher redshifts independently to specifically isolate the scatter about the MZR (e.g., Salim et al. 2015;Sanders et al. 2015Sanders et al. , 2018;;Li et al. 2023;Pistis et al. 2023).These works find that there may be some evolution within the scatter about the MZR at intermediate redshifts (Pistis et al. 2023 suggest potentially as low at  ∼ 0.63).Yet, there are comparatively few simulations results on a systematic examinations on the strength of the secondary dependence on gas content and/or SFR at individual redshifts.
In this work, we investigate the redshift evolution of the MZR's secondary dependence on SFR from the perspective of the cosmological simulations Illustris, IllustrisTNG, and EAGLE.The rest of the paper is as follows: In §2 we describe the simulations we use, our galaxy selection criteria, and summarize definitions of the FMR.In §3 we present the redshift evolution of the FMR as found in simulations.In §4 we quantify the impact of the new framework on the scatter about the MZR, discuss the advantages and challenges in the new framework, and then discuss potential impacts of the physical models.Finally, in §5 we present our conclusions.

METHODS
We use the Illustris, IllustrisTNG, and EAGLE cosmological simulations to investigate the dependence of the gas-phase metallicity on stellar mass and star formation.Each of these simulations has a sub-grid ISM pressurisation model, which creates "smooth" stellar feedback.We believe that generic results from all three of these simulations should constitute a fair sampling of predictions from subgrid ISM pressurisation models owing to the appreciably different physical implementations.
Here we briefly describe each of the simulations from this analysis, the galaxy selection criteria we employ, and present a new framework for interpreting the Mannucci et al. (2010; hereafter M10) FMR projection.All measurements are reported in physical units.

Illustris
The original Illustris suite of cosmological simulations (Vogelsberger et al. 2013(Vogelsberger et al. , 2014a,b;,b;Genel et al. 2014;Torrey et al. 2014) was run with the moving-mesh code arepo (Springel 2010).The Illustris model accounts for many important astrophysical processes, including gravity, hydrodynamics, star formation/stellar evolution, chemical enrichment, radiative cooling and heating of the ISM, stellar feedback, black hole growth, and AGN feedback.The unresolved star forming ISM uses the Springel & Hernquist (2003) equation of state, wherein new star particles are created from regions of dense ( H > 0.13 cm −3 ) gas.The masses of the stars within the star particle are drawn from a Chabrier (2003) initial mass function (IMF) and metallicities are adopted from the ISM where they are born.As the stars evolve, they eventually return their mass and metals back into the ISM.The stellar mass return and yields used allow for the direct simulation of time-dependent return and heavy metal enrichment, explicitly tracking nine different chemical species (H, He, C, N, O, Ne, Mg, Si, and Fe).

IllustrisTNG
IllustrisTNG (The Next Generation; Marinacci et al. 2018;Naiman et al. 2018;Nelson et al. 2018;Pillepich et al. 2018b;Springel et al. 2018;Pillepich et al. 2019;Nelson et al. 2019a,b, hereafter TNG) is the successor to the original Illustris simulations, alleviating some of the deficiencies of and updating the original Illustris model.As such, the Illustris and TNG models are similar, yet have an appreciably different physical implementation (see Weinberger et al. 2017;Pillepich et al. 2018a, for a complete list of differences between the models).A critical difference between the Illustris and TNG models for the context of this work is TNG's implementation of redshift-scaling winds.The TNG model employs a wind velocity floor not present in the original Illustris model in order to prevent low mass haloes from having unphysically large mass loading factors.Consequently, low redshift star formation is suppressed in the TNG model.TNG implements the same equation of state for the dense star forming ISM as Illustris (Springel & Hernquist 2003).As in Illustris, new star particles are created from dense gas using the Chabrier (2003) IMF.Furthermore, TNG tracks the same nine chemical species as Illustris, while also following a tenth "other metals" as a proxy for metals not explicitly monitored.

EAGLE
Unlike Illustris and TNG, "Evolution and Assembly of GaLaxies and their Environment" (EAGLE, Crain et al. 2015;Schaye et al. 2015;McAlpine et al. 2016) employs a heavily modified version of the smoothed particle hydrodynamics (SPH) code gadget-3 (Springel 2005; anarchy, see Schaye et al. 2015 Appendix A).EAGLE includes many of the same baryonic processes (star-formation, chemical enrichment, radiative cooling and heating, etc) as Illustris and TNG.The dense (unresolved) ISM in EAGLE is also treated with a sub-grid equation of state (Schaye & Dalla Vecchia 2008;hereafter, SDV08), much like that of SH03.The SDV08 prescription forms stars according to a Chabrier (2003) IMF from the dense ISM gas.The density threshold for star formation is given by the metallicitydependent transition from atomic to molecular gas computed by Schaye (2004) with an additional temperature-dependent criterion (Schaye et al. 2015).Stellar populations evolve according to the Wiersma et al. (2009) evolutionary model and eventually return their mass and metals back into the ISM.EAGLE explicitly tracks eleven different chemical species (H, He, C, N, O, Ne, Mg, Si, S, Ca, and Fe).
The full EAGLE suite is comprised of several simulations ranging from size (12 Mpc) 3 to (100 Mpc) 3 .We use data products at an intermediate resolution (2 × 1504 3 particles) run with a box-size of (100 Mpc) 3 referred to as RefL0100N1504 (hereafter simply EAGLE) as a fair comparison to the selected Illustris and TNG runs.

Galaxy selection
All three simulations in this work select gravitationally-bound substructures using subfind (Springel et al. 2001;Dolag et al. 2009), which identifies self-bound collections of particles from within friends-of-friends groups (Davis et al. 1985).We limit our analysis to central galaxies that we consider 'well-resolved' (i.e., containing ∼100 star particles and ∼500 gas particles), thus we restrict the sample to galaxies with stellar mass log( * [ ⊙ ]) > 8.0 and gas mass log( gas [ ⊙ ]) > 8.5.We place an upper stellar mass limit of log( * [ ⊙ ]) > 12.0 1 .Following from a number of previous works (see, e.g., Donnari et al. 2019;Nelson et al. 2021;Hemler et al. 2021;Garcia et al. 2023), we exclude quiescent galaxies by defining a specific star formation main sequence (sSFMS).We do so by fitting a linear-least squares regression to the median sSFR-M * relation with stellar mass log( * [ ⊙ ]) < 10.2 in mass bins of 0.2 dex.The sSFMS above 10.2 log  ⊙ is extrapolated from the regression.Galaxies that fall greater than 0.5 dex below the sSFMS are not included in our sample.As we show in Garcia et al. (2024;that paper's Appendix B), our key results (using stellar metallicities) are not sensitive to our sample selection.We obtain the same result here in the gas-phase: our key results are qualitatively unchanged by the same variations as Garcia et al. (2024) in selection criteria (see Appendix A).
As metallicity measurements are typically limited to star forming regions in observations (Kewley & Ellison 2008;Kewley et al. 2019), all of the analysis of gas-phase metallicities presented here is based only on star-forming gas (as defined in Section 2.1 for Illustris/TNG and Section 2.3 for EAGLE). 1 The upper mass limit does not exclude any galaxies for most redshifts . Decision tree for the   ZR, see Section 2.5 for full details.This shows the different relationships that can be included under the umbrella   metallicity relation (  ZR; see Equation 1).First is the traditional MZR where  = 0.0 and second is the FMR where  ≠ 0.0.The FMR can be further broken into two categories: strong and weak depending on if  varies as a function of redshift (weak) or not (strong).

Definitions of the FMR
M10 propose that the 3D relationship between stellar mass, gas-phase metallicity, and star formation rate (SFR) can be projected into 2D using a linear combination of the stellar mass and star formation: where  is a free parameter that ranges from 0 to 1 2 .The free parameter  holds all the diagnostic power on the strength of the MZR's secondary dependence with SFR.By varying , the distribution of galaxies in   -metallicity space varies.We define a   -metallicity relation (  ZR) for each  as a linear-least squares regression 3 of the data.We compute the   ZR for  = 0.0 to 1.0 in steps of 0.01 and obtain the residuals about each regression.The projection that yields the minimum scatter in the residuals (smallest standard deviation) is deemed the best fit.The  value associated with this minimum scatter projection is henceforth referred to as  min .We define an uncertainty on  min by assuming that a projection that has scatter within 5% of the minimum value is a plausible candidate for the true  min (following from Garcia et al. 2024).
min physically represents the direction to project the 3D massmetallicity-SFR ( * −  gas − SFR) space into a minimum scatter distribution in 2D   −  space.Thus, the   ZR is the relation of merit in the 2D projection of the  * −  gas − SFR relation.There are two outcomes, either (i)  min = 0.0, wherein the canonical MZR is recovered, or (ii)  min ≠ 0.0, wherein an FMR is recovered.In this way, the   ZR can be thought of as a superset of relations containing the MZR, the strong FMR, and the weak FMR (relationships 2 In reality, the correlated scatter about the MZR has some non-negligible mass dependence.In fact, the correlation with SFR has been seen to weaken, or even invert, at high stellar mass (e.g., Yates et al. 2012;Alsing et al. 2024).Parameterisations exist that account for this mass dependence exist (e.g., Curti et al. 2020).We opt to not present other forms of the FMR in this work as a exercise on the extent to which the M10 projection can describe the of scatter at fixed redshift (see further discussion in Section 4.2). 3 M10 use a fourth-order polynomial for fitting.This practice is inconsistent in the literature with many (e.g., Andrews & Martini 2013) considering a linear regression.We show that using a fourth-order polynomial instead of a linear regression does not significantly alter our  min determination in Appendix B. We show these values in Figure 2.

Illustris
illustrated in Figure 1).Framing the FMR in this way underscores the decisions required in establishing the FMR.Previous studies have been somewhat restrictive in regards to these decisions.We therefore highlight the need to take a deliberate approach to our definitions to build a framework by which potential redshift evolution can be assessed.
Traditionally (as in, e.g., M10), the FMR is defined by determining  min at  = 0.This value has been seen to be roughly constant at  ≲ 2.5 (e.g., M10; Andrews & Martini 2013).We henceforth refer to the idea that  min does not vary as a function of redshift as the strong FMR.A single  min can describe both the MZR's secondary dependence and its normalisation evolution in the strong FMR.In this work, we investigate the claim that  min is constant over time by identifying the  min value that minimizes scatter at each redshift independently.This procedure allows  min to (potentially) vary as a function of redshift.Here we introduce the concept of a "weak" FMR.We define the weak FMR as a counterpoint to the strong FMR: that  min ≠ 0, but  min is not constant as a function of redshift (see illustrated relationship in Figure 1).
There are actually more parameters beyond  min that the FMR is defined by: the parameters of the regression (in our case slope and intercept).These additional parameters add complexity to the interpretation of the evolution.Regressions are inherently linked to the  min determination, yet the parameters of the fit can have a profound impact on interpretation of FMR evolution irrespective of  min variations.The impact of these parameters is beyond the scope of this work since we only examine each redshift bin independently here and the effect of the regression parameters is only felt when comparing different redshift bins.We do, however, address the impact of these parameters in a companion work (Garcia et al. in prep).

Does 𝛼 min vary as a function of redshift?
We use the best-fit  min values derived as a function of redshift to evaluate whether the scatter about the MZR evolves significantly with redshift.We find that  min ≠ 0 at all redshifts in each of the three simulations (Figure 2).Based on the first step of the decision tree in Figure 1, the non-zero  min values show there is an FMR in each simulation.The secondary dependence on SFR is present, at least to some extent, within the scatter of all the MZRs analysed here.It should be noted, however, that the uncertainty 4 on the TNG  = 0  min value does include  = 0.This implies a somewhat weak dependence on SFR at this redshift.In Garcia et al. (2024), we attribute a lack of a relationship at  = 0 in TNG to the redshift scaling of winds within the TNG model 5 .Briefly, the effect of adding winds that change with redshift suppresses low redshift star formation and increases the efficiency of high redshift stellar feedback compared to the Illustris model (see Pillepich et al. 2018a).It is therefore likely that the suppressed low redshift star formation causes the large uncertainty on  min at  = 0.As such, features of  min are sensitive to details of the wind implementation/strength prescribed by the model on which it is built (see Section 4.3 for further discussion).
Overplotted on Figure 2 (gray squares) are three observationally determined values of  min from M10 (0.32), Andrews & Martini (2013;0.66), and Curti et al. (2020;0.55).Each of these values was determined using SDSS galaxies at  ≈ 0 (offset horizontally for clarity).Deviations in the observational values are attributed primarily to: (i) different metallicity calibrations, (ii) using individual galaxies versus galaxy stacks (as in Andrews & Martini 2013), and (iii) selection biases towards higher star forming galaxies.Simulations are not directly affected by metallicity calibrations in the same way as observations.The sample selection criteria (outlined in Section 2.4) should help mitigate the effect of selection function biases of observations.Though we select star-forming galaxies, we do not just select the highest star forming galaxies.In spite of these potential differences, it is worth noting that the M10  min value agrees fairly well with the TNG and Illustris derived values at  = 0.Although the uncertainty on the TNG  min is significant enough to include the Curti et al. (2020) value by a factor of ∼ 1.5 times higher.Similarly, the Andrews & Martini (2013) value of 0.66 agrees fairly well with the derived value from EAGLE at  = 0, though we caution that this analysis was done with galaxy stacks whereas we us individual galaxies here.
Furthermore, we find that  min values show some level of redshift evolution in all three simulations (Figure 2 and Table 1).Interestingly, each simulation has qualitatively different redshift evolution.TNG  min values vary significantly from  = 0 to  = 1 but then level off, the Illustris  min values increase monotonically with redshift, and the EAGLE values decrease monotonically as a function of redshift.
We conduct a one-sample -test to validate this apparent redshift evolution in each simulation.The null hypothesis is that the mean  min is equal to  min at  = 0 (i.e., there is not redshift evolution).Given the uncertainty associated with each  min , we compute the sample mean by weighting each  min by the reciprocal of its squared uncertainty 6 .Additionally, we normalize the -statistic by the estimated error on the mean (the reciprocal sum of squared weights).We find that -statistics are −6.23,13.34, and 7.04 in Illustris, TNG, and EAGLE (respectively).These correspond to -values of 2.5 × 10 −4 for Illustris, 9.5 × 10 −7 for TNG, and 1.0 × 10 −4 EAGLE.We therefore reject the null hypothesis at a significance level of  = 0.05 4 Uncertainties on  min correspond to the uncertainty in the minimum dispersion (see Section 2.5 for definition) 5 We show this in Garcia et al. (2024) for stellar metallicities.That work also demonstrates that stellar and gas-phase metallicities are related to each other (see Section 4.1 of that work).Therefore, the same physical mechanism suppressing the correlated scatter for stellar metallicities is likely what is suppressing  min for the gas-phase. 6We note that we make the simplifying assumption of symmetric uncertainty by defining the offset as the average of the upper and lower offsets.We additionally verify that instead choosing the magnitude of either the upper or lower offsets does change the result.in each simulation, indicating statistically significant redshift evolution of  min .From the decision tree of Figure 1, the strong FMR is ruled out in favour of a weak FMR for each individual simulation.It is interesting to note that if we test instead using the  = 1 value of  min in TNG we obtain a -statistic of 1.64 (-value of 0.139).We could not reject the null hypothesis of no evolution of the FMR at the 0.05 significance level in that case (see above discussion on redshift-scaling winds in TNG model).
Our result in EAGLE indicating significant redshift evolution seemingly contradicts a previous study finding the FMR is in place and does not evolve out to  ≈ 5 in EAGLE (De Rossi et al. 2017).There is a subtle difference in the analysis between the two works, however: De Rossi et al. (2017) do not parameterise the FMR to test  min variations.They qualitatively examine the secondary dependence within the MZR and show that a  * − gas −SFR relation exists at  = 0 − 5 (i.e., there is at least a weak FMR over these redshift ranges).We find that an  * − gas −SFR relation at  = 0 − 5 exists in EAGLE via non-zero  min values (i.e., there is at least a weak FMR over these redshift ranges), consistent with De Rossi et al. (2017).Despite the persistence of the  * − gas −SFR relation, we confirm that there is a weak in EAGLE by using the M10 projection of the FMR.It should be noted that the uncertainty of the  = 0 and  = 5 values do overlap in EAGLE.The subtly of the redshift evolution may therefore be difficult to detect without fitting each redshift independently.

Scatter assuming different FMRs
The derived  min values show that Illustris, TNG, and EAGLE all have weak FMRs.We now examine how the impact of marginalising over variations in  min when assuming a strong FMR.Specifically, we want to quantify how the scatter changes when assuming a strong versus weak FMR.This will provide a quantitative metric for assessing how important considerations for an evolving FMR are.If the scatter were to remain unchanged, or change only marginally, the need for a weak FMR would be minimal.
To this end, we define three different ratios for quantitatively evaluating the importance of using a weak FMR.We consider ratios of the scatters about: (i) the weak FMR compared to the MZR, (ii) the strong FMR compared to the MZR, and (iii) the weak FMR compared to the strong FMR. Figure 3 illustrates these three different ratios evaluated at each redshift for Illustris (orange diamonds), TNG (green stars), and EAGLE (blue circles).We note that we calculate scatter about the MZR in the same way as we used to determine  min in the previous section (i.e., using the projection from Equation 1) in the following discussion.
The left panel of Figure 3 shows the standard deviation of the residuals (henceforth, scatter) about each redshift's weak FMR ( min determined at that redshift) normalised by the scatter about the MZR ( min = 0) as a function of redshift ( weak / MZR ).We find for all redshifts across the three simulations that the weak FMR reduces the scatter by ∼10 − 30% compared to the MZR.The exception is TNG at  = 0 having scatter reduction of less than 5% -falling within the nominal uncertainty on  min and implying there is functionally no difference between the scatter of the MZR and FMR at this redshift.This  = 0 TNG exception was discussed previously in the context of the  min value (see Section 3.1) and the lack of a relation was attributed to the redshift scaling winds in the TNG model (see Pillepich et al. 2018a).The scatter reduction is roughly constant as a function of redshift in both TNG and EAGLE at around ∼ 20% (barring the aforementioned TNG exception).Scatter reduction in Illustris ranges from ≲ 10% at  = 0 to nearly 30% at  = 8.
The middle panel of Figure 3 shows the scatter at each redshift assuming a strong FMR compared to that of the MZR at that redshift ( strong / MZR ).We define the strong FMR fit analogously to observations: we apply a  = 0 determined  min value to all redshifts.In TNG, we find a similar trend to that of  weak / MZR : a roughly constant scatter reduction as a function of redshift, albeit at a reduced value of ∼5 − 10% (see previous discussion about the exception at  = 0).The scatter reduction in Illustris is similarly constant around 10%.Evidently, the redshift evolution in Illustris seen The dark and light gray shaded regions (in all panels) represent 5% and 10% variations, respectively, of each ratio.Centre: Same as left, but now the numerator is the scatter about the FMR evaluated in each redshift bin with a  = 0 calibrated  min ( strong ).Right: Previous two panels divided by each other, the reduction in scatter in the relationship by determining  min at each redshift independently ( weak ) divided by using the  = 0  min value ( strong ).
previously with  weak / MZR disappears when assuming a strong FMR. strong / MZR actually increases nearly monotonically in EA-GLE as a function of redshift: the strong FMR fit on the low redshift bins is significantly better than the highest redshifts.Remarkably, assuming a strong FMR actually begins to increase the scatter by 10 − 40% compared to the MZR at high redshift ( > 5) in EAGLE.The concept of an FMR is one that relies on minimizing scatter compared to the MZR, yet at the highest redshifts in EAGLE it achieves the opposite.This is a clear failure of the strong FMR in EAGLE as well as a cautionary tale for interpreting future high-redshift FMR observations.
Finally, the right panel of Figure 3 shows the ratio of the scatter of the weak FMR divided by the scatter of the strong FMR evaluated at each redshift ( weak / strong ).The ratio  weak / strong is of particular interest as it provides a diagnostic for how well an assumed strong FMR characterises galaxies at higher redshift compared to their minimum scatter projection.The ratio is unity at  = 0 by construction, since the strong FMR assumes the  = 0  min value for all redshifts.In Illustris and EAGLE, the scatter reduction of the weak FMR at  ≤ 2 is less than 5%.The relatively low decrease in the scatter in these two simulations implies that the strong FMR might approximately hold at these low redshifts (qualitatively consistent with previous observational findings; M10; Cresci et al. 2019).The scatter reduction at  ≥ 3, however, is ≳ 10% for Illustris and EA-GLE.Both have monotonically decreasing ratios of scatter in the high- regime out to a 20% decrease in Illustris and nearly 40% in EAGLE.On the other hand, the scatter reduction in TNG stays roughly constant at around 15% at  > 0.
Overall, the 10-40% decrease in using a weak FMR indicates that high redshift galaxy populations are different from the low redshift systems.The strong FMR does not effectively characterise these high redshift galaxies.This marked shift in efficacy of the strong FMR further supports the idea there is some time evolution within the FMR in Illustris, TNG, and EAGLE.
It is worth noting how deceptive the lack of evolution in the strong FMR scatter reduction ratio (central panel of Figure 3) is for Illustris and TNG.Looking at the reduction in scatter of the strong FMR by itself in these two simulations may lead one to conclude that a strong FMR holds -the strong FMR does reduce scatter at all redshifts, even by a roughly constant amount.Indeed it is remarkable that the strong FMR reduces the scatter by a similar amount at  = 0 as  = 8 despite ignoring a variation of a factor of > 2 in  min .However, we emphasize that using the weak FMR significantly improves the characterisation of these galaxy populations, particularly at high redshift.
In summary, by determining  min at each redshift independently, we find that the scatter can be reduced an additional ∼ 10 − 40% compared to an assumed strong FMR.We therefore conclude that the variations in  min are significant, particularly at high redshift.The significant variations past  ≳ 3 seem to imply that the strong FMR is not even a good approximation in the early universe in our simulations.

What do variations in 𝛼 min mean?
The main idea of the FMR comes from the idea that in the MZR, at a fixed stellar mass, the metallicities and SFRs of galaxies are anti-correlated.Using  min is an attempt to represent the strength of the correlation between metallicity and SFR; however, it does not actually explicitly tell us about that relationship.Rather,  min values are tuned to minimize scatter.It is therefore critical to develop an understanding of the strength of the correlation between metallicity and star formation rates that is not just a scatter-minimisation tool.To build this understanding, we first take the FMR regression as defined in Section 2.5: a fourth-order regression.We show that our choice of first-order does not significantly impact our results in Appendix B 8 We note, however, that while  min is related to the strength of the (anti-) correlation between Δ and SFR, in actuality, there is another scaling with the slope of the FMR () slope is shallow at  = 0 and gets steeper with increasing redshift.This behaviour is consistent with the  min variations seen in Illustris - min is small at  = 0 in Illustris and increases with increasing redshift (see Figure 2).We find that the  = 0 slope in TNG is significantly weaker than the  > 0 slopes (central panel of Figure 4), consistent with the  min values from TNG.It should be noted, however, that the  min values at  ≥ 5 are all the same, whereas the slopes of Δ versus SFR change slightly at  ≥ 5. Finally, in EAGLE, we find that the slope of Δ versus SFR is steepest at  = 0 and shallows with increasing redshift, again consistent with behaviour in  min as a function of redshift.We emphasize that there is an additional term,  (the slope of the FMR regression from Equation 2), included in the slope of Δ versus SFR.We therefore caution against too strong a comparison against  min and slopes in Δ-SFR space.Changes in the slope of the FMR may cause a change in the slope of Δ versus SFR.Δ is only proportional to  min log SFR.Furthermore, we note that slopes in Δ-SFR space have some dependence on the stellar mass bin chosen.This behaviour is consistent with more recent parameterisations of the FMR (e.g., Curti et al. 2020) that find that the MZR turnover mass has some dependence on SFR.While an interesting area of future exploration, the stellar mass dependence of these slopes (and  min ) is beyond the scope of this work.We therefore caution that the model presented here neglects variations in the role of SFR as a function of stellar mass.However, since Equation 3 explicitly assumes a thin mass bin, the simple model presented here should be relatively robust to stellar mass variations since the Δ-SFR space slopes are not highly sensitive to mass (which we find to be predominately be the case in each simulation at each redshift).
In summary, the slope between offsets from the MZR and SFR offers a more straight-forward way to understand the (potential) evolution in the relationship between metallicity and SFR suggested by  min variations.

Advantages and challenges of a weak FMR Framework
The key advantage of fitting each redshift independently is to more effectively minimize the scatter.The weak FMR gives us a clear-cut metric for the strength of the MZR's secondary dependence on SFR arises that is completely independent of the evolution of the normalisation of the MZR.Independence from other redshift populations removes the possibility of conflating evolution of the normalisation of the MZR with evolution of the scatter.By using a  = 0 derived  min at all redshifts (i.e., strong FMR) we suppress any potential variation of  min as a function of redshift.As a consequence, the strong FMR does not optimally reduce scatter across redshift (discussed in more detail in Section 3.2).Using a weak FMR assumption therefore allows a more careful examination for the extent to which the observed FMR has variations.
A challenge of performing a similar analysis in observations is the amount of data available.For example, lower redshift galaxy populations are well sampled (e.g., M10 use 141,825 SDSS  ∼ 0 galaxies), but at higher redshift sampling becomes more difficult (e.g., the Nakajima et al. 2023 andCurti et al. 2023 analyses use less than 200 objects spanning a wider redshift range of  = 3 − 10).It is possible that subtle changes can be measured at lower redshifts (see Pistis et al. 2023 for a potential detection of MZR scatter variations at  ∼ 0.63); however, the most significant scatter reduction happens in the high redshift ( ≳ 3) populations (Figures 3).More complete samples of galaxy populations at these early times with, e.g., JWST are therefore required in order to undergo any weak FMRstyle analysis to detect significant deviations from the  = 0  min values.Moreover, a redshift-complete sample would be limited by our understanding of metallicity in the high redshift universe.Recently, work has been done to obtain reliable metallicity diagnostics at  > 4 using JWST/NIRSpec (e.g., Nakajima et al. 2023;Shapley et al. 2023;Sanders et al. 2024).However, more complete galaxy samples are required, particularly at the low metallicities seen at this epoch, to fully characterise these diagnostics.As such, it is currently difficult to ensure that  min values determined observationally are fair comparisons across the broad redshift range examined in this work.

Dependence on small scale physics implementations
We find that Illustris, TNG, and EAGLE have weak FMRs.The  min values are not the same, nor do they evolve in the same fashion, in the different models, however.The value of  min at any given redshift is a complicated by-product of a number of different physical processes.We therefore caution the reader against drawing conclusions on  min (and the evolution thereof) from the aggregation of the three individual models.While we have some qualitative understanding of how  min is set (or changed), the exact mechanisms setting  min are not entirely clear in detail.
What is clear is that  min is sensitive to the physics driving galaxy evolution.For example, in Section 3.1, we attributed the lowered  min values in TNG at  = 0 to the redshift-dependent wind prescription in the TNG model (as mentioned in Section 3.1).Through this example, the sensitivity of  min to the input physics within the simulation models becomes clear.The redshift-dependent winds in TNG work to increase wind velocities at low redshift which suppresses star formation.This star formation suppression likely plays a significant role in the overall decrease of  min seen at low redshifts in TNG.We therefore propose the evolution (or lack thereof) in the scatter about the MZR as a testable prediction to constrain the physical models of the simulations.
All three models examined here rely on effective equation of state sub-grid models for the dense, unresolved ISM (Springel & Hernquist 2003 for Illustris/TNG and Schaye & Dalla Vecchia 2008 for EAGLE).In recent years, however, high-resolution simulation modelling has begun to directly resolve the sites of star formation (e.g., Feedback In Realisitc Environments model; Hopkins et al. 2014).The stellar feedback in such simulations is much burstier than in the models presented here.We believe that bursty stellar feedback events should suppress  min values compared to Illustris, TNG, and EAGLE.Subgrid pressurization lends support to the ISM that is not coupled to star formation, and therefore blunts rapid variations in star formation and stellar feedback.Models without subgrid pressurization (like FIRE) do not have this source of ISM support, and therefore exhibit more rapid (i.e., bursty) variations in star formation and stellar feedback.Bursts may therefore curtail the effectiveness of star formation rates in regulating the gas-phase metallicity of a galaxy.Therefore the redshift variations in  min may be able to provide constraining power on the extent to which galaxies' feedback is more bursty or smooth.Although it should be noted that even within these smooth feedback models there is some disparity.

CONCLUSIONS
We select central star forming galaxies with stellar mass 8.0 < log( * [ ⊙ ]) < 12.0 with gas mass log( gas [ ⊙ ]) > 8.5 from  = 0 − 8 in the cosmological simulations Illustris, IllustrisTNG, and EAGLE.We investigate the extent to which the M10 parameterisation (see Equation 1;   ZR) of the fundamental metallicity relation (FMR; Equation 1) holds.The parameter of merit in the   ZR is  min , which is a parameter tuned to minimize scatter in the relation.Physically,  min sets a projection direction of the mass-metallicity-SFR space to a 2D space with minimal scatter.Many observational studies have claimed that this projection direction does not evolve with redshift (Mannucci et al. 2010;Cresci et al. 2019).
We discuss a new framework in which to examine the   ZR as a superset of the MZR ( = 0) and FMR ( ≠ 0).We further define both a strong and weak FMR.A strong FMR indicates that  min is constant as a function of redshift.Conversely, the weak FMR is where  min varies with redshift (see Figure 1 for complete illustrated relationship of   ZR).More generally, the strong FMR states the the M10 parameterisation can describe both the scatter and noramlisation of the MZR at the same time.
Our conclusions are as follows: • We find that  min ≠ 0 for all redshifts in Illustris, TNG, and EAGLE.This shows that there is an FMR in each of these simulations.We note, however, that the uncertainty in  min in TNG at  = 0 includes  min = 0.0.We attribute this to the increased suppression of low redshift star formation in the TNG model.
• Furthermore, we find that there is non-negligible evolution in  min as a function of redshift (Figure 2).This result suggests that the FMR in Illustris, TNG, and EAGLE is a weak FMR.
• We find that the weak FMR ( min determined at each redshift independently) consistently reduces scatter around 10 − 30% compared to the the MZR (left panel of Figure 3).The strong FMR also reduces the scatter compared to the MZR, albeit to a lesser extent than the weak FMR.At high- in EAGLE, however, using the strong FMR actually increases scatter compared to the MZR (centre panel of Figure 3).Overall, we find that at  ≳ 3 fitting galaxies with a weak FMR can reduce scatter ∼ 5 − 40% more than using the strong FMR (right panel of Figure 3).
• We suggest that the interpretation of  min variations is more well-understood in the context of the slope of Δ (offsets from the MZR) as a function of log SFR (see Figure 4).In this context, the weak FMR suggest that the relationship between metallicity and SFR changes through cosmic time, whereas the strong FMR suggests that it does not change.We also show that the slope in Δ−log SFR space is proportional to  min (see Equation 3).
Obtaining one relationship that describes the metal evolution of all galaxies across time is an ambitious goal.It is worth appreciating how reasonably well a simple linear combination of two parameters can begin to achieve that goal at low redshift.Yet it is not perfect.To begin to rectify this, we develop a substantial overhaul to the current FMR paradigm (summarized in Figure 5).The results from this work show that Illustris, TNG, and EAGLE indicate deviations from the strong FMR.It is presently unclear whether the same is true in observations.Understanding whether the FMR in observations is weak or strong will aid in being able to understand the recent JWST observations suggesting high redshift FMR evolution.Figures 1 and 5.We acknowledge the Virgo Consortium for making their simulation data available.The EAGLE simulations were performed using the DiRAC-2 facility at Durham, managed by the ICC, and the PRACE facility Curie based in France at

Figure 2 .
Figure 2.  min values as a function of redshift in Illustris, TNG, and EAGLE. min values as a function of redshift are plotted as orange triangles, green stars, and blue circles for Illustris, TNG, and EAGLE, respectively.The errorbars here are obtained by finding  values that reduce the scatter to within 5% that of the minimized scatter.The gray squares are observational values of  min from M10, Andrews & Martini (2013), and Curti et al. (2020) determined at  ≈ 0 via SDSS (offset from  = 0 for aesthetic purposes).

Figure 3 .
Figure 3. Reduction in scatter for weak FMR versus MZR, strong FMR versus MZR, and weak FMR versus strong FMR.Left: the scatter about the FMR by fitting  min at each redshift individually ( weak ) divided by the scatter in the MZR at each redshift ( MZR ) as a function of redshift.The dark and light gray shaded regions (in all panels) represent 5% and 10% variations, respectively, of each ratio.Centre: Same as left, but now the numerator is the scatter about the FMR evaluated in each redshift bin with a  = 0 calibrated  min ( strong ).Right: Previous two panels divided by each other, the reduction in scatter in the relationship by determining  min at each redshift independently ( weak ) divided by using the  = 0  min value ( strong ).

Figure 5 .
Figure 5. Summary of Key Points.The strong FMR (left, red) is where  min , a parameter tuned to minimise scatter about the MZR, is constant as a function of redshift.Consequently, in thin mass bins, the offsets from the MZR, Δ, as a function of (log) SFR have roughly the same slope at all redshifts (although there is a dependence on the slope of the FMR, see Section 4.1 for more details).The weak FMR (right, blue) is where  min varies as a function of redshift.In this scenario, the individual redshift's have different strengths of correlations between offsets from the MZR and (log) SFR.

Figure A1 .Figure B1 .
Figure A1.Determination of  min as a function of redshift for Illustris (top), TNG (centre), and EAGLE (bottom) for the four different sSFMS variations.

Table 1 .
All  min values at  = 0 − 8 for Illustris, TNG, and EAGLE.These  min values are determined at each redshift individually.The superscripts are the upper limits of the uncertainty while the subscripts are the lower limits.
TGCC, CEA, Bruyèresle-Châtel.AMG and PT acknowledge support from NSF-AST 2346977.KG is supported by the Australian Research Council through the Discovery Early Career Researcher Award (DECRA) Fellowship (project number DE220100766) funded by the Australian Government.KG is supported by the Australian Research Council Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D), through project number CE170100013.RJW acknowledges support from the European Research Council via ERC Consolidator Grant KETJU (no.818930)