Beware the recent past: a bias in spectral energy distribution modelling due to bursty star formation

We investigate how the recovery of galaxy star formation rates (SFRs) using energy-balance spectral energy distribution (SED) fitting codes depends on their recent star formation histories (SFHs). We use the Magphys and Prospector codes to fit 6,706 synthetic spectral energy distributions of simulated massive galaxies at $1<z<8$ from the Feedback in Realistic Environments (FIRE) project. We identify a previously-unknown systematic error in the Magphys results due to bursty star formation: the derived SFRs can differ from the truth by as much as 1 dex, at large statistical significance ($>5\sigma$), depending on the details of their recent SFH. SFRs inferred using Prospector with non-parametric SFHs do not exhibit this trend. We show that using parametric SFHs (pSFHs) causes SFR uncertainties to be underestimated by a factor of up to $5\times$. Although this undoubtedly contributes to the significance of the systematic, it cannot explain the largest biases in the SFRs of the starbursting galaxies, which could be caused by details of the stochastic prior sampling or the burst implementation in the Magphys libraries. We advise against using pSFHs and urge careful consideration of starbursts when SED modelling galaxies where the SFR may have changed significantly over the last ~100 Myr, such as recently quenched galaxies, or those experiencing a burst. This concern is especially relevant, e.g. when fitting JWST observations of very high-redshift galaxies.


INTRODUCTION
Star formation rates (SFRs) and star formation histories (SFHs) are both of great importance to our understanding of galaxies.SFRs are critical since (along with their stellar mass) they enable us to put galaxies in their contemporary context relative to the evolving SFRstellar mass relation (e.g.Noeske et al. 2007).SFHs encode the buildup of stellar mass in the Universe (Dye 2008), potentially yielding insight into the physical processes that drive galaxy evolution (e.g.Madau et al. 1996;Behroozi et al. 2013;Wang et al. 2022).SFRs and other galaxy properties can be inferred for high-redshift galaxies by fitting a model spectral energy distribution (SED) to available data, which can include spectroscopy as well as photometry.There are many freely-available SED fitting codes able to do this (as discussed by e.g.Walcher et al. 2011;Conroy 2013;Baes 2019).
SED models frequently differ in their approach and assumptions in important ways which potentially impact the fidelity of inferred galaxy properties (Hunt et al. 2019, Pacifici et al. 2023).For example, they may employ different stellar population synthesis models (e.g.Bruzual & Charlot 2003, Maraston 2005, Conroy et al. 2009; see also Baldwin et al. 2018), initial mass functions (IMFs; e.g.Salpeter 1955, Kroupa 2001, Chabrier 2003; see also Kroupa & Jerabkova 2019), dust models (e.g.Calzetti 1997, Charlot & Fall 2000, Draine & Li 2007, da Cunha et al. 2008) and SFH forms.These can be ★ E-mail: ph18aai@herts.ac.uk parametric or non-parametric; while the former typically use one of many simple functional forms to represent the SFH (e.g.Behroozi et al. 2013;Simha et al. 2014;Carnall et al. 2019), the latter can be implemented e.g. by partitioning the history into time bins and allocating stellar mass to each bin from a chosen prior distribution as the observations allow (e.g.Leja et al. 2019).
Many SED fitters, including those used in this study, assume balance between the energy absorbed by dust at short wavelengths and that re-radiated in the far-infrared.While this must be true overall, strict equality should not hold for every line of sight since dust attenuation is non-isotropic.Recent observations of spatial offsets between the starlight and dust in high-redshift galaxies (e.g.Hodge et al. 2016Hodge et al. , 2019;;Cochrane et al. 2021) cast doubt on whether these codes are appropriate in such cases.However Haskell et al. (2023, hereafter H23) showed that the performance of Magphys is similar for galaxies with UV/FIR spatial offsets as it is for local galaxies (Hayward & Smith 2015); energy-balance codes remain the gold standard for deriving the properties of galaxies from photometric surveys.
With myriad codes available, with different physics and underlying assumptions, it is critical to validate the performance of each method.As well as the works of this group, which have compared the physical properties inferred using energy-balance SED fitting with Magphys to simulated galaxies for which the 'ground truth' properties are known (Hayward & Smith 2015;Smith & Hayward 2015, 2018;H23), there is a large body of work by other authors aiming to understand how well SED fitting works in different ways (e.g. da Cunha et al. 2008, Lee et al. 2009, Noll et al. 2009, Wuyts et al. 2009, Lee 2010, Pforr et al. 2012, Simha et al. 2014, Mobasher et al. 2015, Hunt et al. 2019, Dudzevičiūtė et al. 2020, Lower et al. 2020, Pacifici et al. 2023, Best et al. 2023).Since validation tests are often accomplished by averaging over a large population of galaxies, there is always danger of "washing out" important details or systematic effects that could impact our understanding of how well SED fitting methods work in real-Universe situations.For example, recent papers have shown that non-parametric models are better able to recover model SFHs with recent sharp changes than fitters using parametric SFHs (hereafter pSFHs; e.g.Leja et al. 2019, Carnall et al. 2019, Suess et al. 2022, Narayanan et al. 2023).
Here, we build on H23 and run the Prospector SED fitting code on the same synthetic galaxy SEDs for which we already have Magphys fits, focusing on how the inferred SFR depends on the recent SFH within the last 100 Myr.Galaxies experiencing rapid changes in SFR over short (∼10s-100s of Myr) timescales have been identified both observationally (e.g.Guo et al. 2016, Faisst et al. 2019, Broussard et al. 2022, Looser et al. 2023, Wang et al. 2023, Woodrum et al. 2023) and in simulations (e.g.Sparre et al. 2017, Iyer et al. 2020, Flores Velázquez et al. 2021, Hopkins et al. 2023, Narayanan et al. 2023, Dome et al. 2024).In the FIRE simulations, massive galaxies at high redshift and dwarfs at all redshifts exhibit very bursty SFHs, with short (∼10-Myr) bursts followed by temporarily quenched periods1 lasting a few 100 Myr; see Sparre et al. (2017) for a detailed analysis and Hayward & Hopkins (2017) and Hopkins et al. (2023) for discussions of the physical origin of this behavior.They thus provide an excellent test data set for exploring the effects of bursty star formation on SED modeling inferences. 2n this work, we demonstrate that Magphys results exhibit a systematic bias in the inferred SFR: the SFR is overestimated (underestimated) for simulated galaxies experiencing a burst (temporarily quenched period).We also show that Prospector SFRs do not exhibit this bias.This Letter is structured as follows.In Section 2 we outline the tools and methods used to create the observations and to fit the SEDs, while in Section 3 we present the results of the fitting, and in Section 4 discuss the implications.In Section 5 we make some concluding remarks.We adopt a standard cosmology with  0 = 70 km s −1 Mpc −1 , Ω  = 0.3, and Ω Λ = 0.7.

METHOD
To provide the synthetic SEDs used for this test, we used four cosmological zoom-in simulations of massive high-redshift galaxies from the FIRE project. 3The halos were first presented in Feldmann et al. (2016), and subsequently resimulated by Anglés-Alcázar et al. ( 2017) using the FIRE-2 code (Hopkins et al. 2018); we use the simulations from that work.The simulations use the code gizmo (Hopkins 2015),4 with hydrodynamics solved using the mesh-free Lagrangian Godunov 'MFM' method.The simulations include cooling and heating from a meta-galactic background and local stellar sources from  ∼ 10 − 10 10 K; star formation in locally self-gravitating, dense, self-shielding molecular, Jeans-unstable gas; and stellar feedback from OB & AGB mass-loss, SNe Ia & II, and multi-wavelength photo-heating and radiation pressure, with inputs taken directly from stellar evolution models.The FIRE physics, source code, and all numerical parameters are identical to those in Hopkins et al. (2018).Snapshots of the halos' central galaxies were taken from 1 <  < 8 at intervals of 15 − 25 Myr.For each snapshot, we compute the true 10 Myr and 100 Myr-averaged SFRs,  10 Myr and  100 Myr , respectively.Following Broussard et al. (2019), we quantify the recent star forming activity using the 'burst indicator'  = log 10 ( 10 Myr / 100 Myr ), such that < 0, galaxy has reduced SFR within the last 10 Myr; ≈ 0, galaxy has had a constant SFR over the last 100 Myr; > 0, galaxy has increased SFR in the last 10 Myr. Figure 1 shows three example SFHs of snapshots used in this study with  ≈ 0 (black line),  < 0 (red line) and  > 0 (blue line).
Cochrane et al. ( 2019) calculated synthetic SEDs for these simulated galaxies using the radiative transfer code SKIRT.5Each snapshot was 'observed' at 7 viewing angles spaced at 30 • intervals, ranging from aligned with the angular momentum vector to anti-aligned.This resulted in 6,706 forward-modeled SEDs.See Cochrane et al. (2019Cochrane et al. ( , 2022Cochrane et al. ( , 2023Cochrane et al. ( , 2024) ) and Parsotan et al. (2021) for further details about the SKIRT calculations and other applications of these and similar simulations/SEDs.We convolved these SEDs with the filter response curves for 18 photometric bands spanning observed-frame wavelengths 0.39 < /m < 500 (we refer the interested reader to H23 for further details) assuming a signal-to-noise ratio of 5 in every band (following Smith & Hayward 2018).We fit the synthetic SEDs using two SED fitting codes, Magphys (da Cunha et al. 2008, hereafter DC08) and Prospector (Leja et al. 2017, 2019, Johnson et al. 2021), fixing the redshift to the true value (i.e.photometric redshift errors do not contribute to the bias demonstrated here).We use the high-redshift version of Magphys (da Cunha et al. 2015) which builds model UV-FIR SEDs by linking a stellar library containing 50,000 pre-computed SFHs with those which satisfy the energy balance criterion among another pre-computed library of 25,000 Charlot & Fall (2000) dust models; see DC08 and H23 for detailed discussions.By calculating the  2 goodness-of-fit parameter between the observed photometry and every combination which satisfies the energy balance, Magphys is able to marginalise over the prior distribution to estimate posterior probability distributions for each property in the model.Magphys assumes an initial mass function (IMF) from Chabrier (2003), and stellar models from Bruzual & Charlot (2003) for metallicity between 0.02 and 2 times solar.
For the Prospector fits, we followed the process described in For both fitters, we consider an SED fit to be acceptable if the best fit  2 is below the 99 per cent confidence limit taken from standard  2 tables with the degrees of freedom calculated following Smith et al. (2012, who calculated this only for Magphys; here we assume that it applies equally well to the Prospector results, though this is perhaps unlikely to be true given the differences between the two sets of models).Throughout the subsequent analysis, we consider only those SEDs with satisfactory fits, though our conclusions are qualitatively unchanged if we also consider the "bad fits".
Based on these analyses, we obtained five sets of SFR estimates for each snapshot.The Magphys SFRs assume a fixed parameterisation of the SFH based on a parametric model overlaid with random starbursts (as detailed in DC08).In addition, we have four sets of results from Prospector, assuming different SFH priors: parametric, Dirichlet, continuity, and bursty continuity.The parametric prior assumes a delayed exponentially declining SFH, and does not account for bursts. 6The Dirichlet prior assumes that the fractional specific star formation rate (sSFR) in each time bin follows a Dirichlet distribution, with a concentration parameter, , that controls the shape of the SFH (Leja et al. 2017(Leja et al. , 2019)), and which we set to unity.The continuity prior directly fits for Δ log(SFR) between neighbouring time bins using a Student's -distribution with scale factor  = 0.3 and two degrees of freedom.This prior discourages abrupt changes in SFR between adjacent bins but remains flexible enough to fit both star-forming and quenched galaxies (Leja et al. 2019;Johnson et al. 2021).Following Tacchella et al. (2022), we also consider the bursty continuity prior -a modified form of the continuity prior with  = 1 and  = 2, which allows greater variability in the SFH.For each of these priors, we adopt the following binning scheme: the first two and the last bins are kept the same for all sources, covering 0 <   < 10 Myrs, 10 <   < 100 Myrs, and 0.85  <   <   , respectively.Here,   represents the age of the Universe at the object's redshift, and   is the lookback time.The remaining time between 100 Myrs and 0.85  is evenly spaced in logarithmic time.The number of bins is chosen to ensure that log 10 (bin width/Gyr) > 0.02, ranging from eight bins (for the lowest redshifts in our simulations), to five (for the highest).To quantify the fidelity of the 100 Myr averaged SFR inferred using Magphys and Prospector, following H23, we use the residual between the inferred SFR and the true value from the simulation: Δ log 10  = log 10 ( inferred ) − log 10 ( true ).

RESULTS
Table 1 shows the fit success rate and the average SFR residual for each of the five runs.It is immediately apparent that Prospector is able to produce acceptable fits for a larger fraction of the snapshots (> 90 per cent fit success rate irrespective of the prior assumed) than Magphys (83 per cent).This may reflect the effect that the Magphys pre-computed libraries -coupled with the requirement that it consider only SFHs shorter than the age of the Universe at a given redshift -result in the priors becoming increasingly poorly-sampled at higher redshifts, and acceptable Magphys fits increasingly hard to obtain at earlier epochs.Secondly, we note that the typical SFR uncertainties for the non-parametric runs shown in Table 1 are significantly larger than those yielded by the pSFH runs (with both Magphys and Prospector) and we will return to this point below.These details notwithstanding, it is clear from Table 1 that there is no significant difference in the overall Δ log 10  once the uncertainties are considered, meaning that both codes do a similarly acceptable job of recovering the true SFR on average.
We now investigate the fidelity of the Magphys and Prospector SFR estimates as a function of the recent SFH.In Figure 2, we plot Δ log() against , with Magphys results shown in the lefthand panel, Prospector pSFH results in the centre panel, and the Prospector bursty continuity prior results in the right-hand panel.The axes are chosen such that 'starburst' snapshots appear to the right of each plot, and 'temp-quenched' ones to the left.The values for individual SEDs are shown as points, with red lines showing a running average of Δ log 10 () for 100 data points.The red areas enclose the 16th and 84th percentiles of the inferred SFR PDF using the same running average, while the grey shading encloses the 16th and 84th percentiles of the scatter within each corresponding subsample.
Two effects are immediately apparent.Firstly, there is a systematic trend in the Magphys SFRs, such that while they are typically overestimated by ∼ 0.4 dex at  > 0.5 (the 'starburst' snapshots), the converse is true at  < −0.5 (the 'temp-quenched' snapshots) where Magphys typically underestimates the true SFRs by ∼ 0.3 dex.This is clearly of concern, since we know that both 'starburst' and 'tempquenched' galaxies exist in the real Universe (e.g.Guo et al. 2016;Faisst et al. 2019;Broussard et al. 2022;Looser et al. 2023) and this effect could result in a systematic error in the SFRs of galaxies that could be 1 dex in magnitude in the worst cases.Secondly, and underscoring the severity of the first issue, there is a clear tendency for the typical pSFH uncertainties to underestimate the degree of scatter visible in the data points (grey shading).This effect becomes increasingly noticeable as || increases.
The equivalent Prospector bursty continuity plot shows the SFR The centre and RH panels show the same obtained using Prospector with the pSFH and bursty continuity SFH prior.In all panels the red line shows the mean residual averaged over 100 data points with the red shaded area enclosing the averaged 16th and 84th percentile residuals from the PDF.The grey shaded area indicates the 16th and 84th percentiles of the scatter in Δ log 10  within each averaged point, and the blue markers show values for individual SEDs.The Magphys SFRs exhibit a systematic bias: the SFR tends to be underestimated for 'temp-quenched' galaxies and overestimated for galaxies that have experienced a burst within the past 10 Myr.
recovery is approximately independent of  with reasonable uncertainties at  true < 0.5 (where ∼ 90 per cent of the snapshots lie).At these , the ∼ 0.3 dex offset in the Prospector SFRs relative to the ground truth is similar to that returned by Magphys at  < −0.5; however, in the Prospector non-parametric results the deviation from zero is not statistically significant due to the realistic error bars (and we obtain similar results with the other non-parametric SFHs).

DISCUSSION
Fundamental to SED fitting codes' abilities to recover the true properties of galaxies are (i) wavelength sampling (including range covered, and resolution), (ii) priors, and (iii) sensitivity of the available observations.The first of these is closely related to the extent of our ability to estimate  using SED fitting (since different wavelengths have different "response functions" between the luminosity and the SFH; e.g.Kennicutt & Evans 2012;Sparre et al. 2017).In this work we focus solely on our ability to recover 100 Myr-averaged SFRs using an example 18 bands of photometry similar to that used in Smith et al. (2021) and Best et al. (2023) to fit galaxies in the LOFAR deep fields (Tasse et al. 2021;Sabater et al. 2021;Kondapally et al. 2021;Duncan et al. 2021).This good wavelength sampling, in addition to the fact that we have not added noise to photometry, means that this test represents a best-case scenario: the bias that we have identified is fundamental, not due to noise or poor SED sampling, and thus cannot be addressed with 'better data'.
In the Magphys model, the priors are immutable since it uses precomputed SFH libraries with fixed sampling.The high- Magphys SFHs assume a delayed exponential parametric form, with random bursts superposed in such a manner that during the previous 2 Gyr, 75 per cent of SFHs include a burst lasting between 30-300 Myr and forming between 0.1 and 100 times the stellar mass formed by the underlying distribution (da Cunha et al. 2015).The  distribution for the Magphys high- SFHs is shown in Figure 3, overlaid with the distribution of values for the true SFHs of the simulated galaxies and for the assumed Prospector bursty continuity prior.It is clear that both the Magphys libraries and the Prospector bursty continuity prior allow for a significantly broader range of  values than the simulations contain, so the prior on  is not the source of the bias.
In Section 3 we noted that the pSFH SFR uncertainties are increasingly underestimated as || moves away from 0. Alongside the similarity in the mean Δ between the Prospector pSFH and nonparametric plots, this leads us to speculate that while the choice of prior may affect the uncertainties and crucially the statistical significance of the systematic trend, the cause of the largest upturn in the Magphys Δ log 10  for the most bursty galaxies lies elsewhere.

CONCLUSIONS
We have used four cosmological zoom-in simulations spanning 1 <  < 8, together with the radiative transfer code SKIRT, to generate 6,706 synthetic galaxy SEDs.We attempted to recover their true SFRs based on fitting the synthetic observations using Magphys and Prospector.Although there is little difference between the fidelity of the recovered  100 Myr values averaged across all of the simulated recovery tests using the two codes, we find that the accuracy of the Magphys-inferred SFRs is strongly dependent on the recent SFH.This trend is sufficiently large that the Magphys 100 Myr-averaged SFRs of individual galaxies could be overestimated (underestimated) by as much as 1 dex in the worst cases for galaxies experiencing starbursts (temporarily quenched periods).The trend is not evident in the 100 Myr SFRs from Prospector, and is significantly weakened in Prospector results assuming parametric SFHs (pSFHs).We therefore speculate that this effect could be due to Magphys's burst recipe or the use of pre-defined libraries with fixed parameter sampling which may be unsuitable for some galaxies.Given this bias, and the significantly underestimated SFR uncertainties produced using pSFHs, we urge caution when employing pSFHs and/or pre-defined libraries to study the SEDs of observed galaxies that are likely to have bursty SFHs, such as high-redshift galaxies observed with JWST.

Figure 1 .
Figure 1.The recent SFHs (past 200 Myr) of three example simulated galaxies, showing a range of  values.The dark grey band encloses the previous 10 Myr, while the lighter band encloses the past 100 Myr;  is the ratio of the SFRs averaged over those two time periods.The black line shows an example of constant star formation with  ≈ 0, while the red line shows a 'temporarily quenched' case ( = −0.64).The blue line shows an example simulated galaxy observed during a burst, where the star formation activity averaged over the previous 10 Myr has increased compared to the average over the previous 100 Myr (giving  = 0.88).
Das et al., in preparation.Prospector uses the Flexible Stellar Populations Synthesis (FSPS; Conroy et al. 2009) code assuming a Kroupa (2001) IMF and solar metallicity.Dust attenuation is modelled using the two-component Charlot & Fall (2000) model and the Kriek & Conroy (2013) attenuation curve.The three-parameter Draine & Li (2007, hereafter DL07) dust emission templates are used to model the shape of the IR SED.The dynamic nested sampling library Dynesty (Speagle 2020) is used to estimate Bayesian posteriors and evidences.

Figure 2 .
Figure2.The LH panel shows the difference between the inferred and true SFRs averaged over the last 100 Myr, Δ log 10  = log 10 (  inferred ) − log 10 (  true ), as a function of the burst indicator  ≡ log 10 (  10 Myr / 100 Myr ).The centre and RH panels show the same obtained using Prospector with the pSFH and bursty continuity SFH prior.In all panels the red line shows the mean residual averaged over 100 data points with the red shaded area enclosing the averaged 16th and 84th percentile residuals from the PDF.The grey shaded area indicates the 16th and 84th percentiles of the scatter in Δ log 10  within each averaged point, and the blue markers show values for individual SEDs.The Magphys SFRs exhibit a systematic bias: the SFR tends to be underestimated for 'temp-quenched' galaxies and overestimated for galaxies that have experienced a burst within the past 10 Myr.

Figure 3 .
Figure 3.The distributions of  values for 50,000 SFHs in the Magphys high- libraries (in blue), overlaid with the ground truth values from the simulations (hatched) and for the Prospector bursty continuity prior in red .Note the significantly larger  range relative to Figure 2.

Table 1 .
SED fit success rate and typical SFR estimate fidelity, Δ log 10 ( ), for the Magphys and Prospector fits to our synthetic observations.For the four sets of Prospector fits, the different SFH priors are indicated.