Energy balance SED modelling can be effective at high redshifts regardless of UV-FIR offsets

Recent works have suggested that energy balance spectral energy distribution (SED) fitting codes may be of limited use for studying high-redshift galaxies for which the observed ultraviolet and far-infrared emission are offset (spatially `decoupled'). It has been proposed that such offsets could lead energy balance codes to miscalculate the overall energetics, preventing them from recovering such galaxies' true properties. In this work, we test how well the SED fitting code Magphys can recover the stellar mass, star formation rate (SFR), specific SFR, dust mass and dust luminosity by fitting 6,706 synthetic SEDs generated from four zoom-in simulations of dusty, high-redshift galaxies from the FIRE project via dust continuum radiative transfer. Comparing our panchromatic results (using wavelengths 0.4-500\,$\mu$m, and spanning $1<z<8$) with fits based on either the starlight ($\lambda_\mathrm{eff} \le 2.2\,\mu$m) or dust ($\ge 100\,\mu$m) alone, we highlight the power of considering the full range of multi-wavelength data alongside an energy balance criterion. Overall, we obtain acceptable fits for 83 per cent of the synthetic SEDs, though the success rate falls rapidly beyond $z \approx 4$, in part due to the sparser sampling of the priors at earlier times since SFHs must be physically plausible (i.e. shorter than the age of the Universe). We use the ground truth from the simulations to show that when the quality of fit is acceptable, the fidelity of Magphys estimates is independent of the degree of UV-FIR offset, with performance very similar to that previously reported for local galaxies.


INTRODUCTION
Spectral energy distribution (SED) fitting offers a powerful method of estimating galaxy physical properties from photometry. SED fitting programs take as input the available photometry, which can range from > 30 bands in the best-studied fields to < 10 elsewhere, then use models of varying complexity to infer the shape of the full SED and hence the underlying physical properties (for an introduction to SED fitting see e.g. Walcher et al. 2011; Conroy 2013).
The energy balance code Magphys (da Cunha et al. 2008, hereafter DC08) performs $\chi^2$ fitting using two sets of pre-built libraries of model SEDs with a representative range of SFHs and dust models for star-forming galaxies. The energy balance criterion works in such a way that Magphys considers only combinations of SFH and dust emission that are energetically consistent, in the sense that the energy absorbed by dust in the rest-frame UV is re-radiated in the FIR. During the fit, Magphys finds the SFH and dust model that best fits the data, and calculates probability density functions (PDFs) for a variety of property values by marginalising over all of the models which satisfy the energy balance criterion.
To determine the fidelity of the properties derived from SED fitting, three testing techniques have been used in previous studies. The first is to compare the derived physical parameters to those derived using simpler methods. DC08 tested how well Magphys could fit observations from the Spitzer Infrared Nearby Galaxy Survey (SINGS; Kennicutt et al. 2003), producing acceptable best-fit $\chi^2$ results for 63 of the 66 galaxies. They also tested how well Magphys could recover the properties of 100 of its own, randomly selected, models with noise added to the photometry. Here, $M_\mathrm{star}$, SFR and $L_\mathrm{dust}$ were reported to be recovered to a high degree of accuracy. Similarly, Noll et al. (2009) tested the alternative energy balance SED fitting code CIGALE (Boquien et al. 2019) using the SINGS galaxies, replacing DC08's UBV observations with those from Muñoz-Mateos et al. (2009). Here, $L_\mathrm{dust}$ estimates compared well (±0.03 dex) with those derived by Draine et al. (2007); similarly, the SFR estimates compared well (0.06 ± 0.05 dex) with those provided by Kennicutt (1998b) based on H$\alpha$ emission (e.g. Kennicutt 1998a).

★ E-mail: ph18aai@herts.ac.uk
An alternative testing technique is to compare the results of different fitting programs when applied to the same dataset. This will not provide evidence that the results are correct, but does give confidence that a given code performs similarly to others. Best et al. (2022, in preparation) tested three energy balance based fitters, namely Magphys, CIGALE and BAGPIPES (Carnall et al. 2018), together with AGNfitter (Calistro Rivera et al. 2016). The four codes were each used to estimate $M_\mathrm{star}$ and SFR for galaxies in the Boötes, Lockman Hole and ELAIS-N1 fields of the LOFAR Two Metre Sky Survey (LoTSS; Shimwell et al. 2017) deep fields first data release (Duncan et al. 2021, Kondapally et al. 2021, Sabater et al. 2021 and Tasse et al. 2021). The results of the runs were compared to determine how well they agreed with each other. For galaxies with no AGN, Magphys, CIGALE and BAGPIPES typically agreed to within 0.1 dex for stellar mass, with AGNfitter differing by 0.3 dex. Similar levels of agreement were found for the SFRs of galaxies found not to contain an AGN. For galaxies with an AGN the situation was more mixed, as neither Magphys nor BAGPIPES is designed to handle AGN emission. Hunt et al. (2019) compared the results of applying Magphys, CIGALE and Grasil (Silva et al. 1998) to a sample of 61 galaxies from the Key Insights on Nearby Galaxies: a Far-Infrared Survey with Herschel (KINGFISH) survey (Kennicutt et al. 2011), including 57 of the SINGS galaxies. They found that stellar masses estimated using 3.6 $\mu$m luminosity agreed with all three codes to within 0.2 dex. Similarly, SED derived SFR estimates were within 0.2 dex of those derived using FUV+TIR luminosities and H$\alpha$ + 24 $\mu$m luminosities. The results for $M_\mathrm{dust}$ were more mixed, with Grasil giving values 0.3 dex higher than Magphys or CIGALE or the value determined using a single temperature modified black body. A similar approach with an even broader selection of fourteen SED fitting codes was taken by Pacifici et al. (2023), who found agreement on stellar mass estimates across the ensemble, but some discrepancies in their SFR and dust attenuation results. More recently, Cheng et al. (2023) used a modified version of Magphys (Magphys+photo-z; Battisti et al. 2019) to determine the photometric redshifts of 16 sub-millimetre galaxies (SMGs). The results were compared to the redshifts derived using EAZY (Brammer et al. 2008), finding that for most sources the results were consistent.
The final, and perhaps most promising, technique for validating SED fitting is to use simulated galaxies where the 'right' answer is known in advance. Wuyts et al. (2009) used the HYPERZ (Bolzonella et al. 2000) SED fitting code on GADGET-2 (Springel 2005) simulations to recover mass, age, E(B-V) and $A_V$ under a variety of conditions. They concluded that recovery of properties for ellipticals was generally good (residuals between 0.02 and 0.03 dex) with slightly poorer results for disks (residuals of 0.03 to 0.35 dex), with residuals increasing further to 0.02 to 0.54 dex during periods of merger-triggered star formation. Hayward & Smith (2015, hereafter HS15) used Magphys on two GADGET-3 (Springel 2005) simulations of an isolated disk and a major merger of two disk galaxies at $z = 0.1$. Snapshots were taken at 10 Myr intervals and the radiative transfer code SUNRISE (Jonsson 2006) was used to produce observations from 7 different lines of sight around the simulation. In both scenarios, the attenuated SED was recovered with an acceptable fit ($\chi^2$ within the 99 per cent confidence threshold; see Smith et al. 2012 for details) except for the time around the peak starburst/coalescence phase of the merger simulation. In both scenarios, $L_\mathrm{dust}$ was recovered well, with $M_\mathrm{star}$ recovered to within 0.3 dex and SFR within 0.2 dex. $M_\mathrm{dust}$ was recovered less well, but still within 0.3 dex for the isolated galaxy and 0.5 dex for the merger. The conclusion from this study is that these properties of local galaxies can typically be recovered to within a factor of 1.5-3. Smith & Hayward (2018) studied a resolved simulated isolated disk, using spatial resolution as fine as 0.2 kpc. They found that Magphys produced statistically acceptable results for $M_\mathrm{star}$, $L_\mathrm{dust}$, SFR, sSFR and $A_V$ for over 99 per cent of pixels within the r-band effective radius. At higher redshifts, Dudzevičiūtė et al. (2020, hereafter D20) used EAGLE (Schaye et al. 2015, Crain et al. 2015) simulations with SKIRT generated photometry (Baes et al. 2011, Camps & Baes 2020) to validate the performance of Magphys for studying galaxies with redshifts up to 3.4. They found that Magphys gave a remarkably linear correlation with the true (simulated) values, though with significant scatter (at the level of 10, 15 and 30 per cent for the dust mass, SFR and stellar masses, respectively) and significant systematic offsets (of up to 0.46 ± 0.10 dex for the recovered stellar mass).
These studies all provide evidence that SED fitting, particularly energy balance SED fitting, is working remarkably well and providing results often consistent with the ground truth once the uncertainties are accounted for.
However, several authors have questioned whether using an energy balance criterion is appropriate when viewing galaxies for which the UV and FIR are spatially offset from one another (e.g. Casey et al. 2017; Miettinen et al. 2017; Simpson et al. 2017; Buat et al. 2019). In such cases, while 'energy balance' is still expected overall (i.e. energy conservation is presumably not violated), significant spatial decoupling may lead to difficulties in recovering the true properties. Under such circumstances, the attenuation (and thus the intrinsic UV luminosity) may be underestimated because the UV-bright, relatively dust-free regions can result in a blue UV-optical slope even if the bulk of the young stars are heavily dust-obscured.
This concern has recently become testable with the sub-arcsecond resolution provided by the Atacama Large Millimetre/submillimetre Array (ALMA), enabling direct observation of UV/optical and FIR offsets. There are now numerous papers reporting spatial offsets. Hodge et al. (2016), Rujopakarn et al. (2016), Gómez-Guijarro et al. (2018) and Rujopakarn et al. (2019) have discovered kpc-scale offsets between star forming regions and centres of stellar mass while investigating the star formation and dust distributions in $2 < z < 4.5$ galaxies. Along these lines, Chen et al. (2017) found a significant offset in ALESS67.1, an SMG at $z = 2.12$, Cochrane et al. (2021) reported the same in the massive star-forming galaxy SHiZELS-14 at $z = 2.24$, and Bowler et al. (2018) detected a 3 kpc offset between the rest-frame FIR and UV emission in the Lyman-break galaxy ID65666 at $z \approx 7$.
The concern over the impact of decoupling between the dust and starlight is such that new SED fitting codes such as MICHI2 (Liu 2020) and Stardust (Kokorev et al. 2021) mention the absence of energy balance as a key advantage in favour of using these codes for studying galaxies where spatial offsets are likely to be a factor. In Liu et al. (2021), MICHI2 produced results very similar to Magphys and CIGALE for a sample of high redshift galaxies, with stellar mass and dust luminosity estimates obtained to within 0.2-0.3 dex of those obtained using the two energy-balance codes. Similarly, Kokorev et al. (2021) used Stardust to fit 5,000 IR bright galaxies in the GOODS-N and COSMOS fields, producing results which compared well with those derived using CIGALE, with a mean $M_\mathrm{dust}$ residual of 0.09 dex, a mean $L_\mathrm{IR}$ residual of 0.2 dex and a mean $M_\mathrm{star}$ residual of 0.1 dex (albeit with a significant scatter of 0.3 dex).
An additional test of the likely impact of spatial offsets was conducted by Seillé et al. (2022), who used the CIGALE code to model the Antennae Galaxy, Arp244, which is known to have very different UV and IR distributions (Zhang et al. 2010). Seillé et al. (2022) found that the total stellar mass and SFR were consistent, whether they fitted the integrated photometry of the galaxy or fitted 58 different regions of Arp244 independently and summed the results (i.e. performance very similar to that found by Smith & Hayward 2018 for simulated galaxies without spatial offsets).
In this context, we now seek to further test the efficacy of energy balance SED fitting for these more challenging dusty, high redshift, star-forming galaxies by using high-resolution simulations with differing degrees of spatial offset between the apparent UV/FIR emission.
This paper is structured as follows. Section 2 describes the tools and methods used to create the observations and to fit the SEDs; Section 3 presents the results of the fitting including the derived values for several galaxy properties; Section 4 discusses these in the context of previous papers and Section 5 summarises the conclusions. Throughout this work we adopt a standard cosmology with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, $\Omega_\mathrm{M} = 0.3$, and $\Omega_\Lambda = 0.7$.

METHOD
This section describes the simulation data and the creation of the synthetic observations. It also provides a brief introduction to Magphys, details of the simulations, and how they were subsequently analysed.

Computing the SEDs of simulated galaxies
We analyze a set of 4 cosmological zoom-in simulations from the FIRE project that were run using the FIRE-2 code (Hopkins et al. 2018) down to $z = 1$. The simulations use the code GIZMO (Hopkins 2015), with hydrodynamics solved using the mesh-free Lagrangian Godunov "MFM" method. Both hydrodynamic and gravitational (force-softening) spatial resolution are set in a fully-adaptive Lagrangian manner with fixed mass resolution. The simulations include cooling and heating from a meta-galactic background and local stellar sources from $T \approx 10 - 10^{10}$ K; star formation in locally self-gravitating, dense, self-shielding molecular, Jeans-unstable gas; and stellar feedback from OB & AGB mass-loss, SNe Ia & II, multiwavelength photo-heating and radiation pressure with inputs taken directly from stellar evolution models. The FIRE-2 physics, source code, and all numerical parameters are exactly identical to those in Hopkins et al. (2018).
The specific sample of simulations studied in this paper includes the halos first presented in Feldmann et al. (2016). The FIRE-2 simulations for these halos were introduced, along with a novel on-the-fly treatment of black hole seeding and growth, in Anglés-Alcázar et al. (2017). These halos were chosen because they are representative of the high-redshift, massive, dusty star-forming galaxies found in infrared-selected observational samples; Cochrane et al. (2019) showed that they present a clumpy dust distribution together with very different morphologies for stellar mass, dust, gas and young stars. At $z = 2$, the galaxies central to the halos have half-light radii of 0.73, 0.98, 0.81 and 0.91 kpc; for additional information on these galaxies see Anglés-Alcázar et al. (2017) as well as Cochrane et al. (2019), Wellons et al. (2020), Parsotan et al. (2021) and Cochrane et al. (2022).
To generate synthetic SEDs, Monte Carlo dust radiative transfer was performed on each time snapshot of the simulated galaxies in post-processing using the code SKIRT. SKIRT assigns single-age stellar population SEDs to star particles in the simulations according to their ages and metallicities. It then propagates photon packets through the simulated galaxies' ISM to compute the effects of dust absorption, scattering, and re-emission. Snapshots of the galaxies' evolution were taken at 15-25 Myr intervals, with each galaxy 'observed' from 7 positions that uniformly sampled inclination angles from view 0 (aligned with the angular momentum vector) in steps of 30° to view 6 (anti-aligned). For full details of the SKIRT calculations, see Cochrane et al. (2019, 2022). This procedure yielded 6,706 SEDs across the four simulated galaxies, spanning $1 < z < 8$.
To compute photometry from the SEDs, we convolved the SEDs with appropriate filter response curves for the 18 bands listed in Table 1. These filters were chosen for similarity with previous work in the LoTSS deep fields (e.g. Smith et al. 2021), providing good coverage of the spectrum from the UV to the FIR with which to test how Magphys performs in these idealised conditions. Figure 1 shows the filter coverage for an example SED at $z = 1$, along with the emergent SED generated by SKIRT.
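As a rough illustration of the filter convolution step, the sketch below computes a response-weighted mean flux density for one band and attaches an error corresponding to the signal-to-noise ratio of 5 adopted in this work. It assumes that the SED and the filter response are already sampled on the same wavelength grid and uses a simple energy-weighted average; the function names are hypothetical and this is not the pipeline used to produce the photometry in this paper.

```python
def trapz(x, y):
    """Trapezoidal integral of y(x) on an arbitrary (sorted) grid."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i])
               for i in range(len(x) - 1))


def band_flux(wavelength, flux_density, response):
    """Response-weighted mean flux density through one filter.

    Assumption: all three lists share the same wavelength grid.
    """
    weighted = [f * r for f, r in zip(flux_density, response)]
    return trapz(wavelength, weighted) / trapz(wavelength, response)


def with_fixed_snr(flux, snr=5.0):
    """Return (flux, error) for a fixed signal-to-noise ratio."""
    return flux, flux / snr
```

A real filter curve would first need to be interpolated onto the SED wavelength grid (or vice versa) before calling `band_flux`.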
Figure 2 examines the relationship between the properties of our simulated galaxies and those of high redshift sub-millimetre galaxy populations in which spatial UV-FIR offsets have been observed. We compared four properties with observations, specifically the SFR relative to the galaxy main sequence (MS; upper left panel), the relationship between sub-mm flux density and $M_\mathrm{dust}$ (upper right), the degree of $V$ band extinction (lower left), as well as the magnitude of the UV/IR offsets (lower right) in relation to studies in the literature. In the upper left panel we have compared the SFR in each snapshot with the MS parameterisation from Schreiber et al. (2015) modified for our adopted Chabrier (2003) IMF using the method of Madau & Dickinson (2014), as a function of redshift. The magenta band indicates the typical ± 0.3 dex scatter associated with the MS (e.g. Tacchella et al. 2022). The simulated galaxies lie either on or above the MS in the vast majority of cases, and are therefore consistent with dusty, star forming galaxies. The upper right panel of Figure 2 shows the sub-millimetre flux density, $S_{870}$, as a function of the dust mass for the simulated galaxies and for the SMGs published in D20. While the simulations do not occupy the parameter space of the brightest SMGs, there is significant overlap, and they do lie along the same sub-mm/dust mass relationship (see Hayward et al. 2011, Cochrane et al. 2023). The lower left panel shows how the $V$-band extinction ($A_V$) for the simulations (the blue solid line indicates the median, with shading indicating the values enclosed by the 16th and 84th percentiles of the distribution at each redshift) compares with the corresponding values for the SMG samples from D20 (in purple) and Hainline et al. (2011, indicated by the red points with error bars). Although the D20 sample is on average more obscured than our simulations, similarity to the Hainline et al. (2011) SMGs is evident. The lower right panel shows the range of offsets between the UV and FIR emission in redshift bins. The solid lines indicate the mean simulated offset (blue for peak-to-peak, red for light-weighted mean), with shaded regions indicating the area enclosed by the 16th and 84th percentiles at each redshift. The black, red and green symbols indicate ALMA sources from Rujopakarn et al. (2016), Rujopakarn et al. (2019) and Lang et al. (2019). Finally, the short green line marks the mean offset from Lang et al. (2019) over 20 SMGs with $1.6 < z < 2.5$.
To summarise, Figure 2 demonstrates that the simulated sources are predominantly dusty star-forming galaxies. While the D20 SMG sample is more extreme, the degree of extinction and the magnitude of the UV-FIR spatial offsets in the simulations show significant overlap with values published in the literature. The simulations are therefore a useful testing ground for determining the extent of our ability to recover the true properties of galaxies with plausible UV-FIR offsets using Magphys.

Figure 1. An example SED obtained using Magphys, demonstrating the generally close agreement between the true and Magphys-derived SEDs. In the upper panel, the solid black line shows the best-fit Magphys-derived SED, while the dashed black line indicates the Magphys estimate of the unattenuated SED; the solid blue line represents the attenuated SED generated by SKIRT. The square markers represent the best-fit photometry, with the SKIRT photometry shown as the points with error bars (as described in the legend). The coloured lines above the lower horizontal axis show the normalised filter curves used in this study. The lower panel shows the residual value in $\sigma$ units between each observation and the best-fit SED. The residual value is calculated as (observed flux - model flux)/observed error. This SED corresponds to simulated galaxy A1, snapshot 276, view 0, $z = 1.00$.

Magphys
Magphys is an SED modelling code using Bayesian inference to derive best-fit SEDs as well as estimates (best-fit, median likelihood, and probability distribution functions) for a wide range of galaxy properties. A full description can be found in DC08 and da Cunha et al. (2015), but we include a brief overview. Magphys uses two libraries of model galaxies: the first, the library of star-formation histories (SFH), consists of 50,000 models each comprising a UV/optical SED and associated galaxy properties; the second, the dust library, comprises 25,000 models each with an IR SED and associated properties.
The SFH library is built using the IMF of Chabrier (2003) and the stellar population synthesis (SPS) model of Bruzual & Charlot (2003).Exponentially declining star formation histories are superposed with random bursts, in such a way that a burst of star formation has occurred in half of the SFH library models within the last 2 Gyr.
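Common to both libraries is the use of the Charlot & Fall (2000) two-component dust model. In this model, stellar populations younger than 10 Myr are attenuated by a greater amount than older stellar populations, under the assumption that these young stars are still embedded within their 'birth clouds'. These stellar populations are subject to a total optical depth $\hat{\tau}_\lambda^\mathrm{BC} + \hat{\tau}_\lambda^\mathrm{ISM}$, whereas older populations 'see' an optical depth of only $\hat{\tau}_\lambda^\mathrm{ISM}$, from the diffuse ISM. Charlot & Fall (2000) write the optical depth as:

$$ \hat{\tau}_\lambda = \begin{cases} \hat{\tau}_\lambda^\mathrm{BC} + \hat{\tau}_\lambda^\mathrm{ISM} & \text{for stars younger than 10 Myr} \\ \hat{\tau}_\lambda^\mathrm{ISM} & \text{for stars older than 10 Myr} \end{cases} \qquad (1) $$

where $\hat{\tau}_\lambda$ is the total optical depth for wavelength $\lambda$, $\hat{\tau}_\lambda^\mathrm{BC}$ is the optical depth of the birth clouds and $\hat{\tau}_\lambda^\mathrm{ISM}$ is the optical depth of the ISM. These latter two are defined in Magphys such that:

$$ \hat{\tau}_\lambda^\mathrm{BC} = (1 - \mu)\,\hat{\tau}_V \left(\lambda / 5500\,\text{Å}\right)^{-1.3}, \qquad \hat{\tau}_\lambda^\mathrm{ISM} = \mu\,\hat{\tau}_V \left(\lambda / 5500\,\text{Å}\right)^{-0.7} \qquad (2) $$

where $\hat{\tau}_V$ is the mean $V$ band optical depth and $\mu$ represents the fraction of $\hat{\tau}_V$ arising from the ISM.
The dust library is built from three main components: emission from very small grains (< 0.01 $\mu$m) which can reach high temperatures if they absorb a UV photon; large grains (between 0.01 and 0.25 $\mu$m) in thermal equilibrium with the interstellar radiation field; and polycyclic aromatic hydrocarbons (PAHs) which are responsible for emission line features in the mid-infrared. The contribution of each component to the SEDs of the birth clouds and the ISM is chosen to broadly reproduce the range of SEDs found in nearby star-forming galaxies. The total IR SED is then modelled as the sum of the ISM and the birth cloud components.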
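The two-component attenuation model described above can be sketched in a few lines. The example below assumes the power-law forms given in DC08 (slopes of -1.3 for the birth clouds and -0.7 for the diffuse ISM, normalised at 5500 Å) and the 10 Myr birth-cloud timescale quoted in the text; the function and argument names are illustrative only and are not part of Magphys.

```python
import math

T_BC_MYR = 10.0      # age below which stars are assumed to sit in birth clouds
LAMBDA_V_UM = 0.55   # V-band normalisation wavelength (5500 Angstrom) in microns


def optical_depth(wavelength_um, age_myr, tau_v, mu):
    """Effective optical depth seen by stars of a given age.

    tau_v : mean V-band optical depth seen by young stars
    mu    : fraction of tau_v contributed by the diffuse ISM
    """
    x = wavelength_um / LAMBDA_V_UM
    tau_ism = mu * tau_v * x ** -0.7
    tau_bc = (1.0 - mu) * tau_v * x ** -1.3
    # Young stars see both components; older stars see only the ISM.
    return tau_bc + tau_ism if age_myr < T_BC_MYR else tau_ism


def transmitted_fraction(wavelength_um, age_myr, tau_v, mu):
    """Fraction of the intrinsic light that escapes, exp(-tau)."""
    return math.exp(-optical_depth(wavelength_um, age_myr, tau_v, mu))
```

For example, with `tau_v = 2.0` and `mu = 0.3`, a 5 Myr old population sees the full optical depth of 2.0 at 0.55 microns, whereas a 50 Myr old population sees only the ISM contribution of 0.6.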
The SFH and dust libraries are linked together in such a manner that the starlight absorbed by dust at short wavelengths is re-radiated at longer wavelengths, i.e. the energy is balanced. During the fit, as well as ensuring that energy conservation (i.e. energy balance) is satisfied by construction (i.e. the luminosity absorbed by dust equals that emitted by dust), Magphys combines those models in the optical library with those in the IR library that have similar contributions from dust in the ISM to the overall dust energy budget (the fraction of luminosity absorbed by the diffuse ISM component and that emitted by the diffuse ISM component, respectively). This is parameterised in Magphys using the $f_\mu$ parameter; in the high-redshift version used in this work, values for the SFH and dust libraries must have $\Delta f_\mu < 0.2$ for the combination to be acceptable. In this way, each galaxy is fitted against a wide variety of 'empirical but physically-motivated' (DC08) SFHs and dust content. By calculating the best-fit $\chi^2$ for each model combination that satisfies the conditions, a likelihood function is built for each galaxy property by assuming that $L \propto \exp(-\chi^2/2)$. When all combinations of models in the libraries have been processed, a PDF is produced for each property by marginalising the individual likelihoods. Magphys outputs a pair of files for each fitted galaxy: one containing the best-fit SED (examples of both the attenuated and unattenuated versions are shown, alongside the model photometry, in Figure 1), while the other contains the best-fit model values and the PDFs. This study uses the high-redshift version of Magphys (da Cunha et al. 2015), which differs from the low-redshift version in two important ways: firstly, the prior distributions are modified to include higher dust optical depths, higher SFRs and younger ages; secondly, the effects of absorption in the inter-galactic medium (IGM) at UV wavelengths are taken into account.
Some studies have sought to determine the extent to which AGN can influence the results of SED fitting (e.g. HS15, Best et al., in preparation). However, neither the simulations nor the SED fitting code used in this paper include AGN, and so this important aspect will not be discussed further.
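To make the marginalisation step concrete, the following minimal sketch weights each accepted model combination by exp(-chi^2/2) and reads off likelihood-weighted percentiles for a single property. It assumes that the chi^2 values and the corresponding property values have already been computed for every model combination passing the energy balance condition; Magphys itself builds binned PDFs, so this weighted-percentile version is a simplification with hypothetical names rather than the code's actual implementation.

```python
import math


def marginal_percentiles(chi2, values, percentiles=(16, 50, 84)):
    """Likelihood-weighted percentiles of one property.

    chi2   : chi^2 of each accepted model combination
    values : the property value belonging to each combination
    """
    chi2_min = min(chi2)
    # Subtracting the minimum only rescales the likelihoods and
    # avoids numerical underflow in the exponential.
    weights = [math.exp(-0.5 * (c - chi2_min)) for c in chi2]
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = sum(weights)
    results = []
    for p in percentiles:
        target = total * p / 100.0
        running = 0.0
        chosen = values[order[-1]]  # fallback in case of rounding at the top end
        for i in order:
            running += weights[i]
            if running >= target:
                chosen = values[i]
                break
        results.append(chosen)
    return results
```

The 50th percentile returned here plays the role of the median-likelihood estimate, and the 16th and 84th percentiles give an indicative error bar.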

Processing the data
To test how well Magphys is able to recover the intrinsic properties of the simulated galaxies, we ran Magphys four times on each synthetic SED, using different combinations of photometry and assumed redshift:
• Run A - used all 18 filters;
• Run B - used all 18 filters, but with all SEDs shifted to a redshift of 2. This run was used as a comparison to detect any bias in the results due to redshift effects. This is discussed in Section 4.1;
• Run C - used only the UV to near-IR filters ($\lambda_\mathrm{eff} \le 2.2\,\mu$m);
• Run D - used only the FIR filters (PACS 100 $\mu$m to SPIRE 500 $\mu$m).
Runs C & D are discussed in Section 3.2.3. We assumed a signal-to-noise ratio of 5 in every band, following Smith & Hayward (2018).
One of the key aims of this work is to determine how Magphys performs when analyzing galaxies for which the observed UV and FIR emission are spatially 'decoupled'. To do this, we characterise the offset between the UV and FIR emission in three different ways:
(i) the peak-to-peak offset: this is defined as the distance in parsecs between the points of maximum flux in the UV (0.3 $\mu$m) and FIR (100 $\mu$m) images;
(ii) the light-weighted mean offset: this is defined as the distance in parsecs between the light-weighted centres for the UV (0.3 $\mu$m) and FIR (100 $\mu$m) emission;
(iii) the Spearman rank coefficient (Myers & Well 2003) comparing the degree of correlation between the UV (0.3 $\mu$m) image and the FIR (100 $\mu$m) image. A Spearman rank coefficient of $\rho > 0.8$ is considered necessary for a strong correlation. The Spearman test also returns a $p$ value indicating a correlation confidence level; 99 per cent of our results returned $p$ values indicating that the probability of the reported correlation being due to chance was < 0.0001. The images were filtered to allow only the data points with intensity above the 80th percentile in either the UV or FIR images to be included in the analysis. This was done to prevent the comparatively very large number of low intensity pixels from unduly dominating the result. The 80th percentile was chosen as a reasonable value after comparing the results using different percentile values of the UV and FIR images by eye.

Figure 3. In each panel, the base of the red vector is positioned at the peak FIR emission and the head at the peak UV emission; the base of the black vector is positioned at the light-weighted mean FIR emission and the head at the light-weighted mean UV emission. The title of each plot gives the galaxy name along with redshift, best-fit $\chi^2$ and Spearman $\rho$ value.
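The three offset proxies described above can be sketched for a pair of small images as follows. This is a minimal pure-Python illustration that assumes the UV and FIR maps are same-shaped nested lists with a known pixel scale in parsecs; all function names are hypothetical, the percentile uses simple linear interpolation, and the code is not the analysis pipeline used in this work.

```python
import math


def peak(img):
    """(row, col) of the brightest pixel."""
    return max(((r, c) for r, row in enumerate(img) for c, _ in enumerate(row)),
               key=lambda rc: img[rc[0]][rc[1]])


def centroid(img):
    """Light-weighted mean (row, col) position."""
    total = sum(map(sum, img))
    r = sum(i * v for i, row in enumerate(img) for v in row) / total
    c = sum(j * v for row in img for j, v in enumerate(row)) / total
    return (r, c)


def offset_pc(a, b, pc_per_pixel):
    """Distance in parsecs between two pixel positions."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) * pc_per_pixel


def percentile(values, q):
    s = sorted(values)
    k = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)


def ranks(values):
    """Average ranks, so tied pixels share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return out


def spearman_bright(uv, fir, q=80.0):
    """Spearman rho using pixels above the q-th percentile in either image."""
    u = [v for row in uv for v in row]
    f = [v for row in fir for v in row]
    u_cut, f_cut = percentile(u, q), percentile(f, q)
    keep = [i for i in range(len(u)) if u[i] > u_cut or f[i] > f_cut]
    ru, rf = ranks([u[i] for i in keep]), ranks([f[i] for i in keep])
    mu, mf = sum(ru) / len(ru), sum(rf) / len(rf)
    cov = sum((a - mu) * (b - mf) for a, b in zip(ru, rf))
    norm = math.sqrt(sum((a - mu) ** 2 for a in ru) * sum((b - mf) ** 2 for b in rf))
    return cov / norm
```

Here `offset_pc(peak(fir), peak(uv), scale)` gives the peak-to-peak offset and `offset_pc(centroid(fir), centroid(uv), scale)` the light-weighted mean offset. The sketch does not compute the $p$ value and does not guard against the degenerate case where all selected pixels share one rank.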
The three proxies were each calculated using the rest-frame UV and FIR maps for each snapshot and view to provide values that would be possible using real observational data with high enough spatial resolution and sensitivity. As an example, Figure 3 shows two images of the simulated galaxy A1 in the later stages of its evolution; other examples can be seen in Cochrane et al. (2019). The image on the left shows a significant offset between the UV (shown as the blue image) and FIR (shown as contours) intensity, while in the right image (which has the same colour scheme) the UV and FIR appear almost coincident. In both panels the red vectors show the peak-to-peak offset, while the black vectors show the light-weighted offset. The Spearman $\rho$ value is given in the title of each panel. We also calculated the offsets using the projected maps of the simulated young stars (age < 10 Myr) and dust; however, there was no significant difference in the results and so the observed offsets are used throughout this paper.
In the following sections, where we compare derived values to true (simulated) values, these are expressed as residuals in dex between the 50th percentile of the derived value's likelihood function and the true value:

$$ \Delta \log(X) = \log_{10}(X_\mathrm{derived}) - \log_{10}(X_\mathrm{true}) \qquad (3) $$

RESULTS
In this section we present results from the four runs described in Section 2.3. In all runs a successful fit was defined as one where the $\chi^2$ value was equal to or below the 99 per cent confidence limit ($\chi^2_\mathrm{max}$), which was taken from standard $\chi^2$ tables. The number of degrees of freedom was calculated as in Smith et al. (2012), who perturbed the output best-fit SEDs from Magphys with random samples from the standard normal distribution and found that it depended on the number of bands in the manner shown in Appendix B of that work. We are using the same Magphys model and have assumed that the relation does not vary with the particular choice of bands or the redshifts of the sources being studied.
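For readers who prefer to compute the acceptance threshold rather than consult tables, the sketch below evaluates the 99 per cent point of the chi-squared distribution for a given number of degrees of freedom. The number of degrees of freedom is treated here as an input, since the relation between it and the number of bands is taken from Smith et al. (2012) and is not reproduced; the function names are illustrative and the numerical method (a series expansion plus bisection) is simply one convenient choice.

```python
import math


def chi2_cdf(x, dof):
    """Chi-squared CDF via the series for the regularised lower
    incomplete gamma function P(dof/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = dof / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))


def chi2_max(dof, confidence=0.99):
    """Chi-squared value below which `confidence` of the distribution lies."""
    lo, hi = 0.0, 10.0 * dof + 50.0
    for _ in range(100):  # bisection on the monotonic CDF
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, dof) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def is_acceptable(chi2, dof, confidence=0.99):
    """True if the best-fit chi^2 is at or below the confidence limit."""
    return chi2 <= chi2_max(dof, confidence)
```

For 10 degrees of freedom this reproduces the familiar tabulated 99 per cent limit of about 23.21.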

The fraction of mock observations with acceptable fits
From run A we find that Magphys achieved a statistically acceptable fit (i.e. $\chi^2 \le \chi^2_\mathrm{max}$) for 83 per cent (5,567 out of 6,706) of the snapshots. Note that the value of $\chi^2_\mathrm{max}$ varies with redshift because the SKIRT SEDs do not include wavelengths < 0.08 $\mu$m, meaning that we are unable to generate synthetic photometry for the bluest filters at $z \gtrsim 3.9$.
The derived $\chi^2$ values are broadly independent of viewing angle for all galaxies; as an example, Figure 4 shows the $\chi^2$ results for all snapshots and views for the galaxy A1. Figure 5 shows how the fit success rate, averaged across all snapshots and views for all four  4) and at $z \approx 5$ the number of such SFHs in the library is only 20 per cent of those available at $z \approx 1$. It is therefore clear that the prior is significantly more densely sampled at lower redshifts, leading to more acceptable fits in cases such as this, where the SFH itself is constrained only weakly by the photometry (e.g. Smith & Hayward 2018). Secondly, at these very early times in the simulations ($z > 5$), the model galaxies are low mass ($< 10^9\,M_\odot$) and bursts of star formation have a disproportionate influence on a galaxy's bolometric luminosity. This highly stochastic star formation is not well-modelled by the star formation histories included in the Magphys libraries. It is possible that including additional bands of model photometry may provide better results, e.g. by an additional sub-millimetre datapoint providing an 'anchor' point to the Rayleigh-Jeans tail of the dust SED and in doing so enabling tighter constraints on the overall energy balance (though we note that the 500 $\mu$m band does sample this side of the dust SED out to $z \approx 4$). However, in this work we have chosen to focus on an example set of photometric data appropriate for studying dusty star-forming galaxies in general, and with an enforced SNR = 5 in every band we are not subject to some of the sensitivity (or resolution) limitations associated with using real Herschel data to study galaxies at the highest redshifts. We therefore defer testing our results with different photometric coverage to a future work. Throughout the remainder of this study, we follow the same approach used in previous Magphys works, both observational and numerical (e.g. HS15; Smith et al. 2012; Smith & Hayward 2018; Smith et al. 2021), and consider only those views for which an acceptable fit was obtained.
To investigate the influence of redshift on the Magphys fit rate further, we used Run B, in which the photometry is modified such that all SEDs were placed at $z = 2$. In this run, the size of the libraries and therefore the sampling of the priors used for SED fitting is the same for all snapshots. We find that the fit success rate increases to 93 per cent for the forced $z = 2$ runs, from 83 per cent for run A. Although it is tempting, we cannot attribute this change solely to the weakening of the SFH prior, since it is also possible that sampling different rest-frame wavelengths could impact the fit success rate (e.g. because of individual spectral features being redshifted into a particular observed bandpass; Smith et al. 2012). These effects are discussed further in Section 4.1.

Figure 5. Magphys success rate in fitting SEDs. The percentage of successful fits averaged across all views and snapshots of all galaxies as a function of redshift; note that standard Poisson errors are too small to be visible. The horizontal line marks a success rate of 50 per cent. The fraction of fits that are statistically acceptable decreases with increasing redshift due to the constraint that the SFH must be shorter than the age of the Universe at that redshift, meaning that the size of the template library decreases with increasing redshift.

Overall Magphys performance
In studying the fidelity of the Magphys parameter estimates, we have chosen to focus on five properties likely to be of the widest interest, namely SFR and sSFR (both averaged over the last 100 Myr), $M_\mathrm{star}$, $M_\mathrm{dust}$ and $L_\mathrm{dust}$. The true values for $M_\mathrm{star}$, SFR (averaged over the last 100 Myr), and $M_\mathrm{dust}$ were available from the simulation. The true values for $L_\mathrm{dust}$ were calculated by integrating under the SKIRT-produced rest-frame SED from $8\,\mu\mathrm{m} < \lambda < 1000\,\mu\mathrm{m}$, following Kennicutt (1998a).
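As an illustration of the dust luminosity calculation, the sketch below integrates a tabulated rest-frame SED between 8 and 1000 $\mu$m using the trapezoidal rule. The function names, the choice of trapezoidal integration and the assumed units ($L_\lambda$ in $L_\odot\,\mu\mathrm{m}^{-1}$ on an increasing wavelength grid in $\mu$m) are assumptions made for this example; they are not taken from the SKIRT outputs or from the pipeline used in this work.

```python
def interp(x, xs, ys):
    """Linear interpolation of ys(xs) at x; xs must be strictly increasing."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the tabulated wavelength range")

def dust_luminosity(wave_um, l_lambda, lo=8.0, hi=1000.0):
    """Trapezoidal integral of L_lambda d(lambda) from lo to hi (microns).

    The integration limits are interpolated onto the grid so that the
    result does not depend on whether 8 and 1000 microns are grid points.
    """
    xs = [lo] + [w for w in wave_um if lo < w < hi] + [hi]
    ys = [interp(x, wave_um, l_lambda) for x in xs]
    return sum(0.5 * (ys[i] + ys[i + 1]) * (xs[i + 1] - xs[i])
               for i in range(len(xs) - 1))

# Hypothetical flat SED: L_lambda = 2 L_sun per micron at all wavelengths
wave = [1.0, 5.0, 10.0, 100.0, 500.0, 2000.0]
flat = [2.0] * len(wave)
print(dust_luminosity(wave, flat))  # 2 * (1000 - 8) L_sun
```

For a real SED the wavelength grid would need to be fine enough for the trapezoidal rule to resolve the dust emission peak.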

The fidelity of Magphys results over time
Figure 6 shows the evolution in the true and derived physical properties of our simulated galaxies as a function of redshift (with a second horizontal axis at the top of each column showing the age of the Universe at each redshift in our adopted cosmology). The different physical properties are shown along successive rows, while the different simulated galaxies are shown in successive columns, as indicated in the text at the top of each column. In each panel, the black line indicates the true values for each property, taken from the simulations, while the red line indicates the mean of the median-likelihood Magphys estimates, where the averaging has been conducted over the seven different viewing angles. Similarly, the shaded red region in each panel indicates the area enclosed by the mean of the 16th and 84th percentiles of each parameter's Magphys PDF (once more averaged over the seven views), to give the reader a feel for the typical error bar. Each lower panel shows the residual, e.g. $\Delta\log(\mathrm{SFR})$, as defined in Equation 3.
In general, Magphys-derived values show a significant degree of consistency, both in the temporal sense and by comparison to the true values. Temporal consistency is a valuable test in its own right since, although Magphys fits each snapshot independently, the true values shown in Figure 6 mostly vary smoothly with time. That this is reflected in the Magphys estimates once the error bars are taken into account offers broad encouragement for the use of Magphys with observational data.
Below, we discuss the degree of fidelity in the Magphys parameter estimates overall by comparing with the true (simulated) values. It is clear based on even a cursory inspection of the trends visible in Figure 6 that the Magphys estimates have broadly captured the behaviour visible in the true parameter values, such as increasing stellar mass and generally decreasing sSFR. Similar encouragement was found in the earlier work of HS15, though we now extend this to higher-redshift, dustier galaxies for the first time with a sample of very high-resolution simulations. The mean residuals, $\Delta\log(\mathrm{parameter})$, averaged over the full evolution of each simulated galaxy, are shown in Table 3.
Averaging the results across all views of all snapshots of all galaxies, we find that the stellar mass is typically underestimated by Magphys, recovered with a mean residual of $\Delta\log(M_\mathrm{star}) = -0.29 \pm 0.09$. This $3.22\sigma$ result covers a wide range of simulated scenarios, ranging from the early stages of formation, through periods of starburst, tidal disruptions and merger events. By way of comparison, in HS15 the stellar mass was recovered to within 0.2 dex (which was also the typical uncertainty in that work) for the vast majority of snapshots, across both the isolated disk and major merger simulations. The principal exception to this excellent recovery was a 0.4 dex underestimate of the stellar mass during the peak period of AGN activity (which we do not simulate here). D20 also reported a larger systematic underestimation of stellar mass, with a deviation of $-0.46 \pm 0.10$ dex; our results therefore fall between those of these two previous studies. We suggest two factors which may be contributing to this systematic underestimation of the stellar mass. Firstly, a sub-optimal choice of SFH (such as we know we have made in this work, since we can see that the simulated galaxies do not have parametric SFHs in Figure 6) has been shown to produce biased results (Carnall et al. 2019) and in particular an underestimate of stellar mass when applied to star-forming galaxies (Mitchell et al. 2013; Michałowski et al. 2014). Secondly, Mitchell et al. (2013) and Małek et al. (2018) have shown that the choice of attenuation law has an impact on the estimation of stellar mass (and it is also clear that the two-component geometry assumed by Magphys is not consistent with the ground truth in the simulations, where the radiative transfer calculates the attenuation due to ISM dust in situ).
In the second row of Figure 6, we show that the Magphys SFRs for our simulated galaxies are typically accurate to within $\Delta\log(\mathrm{SFR}) = -0.11 \pm 0.06$ of the true values ($1.83\sigma$). Of the five properties highlighted in this study, Figure 6 shows SFR to be the one for which Magphys produces perhaps the most accurate reflection of the true values once the uncertainties are considered. However, there are some points of disagreement that are worth mentioning. The first example of this is for galaxy A1 at $z \approx 1.7$: this deviation of $\approx -0.59 \pm 0.16$ dex ($3.7\sigma$) coincides with a local minimum of $M_\mathrm{dust}$, perhaps resulting from a strong outflow, and is associated with a brief reduction in the SFR that is not apparent when averaging over 100 Myr. The second example is for galaxy A2 around $1.0 \le z \le 1.5$, at the point where the galaxy has the highest stellar mass ($M_\mathrm{star} > 10^{11}\,M_\odot$) and is the most quiescent that we have simulated (sSFR $\approx 10^{-10}\,\mathrm{yr}^{-1}$). For comparison, HS15 found that SFR was typically recovered to around 0.2-0.3 dex accuracy$^5$. D20 reported that SFR was typically underestimated by approximately 20 per cent (very similar to our value of $\Delta\log(\mathrm{SFR}) = -0.11 \pm 0.06$ dex), attributing this to differences in their adopted SFHs, dust model and geometry.
The observed effects in sSFR mirror those in stellar mass and SFR, as expected. Averaging over all snapshots and views, we obtain a mean offset of $\Delta\log(\mathrm{sSFR}) = 0.18 \pm 0.13$, a $1.38\sigma$ result which is consistent with the findings of HS15.
Figure 6 highlights the excellent recovery of the true dust mass; averaging over all snapshots reveals a mean residual of $\Delta\log(M_\mathrm{dust}) = -0.19 \pm 0.17$ ($1.12\sigma$), suggesting that the results are typically consistent with the true values once the uncertainties are taken into account, consistent with the findings of D20.
Overall, $L_\mathrm{dust}$ is well recovered, with a mean residual of $\Delta\log(L_\mathrm{dust}) = 0.09 \pm 0.04$; this $2.25\sigma$ result is again in line with the results of HS15. However, the fifth row of Figure 6 may suggest a weak trend for a larger $|\Delta\log(L_\mathrm{dust})|$, in the sense that the Magphys estimates increasingly underestimate the true values as the simulations progress and the galaxies develop lower sSFR (though note that the scale of the residual panel for $L_\mathrm{dust}$ is half as large as for the other parameters, which exaggerates the size of the effect). It is possible that the assumptions inherent in the two-component dust model used by Magphys, originally optimised to reproduce the observations of local star-forming galaxies (DC08), are no longer appropriate for the high-mass ($M_\mathrm{star} \approx 10^{11}\,M_\odot$), highly star-forming (SFR $> 20\,M_\odot\,\mathrm{yr}^{-1}$) galaxies that are simulated here.
Finally, while it is not always the case, $A_V$ is in general underestimated, with a mean residual of $\Delta A_V = -0.22 \pm 0.07$ ($3.14\sigma$), similar to the overall fidelity of the stellar mass recovery. This underestimation of the degree of extinction at $V$ band may be linked to the typical underestimation of the overall dust luminosity, though it is interesting to note this does not prevent excellent recovery of the star formation rate for the majority of snapshots.
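The significance values quoted alongside each mean residual in this section (e.g. 3.22 for stellar mass) appear to correspond to the absolute mean residual divided by its quoted uncertainty. The short sketch below reproduces that arithmetic using the values quoted in the text, and also converts a residual in dex into a fractional offset; the function names are introduced here purely for illustration.

```python
def significance(mean_residual, uncertainty):
    """Size of a mean residual in units of its quoted uncertainty."""
    return abs(mean_residual) / uncertainty

def dex_to_fraction(delta_dex):
    """Fractional offset implied by a residual in dex (negative = underestimate)."""
    return 10 ** delta_dex - 1

# Mean residuals and uncertainties as quoted in the text (A_V is in
# magnitudes rather than dex, so only its significance is meaningful here)
quoted = {
    "M_star": (-0.29, 0.09),
    "SFR": (-0.11, 0.06),
    "sSFR": (0.18, 0.13),
    "M_dust": (-0.19, 0.17),
    "L_dust": (0.09, 0.04),
    "A_V": (-0.22, 0.07),
}
for name, (mu, err) in quoted.items():
    print(f"{name}: {significance(mu, err):.2f} sigma")

# A residual of -0.11 dex in SFR corresponds to an underestimate of about 22 per cent
print(f"{100 * dex_to_fraction(-0.11):+.0f} per cent")
```

The second function makes explicit why a mean SFR residual of $-0.11$ dex is comparable to an underestimate of approximately 20 per cent.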
Figure 6. The overall Magphys parameter estimation (red) compared with the true values from the simulation (black); Magphys captures the overall true properties as a function of redshift. The columns refer to galaxies A1, A2, A4 and A8, respectively. In each row, the upper plot presents the evolution against universe age (upper $x$-axis) and redshift (lower $x$-axis), and the lower plot shows the residuals on the same $x$-axes (note that the range for $\Delta\log(L_\mathrm{dust})$ is smaller than that for other properties). The top row presents the evolution of stellar mass, while the four subsequent rows present the corresponding evolution of SFR, sSFR, $M_\mathrm{dust}$ and $L_\mathrm{dust}$, respectively. In each main panel, the black line indicates the true values, the red line plots the mean across all views of the median recovered value, and the shaded area indicates the region enclosed by the typical error bar on each parameter (i.e. the mean difference between the 16th/84th percentile and the median, for the upper and lower bounds, respectively). In the final row, the black and red lines in the upper plot show the true and recovered values of $A_V$ for the different views, while the lower plot shows the residuals for each view.

Searching for systematic trends in the Magphys fit results
We used our simulations to determine the consistency of the Magphys-derived galaxy properties across the range of values presented by the simulations. To do this, we binned the residuals defined using Equation 3 across the full range of each property (stellar mass, SFR, sSFR, dust mass and dust luminosity) from the simulations and plotted the median bin residual. To gauge the significance of our results, we also averaged across all occupants of each bin to calculate the typical uncertainty associated with each Magphys fit (although this is by no means constant in our results), and the scatter within each bin. The median residual, typical error bar, and the 16th and 84th percentile values for the scatter were plotted. Systematic trends might be expected to appear as deviations from horizontal lines in these figures; however, our results show that in all cases, the Magphys results are remarkably consistent across the full range of values once the two sources of scatter are taken into account, and no further systematic trends can be identified. The plots are shown in Appendix A.
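The binning procedure described above can be sketched as follows. This is a simplified illustration under stated assumptions: the bins are taken to be equally spaced in the logarithm of the true value, only the median and occupancy of each bin are returned (the percentile ranges are omitted for brevity), and the function name, the number of bins and the `min_occupancy` argument are hypothetical.

```python
import math
from statistics import median

def binned_median_residuals(true_values, residuals, n_bins=5, min_occupancy=1):
    """Median residual in log-spaced bins of the true property value.

    Returns a list of (bin centre, median residual, occupancy) tuples,
    omitting bins with fewer than min_occupancy members.
    """
    logs = [math.log10(v) for v in true_values]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins or 1.0  # guard against a zero-width range
    bins = [[] for _ in range(n_bins)]
    for lx, r in zip(logs, residuals):
        i = min(int((lx - lo) / width), n_bins - 1)  # top edge joins last bin
        bins[i].append(r)
    out = []
    for i, members in enumerate(bins):
        if len(members) >= min_occupancy:
            centre = 10 ** (lo + (i + 0.5) * width)
            out.append((centre, median(members), len(members)))
    return out

# Hypothetical true stellar masses (M_sun) and residuals (dex)
m_true = [1e8, 1e9, 1e10, 1e11]
delta = [0.1, 0.2, 0.3, 0.4]
for centre, med, n in binned_median_residuals(m_true, delta, n_bins=3):
    print(f"{centre:.2e}: median residual {med:.2f} dex from {n} fits")
```

A systematic trend would show up as a median residual that changes steadily from one bin to the next.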

The importance of panchromatic data in energy balance fitting
We now discuss runs C and D, originally mentioned in section 2.3. Run C used only the UV-NIR photometry ($0.4\,\mu\mathrm{m} < \lambda_\mathrm{eff} < 2.2\,\mu\mathrm{m}$), while run D retained only the FIR data from the PACS and SPIRE instruments ($100\,\mu\mathrm{m} < \lambda_\mathrm{eff} < 500\,\mu\mathrm{m}$).
While it is not possible to 'switch off' the energy balance criterion in Magphys, runs C and D enable us to make a direct comparison of the results of 'traditional' SED fitting (i.e. attempting to recover the stellar mass or dust content of a galaxy from the optical/NIR data alone) with both the true values and the full panchromatic run.
In both the starlight-only and FIR-only runs, Magphys must rely on the physically motivated model and the energy balance assumption to estimate the properties usually associated with the missing observations (e.g. estimating the dust mass purely on the basis of the observed starlight, or the stellar mass using only FIR data).
Figure 7 shows the results of these runs comparing the mean log Δ and typical uncertainty for the five properties for each of the three runs A, C & D: full filter set, stellar-only and FIR-only.
The left panel of Figure 7 shows the view- and snapshot-averaged $\Delta\log(M_\mathrm{star})$ for the three runs. It is immediately clear that although the average $\Delta\log(M_\mathrm{star})$ is very similar for the stellar-only (0.31 dex) and all-filter (0.29 dex) runs, including the full set of data does reduce the typical uncertainty (shown by the error bars) from $\pm 0.20$ dex to $\pm 0.09$ dex. Unsurprisingly, attempting to estimate the stellar mass using only the FIR data leads not only to a large $\Delta\log(M_\mathrm{star})$ but also to a significantly larger typical uncertainty ($\approx 0.42$ dex).
In the second panel, we show the corresponding results for $\Delta\log(\mathrm{SFR})$. The power of panchromatic fitting is again clear, since the largest $\Delta\log(\mathrm{SFR})$ and typical uncertainty occur for the stellar-only fits, which can be influenced by the dominance of the lowest-attenuation sightlines (meaning that the amount of obscured star formation can be underestimated) as well as being subject to the well-known age-dust degeneracy (e.g. Cimatti et al. 1997). Our results show that FIR-only SFR estimates are more reliable than those using the UV-NIR photometry alone, since the FIR-only mean $\Delta\log(\mathrm{SFR}) \approx 0.19 \pm 0.11$ is significantly closer to the true values than the corresponding stellar-only fits, which have $\Delta\log(\mathrm{SFR}) \approx 0.30 \pm 0.29$.
The situation is even more pronounced for the recovery of the sSFR, with $\Delta\log(\mathrm{sSFR})$ for the three runs shown in the central panel of Figure 7. Although the mean $\Delta\log(\mathrm{sSFR})$ for the stellar-only run is closest to the true values, the typical uncertainties on the panchromatic run are more than a factor of two smaller than the stellar-only estimates. The larger error bar represents a wide range of possible activity levels, making it impossible to unravel the age/dust degeneracy; by adding FIR data, the sSFR is better constrained. This, in turn, enables a constrained determination of the SFR and hence the cause of any observed reddening.
For $M_\mathrm{dust}$, Figure 7 shows that the addition of stellar data makes very little difference to the mean $\Delta\log(M_\mathrm{dust})$, with FIR-only giving results within 0.18 dex and the full filter set 0.19 dex; this is comparable to the typical uncertainties (0.20 dex as opposed to 0.17 dex). Using only the stellar data, the mean $\Delta\log(M_\mathrm{dust})$ is 0.26 dex but the typical uncertainty is significantly increased to 0.64 dex, reflecting the difficulty associated with estimating the dust content of distant galaxies using data probing the starlight alone.
Finally, the right-hand panel of Figure 7 shows the recovery of $L_\mathrm{dust}$ across the three runs. Interestingly, although the typical uncertainties are similar for the FIR-only and panchromatic runs, the inclusion of the UV/NIR data along with the energy balance criterion perhaps increases the mean $\Delta\log(L_\mathrm{dust})$, although the significance of this difference is low.

Measuring the effect of UV/FIR 'decoupling' on the fidelity of Magphys results
As discussed above, the primary goal of this work is to examine the fidelity of the Magphys results as a function of the degree of correlation or apparent offset between UV and FIR emission, using the three proxies for this 'decoupling' described in Section 2.3. The results are shown in Figure 8, in which the mean $\Delta$ in dex for each parameter is plotted against the different measures for the degree of separation. Each of the five panels shows the residuals for one of the properties plotted against the degree of separation/correlation as measured by the three proxies. The coloured lines indicate the median residual in log-spaced bins, while the coloured shaded areas show the mean range enclosed by the 16th and 84th percentiles (i.e. the typical $1\sigma$ error in the limit of Gaussian statistics), and the grey shaded area shows the 16th and 84th percentile range of the scatter within each bin. The grey histograms show the bin occupancy relative to the right-hand axis. In many cases the scatter is larger than the typical uncertainties; this is likely to be the result of two effects. Firstly, it reflects the fact that the Magphys results contain a range of uncertainties that cannot be adequately summarized by a single error bar (the uncertainties show significant variation and contain outliers). Secondly, the uncertainties produced by Magphys are likely to be underestimates. This is inevitably the case since the range of SEDs contained in any pre-computed library must by definition be smaller than the actual range of galaxy SEDs in the Universe; for example, neither real galaxies nor those in our simulations have truly parametric SFHs. In addition, the Magphys libraries may not be equally appropriate at all stages of our simulations. The average performance of Magphys is remarkably consistent, both as a function of the peak-to-peak distance between the UV and FIR images, and as a function of the light-weighted mean UV to FIR distance. In these cases, the mean $\Delta$ is less than $\pm 0.3$ dex for all parameters, across the separations ranging from 0 to 10 kpc. In the lower plot of each panel we show the corresponding variation in $\Delta$ (in dex) as a function of the Spearman $\rho$ calculated by comparing the UV and FIR images (recall that only the brightest 20 per cent of pixels were included in this calculation). Here again, the logarithmic difference between the derived and true properties appears independent of $\rho$ once the mean uncertainties are taken into account.
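To make the three 'decoupling' proxies concrete, the sketch below computes a peak-to-peak offset, a light-weighted mean offset and a Spearman rank correlation for a pair of small images. It is an illustration only: the images are represented as nested lists, the pixel scale is a hypothetical argument, and the pixel selection (those in the brightest 20 per cent of either image) is our reading of the description above rather than a reproduction of the code used in this work.

```python
import math

def peak(image):
    """(row, col) of the brightest pixel."""
    cells = ((r, c) for r in range(len(image)) for c in range(len(image[0])))
    return max(cells, key=lambda rc: image[rc[0]][rc[1]])

def centroid(image):
    """Light-weighted mean (row, col) position."""
    total = sum(sum(row) for row in image)
    r = sum(i * v for i, row in enumerate(image) for v in row) / total
    c = sum(j * v for row in image for j, v in enumerate(row)) / total
    return r, c

def offset_kpc(p, q, kpc_per_pixel):
    """Projected separation of two pixel positions in kpc."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) * kpc_per_pixel

def ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman_brightest(uv, fir, fraction=0.2):
    """Spearman rho over pixels in the brightest `fraction` of either image.

    Constant inputs (zero rank variance) are not handled in this sketch.
    """
    u = [v for row in uv for v in row]
    f = [v for row in fir for v in row]
    n_keep = max(1, int(round(fraction * len(u))))
    u_cut = sorted(u, reverse=True)[n_keep - 1]
    f_cut = sorted(f, reverse=True)[n_keep - 1]
    keep = [i for i in range(len(u)) if u[i] >= u_cut or f[i] >= f_cut]
    ru = ranks([u[i] for i in keep])
    rf = ranks([f[i] for i in keep])
    mu, mf = sum(ru) / len(ru), sum(rf) / len(rf)
    cov = sum((a - mu) * (b - mf) for a, b in zip(ru, rf))
    norm = math.sqrt(sum((a - mu) ** 2 for a in ru) * sum((b - mf) ** 2 for b in rf))
    return cov / norm

# Hypothetical 3x3 images with UV and FIR peaks in opposite corners
uv = [[9, 1, 0], [1, 0, 0], [0, 0, 0]]
fir = [[0, 0, 0], [0, 1, 2], [0, 2, 9]]
print(offset_kpc(peak(uv), peak(fir), kpc_per_pixel=1.0))
print(offset_kpc(centroid(uv), centroid(fir), kpc_per_pixel=1.0))
```

In practice an array library would be the natural choice for images of realistic size; plain lists are used here only to keep the example self-contained.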

The redshift dependence of the Magphys fit success rate
In section 3.1 we showed that the fit success rate was a strong function of redshift, with 83 per cent of the mock observations having acceptable $\chi^2$ overall, but no good fits being obtained at $z > 5.9$. Fixing each mock to be observed at $z = 2$ (Run B) resulted in an increase in the overall success rate to 93 per cent. A likely explanation for this is that the number of SFHs in the Magphys library is a strong function of redshift (shown as the dashed line in Figure 4, due to the requirement of considering only SFHs shorter than the Hubble time at the observed redshift), which results in significantly worse sampling of the priors at early epochs, particularly when the SFHs of galaxies are so weakly constrained by photometry (e.g. Smith & Hayward 2018).
In support of this idea, Figure 9 shows the ratio of the best-fit $\chi^2$ obtained for our fiducial results (native redshift run A) to the corresponding value for the SEDs fixed to $z = 2$ (run B). It is clear that there is a systematic trend for the native $\chi^2$ to be worse at $z > 2$ (corresponding to a Universe age of $\le 3.2$ Gyr in our adopted cosmology) and better at $z < 2$. However, this trend is by no means absolute, indicating that other effects such as the precise details of the rest-wavelengths being sampled and the number of available filters may also be playing a role.
Interestingly, the fact that the ratio of $\chi^2$ for run A to that of run B does not converge on the right-hand side of this plot may indicate that the size of the Magphys prior library still impacts the fit quality even at $z < 2$, though of course the difference is that at these comparatively late epochs the priors are sufficiently well sampled to obtain statistically acceptable fits to the data.

The fidelity of Magphys results for dusty, high-redshift galaxies
The principal aim of this study is to determine how the fidelity of the energy balance code Magphys is impacted when it is applied to high-redshift galaxies for which the observed UV and FIR emission are offset, or spatially 'decoupled'. For such galaxies, the observed UV light potentially originates from young star clusters that are not spatially co-located with the young stars that dominate the dust heating and thus FIR emission. Consequently, it is possible that the relatively unobscured young stars could yield a blue UV-optical slope and cause SED modeling codes to underestimate the attenuation.
It has been shown that the use of panchromatic data is important when fitting such galaxies (Roebuck et al. 2019), and fitters such as Magphys use energy balance to produce physically motivated, panchromatic models that seek to minimise this underestimation. We determine the efficacy of this approach by analyzing the logarithmic difference, $\Delta$, between the true and median-likelihood estimates for stellar mass, SFR, specific SFR, dust mass and dust luminosity as a function of three proxies for the degree of 'decoupling' between the UV and FIR data. In all cases, the performance of Magphys appears independent of the degree of UV/FIR 'decoupling' as measured by all three proxies. We therefore conclude that energy balance SED fitting codes can perform just as well in the presence of such effects as they do when the dust and young stars are co-located within a galaxy.
We suspect that the explanation for this success is that the Charlot & Fall (2000) dust attenuation model used by Magphys is sufficiently flexible to handle this 'decoupling' in many cases and that the $\chi^2$ algorithm is doing its job by identifying cases for which the model cannot yield a self-consistent solution (i.e. very low attenuation but high FIR luminosity). This has been shown to be the case for an un-modeled AGN contribution to the SED: Smith et al. (2021) noted that using the $\chi^2$ threshold from Smith et al. (2012), which we have also implemented here, had the effect of flagging the vast majority of LOFAR-detected AGN as bad fits unless the AGN contribution to the emergent luminosity was very small. Of course, it is expected (e.g. Witt & Gordon 2000) and observed (e.g. Kriek & Conroy 2013; Boquien et al. 2022; Nagaraj et al. 2022) that the attenuation law is not universal and instead varies by galaxy type. Should additional flexibility be required in future, we note that other works have explored implementing modifications to the standard dust law, including Battisti et al. (2019), who added a 2175 Å feature to remove a systematic redshift effect, as well as Lo Faro et al. (2017) and Trayford et al. (2020), who allowed the power law indices of equations 1 & 2 to vary. However, the fact that there is no scope to easily modify the dust parameterisation assumed in Magphys leaves us no option but to defer further investigation of this potentially important aspect to future work.

Figure 9. The dependence of the best-fit $\chi^2$ on the size of the Magphys library, which varies with the redshift assumed for the fit. This plot shows the ratio of best-fit $\chi^2$ obtained for run A (at the native redshift) to that obtained in run B (where all SEDs were fixed to $z = 2$). For galaxies on the left-hand side of this plot the prior gets larger in run B, while for galaxies viewed at later times, the opposite effect is apparent.
The reason that some have claimed that energy balance should fail in galaxies with significant IR-UV offsets is that the unobscured lines of sight should dominate the UV emission, meaning that the attenuation that would be inferred from the observed UV-optical emission would be less than the total attenuation experienced by the stellar population as a whole. However, energy balance codes such as Magphys use the FIR luminosity as a simultaneous constraint on the attenuation, and it would simply not be possible to obtain a satisfactory fit to both the UV-optical and FIR regions of the SED assuming low attenuation when the FIR luminosity is high.$^6$ Furthermore, we note that even in 'normal' galaxies that do not exhibit significant UV-FIR offsets, stars of a given age are not all subject to the same amount of attenuation (e.g. the Charlot & Fall 2000 dust model). Instead, even for a single age and line of sight, there is a distribution of dust optical depths, and this distribution varies with both the stellar age and line of sight considered. The Charlot & Fall (2000) model attempts to capture this complex age and line of sight dependence using only two effective optical depths. Though this underlying model is certainly very crude compared to both the simulations and real galaxies, HS15 have already shown that it is adequate to correct for the effects of dust attenuation in at least some low-redshift galaxies. There is no a priori reason to believe that it should 'break' above some offset threshold (which was the motivation for this study). Our results demonstrate that even when the width of the optical depth distribution experienced by young stars is very wide (i.e. in our simulations some young stars are almost completely unobscured, whereas others have line-of-sight UV optical depths $\gg 1$), the Charlot & Fall (2000) model can still adequately capture the overall effects of dust attenuation in most cases.

$^6$ It is tempting to investigate this by making a plot similar to Figure 8 but including only those fits that exceed the $\chi^2$ threshold we use to identify the bad fits. However, since the best-fit model is statistically unacceptable, we cannot believe the parameter estimates produced by Magphys in these cases, meaning that such a test is not meaningful.

CONCLUSIONS
Recent works (e.g. Hodge et al. 2016; Casey et al. 2017; Miettinen et al. 2017; Simpson et al. 2017; Buat et al. 2019) have questioned whether energy balance SED fitting algorithms are appropriate for studying high-redshift star-forming galaxies, due to observations of offsets between the UV and FIR emission (e.g. Hodge et al. 2016; Rujopakarn et al. 2016; Chen et al. 2017; Bowler et al. 2018; Gómez-Guijarro et al. 2018; Rujopakarn et al. 2019). Clumpy dust distributions within these galaxies may cause a small fraction of relatively unobscured young stars to influence the blue UV-optical slope and result in an underestimation of the attenuation even if the bulk of the young stars are completely dust-obscured. We have used four cosmological zoom-in simulations of dusty, high-redshift galaxies from the FIRE-2 project, together with the radiative transfer code SKIRT, to generate over 6,700 synthetic galaxy SEDs spanning a redshift range $1 < z < 8$. We used these model data to test the fidelity of the galaxy properties recovered using the energy balance fitting code Magphys with 18 bands of UV-FIR photometry, building on our previous related studies (HS15, Smith & Hayward 2015, 2018). Our principal findings are as follows:
• We find that the high-$z$ version of Magphys was able to produce statistically acceptable best-fit SEDs for 83 per cent of the synthetic SEDs that we trialled. The fit success rate fell to 50 per cent for galaxies at $z > 4.85$ and zero for galaxies at $z > 5.9$. This reduction in fit success rate has two main contributing factors: (i) the fixed Magphys libraries, combined with the requirement that model SFHs should be shorter than the age of the Universe at any given redshift, reduce the size of the Magphys library available at higher redshifts, meaning that the priors become increasingly poorly sampled at earlier times; (ii) the evolution of the simulated galaxies is increasingly stochastic at the earliest times in our simulations due to their lower mass, causing bursts of star formation to have a disproportionate influence on a galaxy's bolometric luminosity that cannot be reconciled with the Magphys prior libraries.
• Where statistically acceptable best-fits were obtained, we found that Magphys fits are able to broadly capture the true evolution of the four zoom-in simulations that we studied (steady build-up of stellar mass, generally decreasing sSFR, evolution of dust mass), despite individual snapshots being fit independently. In addition, we find that the fidelity of this recovery is remarkably consistent across a broad range of galaxy properties sampled by the simulations, showing no evidence for strong systematics as a function of stellar mass, SFR, sSFR, dust mass or dust luminosity.
• Combining UV to FIR observations with an energy balance SED fitting code provides a powerful way to combine multi-wavelength data and obtain the most reliable estimates of the ground-truth galaxy properties. The panchromatic results outperform those obtained by using either the stellar or dust emission alone.
• We find no evidence that the performance of Magphys depends on the degree of spatial 'decoupling' between the UV and FIR data, despite suggestions to the contrary by several other works. Indeed, our results show that the fidelity of the galaxy properties derived is very similar to that observed for local galaxies, e.g. in our previous work (Hayward & Smith 2015).

Figure A1. The fidelity of Magphys' recovery of $M_\mathrm{star}$ and SFR is remarkably consistent across the full range of true galaxy properties. The top five panels plot the relationship between the $M_\mathrm{star}$ residual and true value of the properties $M_\mathrm{star}$, SFR, sSFR, $M_\mathrm{dust}$ and $L_\mathrm{dust}$, respectively. The data points in black represent the median value for the residual in log-spaced bins; bin occupancy is shown by the background grey bar chart with log values read from the right-hand axis (note that bins with occupancy < 20 have been removed for clarity). In each case the coloured band shows the median 16th and 84th percentile limits for the residuals within the bin and the bounded grey region shows the median 16th and 84th percentile limits for the scatter within the bin. The short coloured line on the left-hand side of each plot shows the average for the plotted value; residual values are read from the left-hand axis. The lower five panels show the same for the SFR residual.

Figure A2. As Figure A1, but showing the remarkably consistent recovery of sSFR and $M_\mathrm{dust}$ as a function of the true galaxy properties. The top five panels plot the relationship between the sSFR residual and true value of the properties $M_\mathrm{star}$, SFR, sSFR, $M_\mathrm{dust}$ and $L_\mathrm{dust}$, respectively. The data points in black represent the median value for the residual in log-spaced bins; bin occupancy is shown by the background grey bar chart with log values read from the right-hand axis (note that bins with occupancy < 20 have been removed for clarity). In each case the coloured band shows the median 16th and 84th percentile limits for the residuals within the bin and the bounded grey region shows the median 16th and 84th percentile limits for the scatter within the bin. The short coloured line on the left-hand side of each plot shows the average for the plotted value; residual values are read from the left-hand axis. The lower five panels show the same for the $M_\mathrm{dust}$ residual.

Figure A3. The five panels plot the relationship between the $L_\mathrm{dust}$ residual and true value of the properties $M_\mathrm{star}$, SFR, sSFR, $M_\mathrm{dust}$ and $L_\mathrm{dust}$, respectively. The data points in black represent the median value for the residual in log-spaced bins; bin occupancy is shown by the background grey bar chart with log values read from the right-hand axis (note that bins with occupancy < 20 have been removed for clarity). In each case the coloured band shows the median 16th and 84th percentile limits for the residuals within the bin and the bounded grey region shows the median 16th and 84th percentile limits for the scatter within the bin. The short coloured line on the left-hand side of each plot shows the average for the plotted value; residual values are read from the left-hand axis.
Figure 1. An example SED obtained using Magphys, demonstrating the generally close agreement between the true and Magphys-derived SEDs. In the upper panel, the solid black line shows the best-fit Magphys-derived SED, while the dashed black line indicates the Magphys estimate of the unattenuated SED; the solid blue line represents the attenuated SED generated by SKIRT. The square markers represent the best-fit photometry, with the SKIRT photometry shown as the points with error bars (as described in the legend). The coloured lines above the lower horizontal axis show the normalised filter curves used in this study. The lower panel shows the residual value in $\sigma$ units between each observation and the best-fit SED. The residual value is calculated as (observed flux − model flux)/observed error. This SED corresponds to simulated galaxy A1, snapshot 276, view 0, $z = 1.00$.

Figure 2. The properties of the simulated galaxies in their observational context. Upper left: the relationship between the simulated galaxies' SFRs and the galaxy main sequence (MS); for each snapshot, the $y$-axis shows the difference between the simulation SFR and the MS, with the magenta band indicating the typical $\pm 0.3$ dex scatter associated with the MS (e.g. Tacchella et al. 2022). Upper right: the relationship between the sub-mm flux density $S_{870}$ and dust mass; the blue points represent the simulated data, while the orange points show galaxies from D20. Lower left: the variation in $A_V$ as a function of redshift for the simulations (for which the median value at each $z$ is shown by the solid line, with shading indicating values enclosed by the 16th and 84th percentiles of the distribution), along with a corresponding distribution from D20 (shown in purple). The SMG sample from Hainline et al. (2011) is shown by the red points. Lower right: the mean UV/FIR peak-to-peak (blue) and light-weighted mean (red) spatial offsets in redshift bins: the shading indicates the region enclosed by the 16th and 84th percentiles at each redshift, while the solid line indicates the median value. The red, green and black circles are values for individual sources taken from the literature (as indicated in the legend), while the solid green line marks the reported average spatial offset across 20 SMGs from Lang et al. (2019).

Figure 3. Visualisations of two views of galaxy A1, in the later stages of its evolution, showing differing degrees of UV-FIR offset, ranging from kpc-scale projected separation (left) to approximately co-spatial (right). In each panel, the image in blue shows the UV emission, with the side colourbars showing the flux density of the emission in MJy/sr. The coloured contours show the flux density of the FIR emission, ranging from green ($3 \times 10^4$ MJy/sr), to orange ($5 \times 10^4$ MJy/sr), to black ($10^5$ MJy/sr). In each panel, the base of the red vector is positioned at the peak FIR emission and the head at the peak UV emission; the base of the black vector is positioned at the light-weighted mean FIR emission and the head at the light-weighted mean UV emission. The title of each plot gives the galaxy name along with the redshift, best-fit $\chi^2$ and Spearman $\rho$ value.

$$\Delta\log(\mathrm{parameter}) = \log_{10}(\mathrm{derived\ value}) - \log_{10}(\mathrm{true\ value}). \qquad (3)$$

It follows that positive offsets (Δ) represent Magphys over-estimates, and negative values indicate under-estimates. Throughout this work, where Magphys results are shown averaged across the seven views of a snapshot, they are the mean of the individual median likelihood estimates.
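As a minimal sketch of Equation 3, the residual and its average over the views of a snapshot could be computed as below. The function names and the numerical values are hypothetical illustrations, not part of Magphys or the simulations, and the sketch assumes that the averaging is applied to the per-view residuals.

```python
import math

def delta_log(derived, true):
    """Equation 3: log10(derived value) - log10(true value), in dex."""
    return math.log10(derived) - math.log10(true)

def mean_residual_over_views(derived_per_view, true_value):
    """Mean of the per-view residuals for one snapshot.

    Assumption: each entry of derived_per_view is the median-likelihood
    estimate from one viewing angle, and the residuals are averaged.
    """
    residuals = [delta_log(d, true_value) for d in derived_per_view]
    return sum(residuals) / len(residuals)

# Hypothetical stellar-mass estimates (solar masses) for seven views.
estimates = [8.0e10, 9.0e10, 1.1e11, 1.0e11, 9.5e10, 1.2e11, 8.5e10]
true_mass = 1.0e11  # hypothetical ground truth from the simulation

print(f"mean residual = {mean_residual_over_views(estimates, true_mass):+.3f} dex")
```

A positive result would correspond to a Magphys over-estimate and a negative one to an under-estimate, matching the sign convention stated above.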

Figure 8. The fidelity of Magphys is largely independent of the extent of any UV/FIR offset, as measured by the three proxies, once the uncertainties are considered. Δ log(parameter) is shown as a function of three proxies for the difference between the UV and FIR images: panel (a) presents the data for $M_\mathrm{star}$, (b) for SFR, (c) for sSFR, (d) for $M_\mathrm{dust}$ and (e) for $L_\mathrm{dust}$. For each property, the data points represent the mean over all views and snapshots in that bin. The shaded area of the same colour indicates the area enclosed by the mean 16th and 84th percentile values within the bin. The grey shaded area shows the area enclosed by the 16th and 84th percentile values for the scatter within each bin. The top plot in each panel shows the logarithmic difference, Δ, as a function of the peak-to-peak distance between the UV and FIR images; the second and third plots show the corresponding Δ as a function of the light-weighted mean UV-FIR offset and of the Spearman rank correlation coefficient $\rho$ between the 20 per cent brightest pixels in either the UV or FIR images. The short coloured lines adjacent to the left-hand y-axis represent the overall mean value. The grey histograms in each panel (a) to (e) show the bin occupancy relative to the right-hand axis.
Figure A1. The fidelity of Magphys' recovery of $M_\mathrm{star}$ and SFR is remarkably consistent across the full range of true galaxy properties. The top five panels plot the relationship between the $M_\mathrm{star}$ residual and the true value of the properties $M_\mathrm{star}$, SFR, sSFR, $M_\mathrm{dust}$ and $L_\mathrm{dust}$ respectively. The data points in black represent the median value of the residual in log-spaced bins; bin occupancy is shown by the background grey bar chart, with log values read from the right-hand axis (note that bins with occupancy < 20 have been removed for clarity). In each case the coloured band shows the median 16th and 84th percentile limits for the residuals within the bin, and the bounded grey region shows the median 16th and 84th percentile limits for the scatter within the bin. The short coloured line on the left-hand side of each plot shows the average of the plotted value; residual values are read from the left-hand axis. The lower five panels show the same for the SFR residual.

Figure A2. Similar to Figure A1, but showing the remarkably consistent recovery of sSFR and $M_\mathrm{dust}$ as a function of the true galaxy properties. The top five panels plot the relationship between the sSFR residual and the true value of the properties $M_\mathrm{star}$, SFR, sSFR, $M_\mathrm{dust}$ and $L_\mathrm{dust}$ respectively. The data points in black represent the median value of the residual in log-spaced bins; bin occupancy is shown by the background grey bar chart, with log values read from the right-hand axis (note that bins with occupancy < 20 have been removed for clarity). In each case the coloured band shows the median 16th and 84th percentile limits for the residuals within the bin, and the bounded grey region shows the median 16th and 84th percentile limits for the scatter within the bin. The short coloured line on the left-hand side of each plot shows the average of the plotted value; residual values are read from the left-hand axis. The lower five panels show the same for the $M_\mathrm{dust}$ residual.

Figure A3. Similar to Figure A1, but showing the remarkably consistent recovery of $L_\mathrm{dust}$. The five panels plot the relationship between the $L_\mathrm{dust}$ residual and the true value of the properties $M_\mathrm{star}$, SFR, sSFR, $M_\mathrm{dust}$ and $L_\mathrm{dust}$ respectively. The data points in black represent the median value of the residual in log-spaced bins; bin occupancy is shown by the background grey bar chart, with log values read from the right-hand axis (note that bins with occupancy < 20 have been removed for clarity). In each case the coloured band shows the median 16th and 84th percentile limits for the residuals within the bin, and the bounded grey region shows the median 16th and 84th percentile limits for the scatter within the bin. The short coloured line on the left-hand side of each plot shows the average of the plotted value; residual values are read from the left-hand axis.

Table 1. The filters used to create synthetic observations from the simulated photometry. The first column gives the telescope/survey, the second the instrument/filter name, and the third the effective wavelength of the filter.

Table 2. The number of filters available and the value of $\chi^2_\mathrm{max}$ for different redshift ranges.

We see from this that Magphys can routinely produce acceptable fits to the synthetic photometry up to $z = 4$, but that the success rate drops to 50 per cent at $z \approx 4.85$ and to zero after $z \approx 5.9$. Different factors may be contributing to this effect. Firstly, the number of SFHs from the Magphys libraries that are compared with observations is a strong function of redshift. Magphys does not consider SFHs longer than the age of the Universe at a given redshift (the number of SFHs shorter than the Hubble time at each redshift is shown as the dashed line, relative to the right-hand axis in Figure …

… statistically acceptable fits for virtually all snapshots at $z < 5$, irrespective of viewing angle. The best-fit $\chi^2$ as a function of Universe age is shown for galaxy A1, colour-coded by view number. The $\chi^2$ values have been averaged over bin widths of $\Delta z = 0.2$ (relative to the top horizontal axis) for clarity. The horizontal line indicates the $\chi^2$ threshold below which a fit is deemed acceptable using the Smith et al. (2012) criterion; this value varies with redshift (see Table 2). The dashed line indicates the number of stellar models (relative to the right-hand y-axis) available to Magphys at a given redshift with which to compare the input SED. Although not shown here, qualitatively similar results are obtained for the other simulations (A2, A4 & A8).
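The restriction that SFHs must be shorter than the age of the Universe can be illustrated with a short sketch. It assumes a flat ΛCDM cosmology with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_\mathrm{m} = 0.3$ (illustrative values, not necessarily those adopted in this work), neglects radiation, and uses a hypothetical, evenly spaced grid of model ages as a stand-in for the actual Magphys library.

```python
import math

# Assumed cosmology (illustrative only).
H0 = 70.0                 # km/s/Mpc
OMEGA_M = 0.3
OMEGA_L = 1.0 - OMEGA_M   # flat universe, radiation neglected

def age_of_universe_gyr(z):
    """Analytic age of a flat matter + Lambda universe at redshift z, in Gyr."""
    hubble_time_gyr = 977.8 / H0  # 1/H0 converted to Gyr
    x = math.sqrt(OMEGA_L / OMEGA_M) * (1.0 + z) ** -1.5
    return hubble_time_gyr * 2.0 / (3.0 * math.sqrt(OMEGA_L)) * math.asinh(x)

def allowed_model_ages(library_ages_gyr, z):
    """Keep only the models whose age is below the age of the Universe at z."""
    t_max = age_of_universe_gyr(z)
    return [age for age in library_ages_gyr if age < t_max]

# Hypothetical library: model ages from 0.1 to 13.5 Gyr in 0.1 Gyr steps.
library = [0.1 * i for i in range(1, 136)]

for z in (1, 4, 6):
    n_allowed = len(allowed_model_ages(library, z))
    print(f"z = {z}: age = {age_of_universe_gyr(z):.2f} Gyr, {n_allowed} models allowed")
```

With any such library, the number of permitted models shrinks as the redshift increases, which is the sense in which the priors are sampled more sparsely at earlier times.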

Table 3. Mean residuals, Δ log(parameter) as defined in Equation 3, for each property for each galaxy and the average across all galaxies; a negative value indicates an underestimate. The quoted uncertainties indicate the typical uncertainty that Magphys derives on that galaxy parameter (equal to half the difference between the 16th and 84th percentiles of the derived PDF).
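The uncertainty definition quoted above (half the difference between the 16th and 84th percentiles of a PDF) can be sketched as follows for a PDF tabulated on a grid. The function names and the example PDF are hypothetical, and the linear interpolation of the cumulative distribution is an assumption of this sketch rather than a description of how Magphys computes its percentiles.

```python
def percentile_from_pdf(grid, prob, q):
    """Return the q-th percentile of a PDF tabulated at ascending grid points.

    prob need not be normalised; the cumulative distribution is evaluated
    at the grid points and linearly interpolated between them.
    """
    total = sum(prob)
    target = q / 100.0
    cumulative = 0.0
    previous = 0.0
    for i, p in enumerate(prob):
        previous = cumulative
        cumulative += p / total
        if cumulative >= target:
            if i == 0:
                return grid[0]
            frac = (target - previous) / (cumulative - previous)
            return grid[i - 1] + frac * (grid[i] - grid[i - 1])
    return grid[-1]

def half_width_16_84(grid, prob):
    """Half the difference between the 84th and 16th percentiles."""
    return 0.5 * (percentile_from_pdf(grid, prob, 84) - percentile_from_pdf(grid, prob, 16))

# Hypothetical marginal PDF of log10(stellar mass), peaked near 10.5.
log_mass_grid = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 11.0]
pdf = [0.0, 0.01, 0.04, 0.10, 0.20, 0.30, 0.20, 0.10, 0.04, 0.01, 0.0]

print(f"uncertainty = {half_width_16_84(log_mass_grid, pdf):.3f} dex")
```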
Using Magphys to model panchromatic data gives better overall constraints on galaxy properties than sampling only a subset of the available wavelengths. Δ log(parameter) is shown for each parameter of interest, averaged across all galaxies, for three different Magphys runs: (i) including all available photometry; (ii) stellar only, including only those bands that sample the starlight ($0.4 < \lambda_\mathrm{eff} < 2.2\,\mu$m); and (iii) FIR only, including only the FIR data ($100\,\mu\mathrm{m} < \lambda_\mathrm{eff} < 500\,\mu$m), with each set of results colour-coded as in the legend. The error bars on each data point represent the mean uncertainty for each Magphys estimate, based on the 16th and 84th percentiles of the estimated PDFs.
The bin occupancy is shown by the grey background histogram.